1 Overview

This report compares the SYRCLE risk-of-bias judgements of the original review with those of the direct replication. It includes the full traffic-light plots for each review, the domain-level summary shown in the manuscript, and the paired comparison analyses, which are restricted to studies included in both reviews.

2 Direct replication

The figure below shows the study-level risk-of-bias judgements for the direct replication.

3 Original study

The figure below shows the study-level risk-of-bias judgements reconstructed from the original review.

4 Combined summary

The summary below compares the distribution of low, unclear, and high risk-of-bias judgements across SYRCLE domains in the original review and in the direct replication. Unlike the paired comparison analyses shown later, this summary is based on the full datasets for each review.

[1] “report_tables/rob_summary_combined.csv”
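As a minimal sketch of how such a domain-level distribution could be tabulated, the snippet below computes the proportion of low, unclear, and high judgements for one domain in one review. The function name `domain_distribution` and the input list are hypothetical and are not taken from the report's own code or data.

```python
from collections import Counter

LEVELS = ("low", "unclear", "high")

def domain_distribution(judgements):
    """Return the proportion of each judgement level for a single domain.

    `judgements` is a list of "low" / "unclear" / "high" strings, one per
    study in the full dataset of one review (hypothetical input format).
    """
    counts = Counter(judgements)
    n = len(judgements)
    return {level: counts[level] / n for level in LEVELS}

# Hypothetical toy input, not data from either review.
example = domain_distribution(["low", "low", "unclear", "high"])
print(example)
```

Running the same function on the original review and on the direct replication, domain by domain, would give the two distributions that the combined summary places side by side.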

5 Heatmap of differences

The heatmap below compares paired judgements for studies included in both reviews. Scores were calculated as direct replication minus original review on the ordered scale low < unclear < high. Positive values indicate that the replication assigned a higher risk judgement, negative values indicate a lower risk judgement, and zero indicates that the two reviews agreed.

[1] “report_tables/rob_diff_matrix.csv”
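The difference scores described in this section can be sketched as follows. This is an illustration under stated assumptions: the numeric coding low = 0, unclear = 1, high = 2 is assumed (the report gives only the ordering), and the function name, study labels, and judgements are hypothetical.

```python
# Assumed numeric coding of the ordered scale; the report states only the order.
SCORE = {"low": 0, "unclear": 1, "high": 2}

def difference_matrix(original, replication):
    """Return {study: {domain: replication score - original score}}.

    Both arguments map study -> {domain: judgement}. Only studies present
    in both reviews are kept, mirroring the paired comparison.
    """
    shared = sorted(set(original) & set(replication))
    return {
        study: {
            domain: SCORE[replication[study][domain]] - SCORE[original[study][domain]]
            for domain in original[study]
        }
        for study in shared
    }

# Hypothetical toy data, not taken from either review.
original = {
    "Study A": {"Sequence generation": "unclear", "Other": "low"},
    "Study B": {"Sequence generation": "unclear", "Other": "high"},
}
replication = {
    "Study A": {"Sequence generation": "unclear", "Other": "high"},
    "Study C": {"Sequence generation": "low", "Other": "low"},
}

diffs = difference_matrix(original, replication)
print(diffs)
```

With this coding the scores range from -2 (high in the original, low in the replication) to +2 (low in the original, high in the replication). Studies that appear in only one review, such as the hypothetical Study B and Study C, drop out of the matrix.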

6 Distribution of judgement changes

The figure below summarises the distribution of paired judgement changes across SYRCLE domains for studies shared between the two reviews.

[1] “report_tables/rob_diff_stacked.csv”

7 Agreement metrics

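The table below reports agreement between the original review and the direct replication for studies included in both datasets. The summary measures are percent agreement, prevalence-adjusted bias-adjusted kappa (PABAK), and Cohen’s kappa (unweighted, and weighted with linear and quadratic weights). Kappa is shown as NA where it is undefined, which occurs when both reviews assign the same single judgement to every study, so that chance-expected agreement equals 1.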

[1] “report_tables/rob_agreement_metrics.csv”

Agreement between Original and Replication (RoB domains)

Category                          Percent agreement   PABAK   Cohen's κ (unweighted)   Weighted κ (linear)   Weighted κ (quadratic)
1) Sequence generation            100%                1.000   NA                       NA                    NA
2) Baseline characteristics       65%                 0.475   0.146                    0.146                 0.146
3) Allocation concealment         100%                1.000   NA                       NA                    NA
4) Random housing                 100%                1.000   NA                       NA                    NA
5) Blinding caregivers            70%                 0.550   0.000                    0.000                 0.000
6) Random outcome assessment      100%                1.000   NA                       NA                    NA
7) Blinding outcome assessment    55%                 0.325   0.000                    0.000                 0.000
8) Incomplete outcome data        50%                 0.250   0.153                    0.085                 0.000
9) Selective outcome reporting    80%                 0.700   0.000                    0.000                 0.000
10) Other                         50%                 0.250   0.231                    0.300                 0.353
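
The sketch below shows one way the unweighted measures could be computed for a single domain. It is not the report's own code. The PABAK values in the table are consistent with the three-category form (k·Po − 1)/(k − 1) with k = 3, which is the form used here; the function name and the example judgements are hypothetical.

```python
from collections import Counter

LEVELS = ("low", "unclear", "high")

def agreement_metrics(original, replication, levels=LEVELS):
    """Return (percent agreement, PABAK, unweighted Cohen's kappa).

    `original` and `replication` are equal-length lists of paired
    judgements for one domain. Kappa is None when it is undefined.
    """
    n = len(original)
    k = len(levels)
    # Observed agreement: share of studies with identical judgements.
    p_o = sum(a == b for a, b in zip(original, replication)) / n
    # PABAK for k categories depends only on observed agreement.
    pabak = (k * p_o - 1) / (k - 1)
    # Chance-expected agreement from the two sets of marginal counts.
    c_orig, c_rep = Counter(original), Counter(replication)
    chance_num = sum(c_orig[lvl] * c_rep[lvl] for lvl in levels)
    if chance_num == n * n:
        kappa = None  # both reviews used one identical category: 0/0
    else:
        p_e = chance_num / (n * n)
        kappa = (p_o - p_e) / (1 - p_e)
    return p_o, pabak, kappa

# Hypothetical judgements, not the study data: one review rates every
# study "unclear", the other rates 14 "unclear" and 6 "low".
orig = ["unclear"] * 20
rep = ["unclear"] * 14 + ["low"] * 6
p_o, pabak, kappa = agreement_metrics(orig, rep)
print(round(p_o, 3), round(pabak, 3), round(kappa, 3))
```

The hypothetical example illustrates why kappa can be 0 despite fairly high percent agreement: when one review uses a single category throughout, chance-expected agreement equals observed agreement. PABAK is unaffected because it does not depend on the marginal distributions.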