1 Overview

This report summarises the comparison between the original review and the direct replication for SYRCLE risk-of-bias judgements. It includes the full traffic-light plots for each review, the domain-level summary shown in the manuscript, and the paired comparison analyses, which are restricted to studies included in both reviews.

2 Direct replication

The figure below shows the study-level risk-of-bias judgements for the direct replication.

3 Original study

The figure below shows the study-level risk-of-bias judgements reconstructed from the original review.

4 Combined summary

The summary below compares the distribution of low, unclear, and high risk-of-bias judgements across SYRCLE domains in the original review and in the direct replication. Unlike the paired comparison analyses shown later, this summary is based on the full datasets for each review.

[1] “report_tables/rob_summary_combined.csv”

5 Heatmap of differences

The heatmap below compares paired judgements for studies included in both reviews. Scores were calculated as direct replication minus original review on the ordered scale low, unclear, and high. Positive values indicate that the replication assigned a higher risk judgement; negative values indicate a lower risk judgement.

[1] “report_tables/rob_diff_matrix.csv”
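The difference scores described above can be sketched as follows. This is a minimal illustration, not the code used for the report: the numeric coding (low = 0, unclear = 1, high = 2), the function names, and the example studies are all assumptions, since the report states only the ordering of the scale.

```python
# Hypothetical numeric coding of the ordered scale; the report gives the
# ordering (low, unclear, high) but not the codes that were used.
SCALE = {"low": 0, "unclear": 1, "high": 2}


def difference_score(original: str, replication: str) -> int:
    """Return replication minus original on the ordered scale."""
    return SCALE[replication] - SCALE[original]


def difference_matrix(original: dict, replication: dict) -> dict:
    """Build {study: {domain: score}} for studies present in both reviews."""
    shared = sorted(set(original) & set(replication))
    return {
        study: {
            domain: difference_score(original[study][domain],
                                     replication[study][domain])
            for domain in original[study]
            if domain in replication[study]
        }
        for study in shared
    }


# Made-up judgements for illustration only.
original_review = {
    "Study A": {"Incomplete outcome data": "low", "Other": "unclear"},
    "Study B": {"Incomplete outcome data": "high", "Other": "low"},
    "Study C": {"Incomplete outcome data": "low", "Other": "low"},
}
direct_replication = {
    "Study A": {"Incomplete outcome data": "high", "Other": "unclear"},
    "Study B": {"Incomplete outcome data": "unclear", "Other": "low"},
}

matrix = difference_matrix(original_review, direct_replication)
for study, scores in matrix.items():
    print(study, scores)
```

Under this coding, scores range from -2 (high in the original, low in the replication) to +2 (low in the original, high in the replication). Study C is dropped because it appears in only one review.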

6 Distribution of judgement changes

The figure below summarises the distribution of paired judgement changes across SYRCLE domains for studies shared between the two reviews.

[1] “report_tables/rob_diff_stacked.csv”
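One way to tabulate such a distribution is to count, per domain, how many paired judgements moved lower, stayed the same, or moved higher. The sketch below assumes difference scores coded as replication minus original; the three-way grouping, the function name, and the example values are illustrative assumptions and may not match the categories used in the figure.

```python
from collections import Counter


def tally_changes(diff_matrix: dict) -> dict:
    """Count lower / same / higher paired judgements for each domain."""
    tallies = {}
    for scores in diff_matrix.values():
        for domain, score in scores.items():
            if score > 0:
                label = "higher"
            elif score < 0:
                label = "lower"
            else:
                label = "same"
            tallies.setdefault(domain, Counter())[label] += 1
    return tallies


# Made-up difference scores for illustration only.
example = {
    "Study A": {"Sequence generation": 0, "Other": 1},
    "Study B": {"Sequence generation": 0, "Other": -2},
    "Study C": {"Sequence generation": 0, "Other": 0},
}

for domain, counts in tally_changes(example).items():
    print(domain, dict(counts))
```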

7 Agreement metrics

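The table below reports agreement between the original review and the direct replication for studies included in both datasets. The main summary measures are percent agreement, prevalence-adjusted bias-adjusted kappa (PABAK), and unweighted Cohen’s kappa; linear- and quadratic-weighted kappa are reported alongside them. NA marks domains where kappa is undefined, which is the expected result when both reviews assigned the same single judgement to every shared study, so that chance-expected agreement equals 1.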

[1] “report_tables/rob_agreement_metrics.csv”

Agreement between Original and Replication (RoB domains)
Category	Percent agreement	PABAK	Cohen's κ (unweighted)	Weighted κ (linear)	Weighted κ (quadratic)
1) Sequence generation	100%	1.000	NA	NA	NA
2) Baseline characteristics	65%	0.475	0.146	0.146	0.146
3) Allocation concealment	100%	1.000	NA	NA	NA
4) Random housing	100%	1.000	NA	NA	NA
5) Blinding caregivers	70%	0.550	0.000	0.000	0.000
6) Random outcome assessment	100%	1.000	NA	NA	NA
7) Blinding outcome assessment	55%	0.325	0.000	0.000	0.000
8) Incomplete outcome data	50%	0.250	0.153	0.085	0.000
9) Selective outcome reporting	80%	0.700	0.000	0.000	0.000
10) Other	50%	0.250	0.231	0.300	0.353
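
The PABAK column is consistent with the multi-category form PABAK = (k × p_o − 1) / (k − 1) with k = 3 judgement categories, where p_o is the observed proportion of agreement: for example, (3 × 0.65 − 1) / 2 = 0.475 for baseline characteristics. The sketch below shows how percent agreement, PABAK, and unweighted Cohen's kappa could be computed from paired judgements. It is not the report's own code; the function names and example data are assumptions, and the weighted kappas are omitted.

```python
from collections import Counter

CATEGORIES = ("low", "unclear", "high")


def percent_agreement(a, b):
    """Proportion of paired judgements that are identical."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pabak(a, b, k=len(CATEGORIES)):
    """Prevalence-adjusted bias-adjusted kappa for k categories."""
    return (k * percent_agreement(a, b) - 1) / (k - 1)


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa; None when chance agreement equals 1."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance-expected agreement from the two sets of marginal proportions.
    p_e = sum(count_a[c] * count_b[c] for c in CATEGORIES) / n ** 2
    if p_e == 1:
        return None  # undefined: both raters used one identical category
    return (p_o - p_e) / (1 - p_e)


# Made-up paired judgements for 20 shared studies (illustration only).
original = ["unclear"] * 20
replication = ["unclear"] * 14 + ["low"] * 6

print(percent_agreement(original, replication))
print(round(pabak(original, replication), 3))
print(cohen_kappa(original, replication))
print(cohen_kappa(original, original))
```

In this made-up example one review gives a constant judgement, so observed agreement equals chance-expected agreement and kappa is 0 even though 70% of pairs agree. When both reviews give the same constant judgement, kappa is undefined and the function returns None, while PABAK remains 1.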