The judge script does not compare every possible pair of reviews. Instead, it identifies the two main "factions" of opinion and selects the single highest-scoring review (the "Champion") from each to send to the judge. Here is how it works.
This example shows a standard conflict where one review (the "outlier") disagrees with the majority.
Input Rows from Discrepancy File:
| # | Model Combination | Score | Recommendation |
|---|---|---|---|
| 1 | gemini/gemini-2.5-flash, openai/gpt-4o-mini | 40.4 | Reject |
| 2 | openai/gpt-5, deepseek/deepseek-reasoner | 64.0 | Revise and Resubmit |
| 3 | openai/gpt-4o-mini, deepseek/deepseek-reasoner | 62.0 | Revise and Resubmit |
1. Faction Analysis:
The script counts the recommendations:
- Revise and Resubmit: 2 reviews
- Reject: 1 review

The script identifies two factions to compare: 'Revise and Resubmit' (Group A) and 'Reject' (Group B).
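The counting step can be sketched as follows. This is a minimal illustration rather than the script's actual code: the dictionary keys (`row`, `score`, `recommendation`) and the function name `find_factions` are assumptions, and it assumes the two factions are simply the two most frequent recommendations.

```python
from collections import Counter

# Rows from the example table; the key names are illustrative assumptions.
rows = [
    {"row": 1, "score": 40.4, "recommendation": "Reject"},
    {"row": 2, "score": 64.0, "recommendation": "Revise and Resubmit"},
    {"row": 3, "score": 62.0, "recommendation": "Revise and Resubmit"},
]


def find_factions(rows):
    """Return the two most common recommendations as (label, count) pairs."""
    counts = Counter(r["recommendation"] for r in rows)
    # most_common(2) sorts by count, largest first.
    return counts.most_common(2)


factions = find_factions(rows)
print(factions)  # [('Revise and Resubmit', 2), ('Reject', 1)]
```

The first pair corresponds to Group A and the second to Group B.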
2. Champion Selection (Highest Score Wins):
- Group A ('Revise and Resubmit'): compares Row 2 (Score 64.0) vs. Row 3 (Score 62.0). Winner: Row 2 (Score 64.0).
- Group B ('Reject'): only one review (Row 1) is in this group. Winner: Row 1 (Score 40.4).
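A minimal sketch of the "highest score wins" rule, applied to the rows of this example. The function name `pick_champion` and the dictionary keys are hypothetical. How the real script resolves two equal scores within a faction is not stated here; this sketch keeps the earlier row in that case.

```python
# Rows from the example table; the key names are illustrative assumptions.
rows = [
    {"row": 1, "score": 40.4, "recommendation": "Reject"},
    {"row": 2, "score": 64.0, "recommendation": "Revise and Resubmit"},
    {"row": 3, "score": 62.0, "recommendation": "Revise and Resubmit"},
]


def pick_champion(rows, recommendation):
    """Return the highest-scoring row within one faction."""
    members = [r for r in rows if r["recommendation"] == recommendation]
    # max() returns the first maximal element, so equal scores
    # resolve to the earlier row (an assumption of this sketch).
    return max(members, key=lambda r: r["score"])


champion_a = pick_champion(rows, "Revise and Resubmit")
champion_b = pick_champion(rows, "Reject")
print(champion_a["row"], champion_b["row"])  # 2 1
```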
3. Final Adjudication:
The two champions, Row 2 ('Revise and Resubmit', Score 64.0) and Row 1 ('Reject', Score 40.4), are sent to the judge.
This example shows how the logic gracefully handles a 2-vs-2 tie by ensuring a valid conflict is still judged.
Input Rows from Discrepancy File:
| # | Model Combination | Score | Recommendation |
|---|---|---|---|
| 1 | model-X, model-A | 45.0 | Reject |
| 2 | model-Y, model-B | 41.0 | Reject |
| 3 | model-P, model-C | 65.0 | Revise and Resubmit |
| 4 | model-Q, model-D | 68.0 | Revise and Resubmit |
1. Faction Analysis:
The script counts the recommendations:
- Reject: 2 reviews
- Revise and Resubmit: 2 reviews

The script identifies two factions to compare: 'Reject' (Group A) and 'Revise and Resubmit' (Group B).
2. Champion Selection (Highest Score Wins):
- Group A ('Reject'): compares Row 1 (Score 45.0) vs. Row 2 (Score 41.0). Winner: Row 1 (Score 45.0).
- Group B ('Revise and Resubmit'): compares Row 3 (Score 65.0) vs. Row 4 (Score 68.0). Winner: Row 4 (Score 68.0).
3. Final Adjudication:
The two champions, Row 1 ('Reject', Score 45.0) and Row 4 ('Revise and Resubmit', Score 68.0), are sent to the judge.
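The 2-vs-2 case can be sketched end to end as below. This is an illustration under stated assumptions, not the script's actual implementation: `select_champions` and the dictionary keys are hypothetical names, and the sketch assumes that factions with equal counts keep the order in which they first appear in the file. That assumption matches the example, where 'Reject' becomes Group A.

```python
from collections import Counter

# Rows from the tie example; the key names are illustrative assumptions.
rows = [
    {"row": 1, "score": 45.0, "recommendation": "Reject"},
    {"row": 2, "score": 41.0, "recommendation": "Reject"},
    {"row": 3, "score": 65.0, "recommendation": "Revise and Resubmit"},
    {"row": 4, "score": 68.0, "recommendation": "Revise and Resubmit"},
]


def select_champions(rows):
    """Return one champion per faction, or an empty list if there is no conflict."""
    counts = Counter(r["recommendation"] for r in rows)
    # most_common() orders equal counts by first appearance, so a tie
    # still yields two distinct factions.
    labels = [label for label, _ in counts.most_common(2)]
    if len(labels) < 2:
        return []  # every review agrees, so there is nothing to judge
    return [
        max((r for r in rows if r["recommendation"] == label),
            key=lambda r: r["score"])
        for label in labels
    ]


champions = select_champions(rows)
print([(c["row"], c["recommendation"]) for c in champions])
# [(1, 'Reject'), (4, 'Revise and Resubmit')]
```

Because each faction contributes exactly one champion, a tie in counts never produces a pair of reviews that share the same recommendation.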