Adjudication Logic: "Champion vs. Champion"

The judge script does not compare every possible pair of reviews. Instead, it groups the reviews by recommendation, identifies the two main "factions" of opinion, and selects the single highest-scoring review (the "Champion") from each to send to the judge. Here is how it works.

Example 1: 2-vs-1 Conflict

This example shows a standard conflict where one review (the "outlier") disagrees with the majority.

Input Rows from Discrepancy File:

| # | Model Combination | Score | Recommendation |
|---|---|---|---|
| 1 | gemini/gemini-2.5-flash, openai/gpt-4o-mini | 40.4 | Reject |
| 2 | openai/gpt-5, deepseek/deepseek-reasoner | 64.0 | Revise and Resubmit |
| 3 | openai/gpt-4o-mini, deepseek/deepseek-reasoner | 62.0 | Revise and Resubmit |

1. Faction Analysis:

The script counts the recommendations: 'Revise and Resubmit' appears twice (Rows 2 and 3) and 'Reject' appears once (Row 1).

The script identifies two factions to compare: 'Revise and Resubmit' (Group A) and 'Reject' (Group B).
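The counting step can be sketched in a few lines of Python. This is only an illustration of the idea, not the judge script itself: the tuple layout and the variable names `rows`, `counts`, and `factions` are assumptions made for this example, as is the use of `collections.Counter`.

```python
from collections import Counter

# Rows from Example 1: (row number, model combination, score, recommendation).
rows = [
    (1, "gemini/gemini-2.5-flash, openai/gpt-4o-mini", 40.4, "Reject"),
    (2, "openai/gpt-5, deepseek/deepseek-reasoner", 64.0, "Revise and Resubmit"),
    (3, "openai/gpt-4o-mini, deepseek/deepseek-reasoner", 62.0, "Revise and Resubmit"),
]

# Count how many reviews carry each recommendation.
counts = Counter(recommendation for _, _, _, recommendation in rows)

# Assumption: the two most frequent recommendations become Group A and Group B.
factions = [recommendation for recommendation, _ in counts.most_common(2)]

print(counts["Revise and Resubmit"], counts["Reject"])
print(factions)
```

Here the majority recommendation comes first, which matches the Group A / Group B labels used in this example.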

2. Champion Selection (Highest Score Wins):

Champion of 'Revise and Resubmit' (Group A)

Compares Row 2 (Score 64.0) vs. Row 3 (Score 62.0).

Winner: Row 2 (Score 64.0)

Champion of 'Reject' (Group B)

Only one review (Row 1) is in this group.

Winner: Row 1 (Score 40.4)
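The "highest score wins" rule amounts to taking the maximum score within each faction. The sketch below shows one way to express it; the dictionary keys and the helper name `pick_champion` are hypothetical and chosen only for illustration.

```python
# Reviews from Example 1, reduced to the fields the selection needs.
reviews = [
    {"row": 1, "score": 40.4, "recommendation": "Reject"},
    {"row": 2, "score": 64.0, "recommendation": "Revise and Resubmit"},
    {"row": 3, "score": 62.0, "recommendation": "Revise and Resubmit"},
]


def pick_champion(reviews, recommendation):
    """Return the highest-scoring review that carries the given recommendation."""
    group = [r for r in reviews if r["recommendation"] == recommendation]
    return max(group, key=lambda r: r["score"])


champion_a = pick_champion(reviews, "Revise and Resubmit")
champion_b = pick_champion(reviews, "Reject")

print(champion_a["row"], champion_a["score"])
print(champion_b["row"], champion_b["score"])
```

A faction with a single member, such as 'Reject' here, needs no special case: the only review in the group is its champion.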

3. Final Adjudication:

The Judge is asked to compare the champion of 'Revise and Resubmit' (Row 2) against the champion of 'Reject' (Row 1). The champion-row class highlights these rows in the table above.

Example 2: 2-vs-2 "No Majority" Tie

This example shows how the logic handles a 2-vs-2 tie: with no majority, each recommendation still forms its own faction, so a valid conflict is still judged.

Input Rows from Discrepancy File:

| # | Model Combination | Score | Recommendation |
|---|---|---|---|
| 1 | model-X, model-A | 45.0 | Reject |
| 2 | model-Y, model-B | 41.0 | Reject |
| 3 | model-P, model-C | 65.0 | Revise and Resubmit |
| 4 | model-Q, model-D | 68.0 | Revise and Resubmit |

1. Faction Analysis:

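The script counts the recommendations: 'Reject' appears twice (Rows 1 and 2) and 'Revise and Resubmit' appears twice (Rows 3 and 4), so neither has a majority.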

The script identifies two factions to compare: 'Reject' (Group A) and 'Revise and Resubmit' (Group B).
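How a tie is ordered depends on the implementation, and this document does not state the rule the script uses. As one possibility, Python's `collections.Counter.most_common` keeps equal counts in first-encountered order, which would reproduce the Group A / Group B labels above. The sketch below assumes that approach; the names `group_a`, `group_b`, and `is_tie` are illustrative only.

```python
from collections import Counter

# Recommendations from Example 2, in row order.
recommendations = [
    "Reject",
    "Reject",
    "Revise and Resubmit",
    "Revise and Resubmit",
]

counts = Counter(recommendations)

# most_common() lists equal counts in the order they were first encountered,
# so with a 2-vs-2 split the recommendation seen first is ranked first.
ranked = counts.most_common(2)
group_a, group_b = (recommendation for recommendation, _ in ranked)

# A tie means there is no majority, but there are still two factions to compare.
is_tie = ranked[0][1] == ranked[1][1]

print(group_a, group_b, is_tie)
```

The labels are only a naming convenience: both factions contribute a champion either way, so the conflict is judged regardless of which one is called Group A.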

2. Champion Selection (Highest Score Wins):

Champion of 'Reject' (Group A)

Compares Row 1 (Score 45.0) vs. Row 2 (Score 41.0).

Winner: Row 1 (Score 45.0)

Champion of 'Revise and Resubmit' (Group B)

Compares Row 3 (Score 65.0) vs. Row 4 (Score 68.0).

Winner: Row 4 (Score 68.0)

3. Final Adjudication:

The Judge is asked to compare the champion of 'Reject' (Row 1) against the champion of 'Revise and Resubmit' (Row 4). The champion-row class highlights these rows in the table above.
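Putting the three steps together, the whole selection can be sketched as a single function. This is a minimal illustration under stated assumptions, not the judge script's actual code: the `Review` dataclass, the `select_champions` name, and the choice to return `None` when every review agrees are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Review:
    row: int
    models: str
    score: float
    recommendation: str


def select_champions(reviews):
    """Return the (Group A champion, Group B champion) pair, or None if no conflict."""
    counts = Counter(r.recommendation for r in reviews)
    if len(counts) < 2:
        return None  # Assumption: a single recommendation means nothing to adjudicate.
    # Step 1: the two most frequent recommendations form the factions.
    top_two = [recommendation for recommendation, _ in counts.most_common(2)]
    # Step 2: within each faction, the highest score wins.
    return tuple(
        max((r for r in reviews if r.recommendation == rec), key=lambda r: r.score)
        for rec in top_two
    )


# Rows from Example 2.
example_2 = [
    Review(1, "model-X, model-A", 45.0, "Reject"),
    Review(2, "model-Y, model-B", 41.0, "Reject"),
    Review(3, "model-P, model-C", 65.0, "Revise and Resubmit"),
    Review(4, "model-Q, model-D", 68.0, "Revise and Resubmit"),
]

# Step 3: this pair is what would be handed to the judge.
pair = select_champions(example_2)
print([(r.row, r.score) for r in pair])
```

Whatever the number of input rows, at most two reviews reach the judge, which is what keeps the script from comparing every possible pair.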