Frequently Asked Questions

Common questions about the shared track "LLM as a Judge?: From Statute Prediction to Sycophancy Detection in Law".

What is the track about?

This shared track explores the intersection of legal AI and large language models through two complementary tasks:

Task 1 — Explainable Statute Prediction (ESP): Given the factual description of an Indian Supreme Court case, predict which sections of the Indian Penal Code (IPC) are applicable, locate the exact sentences that trigger each section, and provide legal reasoning connecting facts to the statute.

Task 2 — Sycophancy Detection: Given a legal case paired with oppositely-framed prompts and corresponding LLM responses, determine whether the model exhibits sycophantic behavior — agreeing with both framings rather than maintaining a consistent stance.

The track is part of FIRE 2026 (Forum for Information Retrieval and Evaluation).

What data format is used?

Both tasks use JSONL (JSON Lines) format, where each line represents a single data instance.

For Task 1, each training sample contains doc_id, doc_url, fact (the full case description), and statute (a list of applicable IPC sections with exact_fact sentences and reasoning_trace fields). The test data provides only doc_id, doc_url, and fact.
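As a rough illustration of the Task 1 layout described above, the sketch below parses one JSONL line in Python. The top-level field names come from this FAQ; all values are invented placeholders, and two details are assumptions: that each statute entry names its section under a `section` key (borrowed from the submission format), and that `exact_fact` is a list of sentences.

```python
import json

# Field names follow the FAQ; every value below is an invented placeholder.
TRAIN_LINE = json.dumps({
    "doc_id": "doc-0001",
    "doc_url": "https://example.org/doc-0001",
    "fact": "First sentence of the case. Second sentence of the case.",
    "statute": [
        {
            # Assumption: the section key matches the submission format,
            # and exact_fact is stored as a list of sentences.
            "section": "IPC-PLACEHOLDER",
            "exact_fact": ["Second sentence of the case."],
            "reasoning_trace": "Placeholder reasoning linking fact to section.",
        }
    ],
})

# Keys present in both training and test records.
REQUIRED_KEYS = {"doc_id", "doc_url", "fact"}


def parse_task1_line(line: str) -> dict:
    """Parse one JSONL line and check the keys shared by train and test data."""
    record = json.loads(line)
    missing = REQUIRED_KEYS - record.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    # Test records carry no "statute" key, so default to an empty list.
    record.setdefault("statute", [])
    return record


record = parse_task1_line(TRAIN_LINE)
for entry in record["statute"]:
    print(record["doc_id"], entry["section"], entry["exact_fact"])
```

The same function accepts test records, which simply come back with an empty `statute` list.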

For Task 2, each training sample contains case_id, jurisdiction, fact, paired prompts (true_prompt and flip_prompt), paired responses (true_response and flip_response), a label (sycophantic/non-sycophantic), and label_source. See the Task 2 page for the full schema.
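A quick way to inspect Task 2 training data is to count the labels. The sketch below assumes a flat record with the keys named above and the label strings `sycophantic` and `non-sycophantic`; the Task 2 page remains the authority on the actual schema, and all values here are placeholders.

```python
import json
from collections import Counter

# Assumed label spellings; check the Task 2 page for the real ones.
KNOWN_LABELS = {"sycophantic", "non-sycophantic"}


def make_placeholder(case_id: str, label: str) -> str:
    """Build one invented training line using the field names from the FAQ."""
    return json.dumps({
        "case_id": case_id,
        "jurisdiction": "placeholder",
        "fact": "Placeholder case facts.",
        "true_prompt": "Placeholder prompt framed one way.",
        "flip_prompt": "Placeholder prompt framed the opposite way.",
        "true_response": "Placeholder model response.",
        "flip_response": "Placeholder model response.",
        "label": label,
        "label_source": "placeholder",
    })


def label_counts(lines) -> Counter:
    """Count labels across JSONL lines, rejecting labels outside KNOWN_LABELS."""
    counts = Counter()
    for line in lines:
        label = json.loads(line)["label"]
        if label not in KNOWN_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


LINES = [
    make_placeholder("case-001", "sycophantic"),
    make_placeholder("case-002", "non-sycophantic"),
]
print(label_counts(LINES))
```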

How do I submit my predictions?

Submissions are made through CodaBench (CodaLab v2). Each team submits a single JSONL file per task.

For Task 1, each line in your submission must contain doc_id and statute — a list of predictions, each with section, exact_fact, and reasoning_trace fields.
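One possible way to produce such a file is sketched below. The helper name `write_task1_submission` and the in-memory `predictions` mapping are hypothetical, the values are placeholders, and treating `exact_fact` as a list is an assumption; only the key names are taken from the description above.

```python
import io
import json

ENTRY_KEYS = {"section", "exact_fact", "reasoning_trace"}


def write_task1_submission(predictions: dict, stream) -> None:
    """Write one JSON object per line: a doc_id plus its statute predictions."""
    for doc_id, statutes in predictions.items():
        for entry in statutes:
            missing = ENTRY_KEYS - entry.keys()
            if missing:
                raise ValueError(f"{doc_id}: missing {sorted(missing)}")
        row = {"doc_id": doc_id, "statute": statutes}
        stream.write(json.dumps(row, ensure_ascii=False) + "\n")


# Placeholder predictions for two hypothetical documents.
predictions = {
    "doc-0001": [
        {
            "section": "IPC-PLACEHOLDER",
            "exact_fact": ["Placeholder sentence copied from the fact text."],
            "reasoning_trace": "Placeholder reasoning.",
        }
    ],
    "doc-0002": [],
}

buffer = io.StringIO()
write_task1_submission(predictions, buffer)
submission_text = buffer.getvalue()
print(submission_text, end="")
```

Passing an open file handle instead of the `io.StringIO` buffer writes the same lines to disk.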

For Task 2, each line must contain case_id, prompt_variant, model, and predicted_label. See the Task 2 page for the exact format. Only one run per team is allowed per task.
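Before uploading, it may help to run a local sanity check on a Task 2 file. The sketch below only verifies that each line is a JSON object with the four keys listed above and an assumed label spelling; the function name `find_problems` is hypothetical, and the accepted values for `prompt_variant` and `model` are not checked because this FAQ does not specify them.

```python
import json

REQUIRED = {"case_id", "prompt_variant", "model", "predicted_label"}
# Assumed label spellings; check the Task 2 page for the real ones.
ALLOWED_LABELS = {"sycophantic", "non-sycophantic"}


def find_problems(text: str) -> list:
    """Return a human-readable message for every malformed submission line."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(row, dict):
            problems.append(f"line {number}: not a JSON object")
            continue
        missing = REQUIRED - row.keys()
        if missing:
            problems.append(f"line {number}: missing {sorted(missing)}")
        elif row["predicted_label"] not in ALLOWED_LABELS:
            problems.append(f"line {number}: unknown label")
    return problems


# Placeholder rows; the prompt_variant and model values are invented.
good = json.dumps({
    "case_id": "case-001",
    "prompt_variant": "placeholder-variant",
    "model": "placeholder-model",
    "predicted_label": "sycophantic",
})
bad = json.dumps({"case_id": "case-002", "predicted_label": "sycophantic"})
print(find_problems(good + "\n" + bad + "\n"))
```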

The scoring scripts will be released alongside the test data, and the leaderboard updates automatically after each submission.

What evaluation metrics are used?

Task 1 uses a weighted composite score comprising:

Macro F1 (35%): Exact match on predicted section labels vs. gold standard.

ROUGE-L (25%): Longest common subsequence similarity between predicted and gold reasoning traces.

BLEU (20%): Sentence-level BLEU score for reasoning texts.

Recall@3 (10%): Whether gold labels appear in the top-3 predictions.

Legal Semantic Score (10%): Cosine similarity of reasoning embeddings from an undisclosed legal-domain language model.
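To make the weighting concrete, here is a minimal sketch of how the five components could combine into one number. It assumes each component is already scaled to the range 0 to 1 and that the composite is a plain weighted sum; the official scoring script may differ, and the example component values are invented.

```python
# Weights as listed in this FAQ (tentative).
WEIGHTS = {
    "macro_f1": 0.35,
    "rouge_l": 0.25,
    "bleu": 0.20,
    "recall_at_3": 0.10,
    "legal_semantic": 0.10,
}


def composite_score(components: dict) -> float:
    """Weighted sum of the five component scores, each expected in [0, 1]."""
    if set(components) != set(WEIGHTS):
        raise ValueError("components must match the weight names exactly")
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Invented component scores for illustration only.
example = {
    "macro_f1": 0.60,
    "rouge_l": 0.40,
    "bleu": 0.20,
    "recall_at_3": 0.80,
    "legal_semantic": 0.70,
}
# 0.21 + 0.10 + 0.04 + 0.08 + 0.07 = 0.50
print(round(composite_score(example), 4))
```

Because Macro F1 carries the largest weight, a gain of 0.10 in section prediction moves the composite by 0.035, compared with 0.01 for the same gain in Recall@3.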

Task 2 uses standard binary classification metrics: Accuracy, Precision, Recall, and F1 Score. Overall ranking is determined by the F1 Score. See the Task 2 page for details.
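For reference, the four Task 2 metrics can be computed as in the sketch below. It assumes that `sycophantic` is treated as the positive class, which this FAQ does not state, and the gold and predicted labels are invented.

```python
def binary_metrics(gold, predicted, positive="sycophantic"):
    """Accuracy, precision, recall and F1 for one positive class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


S, N = "sycophantic", "non-sycophantic"
gold = [S, S, N, N, S]
predicted = [S, N, N, S, S]
# Two true positives, one false positive, one false negative, one true negative.
print(binary_metrics(gold, predicted))
```

On this toy input, accuracy is 3/5 and precision, recall, and F1 are each 2/3.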

Note: Metrics and weights are tentative and may be updated before the test data release.

Can I participate in both tasks?

Yes. You are welcome to participate in one or both tasks. Each task is evaluated independently with its own leaderboard and submission process.

You may register as a single team and submit runs for both Task 1 and Task 2, or focus on just one — there is no restriction.

Is there a participant limit?

No. There is no cap on the number of participating teams or individuals. The track is open to all researchers, students, and practitioners.

We encourage broad participation from the legal AI, NLP, and information retrieval communities.

When will results be announced?

Evaluation results will be declared at the end of July 2026, shortly after the test data submission deadline.

Working notes are due by 15 August 2026, and camera-ready copies by the end of September 2026. The final results and rankings will be presented at the FIRE 2026 workshop.

Who should I contact for questions?

For any questions about the track, please reach out to the organizers:

Kripabandhu Ghosh: kripaghosh@iiserkol.ac.in (IISER Kolkata, India)

Liana Ermakova: liana.ermakova@univ-brest.fr (Université de Bretagne Occidentale, France)

Shuvam Banerji Seal: sbs22ms076@iiserkol.ac.in (IISER Kolkata, India)

Subinay Adhikary: sa21rs094@iiserkol.ac.in (IISER Kolkata, India)

Jaap Kamps: kamps@uva.nl (University of Amsterdam, Netherlands)

Still Have Questions?

Feel free to email any of the organizers listed above, or visit the registration page for more details about participating.