Task 01

Explainable Statute Prediction

Part of the track: LLM as a Judge?: From Statute Prediction to Sycophancy Detection in Law

Kripabandhu Ghosh

IISER Kolkata, India

kripaghosh@iiserkol.ac.in

Liana Ermakova

Université de Bretagne Occidentale, France

liana.ermakova@univ-brest.fr

Shuvam Banerji Seal

IISER Kolkata, India

sbs22ms076@iiserkol.ac.in

Subinay Adhikary

IISER Kolkata, India

sa21rs094@iiserkol.ac.in

Jaap Kamps

University of Amsterdam, Netherlands

kamps@uva.nl

§ 01 · OVERVIEW

Overview

Given the factual description of an Indian Supreme Court case, participants must:

Identify which sections of the Indian Penal Code (IPC) are applicable
Locate the exact sentence(s) from the case facts that trigger each applicable section
Explain the legal reasoning connecting each fact sentence to the applicable IPC section

This task tests both legal knowledge (which sections apply?) and interpretability (why do they apply, grounded in the facts?).

§ 02 · TASK DESCRIPTION

Task Description

The task requires predicting applicable IPC sections from case facts, grounding each prediction in specific sentences, and providing legal reasoning that connects the facts to the statute. This is a multi-output structured prediction problem combining classification, extraction, and generation.

§ 03 · JURISDICTION & SCOPE

Jurisdiction & Scope

Included

Indian Supreme Court only
Indian Penal Code (IPC) sections

Not Included

Bharatiya Nyaya Sanhita (BNS)
CrPC, Evidence Act
U.S. statutes

Legal categories: Criminal, Civil, Constitutional, Tax, Labor, Commercial, Revenue, Administration, Environmental

§ 04 · TRAINING DATA

Training Data

Cases: 525

Format: JSONL

Avg. chars: ~2,583

Release date: 15 June 2026 → 20 June 2026

Each line in the training set has the following structure:

{
  "doc_id": "2013.INSC.228.txt",
  "fact": " The facts which are essential to be stated for adjudication of this appeal are that an FIR was lodged by Prem Singh, PW-2...",
  "explanation": {
    "The facts which are essential to be stated for adjudication of this appeal are that an FIR was lodged by Prem Singh, PW-2, alleging that about 00 p.m.": "IPC 147",
    "on 1992, on hearing a gunshot sound and simultaneously the cry of his brother, Gopal Singh, PW-1, that he was being assaulted and his life was in danger, he rushed to the shop of Gopal Singh and found that accused Gopal Singh and his brother Puran Singh were beating him with hands, fists and stones.": "IPC 147"
  },
  "statute": ["IPC 147"]
}

Field	Type	Description
`doc_id`	string	Unique case identifier (e.g., "2013.INSC.228.txt")
`fact`	string	Full factual description of the case (~2,583 chars average)
`explanation`	dict	Sentence-to-statute mapping: each key is a sentence from the fact; each value is the IPC section code it supports
`statute`	list[string]	IPC section codes applicable to the case (e.g., ["IPC 302", "IPC 201"])

A single case may have multiple IPC sections in its statute list. The explanation field maps each sentence in the fact to the specific IPC section it supports. Multiple sentences may map to the same statute.
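The field descriptions above can be turned into a simple consistency check. The following is a minimal sketch, not official tooling: the function name `check_record`, the whitespace normalisation, and the shortened example record are illustrative assumptions.

```python
import json


def check_record(line: str) -> list[str]:
    """Parse one JSONL line and return a list of problems (empty if none)."""
    record = json.loads(line)
    problems = []

    # All four documented fields must be present.
    for field in ("doc_id", "fact", "explanation", "statute"):
        if field not in record:
            problems.append(f"missing field: {field}")
    if problems:
        return problems

    statutes = set(record["statute"])
    # Assumption: compare on collapsed whitespace, since facts may
    # carry leading or repeated spaces.
    fact = " ".join(record["fact"].split())
    for sentence, section in record["explanation"].items():
        if section not in statutes:
            problems.append(f"{section} not in statute list")
        if " ".join(sentence.split()) not in fact:
            problems.append(f"sentence not found in fact: {sentence[:40]}")
    return problems


# Hypothetical, shortened record in the documented structure.
example = json.dumps({
    "doc_id": "2013.INSC.228.txt",
    "fact": " An FIR was lodged by Prem Singh, PW-2. He rushed to the shop.",
    "explanation": {"An FIR was lodged by Prem Singh, PW-2.": "IPC 147"},
    "statute": ["IPC 147"],
})
print(check_record(example))
```

Running the check over every line of the training file before modelling may surface records where a sentence key does not match the fact text verbatim.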

§ 05 · TEST DATA

Test Data

Cases: 105

Release date: 20 July 2026 → 25 July 2026

Format: JSONL — only doc_id and fact are provided. No statute labels, no explanation sentences.

{
  "doc_id": "2002.INSC.274.txt",
  "fact": "By the impugned judgment the High Court while convicting the respondent of the offence under Section 376 of the Indian Penal Code reduced his sentence..."
}

Participants must predict all three outputs: statute section labels, explanation sentences, and reasoning traces.

§ 06 · SUBMISSION FORMAT

Submission Format

Participants submit a single JSONL file with one line per test case. Each line must contain doc_id and statute (a list of predictions):

{
  "doc_id": "2002.INSC.274.txt",
  "statute": [
    {
      "section": "IPC 376",
      "exact_fact": "She recognized the respondent Kishanlal and asked him as to why he had come. He said that he had come to have sexual intercourse with her.",
      "reasoning_trace": "Section 376 IPC applies because the facts establish that the respondent committed sexual intercourse with the prosecutrix against her will, satisfying the ingredients of rape under Section 375 IPC."
    },
    {
      "section": "IPC 457",
      "exact_fact": "While going to Ramlila her husband had bolted the house from outside. At about 11-12 O' clock at night she woke up as someone opened the door.",
      "reasoning_trace": "Section 457 IPC applies because the respondent entered the house as a trespasser with intent to commit an offence, having opened the bolted door to gain entry at night."
    }
  ]
}

Submission Rules

Single run per team
No team size limit
JSONL format only, one line per test case
Each line must contain doc_id and statute (list of predictions)
Each statute entry must contain section, exact_fact, and reasoning_trace
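The rules above can be checked locally before uploading. The sketch below is one possible pre-submission check, assuming only the structure shown in the example; the function names `validate_submission_line` and `write_submission` are hypothetical, and the official scoring script may apply different or stricter checks.

```python
import json

# Keys every statute entry must carry, per the submission rules.
REQUIRED_ENTRY_KEYS = {"section", "exact_fact", "reasoning_trace"}


def validate_submission_line(line: str) -> list[str]:
    """Return a list of format errors for one submission line (empty if valid)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]

    errors = []
    if not isinstance(record.get("doc_id"), str):
        errors.append("doc_id missing or not a string")
    statute = record.get("statute")
    if not isinstance(statute, list):
        errors.append("statute missing or not a list")
        return errors

    for i, entry in enumerate(statute):
        if not isinstance(entry, dict):
            errors.append(f"statute[{i}] is not an object")
            continue
        missing = REQUIRED_ENTRY_KEYS - entry.keys()
        if missing:
            errors.append(f"statute[{i}] missing: {sorted(missing)}")
    return errors


def write_submission(predictions: list[dict], path: str) -> None:
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in predictions:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A typical use is to call `validate_submission_line` on each line of the file produced by `write_submission` and fix any reported errors before submitting.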

§ 07 · EVALUATION

Evaluation

Note: Metrics and weights are tentative and may be updated before the test data release.

* Final details, including metric weights, prompt templates, and the evaluation pipeline, will be confirmed after 20 June 2026 (postponed from 15 June 2026). Minor updates to prompt specifications may occur.

Metric	Weight	Description
Macro F1	35%	Exact match on predicted section labels vs. gold standard
ROUGE-L	25%	Longest common subsequence similarity between participant's reasoning and gold reasoning
BLEU	20%	Sentence-level BLEU score between reasoning texts
Recall@3	10%	Whether gold section labels appear in the participant's top-3 predictions
Legal Semantic Score (LSS)	10%	Cosine similarity of reasoning embeddings from a legal-domain language model (model not disclosed)
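To make the two label-based metrics concrete, here is a minimal sketch of how Macro F1 and Recall@3 could be computed. Several points are assumptions, since the official scoring script has not been released: F1 is averaged over section labels (not over cases), Recall@3 is the fraction of gold labels found among the first three predicted sections, and the order of entries in the statute list is treated as a ranking.

```python
def macro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Per-label F1 averaged over all labels seen in gold or predictions."""
    labels = set().union(*gold, *pred)
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def recall_at_k(gold: list[set[str]], ranked: list[list[str]], k: int = 3) -> float:
    """Average share of gold labels found in the top-k predictions per case."""
    per_case = [
        len(g & set(r[:k])) / len(g)
        for g, r in zip(gold, ranked)
        if g  # skip cases without gold labels
    ]
    return sum(per_case) / len(per_case) if per_case else 0.0


# Hypothetical gold labels and ranked predictions for two cases.
gold = [{"IPC 302", "IPC 201"}, {"IPC 147"}]
ranked = [["IPC 302", "IPC 34", "IPC 120B", "IPC 201"], ["IPC 147"]]
print(macro_f1(gold, [set(r) for r in ranked]))
print(recall_at_k(gold, ranked))
```

Under these assumptions, over-predicting sections lowers Macro F1 through false positives, while a correct section placed beyond the third position does not count toward Recall@3.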

Composite Score = weighted sum of above metrics.
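As a worked illustration of the weighted sum, the sketch below combines the five metrics using the tentative weights from the table. It assumes every metric is scaled to the range 0 to 1; the dictionary keys and the example metric values are hypothetical.

```python
# Tentative weights from the evaluation table; they sum to 1.0.
WEIGHTS = {
    "macro_f1": 0.35,
    "rouge_l": 0.25,
    "bleu": 0.20,
    "recall_at_3": 0.10,
    "lss": 0.10,
}


def composite_score(metrics: dict[str, float]) -> float:
    """Weighted sum of the five metrics, each assumed to lie in [0, 1]."""
    missing = WEIGHTS.keys() - metrics.keys()
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)


# Hypothetical metric values for a single run.
run = {"macro_f1": 0.6, "rouge_l": 0.4, "bleu": 0.2, "recall_at_3": 0.8, "lss": 0.7}
print(round(composite_score(run), 4))
```

With these example values the terms are 0.21, 0.10, 0.04, 0.08, and 0.07, giving a composite of 0.50.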

The Legal Semantic Score uses a pre-trained legal-domain embedding model. The specific model will NOT be disclosed to participants, preventing gaming of this metric.

Evaluation will be conducted on CodaBench (CodaLab v2). The scoring script will be released with the test data. The leaderboard will be updated automatically after each submission.

Schedule update: All dates in the timeline below (§08) have been postponed by 5 days. Each entry shows the original date followed by the new date (original → new). Please plan accordingly.

§ 08 · TIMELINE

Timeline

Date	Milestone
15 May 2026 → 20 May 2026	Track website opens, training data released
15 June 2026 → 20 June 2026	Training data release (525 cases)
20 July 2026 → 25 July 2026	Test data release (100 cases)
30 July 2026 → 5 August 2026	Run submission deadline
15 August 2026 → 20 August 2026	Track results declared
30 August 2026 → 4 September 2026	Working notes due
30 September 2026 → 5 October 2026	Camera-ready copies
December 2026	FIRE 2026 Conference

§ 09 · BASELINES & ETHICS

Baseline Systems

At least one baseline system will be provided to participants. Potential baselines include:

Zero-shot LLM

Prompt with facts only — no examples provided

Few-shot LLM

5 examples in prompt for in-context learning

BERT Classifier

Fine-tuned BERT-based model trained on the training set

Ethical Considerations

All case data is from publicly available Indian Supreme Court judgments
No personally identifiable information is included
The task is designed to advance legal AI interpretability, not to provide legal advice