Explainable Statute Prediction
Part of the track: LLM as a Judge?: From Statute Prediction to Sycophancy Detection in Law
Overview
Given the factual description of an Indian Supreme Court case, participants must:
- Identify which sections of the Indian Penal Code (IPC) are applicable
- Locate the exact sentence(s) from the case facts that trigger each applicable section
- Explain the legal reasoning connecting each fact sentence to the applicable IPC section
This task tests both legal knowledge (which sections apply?) and interpretability (why do they apply, grounded in the facts?).
Task Description
The task requires predicting applicable IPC sections from case facts, grounding each prediction in specific sentences, and providing legal reasoning that connects the facts to the statute. This is a multi-output structured prediction problem combining classification, extraction, and generation.
Jurisdiction & Scope
Included
- Indian Supreme Court only
- Indian Penal Code (IPC) sections
Not Included
- Bharatiya Nyaya Sanhita (BNS)
- CrPC, Evidence Act
- U.S. statutes
Legal categories: Criminal, Civil, Constitutional, Tax, Labor, Commercial, Revenue, Administration, Environmental
Training Data
Release date: 15 June 2026
Format: JSONL. Each line in the training set is one JSON object with the following structure (shown here across multiple lines for readability):
{
"doc_id": "2004.INSC.496.txt",
"doc_url": "http://www.liiofindia.org/in/cases/cen/INSC/2004/496.html",
"fact": "Full case factual description...",
"statute": [
{
"section": "Section 302 IPC",
"exact_fact": "The accused was found in possession of the victim's blood-stained knife at the scene.",
"reasoning_trace": "Section 302 IPC applies because the facts establish intentional causing of death with a weapon. The presence of the accused with the weapon at the scene and the forensic evidence linking the knife to the victim satisfy the elements of murder under Section 300 IPC, making Section 302 the appropriate penal provision."
},
{
"section": "Section 201 IPC",
"exact_fact": "After the incident, the accused was seen washing blood-stained clothes at the community well.",
"reasoning_trace": "Section 201 IPC applies because the accused deliberately destroyed evidence by washing blood-stained clothing, constituting causing disappearance of evidence of an offence committed."
}
]
}
| Field | Type | Description |
|---|---|---|
| doc_id | string | Unique case identifier (e.g., "2004.INSC.496.txt") |
| doc_url | string | Source URL of the judgment |
| fact | string | Full factual description of the case (~693 words average) |
| statute | list[dict] | List of applicable IPC sections with exact_fact and reasoning_trace |
A single case may have multiple statute entries (one per applicable IPC section). Each statute entry maps to a specific sentence in the facts and includes a reasoning trace.
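As a minimal sketch of reading this structure, the snippet below parses JSONL lines into Python dictionaries and counts how often each section label occurs. The function names `parse_jsonl` and `section_counts` are illustrative and not part of any official tooling; the field names are those listed in the table above.

```python
import json
from collections import Counter


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of case dicts.

    Blank lines are skipped; every other line must be one JSON object.
    """
    cases = []
    for line in lines:
        line = line.strip()
        if line:
            cases.append(json.loads(line))
    return cases


def section_counts(cases):
    """Count how often each section label appears across all cases."""
    counts = Counter()
    for case in cases:
        for entry in case.get("statute", []):
            counts[entry["section"]] += 1
    return counts
```

Because `parse_jsonl` accepts any iterable of strings, it can be called directly on an open file handle, for example `parse_jsonl(open(path, encoding="utf-8"))`, where `path` points to the released training file.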
Test Data
Format: JSONL — only doc_id, doc_url, and fact are provided. No statute labels, no exact_fact sentences, no reasoning traces.
{
"doc_id": "2016.INSC.210.txt",
"doc_url": "http://www.liiofindia.org/in/cases/cen/INSC/2016/210.html",
"fact": "Full case factual description..."
}
Participants must predict all three outputs: section labels, exact_fact sentences, and reasoning traces.
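Since each exact_fact is meant to be a sentence located in the case facts, a simple sanity check before submitting is to confirm that every predicted exact_fact actually occurs in the corresponding fact text. The sketch below is one way to do that; it assumes verbatim matching after whitespace normalization, which is a local convention and not a stated rule of the task. The helper names are hypothetical.

```python
import re


def normalize_ws(text):
    """Collapse runs of whitespace to single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def is_grounded(exact_fact, fact):
    """Return True if exact_fact occurs verbatim in fact, ignoring whitespace differences."""
    needle = normalize_ws(exact_fact)
    return bool(needle) and needle in normalize_ws(fact)


def ungrounded_entries(fact, predictions):
    """Return the predicted statute entries whose exact_fact is not found in fact."""
    return [p for p in predictions if not is_grounded(p.get("exact_fact", ""), fact)]
```

Entries returned by `ungrounded_entries` are candidates for re-extraction, for instance when a generative model has paraphrased the sentence instead of copying it.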
Submission Format
Participants submit a single JSONL file. Each line uses the same structure as the training data, restricted to the doc_id and statute fields:
{
"doc_id": "2016.INSC.210.txt",
"statute": [
{
"section": "Section 302 IPC",
"exact_fact": "The accused were five in number and they caused injuries to Ashok Kumar with sword and knife.",
"reasoning_trace": "Section 302 IPC applies because the facts establish that the accused caused death of the deceased by inflicting injuries with deadly weapons (sword and knife), satisfying the ingredients of murder under Section 300 IPC."
},
{
"section": "Section 149 IPC",
"exact_fact": "Companions of Kallu came to the factory and murdered Ashok Kumar.",
"reasoning_trace": "Section 149 IPC applies because the accused were members of an unlawful assembly, and the murder was committed in prosecution of the common object of such assembly."
}
]
}
Submission Rules
- Single run per team
- No team size limit
- JSONL format only, one line per test case
- Each line must contain doc_id and statute (list of predictions)
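The rules above can be checked locally before uploading. The following sketch serializes one prediction as a single JSONL line and reports structural problems in a submission line. It is not the official validator; the required entry keys are taken from the example submission, and the function names are illustrative.

```python
import json

# Keys each statute entry carries in the example submission.
REQUIRED_ENTRY_KEYS = ("section", "exact_fact", "reasoning_trace")


def to_jsonl_line(doc_id, predictions):
    """Serialize one test case's predictions as a single-line JSON object."""
    return json.dumps({"doc_id": doc_id, "statute": predictions}, ensure_ascii=False)


def validate_line(line):
    """Return a list of problems found in one submission line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]
    problems = []
    if not isinstance(record.get("doc_id"), str):
        problems.append("missing or non-string doc_id")
    statute = record.get("statute")
    if not isinstance(statute, list):
        problems.append("missing or non-list statute")
        return problems
    for i, entry in enumerate(statute):
        if not isinstance(entry, dict):
            problems.append(f"statute[{i}] is not an object")
            continue
        for key in REQUIRED_ENTRY_KEYS:
            if not isinstance(entry.get(key), str):
                problems.append(f"statute[{i}] missing or non-string {key}")
    return problems
```

`json.dumps` escapes newlines inside strings, so each record stays on one physical line, which is what the one-line-per-test-case rule requires.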
Evaluation
Note: Metrics and weights are tentative and may be updated before the test data release.
Final details, including metric weights, prompt templates, and the evaluation pipeline, will be confirmed after 15 June 2026. Minor updates to prompt specifications may occur.
| Metric | Weight | Description |
|---|---|---|
| Macro F1 | 35% | Exact match on predicted section labels vs. gold standard |
| ROUGE-L | 25% | Longest common subsequence similarity between participant's reasoning and gold reasoning |
| BLEU | 20% | Sentence-level BLEU score between reasoning texts |
| Recall@3 | 10% | Whether gold section labels appear in the participant's top-3 predictions |
| Legal Semantic Score (LSS) | 10% | Cosine similarity of reasoning embeddings from a legal-domain language model (model not disclosed) |
Composite Score = weighted sum of above metrics.
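To make the weighted sum concrete, here is a small sketch using the tentative weights from the table. It assumes every metric is reported on a 0 to 1 scale, which the task description does not state explicitly; the dictionary keys and the function name are illustrative.

```python
# Tentative weights from the metric table; they may change before the test data release.
WEIGHTS = {
    "macro_f1": 0.35,
    "rouge_l": 0.25,
    "bleu": 0.20,
    "recall_at_3": 0.10,
    "lss": 0.10,
}


def composite_score(metrics, weights=WEIGHTS):
    """Weighted sum of per-metric scores; every weighted metric must be present."""
    missing = sorted(set(weights) - set(metrics))
    if missing:
        raise KeyError(f"missing metrics: {missing}")
    return sum(weights[name] * metrics[name] for name in weights)
```

For example, scores of 0.5 (Macro F1), 0.4 (ROUGE-L), 0.2 (BLEU), 0.8 (Recall@3), and 0.7 (LSS) give 0.175 + 0.10 + 0.04 + 0.08 + 0.07 = 0.465 under these weights.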
The Legal Semantic Score uses a pre-trained legal-domain embedding model. The specific model will NOT be disclosed to participants, preventing gaming of this metric.
Evaluation will be conducted on CodaBench (CodaLab v2). The scoring script will be released with the test data. The leaderboard will be updated automatically after each submission.
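Until the official scoring script is available, section-label quality can be approximated locally on a held-out slice of the training data. The sketch below computes a macro-averaged F1 over section labels with exact string matching. It assumes the macro average is taken per label over the union of gold and predicted labels; the official script may define the average differently.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over section labels.

    gold, pred: dicts mapping doc_id -> set of section label strings.
    Only documents present in gold are scored; a missing prediction
    counts as an empty set.
    """
    labels = set()
    for doc_id, gold_labels in gold.items():
        labels |= set(gold_labels) | set(pred.get(doc_id, set()))
    if not labels:
        return 0.0
    total = 0.0
    for label in labels:
        tp = fp = fn = 0
        for doc_id, gold_labels in gold.items():
            in_gold = label in gold_labels
            in_pred = label in pred.get(doc_id, set())
            if in_gold and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_gold:
                fn += 1
        # Every label in the union has at least one tp, fp, or fn, so the denominator is nonzero.
        total += 2 * tp / (2 * tp + fp + fn)
    return total / len(labels)
```

Because matching is by exact string, label formatting matters: "Section 302 IPC" and "302 IPC" would count as different labels in this sketch.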
Timeline
| Date | Milestone |
|---|---|
| 15 June 2026 | Training data release (500 cases) |
| 20 July 2026 | Test data release (100 cases) |
| End July 2026 | Evaluation results declared |
| 15 August 2026 | Working notes due |
| End September 2026 | Camera-ready copies |
Baseline Systems
At least one baseline system will be provided to participants. Potential baselines include:
- Zero-shot LLM: prompt with facts only, no examples provided
- Few-shot LLM: 5 examples in the prompt for in-context learning
- BERT Classifier: fine-tuned BERT-based model trained on the training set
Ethical Considerations
- All case data is from publicly available Indian Supreme Court judgments
- No personally identifiable information is included
- The task is designed to advance legal AI interpretability, not to provide legal advice