Task 01

Explainable Statute Prediction

Part of the track: LLM as a Judge?: From Statute Prediction to Sycophancy Detection in Law

Kripabandhu Ghosh (IISER Kolkata, India)
Liana Ermakova (Université de Bretagne Occidentale, France)
Shuvam Banerji Seal (IISER Kolkata, India)
Subinay Adhikary (IISER Kolkata, India)
Jaap Kamps (University of Amsterdam, Netherlands)

§ 01 · OVERVIEW

Overview

Given the factual description of an Indian Supreme Court case, participants must:

  1. Identify which sections of the Indian Penal Code (IPC) are applicable
  2. Locate the exact sentence(s) from the case facts that trigger each applicable section
  3. Explain the legal reasoning connecting each fact sentence to the applicable IPC section

This task tests both legal knowledge (which sections apply?) and interpretability (why do they apply, grounded in the facts?).


§ 02 · TASK DESCRIPTION

Task Description

The task requires predicting applicable IPC sections from case facts, grounding each prediction in specific sentences, and providing legal reasoning that connects the facts to the statute. This is a multi-output structured prediction problem combining classification, extraction, and generation.


§ 03 · JURISDICTION & SCOPE

Jurisdiction & Scope

Included

  • Indian Supreme Court only
  • Indian Penal Code (IPC) sections

Not Included

  • Bharatiya Nyaya Sanhita (BNS)
  • CrPC, Evidence Act
  • U.S. statutes

Legal categories: Criminal, Civil, Constitutional, Tax, Labor, Commercial, Revenue, Administration, Environmental


§ 04 · TRAINING DATA

Training Data

Cases: 500
Format: JSONL
Avg. words per case: ~693

Release date: 15 June 2026

Each line in the training set has the following structure:

Training Sample
{
  "doc_id": "2004.INSC.496.txt",
  "doc_url": "http://www.liiofindia.org/in/cases/cen/INSC/2004/496.html",
  "fact": "Full case factual description...",
  "statute": [
    {
      "section": "Section 302 IPC",
      "exact_fact": "The accused was found in possession of the victim's blood-stained knife at the scene.",
      "reasoning_trace": "Section 302 IPC applies because the facts establish intentional causing of death with a weapon. The presence of the accused with the weapon at the scene and the forensic evidence linking the knife to the victim satisfy the elements of murder under Section 300 IPC, making Section 302 the appropriate penal provision."
    },
    {
      "section": "Section 201 IPC",
      "exact_fact": "After the incident, the accused was seen washing blood-stained clothes at the community well.",
      "reasoning_trace": "Section 201 IPC applies because the accused deliberately destroyed evidence by washing blood-stained clothing, constituting causing disappearance of evidence of an offence committed."
    }
  ]
}
Field    Type        Description
doc_id   string      Unique case identifier (e.g., "2004.INSC.496.txt")
doc_url  string      Source URL of the judgment
fact     string      Full factual description of the case (~693 words average)
statute  list[dict]  List of applicable IPC sections with exact_fact and reasoning_trace

A single case may have multiple statute entries (one per applicable IPC section). Each statute entry maps to a specific sentence in the facts and includes a reasoning trace.
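The structure described above can be read and sanity-checked with a few lines of Python. This is a minimal sketch, not an official loader: the helper names `parse_jsonl` and `ungrounded_entries` are hypothetical, and the check assumes that each exact_fact is a verbatim substring of fact, which the task description suggests but does not guarantee (whitespace or punctuation may differ in the released data).

```python
import json

def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blank lines."""
    cases = []
    for line in lines:
        line = line.strip()
        if line:
            cases.append(json.loads(line))
    return cases

def ungrounded_entries(case):
    """Return (section, exact_fact) pairs whose exact_fact is not found verbatim in fact.

    Assumption: exact_fact is copied character-for-character from fact.
    """
    return [
        (entry["section"], entry["exact_fact"])
        for entry in case["statute"]
        if entry["exact_fact"] not in case["fact"]
    ]

# Usage with a file (the file name is a placeholder):
# with open("train.jsonl", encoding="utf-8") as handle:
#     cases = parse_jsonl(handle)
#     for case in cases:
#         print(case["doc_id"], len(case["statute"]), ungrounded_entries(case))
```

Running the check on the training set before modelling shows how often a plain substring match is enough to locate the grounding sentence.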


§ 05 · TEST DATA

Test Data

Cases: 100
Release date: 20 July 2026

Format: JSONL — only doc_id, doc_url, and fact are provided. No statute labels, no exact_fact sentences, no reasoning traces.

Test Input
{
  "doc_id": "2016.INSC.210.txt",
  "doc_url": "http://www.liiofindia.org/in/cases/cen/INSC/2016/210.html",
  "fact": "Full case factual description..."
}

Participants must predict all three outputs: section labels, exact_fact sentences, and reasoning traces.


§ 06 · SUBMISSION FORMAT

Submission Format

Participants submit a single JSONL file. Each line contains a doc_id and a statute list whose entries use the same structure as in the training data:

Submission Example
{
  "doc_id": "2016.INSC.210.txt",
  "statute": [
    {
      "section": "Section 302 IPC",
      "exact_fact": "The accused were five in number and they caused injuries to Ashok Kumar with sword and knife.",
      "reasoning_trace": "Section 302 IPC applies because the facts establish that the accused caused death of the deceased by inflicting injuries with deadly weapons (sword and knife), satisfying the ingredients of murder under Section 300 IPC."
    },
    {
      "section": "Section 149 IPC",
      "exact_fact": "Companions of Kallu came to the factory and murdered Ashok Kumar.",
      "reasoning_trace": "Section 149 IPC applies because the accused were members of an unlawful assembly, and the murder was committed in prosecution of the common object of such assembly."
    }
  ]
}

Submission Rules

  • Single run per team
  • No team size limit
  • JSONL format only, one line per test case
  • Each line must contain doc_id and statute (list of predictions)
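The rules above can be checked locally before uploading. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and the required keys inside each statute entry (section, exact_fact, reasoning_trace) are inferred from the submission example rather than from an official schema.

```python
import json

# Assumption: every statute entry needs these three keys, as in the example.
REQUIRED_ENTRY_KEYS = {"section", "exact_fact", "reasoning_trace"}

def to_submission_line(doc_id, predictions):
    """Serialize one test case as a single JSONL line (no trailing newline)."""
    return json.dumps({"doc_id": doc_id, "statute": predictions}, ensure_ascii=False)

def validate_submission_line(line):
    """Return a list of problems found in one submission line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]
    problems = []
    if not isinstance(record.get("doc_id"), str):
        problems.append("missing or non-string doc_id")
    statute = record.get("statute")
    if not isinstance(statute, list):
        problems.append("missing or non-list statute")
        return problems
    for i, entry in enumerate(statute):
        if not isinstance(entry, dict):
            problems.append(f"statute[{i}] is not an object")
            continue
        missing = REQUIRED_ENTRY_KEYS - entry.keys()
        if missing:
            problems.append(f"statute[{i}] missing keys: {sorted(missing)}")
    return problems
```

Writing each line with `to_submission_line` and re-reading the file through `validate_submission_line` catches malformed lines early; it does not check that every test doc_id is covered.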

§ 07 · EVALUATION

Evaluation

Note: Metrics and weights are tentative and may be updated before the test data release.

Final details, including metric weights, prompt templates, and the evaluation pipeline, will be confirmed after 15 June 2026. Minor updates to prompt specifications may occur.

Metric                      Weight  Description
Macro F1                    35%     Exact match on predicted section labels vs. gold standard
ROUGE-L                     25%     Longest common subsequence similarity between participant's reasoning and gold reasoning
BLEU                        20%     Sentence-level BLEU score between reasoning texts
Recall@3                    10%     Whether gold section labels appear in the participant's top-3 predictions
Legal Semantic Score (LSS)  10%     Cosine similarity of reasoning embeddings from a legal-domain language model (model not disclosed)
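For local development, the two label-level metrics (Macro F1 and Recall@3) can be approximated as follows. This is a sketch of one common reading, not the official scoring script: it assumes macro averaging over every section label that appears in the gold or predicted sets, and it assumes the order of entries in a prediction list is the ranking used for the top 3. Both assumptions may differ from the released scorer.

```python
from collections import defaultdict

def macro_f1(gold, pred):
    """Macro-averaged F1 over section labels.

    gold, pred: dicts mapping doc_id -> iterable of section labels.
    Assumption: labels are compared by exact string match.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for doc_id, gold_labels in gold.items():
        gold_labels = set(gold_labels)
        pred_labels = set(pred.get(doc_id, ()))
        for label in pred_labels & gold_labels:
            tp[label] += 1
        for label in pred_labels - gold_labels:
            fp[label] += 1
        for label in gold_labels - pred_labels:
            fn[label] += 1
    labels = set(tp) | set(fp) | set(fn)
    if not labels:
        return 0.0
    # Each label has at least one count, so the denominator is never zero.
    scores = [2 * tp[l] / (2 * tp[l] + fp[l] + fn[l]) for l in labels]
    return sum(scores) / len(scores)

def recall_at_k(gold, ranked_pred, k=3):
    """Mean fraction of gold labels found among the first k predictions per case.

    Assumption: list order in ranked_pred reflects prediction rank.
    """
    per_doc = []
    for doc_id, gold_labels in gold.items():
        gold_labels = set(gold_labels)
        if not gold_labels:
            continue
        top_k = set(list(ranked_pred.get(doc_id, []))[:k])
        per_doc.append(len(gold_labels & top_k) / len(gold_labels))
    return sum(per_doc) / len(per_doc) if per_doc else 0.0
```

Because both functions use exact string matching, a prediction such as "Sec. 302" would not match the gold label "Section 302 IPC"; normalizing labels to the training-data form avoids losing credit for formatting alone.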

Composite Score = weighted sum of the above metrics.
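As a worked illustration of the weighted sum, the sketch below combines five metric values using the tentative weights from the table. The dictionary keys and the function name are hypothetical, and the sketch assumes every metric is reported on a 0 to 1 scale; the weights themselves may change before the test data release.

```python
# Tentative weights from the metric table; subject to change.
WEIGHTS = {
    "macro_f1": 0.35,
    "rouge_l": 0.25,
    "bleu": 0.20,
    "recall_at_3": 0.10,
    "lss": 0.10,
}

def composite_score(metrics, weights=WEIGHTS):
    """Weighted sum of metric values, each assumed to lie in [0, 1]."""
    missing = set(weights) - set(metrics)
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    return sum(weights[name] * metrics[name] for name in weights)

example = {
    "macro_f1": 0.5,
    "rouge_l": 0.4,
    "bleu": 0.2,
    "recall_at_3": 0.8,
    "lss": 0.6,
}
# 0.35*0.5 + 0.25*0.4 + 0.20*0.2 + 0.10*0.8 + 0.10*0.6 = 0.455
print(round(composite_score(example), 3))
```

With these weights, section-label quality (Macro F1 plus Recall@3) accounts for 45% of the score and the reasoning-text metrics for the remaining 55%.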

The Legal Semantic Score uses a pre-trained legal-domain embedding model. The specific model will NOT be disclosed to participants, preventing gaming of this metric.

Evaluation will be conducted on CodaBench (CodaLab v2). The scoring script will be released with the test data. The leaderboard will be updated automatically after each submission.


§ 08 · TIMELINE

Timeline

Date                Milestone
15 June 2026        Training data release (500 cases)
20 July 2026        Test data release (100 cases)
End July 2026       Evaluation results declared
15 August 2026      Working notes due
End September 2026  Camera-ready copies

§ 09 · BASELINES & ETHICS

Baseline Systems

At least one baseline system will be provided to participants. Potential baselines include:

  • Zero-shot LLM: prompt with facts only, no examples provided
  • Few-shot LLM: 5 examples in the prompt for in-context learning
  • BERT classifier: fine-tuned BERT-based model trained on the training set
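To make the difference between the zero-shot and few-shot baselines concrete, the sketch below builds a prompt from a case's facts and an optional list of training examples. It is illustrative only: the instruction wording, the `Facts:`/`Answer:` layout, and the function name are assumptions, not the organizers' prompt template, which has not been published.

```python
import json

# Hypothetical instruction text; the official prompt template may differ.
INSTRUCTION = (
    "Given the facts of an Indian Supreme Court case, list the applicable IPC "
    "sections. For each section, quote the exact sentence from the facts that "
    "triggers it and explain the legal reasoning. Answer as a JSON list of "
    'objects with keys "section", "exact_fact", and "reasoning_trace".'
)

def build_prompt(fact, examples=(), max_examples=5):
    """Build a zero-shot prompt (no examples) or a few-shot prompt (up to max_examples).

    examples: training records with "fact" and "statute" fields.
    """
    parts = [INSTRUCTION]
    for example in list(examples)[:max_examples]:
        answer = json.dumps(example["statute"], ensure_ascii=False)
        parts.append(f"Facts: {example['fact']}\nAnswer: {answer}")
    parts.append(f"Facts: {fact}\nAnswer:")
    return "\n\n".join(parts)
```

Calling `build_prompt(fact)` gives the zero-shot variant, while passing five training records gives the few-shot variant. With facts averaging about 693 words, five in-context examples add several thousand words to each prompt, which is worth checking against the context limit of the chosen model.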

Ethical Considerations

  • All case data is from publicly available Indian Supreme Court judgments
  • No personally identifiable information is included
  • The task is designed to advance legal AI interpretability, not to provide legal advice