[apologies for cross-posting]
Call For Participation - Evaluation Window and Submissions NOW OPEN 🟢
EXPLAINITA: EXPlanation of Latents for Auto-INterpretability in ITAlian
The evaluation window for EXPLAINITA is live and submissions are OPEN.
Participants can download the evaluation data and submit their system outputs.
WHAT IS EXPLAINITA?
EXPLAINITA aims to push forward mechanistic interpretability (MechInterp) for Italian LMs.
The shared task addresses three key challenges:
- Scaling explanation generation beyond manual annotation
- Evaluating the quality of explanations in the absence of gold human labels
- Extending existing MechInterp work (which is often English-centric) to Italian
Two subtasks:
- Task 1. Explanation generation: Given a latent dimension (in a Sparse Autoencoder) summarized by strongly activating tokens and contextual information, produce a natural language explanation describing the concept(s) captured by that latent, when an interpretable concept exists.
- Task 2. Explanation scoring: Given an explanation plus some text examples, decide, for each example, whether it activates the latent described by the explanation (see the input/output sketch after this list).
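
To make the expected inputs and outputs concrete, here is a minimal Python sketch of the two subtasks as plain data structures. All field names (latent_id, top_tokens, examples, ...) are illustrative assumptions, not the official data format; please refer to the task website for the exact schema.

# Illustrative sketch only: all field names below are assumptions,
# not the official EXPLAINITA data format.

# Task 1 input: an SAE latent summarized by strongly activating tokens
# and the contexts in which they fire.
task1_input = {
    "latent_id": 1234,
    "top_tokens": ["mare", "spiaggia", "onde"],
    "contexts": ["Le onde del mare si infrangevano sulla spiaggia."],
}

# Task 1 output: a free-text explanation of the concept the latent
# captures (or a marker when no interpretable concept exists).
task1_output = {"latent_id": 1234,
                "explanation": "Vocabulary about the sea and beaches."}

# Task 2 input: an explanation plus candidate text examples.
task2_input = {
    "explanation": "Vocabulary about the sea and beaches.",
    "examples": ["Il sole tramontava sull'acqua.",
                 "La riunione è stata rinviata."],
}

# Task 2 output: one binary decision per example
# (does the example activate the latent described by the explanation?).
task2_output = {"labels": [1, 0]}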
Participation
We welcome participation from both academic and industry teams.
Participants can submit systems for either or both tasks.
There is no need to register, but we encourage participants to fill out the expression-of-interest form on the task website.
Basic Rules:
- No cap on the number of submissions per task, but one per task must be chosen as "Primary"
- Final rankings will be based on the Primary submissions only
- Commercial AI systems (e.g., GPT, Claude, Gemini, Grok) CANNOT be part of the final system, i.e. no API calls to generate/classify data; however, they CAN be used during development (e.g., for data augmentation)
Evaluation:
- Task 1: BERTScore against manually written explanations
- Task 2: Accuracy (a sketch of both metrics follows below)
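
For orientation, both metrics can be reproduced with standard open-source tooling. Below is a minimal sketch, assuming the bert-score and scikit-learn packages; the organizers' exact BERTScore configuration (underlying model, language setting, rescaling) is an assumption here.

# Minimal metric sketch; the exact BERTScore configuration used by the
# organizers (underlying model, rescaling) is an assumption.
# pip install bert-score scikit-learn
from bert_score import score
from sklearn.metrics import accuracy_score

# Task 1: BERTScore of system explanations against manual references.
candidates = ["Lessico legato al mare e alle spiagge."]  # system outputs
references = ["Termini relativi al mare."]               # gold explanations
P, R, F1 = score(candidates, references, lang="it")
print(f"BERTScore F1: {F1.mean().item():.4f}")

# Task 2: plain accuracy over binary activation decisions.
gold = [1, 0, 1]
pred = [1, 1, 1]
print(f"Accuracy: {accuracy_score(gold, pred):.4f}")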
Baselines: Zero-shot generation/classification with LLaMA-3.1-8B-instruct-AWQ-INT4. Scripts for loading and calling the model on the data, as well as the prompts used, will be made available.
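
While waiting for the official scripts, the baseline can be approximated along the following lines; the Hugging Face model ID and the prompt below are assumptions, not the organizers' exact setup.

# Rough approximation of the zero-shot Task 1 baseline; the model ID and
# the prompt are assumptions, not the official scripts or prompts.
# Requires: pip install transformers autoawq (AWQ weights need autoawq).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4",  # assumed HF ID
    device_map="auto",
)

latent_summary = "top tokens: mare, spiaggia, onde"
messages = [
    {"role": "system",
     "content": "You write one-sentence explanations of Sparse Autoencoder latents for Italian."},
    {"role": "user",
     "content": f"Describe the concept captured by this latent: {latent_summary}"},
]
result = generator(messages, max_new_tokens=64)
print(result[0]["generated_text"][-1]["content"])  # the assistant's reply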
Timeline
27th November - 4th December 2025: 🟢 Evaluation Window — NOW OPEN
15th December 2025: assessments returned to participants
9th January 2026: final reports (from participants) due to task organizers
16th January 2026: final reports (from task organizers) due to EVALITA chairs
7th February 2026: reviews returned to participants
16th February 2026: camera-ready version deadline
26th - 27th February 2026: final workshop in Bari
Organisers
Alessandro Bondielli, University of Pisa
Lucia Passaro, University of Pisa
Serena Auriemma, University of Pisa
Luca Capone, University of Pisa
Martina Miliani, University of Pisa
Alessandro Lenci, University of Pisa
Contacts