Hi all,
Today we once more updated the web page of the Retrieval-Augmented Debating task:
https://touche.webis.de/clef25/touche25-web/retrieval-augmented-debating.html
As part of this, we added 100 (simulated) example debates, one for each of the 100 claims we released earlier.
Each debate consists of 5 user messages and 5 system messages (“responses”). We manually judged each response for its debate quality, adopting Grice’s four maxims (Quantity, Quality, Relation, and Manner) as quality criteria, with a binary judgment for each. A response can thus score between 0 points (no maxim fulfilled) and 4 points (all maxims fulfilled).
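To illustrate the scoring, here is a tiny Python sketch of one judged response; the field names are our own illustration, not the official label format:

    # Illustrative only: the field names are assumptions, not the official label format.
    response_judgment = {
        "quantity": 1,  # maxim of quantity fulfilled
        "quality": 1,   # maxim of quality fulfilled
        "relation": 0,  # maxim of relation not fulfilled
        "manner": 1,    # maxim of manner fulfilled
    }

    # The debate-quality score of a response is the number of fulfilled maxims (0-4).
    score = sum(response_judgment.values())
    print(score)  # 3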
When you develop a debate system (sub-task 1), keep these criteria in mind.
When you develop an evaluation system (sub-task 2), you can use our labels as a training set. Note that your evaluation system can address a single criterion or up to all four criteria/maxims. We will score the evaluation systems for each criterion independently.
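Since the criteria are scored independently, one simple starting point for sub-task 2 is to train one classifier per maxim and predict each binary label on its own. The sketch below assumes you have the response texts and their four binary labels available in some form; the toy data, field names, and the scikit-learn model are illustrative choices, not part of the task setup:

    # Rough per-criterion baseline sketch; the actual label file format and any
    # preprocessing are up to you and not specified here.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    MAXIMS = ["quantity", "quality", "relation", "manner"]

    # Toy stand-in for the released judgments (purely illustrative).
    responses = [
        "Renewable energy reduces emissions, as several studies show.",
        "You are simply wrong about everything.",
        "Nuclear power is one option; however, waste storage remains unsolved.",
        "Bananas are yellow.",
    ]
    labels = {
        "quantity": [1, 0, 1, 0],
        "quality":  [1, 0, 1, 1],
        "relation": [1, 1, 1, 0],
        "manner":   [1, 0, 1, 1],
    }

    # One independent classifier per maxim.
    vectorizer = TfidfVectorizer()
    features = vectorizer.fit_transform(responses)
    classifiers = {}
    for maxim in MAXIMS:
        clf = LogisticRegression()
        clf.fit(features, labels[maxim])
        classifiers[maxim] = clf

    # Predict the four binary judgments for a new response.
    new_features = vectorizer.transform(
        ["Solar panels have become much cheaper over the last decade."]
    )
    predictions = {m: int(classifiers[m].predict(new_features)[0]) for m in MAXIMS}
    print(predictions)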
We are nearly done with setting up the submission system. In case you can’t wait, you can already take a look at basic systems (without generation) that we prepared in Python [1] and JavaScript [2] and that can serve as a starting point for developing your system. Note that they might change slightly in the next few days.
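Conceptually, a debate system is a small web service that receives the conversation so far and returns the next system response. The sketch below shows such a stub in Python with Flask; the endpoint name and the request/response format are assumptions for illustration only, so please follow the baselines in [1] and [2] for the actual interface expected by the submission system:

    # Minimal debate-system stub (no retrieval, no generation). Endpoint and payload
    # format are assumptions; see the baselines [1, 2] for the real interface.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.post("/respond")
    def respond():
        payload = request.get_json(force=True)
        # Assumed format: {"messages": [{"role": "user"|"system", "content": "..."}]}
        messages = payload.get("messages", [])
        last_user_message = next(
            (m["content"] for m in reversed(messages) if m.get("role") == "user"), ""
        )
        # Echo-style placeholder instead of a generated counter-argument.
        reply = f"I see your point about '{last_user_message}', but have you considered the opposite view?"
        return jsonify({"content": reply})

    if __name__ == "__main__":
        app.run(port=8080)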
That’s it for now. We are looking forward to seeing your approaches, and please ask questions if you have any,
Johannes