ValueEval'24 dataset release (version 2024-02-13)

Johannes Kiesel

Feb 13, 2024, 7:28:45 AM
to valu...@googlegroups.com
Hi everyone,

Today we released a new version of the ValueEval'24 dataset [1]:

https://zenodo.org/doi/10.5281/zenodo.10396293

The data for ValueEval'24 contains (so far) 67,352 sentences in 9
languages, annotated for 19 values. The data was created as part of the
ValuesML project [2]. Huge thanks to everyone involved!

We are continuing to improve the data (some documents are still missing,
a bit of cleaning, automated English translations), but you can now work
on the (nearly) final data for the deadline in May [3]. If you have
questions about the data, do not hesitate to write to this list or contact
me directly.

We will soon open our submission system, but make sure to already check
out the information we have on our web page [4] (especially the
starter code).

We are looking forward to your submission!
Johannes

[1] https://twitter.com/ValueEval/status/1757380478020043067
[2]
https://knowledge4policy.ec.europa.eu/projects-activities/valuesml-unravelling-expressed-values-media-informed-policy-making_en
[3] https://valueeval.webis.de
[4]
https://touche.webis.de/clef24/touche24-web/human-value-detection.html#submission

--
Johannes Kiesel

Bauhaus-Universität Weimar
Bauhausstr. 9a, Room 106
99423 Weimar, Germany

Phone: +49 (0)3643 - 58 3720

Johannes Kiesel

Feb 15, 2024, 3:14:09 AM
to valu...@googlegroups.com
Hi everyone,

We just released a new version that fixes some problems with Mac OS line
endings in the source files.

As always, you can download the new version here:

https://zenodo.org/doi/10.5281/zenodo.10396293

Regards,
Johannes

Johannes Kiesel

Apr 3, 2024, 4:49:55 AM
to valu...@googlegroups.com
Hi everyone,

We released a new version with automated translations to English [1] for
most languages (still working on Hebrew as we need a different service
for that).

As always, you can download the new version here:

https://zenodo.org/doi/10.5281/zenodo.10396293


There are still a few documents missing, so you can expect even more
data added next week (but do not wait for it, the data is nearly
complete as-is). We will then also open the submission system.

Regards,
Johannes


[1] Via DeepL, https://www.deepl.com/

Johannes Kiesel

Apr 16, 2024, 5:15:49 PM
to valu...@googlegroups.com
Hi everyone,

We released a new (and now complete) version of our dataset:

https://zenodo.org/doi/10.5281/zenodo.10396293

Moreover, check https://valueeval.webis.de for:
- How to submit via TIRA [1]
- Our example approaches (random, random via Jupyter notebook, BERT,
Ollama/LLM) to help you figure out how to make things work [2]
- Our evaluator [3], which produces an HTML-based report, including
ROC curves and tables for error analysis
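
For orientation, a random baseline of the kind listed above can be
sketched in a few lines of Python. This is only an illustration, not the
code from the repository: the label names `value_01` to `value_19` and
the function `predict_random` are placeholders, and the actual input and
output format is the one documented with the example approaches [2].

```python
import random

# Placeholder label names (assumption); the real 19 value names come with the dataset.
VALUE_LABELS = [f"value_{i:02d}" for i in range(1, 20)]


def predict_random(sentences, seed=None):
    """Return one dict per sentence mapping each label to a random score in [0, 1)."""
    rng = random.Random(seed)
    return [{label: rng.random() for label in VALUE_LABELS} for _ in sentences]


if __name__ == "__main__":
    # Passing a seed makes the "predictions" repeatable across runs.
    for row in predict_random(["An example sentence.", "Another one."], seed=0):
        print(row)
```

Scoring such output with the evaluator [3] gives a floor to compare a
real approach against.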

If you use the boilerplate code from our example approaches, your
approach can easily be Dockerized for sharing, full reproducibility
of results, and running it as an HTTP server. The latter allows you to
set up a demo for your own approach, like we did for ValueEval'23 [4]. In
fact, we are already working on a web interface for such demos for this
year, so that you all get a cool web interface for your
approach without extra work.

We also have some capacity for assisting you in Dockerizing, since we
know that sometimes it is not clear how to Dockerize a specific approach
or how to even get started.

In case you want to reach out for help regarding Dockerization or for
other questions, please ask in our TIRA forum:

https://www.tira.io/c/touche/

Regards,
Johannes

[1]
https://touche.webis.de/clef24/touche24-web/human-value-detection.html#submission
[2]
https://github.com/touche-webis-de/touche-code/tree/main/clef24/human-value-detection/approaches
[3]
https://github.com/touche-webis-de/touche-code/tree/main/clef24/human-value-detection/evaluator
[4] https://values.args.me/

Johannes Kiesel

Apr 30, 2024, 4:10:06 PM
to valu...@googlegroups.com
Hi everyone, and apologies for cross-posting.

In case you have not yet registered for ValueEval'24 but still want to
submit (deadline May 6th), note that you can still register with TIRA
and participate [1]. CLEF has a separate deadline for registration they
use for statistics, but we have our submission system open for new
registrations right until our deadline.

We also wanted to let you know that we created a step-by-step guide on
how to use very big models on A100 GPUs [2] in TIRA. See also our BERT
example approach on how to use this specifically for ValueEval [3].

If your approach depends on a large language model or a large multimodal
model that fits into an A100 GPU (40 GB), TIRA now enables its efficient
use.

This workflow further enables what is known at Hugging Face as Auto
Classes [4]: local development (e.g., on your laptop) with small models
and cloud-based evaluation with large models.
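
As a rough sketch of this small-locally, large-in-the-cloud pattern:
the same code loads whichever checkpoint is configured, and the Auto
Classes pick the matching architecture. The environment variable
`MODEL_NAME`, the default checkpoint, and both function names are
assumptions made for illustration; they are not taken from the TIRA
guide [2] or from our BERT example [3].

```python
import os

# Assumption: a small default checkpoint for local development.
DEFAULT_MODEL = "bert-base-uncased"


def resolve_model_name(env=None):
    """Pick the checkpoint from MODEL_NAME (hypothetical variable) or fall back to the default."""
    env = os.environ if env is None else env
    return env.get("MODEL_NAME", DEFAULT_MODEL)


def load_model(num_labels, env=None):
    """Load tokenizer and classifier for the configured checkpoint via the Auto Classes."""
    # Imported lazily so that resolve_model_name works without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    name = resolve_model_name(env)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name,
        num_labels=num_labels,
        problem_type="multi_label_classification",
    )
    return tokenizer, model
```

On a laptop you would leave `MODEL_NAME` unset; for the evaluation run
you would set it to the large model, with no change to the code.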

Best regards and have fun with big models!
Johannes

[1]
https://www.tira.io/task-overview/valueeval-2024-human-value-detection/valueeval24-2024-04-15-test-20240415-test
[2]
https://www.tira.io/t/how-to-use-very-big-models-e-g-large-language-models-from-hugging-face-on-a100-gpus-in-tira
[3]
https://github.com/touche-webis-de/touche-code/tree/main/clef24/human-value-detection/approaches/bert-baseline#tira-usage
[4] https://huggingface.co/docs/transformers/model_doc/auto