Questions about the DADC shared task


Venelin Kovatchev

Apr 28, 2022, 3:29:14 PM
to dadc-w...@googlegroups.com

Dear DADC team,

 

Some of my colleagues and I want to participate in the DADC shared task (we have already registered our team).

 

We had a meeting yesterday to discuss strategies and the timeline, and we came up with quite a few questions about the format and requirements of the task.

 

Could you please answer some of them?

 

  1. The instructions say that for task 1 we have to submit 100 QA examples.
    a. How are those questions and answers distributed: is it 100 questions for 100 paragraphs? Is it 100 questions for 20 paragraphs? Do we choose whatever question/paragraph ratio we see fit? Are we allowed to “skip” paragraphs?
    b. Assuming a fixed number of paragraphs (100), are those the same for all participants or selected at random?

 

  2. If I understand the instructions correctly, for task 1, the moment we start submitting (using our official account), those entries are added to the competition until we hit 100. Is that correct?

 

  3. In the timeline you have a call for “System Description Paper (Optional) Submission Deadline” – is this only a call for papers describing the systems in task 3, or is it possible to write a submission about strategies used in tasks 1 and 2 (if we consider some systematic approach(es) to improving the data rather than just writing difficult questions)?

 

  4. In task 1, can the answer be a discontinuous span (i.e., multiple non-adjacent sections of the paragraph)?

 

  5. Related to Q4, can we include multiple questions in a single query?

 

  6. For tasks 2 and 3, the instructions say that the evaluation will be done on the data from task 1.
    a. Is this the data from all participants (I assume) or just ours?
    b. Do we get some validation set from it, or do we have to work “blind” (or with previously existing resources)?

 

Thank you in advance.

 

Best regards,


--
Dr. Venelin Kovatchev, PhD
Postdoctoral Researcher
University of Texas at Austin

Max Bartolo

Apr 29, 2022, 1:03:20 PM
to Dynamic Adversarial Data Collection (DADC) Workshop
Dear Dr. Kovatchev,

Many thanks for your interest and participation! Replies to your questions below:
1a. You may ask as many questions as you like per paragraph - there will probably be some sweet spot where you gain information about how the model handles each paragraph with each question asked, but with each additional question it becomes harder to come up with new challenging questions. You may not ask duplicate questions or questions that would be considered close paraphrases of ones you have already fooled the model with. You are allowed to "skip" paragraphs using the "Switch to next context" button.
1b. Paragraphs are selected at random.
2. Yes, that is correct. You are not allowed to retract questions, so every question you submit from your "DADC-affiliated" account counts.
3. It is also possible (and encouraged) to write about strategies used in tracks 1 and 2 - particularly for track 2 but if you identify any interesting research outcomes from track 1 then that is also encouraged.
4. No, the answer has to be a single continuous span of text from the paragraph. We plan to expand on this in the future, but for the time being we are sticking with the standard single-span extractive QA setting. For more information on what is and is not acceptable, please refer to: https://dadcworkshop.github.io/shared-task.html#validation-instructions
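The single-span requirement above is easy to check mechanically before submitting. A minimal Python sketch, assuming nothing about the task's own tooling (the function name and SQuAD-style character offsets are illustrative, not part of the shared-task interface):

```python
def find_answer_span(paragraph: str, answer: str):
    """Return SQuAD-style (start, end) character offsets of the answer,
    or None if the answer is not a single contiguous span of the paragraph."""
    start = paragraph.find(answer)
    if start == -1:
        return None
    return start, start + len(answer)

# A contiguous span is valid...
print(find_answer_span("The cat sat on the mat.", "the mat"))  # (15, 22)
# ...but an "answer" stitched together from two places in the text is not.
print(find_answer_span("The cat sat on the mat.", "cat mat"))  # None
```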
5. The base criteria we will use for evaluation is that "a human who reads the question should select the same answer you did". I'm not sure exactly what you mean by multiple questions, but you can definitely ask multi-hop questions e.g. "Who is the father of the father of X?"
6a. This is the data gathered from all participants. It will be the same evaluation set for all models trained on the track 2 data.
6b. You have to work "blind" - for the actual task we will use part of the track 1 data for validation and model selection. I would suggest that a combination of the SQuAD v1.1 dev set and the AdversarialQA dev set will probably be most representative of the expected question distribution from track 1. 

Hope this helps and let us know if you have any other questions!

Thanks,
Max
