Licensing questions for Track 2

Damian Romero

Apr 21, 2022, 12:29:04 AM

to Dynamic Adversarial Data Collection (DADC) Workshop

As we start gathering resources for `Track 2: Better Training Data`, we were wondering whether there are any rules regarding the dataset that we'll submit.

For example, are there limitations on using datasets licensed under `CC BY` rather than `CC0` to generate our 10K training examples?

We also wanted to ask under what license the generated training examples will be made available later, if at all.

Would we also be required to submit the code, if any, that we used to generate such questions?

pedro rodriguez

Apr 21, 2022, 4:49:33 PM

to Dynamic Adversarial Data Collection (DADC) Workshop

Hi Damian,

Great question! Our plan is to follow a couple of principles:

Wherever possible, we want to release data with a permissive license. For submitted data with no prior license obligations, we request that submitters either license the data under CC-BY-SA or allow us to do so.
However, if the data has an existing license, we'll have to abide by the terms of that license. My inclination is that the minimum bar for a license would be that it allows the data to be publicly viewable/verifiable.

Given that, our plan is to release all the data that can be licensed under CC-BY-SA as one group and, to the extent possible, also release the data that carries other licenses (e.g., by either hosting the data or linking to it).

For code, we'd love for teams to release it under a permissive license as well (e.g., MIT, Apache 2.0, etc.), but we understand that this is sometimes challenging, so we can try to accommodate if necessary.

Thanks!

Damian Romero

Apr 21, 2022, 7:37:59 PM

to Dynamic Adversarial Data Collection (DADC) Workshop

Understood. Thank you for the clarification!
