Hello Genevieve.
Could you please be more specific about the type of external data that is forbidden?
Quoting from the task description, the following is written under subtask B:
"Critically, no external resources may be used that contain information
from after the rumour's resolution. To control this, we will specify
precise versions of external information that participants may use. This
is important to make sure we introduce time sensitivity into the task
of veracity prediction."
My questions are:
1. This applies to subtask A as well, right?
2. Where is the boundary between external and non-external data?
For example, word embeddings are already trained on external data (and they are allowed). So, can we use pretrained language representation models (ULMFiT, OpenAI-GPT ...)? Can we augment the data, and can we do so using a paraphrasing system (again trained on "external data")? Can we use NER/POS/dependency-parsing (DEP) extraction systems etc., or pretrain the system on a similar task?
Thank you for your answer.
On Monday, 5 November 2018 10:58:58 UTC+1,
g.go...@sheffield.ac.uk wrote: