Has anyone managed to obtain the full train and eval data and is willing to send it over?


Tamara

Feb 6, 2022, 4:50:57 PM
to semeval-2022-task-8-multilingual-news

Hi everyone,

I've read about the copyright issue in the previous threads. To respect that, please post no links or data here. I am now downloading the training data, and quite a few articles come up as "cannot download", although the crawl has not finished yet (about 60% atm).

I suppose that more of these articles were available earlier, when the data was first released. I'd appreciate it if someone could contact me directly and send the data over, both training and evaluation. I also read the advice to run the crawler multiple times, and that this won't create duplicates, which I will of course try.

I am doing this as a term project in my university NLP course, which is why I am starting so late.

Thank you in advance for reading and considering this!

Best,
Tamara

Tamara

Feb 7, 2022, 5:40:53 AM
to semeval-2022-task-8-multilingual-news
After a couple of runs, these are the articles from the train data that still cannot be downloaded. Again, I'd appreciate some help with this a lot.
inaccessible_urls_20220207102403.csv

Daniel Haile

Apr 5, 2022, 5:26:28 PM
to semeval-2022-task-8-multilingual-news
Hi Tamara, 

Did you by any chance get any feedback or a solution to your question? I'm facing the exact same problem.
Best, 
Daniel