Hi everyone,
I've read about the copyright issue in the previous threads. To respect that, please post no links or data here. I am now downloading the training data, and plenty of articles seem to come up as "cannot download", although the crawl is not yet finished (about 60% atm).
I suppose that more of these articles were available earlier, when the data was first released. I'd appreciate it if someone could contact me directly and send the data over, both training and evaluation. I also read the advice to run the crawler multiple times, which reportedly won't create duplicates, and I will of course try that.
I will be doing this as a term project in my university NLP course, which is why I am starting so late.
Thank you in advance for reading and considering this!
Best,
Tamara