Hello,
I have a few questions; if you could answer them, it would be greatly appreciated.
How much porn is there in the Common Crawl corpus data set?
Is there a ton of porn in there?
I am looking for recipe data, with name, description and images.
What were the root URLs that started the crawl, and what was the depth of the crawl?
On this page [1], I am not able to see whether there are images in the corpus at all, and I really need that porn, I mean recipe data with images.
Thank you for all the tremendous work and effort on Common Crawl.