Stephen,
Thank you very much for responding so quickly and for all of your work on Common Crawl. I don’t want to speak for all of us, but given the feedback I’ve gotten so far from some of the dev communities, I think we would very much appreciate the chance to be tested on a monthly basis as part of the regular Common Crawl process.
I think we’ll still want to run more often in our own sandbox(es) on the slice of Common Crawl we have, but monthly testing against new data would, from my perspective at least, be a huge win for all of us.
In addition to parsing binaries and extracting text, Tika (via PDFBox, POI and many others) can also offer metadata (e.g. EXIF from images), which users of Common Crawl might find useful.
I’ll forward this to some of the relevant dev lists to invite others to participate in the discussion on the common-crawl list.
Thank you, again. I very much look forward to collaborating.
Best,
Tim