Download WET Files of Web Pages with a Given Tag


Dakila

Oct 2, 2017, 5:04:01 PM
to Common Crawl
Hello,

With my previous question resolved so quickly (thank you again), I am encouraged to ask the main question I am trying to answer: is it possible to identify and download the WET file of a web page that contains a given tag? The tag is "lang=tl". How might this be accomplished? Thank you again.

Sebastian Nagel

Oct 4, 2017, 10:54:21 AM
to common...@googlegroups.com
Hi Dakila,

> Is it possible to identify and download the WET file of a web page
> containing a tag? The tag is "lang=tl".

Unfortunately not. Neither the HTML metadata nor the identified language of the text
is contained in the WET files.

This topic has been discussed a couple of times before in this group.
Please check the archives.
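
To illustrate why the WET files cannot answer this: they hold only the extracted plain text, so a check for a lang attribute has to run against the raw HTML (for example, the HTML payload of a WARC response record). Below is a minimal sketch in Python using only the standard library. It assumes the HTML is already available as a string; the names LangAttributeFinder, has_lang, sample_html and sample_wet_text are hypothetical and made up for this example, not part of any Common Crawl tooling.

```python
from html.parser import HTMLParser


class LangAttributeFinder(HTMLParser):
    """Collects the value of every lang attribute found in the markup."""

    def __init__(self):
        super().__init__()
        self.langs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; attribute names are lowercased.
        for name, value in attrs:
            if name == "lang" and value:
                self.langs.append(value.strip().lower())


def has_lang(html, wanted="tl"):
    """Return True if any element declares lang=wanted (or a subtag such as tl-PH)."""
    finder = LangAttributeFinder()
    finder.feed(html)
    finder.close()
    return any(
        lang == wanted or lang.startswith(wanted + "-")
        for lang in finder.langs
    )


# Hypothetical inputs: the same page as raw HTML and as extracted plain text.
sample_html = "<html lang=tl><body><p>Magandang umaga.</p></body></html>"
sample_wet_text = "Magandang umaga."

print(has_lang(sample_html))      # the attribute is visible in the markup
print(has_lang(sample_wet_text))  # the plain text carries no markup to inspect
```

The second call finds nothing because the plain text no longer contains any tags, which is the situation with WET records.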

Thanks,
Sebastian