Hi Shawn,
Common Crawl is aimed at programmers, data scientists, and researchers
working with web data. The focus on "web data" explains why Common Crawl
archives only the HTML. Without the page dependencies (images, videos,
JavaScript, CSS) that determine the visual appearance, the page captures are
not really useful for "end users" browsing the archives. Instead, we try to
support "data users" and provide software libraries and code examples to
process the data inside the page captures.

Web archiving in the sense of preserving cultural heritage wasn't the initial
objective of Common Crawl. However, we share a lot with the web archiving
community, especially the WARC format and all the tools around it.
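
To give a rough idea of what "processing the data inside the page captures"
can look like, here is a minimal sketch in plain Python. It is not one of our
official examples: the helper names (read_warc_record, html_of) and the sample
record are made up for illustration, and it assumes a single uncompressed
WARC "response" record already in memory. The real files are gzip-compressed,
so in practice an existing WARC library is likely the better choice.

```python
import io


def read_warc_record(stream):
    """Read one uncompressed WARC record from a binary stream.

    Returns (headers, block), or None at end of input.
    Hypothetical helper for illustration only.
    """
    line = stream.readline()
    # Records are separated by blank lines; skip any that precede the next one.
    while line in (b"\r\n", b"\n"):
        line = stream.readline()
    if not line:
        return None
    if not line.startswith(b"WARC/"):
        raise ValueError("not a WARC record")

    # WARC headers are "Name: value" lines, terminated by an empty line.
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip()] = value.strip()

    # Content-Length gives the size of the record block in bytes.
    block = stream.read(int(headers["Content-Length"]))
    return headers, block


def html_of(block):
    """Return the body of the HTTP response stored in a response record."""
    _, _, body = block.partition(b"\r\n\r\n")
    return body


# A hand-made sample record (assumption: not taken from a real crawl).
http = (b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n"
        b"<html><title>Hi</title></html>")
record = (b"WARC/1.0\r\n"
          b"WARC-Type: response\r\n"
          b"WARC-Target-URI: http://example.com/\r\n"
          b"Content-Length: " + str(len(http)).encode() + b"\r\n"
          b"\r\n" + http + b"\r\n\r\n")

headers, block = read_warc_record(io.BytesIO(record))
print(headers["WARC-Target-URI"], html_of(block))
```

The point is only that each capture carries its own metadata (URL, record
type, length) next to the HTML, so a "data user" can iterate over records
without ever rendering a page.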
Best,
Sebastian