How to use Apache Hive with Common Crawl tables?


Creative Subdivide

Mar 11, 2020, 12:01:09 PM
to Common Crawl
I am very new to Hive. Where can I get the full index tables of Common Crawl, and how can I query them with Apache Hive? Is there a tutorial or anything similar?

Sebastian Nagel

Mar 12, 2020, 6:17:53 AM
to common...@googlegroups.com
Hi,

I'm not aware of any tutorial on how to use Common Crawl data (in particular, the columnar index) with
Apache Hive.

Hive runs on top of Hadoop, so the first step would be to become familiar with Hadoop. However,
it's much easier to use a managed SQL engine such as Amazon Athena.
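
As a rough illustration of what querying the columnar index involves, here is a minimal sketch in Python that only builds the SQL text. The statements could then be pasted into the Athena console or adapted for Hive. Several things here are assumptions: the table name `ccindex`, the small subset of columns, the example crawl ID `CC-MAIN-2020-10`, the example TLD, and the helper function names. The S3 location is the one the columnar index is published under, to my knowledge; on Hive the path would probably need an `s3a://` prefix and S3 access configured in Hadoop.

```python
import re

# Public S3 location of the columnar index (Parquet, partitioned by crawl and subset).
INDEX_LOCATION = "s3://commoncrawl/cc-index/table/cc-main/warc/"


def create_table_sql(table="ccindex", location=INDEX_LOCATION):
    """Build DDL for an external table over a subset of the index columns."""
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  url STRING,\n"
        "  url_host_tld STRING,\n"
        "  url_host_registered_domain STRING,\n"
        "  fetch_status SMALLINT,\n"
        "  warc_filename STRING,\n"
        "  warc_record_offset INT,\n"
        "  warc_record_length INT\n"
        ")\n"
        "PARTITIONED BY (crawl STRING, subset STRING)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def repair_table_sql(table="ccindex"):
    """Build the statement that discovers the crawl/subset partitions."""
    return f"MSCK REPAIR TABLE {table}"


def top_domains_sql(crawl, tld, table="ccindex", limit=20):
    """Build a query counting captures per registered domain within one TLD."""
    # The values are interpolated into SQL text, so reject unexpected input.
    if not re.fullmatch(r"CC-MAIN-\d{4}-\d{2}", crawl):
        raise ValueError(f"unexpected crawl ID: {crawl!r}")
    if not re.fullmatch(r"[a-z0-9-]+", tld):
        raise ValueError(f"unexpected TLD: {tld!r}")
    return (
        "SELECT url_host_registered_domain, COUNT(*) AS n_pages\n"
        f"FROM {table}\n"
        f"WHERE crawl = '{crawl}'\n"
        "  AND subset = 'warc'\n"
        f"  AND url_host_tld = '{tld}'\n"
        "GROUP BY url_host_registered_domain\n"
        "ORDER BY n_pages DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Example values only; pick whichever crawl and TLD you are interested in.
    print(create_table_sql() + ";\n")
    print(repair_table_sql() + ";\n")
    print(top_domains_sql("CC-MAIN-2020-10", "no") + ";")
```

Restricting the query to one `crawl` and one `subset` partition keeps the amount of scanned data small, which matters on Athena because it bills by data scanned.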

Best,
Sebastian
