Apache Solr based search engine using Common Crawl data, please add to examples

Common Screens

Jan 8, 2023, 2:48:54 PM
to Common Crawl
I created PHP code to index WARC files into Solr running in cloud mode. Solr can handle 2.5 billion documents per shard. A preview is available at https://visualsearch.org/. It will take approximately 6 months to index all 3.5 billion URLs. Check it out, and let me know if you are interested in the code behind it.

Can you please add this to the Common Crawl examples?
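
For a rough sense of scale, here is a back-of-the-envelope sketch in Python using only the figures quoted above (3.5 billion URLs, 2.5 billion documents per shard, about 6 months). Treating a month as 30 days is an assumption, and the function names are made up for illustration; this is not taken from the PHP indexer.

```python
import math

TOTAL_URLS = 3_500_000_000      # total URLs to index, as stated in the post
DOCS_PER_SHARD = 2_500_000_000  # per-shard capacity, as stated in the post
INDEXING_DAYS = 6 * 30          # assumption: "6 months" taken as 180 days


def required_docs_per_second(total_docs: int, days: int) -> float:
    """Sustained indexing rate needed to finish within the given number of days."""
    return total_docs / (days * 24 * 60 * 60)


def min_shards(total_docs: int, docs_per_shard: int) -> int:
    """Smallest shard count that can hold all documents at the stated capacity."""
    return math.ceil(total_docs / docs_per_shard)


rate = required_docs_per_second(TOTAL_URLS, INDEXING_DAYS)
shards = min_shards(TOTAL_URLS, DOCS_PER_SHARD)

print(f"about {rate:.0f} documents per second, sustained")
print(f"at least {shards} shards")
```

Under these assumptions the indexer would need to sustain a bit over 200 documents per second, and the collection would need at least two shards.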

Sebastian Nagel

Feb 14, 2023, 8:50:18 AM
to common...@googlegroups.com
Hi,

> https://visualsearch.org/
>
> Can you please add this to common crawl examples
>

Thanks for sharing! I've added it to our list of examples:

https://commoncrawl.org/the-data/examples/

Best,
Sebastian

On 1/8/23 20:48, Common Screens wrote:
> created php code to index WARC files into solr in cloud mode solr can
> handle 2.5 billion documents per shard, preview available at
> https://visualsearch.org/ it will take
> approximately 6 months to index all 3.5 billion urls. check it out let
> me know if you are interested in the code behind it.
> https://visualsearch.org/