You may like my project, common-crawler
--
You received this message because you are subscribed to a topic in the Google Groups "Common Crawl" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/common-crawl/naAhBHpkjso/unsubscribe.
To unsubscribe from this group and all its topics, send an email to common-crawl+unsubscribe@googlegroups.com.
To post to this group, send email to common...@googlegroups.com.
--
It fetches the Common Crawl index from the server and stores it in your local DB for future crawling, using various CDX API filters to help crawl a particular page and index.
It's quite easy to get what you want from some of the pages. Crawling the whole index will take a long time, but in my case it can save money and time. That's why I started developing this.
For more, please keep following the project. Thanks.
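For the casual, one-page lookups that the index server handles well, a query could be built as in the minimal Python sketch below. It assumes the public endpoint `https://index.commoncrawl.org/<crawl-id>-index` and its `url` and `output=json` parameters. `cdx_query_url` and `fetch_captures` are hypothetical helpers for illustration, not part of common-crawler.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint of the Common Crawl index server; each crawl
# is assumed to have its own collection named "<crawl-id>-index".
INDEX_SERVER = "https://index.commoncrawl.org"


def cdx_query_url(crawl_id: str, page_url: str) -> str:
    """Build a CDX API query for the captures of ONE page in ONE crawl."""
    params = urlencode({"url": page_url, "output": "json"})
    return f"{INDEX_SERVER}/{crawl_id}-index?{params}"


def fetch_captures(crawl_id: str, page_url: str) -> List[dict]:
    """Fetch the query result. With output=json the server is assumed to
    return one JSON object per line. urlopen raises HTTPError on error statuses."""
    with urlopen(cdx_query_url(crawl_id, page_url)) as resp:
        body = resp.read().decode("utf-8")
    return [json.loads(line) for line in body.splitlines() if line.strip()]


if __name__ == "__main__":
    # Only builds the URL; call fetch_captures() to actually contact the server.
    print(cdx_query_url("CC-MAIN-2017-51", "example.com/"))
    # prints https://index.commoncrawl.org/CC-MAIN-2017-51-index?url=example.com%2F&output=json
```

This pattern fits single pages; it is not meant for looping over whole domains, which the reply below addresses.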
On Thursday, 18 January 2018 12:21:20 UTC+5:30, Tom Morris wrote:
On Wed, Jan 17, 2018 at 10:23 PM, Vallabh Kansagara <vrkan...@gmail.com> wrote:
You may like my project, common-crawler

Could you explain a little bit more about what your project is and how it relates to CommonCrawl, which it seems to be attempting to associate itself with?

Tom
--
On Thu, Jan 18, 2018 at 9:19 AM, Vallabh Kansagara <vrkan...@gmail.com> wrote:
It fetches the Common Crawl index from the server and stores it in your local DB for future crawling, using various CDX API filters to help crawl a particular page and index.

Using the Common Crawl Index service for bulk access (e.g. *.co.uk) is an abuse of the service that will negatively affect casual interactive users. It's also slow.

You should be downloading the index files (listed in e.g. https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2017-51/cc-index.paths.gz) and accessing them locally.

Tom
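Tom's suggestion (download the index shards listed in cc-index.paths.gz and filter them locally) might look like the minimal Python sketch below. It assumes that the listed paths are relative to the bucket root in his URL and that each shard line is CDXJ, i.e. `<SURT key> <timestamp> <JSON>`. Under that assumption a bulk query such as *.co.uk becomes a plain prefix match on the SURT key `uk,co,`. The helper names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List

# Root that the entries in cc-index.paths.gz are assumed to be relative to,
# taken from the URL in Tom's message.
BUCKET_ROOT = "https://commoncrawl.s3.amazonaws.com/"


def shard_urls(paths_gz: bytes) -> List[str]:
    """Decompress a cc-index.paths.gz listing and return full URLs for the
    cdx-*.gz index shards only, skipping any other listed files."""
    text = gzip.decompress(paths_gz).decode("utf-8")
    urls = []
    for path in text.splitlines():
        path = path.strip()
        name = path.rsplit("/", 1)[-1]
        if name.startswith("cdx-") and name.endswith(".gz"):
            urls.append(BUCKET_ROOT + path)
    return urls


def filter_surt_prefix(lines: Iterable[str], surt_prefix: str) -> Iterator[str]:
    """Yield CDXJ lines whose SURT key starts with surt_prefix,
    e.g. "uk,co," for every *.co.uk host."""
    for line in lines:
        if line.startswith(surt_prefix):
            yield line.rstrip("\n")


def scan_local_shard(shard_path: str, surt_prefix: str) -> Iterator[str]:
    """Stream an already-downloaded cdx-*.gz shard from disk and filter it."""
    # gzip.open also reads multi-member (concatenated) gzip files.
    with gzip.open(shard_path, "rt", encoding="utf-8") as f:
        yield from filter_surt_prefix(f, surt_prefix)


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        'uk,co,example)/ 20171211000000 {"url": "http://example.co.uk/"}\n',
        'com,example)/ 20171211000000 {"url": "http://example.com/"}\n',
    ]
    for hit in filter_surt_prefix(sample, "uk,co,"):
        print(hit)
```

Since the shards are sorted by SURT key, matching lines for a prefix sit next to each other, so a real tool could stop reading a shard once it passes the prefix instead of scanning to the end.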