URL Frontier 0.4 released

Julien Nioche

Sep 10, 2021, 10:44:03 AM
to crawler...@googlegroups.com
Hi, 

I have just released version 0.4 of URL Frontier; see http://urlfrontier.net for more details. This latest release contains quite a few bug fixes and performance improvements.

For those of you who crawl in Java, the client code is available on Maven:

<dependencies>
  <dependency>
    <groupId>com.github.crawler-commons</groupId>
    <artifactId>urlfrontier-API</artifactId>
    <version>0.4</version>
  </dependency>
</dependencies>


If you use StormCrawler, it already has a module for working with URL Frontier.

I have been using StormCrawler and URL Frontier at scale in the context of a Fed4Fire+ experiment, and have fetched 300M URLs and discovered another billion. (Note: this is running with a single Frontier instance.) The content we crawl is stored in the WARC format and will be donated to our friends at CommonCrawl.

We are entering the final stages of the project with NLNet, but I am considering applying for a second round of funding to add more functionality and further improvements. I will keep you posted on this. In the meantime, please use https://github.com/crawler-commons/url-frontier/discussions to share your feedback, questions or suggestions.

Have fun!

Julien
--

Lewis John Mcgibbney

Sep 10, 2021, 11:51:02 AM
to crawler...@googlegroups.com
Congratulations Julien. Thanks for opening this up and sharing.
Lewis 

--
Lewis
Dr. Lewis J. McGibbney Ph.D, B.Sc
Skype: lewis.john.mcgibbney


