URL Frontier 0.4 released

Julien Nioche

Sep 10, 2021, 10:44:03 AM

to crawler...@googlegroups.com

Hi,

I have just released version 0.4 of URL Frontier, see http://urlfrontier.net for more details. This latest release contains quite a few bug fixes and performance improvements.

For those of you who crawl in Java, the client code is available on Maven:

<dependencies>
  <dependency>
    <groupId>com.github.crawler-commons</groupId>
    <artifactId>urlfrontier-API</artifactId>
    <version>0.4</version>
  </dependency>
</dependencies>

and if you use StormCrawler, it already has a module to use URLFrontier.

I have been using StormCrawler and URLFrontier at scale in the context of a Fed4Fire+ experiment and have fetched 300M URLs and discovered another billion URLs. (Note: this is running with a single Frontier instance.) The content we crawl is stored in the WARC format and will be donated to our friends at CommonCrawl.

We are entering the final stages of the project with NLnet, but I am considering applying for a second round of funding to add more functionality and improvements. I will keep you posted on this; in the meantime, please use https://github.com/crawler-commons/url-frontier/discussions to share your feedback, questions, or suggestions.

Have fun!

Julien

Open Source Solutions for Text Engineering

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble

Lewis John Mcgibbney

Sep 10, 2021, 11:51:02 AM

to crawler...@googlegroups.com

Congratulations Julien. Thanks for opening this up and sharing.

Lewis

Dr. Lewis J. McGibbney Ph.D, B.Sc

Skype: lewis.john.mcgibbney
