URL Frontier 1.0 released


Julien Nioche

Oct 18, 2021, 7:52:39 AM
to crawler...@googlegroups.com
Hi, 

I am pleased to announce the release of URLFrontier 1.0 (https://urlfrontier.net/).

This marks the end of the initial phase of the project, funded by NLnet. The final milestone was to run a large-scale crawl, which we did thanks to another grant from Fed4Fire+. The crawl was quite successful: I fixed a few bugs, made various improvements to the code, and generated WARC files for Common Crawl to host and redistribute later.

I presented the project and the large-scale crawling experiment at the Open Search Symposium last week; the slides can be found at [1].

What's next? I have applied for a second round of funding from them; we might get lucky again. This would allow me to keep improving the service.

Now that we have something to run and play with, I am hoping to see more users, adopters, and contributors. To that end, I have put together a few ideas on how people can contribute; see [2].

As usual, please get in touch on [3] if you have comments or questions.

Have fun and happy crawling!

Julien




Lewis John Mcgibbney

Oct 18, 2021, 1:40:35 PM
to crawler...@googlegroups.com
Excellent work Julien.
Lewis 

--
You received this message because you are subscribed to the Google Groups "crawler-commons" group.
To unsubscribe from this group and stop receiving emails from it, send an email to crawler-commo...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/crawler-commons/CA%2B-fM0uiw%2BkrbnBdqQwmQh5gRXh06tqexj2jyAdVdfOYZ65UfQ%40mail.gmail.com.
--
Lewis
Dr. Lewis J. McGibbney Ph.D, B.Sc
Skype: lewis.john.mcgibbney



Sebastian Nagel

Oct 19, 2021, 2:51:48 AM
to crawler...@googlegroups.com
Hi Julien,

Great to hear!

Sebastian

On 10/18/21 1:52 PM, Julien Nioche wrote:
> [1] https://drive.google.com/uc?export=download&id=1hrJPLB3hwg47nqVIDu83n5hiolu-UsQl
> [2] https://github.com/crawler-commons/url-frontier/wiki/Ways-to-help
> [3] https://github.com/crawler-commons/url-frontier/discussions

Avi Hayun

Oct 19, 2021, 2:58:34 AM
to crawler...@googlegroups.com
Brilliant and inspiring

Thank you, Julien. I really hope you get another round of funding; this money goes to a good cause!
