Recommendations

Julius Hamilton

Jul 20, 2024, 11:50:51 AM
to frontera
Hey,

I'm considering a project where I crawl a certain academic blog for the sake of digital preservation.

Frontera looks interesting. Is it recommended, or are there other tools I should consider? How does it compare to Apache Nutch?

I am interested in a library that provides a sophisticated crawling strategy, so I don't have to reinvent the wheel. I think it needs to build a model of the site structure so that, as it crawls, it can infer where to go next.

Thanks,
Julius