Hey,
I'm considering a project where I crawl a certain academic blog for the sake of digital preservation.
Frontera looks interesting. Is it recommended, or are there other tools I should consider? How does it compare to Apache Nutch?
I am interested in a library that provides a sophisticated crawling strategy out of the box, so I don't have to reinvent the wheel. Ideally it would build a model of the site's structure as it crawls, so it can infer where to crawl next.
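To make concrete the kind of strategy I mean, here is a toy sketch (this is purely illustrative, not Frontera's or Nutch's actual API; the `Frontier` class and its scoring heuristic are hypothetical): a frontier that prioritizes URLs by how many links it has seen into each section of the site, a crude stand-in for "a model of the site structure".

```python
import heapq
from urllib.parse import urlparse

class Frontier:
    """Hypothetical priority frontier: URLs from frequently-linked
    site sections are crawled first."""

    def __init__(self):
        self._heap = []          # (-score, url): min-heap used as max-heap
        self._seen = set()
        self._path_counts = {}   # first path segment -> links seen into it

    def _segment(self, url):
        # Use the first path segment as a crude "section" of the site.
        return urlparse(url).path.strip("/").split("/")[0]

    def _score(self, url):
        # Prefer sections the crawl keeps discovering links to.
        return self._path_counts.get(self._segment(url), 0)

    def add(self, url):
        if url in self._seen:
            return
        self._seen.add(url)
        seg = self._segment(url)
        self._path_counts[seg] = self._path_counts.get(seg, 0) + 1
        heapq.heappush(self._heap, (-self._score(url), url))

    def next_url(self):
        # Return the highest-priority URL, or None when the frontier is empty.
        return heapq.heappop(self._heap)[1] if self._heap else None
```

A real strategy would of course rescore entries as the structural model changes rather than freezing the score at insertion time, which is exactly the kind of machinery I'd rather get from a library than build myself.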
Thanks,
Julius