I'm using the AbotX parallel crawler engine inside a .NET worker app (i.e. one that uses the Microsoft.NET.Sdk.Worker SDK to run hosted "services" like my crawler).
When the service is stopped, I would like to dump out the list of URLs that have been collected for future crawling (i.e. what the crawler has discovered while crawling a site), so that the next time I start the crawler it can pick up right where it left off.
I don't see a clear way to do this; am I missing something?
I see the Pause/Resume option, but it doesn't seem to be designed for a complete shutdown and a later restart.
Poking around in the code, I see that the IScheduler interface has a GetNext() method that returns a PageToCrawl, so I've thought about simply looping over that and saving the results to disk. Is that going to work? Is there, or should there be, a better way?
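For reference, this is roughly what I had in mind (a rough sketch only; I'm assuming the scheduler implements Abot2's IScheduler with Count/GetNext()/Add(), that PageToCrawl takes a Uri in its constructor, and "pending-urls.txt" is just a placeholder file name I made up):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Abot2.Core;
using Abot2.Poco;

public class SchedulerPersistence
{
    private const string StateFile = "pending-urls.txt"; // placeholder path

    // Intended to run during the worker's StopAsync:
    // drain every still-queued page out of the scheduler and write its URL to disk.
    public void Save(IScheduler scheduler)
    {
        var urls = new List<string>();
        while (scheduler.Count > 0)
        {
            var page = scheduler.GetNext(); // removes the page from the queue
            if (page != null)
                urls.Add(page.Uri.AbsoluteUri);
        }
        File.WriteAllLines(StateFile, urls);
    }

    // Intended to run before starting the next crawl:
    // re-queue each saved URL so the crawl resumes where it stopped.
    public void Restore(IScheduler scheduler)
    {
        if (!File.Exists(StateFile))
            return;
        foreach (var line in File.ReadAllLines(StateFile))
            scheduler.Add(new PageToCrawl(new Uri(line)));
    }
}
```

My worry is whether draining the queue like this loses per-page state (crawl depth, parent URI, etc.) that the engine needs, which is part of why I'm asking if there's a supported way.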