Let's say example.com has been crawled today. How can I crawl only the new and updated pages the next time the crawler runs?
I can store the Last-Modified and ETag headers to check whether a page is new or updated, but one case puzzles me: a page that has not itself been updated may still contain links to pages with new or updated content. How should that case be handled?
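To frame the question, here is a minimal sketch of the approach I have in mind (all names here are hypothetical, not part of this package): send conditional requests built from the stored validators, and on a 304 Not Modified skip re-parsing the body but still re-enqueue the links recorded from the previous crawl, so an unchanged hub page can still lead to fresh content.

```python
def conditional_headers(record):
    """Build conditional-request headers from a stored crawl record.

    `record` is a hypothetical dict like
    {"etag": '"abc123"', "last_modified": "...", "links": [...]}.
    """
    headers = {}
    if record.get("etag"):
        headers["If-None-Match"] = record["etag"]
    if record.get("last_modified"):
        headers["If-Modified-Since"] = record["last_modified"]
    return headers


def links_to_revisit(status, record):
    """Decide what to do after a conditional fetch.

    On 304 the body is unchanged, so skip parsing but return the
    previously extracted links for re-crawling; they may point at
    pages that ARE new or updated. On any other status, return None
    to signal that the body should be parsed for fresh links.
    """
    if status == 304:
        return record.get("links", [])
    return None
```

The key idea is that "page unchanged" and "page's outlinks already visited" are separate questions: a 304 only answers the first, so the stored outlinks still go back into the crawl queue.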
Thanks for the awesome package!