Dpsearch discovers new content by reindexing all pages already in its database. The reindexing period is controlled by the Period and PeriodByHops commands, which may be set on a per-Server basis.
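As a rough illustration of why periodic reindexing finds new articles, here is a minimal Python model. The per-hop period table, the `Page` class and the helper functions are hypothetical. They mimic the idea behind Period and PeriodByHops but are not dpsearch's scheduler or its configuration syntax. Pages whose period has expired are refetched, and links found on them that are not yet in the database become new pages. A front page or article index close to the start URL (few hops) would typically get a short period, so new articles linked from it are picked up quickly.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical per-hop periods (illustration only, not dpsearch config):
# pages near the start URL are revisited more often than deep pages.
DEFAULT_PERIOD = timedelta(days=7)
PERIOD_BY_HOPS = {0: timedelta(hours=6), 1: timedelta(days=1)}


@dataclass
class Page:
    url: str
    hops: int                # link distance from the start URL
    last_indexed: datetime   # naive datetime of the last fetch
    links: list = field(default_factory=list)


def period_for(page):
    """Return the reindex period for a page based on its hop distance."""
    return PERIOD_BY_HOPS.get(page.hops, DEFAULT_PERIOD)


def due_for_reindex(pages, now):
    """Return URLs whose period has expired since they were last indexed."""
    return [p.url for p in pages if now - p.last_indexed >= period_for(p)]


def discover(pages, fetched_links):
    """Return Page records for links not yet in the database.

    fetched_links maps an already-known, just-reindexed URL to the links
    found on it now. A new page is one hop further than the page linking it
    and has never been indexed (datetime.min), so it is due immediately.
    """
    known = {p.url: p for p in pages}
    new = []
    for url, links in fetched_links.items():
        parent = known[url]
        for link in links:
            if link not in known:
                page = Page(link, parent.hops + 1, datetime.min)
                known[link] = page
                new.append(page)
    return new
```

For example, if the front page (hop 0) is due and now links to `http://example.com/new`, `discover` returns a hop-1 page for that URL, which is then due for its first indexing.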
Dpsearch also supports sitemaps, introduced by Google a while ago, to speed up reindexing of massive sites. Sitemap support is enabled with the “sitemaps yes” command (robots.txt support must be enabled as well).
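To show why sitemaps help on a massive site, here is a minimal Python sketch of how a crawler might use them. It is not dpsearch code, and the function names are hypothetical. Under the sitemaps.org protocol a site can advertise its sitemap with a `Sitemap:` line in robots.txt (plausibly one reason robots.txt support is needed, though that is an assumption here), and each `<url>` entry may carry a `<lastmod>` date. A crawler can then fetch only the entries changed since its last visit instead of revisiting every page.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemaps_from_robots(robots_txt):
    """Collect URLs from 'Sitemap:' lines (field name is case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def parse_lastmod(text):
    """Parse a <lastmod> value given as a date or a full ISO timestamp.

    Assumption: values without a timezone are treated as UTC.
    """
    value = text.strip()
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    dt = datetime.fromisoformat(value)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def changed_since(sitemap_xml, since):
    """Return <loc> URLs whose <lastmod> is newer than `since` (aware datetime).

    Entries without <lastmod> are returned too, since their age is unknown.
    """
    root = ET.fromstring(sitemap_xml)
    urls = []
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if lastmod is None or parse_lastmod(lastmod) > since:
            urls.append(loc)
    return urls
```

With ten new articles a day, `changed_since` would return roughly those ten URLs (plus any undated entries) rather than the whole site.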
Best regards,
Maxim Zakharov
I'd like to know how dpsearch can discover new content on the sites given in the Server list. If ten new articles go up on the site each day, but the site itself is massive, how is it going to recognise the new content?
--
You received this message because you are subscribed to the Google Groups "DataparkSearch Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dataparksearc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi Iain,
The value for last-modified-date is taken from the Last-Modified HTTP header returned by the remote server. If this header is not present, the Date header is used instead, which usually gives the time of indexing.
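The fallback rule can be sketched in a few lines of Python. This is an illustration of the behaviour described, not dpsearch's own code; `last_modified` is a hypothetical helper, and the headers are assumed to arrive as a plain dict. Header names are matched case-insensitively because HTTP header names are case-insensitive.

```python
from email.utils import parsedate_to_datetime


def last_modified(headers):
    """Return the document date: Last-Modified if present, else Date.

    Returns None if neither header is present.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in ("last-modified", "date"):
        if name in lowered:
            return parsedate_to_datetime(lowered[name])
    return None
```

When only Date is sent, the result is the server's current time at fetch, which is why it usually matches the time of indexing rather than the real modification date.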
Most “modern” content management systems neglect to set a correct Last-Modified header, so this value often does not reflect the actual last-modified date.