Hi Shuheng,
in addition to Stephane's comment...
The crawler that builds the CC-NEWS collection relies on news feeds and news sitemaps
and skips news items with a publication date older than 30 days. However, this
requires that the feed/sitemap indicates the publication dates and that the dates
are correct. Otherwise outdated news or pages may slip into the collection.
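
To make the date filter concrete, here is a rough sketch in Python. It is not the actual crawler code: the function name `should_fetch` and the handling of missing dates are my assumptions for illustration, and only the 30-day limit comes from the description above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# 30-day limit as described above
MAX_AGE = timedelta(days=30)


def should_fetch(published: Optional[datetime], now: datetime) -> bool:
    """Decide whether a news item from a feed/sitemap should be fetched.

    Hypothetical helper, not part of the real crawler.
    """
    if published is None:
        # Assumption: without a publication date the item cannot be
        # filtered, so it is let through (and may turn out to be old).
        return True
    # Skip items whose announced publication date is older than 30 days.
    return now - published <= MAX_AGE


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
print(should_fetch(now - timedelta(days=5), now))   # recent item
print(should_fetch(now - timedelta(days=31), now))  # too old
print(should_fetch(None, now))                      # no date given
```

A wrong date in the feed/sitemap passes this check just as well as a correct one, which is how outdated pages can still end up in the collection.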
Because the crawler follows millions of feeds and sitemaps, it cannot revisit
every feed/sitemap even daily. Instead, it adapts to the change frequency of a feed/sitemap
and revisits a frequently changing feed/sitemap every 90 minutes. If there are no changes,
the interval may grow to 90 days, which also avoids spending significant resources
on stale feeds/sitemaps.
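
The adaptive revisit schedule could be sketched roughly as follows. Only the two bounds (90 minutes and 90 days) are taken from the description above. The multiplicative update, the factor of 1.5 and the name `next_interval` are assumptions for illustration and may differ from what the crawler really does.

```python
from datetime import timedelta

# Bounds as described above
MIN_INTERVAL = timedelta(minutes=90)
MAX_INTERVAL = timedelta(days=90)


def next_interval(current: timedelta, changed: bool,
                  factor: float = 1.5) -> timedelta:
    """Return the next revisit interval for a feed/sitemap.

    Hypothetical rule: shrink the interval when the feed/sitemap has
    changed since the last visit, grow it when it has not.
    """
    candidate = current / factor if changed else current * factor
    # Keep the result between 90 minutes and 90 days.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, candidate))


# A feed that never changes drifts towards the 90-day maximum.
interval = MIN_INTERVAL
for _ in range(30):
    interval = next_interval(interval, changed=False)
print(interval)
```

With such a rule a stale feed/sitemap quickly ends up at the maximum interval and costs almost nothing, while an active one stays close to the 90-minute minimum.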
Best,
Sebastian