In their URLs they seem to identify articles by a number matching:
Provided you use this identifier when you store articles in a database, you can write a spider middleware that queries the database to check whether you already have the article, and allow the request if and only if you don't. To speed up rejections, cache all the article identifiers from the previous day (e.g. in open_spider()). For the complement, approving articles for scraping, think of a workaround: I would guess their identifier is generated by a sequence, so sort by it and don't look further back in the past than a few days before the current session.
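A minimal sketch of that decision logic, outside of Scrapy so it stands alone. The regex and the class/function names here are assumptions for illustration, since the exact identifier pattern isn't shown above; in a real spider middleware you would load the known IDs from your database in open_spider() rather than passing them in:

```python
import re

# Assumed pattern: the article ID is the last run of digits in the
# URL (e.g. http://www.bbc.com/news/10628494 -> "10628494").
ARTICLE_ID_RE = re.compile(r"(\d+)\D*$")


def extract_article_id(url):
    """Return the numeric identifier embedded in an article URL, or None."""
    match = ARTICLE_ID_RE.search(url)
    return match.group(1) if match else None


class SeenArticlesFilter:
    """Stand-in for the middleware logic: allow a request iff
    its article ID has not been stored already."""

    def __init__(self, known_ids):
        # In the middleware, populate this cache from the database
        # in open_spider() for cheap rejections.
        self.seen = set(known_ids)

    def should_crawl(self, url):
        article_id = extract_article_id(url)
        if article_id is None or article_id in self.seen:
            return False
        self.seen.add(article_id)
        return True
```

In the actual middleware you would raise scrapy.exceptions.IgnoreRequest (or simply drop the request from process_spider_output) when should_crawl() returns False.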
Also, look at
http://www.bbc.com/news/10628494 : you can parse the publication date from the feed. Depending on the site, you may miss some articles if erroneous dates are stated.
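For the date-from-feed part, a small sketch using only the standard library. The feed snippet below is a made-up example shaped like a typical RSS 2.0 item (RSS feeds normally use the RFC 822 date format that email.utils can parse):

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS fragment for illustration; a real feed would be
# fetched over HTTP and contain many <item> elements.
FEED_SNIPPET = """
<rss version="2.0">
  <channel>
    <item>
      <title>Example headline</title>
      <link>http://www.bbc.com/news/10628494</link>
      <pubDate>Tue, 01 Jul 2014 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def article_dates(feed_xml):
    """Yield (link, publication datetime) pairs for each feed item."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        # RFC 822 dates like "Tue, 01 Jul 2014 09:30:00 GMT"
        pub_date = parsedate_to_datetime(item.findtext("pubDate"))
        yield link, pub_date
```

You can then compare each item's date against your cutoff (a few days back) before enqueueing its link, keeping in mind the caveat above about sites that publish erroneous dates.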