Thanks for the info.
I have been trying to make a simple reader app that:
- Fetches new daily papers around the time they show up on 'recent' (e.g.
https://arxiv.org/list/cs.CV/recent )
- Allows going back in time to check a certain day in case I forget to read the recent page for that day
I am having some trouble finding the right API to use for this.
I will try to summarize my current understanding:
• RSS / Atom feeds seem to update first relative to the recent page ( within hours? )
• Search API updates later, sometimes next day
• Search API entry dates for 'published' and 'updated' sometimes don't match the date the paper was released on 'recent', and are possibly set to the older submitted date (see example below)
• while the RSS/Atom published/updated dates do appear to match the 'recent' release date
• There is no way to query the Search API by published date (where published date is the date the paper becomes available on the recent page)
Example of stale dates in search API:
A paper that first shows up today, Jan 14 2025, on the cs.CV recent page (
https://arxiv.org/list/cs.CV/recent )
is this one:
http://arxiv.org/abs/2501.06336v1
If I perform a search for a date range with some wiggle room, from the 12th to the 15th:
https://export.arxiv.org/api/query?search_query=cat:cs.CV+AND+submittedDate:%5B20250112+TO+20250115%5D&start=0&max_results=1000&sortBy=submittedDate&sortOrder=descending
The paper does not show up, which makes sense because it was submitted on the 10th.
It does show up in a date range around the 10th (9th to 11th):
https://export.arxiv.org/api/query?search_query=cat:cs.CV+AND+submittedDate:%5B20250109+TO+20250111%5D&start=0&max_results=1000&sortBy=submittedDate&sortOrder=descending
...but the 'published' date in the results shows as the 10th:
<entry>
<id>http://arxiv.org/abs/2501.06336v1</id>
<updated>2025-01-10T20:43:33Z</updated>
<published>2025-01-10T20:43:33Z</published>
<title>MEt3R: Measuring Multi-View Consistency in Generated Images</title>
...
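For reference, this is roughly how I build the queries above and read the dates back, as a minimal Python sketch. The function names (build_query_url, parse_api_entries) are my own and hypothetical; the only things taken from the API are the URL parameters shown in the queries above and the standard Atom element names in the response. It does no fetching itself, it just builds the URL and parses a response body I already have.

```python
from urllib.parse import quote
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # standard Atom namespace
API_BASE = "https://export.arxiv.org/api/query"


def build_query_url(category, start_day, end_day, max_results=1000):
    """Build a submittedDate range query; days are YYYYMMDD strings."""
    # Percent-encode the square brackets but keep '+' as the separator,
    # matching the hand-written URLs in this post.
    date_range = quote(f"[{start_day}+TO+{end_day}]", safe="+")
    search = f"cat:{category}+AND+submittedDate:{date_range}"
    return (
        f"{API_BASE}?search_query={search}"
        f"&start=0&max_results={max_results}"
        "&sortBy=submittedDate&sortOrder=descending"
    )


def parse_api_entries(xml_text):
    """Return id / published / updated / title for each Atom <entry>."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter(ATOM_NS + "entry"):
        published = entry.findtext(ATOM_NS + "published", default="").strip()
        entries.append(
            {
                "id": entry.findtext(ATOM_NS + "id", default="").strip(),
                "published": published,
                # First 10 characters are the YYYY-MM-DD part (UTC here).
                "published_day": published[:10],
                "updated": entry.findtext(ATOM_NS + "updated", default="").strip(),
                "title": entry.findtext(ATOM_NS + "title", default="").strip(),
            }
        )
    return entries
```

With the entry above, published_day comes back as 2025-01-10, which is the date I can't reconcile with the 'recent' page.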
The paper is also currently showing up in the Atom/RSS feed.
The Atom feed fetched today (
https://rss.arxiv.org/atom/cs.CV ) shows the published & updated dates as the 14th:
<entry>
<id>oai:arXiv.org:2501.06336v1</id>
<title>MEt3R: Measuring Multi-View Consistency in Generated Images</title>
<updated>2025-01-14T05:00:14.455470+00:00</updated>
<published>2025-01-14T00:00:00-05:00</published>
<arxiv:announce_type>new</arxiv:announce_type>
....
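On the feed side, this is a minimal Python sketch of how I read the announce date out of an entry like the one above. The function names are hypothetical, and the assumption that the date part of 'published' is the day the paper appeared on 'recent' is exactly the thing I am asking about, not something I have confirmed. Elements are matched by local name so the sketch does not depend on the exact namespace URI behind the arxiv: prefix.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"  # standard Atom namespace


def local_name(tag):
    """Strip the '{namespace}' part from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def parse_feed_entries(xml_text):
    """Return arXiv id, announce type, and announce day for each feed entry."""
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.iter(ATOM_NS + "entry"):
        # Collect child text by local name; repeated tags keep the last value.
        fields = {local_name(c.tag): (c.text or "").strip() for c in entry}
        published = fields.get("published", "")
        # ASSUMPTION: the date part of 'published' (in the feed's own UTC
        # offset) is the day the paper was announced on 'recent'.
        announce_day = (
            datetime.fromisoformat(published).date().isoformat()
            if published
            else None
        )
        results.append(
            {
                # Feed ids look like 'oai:arXiv.org:2501.06336v1'.
                "arxiv_id": fields.get("id", "").rsplit(":", 1)[-1],
                "announce_type": fields.get("announce_type"),
                "announce_day": announce_day,
                "title": fields.get("title", ""),
            }
        )
    return results
```

For the entry above this gives an announce day of 2025-01-14, versus 2025-01-10 from the search API for the same paper.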
Some questions:
- Was this paper actually published on the 10th and publicly available somewhere (e.g. a mailing list or an unlisted URL)?
- I'm fairly sure it did not show up in the search API from the 10th-13th since I was querying during those times
- If not, what does the 'published' date of the 10th in the search API refer to ?
- Does the search API index eventually update the 'published' date to match the RSS date / date it was released on recent?
- Should I always defer to the RSS published date as the more up to date one?
Is there currently, or are there plans to have an API that:
- Has a parameter such as 'publishedDate' we could specify ( where 'published' is the date it showed up on 'recent' ) ?
- Has a latency similar to the recent page / RSS feeds?
Thanks for any help