What is the best way to get articles for the current day


Konstantin Pavlov

Dec 19, 2024, 7:47:30 AM
to arXiv API Discussion
Hi!
I'm using Java and trying to figure out the best way to get all articles for the current date. I tried URLs like this one - http://export.arxiv.org/api/query?search_query=submittedDate:[2024-12-17T00:00:00 TO 2024-12-19T23:59:59]&start=0 - but it didn't return anything.
Maybe someone has encountered such a problem and knows how to solve it in the best way.
Thanks in advance!

Charles Frankston

Dec 19, 2024, 11:41:42 AM
to arXiv API Discussion, konstp...@gmail.com
Konstantin --

You've chosen to use a proper ISO 8601 date representation, which is a very reasonable assumption. Unfortunately, it's not actually what this API requires. Try  export.arxiv.org/api/query?search_query=submittedDate:[202412170000 TO 202412192359]&start=0


The API provides one date filter, submittedDate, that allows you to select data within a given range of dates on which the data was submitted to arXiv. The expected format is [YYYYMMDDTTTT+TO+YYYYMMDDTTTT], where TTTT is the time in 24-hour format to the minute, in GMT. For example, we could construct the following query using submittedDate:

https://export.arxiv.org/api/query?search_query=au:del_maestro+AND+submittedDate:[202301010600+TO+202401010600]
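
To make the format concrete, here is a minimal Python sketch that builds such a URL from two timezone-aware datetimes. The helper names `arxiv_stamp` and `submitted_date_url` are made up for this illustration and are not part of any arXiv client; the endpoint and the YYYYMMDDHHMM layout are the ones quoted above.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

API_URL = "https://export.arxiv.org/api/query"  # endpoint used in this thread


def arxiv_stamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYYMMDDHHMM in GMT."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M")


def submitted_date_url(start: datetime, end: datetime, max_results: int = 100) -> str:
    """Build a query URL for papers submitted between start and end."""
    query = f"submittedDate:[{arxiv_stamp(start)} TO {arxiv_stamp(end)}]"
    params = {"search_query": query, "start": 0, "max_results": max_results}
    # urlencode percent-encodes the brackets and colon and turns spaces into "+"
    return f"{API_URL}?{urlencode(params)}"


url = submitted_date_url(
    datetime(2024, 12, 17, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 12, 19, 23, 59, tzinfo=timezone.utc),
)
print(url)
```

Passing aware datetimes matters here: `astimezone` treats a naive datetime as local time, which would silently shift the window.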


xiewen wei

Jan 3, 2025, 1:04:46 AM
to arXiv API Discussion, Charles Frankston, konstp...@gmail.com
Hi Charles,
I would like to know when the current day's papers become available through the API. What is the update frequency? Why can't the latest papers be found through the API?
arXiv local time: Fri, 03 Jan 2025 01:04 EST
I set the search query to (cat:cs.AI) AND submittedDate:[202501010600 TO 202501030600] (or lastUpdate) and got:

```
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link href="http://arxiv.org/api/query?search_query%3D%28cat%3Acs.AI%29%20AND%20submittedDate%3A%5B202501010600%20TO%20202501030600%5D%26id_list%3D%26start%3D0%26max_results%3D1000" rel="self" type="application/atom+xml"/>
  <title type="html">ArXiv Query: search_query=(cat:cs.AI) AND submittedDate:[202501010600 TO 202501030600]&amp;id_list=&amp;start=0&amp;max_results=1000</title>
  <id>http://arxiv.org/api/+L1TGdJELR6CU9BAV6ExVZxGlXA</id>
  <updated>2025-01-03T00:00:00-05:00</updated>
  <opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:totalResults>
  <opensearch:startIndex xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:startIndex>
  <opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">1000</opensearch:itemsPerPage>
</feed>
```

There is no data.
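
For anyone checking this programmatically, an empty result can be detected from the `opensearch:totalResults` element rather than by eye. The following is a rough Python sketch using only the standard library; `summarize_feed` is a hypothetical helper name, and `SAMPLE` is a trimmed stand-in for a response like the one above, not real API output.

```python
import xml.etree.ElementTree as ET

# Namespaces as they appear in the response quoted in this thread
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
}

# Trimmed stand-in for a zero-result response
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:totalResults>
</feed>"""


def summarize_feed(feed_xml: str) -> tuple[int, list[str]]:
    """Return the reported total and the ids of the entries in this page."""
    root = ET.fromstring(feed_xml)
    total = int(root.findtext("opensearch:totalResults", default="0", namespaces=NS))
    ids = [
        entry.findtext("atom:id", default="", namespaces=NS)
        for entry in root.findall("atom:entry", NS)
    ]
    return total, ids


total, ids = summarize_feed(SAMPLE)
if total == 0:
    print("No results; the window may not be indexed yet.")
```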

But when I try https://rss.arxiv.org/atom/cs.AI, the RSS feed shows 500+ papers for today (2025-01-03).

CC Laan

Jan 6, 2025, 10:19:51 AM
to arXiv API Discussion, xiewen wei, Charles Frankston, konstp...@gmail.com
I'm also trying to find out how to get the latest papers matching what appears on the recent page. I guess the API is just delayed?
Even if I search for an exact paper that is showing on the recent page and in the RSS feed (without any date restrictions), I get zero results from the API.

arXiv API Discussion

Jan 7, 2025, 1:05:01 PM
to arXiv API Discussion
Most of this depends on understanding the timing of how the mailings are constructed. This information is public and available here:


With this in mind, you can reconstruct any given mailing by knowing the date boundaries involved. For example, if you wanted to see everything that would have been included in Monday night's mailing for cs.AI, that would look like:


Note that the & delimiter starts a separate query parameter, so whatever follows it is no longer part of search_query. If you want to bound the submittedDate element within a category, you have to append +AND+cat:.... to the search query, where .... is the category you want to use.
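
The effect of the two delimiters can be seen with a generic query-string parser. This is a small Python sketch using the standard library's `parse_qs`; it only illustrates how a typical parser splits the string and is an assumption about, not a description of, what the arXiv server does internally. The category and dates are sample values from earlier in this thread, and `query_parameters` is a name made up for the example.

```python
from urllib.parse import parse_qs

# Date filter attached with "&": it becomes its own (valueless) parameter
separate = "search_query=cat:cs.AI&submittedDate:[202501010600+TO+202501030600]"

# Date filter attached with "+AND+": it stays inside search_query
combined = "search_query=cat:cs.AI+AND+submittedDate:[202501010600+TO+202501030600]"


def query_parameters(query_string: str) -> dict[str, list[str]]:
    """Split a query string into parameters, keeping ones without a value."""
    return parse_qs(query_string, keep_blank_values=True)


for label, qs in (("separate", separate), ("combined", combined)):
    print(label, query_parameters(qs))
```

In the first case `search_query` contains only `cat:cs.AI`, so the date range never constrains the search.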

The search API is indexed in parallel with the creation of the mailing, but it's not available instantaneously. For best results, we usually suggest if you're trying to harvest new papers that you do so after midnight Eastern USA time. 

Regards, 

Jake Weiskoff
Project Manager, arXiv.org

Konstantin Pavlov

Jan 13, 2025, 8:04:30 AM
to a...@arxiv.org

Hi! 

Is it correct that no articles were published today (Jan 13, 2025)? I tried this URL: https://export.arxiv.org/api/query?search_query=submittedDate:[202501130000%20TO%20202501140000]&start=0.

Am I correct in assuming that articles for the current date are published according to a specific UTC time?



CC Laan

Jan 14, 2025, 11:26:05 AM
to arXiv API Discussion, arXiv API Discussion

Thanks for the info.

I have been trying to make a simple reader app that:
 - Fetches new daily papers around when they show up on 'recent' ( e.g https://arxiv.org/list/cs.CV/recent )
 - Allows going back in time to check a certain day in case I forget to read the recent page for that day

I am having some trouble finding the right API to use for this.
I will try to summarize my current understanding:

• RSS / Atom feeds seem to update first relative to the recent page ( within hours? )
• Search API updates later, sometimes next day
• Search API entry dates for 'published' & 'updated' sometimes don't match when it was released on 'recent', and are possibly set to the older submitted date ( see example below )
  • while RSS+Atom published/updated dates do appear to match
• There is no way to query the Search API by published date ( where published date is the date the paper becomes available on recent page ) 


Example of stale dates in search API:  

A paper that first shows up today (Jan 14, 2025) on the cs.CV recent page ( https://arxiv.org/list/cs.CV/recent ) is this one: http://arxiv.org/abs/2501.06336v1

If I search a date range with some wiggle room, from the 12th to the 15th:
https://export.arxiv.org/api/query?search_query=cat:cs.CV+AND+submittedDate:%5B20250112+TO+20250115%5D&start=0&max_results=1000&sortBy=submittedDate&sortOrder=descending
the paper does not show up, which makes sense because it was submitted on the 10th.

It does show up in a date range around Jan 10th:
https://export.arxiv.org/api/query?search_query=cat:cs.CV+AND+submittedDate:%5B20250109+TO+20250111%5D&start=0&max_results=1000&sortBy=submittedDate&sortOrder=descending

..but the 'published' date in the results shows as the 10th:

<entry>
    <id>http://arxiv.org/abs/2501.06336v1</id>
    <updated>2025-01-10T20:43:33Z</updated>
    <published>2025-01-10T20:43:33Z</published>
    <title>MEt3R: Measuring Multi-View Consistency in Generated Images</title>
...

It is currently showing up in the Atom/RSS feed as well.
The Atom feed fetched today ( https://rss.arxiv.org/atom/cs.CV ) shows the published and updated dates as the 14th:

<entry>
    <id>oai:arXiv.org:2501.06336v1</id>
    <title>MEt3R: Measuring Multi-View Consistency in Generated Images</title>
    <updated>2025-01-14T05:00:14.455470+00:00</updated>    
    <published>2025-01-14T00:00:00-05:00</published>    
    <arxiv:announce_type>new</arxiv:announce_type>
....
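
The mismatch between the two entries above can be checked mechanically. Below is a rough Python sketch that parses both snippets and compares their `published` dates. The entries are copied from this post but trimmed and closed so they parse; the Atom namespace on the RSS-side entry is an assumption, since the snippet above does not show it. `arxiv_id` and `published` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"  # assumed for both feeds

SEARCH_API_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://arxiv.org/abs/2501.06336v1</id>
  <updated>2025-01-10T20:43:33Z</updated>
  <published>2025-01-10T20:43:33Z</published>
</entry>"""

RSS_FEED_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>oai:arXiv.org:2501.06336v1</id>
  <updated>2025-01-14T05:00:14.455470+00:00</updated>
  <published>2025-01-14T00:00:00-05:00</published>
</entry>"""


def arxiv_id(entry_id: str) -> str:
    """Reduce the two id styles shown above to the bare id, e.g. 2501.06336v1."""
    return entry_id.rsplit("/", 1)[-1].rsplit(":", 1)[-1]


def published(entry_xml: str) -> tuple[str, datetime]:
    """Return (bare id, published timestamp) for one Atom entry."""
    entry = ET.fromstring(entry_xml)
    stamp = entry.findtext(f"{ATOM}published")
    # Replace a trailing "Z" so fromisoformat also works before Python 3.11
    return arxiv_id(entry.findtext(f"{ATOM}id")), datetime.fromisoformat(stamp.replace("Z", "+00:00"))


api_id, api_date = published(SEARCH_API_ENTRY)
rss_id, rss_date = published(RSS_FEED_ENTRY)
lag_days = (rss_date.date() - api_date.date()).days
print(api_id == rss_id, lag_days)
```

`arxiv_id` only handles the two id forms shown here; old-style identifiers that contain a slash would need different handling.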


Some questions:
- Was this paper actually published on the 10th and publicly available somewhere ? ( e.g. mailing list or via unlisted URL )
    - I'm fairly sure it did not show up in the search API from the 10th-13th since I was querying during those times

- If not, what does the 'published' date of the 10th in the search API refer to ?
- Does the search API index eventually update the 'published' date to match the RSS date / date it was released on recent?
- Should I always defer to the RSS published date as the more up to date one?


Is there currently, or are there plans to have an API that:
- Has a parameter such as 'publishedDate' we could specify ( where 'published' is the date it showed up on 'recent' ) ?  
- Has a latency similar to the recent page / RSS feeds?


Thanks for any help


Lakshay

Feb 4, 2025, 3:56:58 PM
to arXiv API Discussion, CC Laan, arXiv API Discussion
Hi, I'm also trying to get the latest articles for the day, and the API is really lagging behind. The papers don't show up with:
https://export.arxiv.org/api/query?search_query=cat:cs.CV+AND+submittedDate:%5B20250201+TO+20250203%5D&start=0&max_results=1000&sortBy=submittedDate&sortOrder=descending
and 
https://export.arxiv.org/api/query?search_query=cat:cs.AI OR cat:cs.CL OR cat:cs.CV OR cat:cs.RO&start=0&max_results=50&sortBy=submittedDate&sortOrder=descending&include_cross_list=true
just returns the latest papers from Jan 31.
What's wrong?
Any help would be appreciated.
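
One thing worth ruling out with the second URL is the raw spaces around OR, which some HTTP clients reject or mangle. A minimal Python sketch that builds the same kind of multi-category query with proper encoding follows; `latest_in_categories` is a name invented for this example, and the parameters are the ones already used in this thread. Encoding will not remove any indexing delay on the API side.

```python
from urllib.parse import urlencode

API_URL = "https://export.arxiv.org/api/query"  # endpoint used in this thread


def latest_in_categories(categories: list[str], max_results: int = 50) -> str:
    """Build a URL for the newest submissions in any of the given categories."""
    query = " OR ".join(f"cat:{name}" for name in categories)
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    # urlencode turns the spaces around OR into "+" and escapes the colons
    return f"{API_URL}?{urlencode(params)}"


print(latest_in_categories(["cs.AI", "cs.CL", "cs.CV", "cs.RO"]))
```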
