Debugging out-of-date API query results


Pratiksha Thaker

Mar 19, 2024, 12:15:19 PM
to arXiv API
Hello,

I've been trying to write a small client to query the arXiv API for titles and abstracts of papers published in the last day. I'm finding that the API is currently retrieving titles several days out of date, e.g. the query

misses this paper

that shows up in the same search in the online UI 

I'm wondering if the issue is with the API or the way I'm constructing the query. I did try to follow a suggestion in another thread to check the RSS feed for the latest updates and then build an id_list to query for the titles and abstracts, but unfortunately, the API does not return results for the most recent papers even if I specify their ID (it does work for older IDs that do show up in the API query). I guess an alternative is to get the ID from the RSS feed, go to the paper's page, and scrape the title and abstract from there directly, but that seems painful.
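
The RSS-then-id_list approach described above might look roughly like the following. This is only a minimal sketch and does not work around the lag itself: it assumes the public query endpoint `http://export.arxiv.org/api/query` with its `id_list` and `max_results` parameters and an Atom response body. The helper names (`build_id_list_url`, `parse_entries`, `missing_ids`, `fetch_papers`) are hypothetical, and the IDs are expected to be bare (no version suffix).

```python
import re
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "http://export.arxiv.org/api/query"
ATOM = {"atom": "http://www.w3.org/2005/Atom"}


def build_id_list_url(arxiv_ids):
    """Build a query URL asking the API for specific paper IDs."""
    params = {"id_list": ",".join(arxiv_ids), "max_results": len(arxiv_ids)}
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_entries(atom_xml):
    """Return {bare_id: (title, abstract)} from an Atom response body."""
    root = ET.fromstring(atom_xml)
    papers = {}
    for entry in root.findall("atom:entry", ATOM):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM)
        if "/abs/" not in entry_id:
            continue  # skip anything that is not a paper entry
        # ".../abs/2403.00001v1" -> "2403.00001" (drop the version suffix)
        bare_id = re.sub(r"v\d+$", "", entry_id.split("/abs/")[-1].strip())
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM)
        # Collapse the line breaks and indentation inside titles and abstracts.
        papers[bare_id] = (" ".join(title.split()), " ".join(summary.split()))
    return papers


def missing_ids(requested_ids, papers):
    """IDs that were requested but did not come back in the response."""
    return [i for i in requested_ids if i not in papers]


def fetch_papers(arxiv_ids):
    """Query the API over the network for the given IDs."""
    with urllib.request.urlopen(build_id_list_url(arxiv_ids), timeout=30) as resp:
        return parse_entries(resp.read())
```

Comparing the requested IDs against what `parse_entries` returns, via `missing_ids`, at least makes it explicit which recent papers the API has not picked up yet.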

If this is a known property of the API, how many days out of date should I expect it to be in general (so that I know how many days to query to catch up on results)?

Thanks!
Pratiksha

Pratiksha Thaker

Mar 27, 2024, 11:27:14 AM
to arXiv API
(I see my message just got approved - probably the specific link/search result is outdated now but I'm still having this issue.)

Thanks,
Pratiksha

Gareth Spanglett

Jun 24, 2024, 1:51:46 PM
to arXiv API
Hi Pratiksha,

I am having the same issue. Did you ever get this resolved?

Thanks,

Gareth.

Jake Weiskoff

Jun 24, 2024, 2:00:25 PM
to arxi...@googlegroups.com

When the API response contains inclusions that seem odd, it is usually for one of these reasons:

  1. the metadata for that item changed, so it has been reindexed; or,
  2. the OAI or API indexing started before those papers had finished going through the mailing processing. When this occurs, the papers are queued for the next day.
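
Given the second case, one way a client could cope is to remember the IDs the API has not returned yet and request them again on its next run. A minimal sketch, with hypothetical names (`carry_over`, `pending`) and no claim about how long the delay actually lasts:

```python
def carry_over(pending, new_ids, returned_ids):
    """Return the IDs that still need to be requested on the next run.

    pending:      IDs left over from earlier runs
    new_ids:      IDs seen today (for example, from the RSS feed)
    returned_ids: IDs the API actually returned today
    """
    # Merge old and new IDs, dropping duplicates but keeping first-seen order.
    wanted = list(dict.fromkeys(list(pending) + list(new_ids)))
    returned = set(returned_ids)
    return [i for i in wanted if i not in returned]
```

Anything still in the returned list is simply asked for again the following day, so a paper that missed one indexing run is picked up once it appears.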

-- 

Jake Weiskoff

Project Manager, arXiv.org

Cornell Tech

ja...@arxiv.org

 

 
