API randomly returns empty responses

Philipp Göldner

Nov 12, 2019, 9:54:48 AM
to arXiv API
Hello,

when I try to access the API with this request:

The API sometimes returns this:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
<link href="http://arxiv.org/api/query?search_query%3Dcat%3Acs.LG%26id_list%3D%26start%3D43000%26max_results%3D5" rel="self" type="application/atom+xml"/>
 
<title type="html">ArXiv Query: search_query=cat:cs.LG&amp;id_list=&amp;start=43000&amp;max_results=5</title>
 
<id>http://arxiv.org/api/sEISc9LHf13LafIZNAtkHlgJQo8</id>
 
<updated>2019-11-12T00:00:00-05:00</updated>
 
<opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">43956</opensearch:totalResults>
 
<opensearch:startIndex xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">43000</opensearch:startIndex>
 
<opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">5</opensearch:itemsPerPage>
</feed>

and sometimes returns this: (I shortened the response)

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
<link href="http://arxiv.org/api/query?search_query%3Dcat%3Acs.LG%26id_list%3D%26start%3D43000%26max_results%3D5" rel="self" type="application/atom+xml"/>
 
<title type="html">ArXiv Query: search_query=cat:cs.LG&amp;id_list=&amp;start=43000&amp;max_results=5</title>
 
<id>http://arxiv.org/api/sEISc9LHf13LafIZNAtkHlgJQo8</id>
 
<updated>2019-11-12T00:00:00-05:00</updated>
 
<opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">43956</opensearch:totalResults>
 
<opensearch:startIndex xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">43000</opensearch:startIndex>
 
<opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">5</opensearch:itemsPerPage>
 
<entry>
   
<id>http://arxiv.org/abs/1905.01489v2</id>
   
<updated>2019-08-16T20:40:58Z</updated>
   
<published>2019-05-04T13:14:12Z</published>
   
<title>WoodScape: A multi-task, multi-camera fisheye dataset for autonomous
  driving
</title>
   
<summary>  Fisheye cameras are commonly employed for obtaining a large field of view in
surveillance, augmented reality and in particular automotive applications. In
spite of their prevalence, there are few public datasets for detailed
evaluation of computer vision algorithms on fisheye images. We release the
first extensive fisheye automotive dataset, WoodScape, named after Robert Wood
who invented the fisheye camera in 1906. WoodScape comprises of four surround
view cameras and nine tasks including segmentation, depth estimation, 3D
bounding box detection and soiling detection. Semantic annotation of 40 classes
at the instance level is provided for over 10,000 images and annotation for
other tasks are provided for over 100,000 images. With WoodScape, we would like
to encourage the community to adapt computer vision models for fisheye camera
instead of using naive rectification.
</summary>
   
<arxiv:comment xmlns:arxiv="http://arxiv.org/schemas/atom">Accepted for Oral Presentation at IEEE International Conference on
  Computer Vision (ICCV) 2019. Please refer to
  https://github.com/valeoai/woodscape for release status and updates
</arxiv:comment>
   
<link href="http://arxiv.org/abs/1905.01489v2" rel="alternate" type="text/html"/>
   
<link title="pdf" href="http://arxiv.org/pdf/1905.01489v2" rel="related" type="application/pdf"/>
   
<arxiv:primary_category xmlns:arxiv="http://arxiv.org/schemas/atom" term="cs.CV" scheme="http://arxiv.org/schemas/atom"/>
   
<category term="cs.CV" scheme="http://arxiv.org/schemas/atom"/>
   
<category term="cs.AI" scheme="http://arxiv.org/schemas/atom"/>
   
<category term="cs.LG" scheme="http://arxiv.org/schemas/atom"/>
   
<category term="cs.RO" scheme="http://arxiv.org/schemas/atom"/>
   
<category term="stat.ML" scheme="http://arxiv.org/schemas/atom"/>
 
</entry>
</feed>

This seems to happen randomly when using (relatively) large numbers for the start parameter.
I assume that this is a bug because the documentation doesn't specify an upper limit for start.
Is there any known workaround for this issue?

Best regards,
Philipp

Thorsten

Nov 12, 2019, 10:22:48 AM
to arXiv API

The manual mentions an upper limit of 30,000 records:


Because of speed limitations in our implementation of the API, the maximum number of results returned from a single call (max_results) is limited to 30000 in slices of at most 2000 at a time, using the max_results and start query parameters. For example to retrieve matches 6001-8000: http://export.arxiv.org/api/query?search_query=all:electron&start=6000&max_results=8000

Large result sets put considerable load on the server and also take a long time to render. We recommend to refine queries which return more than 1,000 results, or at least request smaller slices. For bulk metadata harvesting or set information, etc., the OAI-PMH interface is more suitable. A request with max_results >30,000 will result in an HTTP 400 error code with appropriate explanation. A request for 30000 results will typically take a little over 2 minutes to return a response of over 15MB. Requests for fewer results are much faster and correspondingly smaller.
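As a rough illustration of the slicing described in the manual excerpt above, here is a minimal Python sketch that builds one query URL per slice. The function name `slice_urls` and the constants are made up for this example; the endpoint is the one from the quoted example URL, and the limits of 2000 per slice and 30000 overall are taken from the excerpt. Under those limits, matches 6001-8000 would correspond to start=6000 with max_results=2000.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the quoted manual example.
BASE_URL = "http://export.arxiv.org/api/query"
MAX_SLICE = 2000   # per-request slice limit stated in the manual excerpt
MAX_TOTAL = 30000  # overall limit stated in the manual excerpt


def slice_urls(search_query, total_wanted, slice_size=MAX_SLICE):
    """Return query URLs covering the first `total_wanted` results in slices.

    Hypothetical helper for illustration; not part of any arXiv client.
    """
    if slice_size < 1 or slice_size > MAX_SLICE:
        raise ValueError("slice_size must be between 1 and %d" % MAX_SLICE)
    # Never ask for more than the overall limit.
    total = min(total_wanted, MAX_TOTAL)
    urls = []
    for start in range(0, total, slice_size):
        params = {
            "search_query": search_query,
            "start": start,
            # The last slice may be shorter than slice_size.
            "max_results": min(slice_size, total - start),
        }
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls
```

Calling `slice_urls("all:electron", 8000)` yields four URLs, the last of which requests start=6000 and max_results=2000.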



To allow for stable pagination, the entire result set has to be cached in order to return slices at certain offsets. I think your query simply exceeds those internal limitations.
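Until that is resolved on the server side, a client could at least detect the empty case and try again. The following is only a sketch under assumptions: it treats a feed as "unexpectedly empty" when it contains no entry elements although startIndex is below totalResults (as in the first response posted above), and it retries a few times. The names `is_unexpectedly_empty` and `fetch_with_retry`, the number of attempts, and the delay are all made up for illustration; `fetch` is whatever function you already use to download the response body as text.

```python
import time
import xml.etree.ElementTree as ET

# Namespaces as they appear in the feeds posted in this thread.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
}


def is_unexpectedly_empty(feed_xml):
    """True if the feed has no entries although startIndex < totalResults."""
    root = ET.fromstring(feed_xml)
    total = int(root.findtext("opensearch:totalResults", default="0", namespaces=NS))
    start = int(root.findtext("opensearch:startIndex", default="0", namespaces=NS))
    entries = root.findall("atom:entry", NS)
    return len(entries) == 0 and start < total


def fetch_with_retry(fetch, url, attempts=3, delay_seconds=3.0):
    """Call fetch(url) until the feed is not unexpectedly empty.

    `fetch` is a caller-supplied function returning the body as text.
    The attempt count and delay are arbitrary defaults, not documented values.
    """
    for attempt in range(attempts):
        body = fetch(url)
        if not is_unexpectedly_empty(body):
            return body
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    raise RuntimeError("feed still empty after %d attempts" % attempts)
```

Retrying only papers over the symptom; if the offset really is beyond an internal limit, refining the query or using OAI-PMH as the manual suggests is probably the more reliable route.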

Cheers
T.

Philipp Göldner

Nov 12, 2019, 10:45:00 AM
to arXiv API
Ah, thank you. I didn't know that this limitation also applies to start and not only to max_results. (This could be better explained in the documentation.)
For consistency, the API should also return an HTTP 400 error code if start > 30000, so users are not left wondering why they sometimes get a result and sometimes not.
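Until the API does something like that, the same check can be done on the client side. This is a small sketch, assuming (as suggested in this thread, not confirmed by the documentation) that the 30000 limit bounds the whole requested window, i.e. start + max_results. The name `check_window` and the constant are hypothetical.

```python
# Assumption from this thread: the 30000 limit from the manual also
# bounds start + max_results, not only max_results.
MAX_OFFSET = 30000


def check_window(start, max_results, limit=MAX_OFFSET):
    """Raise ValueError if the requested window reaches past `limit`."""
    if start < 0 or max_results < 0:
        raise ValueError("start and max_results must be non-negative")
    if start + max_results > limit:
        raise ValueError(
            "start + max_results = %d exceeds %d; the response may be empty"
            % (start + max_results, limit)
        )
```

With this, `check_window(6000, 2000)` passes silently, while the request from my first message (start=43000, max_results=5) raises a ValueError before anything is sent.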

Best regards,
Philipp

Ion Freeman

Apr 21, 2020, 10:25:07 PM
to arXiv API
I'm getting an empty resultset with any value of start. Does that mean that my openSearch:totalResults value is too big?

Thorsten S

Apr 21, 2020, 10:32:54 PM
to arXiv api
