Bug report: `opensearch:totalResults` for empty results


Lukas Schwab

Apr 4, 2021, 1:12:12 PM
to arxi...@googlegroups.com
Hello!

I spotted what I believe to be a bug in the API today. When a query has no results:
  • Expected: the `opensearch:totalResults` value is 0.
  • Actual: the `opensearch:totalResults` value is 1.
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <link href="http://arxiv.org/api/query?search_query%3D%26id_list%3D0000.0000%26start%3D0%26max_results%3D100" rel="self" type="application/atom+xml"/>
  <title type="html">ArXiv Query: search_query=&amp;id_list=0000.0000&amp;start=0&amp;max_results=100</title>
  <id>http://arxiv.org/api/oqaUC9OB2SccftmJHZqrxySXrao</id>
  <updated>2021-04-03T00:00:00-04:00</updated>
  <opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">1</opensearch:totalResults>
  <opensearch:startIndex xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:startIndex>
  <opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">100</opensearch:itemsPerPage>
</feed>
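A client can detect this inconsistency mechanically: the feed reports a nonzero `opensearch:totalResults` with `opensearch:startIndex` 0, yet contains no `<entry>` elements. The sketch below assumes only Python's standard `xml.etree.ElementTree`; the helper names `feed_summary` and `is_unexpectedly_empty` are hypothetical, and the sample is a trimmed copy of the feed above (link, title, id and updated omitted).

```python
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the feed above.
ATOM = "{http://www.w3.org/2005/Atom}"
OPENSEARCH = "{http://a9.com/-/spec/opensearch/1.1/}"


def feed_summary(feed_xml: str) -> tuple[int, int, int]:
    """Return (totalResults, startIndex, number of <entry> elements)."""
    # Parse bytes so the XML encoding declaration is honoured.
    root = ET.fromstring(feed_xml.encode("utf-8"))

    def opensearch_int(name: str) -> int:
        text = root.findtext(f"{OPENSEARCH}{name}")
        if text is None:
            raise ValueError(f"feed has no opensearch:{name} element")
        return int(text.strip())

    entries = len(root.findall(f"{ATOM}entry"))
    return opensearch_int("totalResults"), opensearch_int("startIndex"), entries


def is_unexpectedly_empty(feed_xml: str) -> bool:
    """True when a page has no entries even though startIndex < totalResults."""
    total, start, entries = feed_summary(feed_xml)
    return entries == 0 and start < total


# Trimmed copy of the empty feed from the report.
EMPTY_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <opensearch:totalResults xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">1</opensearch:totalResults>
  <opensearch:startIndex xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:startIndex>
  <opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">100</opensearch:itemsPerPage>
</feed>"""

print(feed_summary(EMPTY_FEED))           # (1, 0, 0)
print(is_unexpectedly_empty(EMPTY_FEED))  # True
```

With the expected value of 0 for `totalResults`, the same check would return `False`, so the empty feed would be treated as a legitimately empty result set.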
Cheers,
Lukas

Lukas Schwab

Apr 4, 2021, 1:12:15 PM
to arxi...@googlegroups.com
Actually, I have another pagination bug to report.

For some series of paginated requests, the API returns an empty feed at a point where more results are expected. This may correlate with large page sizes (e.g. 1000), even though the User Manual suggests the maximum page size is 2000. Additional notes:
  • This behavior is flaky: using the same client, two fetches for an identical query may fail after different numbers of pages.
  • These pages don't fail persistently; retrying the failed request often yields the expected page of results.
  • This doesn't seem to be related to request rate limiting.
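For context, a paginated fetch steps the `start` parameter by the page size (`max_results`), using the same query parameters that appear in the feed's `rel="self"` link in the first message. A minimal sketch, assuming the `http://arxiv.org/api/query` endpoint from that link and the 2000-result cap the User Manual appears to state; `page_urls` is a hypothetical helper and `all:electron` is just an example query.

```python
from urllib.parse import urlencode

# Endpoint taken from the rel="self" link in the feed above (assumption).
BASE_URL = "http://arxiv.org/api/query"
MAX_PAGE_SIZE = 2000  # maximum page size the User Manual appears to allow


def page_urls(search_query: str, total_results: int, page_size: int = 1000) -> list[str]:
    """Build one query URL per page, stepping `start` by `page_size`."""
    if not 0 < page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"page_size must be in 1..{MAX_PAGE_SIZE}")
    urls = []
    for start in range(0, total_results, page_size):
        params = {"search_query": search_query, "start": start, "max_results": page_size}
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


for url in page_urls("all:electron", total_results=2500):
    print(url)
```

For 2500 results and a page size of 1000 this yields three requests with `start` values 0, 1000 and 2000; the flaky behaviour described above means any one of them may come back empty.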
Some users of my client library opened a GitHub issue with extended analysis and an explanation of how to reproduce the problem: https://github.com/lukasschwab/arxiv.py/issues/43#issuecomment-812318598
I'm adding request retry logic for now.
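Retry logic of this kind can be sketched as: re-request a page whenever it comes back empty while `start` is still below `totalResults`. This is not the arxiv.py implementation; `fetch_page` is a hypothetical callable returning `(totalResults, entries)` for a given start index, and the retry count and 3-second delay are assumed defaults.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical page fetcher: given a start index, returns (totalResults, entries).
FetchPage = Callable[[int], Tuple[int, List[str]]]


def fetch_page_with_retry(fetch_page: FetchPage, start: int,
                          retries: int = 3, delay: float = 3.0) -> Tuple[int, List[str]]:
    """Re-request a page while it is empty even though start < totalResults."""
    for attempt in range(retries + 1):
        total, entries = fetch_page(start)
        if entries or start >= total:
            return total, entries  # a real page, or a genuinely exhausted result set
        if attempt < retries:
            time.sleep(delay)  # wait before retrying the identical request
    raise RuntimeError(f"page at start={start} still empty after {retries} retries")


def fetch_all(fetch_page: FetchPage, page_size: int, **retry_kwargs) -> List[str]:
    """Walk every page, stepping start by page_size, using the retrying fetch."""
    results: List[str] = []
    start = 0
    while True:
        total, entries = fetch_page_with_retry(fetch_page, start, **retry_kwargs)
        results.extend(entries)
        start += page_size
        if start >= total:
            return results
```

Because failed pages usually succeed on a retry, a handful of attempts should be enough. Note that this check relies on `totalResults` being accurate: with the empty-results bug from the first message (`totalResults` of 1 for no results), a truly empty query would exhaust the retries and raise.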

I realize the main focus may be on replacing the Atom API, though 😁