Hi all!
I have been trying to use the arXiv API to identify papers for a systematic review. This involves a fairly complex query with more than 50 search terms.
When I try to retrieve the arXiv identifiers of the papers returned by the search, I find that many are missing: only around 130 are returned.
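For reference, here is a rough sketch of how one page of results can be fetched and its identifiers extracted. It is a minimal standard-library example, not the original code. It assumes the query endpoint `http://export.arxiv.org/api/query` and that each Atom `<entry>` carries its paper URL (e.g. `http://arxiv.org/abs/2101.00001v2`) in its `<id>` element. The function names `parse_arxiv_ids` and `fetch_arxiv_ids` are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint and the Atom namespace used by the API's responses.
API_URL = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_arxiv_ids(atom_xml: str) -> list[str]:
    """Extract arXiv identifiers from one Atom page returned by the API."""
    root = ET.fromstring(atom_xml)
    ids = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        # Paper IDs look like "http://arxiv.org/abs/2101.00001v2";
        # any entry without "/abs/" in its <id> is skipped.
        if "/abs/" in entry_id:
            ids.append(entry_id.split("/abs/", 1)[1])
    return ids


def fetch_arxiv_ids(query: str, start: int, max_results: int) -> list[str]:
    """Request one page of search results and return the identifiers on it."""
    params = urllib.parse.urlencode(
        {"search_query": query, "start": start, "max_results": max_results}
    )
    with urllib.request.urlopen(f"{API_URL}?{params}") as response:
        return parse_arxiv_ids(response.read().decode("utf-8"))
```

For example, `fetch_arxiv_ids("all:safe*", start=0, max_results=1000)` would request the first 1,000 results for a "safe*" search. The `all:` field prefix is illustrative.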
The same problem also occurs with simpler searches involving a single term. The screenshot below shows the Python code for one such search.
- A search for "safe*" returns ~15,000 results.
- The for loop at the bottom of the code snippet aims to collect the arXiv identifiers in batches of 1,000.
- This works until 10,000 identifiers have been collected. After that, several of the queries return no entries and the identifier count stops increasing.
- In total, only about 12,000 of the ~15,000 expected results are retrieved.
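The batching loop described in these steps can be sketched as follows, under a few assumptions. `fetch_batch(start, max_results)` is a hypothetical callable that makes one API request and returns the identifiers on that page. The sketch retries a batch that comes back empty, on the assumption that empty pages are transient rather than a hard limit. It waits between requests (the arXiv API user manual recommends a delay of about 3 seconds) and drops duplicate identifiers in case overlapping pages return the same paper twice. It is a sketch of the loop, not a confirmed fix.

```python
import time
from typing import Callable


def collect_ids(
    fetch_batch: Callable[[int, int], list[str]],
    total: int,
    batch_size: int = 1000,
    retries: int = 3,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Page through `total` results in batches, retrying empty pages.

    `fetch_batch(start, max_results)` is a hypothetical function that
    performs one API request and returns the identifiers on that page.
    """
    seen: set[str] = set()
    ordered: list[str] = []
    for start in range(0, total, batch_size):
        batch: list[str] = []
        # Try each page up to retries + 1 times if it comes back empty.
        for _ in range(retries + 1):
            batch = fetch_batch(start, batch_size)
            sleep(delay)  # pause between requests to stay polite
            if batch:
                break
        # Keep first occurrence order, skipping identifiers already seen.
        for arxiv_id in batch:
            if arxiv_id not in seen:
                seen.add(arxiv_id)
                ordered.append(arxiv_id)
    return ordered
```

Passing `sleep` as a parameter makes the loop testable without actually waiting. In real use the default `time.sleep` applies.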
If anyone can suggest what might be going wrong, or anything I can do to retrieve all of the desired arXiv identifiers, I would be very grateful!
All the best,
Jacob