API returns subset of valid results for an author search?


Roger Rohrbach

Jul 6, 2019, 6:37:49 PM
to arXiv API
I am attempting to use the arXiv API for the first time, and am utterly bewildered by the meagre documentation and by the disparity between the results of an ordinary arXiv search and those obtained via the API.

To present a concrete example, the API User's Manual illustrates an example search by author: in order to find all articles by the author Adrian DelMaestro (sic), we are instructed to supply

  au:del_maestro

as the value for the search_query parameter.

No explanation is given as to:
  • why the name "Del Maestro" must be presented as del_maestro
  • whether it is possible to search for an author's full name, or only last name
  • what the format of such an author name query should be
The example search:


returns 10 results, all of which have Adrian Del Maestro listed as an author.

This search (which, along with the ones below, I guessed at):


returns the same results.

This search:


returns 6 of those results, and 4 different, valid results.

This search:


returns 6 of the first query's results, 3 of the second query's, and one new, valid result.

In contrast, a search for "Del Maestro, Adrian" on the arXiv Search page produces 39 results with Adrian Del Maestro listed as author.  The results returned from the API in the above examples are all subsets of this result set.

I've detailed this in a Google Sheet.

If anyone can enlighten me as to how I use the API to retrieve all the valid results returned from an arXiv author search, I would be most grateful.


Roger Rohrbach

Jul 12, 2019, 10:09:16 AM
to arXiv API
Having received no response here, I contacted arXiv and received a helpful reply from Jake Weiskoff, who explained that the default number of results per page is 10. My error; this is clearly documented.  I was confused because the various permutations of the author name produced different results; I'm told this is likely related to the age of the items in the search index.
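
For anyone who hits the same wall, here is a minimal sketch of a request that asks for more than the default page of 10. It assumes the query endpoint and the `search_query`, `start`, and `max_results` parameters described in the User's Manual; the helper name `build_author_query` and the page size of 50 are my own illustrative choices, and the sketch only builds the URL rather than fetching it.

```python
from urllib.parse import urlencode

# Query endpoint as given in the API User's Manual.
ARXIV_API = "http://export.arxiv.org/api/query"


def build_author_query(author, start=0, max_results=50):
    """Return a query URL for an author search (hypothetical helper).

    `author` is the already-formatted name, e.g. "del_maestro".
    `max_results` overrides the default page size of 10.
    """
    params = {
        "search_query": f"au:{author}",
        "start": start,
        "max_results": max_results,
    }
    # urlencode percent-encodes the colon in "au:..." as %3A.
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_author_query("del_maestro", max_results=50)
print(url)
# http://export.arxiv.org/api/query?search_query=au%3Adel_maestro&start=0&max_results=50
```

If a search has more matches than one page holds, the same helper can presumably be called again with a larger `start` to fetch the next page.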

Jake also explained:

Author names are generally constructed as LastName_First Initial ... (except in the case of multi-word surnames such as Del Maestro, which get treated as you indicate).

which I humbly submit ought to be made explicit in the User's Manual.
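
Until then, a rough sketch of that convention as I understand it from Jake's reply. The function name `author_term` is hypothetical, the lowercasing is an assumption based on the manual's `au:del_maestro` example, and leaving the initial off multi-word surnames is my reading of "treated as you indicate", not something the manual confirms.

```python
def author_term(last_name, first_name=None):
    """Build an au: search term from a name (hypothetical helper)."""
    # Multi-word surnames are joined with underscores: "Del Maestro" -> "del_maestro".
    words = last_name.lower().split()
    term = "_".join(words)
    # Assumption: the first initial is appended only for single-word surnames,
    # giving the LastName_FirstInitial form described in the reply.
    if first_name and len(words) == 1:
        term += "_" + first_name[0].lower()
    return f"au:{term}"


print(author_term("Del Maestro", "Adrian"))  # au:del_maestro
print(author_term("Smith", "Jane"))          # au:smith_j  (made-up name for illustration)
```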
