Different results with Advanced Search and API

AKSHAY SUBRAMANIAN

Jun 14, 2020, 1:37:42 PM

to arXiv API

Hi,

I am trying to get COVID-19 related papers from arXiv.

I'm specifically interested in getting papers that are seen in https://arxiv.org/search/advanced?advanced=&terms-0-operator=AND&terms-0-term=COVID-19&terms-0-field=title&terms-1-operator=OR&terms-1-term=SARS-CoV-2&terms-1-field=abstract&terms-3-operator=OR&terms-3-term=COVID-19&terms-3-field=abstract&terms-4-operator=OR&terms-4-term=SARS-CoV-2&terms-4-field=title&terms-5-operator=OR&terms-5-term=coronavirus&terms-5-field=title&terms-6-operator=OR&terms-6-term=coronavirus&terms-6-field=abstract&classification-physics_archives=all&classification-include_cross_list=include&date-filter_by=all_dates&date-year=&date-from_date=&date-to_date=&date-date_type=submitted_date&abstracts=show&size=200&order=-announced_date_first&source=home-covid-19

The above advanced search displays a total of 1304 papers currently.

I'm trying to replicate the results using the arXiv API. I used the following query.

ti:COVID-19+OR+abs:SARS-CoV-2+OR+abs:COVID-19+OR+ti:SARS-CoV-2+OR+ti:coronavirus+OR+abs:coronavirus
Surprisingly, this query gives me only 473 results instead of 1304.
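For anyone who wants to reproduce the count, here is a minimal sketch of how the query above could be sent to the API and the total read back. It assumes the export.arxiv.org endpoint mentioned later in this thread and the `opensearch:totalResults` element of the Atom feed; the function names are my own, not part of any arXiv client.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "http://export.arxiv.org/api/query"
OPENSEARCH_NS = "{http://a9.com/-/spec/opensearch/1.1/}"


def build_query_url(search_query, start=0, max_results=10):
    """Build an API URL; urlencode turns spaces into '+' and ':' into '%3A'."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def total_results(feed_xml):
    """Read the total hit count from the Atom feed returned by the API."""
    root = ET.fromstring(feed_xml)
    return int(root.findtext(OPENSEARCH_NS + "totalResults"))


def fetch_total(search_query):
    """Ask for a single entry and return only the reported total (needs network)."""
    url = build_query_url(search_query, start=0, max_results=1)
    with urllib.request.urlopen(url) as response:
        return total_results(response.read())
```

Calling `fetch_total("ti:COVID-19 OR abs:SARS-CoV-2 OR abs:COVID-19 OR ti:SARS-CoV-2 OR ti:coronavirus OR abs:coronavirus")` should report the same total the API gives for the query above, which makes it easy to compare against the advanced search count.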

Is there something I am missing while trying to replicate the advanced search query?

Thanks

Eric Lease Morgan

Jun 15, 2020, 3:51:21 PM

to arxi...@googlegroups.com

At the risk of muddying the waters, you might consider taking advantage of a data set called CORD:

https://www.semanticscholar.org/cord19

The data set was created from many sources, and it includes about 100,000 records. It has at least two parts: 1) a zip file containing the full texts of articles, all on COVID-19, and 2) a metadata (CSV) file describing those full texts. One of the fields in the metadata file is arXiv_id, so you can get a list of the articles that come from arXiv.
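A minimal sketch of pulling the arXiv identifiers out of the metadata file, using only the standard library. The column name is taken from the description above and the file name `metadata.csv` is an assumption; check the header row of the file you actually download, since the spelling may differ.

```python
import csv


def arxiv_ids_from_metadata(rows, column="arXiv_id"):
    """Return the non-empty values of `column` from CSV rows.

    `rows` is any iterable of CSV lines, such as an open file.
    `column` is an assumed header name; adjust it to match the real file.
    """
    reader = csv.DictReader(rows)
    ids = []
    for record in reader:
        value = (record.get(column) or "").strip()
        if value:
            ids.append(value)
    return ids


# Hypothetical usage, assuming the metadata was saved as metadata.csv:
# with open("metadata.csv", newline="", encoding="utf-8") as handle:
#     ids = arxiv_ids_from_metadata(handle)
```

Records without an arXiv identifier are skipped, so the returned list covers only the articles that came from arXiv.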

HTH.

--
Eric Morgan
University of Notre Dame

r baum

Jun 19, 2020, 4:26:43 PM

to arXiv API

We ran into the same problem today. The same search query returned 487 records through the API instead of the 1,358 shown on the main page. We then searched the API for one of the missing titles from the first query (Exploratory Analysis of a Social Media Network in Sri Lanka during the COVID-19 Virus Outbreak) and found the record. My guess is that there is a limit on the number of results the API returns for abstract matches. We also tried changing the starting point to get the missing abstracts, but that did not work.
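For anyone repeating the starting-point test, this is a rough sketch of stepping through a result set with the API's `start` and `max_results` parameters. The page size of 500 and the helper name are my own choices. Note that paging only walks through what the query already matches, so it would not recover records the API fails to match in the first place.

```python
import urllib.parse

API_URL = "http://export.arxiv.org/api/query"


def page_urls(search_query, total, page_size=500):
    """Yield one API URL per page until `total` results are covered.

    `total` is the count you expect (for example the number reported
    by the advanced search); `page_size` is an arbitrary choice here.
    """
    for start in range(0, total, page_size):
        params = {
            "search_query": search_query,
            "start": start,
            "max_results": page_size,
        }
        yield API_URL + "?" + urllib.parse.urlencode(params)
```

With a total of 1,358 and a page size of 500 this yields three URLs, starting at offsets 0, 500 and 1000.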

RB

Burger

Jun 19, 2020, 5:47:02 PM

to arXiv API

I believe the issue is due to the '-' character in the search query. Running the simple query through the API, http://export.arxiv.org/api/query?search_query=ti:COVID-19, returns only 59 results, which is clearly wrong. There is probably a difference in how the advanced search and the API interpret this character. I would recommend modifying your search query to use a space instead of '-', or alternatively to use AND. Example: http://export.arxiv.org/api/query?search_query=ti:COVID+AND+ti:19. This should at least help you get more results until the issue is resolved.
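A minimal sketch of the AND workaround applied to a whole query, assuming the hyphen really is the cause (that is a guess, not something confirmed by arXiv). Each hyphenated term is split into AND-ed parts and wrapped in parentheses so it can be combined with OR. The helper names are hypothetical.

```python
import urllib.parse

API_URL = "http://export.arxiv.org/api/query"


def expand_term(field, term):
    """Turn e.g. ("ti", "COVID-19") into "(ti:COVID AND ti:19)"."""
    parts = [part for part in term.split("-") if part]
    clause = " AND ".join(f"{field}:{part}" for part in parts)
    # Parentheses are only needed when the term was actually split.
    return f"({clause})" if len(parts) > 1 else clause


def build_search_query(pairs):
    """OR together the expanded (field, term) pairs."""
    return " OR ".join(expand_term(field, term) for field, term in pairs)


def build_url(pairs):
    """Build the full API URL for the rewritten query."""
    query = build_search_query(pairs)
    return API_URL + "?" + urllib.parse.urlencode({"search_query": query})
```

For the original question, the pairs would be `("ti", "COVID-19")`, `("abs", "SARS-CoV-2")`, `("abs", "COVID-19")`, `("ti", "SARS-CoV-2")`, `("ti", "coronavirus")` and `("abs", "coronavirus")`. Whether the resulting count matches the advanced search would still need to be checked.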
