Hi Unpaywall Team
First, I want to say this is a great product, with an impressive amount of data collated in one place!
I have a question about how best to query the Unpaywall dataset to return all papers that reside in a set of journals/pre-print repositories.
Locations I am interested in at the moment are biorxiv, medrxiv and chemrxiv, and I may want to extend this list in future.
One query I have been using is something like this:
"oa_locations.repository_institution:(*biorxiv* OR *medrxiv* OR *chemrxiv*)"
Another query I have been using is like so:
"query": "(oa_locations.repository_institution:(*biorxiv* OR *medrxiv* OR *chemrxiv*) OR oa_locations.url_for_pdf:(*biorxiv* OR *medrxiv* or *chemrxiv*) OR oa_locations.url_for_landing_page:(*biorxiv* OR *medrxiv* or *chemrxiv*))"
This second query returns three times as many papers as the first query.
I am hoping someone can explain the difference between these queries and why the number of results differs so much, and help me understand which query is best / recommended for getting all papers from particular journals/repositories.
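Since I may want to extend the list of repositories later, the second query could be generated from a list of names instead of being written by hand. Below is a minimal sketch in Python. The names `FIELDS` and `build_query` are hypothetical and only for illustration, and the three field names are simply the ones used in my queries above. It only builds the query string and does not call any Unpaywall endpoint.

```python
# Hypothetical helper, for illustration only: builds the query string
# from a list of repository names so the list is easy to extend.
FIELDS = [
    "oa_locations.repository_institution",
    "oa_locations.url_for_pdf",
    "oa_locations.url_for_landing_page",
]


def build_query(names, fields=FIELDS):
    # Wrap each name in wildcards and join the terms with an uppercase OR.
    terms = " OR ".join(f"*{name}*" for name in names)
    # Apply the same group of terms to every field, again joined with OR.
    clauses = [f"{field}:({terms})" for field in fields]
    return "(" + " OR ".join(clauses) + ")"


query = build_query(["biorxiv", "medrxiv", "chemrxiv"])
print(query)
```

Passing `fields=FIELDS[:1]` gives the equivalent of my first query (wrapped in one extra pair of parentheses), so both variants can be compared using the same list of names.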
Thanks in advance
Tom