Re: Details on oaDOI


Rajath

May 12, 2020, 2:11:59 PM
to Richard Orr, Unpaywall discussion, te...@ourresearch.org
Hi Richard and Unpaywall Team,

Hope this email finds you well.

First of all, thanks a lot for the details in your earlier email and for all the help and support you have been providing.

I was going through the implementation details of Unpaywall and have the following questions before I integrate the API:

1. Can you confirm that the request limit for the free version of the Unpaywall API is 100,000 per day?

2. How often are the snapshots updated? I assume it is once every 2 months.

3. Finally, what is the cost of the enterprise version of the API?
The article here mentions $1000/year; I assume that is for Unpaywall Journals. Can you let me know whether there is also a subscription version of the Unpaywall API or data feed?


Thanks a lot in advance!

Regards,
Rajath


On Tue, May 12, 2020 at 11:15 PM Rajath <rajath...@gmail.com> wrote:
Wow, thank you so much.

Frankly, this explanation is the best gift I have received. It explains exactly what OAI-PMH is all about and how it works.

Thanks much, Richard!!!

Thanks & Regards,
Rajath C S


On Tue, May 12, 2020 at 11:01 PM Richard Orr <sup...@unpaywall.org> wrote:
Hi Rajath,

Not a naïve question at all. I'll see if I can help.

The basic pattern in OAI-PMH harvesting is to call ListRecords with some set of filters, then use the resumption token in each response to page through all the records that match your filters. For example, if you call


you'll get records modified between 2020-01-01 and 2020-01-10. They don't fit on one page, so the response includes a resumption token:

<resumptionToken completeListSize="6947" cursor="0">
2020-01-07T03:05:30Z!2020-01-10!!oai_dc!1172!6947!oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:36938
</resumptionToken>

Calling http://union.ndltd.org/OAI-PMH/?verb=ListRecords&resumptionToken={TOKEN} repeatedly, until a response no longer includes a token, will get all the results. Sickle does this automatically when you run:

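from sickle import Sickle
sickle = Sickle('http://union.ndltd.org/OAI-PMH/')
# 'from' is a Python keyword, so the OAI-PMH arguments are passed via **{...}
records = sickle.ListRecords(**{'metadataPrefix': 'oai_dc', 'from': '2020-01-01', 'until': '2020-01-10'})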


You can then iterate over the records with a for loop or by calling records.next(). Sickle parses each XML result into a header object and a metadata dictionary. You can see how we extract the record data here: https://github.com/ourresearch/oadoi/blob/master/pmh_record.py#L166 (the pmh_input_record argument is an element of a Sickle record set). If you're making the requests yourself without Sickle, you'll need to use an XML library to do this.
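For anyone making the requests without Sickle, here is a minimal sketch using only the Python standard library: it follows resumption tokens until the list is complete and splits each record into a header dict and a metadata dict of lists, roughly mirroring what Sickle gives you (Sickle's actual implementation differs). The helper names (list_records_url, next_token, parse_record, harvest) are hypothetical; verb, metadataPrefix, from, until and resumptionToken are the standard OAI-PMH arguments. Retries, rate limiting and OAI-PMH <error> responses such as noRecordsMatch are not handled.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, token=None, **params):
    """Build the first filtered ListRecords URL, or a resumption URL."""
    if token is not None:
        # resumptionToken is an exclusive argument: it is sent on its own.
        query = {"verb": "ListRecords", "resumptionToken": token}
    else:
        query = {"verb": "ListRecords", **params}
    return base_url + "?" + urllib.parse.urlencode(query)


def next_token(root):
    """Return the resumption token, or None once the list is complete."""
    elem = root.find(".//" + OAI + "resumptionToken")
    # The last page has an empty (or no) resumptionToken element.
    if elem is None or not (elem.text or "").strip():
        return None
    return elem.text.strip()


def parse_record(record):
    """Split one <record> into a header dict and a metadata dict of lists."""
    header_el = record.find(OAI + "header")
    header = {
        "identifier": header_el.findtext(OAI + "identifier"),
        "datestamp": header_el.findtext(OAI + "datestamp"),
        # Deleted records carry status="deleted" and have no <metadata>.
        "deleted": header_el.get("status") == "deleted",
    }
    metadata = {}
    md_el = record.find(OAI + "metadata")
    if md_el is not None and len(md_el):
        # md_el[0] is the format wrapper, e.g. <oai_dc:dc>; fields may repeat.
        for field in md_el[0]:
            name = field.tag.rsplit("}", 1)[-1]  # drop the namespace
            text = (field.text or "").strip()
            if text:
                metadata.setdefault(name, []).append(text)
    return header, metadata


def harvest(base_url, **params):
    """Yield (header, metadata) pairs, following resumption tokens to the end."""
    token = None
    while True:
        url = list_records_url(base_url, token, **params)
        with urllib.request.urlopen(url) as resp:
            root = ET.fromstring(resp.read())
        for record in root.iter(OAI + "record"):
            yield parse_record(record)
        token = next_token(root)
        if token is None:
            break
```

Usage would look like `for header, metadata in harvest("http://union.ndltd.org/OAI-PMH/", **{"metadataPrefix": "oai_dc", "from": "2020-01-01", "until": "2020-01-10"}): ...`. Note that the token is sent alone on follow-up requests, because OAI-PMH does not allow it to be combined with the original filters.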

From there, it all depends on what you need to do with the data. If you want to find repository landing pages, they are most often linked in <dc:identifier> elements. Scrolling through a few pages of records should give you an idea of what else you can find.
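As a starting point for finding landing pages, here is a small heuristic sketch that keeps only the dc:identifier values that look like web URLs, since that field often mixes URLs with DOIs, URNs, handles or local IDs. landing_page_candidates is a hypothetical helper. With Sickle's default namespace stripping, the input should be available as record.metadata.get('identifier', []). Any http(s) URL passes, so a direct PDF link would also be returned and may need a further check.

```python
from urllib.parse import urlparse


def landing_page_candidates(identifiers):
    """Keep the dc:identifier values that look like http(s) URLs.

    Order is preserved and duplicates (after trimming whitespace) are dropped.
    """
    urls = []
    for value in identifiers:
        value = value.strip()
        parsed = urlparse(value)
        # URNs, bare DOIs and local IDs have no http(s) scheme or host.
        if parsed.scheme in ("http", "https") and parsed.netloc and value not in urls:
            urls.append(value)
    return urls
```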

Best,

Richard Orr
Lead Developer, Unpaywall
OurResearch: We build tools to make scholarly research more open, connected, 
and reusable—for everyone.

On Mon, 11 May at 1:33 AM, Rajath <rajath...@gmail.com> wrote:
Hi Richard,
 
Hope this mail finds you well.
 
I have seen your code that uses the Sickle Python library to work with OAI-PMH.
 
I am new to OAI-PMH and am trying to figure out how to download content from an OAI-PMH endpoint. For example, given the NDLTD OAI-PMH endpoint (http://union.ndltd.org/OAI-PMH/), how can I query it to download and use the complete contents of NDLTD?
 
I know the above is a very naïve question, but any pointers to understanding how OAI-PMH works would be of great help.
 
Thanks & Regards,
Rajath


On Tue, Apr 28, 2020 at 9:28 PM Rajath <rajath...@gmail.com> wrote:
Thanks a lot for this detailed information and explanation, Richard. I will certainly explore this.

This is very helpful. Thanks again for the great work you are all doing at Unpaywall, and I hope you are all keeping well amid the COVID-19 situation.


Regards,
Rajath C S


On Tue, Apr 28, 2020 at 9:17 PM Richard Orr <sup...@unpaywall.org> wrote:
Hi Rajath,

Thanks for getting in touch! Yes, it covers thousands of institutional repositories; a full list is available here: https://unpaywall.org/sources. Some articles have links to PDFs, while others link only to a landing page. A quick count shows that of the 21 million repository locations we know about, 14 million have PDF URLs. You can tell which is which by whether url_for_pdf is set in the oa_locations field of the data dump:

  "oa_locations": [
    {
      "endpoint_id": "ca8f8d56758a80a4f86",
      "evidence": "oa repository (via OAI-PMH doi match)",
      "host_type": "repository",
      "is_best": true,
      "license": null,
      "pmh_id": "oai:arXiv.org:2003.00121",
      "repository_institution": "Cornell University - arXiv",
      "updated": "2020-04-26T11:38:48.101905",
      "url": "http://arxiv.org/pdf/2003.00121",
      "url_for_landing_page": "http://arxiv.org/abs/2003.00121",
      "url_for_pdf": "http://arxiv.org/pdf/2003.00121",
      "version": "submittedVersion"
    }
  ],
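A minimal sketch of that check over a whole snapshot, assuming the dump is a gzipped JSON Lines file with one Unpaywall record per line (the function names and any path you pass in are hypothetical). It counts only locations whose host_type is "repository", the same split as the 21 million / 14 million figures above; it will only match those numbers if run on the same data.

```python
import gzip
import json


def read_snapshot(path):
    """Stream records from a gzipped JSON Lines snapshot (assumed format)."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def repository_location_counts(records):
    """Return (with_pdf, landing_only) counts of repository oa_locations."""
    with_pdf = landing_only = 0
    for record in records:
        # oa_locations can be empty (or null) for articles with no OA copy.
        for loc in record.get("oa_locations") or []:
            if loc.get("host_type") != "repository":
                continue
            if loc.get("url_for_pdf"):
                with_pdf += 1
            else:
                landing_only += 1
    return with_pdf, landing_only
```

Usage: `repository_location_counts(read_snapshot("unpaywall_snapshot.jsonl.gz"))`, where the filename is a placeholder for whichever snapshot file you downloaded.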

You can get statistics about repository coverage from the locations as well: endpoint_id gives each repository a unique identifier and repository_institution gives it a human-readable name, so you can aggregate whatever metrics you'd like on either. We posted a fresh public snapshot yesterday, so if you grab that you'll be working with the latest data.
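For example, a per-repository tally could be sketched as below, keyed on endpoint_id with repository_institution kept as the display name. per_repository_stats is a hypothetical helper, and the two metrics (total repository locations and those with a PDF URL) are just one illustrative choice; it accepts any iterable of parsed Unpaywall records.

```python
from collections import Counter


def per_repository_stats(records):
    """Tally repository locations per endpoint_id.

    Returns {endpoint_id: {"name": ..., "locations": n, "with_pdf": m}}.
    """
    names = {}
    totals = Counter()
    pdfs = Counter()
    for record in records:
        for loc in record.get("oa_locations") or []:
            endpoint = loc.get("endpoint_id")
            if loc.get("host_type") != "repository" or not endpoint:
                continue
            # endpoint_id is the unique key; repository_institution is the label.
            names.setdefault(endpoint, loc.get("repository_institution"))
            totals[endpoint] += 1
            if loc.get("url_for_pdf"):
                pdfs[endpoint] += 1
    return {
        ep: {"name": names[ep], "locations": totals[ep], "with_pdf": pdfs[ep]}
        for ep in totals
    }
```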

Best,

Richard Orr
Lead Developer, Unpaywall
OurResearch: We build tools to make scholarly research more open, connected, 
and reusable—for everyone.

On Mon, 27 Apr at 9:38 AM, Rajath <rajath...@gmail.com> wrote:
Hi Jason,

Hope this mail finds you well.

While studying the implementation of oaDOI using its data dump, I had a quick question and thought I would reach out.

Does oaDOI cover open access articles in institutional repositories (IRs)? In other words, does oaDOI provide one-click PDF access for OA articles that reside in IRs?

If so, is it possible to provide statistics on that, such as which IRs are covered, along with other metrics?

Thanks & Regards,
Rajath

Richard Orr

May 13, 2020, 1:51:36 PM
to Unpaywall discussion
Hey Rajath,

I just answered via sup...@unpaywall.org, but to briefly repeat here: the API limit is 100k requests/day, snapshots are updated at least every 6 months, and there is a subscription data feed for Unpaywall.
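One conservative way to stay under a 100k/day limit is to space requests evenly: 86,400 seconds / 100,000 requests ≈ 0.864 s between calls. The sketch below assumes the public v2 lookup endpoint (https://api.unpaywall.org/v2/{doi}?email=...) as described in the Unpaywall API documentation; api_url, min_interval and Pacer are hypothetical helpers, and email should be your own contact address. A simple daily counter would work just as well if your traffic is bursty.

```python
import time
import urllib.parse

DAILY_LIMIT = 100_000  # free API limit quoted in this thread
SECONDS_PER_DAY = 24 * 60 * 60


def api_url(doi, email):
    """Build a v2 lookup URL (endpoint format assumed from the public docs)."""
    return ("https://api.unpaywall.org/v2/" + urllib.parse.quote(doi, safe="/")
            + "?" + urllib.parse.urlencode({"email": email}))


def min_interval(daily_limit=DAILY_LIMIT):
    """Seconds between requests that spreads the quota evenly over a day."""
    return SECONDS_PER_DAY / daily_limit


class Pacer:
    """Sleep as needed so successive wait() calls are >= interval apart."""

    def __init__(self, interval):
        self.interval = interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            delay = self._last + self.interval - now
            if delay > 0:
                time.sleep(delay)
                now += delay
        self._last = now
```

Usage: create `pacer = Pacer(min_interval())` once, then call `pacer.wait()` before each request to `api_url(doi, "you@example.com")`.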

Thanks!

Rajath

May 17, 2020, 3:25:27 AM
to Richard Orr, Unpaywall discussion
Hi Richard,

Does unpaywall also cover open access theses and dissertations?

Regards,
Rajath

--
You received this message because you are subscribed to the Google Groups "Unpaywall discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unpaywall+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unpaywall/af36b288-6b2c-4839-a0f9-4e3fb22b262c%40googlegroups.com.