Wow, thank you so much. Frankly, I can say this explanation is the best gift I have received. It explains in detail what OAI-PMH is all about and how it works.
Thanks much, Richard!!!
Thanks & Regards,
Rajath C S

On Tue, May 12, 2020 at 11:01 PM Richard Orr <sup...@unpaywall.org> wrote:
Hi Rajath,
Not a naïve question at all. I'll see if I can help.
The basic pattern in OAI-PMH harvesting is calling ListRecords with some set of filters, then using the provided token to get all the records that match your filters. For example, if you call
http://union.ndltd.org/OAI-PMH/?verb=ListRecords&metadataPrefix=oai_dc&from=2020-01-01&until=2020-01-10
you'll get records modified between 2020-01-01 and 2020-01-10. They don't fit into one page, so the response includes the token
<resumptionToken completeListSize="6947" cursor="0">
2020-01-07T03:05:30Z!2020-01-10!!oai_dc!1172!6947!oai:union.ndltd.org:DRESDEN/oai:qucosa:de:qucosa:36938
</resumptionToken>
Calling http://union.ndltd.org/OAI-PMH/?verb=ListRecords&resumptionToken={TOKEN} repeatedly, each time with the newest token, until a response no longer contains one will get all the results. Sickle does this automatically when you do
from sickle import Sickle
sickle = Sickle('http://union.ndltd.org/OAI-PMH/')
# 'from' is a reserved word in Python, so the OAI-PMH arguments are passed via ** dict unpacking.
records = sickle.ListRecords(**{'metadataPrefix': 'oai_dc', 'from': '2020-01-01', 'until': '2020-01-10'})
You can then iterate over the records by calling records.next() (or with a plain for loop). Sickle parses each XML result into a header object and a metadata dictionary. You can see how we extract the record data here: https://github.com/ourresearch/oadoi/blob/master/pmh_record.py#L166 (the pmh_input_record argument is an element of a Sickle record set). If you're making the requests yourself without Sickle, you'll need to use an XML library to do this.
From there, it all depends on what you need to do with the data. If you want to find repository landing pages, they are most often linked in <dc:identifier> elements. Scrolling through a few pages of records should give you an idea of what else you can find.
Best,
Richard Orr
Lead Developer, Unpaywall
OurResearch: We build tools to make scholarly research more open, connected, and reusable—for everyone.

On Mon, 11 May at 1:33 AM, Rajath <rajath...@gmail.com> wrote:
Hi Richard,
Hope this mail finds you well.
I have seen your code using the Sickle Python library to work with OAI-PMH. I am new to OAI-PMH and I was figuring out how to download content from an OAI-PMH endpoint. For example, I have the NDLTD OAI-PMH endpoint (http://union.ndltd.org/OAI-PMH/): how can I query it to download/use the complete contents of NDLTD?
I know the above is a very naïve question, but any pointer to understanding the mechanism of OAI-PMH will be of great help.
Thanks & Regards,
Rajath
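Without Sickle, the resumption-token loop Richard describes can be written with Python's standard library and an XML parser. This is a minimal sketch, not Unpaywall's code: the helper names fetch_url, list_records and identifiers are hypothetical, it assumes the endpoint serves OAI-PMH 2.0 XML with oai_dc metadata, and it omits the retries, rate limiting and OAI-PMH error handling (e.g. a noRecordsMatch error response) that a production harvester would need.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, List

# XML namespaces from the OAI-PMH 2.0 and Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def fetch_url(url: str) -> bytes:
    """Download one OAI-PMH response page (no retries or rate limiting)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def list_records(
    base_url: str,
    params: dict,
    fetch: Callable[[str], bytes] = fetch_url,
) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumption tokens page by page."""
    query = {"verb": "ListRecords", **params}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(query)))
        yield from root.iterfind("oai:ListRecords/oai:record", NS)
        token = root.find("oai:ListRecords/oai:resumptionToken", NS)
        # An absent or empty token marks the last page of the result set.
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is an exclusive argument: send only the verb and the token.
        query = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def identifiers(record: ET.Element) -> List[str]:
    """Return the <dc:identifier> values of an oai_dc record (often landing-page URLs)."""
    return [el.text for el in record.iterfind(".//dc:identifier", NS) if el.text]
```

For example, list_records("http://union.ndltd.org/OAI-PMH/", {"metadataPrefix": "oai_dc", "from": "2020-01-01", "until": "2020-01-10"}) should page through the same result set as the Sickle call above, and identifiers(record) pulls out the <dc:identifier> values where landing pages are usually linked. Passing fetch as a parameter keeps the paging logic testable without network access.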
On Tue, Apr 28, 2020 at 9:28 PM Rajath <rajath...@gmail.com> wrote:
Thanks a lot for this detailed information and explanation, Richard. I will certainly explore this. This is very helpful. Thanks much again for the great work you all are doing at Unpaywall, and I hope you all are keeping well amid the COVID-19 situation.
Regards,
Rajath C S
On Tue, Apr 28, 2020 at 9:17 PM Richard Orr <sup...@unpaywall.org> wrote:
Hi Rajath,
Thanks for getting in touch! Yes, it covers thousands of institutional repositories. A full list is available here: https://unpaywall.org/sources. Some articles have links to PDFs; others only link to a landing page. A quick count shows that of the 21 million repository locations we know about, 14 million have PDF URLs. You can tell which is which by whether url_for_pdf is set in the oa_locations field in the data dump:
"oa_locations": [
  {
    "endpoint_id": "ca8f8d56758a80a4f86",
    "evidence": "oa repository (via OAI-PMH doi match)",
    "host_type": "repository",
    "is_best": true,
    "license": null,
    "pmh_id": "oai:arXiv.org:2003.00121",
    "repository_institution": "Cornell University - arXiv",
    "updated": "2020-04-26T11:38:48.101905",
    "url": "http://arxiv.org/pdf/2003.00121",
    "url_for_landing_page": "http://arxiv.org/abs/2003.00121",
    "url_for_pdf": "http://arxiv.org/pdf/2003.00121",
    "version": "submittedVersion"
  }
],
You can get statistics about repository coverage from the locations as well. endpoint_id gives a unique repository identifier and repository_institution a human-readable repository name; you can aggregate whatever metrics you'd like on either. We posted a fresh public snapshot yesterday, so if you grab that you'll be working with the latest data.
Best,
Richard Orr
Lead Developer, Unpaywall

On Mon, 27 Apr at 9:38 AM, Rajath <rajath...@gmail.com> wrote:
Hi Jason,
Hope this mail finds you well.
While studying the implementation of oaDOI using its data dump, I had a quick question and thought I would reach out:
Does oaDOI cover open access articles at institutional repositories (IRs)? In other words, does oaDOI provide one-click PDF access for those OA articles that reside in IRs?
If yes, is it possible to provide statistics around that, such as which IRs are covered, and other metrics?
Thanks & Regards,
Rajath
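The statistics Richard mentions in his reply above (PDF vs. landing-page-only locations, aggregated per repository via endpoint_id and repository_institution) could be computed along these lines. This is a sketch under stated assumptions: it assumes the snapshot is read as JSON Lines (one record per line) with oa_locations entries shaped like the excerpt in this thread, the function name repository_pdf_counts is hypothetical, and counting only host_type == "repository" locations is a choice made here for illustration.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def repository_pdf_counts(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count repository OA locations per repository, split by whether url_for_pdf is set.

    Assumes each non-empty input line is one JSON record whose "oa_locations"
    list is shaped like the excerpt in this thread.
    """
    with_pdf, landing_only = Counter(), Counter()
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        # oa_locations may be empty or null for records with no OA copy.
        for loc in record.get("oa_locations") or []:
            if loc.get("host_type") != "repository":
                continue
            # endpoint_id is the unique key; repository_institution is the readable label.
            key = (loc.get("endpoint_id"), loc.get("repository_institution"))
            if loc.get("url_for_pdf"):
                with_pdf[key] += 1
            else:
                landing_only[key] += 1
    return with_pdf, landing_only
```

For a gzipped snapshot you could pass the open file directly, e.g. with gzip.open(path, "rt", encoding="utf-8") as f: with_pdf, landing_only = repository_pdf_counts(f), where path is whatever snapshot file you downloaded. Keying on the (endpoint_id, repository_institution) pair keeps counts unique per endpoint while still printing a readable name.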
--
You received this message because you are subscribed to the Google Groups "Unpaywall discussion" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/unpaywall/af36b288-6b2c-4839-a0f9-4e3fb22b262c%40googlegroups.com.