Download specific category articles

raGhav Sri Harsha

Mar 2, 2017, 12:07:32 AM
to arXiv api
Hi All

For my research project I need all of the physics and computer science articles. How can I download them by category?

I am able to download in bulk from S3, but the bulk files are not category-specific.

Thanks
Sri Harsha

Thorsten S

Mar 2, 2017, 12:17:28 AM
to arXiv api

You don't specify whether you need metadata, source, or PDF, but I assume you mean PDF.

Bulk access to PDFs is chunked by date, so you will have to filter for the desired categories on your end after downloading the bulk files via S3.
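
A minimal sketch of that filtering step in Python, assuming you have already built a lookup from identifier to category list out of the metadata. The names `in_wanted` and `select_ids` and the sample data are hypothetical, made up for illustration. A paper is kept if any of its categories, primary or cross-listed, falls in a wanted archive.

```python
from typing import Dict, Iterable, List


def in_wanted(category: str, wanted: Iterable[str]) -> bool:
    """True if category is one of the wanted archives or a subcategory of one."""
    return any(category == w or category.startswith(w + ".") for w in wanted)


def select_ids(categories_by_id: Dict[str, List[str]], wanted: Iterable[str]) -> List[str]:
    """Return identifiers with at least one category (primary or cross-list) in wanted."""
    wanted = list(wanted)
    return sorted(
        arxiv_id
        for arxiv_id, cats in categories_by_id.items()
        if any(in_wanted(c, wanted) for c in cats)
    )


# Hypothetical sample lookup; in practice this would be built from arXiv metadata.
sample = {
    "1703.00001": ["cs.LG", "stat.ML"],
    "1703.00002": ["math.AG"],
    "1703.00003": ["hep-th", "math-ph"],
    "1703.00004": ["physics.optics"],
}
print(select_ids(sample, ["cs", "physics", "hep-th"]))
# → ['1703.00001', '1703.00003', '1703.00004']
```

Note that physics on arXiv is spread over several archives (`astro-ph`, `cond-mat`, `hep-th`, `physics`, and others), so the wanted list needs to name each one you care about.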

You can determine the categories for any given identifier either from a local copy of the metadata obtained via OAI-PMH or through an individual API query for that identifier.
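
For the per-identifier API route, something like the following might work. It is a sketch only: it assumes the query endpoint `http://export.arxiv.org/api/query` with the `id_list` parameter, and that each Atom entry lists its categories as `<category term="...">` elements. Check the current API documentation before relying on either. `parse_categories` and `fetch_categories` are hypothetical helper names.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

ATOM = "{http://www.w3.org/2005/Atom}"
API_URL = "http://export.arxiv.org/api/query"  # assumed endpoint, verify before use


def parse_categories(atom_xml) -> List[str]:
    """Collect the category terms of the first entry in an Atom feed (str or bytes)."""
    root = ET.fromstring(atom_xml)
    entry = root.find(ATOM + "entry")
    if entry is None:
        return []
    return [c.attrib["term"] for c in entry.findall(ATOM + "category")]


def fetch_categories(arxiv_id: str) -> List[str]:
    """Query the API for a single identifier (needs network access)."""
    url = API_URL + "?" + urllib.parse.urlencode({"id_list": arxiv_id})
    with urllib.request.urlopen(url, timeout=30) as resp:
        return parse_categories(resp.read())
```

One request per identifier is slow for a whole archive, so this is probably best kept for spot checks, with a local metadata copy used for bulk lookups.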

Cheers
T.

--
You received this message because you are subscribed to the Google Groups "arXiv api" group.
To unsubscribe from this group and stop receiving emails from it, send an email to arxiv-api+unsubscribe@googlegroups.com.
To post to this group, send email to arxi...@googlegroups.com.
Visit this group at https://groups.google.com/group/arxiv-api.
For more options, visit https://groups.google.com/d/optout.

raGhav Sri Harsha

Mar 2, 2017, 12:18:25 PM
to arXiv api
Hi Thorsten

Thank you for the clarification. As you assumed, I need PDFs only. I am not clear on the metadata part, though. I downloaded the metadata from the S3 bucket, but I don't think it is complete.
Where can I get the full XML file?

Thanks
Sri Harsha



Thorsten S

Mar 2, 2017, 12:32:24 PM
to arXiv api

Dear Sri Harsha,

Please see https://arxiv.org/help/bulk_data, specifically the OAI-PMH section.
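
As a rough illustration of what harvesting categories over OAI-PMH could look like, here is a hedged Python sketch. It assumes the `arXiv` metadata prefix with `<id>` and space-separated `<categories>` elements in the `http://arxiv.org/OAI/arXiv/` namespace. The base URL and set name are left as parameters on purpose and should be taken from the help page above. `parse_list_records` and `harvest` are hypothetical names.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"
ARXIV = "{http://arxiv.org/OAI/arXiv/}"  # assumed namespace of the arXiv format


def parse_list_records(xml_text) -> Tuple[Dict[str, List[str]], Optional[str]]:
    """Parse one ListRecords page into ({id: [categories]}, resumption token)."""
    root = ET.fromstring(xml_text)
    categories_by_id: Dict[str, List[str]] = {}
    for meta in root.iter(ARXIV + "arXiv"):
        arxiv_id = meta.findtext(ARXIV + "id")
        cats = meta.findtext(ARXIV + "categories") or ""
        if arxiv_id:
            categories_by_id[arxiv_id] = cats.split()
    token = root.findtext(f".//{OAI}resumptionToken")
    return categories_by_id, (token or None)


def harvest(base_url: str, set_spec: str, pause: float = 5.0) -> Dict[str, List[str]]:
    """Follow resumption tokens until the set is exhausted (needs network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": "arXiv", "set": set_spec}
    result: Dict[str, List[str]] = {}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        with urllib.request.urlopen(url, timeout=60) as resp:
            page, token = parse_list_records(resp.read())
        result.update(page)
        if token is None:
            return result
        # Per the OAI-PMH protocol, a resumption token replaces the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": token}
        time.sleep(pause)  # be polite between requests
```

The resulting dictionary maps each identifier to all of its categories, which is the lookup needed to pick PDFs out of the date-chunked bulk files.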

The manifest file for the PDF download does not itemize categories, and for new-style arXiv identifiers the primary category cannot be derived from the identifier or file name of the PDF. For that reason, and also to include cross-listed papers, you will have to determine the categories of those entries from arXiv metadata or via API queries. You can find discussions of that in previous postings here.
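
To illustrate the identifier point, here is a small hedged sketch. It assumes old-style identifiers look like `hep-th/9901001` or `math.GT/0309136` and new-style ones like `1703.00123`. The function name `archive_from_identifier` is hypothetical. Even for old-style identifiers it recovers only the archive, never the cross-lists, so the metadata is still needed.

```python
import re
from typing import Optional

# Assumed patterns for the two identifier styles, with an optional version suffix.
OLD_STYLE = re.compile(r"^(?P<archive>[a-z-]+)(?:\.[A-Z]{2})?/\d{7}(?:v\d+)?$")
NEW_STYLE = re.compile(r"^\d{4}\.\d{4,5}(?:v\d+)?$")


def archive_from_identifier(arxiv_id: str) -> Optional[str]:
    """Return the archive for an old-style identifier, None for a new-style one."""
    match = OLD_STYLE.match(arxiv_id)
    if match:
        return match.group("archive")
    if NEW_STYLE.match(arxiv_id):
        return None  # new-style identifiers carry no category information
    raise ValueError(f"unrecognized arXiv identifier: {arxiv_id!r}")


print(archive_from_identifier("hep-th/9901001"))  # → hep-th
print(archive_from_identifier("1703.00123"))      # → None
```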

Cheers
T.
