Bulk dataset download || Full text

Gourav Jha

Nov 22, 2022, 12:47:54 AM

to arXiv API

Hi,

Can anyone help me find the API for downloading the bulk dataset? Also, when I try to copy from S3 it says access denied. Is there a paid subscription that we need to take out? We already have a paid AWS account.

Many Thanks

Jim Entwood

Nov 22, 2022, 8:30:12 AM

to arXiv API

Hi,

Full text options are:

AWS https://arxiv.org/help/bulk_data_s3 using the requester-pays bucket
Kaggle https://www.kaggle.com/Cornell-University/arxiv
Crawling our export service https://arxiv.org/help/bulk_data#harvest
- This is recommended only for new content or a subset of the content. Otherwise the AWS or Kaggle datasets are preferred. Rapid harvesting will likely trigger a block.
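
For the AWS option, here is a minimal sketch of a requester-pays download using boto3. S3 generally rejects requests to a requester-pays bucket that do not set the requester-pays flag, which may be the cause of the "access denied" error above. The bucket name `arxiv` and the manifest key `src/arXiv_src_manifest.xml` are assumptions here and should be checked against the bulk_data_s3 help page; `local_target` and `download` are hypothetical helper names. Transfer charges are billed to your AWS account.

```python
from pathlib import Path

# Assumed values; verify against https://arxiv.org/help/bulk_data_s3 before use.
BUCKET = "arxiv"
MANIFEST_KEY = "src/arXiv_src_manifest.xml"

# Tells S3 that the caller agrees to pay the request and transfer charges.
EXTRA_ARGS = {"RequestPayer": "requester"}


def local_target(key: str, dest_dir: str = ".") -> Path:
    """Map an S3 key to a local file path inside dest_dir."""
    return Path(dest_dir) / Path(key).name


def download(key: str, dest_dir: str = ".") -> Path:
    """Download one object from the requester-pays bucket into dest_dir."""
    import boto3  # third-party; imported lazily so the helpers above need no AWS setup

    s3 = boto3.client("s3")  # uses the credentials configured for your AWS account
    target = local_target(key, dest_dir)
    s3.download_file(BUCKET, key, str(target), ExtraArgs=EXTRA_ARGS)
    return target


if __name__ == "__main__":
    # Fetch the manifest first; it lists the archive files available in the bucket.
    print(download(MANIFEST_KEY))
```

With the AWS CLI, the equivalent flag is `--request-payer requester` on `aws s3 cp`.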

If you also need metadata the options are:

arXiv API https://arxiv.org/help/api
DataCite API https://support.datacite.org/docs/api using provider-id = arxiv
Kaggle https://www.kaggle.com/Cornell-University/arxiv
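
For the arXiv API option, a small sketch of a metadata query using only the Python standard library. It assumes the query endpoint `http://export.arxiv.org/api/query` with the `search_query`, `start`, and `max_results` parameters described in the API help page, and that results come back as an Atom feed; `build_query_url`, `parse_entries`, and `search` are hypothetical helper names. Keep requests slow, since rapid access may be blocked.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

API_URL = "http://export.arxiv.org/api/query"  # assumed endpoint; see the API help page
ATOM = "{http://www.w3.org/2005/Atom}"  # XML namespace used by Atom feeds


def build_query_url(search_query: str, start: int = 0, max_results: int = 10) -> str:
    """Build the query URL for one page of results."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_entries(atom_xml: Union[str, bytes]) -> List[Tuple[str, str]]:
    """Extract (id, title) pairs from an Atom feed."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        entry_id = entry.findtext(ATOM + "id", default="").strip()
        # Titles can wrap across lines in the feed, so collapse the whitespace.
        title = " ".join(entry.findtext(ATOM + "title", default="").split())
        entries.append((entry_id, title))
    return entries


def search(search_query: str, max_results: int = 10) -> List[Tuple[str, str]]:
    """Fetch one page of results over the network and parse it."""
    with urllib.request.urlopen(build_query_url(search_query, 0, max_results)) as resp:
        return parse_entries(resp.read())


if __name__ == "__main__":
    for entry_id, title in search("all:electron", max_results=5):
        print(entry_id, title)
```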

Best,

Jim