Full text options are:
AWS S3, using the Requester Pays bucket: https://arxiv.org/help/bulk_data_s3
Crawling our export service: https://arxiv.org/help/bulk_data#harvest
This is recommended only for new content or a subset of content. Otherwise the AWS or Kaggle data sets are preferred. Rapid harvesting will likely trigger a block.
arXiv API: https://arxiv.org/help/api
DataCite API: https://support.datacite.org/docs/api, using provider-id = arxiv
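
For the arXiv API option, a minimal sketch of paging through results slowly enough to avoid a block might look like the following. It assumes the query endpoint http://export.arxiv.org/api/query with the search_query, start and max_results parameters described on the API help page, and a 3 second pause between requests. The function names, the page size and the example category are illustrative only. Note that the API returns an Atom feed of metadata with links to each paper, not the full text itself.

```python
import time
import urllib.parse
import urllib.request

# Assumed endpoint, as described on the arXiv API help page.
API_BASE = "http://export.arxiv.org/api/query"
# Assumed polite pause between requests; adjust to current arXiv guidance.
DELAY_SECONDS = 3.0


def build_query_url(search_query, start=0, max_results=100):
    """Build one API query URL for a slice of the result set."""
    params = {
        "search_query": search_query,
        "start": start,
        "max_results": max_results,
    }
    return API_BASE + "?" + urllib.parse.urlencode(params)


def page_urls(search_query, total, page_size=100):
    """Yield the URLs needed to cover `total` results, page by page."""
    for start in range(0, total, page_size):
        yield build_query_url(search_query, start, min(page_size, total - start))


def fetch_pages(search_query, total, page_size=100, delay=DELAY_SECONDS):
    """Fetch each page as Atom XML text, sleeping between requests."""
    for i, url in enumerate(page_urls(search_query, total, page_size)):
        if i > 0:
            time.sleep(delay)  # throttle so the harvest is not blocked
        with urllib.request.urlopen(url) as resp:
            yield resp.read().decode("utf-8")
```

Calling fetch_pages("cat:cs.LG", total=250) would issue three throttled requests. For anything beyond a small subset, the AWS or Kaggle data sets remain the better choice.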