Hi all, we are interested in using the text of Creative Commons-licensed arXiv articles as part of a dataset for machine learning research. We have been able to obtain the articles through the requester-pays bulk data access S3 buckets (https://arxiv.org/help/bulk_data_s3), but the files in these buckets do not specify the license of a given article. The OAI API endpoint seems to return the license, but AFAICT it is rate-limited, and getting the license for every article on arXiv would take a very long time. Is there any bulk data access (via a requester-pays bucket) to arXiv article metadata, including the license?
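
To make the per-article license lookup concrete, here is a rough sketch of pulling the license out of a single metadata record once it has been fetched. The sample XML, the `license` element name, and the helper names (`extract_license`, `is_creative_commons`) are assumptions for illustration, not a confirmed description of the OAI response; the code matches on the element's local name so it does not depend on a particular namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical record shaped like an OAI-style metadata response.
# The element names and the placeholder id are assumptions, not real data.
SAMPLE_RECORD = """<record>
  <metadata>
    <arXiv xmlns="http://arxiv.org/OAI/arXiv/">
      <id>0000.00000</id>
      <license>http://creativecommons.org/licenses/by/4.0/</license>
    </arXiv>
  </metadata>
</record>"""


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def extract_license(record_xml):
    """Return the text of the first <license> element, or None if absent."""
    root = ET.fromstring(record_xml)
    for elem in root.iter():
        if local_name(elem.tag) == "license" and elem.text:
            return elem.text.strip()
    return None


def is_creative_commons(license_url):
    """Heuristic check: treat any creativecommons.org URL as a CC license."""
    return license_url is not None and "creativecommons.org" in license_url


if __name__ == "__main__":
    lic = extract_license(SAMPLE_RECORD)
    print(lic, is_creative_commons(lic))
```

Doing this one record at a time against a rate-limited endpoint is exactly the slow path we would like to avoid, hence the question about a bulk metadata source.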