At the risk of muddying the waters, you might consider taking advantage of a data set called CORD-19:
https://www.semanticscholar.org/cord19
The data set was created from many sources and includes about 100,000 records. It has at least two parts: 1) a zip file containing the full texts of articles, all on COVID-19, and 2) a metadata (CSV) file describing those full texts. One of the fields in the metadata file is arXiv_id, so you can filter on that field to get a list of the articles coming from arXiv.
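Something along these lines might do the filtering. It is only a rough sketch: the function name arxiv_rows is made up for illustration, the sample rows are invented, and the exact spelling of the column name in the file is an assumption, so the code matches it case-insensitively.

```python
import csv
import io


def arxiv_rows(handle, field="arXiv_id"):
    """Yield metadata rows whose arXiv identifier field is non-empty.

    `field` is matched case-insensitively because the exact column
    spelling in the metadata file is an assumption here.
    """
    reader = csv.DictReader(handle)
    lookup = {name.lower(): name for name in (reader.fieldnames or [])}
    column = lookup.get(field.lower())
    if column is None:
        raise KeyError(f"no column named {field!r} in the metadata file")
    for row in reader:
        # Keep only records that actually carry an arXiv identifier.
        if (row.get(column) or "").strip():
            yield row


# Tiny invented sample standing in for the real metadata file.
sample = io.StringIO(
    "title,arxiv_id\n"
    "Paper A,2003.00001\n"
    "Paper B,\n"
    "Paper C,2004.12345\n"
)
titles = [row["title"] for row in arxiv_rows(sample)]
print(titles)
```

Against the real file, you would pass an open file handle instead of the sample, e.g. open("metadata.csv", newline="", encoding="utf-8"), where the file name is again a guess at whatever the download unpacks to.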
HTH.
--
Eric Morgan
University of Notre Dame