Hi Thorsten,
I appreciate the difficulty in getting complete and correct information for all authors and for all papers, but for the purpose of visualising new submissions, I don't need that much.
What I'm essentially doing is visiting the 'new' page of an archive (like
https://arxiv.org/list/astro-ph/new) and plotting those papers on a map based on the affiliation of the first author.
Looking down that list of new submissions, the vast majority of papers were submitted by their first author. If the first author submitted the paper, then they have an arXiv account and have filled in the required "organization" field that arXiv makes new users complete when creating an account. That's the information I'd like to have access to through the API.
I'm happy to trust that the majority of people will keep this field up to date if they are active on arXiv, and I'm okay with there being a few errors or outdated affiliations. The purpose of my visualisation is to offer an alternative way to browse new arXiv submissions, so I don't need absolutely correct information.
Unfortunately, when I checked authors manually against ORCID, the majority of them did not have an ORCID iD.
For now I think exposing the "organization" field from users' accounts would work best. It would also save me from having to parse PDFs for author affiliations, even though most (if not all) papers submitted to arXiv include affiliation information.
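To make the request concrete, here is a rough sketch of how I would consume the field on my side. It assumes the affiliation would arrive in the optional per-author arxiv:affiliation element of the API's Atom response; whether the account "organization" field would ever be exposed that way is purely my assumption, and the function name first_author_affiliations is just something I made up for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespaces used in arXiv API Atom responses.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "arxiv": "http://arxiv.org/schemas/atom",
}


def first_author_affiliations(feed_xml):
    """Return (title, first author name, affiliation or None) for each entry.

    Assumption: the affiliation, when present, is carried in an
    <arxiv:affiliation> child of the <author> element.
    """
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall("atom:entry", NS):
        # Titles in the feed can contain line breaks; collapse whitespace.
        raw_title = entry.findtext("atom:title", default="", namespaces=NS)
        title = " ".join(raw_title.split())

        # find() returns the first <author>, i.e. the first author listed.
        author = entry.find("atom:author", NS)
        if author is None:
            continue
        name = author.findtext("atom:name", default="", namespaces=NS).strip()
        affiliation = author.findtext("arxiv:affiliation", namespaces=NS)
        results.append((title, name, affiliation.strip() if affiliation else None))
    return results
```

Entries whose first author has no affiliation come back with None, so I could simply leave those papers off the map.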
The only other option I can see is parsing the source TeX files. If there's a way to get just the TeX (without the associated images), then that should keep the bandwidth cost low.
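For reference, the TeX parsing I have in mind would look roughly like the sketch below. The list of macro names is my own guess at what common document classes use and is certainly not exhaustive, and the function name extract_affiliations is hypothetical. It also ignores TeX comments and affiliations given as footnotes or inside \author, which is part of why I'd prefer the account field.

```python
import re

# Macros that often carry affiliations. This list is an assumption and
# will miss journal-specific or custom macros.
AFFILIATION_MACROS = ("affiliation", "affil", "institute", "address")

# Match e.g. \affiliation{  or  \affil[1]{  (optional [..] argument).
_MACRO_RE = re.compile(
    r"\\(" + "|".join(AFFILIATION_MACROS) + r")\s*(?:\[[^\]]*\])?\s*\{"
)


def extract_affiliations(tex):
    """Return the brace-delimited arguments of known affiliation macros."""
    found = []
    for match in _MACRO_RE.finditer(tex):
        start = match.end()  # first character after the opening brace
        depth = 1
        i = start
        # Walk forward until the matching closing brace, tracking nesting.
        while i < len(tex) and depth > 0:
            ch = tex[i]
            if ch == "\\":
                i += 2  # skip the escaped character, e.g. \{ or \}
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            i += 1
        if depth == 0:
            # Drop the closing brace and collapse whitespace.
            found.append(" ".join(tex[start:i - 1].split()))
    return found
```

The strings it returns would still need cleaning and geocoding before they could go on the map, so this only covers the first step.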
Cheers,
Mohammed