Downloading tex files from S3 *only* for a specific category, e.g. cs.CC


Johnathan Mercer

Jul 4, 2023, 7:33:41 PM
to arXiv API
Hi,

I'm trying to get tex source files for all papers in a specific category. 

From what I've read, we have to download the full tar files and do the filtering on our end.

Can someone confirm whether this is true? If possible, any suggestions for a more efficient way to download only the tex files for category == cs.CC?

Best,
John

Jake Weiskoff

Jul 5, 2023, 9:39:54 AM
to arxi...@googlegroups.com
Hi John,

The S3 buckets aren't really intended for a by-category harvest, but you could build a list of the papers you need and look for them within the buckets. Alternatively, you could use the export.arxiv.org system and harvest the source from there; the export nodes are set up specifically for custom programmatic harvests. See: 


for some more specific information. 
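A minimal Python sketch of the second route, assuming the public arXiv API query endpoint at `http://export.arxiv.org/api/query` (queried with `search_query=cat:cs.CC`) to list the papers, and assuming an `e-print/<id>` path on export.arxiv.org for each paper's source; that path is not confirmed in this thread. The helper names (`build_query_url`, `parse_entries`, `source_url`, `harvest_ids`) are hypothetical. Because `cat:cs.CC` also matches cross-listed papers, the sketch can keep only those whose primary category is cs.CC.

```python
import re
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

API_URL = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"
ARXIV = "{http://arxiv.org/schemas/atom}"


def build_query_url(category, start=0, max_results=100):
    """Query URL for one page of papers listed in `category`, oldest first."""
    params = {
        "search_query": f"cat:{category}",
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "ascending",  # stable paging while new papers arrive
    }
    return f"{API_URL}?{urllib.parse.urlencode(params)}"


def parse_entries(atom_xml):
    """Return (arxiv_id, primary_category) pairs from one Atom response page."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall(f"{ATOM}entry"):
        abs_url = entry.findtext(f"{ATOM}id", default="")
        # http://arxiv.org/abs/2307.01234v2 -> 2307.01234 (version suffix dropped)
        arxiv_id = re.sub(r"v\d+$", "", abs_url.split("/abs/", 1)[-1])
        primary = entry.find(f"{ARXIV}primary_category")
        entries.append((arxiv_id, primary.get("term") if primary is not None else None))
    return entries


def source_url(arxiv_id):
    """Assumed (hypothetical here) e-print endpoint for one paper's source."""
    return f"https://export.arxiv.org/e-print/{arxiv_id}"


def harvest_ids(category, page_size=100, delay=3.0, primary_only=True):
    """Page through the API until a page comes back empty; return matching IDs."""
    ids, start = [], 0
    while True:
        with urllib.request.urlopen(build_query_url(category, start, page_size)) as resp:
            page = parse_entries(resp.read())
        if not page:
            return ids
        # Optionally drop cross-listed papers whose primary category differs.
        ids += [i for i, prim in page if not primary_only or prim == category]
        start += page_size
        time.sleep(delay)  # pause between requests to the export server
```

For example, `ids = harvest_ids("cs.CC")`, then fetch `source_url(i)` for each ID with the same delay between downloads. The same ID list is what the first route needs for locating papers inside the S3 tar files. The 3-second delay follows the pacing the arXiv API user manual asks for; a real harvester should also retry transient errors and empty pages. The e-print response may be a gzipped tarball, a single gzipped .tex file, or no TeX at all for PDF-only submissions, so check the content before unpacking.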

Regards,
-Jake Weiskoff

arXiv Technical Support Manager 



Johnathan Mercer

Jul 5, 2023, 3:03:37 PM
to arXiv API
Thanks Jake!