It does make sense that changing the batch size on your end makes no real difference: OpenAlex has a hard cap of at most 50 IDs per query. This can also be found in the source code of oa_fetch, by the way.

However, 3 hours is way slower than the API's rate limit allows. That limit is a generous 10 requests per second (max 100k requests a day); with 50 items per request, that's 500 items per second, so with zero delay/latency/processing time it would take about 8 minutes to retrieve your 240k works from OpenAlex. For a more realistic reference: I just pulled 30k works from the API in about 6 minutes, which extrapolates to roughly 45 minutes for 240k works. My script runs in Python and stores the results locally in MongoDB, which are definitely not the fastest solutions possible, so higher speeds should definitely be achievable.

So, I suggest looking into optimizing your code! Asynchronous API calls and asynchronous data storage help a lot in speeding things up. If storing/ingesting data is a bottleneck for you, try limiting the returned fields to only those you need, or use a more performant data storage solution such as duckplyr, which uses DuckDB as a drop-in replacement for the standard dplyr stack; etc.

Cheers,
Samuel
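To illustrate the batching point, here is a minimal Python sketch (not my actual script): it chunks a list of OpenAlex work IDs into batches of up to 50 and builds one /works URL per batch. The `openalex_id` filter and the `select` parameter are as I recall them from the OpenAlex docs, so double-check there; the actual fetching is left to your HTTP client, and firing these requests concurrently (e.g. asyncio plus an async HTTP client) is where most of the speedup comes from.

```python
# Sketch: batch OpenAlex work IDs into groups of up to 50 (the per-request
# cap discussed above) and build the corresponding /works request URLs.

def batch_filter_urls(ids, batch_size=50, select=None):
    """Yield one OpenAlex /works URL per batch of up to `batch_size` IDs."""
    base = "https://api.openalex.org/works"
    for i in range(0, len(ids), batch_size):
        # OpenAlex accepts a pipe-separated ("OR") list of IDs in one filter.
        id_filter = "|".join(ids[i:i + batch_size])
        url = f"{base}?filter=openalex_id:{id_filter}&per-page={batch_size}"
        if select:
            # Limit returned fields to cut payload size and ingest time.
            url += f"&select={','.join(select)}"
        yield url

# 120 hypothetical IDs -> 3 requests (50 + 50 + 20 IDs)
urls = list(batch_filter_urls([f"W{n}" for n in range(1, 121)]))
```

For 240k IDs this yields 4,800 request URLs; issued at the allowed 10 requests per second, that is the ~8-minute floor mentioned above.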
--
You received this message because you are subscribed to the Google Groups "OpenAlex Community" group.