Managing a very large call using library(openalexR)


Laura Bredahl

Mar 13, 2025, 3:21:12 PM
to OpenAlex Community
Hello,

I am about to run a call that I expect to pull about 1M-1.6M works. 

I've tried using "cursor" and "page" in the oa_fetch() filters, but I am pretty sure that openalexR doesn't support this.

Does anyone have advice on how to page through results so they don't crash my system when running the call(s)?

Alternatively, should I just send it and cross my fingers? Maybe 1.6M isn't that bad? I think it would equate to 5,000-8,000 calls to the OpenAlex API.


Best,

Laura

Rainer M. Krug

Mar 13, 2025, 3:37:06 PM
to Laura Bredahl, OpenAlex Community
Hi Laura

Been there - 4.5 million in my case. I used a modified version of oa_request() which stored each page individually. 

But there might be an easier solution: oa_fetch() has a new allowed value for output, namely "raw". This returns the raw JSON from the API and is the least memory-intensive option. I would just try that option, and if it doesn't work, let me know and I can give you my modified version.
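[Editor's note: a minimal sketch of what this might look like, assuming a recent openalexR where output = "raw" is accepted and returns unparsed JSON pages; the date filters are placeholders.]

```r
library(openalexR)

# Assumption: output = "raw" returns the unparsed JSON pages rather than a
# tibble, so each page can be written straight to disk.
raw_pages <- oa_fetch(
  entity = "works",
  from_publication_date = "2015-01-01",
  to_publication_date = "2024-12-31",
  output = "raw",
  verbose = TRUE
)

# Write each raw page out immediately instead of keeping parsed results in memory.
for (i in seq_along(raw_pages)) {
  writeLines(raw_pages[[i]], sprintf("page_%05d.json", i))
}
```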

Cheers

Rainer 

PS: I have been trying for a long time to convince the openalexR maintainers to include the option to save the individual responses, but no luck so far.






Gabor Schubert

Mar 14, 2025, 3:35:14 AM
to OpenAlex Community
Hi Laura,

I usually deal with smaller chunks of data, so I can use the REST API with scripts. But as I see it, you are trying to get 10 years of data with one call (from_publication_date = "2015-01-01", to_publication_date = "2024-12-31"). It would probably be safer to run 10 calls one after another, one for each year, and then merge the results locally. Then you don't have to deal with a million works, only ca. 100k per call.
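[Editor's note: the year-by-year approach could be sketched roughly as follows; the filenames are hypothetical and dplyr is used only for the final merge.]

```r
library(openalexR)

# One call per publication year; each year's result is saved to disk before
# moving on, so at most roughly 100k works are held in memory at a time.
for (yr in 2015:2024) {
  chunk <- oa_fetch(
    entity = "works",
    from_publication_date = sprintf("%d-01-01", yr),
    to_publication_date = sprintf("%d-12-31", yr),
    verbose = TRUE
  )
  saveRDS(chunk, sprintf("works_%d.rds", yr))
}

# Merge the yearly chunks locally once all calls have finished.
all_works <- dplyr::bind_rows(lapply(sprintf("works_%d.rds", 2015:2024), readRDS))
```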

Gabor

Trang Le

Mar 14, 2025, 6:05:39 PM
to OpenAlex Community
Hi Laura —

I'm one of the openalexR maintainers. Thank you for using openalexR!

First off, what you want is the "pages" argument in oa_fetch, not "page". That will fix your error.

Rainer had a good point that specifying either output = "list" or output = "raw" might be faster in your case, because then you don't need to convert each result to a tibble every time you write out a result. If you still want the tibble result, you can combine the result files later and call oa2df(). I also agree with Gabor that chunking the call by year may be helpful.
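[Editor's note: a rough sketch of combining the pages argument with output = "list", under the assumption that page ranges like 1:200 are accepted; filenames and page ranges are illustrative.]

```r
library(openalexR)

# Fetch a range of pages as plain lists (no per-page tibble conversion),
# save them to disk, and convert only once at the end with oa2df().
chunk <- oa_fetch(
  entity = "works",
  from_publication_date = "2015-01-01",
  to_publication_date = "2024-12-31",
  pages = 1:200,
  output = "list"
)
saveRDS(chunk, "pages_0001_0200.rds")

# ...repeat with later page ranges, then combine and convert:
works_df <- oa2df(
  do.call(c, lapply(list.files(pattern = "^pages_.*\\.rds$"), readRDS)),
  entity = "works"
)
```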

One lesser-known function is oa_generate(), which gives you a generator for making requests to the OpenAlex API, returning one record at a time. The documentation provides some examples for you to try out. This vignette shows how you can save the output in batches with oa_generate().
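[Editor's note: based on the oa_generate() documentation, a batched save might be sketched like this; the query filters and batch size are illustrative, and the generator protocol (coro-style exhaustion) is an assumption.]

```r
library(openalexR)

# Build the query URL, then pull records one at a time from the generator,
# flushing every 10,000 records to disk.
query_url <- oa_query(
  entity = "works",
  from_publication_date = "2015-01-01",
  to_publication_date = "2024-12-31"
)
gen <- oa_generate(query_url, verbose = TRUE)

batch <- list()
n_saved <- 0
repeat {
  rec <- gen()                        # one work per call
  if (coro::is_exhausted(rec)) break  # generator signals completion
  batch[[length(batch) + 1]] <- rec
  if (length(batch) == 10000) {
    n_saved <- n_saved + 1
    saveRDS(batch, sprintf("batch_%04d.rds", n_saved))
    batch <- list()
  }
}
if (length(batch) > 0) saveRDS(batch, sprintf("batch_%04d.rds", n_saved + 1))
```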

I hope that helps! Please don't hesitate to submit an issue directly on our GitHub page. (Best if you can provide a small enough example for us to test out what you were expecting to achieve.)

Cheers,
Trang