Hi
I would like to announce a new package for working with OpenAlex: openalexPro. It supplements the excellent openalexR package. You might ask, "Why do we need a second package to access OpenAlex if the existing one is really good?" That is a valid question.
openalexR works very well for smaller datasets downloaded from OpenAlex (up to the thousands of records). With large datasets, however, its performance degrades and it eventually crashes, because it does all processing in memory. openalexPro takes a different approach: all processing is file-based on disk, so the size of a retrieved dataset is effectively limited only by drive space. I used a pre-alpha version to retrieve 4.5 million records over more than a day without problems.
It also gives the user more freedom, as it saves all responses received from the OpenAlex API as JSON files, which are then further processed and finally stored in a Parquet database. For details, see the documentation on GitHub or on R-universe.
The plan is to submit it to rOpenSci and publish it there. Before doing this, I would like to get feedback and thoughts on its functionality and workflows. Ideas and feature suggestions are welcome, although the plan is to submit it to rOpenSci with this basic functionality and implement more features afterwards.
So please submit issues and feature requests here: https://github.com/rkrug/openalexPro/issues or start discussions here: https://github.com/rkrug/openalexPro/discussions
One additional point for further development: the plan is to build a small ecosystem of add-on packages for openalexPro, e.g. for plotting (openalexPlot) and for conversion into other formats such as BibTeX (openalexConvert). The aim is to keep these packages compatible with the openalexR format as well.
So please try it out and let me know what you think, whether there is functionality you are missing, whether things do not work, etc.
Looking forward to the discussions and comments.
Rainer