Changing 10M records so Unpaywall data is same as OpenAlex

Casey Meyer

Nov 24, 2025, 5:47:04 PM
to Unpaywall announcements

We’re changing up to 10 million Unpaywall records on Monday, December 1st, so that Unpaywall fully matches what you see in OpenAlex. Right now Unpaywall does some minor calculations of its own to get open access status, journal information, and locations. We’re removing the last of those, so that Unpaywall pulls directly from OpenAlex and applies only a very light transformation on top of OpenAlex works. The changes you will see are:

  • 500k changes to oa_status or is_oa, with most going from gold to hybrid, or closed to green
  • 500k changes to either journal_name, issns, journal_is_oa, or journal_is_in_doaj
  • 1M changes to best_oa_location due to taking best_oa_location directly from OpenAlex
  • 8M changes to oa_locations due to including DataCite and DOAJ, and accepting the default sort from OpenAlex
  • Around 2M changes to is_paratext due to taking this field directly from OpenAlex
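
If you keep a local copy of Unpaywall records, one way to see which of the fields listed above changed for a given record is to compare the copy from before the update with the copy from after it. The following is only a minimal sketch: the helper name changed_fields is hypothetical, the field names are taken as written in this message (your snapshot or API response may name them slightly differently), and records are assumed to be plain dictionaries.

```python
# Fields named in this announcement; adjust to match your own data (assumption).
AFFECTED_FIELDS = (
    "oa_status",
    "is_oa",
    "journal_name",
    "issns",
    "journal_is_oa",
    "journal_is_in_doaj",
    "best_oa_location",
    "oa_locations",
    "is_paratext",
)


def changed_fields(old_record, new_record):
    """Return the affected fields whose values differ between two copies of a record.

    Hypothetical helper: both arguments are plain dicts. A missing key is
    treated as None. List comparison is order-sensitive, so a re-sorted
    oa_locations list counts as a change.
    """
    return [
        field
        for field in AFFECTED_FIELDS
        if old_record.get(field) != new_record.get(field)
    ]


# Made-up example records, for illustration only.
before = {"oa_status": "gold", "is_oa": True, "is_paratext": False}
after = {"oa_status": "hybrid", "is_oa": True, "is_paratext": False}
print(changed_fields(before, after))
```

Because the oa_locations change includes a new default sort, comparing that list in order (as above) will flag records where only the ordering moved; compare the entries as an unordered collection instead if you want to ignore that.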

This change is necessary so we can keep things simple. Our goal is that Unpaywall is a subset of OpenAlex data, with the data in that subset matching exactly in the two systems. The only notable exception is that “diamond” open access in OpenAlex will be “gold” in Unpaywall. Let us know if you have any questions!
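
For anyone comparing the two systems programmatically, the “diamond” exception described above could be handled with a small mapping before comparing values. This is a sketch only, not how Unpaywall implements it: the function name unpaywall_oa_status is hypothetical, and it assumes the status arrives as a plain lowercase string.

```python
def unpaywall_oa_status(openalex_oa_status):
    """Map an OpenAlex OA status to the value expected in Unpaywall.

    Hypothetical helper based on the one exception stated in this message:
    "diamond" in OpenAlex is reported as "gold" in Unpaywall. Every other
    status is passed through unchanged.
    """
    if openalex_oa_status == "diamond":
        return "gold"
    return openalex_oa_status


for status in ("diamond", "gold", "hybrid", "green", "closed"):
    print(status, "->", unpaywall_oa_status(status))
```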

Thanks,
Casey
