As of today, the Unpaywall API and dataset come from a new, completely rewritten codebase. For most users, there’s no action you need to take.
What do you need to do?
If you’re just using the API: nothing.
If you’re using the data feed: download and reload this new snapshot, overwriting your old data with the new dataset. We’re sorry for the inconvenience, but there are too many small changes to fit in a normal changefile.
What schema changes are there?
Just two tiny changes:
Both keys are still there (for now), but the value is always set to the string “deprecated”.
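If your own code reads those keys, one way to cope is to treat the placeholder string as "no value" when you load a record. Here is a minimal sketch, assuming each record is a plain dictionary; the key names `old_field_a` and `old_field_b` are made up for illustration and are not the real deprecated keys.

```python
DEPRECATED = "deprecated"  # placeholder string now stored in the retired keys


def drop_deprecated(record: dict) -> dict:
    """Return a copy of `record` without keys whose value is the placeholder."""
    return {key: value for key, value in record.items() if value != DEPRECATED}


# Hypothetical record: the two "old_field_*" names are illustrative only.
record = {
    "doi": "10.1234/example",
    "is_oa": True,
    "old_field_a": "deprecated",
    "old_field_b": "deprecated",
}

cleaned = drop_deprecated(record)
print(sorted(cleaned))
```

Filtering on the value rather than on a hard-coded list of key names means the same helper keeps working if the keys are removed entirely later on.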
Why is this happening?
We wrote Unpaywall nearly a decade ago, and over that time we accrued a lot of technical debt. The interest on that debt got too high:
The point of the rewrite is to fix these issues.
What data changes are there?
Overall, the data isn’t changing much. We put a lot of effort into making sure that, in aggregate, metrics from the dataset (like the percentage of works that are gold open access, or the number of works with licenses, and so forth) change by less than 5%, and where they do change, it’s mostly in the direction of higher accuracy. However, 5% of 150M is still a big number, so you will see a lot of individual changes.
The reality is that Unpaywall is always building on metadata with, shall we say, “diverse levels of accuracy.” We’re the most accurate OA index out there because we work hard to be, but as Daniel Day-Lewis might say, There Will Be Bugs. The great thing about the new system is that it’ll let us address these bugs more quickly. Which brings us to:
What’s the future of Unpaywall?
The new system cuts the average API response time by 90%, from 500ms to 50ms. ⚡👍
As always, we’d love to hear your feedback, and thanks for your support!
Best,
Jason