OpenAlex API features, improvements, and changes


Casey Meyer

Aug 25, 2022, 11:24:13 AM
to OpenAlex users
Good morning,

We've been BUSY improving OpenAlex and I have several improvements to announce. We really appreciate all the feedback, so keep it coming!

Features

OpenAlex now has fulltext search across 57.3 million works, courtesy of n-grams (words, numbers, phrases in an article) provided by The General Index. We have a blog post coming soon, but here are some details.

Fulltext is already available via the main search parameter:

https://api.openalex.org/works?search=dna

With this method, fulltext matches receive less weight than title and abstract matches. But the search will find words or phrases deep within articles that the title and abstract sometimes do not cover. Want to search only the works that contain fulltext? Go for it! You can also filter works to see only those that contain fulltext.
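As a minimal sketch, the search URL above could be built in Python like this. The helper name `search_works_url` is hypothetical and only illustrates the `search` parameter shown in this post.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.openalex.org"


def search_works_url(query: str) -> str:
    """Build a /works URL that uses the main `search` parameter.

    Hypothetical helper: only the `search` parameter itself comes from
    the announcement; the function name is made up for illustration.
    """
    # urlencode takes care of spaces and other special characters.
    return f"{API_ROOT}/works?{urlencode({'search': query})}"


print(search_works_url("dna"))
```

Multi-word queries are URL-encoded by `urlencode`, so the same helper can be used for phrases.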

What about the n-grams themselves? Those are available too! Each work has an ngrams_url field that will lead you to n-grams if they exist for that work:

https://api.openalex.org/works/W2063450941/ngrams

You can look up n-grams by DOI as well:

https://api.openalex.org/works/10.1016/s0022-2836(75)80083-0/ngrams
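A rough sketch of both lookups in Python, using only the standard library. The function names are hypothetical, and the shape of the JSON response is not described in this post, so the sketch returns the parsed body without assuming any fields.

```python
import json
from urllib.request import urlopen

API_ROOT = "https://api.openalex.org"


def ngrams_url(work_id_or_doi: str) -> str:
    """Return the n-grams URL for an OpenAlex work ID or a DOI.

    Hypothetical helper that follows the two URL patterns shown above.
    """
    return f"{API_ROOT}/works/{work_id_or_doi}/ngrams"


def fetch_ngrams(work_id_or_doi: str) -> dict:
    """Download and parse the n-grams response as JSON.

    No response fields are assumed here; inspect the result to see
    what the API actually returns for a given work.
    """
    with urlopen(ngrams_url(work_id_or_doi)) as resp:
        return json.load(resp)
```

For example, `fetch_ngrams("W2063450941")` and `fetch_ngrams("10.1016/s0022-2836(75)80083-0")` would request the two URLs above.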

This is a huge improvement for search, often boosting search results by 30x. Coverage across works is discussed here if you want to learn more.

Improvements

We have added these new convenience filters to works:

As well as these improvements:
  • The filter raw_affiliation_string.search now searches within strings, so queries such as "france" return records with "parisFrance".
  • You can filter by ID fields in works, such as "openalex", which is useful for querying multiple IDs with the OR operator. We are adding this to the other entities soon.
  • We fixed a bug in authors autocomplete that affected queries with two periods in a name.
  • We fixed a bug to ensure sort order remains the same when searching and paging with cursors.
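The filter and cursor items above might be used as in the sketch below. Several details are assumptions that this post does not state: the `filter=name:value` syntax, the `|` separator for the OR operator, starting a cursor with `cursor=*`, and reading the next cursor from `meta.next_cursor`. Check them against the OpenAlex documentation. All function names are hypothetical, and `fetch` is injected so the paging logic can be tried without network access.

```python
from typing import Callable, Iterable, Iterator, Optional
from urllib.parse import urlencode

WORKS_URL = "https://api.openalex.org/works"


def any_of(values: Iterable[str]) -> str:
    """Join values with '|' (assumed here to be the OR operator)."""
    return "|".join(values)


def works_filter_url(name: str, value: str, cursor: Optional[str] = None) -> str:
    """Build a /works URL with one filter and an optional paging cursor."""
    params = {"filter": f"{name}:{value}"}
    if cursor is not None:
        params["cursor"] = cursor
    # Keep ':', '|' and '*' readable instead of percent-encoding them.
    return f"{WORKS_URL}?{urlencode(params, safe=':|*')}"


def iter_pages(name: str, value: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield result pages until the API stops returning a next cursor.

    Assumes each page is a dict with a `meta.next_cursor` entry.
    """
    cursor: Optional[str] = "*"  # assumed starting value for cursor paging
    while cursor:
        page = fetch(works_filter_url(name, value, cursor))
        yield page
        cursor = page.get("meta", {}).get("next_cursor")


print(works_filter_url("raw_affiliation_string.search", "france"))
```

To look up several works at once, pass `any_of([...])` with a list of OpenAlex IDs as the value of the "openalex" filter.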
Changes
  • The default sort order for works is now cited_by_count descending, rather than publication_date descending.
  • The author hint for the author autocomplete endpoint is now sorted by most cited rather than most recent. We're still working on this and will likely tweak it to prefer newer, highly-cited works.
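If you relied on the previous default order for works, one option is to request it explicitly. The sketch below assumes a `sort` query parameter of the form `field:desc`, which is not stated in this post, so verify it against the OpenAlex documentation. The helper name is hypothetical.

```python
from urllib.parse import urlencode

WORKS_URL = "https://api.openalex.org/works"


def works_sorted_url(sort_key: str, descending: bool = True, **params: str) -> str:
    """Build a /works URL with an explicit sort order.

    Assumption: appending ':desc' to the sort key requests descending
    order, and omitting it requests ascending order.
    """
    sort_value = f"{sort_key}:desc" if descending else sort_key
    query = dict(params, sort=sort_value)
    # Keep ':' readable instead of percent-encoding it.
    return f"{WORKS_URL}?{urlencode(query, safe=':')}"


# The previous default: newest publications first.
print(works_sorted_url("publication_date"))
```

Extra keyword arguments are passed through as query parameters, so `works_sorted_url("publication_date", search="dna")` combines a search with the old ordering.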
We care about API stability and try to limit changes to existing features. Our goal is to announce these kinds of changes in advance moving forward.

Hope you are enjoying OpenAlex!

Thanks,
Casey

--
Casey Meyer
Developer - OpenAlex, Unpaywall
OurResearch: We build tools to make scholarly research more open, connected, and reusable—for everyone.

Sol Lederman

Aug 25, 2022, 11:41:34 AM
to OpenAlex users
This is a great development, Casey. Thank you! Will ngrams be available in OpenAlex dumps, or only via the API? I get that including them would make the dumps a lot bigger, but I can imagine many use cases for mining that data in bulk. Thanks!

Casey Meyer

Aug 25, 2022, 12:31:05 PM
to OpenAlex users
You're welcome! The plan right now is for the ngrams to be available only via the API. The data export would be very large, so we're not planning to host it. However, since the ngrams don't change, we cached them with Cloudflare, so there should not be a problem scrolling through a lot of the data if needed.

Casey 

Sol Lederman

Aug 25, 2022, 1:41:46 PM
to OpenAlex users
Awesome! Thank you Sir.