Hello OpenAlex Community!
We have two exciting new features to announce!
We are now using full text from PDFs of Open Access publications to power searches. This has greatly improved our full text search coverage, especially among recent works.
When possible, we index the full text of works in addition to the title and abstract, so your searches can match against all three (title, abstract, and full text). Until now, our full text has come from the n-grams that we obtained from the Internet Archive’s General Index. That source has served us well, but its coverage cuts off abruptly for works published in 2020 or later. To remedy this, we have started to index the full text from any PDFs we are able to download from Open Access works.
We’re introducing a new data attribute and filter, “has_fulltext”, which indicates whether or not we have indexed the work’s full text for search, either from the PDF or from the n-grams. Another data attribute, “fulltext_origin”, indicates whether the full text was obtained from the PDF (value: ‘pdf’) or from the n-grams (value: ‘ngrams’). See the documentation for details.
Keep in mind that this is all about indexing the full text on our end to power the searches that you can do. It is separate from whether you can actually obtain the full text yourself. If you want to do that, look in the work’s “open_access.oa_url” attribute.
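As a rough illustration, here is a minimal Python sketch of how a request URL using the new filter might be put together. The helper name fulltext_works_url is hypothetical, and the sketch assumes that filters are combined with commas in the "filter" query parameter of the works endpoint and that the optional "search" parameter is used for searching; please check the documentation for the exact syntax.

```python
from urllib.parse import urlencode

# Assumed base URL of the works endpoint.
BASE_URL = "https://api.openalex.org/works"


def fulltext_works_url(origin=None, search=None):
    """Build a works URL limited to works whose full text is indexed.

    origin: optionally "pdf" or "ngrams" (the two fulltext_origin values).
    search: optional search string (assumed to go in the "search" parameter).
    This helper is hypothetical and only builds the URL; it sends no request.
    """
    filters = ["has_fulltext:true"]
    if origin is not None:
        if origin not in ("pdf", "ngrams"):
            raise ValueError("origin must be 'pdf' or 'ngrams'")
        filters.append(f"fulltext_origin:{origin}")

    params = {"filter": ",".join(filters)}
    if search:
        params["search"] = search

    # Keep ':' and ',' unescaped so the filter expression stays readable.
    return f"{BASE_URL}?{urlencode(params, safe=':,')}"


if __name__ == "__main__":
    print(fulltext_works_url(origin="pdf"))
```

The helper only assembles the URL, so you can paste the result into a browser or pass it to whichever HTTP client you already use.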
Here is a graph showing what our full text coverage now looks like [api call]:
And here is a graph showing our coverage of full text specifically obtained from PDFs ("fulltext_origin:pdf"):
You can now use cursor pagination to get all of the grouped results when doing group-by queries.
When using “group_by” to get groups of entities, the grouped results are returned with a maximum page size of 200. Previously, if a query produced more than 200 groups, you could only get the top 200. You can now use cursor pagination with group_by queries, just as you already do when getting lists of entities. This lets you do things like get all of the institutions that your university collaborates with, rather than just the top 200. Please check out the documentation to learn more: https://docs.openalex.org/how-to-use-the-api/get-groups-of-entities#paging
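To make the paging loop concrete, here is a minimal Python sketch that walks through every page of a group_by query. It assumes the cursor conventions used for entity lists also apply here: start with cursor=*, read the next cursor from "meta.next_cursor" in each response, and stop when it is empty or a page comes back with no groups. The names iter_groups and fetch_json are hypothetical helpers, not part of any OpenAlex client library.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the works endpoint.
API_URL = "https://api.openalex.org/works"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (simple default fetcher)."""
    with urlopen(url) as response:
        return json.load(response)


def iter_groups(group_by, filter_expr=None, fetch=fetch_json):
    """Yield every group of a group_by query, following the cursor.

    Assumes each response has a "group_by" list and a "meta.next_cursor"
    value, and that paging ends when either one is empty.
    """
    cursor = "*"  # assumed starting cursor, as with entity lists
    while cursor:
        params = {"group_by": group_by, "per-page": 200, "cursor": cursor}
        if filter_expr:
            params["filter"] = filter_expr
        page = fetch(f"{API_URL}?{urlencode(params, safe=':,*')}")

        groups = page.get("group_by", [])
        if not groups:
            break
        yield from groups

        cursor = page.get("meta", {}).get("next_cursor")


if __name__ == "__main__":
    # Makes live requests: count all institution groups across works.
    total = sum(1 for _ in iter_groups("authorships.institutions.id"))
    print(total)
```

Passing the fetcher in as an argument keeps the loop easy to try out against canned responses before pointing it at the live API. To look at one university's collaborators, you would also pass a filter expression for that institution.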
That’s all for now, but stay tuned because we have some more big announcements coming up!
Cheers,
OpenAlex team