Record location linking

Ed Summers

Sep 10, 2024, 11:37:03 PM
to openalex-...@googlegroups.com
One (awesome) thing I’ve noticed in OpenAlex data is that you can sometimes see whether a given work is available from multiple locations. For example, this one is available both as a published article in Contemporary Mathematics and as a preprint from arXiv (look at the locations property):

https://api.openalex.org/works/W1685360256
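
As a rough illustration of reading the locations property, here is a minimal Python sketch. The helper names (fetch_work, summarize_locations) and the sample record are hypothetical; the sample only imitates the shape of an OpenAlex work (a locations list whose entries carry source, version and landing_page_url) and is not real API output.

```python
import json
import urllib.request


def fetch_work(work_id):
    """Fetch one work record from the OpenAlex API (needs network access)."""
    url = f"https://api.openalex.org/works/{work_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize_locations(work):
    """Return (source name, version, landing page URL) for each location."""
    rows = []
    for loc in work.get("locations") or []:
        source = loc.get("source") or {}  # source may be null in the API response
        rows.append(
            (source.get("display_name"), loc.get("version"), loc.get("landing_page_url"))
        )
    return rows


# Hypothetical record for illustration only; URLs are placeholders.
sample_work = {
    "id": "https://openalex.org/W0000000000",
    "locations": [
        {
            "source": {"display_name": "Contemporary Mathematics"},
            "version": "publishedVersion",
            "landing_page_url": "https://example.org/article",
        },
        {
            "source": {"display_name": "arXiv"},
            "version": "submittedVersion",
            "landing_page_url": "https://example.org/preprint",
        },
        {"source": None, "version": None, "landing_page_url": None},
    ],
}

for row in summarize_locations(sample_work):
    print(row)

# Against the live API (not run here):
# work = fetch_work("W1685360256")
# print(summarize_locations(work))
```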

I took a look in the documentation, albeit somewhat quickly, to try to learn more about how this matching works and I ran across this description [1]:


We get information about scholarly works as records. A record can take several forms. It may be an item of Crossref metadata; an entry from a repository like arXiv, Pubmed, or an institutional repository; or publicly available information on the internet. A record contains information about a work, so our first task whenever we get a new record is to determine if the work already exists in our system. If we are able to link it to an existing work—using a DOI or other metadata matching technique—then we use the information in the record to enrich that work.
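
To make the idea in that description concrete, here is a minimal Python sketch of "DOI first, then other metadata" linking. This is not OpenAlex's actual code: the function names, the exact-title fallback, and the sample data are all assumptions for illustration, and the real matching may use quite different rules.

```python
import re


def normalize_doi(doi):
    """Lowercase a DOI and strip common prefixes such as https://doi.org/."""
    if not doi:
        return None
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi) or None


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", (title or "").lower()).strip()


def link_record(record, works):
    """Return the id of the work a record matches, or None if it looks new.

    Assumed strategy: try an exact DOI match, then fall back to an
    exact match on the normalized title.
    """
    doi = normalize_doi(record.get("doi"))
    if doi:
        for work in works:
            if normalize_doi(work.get("doi")) == doi:
                return work["id"]
    title = normalize_title(record.get("title"))
    if title:
        for work in works:
            if normalize_title(work.get("title")) == title:
                return work["id"]
    return None


# Hypothetical data for illustration only.
works = [{"id": "W1", "doi": "https://doi.org/10.1000/ABC", "title": "A Study of Things"}]
print(link_record({"doi": "10.1000/abc"}, works))
print(link_record({"title": "A study of things."}, works))
print(link_record({"title": "Something else"}, works))
```

A preprint record usually has no DOI in common with the published article, which is why some non-DOI fallback is needed to link the two locations to one work.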


I was wondering if there is any more information to learn about how this linking process works, or if I could get a pointer to the relevant code on GitHub where this is happening?

Thanks for any information you can provide, and for a super resource!

//Ed

[1] https://help.openalex.org/hc/en-us/articles/24347019383191-Where-do-works-in-OpenAlex-come-from

Samuel Mok

Sep 11, 2024, 6:52:20 AM
to Ed Summers, openalex-...@googlegroups.com
Hi Ed,

AFAIK, there are several places in the code where work matching is done. I gave a quick overview of the code in an email in this group in June. I'll repeat the most important parts here:

The main matching statements (SQL embedded in Python):
[attached image: image.png]
Links to the relevant scripts that contain the matching code:

Cheers,
Samuel


Ed Summers

Sep 11, 2024, 12:18:28 PM
to OpenAlex Community
Thanks Samuel, this is super helpful! My apologies for not finding the previous discussion when I did a quick search.

//Ed