DOAJ in OpenAlex vs Unpaywall

Bianca Kramer

Mar 28, 2024, 6:03:48 PM
to OpenAlex Community
Hi all,

Very happy with this new group, thanks for setting it up! 
Also taking advantage and posting this again in this group :) 


I was looking at how DOAJ is integrated in both Unpaywall and OpenAlex, and noticed some differences I had not realized before - which raised some questions for me. This might be a long post :-) 

1) In Unpaywall, 'journal-is-in-doaj' is a variable (T/F) at article level, and appears to be linked to the year the journal started to publish all content using an open license (which is a field in DOAJ's own journal metadata).

Thus, for a given journal, the proportion of articles with the label 'journal-is-in-doaj' may vary between 0% and 100% - see chart below for all journals with at least 1 article where 'journal-is-in-doaj' is true (n = 23,694).

image (6).png

As an example, here's the breakdown for Journal of Biomedical Science (a BMC journal), showing that from 2009 onwards, all articles have the label 'journal-is-in-doaj' = true, while prior to that, this is false for all articles: 

image (7).png
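
To make the article-level behaviour concrete, here is a minimal Python sketch of the logic as I understand it from the data. This is an assumption on my part, not Unpaywall's actual implementation: the function names are hypothetical, and the list of publication years is made up for illustration. Only the 2009 start year comes from the Journal of Biomedical Science example above.

```python
def article_in_doaj(pub_year, oa_start_year):
    """Article-level flag (assumed logic): true only if the journal is in DOAJ
    (oa_start_year is not None) and the article was published in or after the
    year the journal started publishing everything under an open license."""
    return oa_start_year is not None and pub_year >= oa_start_year


def doaj_share(pub_years, oa_start_year):
    """Proportion of a journal's articles that carry the article-level flag."""
    if not pub_years:
        return 0.0
    flags = [article_in_doaj(year, oa_start_year) for year in pub_years]
    return sum(flags) / len(flags)


# Illustrative publication years (invented), one article per year.
years = [2006, 2007, 2008, 2009, 2010]
share = doaj_share(years, 2009)
print(share)  # 0.4: only the 2009 and 2010 articles are flagged
```

Under this logic, a journal's share can land anywhere between 0% and 100%, depending on how many of its articles predate the OA start year.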

In contrast, in OpenAlex, 'is_in_doaj' is a variable assigned at journal level (specifically, it's a variable of the entity 'sources', with sources linked to articles through the variable 'locations').

Probably as a result of this, it appears that in OpenAlex, the year the journal first became OA is not taken into account at article level - see the overall plot and the specific example below, but now for OpenAlex:

image (8).png

Journal of Biomedical Science (BMC):

image (9).png
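
For comparison, here is a small Python sketch of what happens when the flag lives on the source and is read through an article's locations. The record layout is a simplified assumption loosely modelled on OpenAlex works ('locations', 'source', 'is_in_doaj', 'publication_year'); the helper names and the three example years are hypothetical.

```python
def work_is_in_doaj(work):
    """True if any location of the work points to a source flagged is_in_doaj."""
    return any(
        (loc.get("source") or {}).get("is_in_doaj", False)
        for loc in work.get("locations", [])
    )


def flags_by_year(works):
    """Collect the set of flag values seen for each publication year."""
    by_year = {}
    for work in works:
        by_year.setdefault(work["publication_year"], set()).add(work_is_in_doaj(work))
    return by_year


# One journal-level record shared by all articles, regardless of year.
journal = {"display_name": "Journal of Biomedical Science", "is_in_doaj": True}
works = [
    {"publication_year": year, "locations": [{"source": journal}]}
    for year in (2005, 2008, 2012)  # illustrative years, not real counts
]
result = flags_by_year(works)
print(result)  # {2005: {True}, 2008: {True}, 2012: {True}}
```

Because every article inherits the same journal-level value, articles from before the OA start year end up flagged exactly like those published after it.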

This gets complicated real fast when also looking at the difference in total number of journals, but without going into that, the main point is that Unpaywall data allow a more fine-grained assessment of whether an article can/should be counted as published in a DOAJ journal at time of publication, while in OpenAlex, the information on DOAJ journals is more binary, and thus less useful for longitudinal analysis.

I wondered whether there are plans to bring Unpaywall data and OpenAlex data more closely together in this respect? (and if so, hopefully, in the direction of keeping the more fine-grained data, although I realize that's harder in the OpenAlex data structure?)

I hope this was all somewhat clear :) 

kind regards, 
Bianca 
Sesame Open Science