I was looking at how DOAJ is integrated in both Unpaywall and OpenAlex, and noticed some differences I had not realized before - which raised some questions for me. This might be a long post :-)
1) In Unpaywall, 'journal-is-in-doaj' is a variable (T/F) at article level, and appears to be linked to the year the journal started to publish all content using an open license (which is a field in DOAJ's own journal metadata).
Thus, for a given journal, the proportion of articles with the label 'journal-is-in-doaj' can vary anywhere between 0% and 100%. See the chart below for all journals with at least one article where 'journal-is-in-doaj' is true (n = 23,694).
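For anyone who wants to reproduce this kind of per-journal breakdown, here is a minimal sketch. It assumes the Unpaywall records have already been reduced to (journal, year, DOAJ flag) tuples; the journal keys and values below are made up for illustration and are not real Unpaywall data.

```python
from collections import defaultdict

# Hypothetical article-level records: (journal key, publication year, journal-is-in-doaj flag).
# Illustrative values only, not real Unpaywall data.
records = [
    ("journal_a", 2007, False),
    ("journal_a", 2008, False),
    ("journal_a", 2009, True),
    ("journal_a", 2010, True),
    ("journal_b", 2015, True),
    ("journal_b", 2016, True),
    ("journal_c", 2012, False),
]


def doaj_share_by_journal(rows):
    """Return {journal: fraction of its articles flagged as in DOAJ}."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for journal, _year, in_doaj in rows:
        totals[journal] += 1
        if in_doaj:
            flagged[journal] += 1
    return {journal: flagged[journal] / totals[journal] for journal in totals}


shares = doaj_share_by_journal(records)
# Keep only journals with at least one flagged article, as in the chart above.
with_any = {journal: share for journal, share in shares.items() if share > 0}
print(with_any)
```

Because the flag lives on each article, a journal that became fully OA partway through its run ends up with a share strictly between 0 and 1 (journal_a here).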
As an example, here's the breakdown for Journal of Biomedical Science (a BMC journal), showing that from 2009 onwards all articles have the label 'journal-is-in-doaj' = true, while before that it is false for all articles:
In contrast, in OpenAlex, 'is_in_doaj' is a variable assigned at journal level (specifically, it's a variable of the entity 'sources', with sources linked to articles through the variable 'locations').
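To make the structural difference concrete, here is a minimal sketch of how the flag is reached from an OpenAlex work. It assumes heavily trimmed work records with the OpenAlex fields `publication_year`, `primary_location`, `locations` and `source.is_in_doaj`; the records themselves are hypothetical. Since the flag sits on the shared source object, every work from that source gets the same value regardless of its publication year.

```python
# Hypothetical, heavily trimmed OpenAlex-style records (illustration only).
example_source = {"display_name": "Example Journal", "is_in_doaj": True}
works = [
    {"publication_year": 2005,
     "primary_location": {"source": example_source},
     "locations": [{"source": example_source}]},
    {"publication_year": 2015,
     "primary_location": {"source": example_source},
     "locations": [{"source": example_source}]},
]


def primary_source_in_doaj(work):
    """Flag of the source behind the work's primary location (source may be null)."""
    location = work.get("primary_location") or {}
    src = location.get("source") or {}
    return bool(src.get("is_in_doaj"))


def any_source_in_doaj(work):
    """True if any of the work's locations points at a source flagged is_in_doaj."""
    return any(
        bool((loc.get("source") or {}).get("is_in_doaj"))
        for loc in work.get("locations") or []
    )


for work in works:
    # Both years print True: the publication year plays no role in the flag.
    print(work["publication_year"], primary_source_in_doaj(work))
```

Whether to use only the primary location or any location is a choice for the analysis; for "published in a DOAJ journal", the primary location seems the closer match.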
Probably as a result of this journal-level assignment, it appears that in OpenAlex the year the journal first became OA is not taken into account at article level. See the overall plot and the specific example below, now for OpenAlex:
Journal of Biomedical Science (BMC):
This gets complicated real fast when also looking at the difference in the total number of journals. Without going into that, the main point is that Unpaywall data allow a more fine-grained assessment of whether an article can or should be counted as published in a DOAJ journal at the time of publication, while in OpenAlex the information on DOAJ journals is more binary and thus less useful for longitudinal analysis.
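As a rough illustration of what keeping the fine-grained view could look like, here is a minimal sketch that derives a year-aware article-level flag from a journal-level flag plus the DOAJ year in which the journal started publishing all content under an open license. The rule and the `oa_start_year` parameter are assumptions that mirror what Unpaywall *appears* to do. They are not a documented Unpaywall or OpenAlex algorithm.

```python
from typing import Optional


def article_in_doaj(journal_in_doaj: bool, oa_start_year: Optional[int], pub_year: int) -> bool:
    """Hypothetical year-aware flag: journal is in DOAJ AND article appeared in/after the OA start year."""
    if not journal_in_doaj:
        return False
    if oa_start_year is None:
        # Assumption: with no start year recorded, fall back to the journal-level flag.
        return True
    return pub_year >= oa_start_year


# 2009 mirrors the Unpaywall pattern described above for Journal of Biomedical Science.
flags = {year: article_in_doaj(True, 2009, year) for year in range(2006, 2012)}
print(flags)
```

Falling back to the journal-level flag when no start year is known keeps the result identical to the binary OpenAlex-style view in that case, so the year-aware rule only ever narrows it.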
I was wondering whether there are plans to bring Unpaywall and OpenAlex data more closely in line in this respect (and if so, hopefully in the direction of keeping the more fine-grained data, although I realize that's harder in the OpenAlex data structure)?
I hope this was all somewhat clear :)