Hi everyone
I am not completely sure whether this discussion belongs on the support mailing list or the public Google Group, but I think some of you could have valuable input.
We are currently working on an analysis of the OA dynamics: for a given set of publications, what is the increase in the open-access rate between two observation dates?
To do so, we use multiple Unpaywall snapshots, from 2018 to now.
From one observation date T1 to a later one T2, we expect the open-access rate to grow, because some of the publications that were closed at T1 have been archived in an open repository in the meantime.
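To make the measurement concrete, here is a minimal sketch of the comparison, assuming each snapshot has already been reduced to a mapping from DOI to an open/closed flag. The function name `oa_rate`, the placeholder DOIs, and the toy values are made up for illustration and are not taken from our actual data.

```python
def oa_rate(snapshot, dois):
    """Share of `dois` flagged open in one snapshot (dict: DOI -> bool)."""
    # Only count DOIs that are present in the snapshot.
    known = [doi for doi in dois if doi in snapshot]
    if not known:
        raise ValueError("none of the DOIs appear in this snapshot")
    return sum(snapshot[doi] for doi in known) / len(known)


# Toy snapshots with placeholder DOIs (hypothetical values, True = open).
snapshot_t1 = {"10.1000/a": False, "10.1000/b": False,
               "10.1000/c": True, "10.1000/d": False}
snapshot_t2 = {"10.1000/a": True, "10.1000/b": False,
               "10.1000/c": True, "10.1000/d": False}

# The set of publications is fixed; only the observation date changes.
dois = sorted(snapshot_t1)
rate_t1 = oa_rate(snapshot_t1, dois)
rate_t2 = oa_rate(snapshot_t2, dois)
increase = rate_t2 - rate_t1
print(f"T1: {rate_t1:.0%}, T2: {rate_t2:.0%}, increase: {increase:+.0%}")
```

Keeping the DOI set fixed matters here: otherwise the difference would mix real changes in openness with changes in which publications are covered.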
However, we also detect other cases, such as the following (the list is not exhaustive):
1) cases where it seems Unpaywall got better at detecting OA between T1 and T2
ex: 10.1364/oe.25.013816 was seen as closed as of 20201009 and as open as of 20211201 (it is actually a gold OA article published in a journal listed in DOAJ)
2) cases where it seems Unpaywall was right in T1 and not in T2
ex: 10.1016/j.molmed.2017.09.008
3) swinging cases like 10.3917/anso.162.0351. This one was detected as:
closed as of 20180927
open as of 20191122
closed as of 20201009
open as of 20211201
In reality, it is OA.
4) cases where the publication was opened by the publisher in the meantime, like
10.1016/j.jviscsurg.2016.09.006 (open after 1651 days according to Crossref metadata)
Of the 4 cases above, the 1st, 2nd, and 3rd are linked to the OA discovery itself (the actual openness of the publication may not have changed, only the Unpaywall diagnosis has). The 4th case is an interesting one, as it reflects a real change in the openness of the paper.
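As a rough illustration of how such histories could be sorted automatically, here is a minimal sketch that labels a DOI from its sequence of open/closed observations. The function name `classify_history` and the label strings are made up for this example; the two histories used below are the ones reported above.

```python
def classify_history(statuses):
    """Label a list of booleans (True = open), ordered by snapshot date."""
    # Keep only the consecutive pairs where the status actually changed.
    changes = [(a, b) for a, b in zip(statuses, statuses[1:]) if a != b]
    if not changes:
        return "stable"
    if len(changes) > 1:
        return "swinging"
    return "closed_to_open" if changes[0] == (False, True) else "open_to_closed"


# 10.3917/anso.162.0351 as seen on 20180927, 20191122, 20201009, 20211201
print(classify_history([False, True, False, True]))  # swinging

# 10.1364/oe.25.013816 as seen on 20201009 and 20211201
print(classify_history([False, True]))  # closed_to_open
```

Note that a single closed-to-open flip cannot, on its own, separate case 1 (better detection) from case 4 (a real opening by the publisher); that distinction needs outside evidence, such as the Crossref metadata mentioned for case 4.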
I am pretty sure the overall trends we observe in the Unpaywall data are correct, but as soon as one starts digging into a more specific scope, this fuzziness makes it harder to get a clear understanding of what is actually going on.