Swinging OA status

ejean...@gmail.com

Jan 28, 2022, 9:56:35 AM
to Unpaywall discussion
Hi everyone

I am not completely sure whether this discussion belongs on the support mailing list or the public Google Group, but I think some of you could have valuable input.
 
We are currently working on an analysis of the OA dynamics: for a given set of publications, what is the increase of the open-access rate between two observation dates?
To do so, we use multiple Unpaywall snapshots, from 2018 to now.

On a large perimeter (we are studying a set of 1.3 million DOIs), the results are pretty smooth: https://frenchopensciencemonitor.esr.gouv.fr/integration/en/publi.general.dynamique-ouverture.chart-evolution-proportion where each line is based on a different Unpaywall snapshot (graph taken from the new FrenchOpenScienceMonitor.esr.gouv.fr).

From one observation date T1 to a later observation date T2, we expect the open access rate to increase, because some of the publications that were closed at T1 have been archived in an open repository in the meantime.
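
To make the measurement concrete, here is a minimal Python sketch of the comparison between two observation dates. It assumes each snapshot has already been reduced to a mapping from DOI to an open/closed flag; the DOIs, flags, and function names below are made up for illustration and are not taken from our pipeline.

```python
from typing import Dict, List

# Hypothetical toy data: snapshot date -> {DOI -> is_oa flag in that snapshot}.
# The DOIs and flags are invented for illustration only.
SNAPSHOTS: Dict[str, Dict[str, bool]] = {
    "20201009": {"10.0000/a": False, "10.0000/b": True,
                 "10.0000/c": False, "10.0000/d": False},
    "20211201": {"10.0000/a": True, "10.0000/b": True,
                 "10.0000/c": False, "10.0000/d": True},
}


def oa_rate(snapshot: Dict[str, bool], dois: List[str]) -> float:
    """Share of the given DOIs that one snapshot flags as open."""
    if not dois:
        raise ValueError("the DOI set must not be empty")
    # Assumption: a DOI missing from a snapshot is counted as closed.
    return sum(1 for doi in dois if snapshot.get(doi, False)) / len(dois)


def oa_rate_change(t1: str, t2: str, dois: List[str]) -> float:
    """Difference in OA rate for the same DOI set between T1 and T2."""
    return oa_rate(SNAPSHOTS[t2], dois) - oa_rate(SNAPSHOTS[t1], dois)


# The DOI set is fixed once, so both dates are measured on the same perimeter.
DOIS = sorted(SNAPSHOTS["20201009"])
print(oa_rate(SNAPSHOTS["20201009"], DOIS))          # 0.25
print(oa_rate(SNAPSHOTS["20211201"], DOIS))          # 0.75
print(oa_rate_change("20201009", "20211201", DOIS))  # 0.5
```

Keeping the DOI set identical across dates is the important design choice: the difference then reflects only changes in the reported status, not changes in the set being measured.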

However, we also detect other cases, such as the following (the list is not exhaustive):

1) cases where it seems Unpaywall got better at detecting OA between T1 and T2
ex: 10.1364/oe.25.013816 was seen as closed as of 20201009 and as open as of 20211201 (it is actually a gold OA article published in a DOAJ journal)

2) cases where it seems Unpaywall was right in T1 and not in T2
ex: 10.1016/j.molmed.2017.09.008
https://api.oadoi.org/v2/10.1016/j.molmed.2017.09.008?email=unpa...@impactstory.org indicates it is OA whereas it does not seem to be (as of today)

3) swinging cases like 10.3917/anso.162.0351. This one was detected as:
closed as of 20180927
open as of 20191122
closed as of 20201009
open as of 20211201
In reality, it is OA.

4) cases where the publication was opened by the publisher in the meantime, like
10.1016/j.jviscsurg.2016.09.006 (open after 1651 days according to crossref metadata)

Of the 4 cases above, the 1st, 2nd, and 3rd are linked to OA discovery itself (the actual openness of the publication may not have changed, only the Unpaywall diagnosis has). The 4th case is an interesting one, as it reflects a real change in the openness of the paper.
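
One way to separate these patterns mechanically is to look at the sequence of open/closed flags a DOI gets across successive snapshots. The sketch below is only an illustration under that assumption; the category labels and the function name are my own. Note that it cannot tell case 1 from case 4, since both appear as a single closed-to-open change.

```python
from typing import List


def classify_trajectory(is_oa_by_snapshot: List[bool]) -> str:
    """Label a DOI from its open/closed flags in chronological snapshot order."""
    if not is_oa_by_snapshot:
        raise ValueError("at least one observation is required")
    # Count how many times the flag differs between consecutive snapshots.
    changes = sum(
        1
        for before, after in zip(is_oa_by_snapshot, is_oa_by_snapshot[1:])
        if before != after
    )
    if changes == 0:
        return "stable_open" if is_oa_by_snapshot[0] else "stable_closed"
    if changes == 1:
        # Exactly one change: the direction is given by the final state.
        return "opened" if is_oa_by_snapshot[-1] else "closed"
    # Two or more changes means the status went back and forth.
    return "swinging"


# Case 3 above (10.3917/anso.162.0351): closed, open, closed, open.
print(classify_trajectory([False, True, False, True]))
# Case 1 above (10.1364/oe.25.013816): closed in 20201009, open in 20211201.
print(classify_trajectory([False, True]))
```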

I am pretty sure the overall trends we observe thanks to Unpaywall data are correct, but as soon as one starts digging down to a more specific perimeter, this fuzziness makes it more difficult to have a clear understanding of what is actually going on.
 
I was wondering if you guys at Unpaywall or folks in the community already had a look at these swinging OA statuses?

Cheers,

Eric Jeangirard


Richard Orr

Feb 3, 2022, 1:58:00 PM
to ejean...@gmail.com, Unpaywall discussion
Hi Eric,

There's a lot to get into here, so to jump straight to your question - have we had a look at these swinging OA statuses - no, but we should start.

It's interesting to see you separate these into groups where something about Unpaywall changed (1,2,3), and the article's real status changed (4). The groups I see first are the ones where we're right (1,4) and where we're wrong (2,3).

Case 2 is due to a bug I introduced between snapshots. I fixed it today. It didn't affect enough articles to show up in our statistical monitoring or individual article test cases. It's a good example of why we appreciate bug reports. There will be bugs on this scale, and we'll fix them.

Case 3 is the interesting one to me. The swinging OA status itself is a sign that something is wrong. The OA status history since we started tracking changes is:

2019-11-19  Bronze
2020-02-04  Closed
2021-01-24  Bronze
2021-03-21  Closed
2021-09-04  Bronze
2021-12-18  Closed

That's a pretty clear sign of a Bronze article that our process fails to detect sometimes. This is something we can look for proactively, and either fix our process or use our knowledge of the OA status history to stabilize it.
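
As a rough illustration of what looking for this proactively could mean, here is a minimal Python sketch run on the history above. It is not how Unpaywall works: the function names, the threshold of three changes, and the rule of falling back to the most recent non-Closed status are all assumptions made for the example.

```python
from typing import List, Tuple

# (date, status) pairs; ISO dates sort correctly as plain strings.
History = List[Tuple[str, str]]

# The status history listed above for 10.3917/anso.162.0351.
HISTORY: History = [
    ("2019-11-19", "Bronze"),
    ("2020-02-04", "Closed"),
    ("2021-01-24", "Bronze"),
    ("2021-03-21", "Closed"),
    ("2021-09-04", "Bronze"),
    ("2021-12-18", "Closed"),
]


def count_status_changes(history: History) -> int:
    """Number of times the status differs from the previous record."""
    statuses = [status for _, status in sorted(history)]
    return sum(1 for a, b in zip(statuses, statuses[1:]) if a != b)


def stabilized_status(history: History, min_changes: int = 3) -> str:
    """Hypothetical heuristic: if the status has changed at least
    `min_changes` times, assume detection is flaky and report the most
    recent non-Closed status; otherwise report the latest status."""
    ordered = sorted(history)
    if not ordered:
        raise ValueError("history must not be empty")
    latest = ordered[-1][1]
    if count_status_changes(ordered) < min_changes:
        return latest
    for _, status in reversed(ordered):
        if status != "Closed":
            return status
    return latest


print(count_status_changes(HISTORY))  # 5
print(stabilized_status(HISTORY))     # Bronze
```

A rule like this trades one error for another: it would hide a genuine later closure of an article that had been flipping, so the threshold would need checking against known cases.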

From an Unpaywall maintenance perspective, we expect to keep finding apparently-new OA articles that have been OA all along as our process improves. Case 1 is purely good from that perspective: either DOAJ got more accurate, our interpretation of DOAJ got more accurate, or the journal flipped. From the perspective of an OA researcher, I agree it would be good to know the reason for the change. At the journal level, this might be something we can investigate in the future. At the article level, I think it will always be hard to tell automatically whether each change is attributable to our process or to the article's real status.

Best,
Richard


--
Richard Orr
Lead Developer, Unpaywall
OurResearch
We build tools to make scholarly research more open, connected, and reusable, for everyone.

Bianca Kramer

Feb 5, 2022, 6:30:36 PM
to Richard Orr, ejean...@gmail.com, Unpaywall discussion
Hi Eric, Richard,

Not per se a contribution to the sleuthing at hand, but this discussion on tracking changing OA status reminded me of an analysis I did a little while ago as part of my tracking of the papers in the PMC Public Health Emergency (PHE) collections (where publishers make Covid-related papers permanently or (mostly) temporarily available in PMC). 

For my most recent analysis in November 2021, I also looked at changes in OA status. I still need to add this to the GitHub repo (and/or write it up in a blog post), but I thought you might appreciate the pretty pictures here :)

The Sankey diagrams show the OA status (according to Unpaywall) of records in the PMC PHE collections in subsequent Unpaywall snapshots, for the four largest publishers represented in the PMC PHE collections (totals ranging from 77K for Elsevier to 11K for OUP).

The most interesting to me in this context are the switches from green to bronze and vice versa, as they might reflect changes in availability on the publisher platform (or detection thereof!) of COVID-related papers. But I do need to think through the methodological aspects a bit more, too.
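
For anyone who wants to reproduce this kind of view, the flows in a Sankey diagram are just counts of (status in earlier snapshot, status in later snapshot) pairs. Below is a minimal Python sketch of that counting step, assuming each snapshot has been reduced to a DOI-to-status mapping; the identifiers and statuses are invented toy data, not the PMC PHE records.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical toy data: DOI -> OA status in two consecutive snapshots.
EARLIER: Dict[str, str] = {"a": "green", "b": "green", "c": "bronze", "d": "closed"}
LATER: Dict[str, str] = {"a": "bronze", "b": "green", "c": "green",
                         "d": "closed", "e": "gold"}


def transition_counts(earlier: Dict[str, str],
                      later: Dict[str, str]) -> Counter:
    """Count (earlier status, later status) pairs for DOIs present in both."""
    shared = earlier.keys() & later.keys()
    return Counter((earlier[doi], later[doi]) for doi in shared)


def switches_only(counts: Counter) -> Dict[Tuple[str, str], int]:
    """Keep only the flows where the status actually changed."""
    return {pair: n for pair, n in counts.items() if pair[0] != pair[1]}


FLOWS = transition_counts(EARLIER, LATER)
print(FLOWS[("green", "bronze")])  # 1
print(FLOWS[("bronze", "green")])  # 1
print(switches_only(FLOWS))
```

Records that appear in only one of the two snapshots (like "e" here) are left out of the flows; whether to show them as a separate "not present" node is a methodological choice.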

Anyway, pretty picture attached and copy-pasted below, and a prompt for myself to do something with these data :)

[Image: image.png, Sankey diagrams of OA status across Unpaywall snapshots]

kind regards,
Bianca

On Thu, Feb 3, 2022 at 19:58, Richard Orr <ric...@ourresearch.org> wrote:
PMC_PHE_UPW_Sankey.png

Bianca Kramer

Feb 5, 2022, 6:33:50 PM
to Richard Orr, ejean...@gmail.com, Unpaywall discussion
BTW, really nice to have longitudinal OA data by observation date in the French monitor, too. Great work, Eric and team!

On Sun, Feb 6, 2022 at 00:29, Bianca Kramer <bianca...@gmail.com> wrote: