Deprecating oa_date in the API and snapshot


Casey M

Dec 13, 2023, 2:42:09 PM
to Unpaywall discussion
Good afternoon,

We're updating some parts of the Unpaywall software, which will lead us to remove the field "oa_date" that is provided in the API and snapshot.

We plan to make this change on January 5th. The value will remain available in older snapshots, but from January 5th onward it will no longer be included.

Thanks,
Casey

--
Casey Meyer, CTO
OurResearch
We build tools to make scholarly research more open, connected, and reusable, for everyone.

Bianca Kramer

Dec 13, 2023, 5:08:13 PM
to Casey M, Unpaywall discussion
Hi Casey,

Thanks for the update - could you share a bit more about the reason behind deprecating oa_date?

It's a real loss for the use case of identifying (with all due caveats, of course) observed (vs. required/allowed) embargoes for repository-based OA, which, to my knowledge, is a feature uniquely enabled by Unpaywall data.
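To make this use case concrete, here is a minimal Python sketch of how an observed repository embargo could be computed for a single record. It assumes Unpaywall-style JSON with a top-level `published_date` and an `oa_locations` list whose entries carry `host_type` and `oa_date` as ISO dates; the function name is hypothetical, and since `oa_date` is being removed it would only work on older snapshots.

```python
from datetime import date
from typing import Optional


def observed_repository_embargo(record: dict) -> Optional[int]:
    """Days from publication to the earliest repository oa_date.

    Assumes (hypothetically) an Unpaywall-style record: "published_date"
    as YYYY-MM-DD and "oa_locations" entries with "host_type" and "oa_date".
    Returns None when either date is unavailable.
    """
    published = record.get("published_date")
    if not published:
        return None
    # Keep only repository copies that actually carry an oa_date
    repo_dates = [
        date.fromisoformat(loc["oa_date"])
        for loc in record.get("oa_locations") or []
        if loc.get("host_type") == "repository" and loc.get("oa_date")
    ]
    if not repo_dates:
        return None
    # Negative values mean the repository copy predates the recorded publication date
    return (min(repo_dates) - date.fromisoformat(published)).days
```

Taking the earliest repository date, rather than the best location's date, is a design choice: it measures when any repository copy first appeared, which is closer to an observed embargo than the currently preferred location.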

I could not make out from your message whether the field will be deprecated entirely, or removed from the API and snapshot but still be available from the data feed, so any clarification you would have on that would also be welcome. 

Apologies for being a bit curmudgeonly about this... it's one of my favourite variables in Unpaywall... ;-)

kind regards,
Bianca


dan.p...@deltathink.com

Dec 14, 2023, 5:24:53 AM
to Unpaywall discussion
+1 to Bianca's question :-)

Casey Meyer

Dec 14, 2023, 3:32:49 PM
to dan.p...@deltathink.com, Unpaywall discussion
Hi Bianca,

Yes, it's completely fair to want more of an explanation. We mean that the field will be deprecated entirely, so it will not be available in the API, the snapshot, or the data feed.

We're doing a big refactor that will make it easier to choose the best parts of different data sources, so that Unpaywall and OpenAlex have an even higher-quality, single representation of each article. We've realized that carrying oa_date over into this refactor would take a lot of work. We receive many support requests, but very few of them concern oa_date, so our sense is that it is used far less than other features and that the time is better spent improving other areas of Unpaywall/OpenAlex.

The code that powers oa_date is completely open and can still be referenced in case we want to come back to this later.

Thanks,
Casey

Jonge, H. de [Hans]

Dec 14, 2023, 4:17:25 PM
to Casey Meyer, dan.p...@deltathink.com, Unpaywall discussion
Dear Casey,
Thanks for your explanation.
I still think it would be a real pity to deprecate oa_date. At the Dutch Research Council NWO we have been using this data point for several years in our annual OA reports to monitor the delay with which research funded by our council becomes OA. We do this not to check compliance at the level of individual researchers or to penalise our grantees, but simply because we are genuinely interested in the effectiveness of our zero-embargo OA policy and in how publishers' embargo policies develop. I can imagine that many more funding agencies are interested in this kind of data (especially when it is openly available). And with the rise of rights retention policies at various institutions, and especially the zero-embargo policy outlined by the Nelson memo in the US, I can imagine that many institutions and agencies are interested in, if not dependent on, this kind of data. See our most recent report as an example: https://doi.org/10.5281/zenodo.10024149
I cannot judge how much oa_date is discussed compared with other Unpaywall and OA features. But could it simply be that this data is so highly appreciated that it does not need any discussion? We have valued this data enormously!
All best regards, and please let me assure you that we are HUGE fans and supporters of all the work you are doing in the area of open research information!
Hans de Jonge
Director of Open Science Policies
Dutch Research Council NWO


From: unpa...@googlegroups.com <unpa...@googlegroups.com> on behalf of Casey Meyer <ca...@ourresearch.org>
Sent: Thursday, December 14, 2023 9:32:35 PM
To: dan.p...@deltathink.com <dan.p...@deltathink.com>
CC: Unpaywall discussion <unpa...@googlegroups.com>
Subject: Re: Deprecating oa_date in the API and snapshot

Kate Nyhan

Dec 15, 2023, 12:00:46 PM
to Unpaywall discussion

Hi, FWIW, thanks to this thread I just realized that oa_date is (was) available. I think it's very useful information; you need it to study embargoed OA. In the past I was able to do that on a small scale by comparing the PubMed EPubDate and the PubMed Central PmcLiveDate variables. That was a big hassle, and it only accounted for public access in one repository. The Unpaywall oa_date would be much more useful and convenient!

Kate Nyhan

Casey Meyer

Dec 15, 2023, 12:54:45 PM
to Kate Nyhan, Unpaywall discussion
Hello Hans and Kate,

Thank you for that feedback and for explaining the impact this change will have on your research.

Thanks,
Casey

Eck, N.J.P. van (Nees Jan)

Dec 18, 2023, 12:27:11 PM
to Casey Meyer, Kate Nyhan, Unpaywall discussion

Hi Casey,

 

At the Centre for Science and Technology Studies (CWTS) at Leiden University we support the point already made by others. oa_date is an important field in scientometric analyses that we perform based on Unpaywall data. We hope this field can be retained in future versions of the data.

 

Best,

Nees

Federico Leva

Dec 19, 2023, 2:06:39 AM
to Casey M, Unpaywall discussion
On 13/12/23 21:42, Casey M wrote:
> We're updating some parts of the Unpaywall software,

I'm very curious to hear more!

> which will lead us to
> remove the field "oa_date" that is provided in the API and snapshot.

That's a pity (wasn't this field also instrumental for the "The Future
of OA" paper?) but I see how it's hard to maintain. Is it still based on
the giant postgresql table which records retrievals?

Personally I don't really use oa_date except when I'm debugging false
positives for OAbot on the English Wikipedia
(https://phabricator.wikimedia.org/T344114 ). For bronze OA articles, I
didn't trust it much. I hope the "updated" date for individual OA
locations remains; otherwise I'll be lost.

Perhaps if you tell us more about the changes and the future data
structure people can come up with an alternative definition/estimation
method which could rely on the surviving data fields.

With an eye to OpenAlex integration, I think it would be worthwhile for
both OpenAlex and Unpaywall to have a "first bright archival date", with
fatcat/IA scholar data. I often use fatcat to confirm the date when a
work was gratis OA at the publisher. I don't see a precalculated date
field, but for the Wayback Machine it can be derived from the URL of the
webcapture:
<https://api.fatcat.wiki/redoc#tag/webcaptures/operation/get_webcapture>.

(To keep this working well, ideally you'd also submit newly discovered
unpaywall URLs to the wayback machine and run a bot to update the fatcat
items.)
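The suggestion above relies on Wayback Machine capture URLs embedding the capture time as a 14-digit timestamp (`web.archive.org/web/YYYYMMDDhhmmss/<original URL>`). A minimal Python sketch, assuming you already have a list of such capture URLs for a work (for example from fatcat webcaptures; fetching them is not shown), could extract the earliest capture like this. The function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Optional

# Wayback Machine capture URLs carry a 14-digit timestamp after /web/
_WAYBACK_TS = re.compile(r"web\.archive\.org/web/(\d{14})")


def capture_datetime(wayback_url: str) -> Optional[datetime]:
    """Parse the capture timestamp out of a Wayback Machine URL, if present."""
    match = _WAYBACK_TS.search(wayback_url)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def first_capture(urls: Iterable[str]) -> Optional[datetime]:
    """Earliest capture time among the given URLs, or None if none parse."""
    stamps = [ts for ts in map(capture_datetime, urls) if ts is not None]
    return min(stamps, default=None)
```

Note that the earliest capture only bounds the OA date from above: it shows the content was reachable by then, not when it first became free.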

Best,
Federico

Eric Schares

Dec 20, 2023, 11:17:41 AM
to Unpaywall discussion
Hi all,

I'm following this with interest and would like to echo my colleagues' call not to remove the `oa_date` field completely. It is a critical piece in detecting whether a paper classified as Hybrid was free to read at the time of publication (possibly via an APC) or only became free after an embargo expired.

I have recently been looking at Elsevier articles classified as Hybrid and found quite a few that became available as part of the "Open Archive" or after a journal-specific embargo. (Incidentally, I think these should instead be classified as Bronze, since they were not free to read at the time of publication.)

Neuron Hybrid DOIs, 2015:
- 10.1016/j.neuron.2015.02.005 (Open Archive): published_date "2015-03-01", oa_date on best location "2016-03-04"
- 10.1016/j.neuron.2014.11.025 (Open Access): published_date "2015-01-01", oa_date on best location "2014-11-26"
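This check can be sketched in a few lines of Python. It simply compares `oa_date` with `published_date` for one location; the function name and the optional `grace_days` parameter are hypothetical, and the example values are the two Neuron DOIs listed above.

```python
from datetime import date


def free_at_publication(published_date: str, oa_date: str, grace_days: int = 0) -> bool:
    """True if the location became OA no later than publication plus grace_days."""
    delay = (date.fromisoformat(oa_date) - date.fromisoformat(published_date)).days
    return delay <= grace_days


# (published_date, oa_date on best location), as reported in the thread
examples = {
    "10.1016/j.neuron.2015.02.005": ("2015-03-01", "2016-03-04"),  # Open Archive
    "10.1016/j.neuron.2014.11.025": ("2015-01-01", "2014-11-26"),  # Open Access
}

for doi, (published, oa) in examples.items():
    print(doi, free_at_publication(published, oa))
```

With `grace_days=0` the Open Archive DOI comes out False (its oa_date is 369 days after published_date) and the Open Access DOI comes out True (its oa_date is 36 days before the recorded issue date, presumably reflecting online-first release).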

Thank you,
Eric Schares
Iowa State University

Petra Otten

Dec 21, 2023, 6:07:58 AM
to Unpaywall discussion

Dear Casey,

Thank you for your notification about deprecating the oa_date field in the API and the snapshot.
But I would like to express my concerns about this decision. I’m the project leader of a project initiated by the consortium of Dutch university libraries and the Royal Library (UKB) and the organisation of Universities of the Netherlands (UNL). In this project we aim to enhance our yearly report about Dutch Open Access for the national government.

We consider Unpaywall to be the gold standard when it comes to verifying Open Access statuses, and we greatly appreciate your immense achievements and efforts. It is a fundamental building block for us and for the worldwide Open Access community.
One of the key fields we use in our new reports is oa_date, which lets us detect immediate or near-immediate OA for publications from a specific publication year, independent of the moment the monitoring is done. We want to emphasize the critical importance of the oa_date field to us. Its absence would have a substantial and detrimental impact.
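A minimal Python sketch of such a per-year indicator, assuming records have been reduced to `(published_date, oa_date)` pairs with `oa_date` set to `None` for works that are not OA. The 30-day window for "near immediate" and the function name are hypothetical choices, not definitions taken from the Dutch OA Monitor.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

IMMEDIATE_WINDOW_DAYS = 30  # hypothetical cut-off for "near immediate" OA


def immediate_oa_share(
    records: Iterable[Tuple[str, Optional[str]]],
    window_days: int = IMMEDIATE_WINDOW_DAYS,
) -> Dict[int, float]:
    """Share of works per publication year that became OA within window_days."""
    totals: Dict[int, int] = defaultdict(int)
    immediate: Dict[int, int] = defaultdict(int)
    for published, oa in records:
        pub = date.fromisoformat(published)
        totals[pub.year] += 1
        # OA before the publication date (negative delay) also counts as immediate
        if oa is not None and (date.fromisoformat(oa) - pub).days <= window_days:
            immediate[pub.year] += 1
    return {year: immediate[year] / totals[year] for year in sorted(totals)}
```

Because oa_date records when a work first became OA rather than whether it is OA today, re-running this on the same records at a later date yields the same shares, which is the property described above.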

So please, could you reconsider this decision?

Kind regards,

Petra Otten MA
Project manager Dutch OA Monitor Project
--
petra...@wur.nl