Next snapshot release

Nees Jan van Eck

Mar 22, 2023, 5:31:13 PM
to Unpaywall discussion
Hi Unpaywall team, when will the next snapshot be released?

Thanks,
Nees

Casey M

Apr 10, 2023, 10:12:21 PM
to Unpaywall discussion
Hi Nees,

We need to do that soon! We can plan to create it in the next two weeks.

Thanks,
Casey

Nees Jan van Eck

Apr 11, 2023, 3:22:11 AM
to Unpaywall discussion
Many thanks, Casey. That would be great.

Kamil Mroczek

May 8, 2023, 11:21:58 AM
to Unpaywall discussion
Hello,

Has the latest snapshot been created yet? I recently filled out the snapshot form, but it still points to the 2022-03-09 version.

Thanks!

Kamil

Casey M

May 9, 2023, 8:56:36 PM
to Kamil Mroczek, Unpaywall discussion
Hi Kamil,

Instead of creating a new Unpaywall snapshot, we would like to refer you to our new product, OpenAlex. Have you heard of it by chance? 

All of the data from Unpaywall now goes into OpenAlex, which includes much more than what is in Unpaywall. The OpenAlex snapshot has a format similar to the Unpaywall snapshot and is updated monthly. In fact, the latest snapshot is dated May 3, 2023. You can read about OpenAlex here: https://openalex.org/ and instructions for downloading the latest snapshot are here: https://docs.openalex.org/download-all-data/openalex-snapshot. You can see some of the data by using the API. Here is an example API call: https://api.openalex.org/works. Give it a try and let me know what you think!

Thanks,
Casey

--
You received this message because you are subscribed to the Google Groups "Unpaywall discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unpaywall+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/unpaywall/07e19588-b7fc-4b90-b592-d98677705a42n%40googlegroups.com.

Kamil Mroczek

May 10, 2023, 11:21:31 AM
to Casey M, Unpaywall discussion
Hi Casey,

Thanks for this. I will check out OpenAlex.

A while back I compared OpenAlex and Unpaywall and determined that Unpaywall was the correct product for us to use. Is there a page that details the differences, or can you quickly summarize which use cases each is intended to serve?

Thanks!

Kamil

Eck, N.J.P. van (Nees Jan)

May 10, 2023, 4:07:37 PM
to Steve Gruber, Unpaywall discussion, Casey Meyer

Hi Steve,

Yes, I know that we have access to the Unpaywall data feed subscription, but for particular applications it is more convenient for us to make use of the snapshot. We will migrate to OpenAlex. No problem.

Best,
Nees

From: Steve Gruber <st...@ourresearch.org>
Sent: Wednesday, May 10, 2023 9:55 PM
To: Eck, N.J.P. van (Nees Jan) <eckn...@cwts.leidenuniv.nl>
Cc: Unpaywall discussion <unpa...@googlegroups.com>
Subject: Re: Next snapshot release

Hi Nees,

I hope this note finds you well.

Just to add here - CWTS does have access to the Unpaywall Data Feed subscription and you should be receiving the daily changefile updates.

Thed van Leeuwen is the main contact for that.

Thanks,
Steve

Eric Jeangirard

May 11, 2023, 4:05:43 AM
to Casey M, Kamil Mroczek, Unpaywall discussion
Hi Casey

I'd like to step in here because I'm a bit worried: your answer suggests that Unpaywall data is "replicated" in OpenAlex data. In fact, I see many differences in the 'locations' field.

For example, for DOI 10.1016/j.jbc.2021.100324:
On one hand, Unpaywall indicates two OA locations:
- Location 1:
"host_type": "publisher",
"url_for_pdf": "http://www.jbc.org/article/S0021925821000958/pdf"
- Location 2:
"host_type": "repository",
"url_for_landing_page": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7949154"

On the other hand, OpenAlex has only one OA location,
- Location
and specifies "any_repository_has_fulltext": false

In this case (and it's only one but I can easily find thousands of them), Unpaywall data and OpenAlex data are not the same, and Unpaywall data seems better (to me).
In particular, OpenAlex data misses the repository hosting, which hides the shadow green open access.
Also, OpenAlex does not provide the direct PDF link for download.

So depending on the use case, I would be very reluctant (for now) to replace Unpaywall data with OpenAlex data.
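
A comparison like the one above can be scripted. The following is only a minimal sketch, not an official client: it assumes an Unpaywall record exposes `oa_locations` entries with `host_type`, `url_for_pdf` and `url_for_landing_page` (as quoted in the example), and that an OpenAlex work exposes a `locations` list whose entries carry `is_oa`, `pdf_url`, `landing_page_url` and a `source.type`. The function names are hypothetical, and the OpenAlex sample record is a hand-written stand-in, not real API output.

```python
def unpaywall_repository_urls(record):
    """Collect URLs of repository-hosted OA locations from an Unpaywall record."""
    urls = []
    for loc in record.get("oa_locations") or []:
        if loc.get("host_type") == "repository":
            url = loc.get("url_for_pdf") or loc.get("url_for_landing_page")
            if url:
                urls.append(url)
    return urls


def openalex_repository_urls(work):
    """Collect URLs of OA repository locations from an OpenAlex work (assumed schema)."""
    urls = []
    for loc in work.get("locations") or []:
        source = loc.get("source") or {}
        if loc.get("is_oa") and source.get("type") == "repository":
            url = loc.get("pdf_url") or loc.get("landing_page_url")
            if url:
                urls.append(url)
    return urls


def repository_copy_missing_in_openalex(unpaywall_record, openalex_work):
    """True when Unpaywall lists a repository copy and OpenAlex lists none."""
    has_unpaywall_copy = bool(unpaywall_repository_urls(unpaywall_record))
    has_openalex_copy = bool(openalex_repository_urls(openalex_work))
    return has_unpaywall_copy and not has_openalex_copy


# Unpaywall values copied from the example in this message.
UNPAYWALL_SAMPLE = {
    "doi": "10.1016/j.jbc.2021.100324",
    "oa_locations": [
        {
            "host_type": "publisher",
            "url_for_pdf": "http://www.jbc.org/article/S0021925821000958/pdf",
        },
        {
            "host_type": "repository",
            "url_for_landing_page": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7949154",
        },
    ],
}

# Hypothetical stand-in: one publisher location and no repository location.
OPENALEX_SAMPLE = {
    "open_access": {"any_repository_has_fulltext": False},
    "locations": [
        {
            "is_oa": True,
            "landing_page_url": "https://doi.org/10.1016/j.jbc.2021.100324",
            "pdf_url": None,
            "source": {"type": "journal"},
        }
    ],
}

if __name__ == "__main__":
    print(repository_copy_missing_in_openalex(UNPAYWALL_SAMPLE, OPENALEX_SAMPLE))
```

Running the same check over a list of DOIs would give a rough count of records where the repository copy is only visible in Unpaywall.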

Regards

Eric


Bianca Kramer

May 11, 2023, 5:06:06 AM
to Eric Jeangirard, Casey M, Kamil Mroczek, Unpaywall discussion
Hi Casey, Eric, all, 

I would like to support Eric's points of caution. Especially if repository hosting is consistently missing for gold/hybrid/bronze OA, that is a problem both for analysis and for the visibility of green OA.

A few additional observations:

  • Additional fields from oa_locations in Unpaywall that do not yet seem to be integrated in OpenAlex:
    • "updated" and "oa_date" (use case: for green OA, useful for exploring empirical embargo times)
    • "pmh_id" (I know the Unpaywall documentation says this is primarily for internal debugging, so it's fair not to expect it to be included per se; nonetheless, it's useful information for checking repository records)
    • "journal_is_in_doaj" and "journal_is_oa" (for publisher locations) - I think all the other information needed to do custom OA classification (which might be different from "oa_color") is there, but these are missing.
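
To illustrate the embargo use case for "oa_date", here is a rough sketch. It assumes an Unpaywall record with a top-level `published_date` and repository entries in `oa_locations` carrying `oa_date` and `pmh_id`, all as ISO `YYYY-MM-DD` strings. The function name `embargo_days` and the sample values (dates and the `pmh_id`) are made up for illustration.

```python
from datetime import date


def embargo_days(record):
    """Return (pmh_id, days from publication to OA availability) per repository location."""
    published = record.get("published_date")
    if not published:
        return []
    published_on = date.fromisoformat(published)
    results = []
    for loc in record.get("oa_locations") or []:
        oa_date = loc.get("oa_date")
        # Only repository (green OA) copies with a known oa_date are considered.
        if loc.get("host_type") == "repository" and oa_date:
            delay = (date.fromisoformat(oa_date) - published_on).days
            results.append((loc.get("pmh_id"), delay))
    return results


# Made-up record: the dates and the pmh_id are illustrative only.
SAMPLE_RECORD = {
    "published_date": "2021-01-01",
    "oa_locations": [
        {"host_type": "publisher", "oa_date": "2021-01-01"},
        {
            "host_type": "repository",
            "oa_date": "2022-01-01",
            "pmh_id": "oai:example.org:123",
        },
        {"host_type": "repository", "oa_date": None},
    ],
}

if __name__ == "__main__":
    print(embargo_days(SAMPLE_RECORD))
```

Without "oa_date" on each location, this kind of per-repository delay cannot be derived from the record alone.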

Regarding Eric's point about the direct link to the PDF: in the example given, the direct link to the PDF is in the field "oa_url" in OpenAlex (https://api.openalex.org/works/doi:10.1016/j.jbc.2021.100324), but indeed, the field "pdf_url" in "best_oa_location" is null, while there is a field "url_for_pdf" in Unpaywall (https://api.unpaywall.org/v2/10.1016/j.jbc.2021.100324?email=bianca...@gmail.com), so that might have been a mistake in data ingest/processing?
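
For anyone who wants to repeat this check for other DOIs, a small sketch follows. The two URL patterns are the ones used in this message. The field paths (`open_access.oa_url` and `best_oa_location.pdf_url` in OpenAlex, `best_oa_location.url_for_pdf` in Unpaywall) reflect my reading of the two schemas and should be treated as assumptions. The helper names, the example email address, and the two sample records are hypothetical stand-ins; no request is sent here.

```python
from urllib.parse import urlencode

OPENALEX_WORKS = "https://api.openalex.org/works"
UNPAYWALL_V2 = "https://api.unpaywall.org/v2"


def openalex_work_url(doi):
    """Build the OpenAlex lookup URL for a DOI."""
    return f"{OPENALEX_WORKS}/doi:{doi}"


def unpaywall_url(doi, email):
    """Build the Unpaywall lookup URL for a DOI, with the email query parameter."""
    return f"{UNPAYWALL_V2}/{doi}?{urlencode({'email': email})}"


def pdf_link_fields(openalex_work, unpaywall_record):
    """Put the three PDF-related fields side by side for comparison."""
    open_access = openalex_work.get("open_access") or {}
    oa_best = openalex_work.get("best_oa_location") or {}
    up_best = unpaywall_record.get("best_oa_location") or {}
    return {
        "openalex.open_access.oa_url": open_access.get("oa_url"),
        "openalex.best_oa_location.pdf_url": oa_best.get("pdf_url"),
        "unpaywall.best_oa_location.url_for_pdf": up_best.get("url_for_pdf"),
    }


# Hand-written stand-ins mirroring the situation described above.
PDF = "http://www.jbc.org/article/S0021925821000958/pdf"
OPENALEX_SAMPLE = {
    "open_access": {"oa_url": PDF},
    "best_oa_location": {"pdf_url": None},
}
UNPAYWALL_SAMPLE = {"best_oa_location": {"url_for_pdf": PDF}}

if __name__ == "__main__":
    doi = "10.1016/j.jbc.2021.100324"
    print(openalex_work_url(doi))
    print(unpaywall_url(doi, "you@example.org"))
    print(pdf_link_fields(OPENALEX_SAMPLE, UNPAYWALL_SAMPLE))
```

The JSON responses fetched from the two URLs could be passed straight to `pdf_link_fields` to spot records where only the OpenAlex `pdf_url` is empty.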

kind regards, Bianca 

Federico Leva

May 11, 2023, 6:27:29 AM
to Casey M, Kamil Mroczek, Unpaywall discussion
Thanks Casey, and a belated welcome to the Unpaywall/OurResearch community!

I was a bit surprised by this answer. I was under the impression that
downloading OpenAlex dumps was considered a rather expensive operation
(<https://docs.openalex.org/download-all-data/download-to-your-machine>
mentions a $70 figure, though this is now provided in-kind by AWS).

Considering also the subtle data differences mentioned by others, I
wonder whether I should really download and process an order of
magnitude more data when I'm usually just looking for OA repository URLs.

Best,
Federico

najko...@gmail.com

May 16, 2023, 3:45:45 AM
to Unpaywall discussion
Hi Bianca, everyone,

The OpenAlex source object contains information about open access journals:

https://docs.openalex.org/api-entities/sources/source-object#is_in_doaj

I would also like to see comprehensive Unpaywall open access evidence in OpenAlex, and not just the "best" open access location, which is biased towards publisher-provided open access.

Najko

Casey M

May 16, 2023, 9:39:12 AM
to najko...@gmail.com, Unpaywall discussion
Thanks for the comments Bianca, Eric, Najko! You're right that we have work to do in order to replicate Unpaywall exactly in OpenAlex. We may create one more Unpaywall snapshot while we're fixing the issues you mentioned. Our goal is to get all of the data matched up.

As to those issues:

1. We recently discovered that same bug regarding pdf_url in locations. We plan to fix that soon. It affects a lot of records.

2. We will look into the issue you mentioned, Eric, regarding the missing location.

3. The is_in_doaj and is_oa fields are reflected in sources: https://api.openalex.org/sources. We will look into moving oa_date into locations, and will consider pmh_id as well.

Thanks again for the awesome feedback and discussion.

Casey
