Metadata sources

Christina Drummond

Aug 12, 2020, 2:46:24 PM

to OA eBU Data Trust Technical Norms/Standards Working Group

Greetings All,

Building on the COPIM WP5 report's section on OA book metadata suppliers (3.11), which data sources would you recommend the OAeBU Data Trust reference for the best, most up-to-date background metadata on books at scale?

Christina

Eric Hellman

Aug 12, 2020, 4:33:33 PM

to OAeBU-DataTrust-N...@googlegroups.com

What do you mean by "background metadata"?

By "books" do you mean the works underlying the books, or some sort of "edition"?

What problem are you hoping that the metadata solves? (There are no wrong answers!)

Eric Hellman
President, Free Ebook Foundation
Founder, Unglue.it https://unglue.it/
https://go-to-hellman.blogspot.com/
twitter: @gluejar

--
This message was generated through one of the OA eBook Usage Data Trust community forums. Learn more about this Andrew W. Mellon supported 2020-2021 pilot project at https://educopia.org/data_trust/.
---
To post to this group, email OAeBU-DataTrust-N...@googlegroups.com
---
You received this message because you are subscribed to the Google Groups "OA eBU Data Trust Technical Norms/Standards Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to OAeBU-DataTrust-Norms-S...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/OAeBU-DataTrust-Norms-Standards-WG/b035cf51-5d67-40be-8e5b-0f49e81741f7n%40googlegroups.com.

Christina Drummond

Aug 12, 2020, 5:26:26 PM

to OAeBU-DataTrust-N...@googlegroups.com

Generally, I'm wondering which specific data sources members of this group see as important enough that they'd expect the data sources to be represented within the trust.

The question about the definition of books is well taken. I tend to think of "books" as any (electronic) book object with a unique identifier, which could have relationships with other books. But that's my love for relational databases coming through.

What do others think? What would be necessary to have in the data trust?

(I am not intentionally trying to play the question game, although I do quite enjoy it.)

-----------------------------

Christina Drummond, CIPP/US, M.A. International Science and Technology Policy

OAeBU Data Trust Program Officer

Educopia Institute

Working from Columbus, OH USA | EDT Timezone (GMT-4)

chri...@educopia.org | 614.323.8396

Eric Hellman

Aug 12, 2020, 6:35:18 PM

to OAeBU-DataTrust-N...@googlegroups.com

... and I don't mean to be pedantic, and "the question game" is hard to avoid, but "unique" is not an attribute of the ISBN as applied to books, so your answer raises a bunch of other questions. Sometimes the best answer I've been able to work with is that a book is a row in my book table, which supplies foreign keys to my identifier table.

Amanda Ramalho

Aug 12, 2020, 7:20:16 PM

to OAeBU-DataTrust-N...@googlegroups.com

Hi Christina,

Here in Brazil, obtaining book metadata has always been a challenge. The National Library, which managed ISBNs, did not have a reliable database that could be used. In March 2020 the management of ISBNs passed to the "Câmara Brasileira do Livro (CBL)", and we hope that the situation will improve in a few years (it is a big challenge, with a lot of wrong data!).
In 2012 SciELO launched the SciELO Books project, with the objective of creating a reliable database of peer-reviewed academic books. It started with Brazilian university publishers but was later opened to other countries, and today it also indexes 2 Colombian publishers. The project has been successful, although publisher uptake has been limited by a lack of resources. Today we have about 1,300 published books, of which 838 are open access, and we have their complete metadata in ONIX and KBART.

In Latin America, EULAC has been developing a project to standardize metadata for academic books called ULivros https://ulibros.com/

Regards,

--

Amanda Ramalho

SciELO Books Unit

FAPESP – CAPES – CNPq – BIREME– FapUNIFESP – ABEU

Programa SciELO/FAPESP

____________________________
São Paulo/SP | Brasil
livros.scielo.org | www.scielo.br

--

Lucy Montgomery

Aug 12, 2020, 10:25:45 PM

to oaebu-datatrust-n...@googlegroups.com

Thanks for sharing, Amanda. This is really helpful information for our team.

Lucy

--

Associate Professor Lucy Montgomery

PhD (QUT); BA (Asian Studies) (Hons 1) (Adelaide)

Program Lead | Innovation in Knowledge Communication, Centre for Culture and Technology

Co-Lead | Curtin Open Knowledge Initiative

Tel | +61 8 9266 4992
Mobile | 0401 103 672
Email | lucy.mo...@curtin.edu.au
Web | http://ccat.curtin.edu.au/

Latest Book - Open Knowledge Institutions: Reinventing Universities:

https://wip.mitpress.mit.edu/oki

Ronald Snijder

Aug 13, 2020, 3:41:20 AM

to OAeBU-DataTrust-N...@googlegroups.com

Dear Christina,

Just like Eric we have been building our own metadata sources: the OAPEN Library (around 12,000 publications) and the Directory of Open Access Books (DOAB) with close to 30,000 descriptions. Both are reliant on data provided by publishers.

Quality control differs: DOAB is set up as a tool that is simple for publishers to use, and our main focus is whether the link to the publication actually works. In the OAPEN Library, records are only created – and updated – by our staff, so there is much more quality control. However, sometimes a correction on a larger scale is needed. For instance, we updated ~1,000 records based on CrossRef data (https://oapen.org/blog/;jsessionid=5CB9850EFF50162618A6BCB4F0381ED4?link=https%3A%2F%2Foapen.hypotheses.org%2F124). Obviously, this only works for books with a DOI.

So, in short:

DOAB could be used as an indicator of whether a certain book is OA, but the metadata is not tightly controlled
OAPEN Library has a smaller collection, but has better metadata
CrossRef might be a good source

Kind regards,

Ronald Snijder, PhD

OAPEN Foundation

Prins Willem-Alexanderhof 5

PO Box 90407

2509 LK The Hague

The Netherlands

email: r.sn...@oapen.org

www.oapen.org

ORCID: 0000-0001-9260-4941

--

Javier Arias

Aug 13, 2020, 5:43:14 AM

to OAeBU-DataTrust-N...@googlegroups.com

That's precisely the approach we took in the HIRMEOS project. We've got a small API used to 'translate' book identifiers (e.g. find the DOI associated with this ISBN). In case it helps, the schema of its database is at https://github.com/hirmeos/identifiers_db/blob/master/identifiers.png, where 'work' represents any type of publication (chapter, book, book collections, etc.) and all identifiers are stored as URIs (e.g. urn:isbn:9781906924225).
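
To make the 'translate' idea concrete, here is a minimal Python sketch of looking up one identifier from another when every identifier is stored as a URI. This is not the HIRMEOS API: the in-memory table, the function names, the `info:doi:` prefix and the DOI itself are all assumptions made up for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical sample data: each work maps to the set of URIs known for it.
# The DOI is invented; only the ISBN URI comes from the message above.
WORK_IDENTIFIERS: Dict[str, Set[str]] = {
    "work-1": {
        "urn:isbn:9781906924225",
        "info:doi:10.1234/example.0001",
    },
}


def isbn_uri(isbn: str) -> str:
    """Normalise a printed ISBN (hyphens or spaces allowed) into a URI."""
    return "urn:isbn:" + isbn.replace("-", "").replace(" ", "")


def translate(uri: str, target_prefix: str) -> Optional[str]:
    """Return an identifier of the requested scheme for the same work, if any."""
    for uris in WORK_IDENTIFIERS.values():
        if uri in uris:
            # Sort so the result is deterministic when several candidates exist.
            for candidate in sorted(uris):
                if candidate.startswith(target_prefix):
                    return candidate
    return None


# Example: find the DOI stored alongside a hyphenated ISBN.
doi = translate(isbn_uri("978-1-906924-22-5"), "info:doi:")
```

Storing everything as a URI means the lookup never needs to know which scheme it was handed; the prefix alone selects the target type.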

Overall, the main difficulty I'm finding within COPIM is precisely the different interpretations of the word "book". Most metadata formats follow Christina's definition, which, I assume, is closer to what librarians want to catalogue. However, there are many occasions on which one would want a single book metadata record that groups all the various formats, especially when reporting metrics back to authors. ONIX, for example, implements Christina's definition, but it is also used by some platforms with Eric's definition. It gets even worse when the book is OA: not only does it have multiple formats, but there are also multiple locations for the digital ones. I've always thought that the metadata record associated with the book DOI could become a living master record, but Crossref has officially broken the DOI system through the addition of 'co-access'... So, does anyone know a metadata standard that implements this relation of Book -> Format -> Location?

Eric Hellman

Aug 13, 2020, 10:00:42 AM

to OAeBU-DataTrust-N...@googlegroups.com

For unglue.it, our schema looks like this:

work

identifier
    foreign key: work
    foreign key: edition
    type (olib, ltwk, goog, gdrd, thng, isbn, oclc, olwk, doab, gtbg, glue, doi, http)

edition
    foreign key: work

ebook
    foreign key: edition
    format (pdf, epub, mobi, html, online)

relation
    foreign key: "to" work
    foreign key: "from" work
    type

The identifier table has 1.2 million entries; the work table a quarter million.
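
For anyone who wants to experiment with this shape, the outline above could be sketched with Python's built-in sqlite3 module roughly as follows. This is only an approximation and not unglue.it's actual code: the column names, the `value` column, which columns may be NULL, and the sample rows (including the DOAB id) are all assumptions.

```python
import sqlite3
from typing import List

# Assumed column names; only the tables and foreign keys come from the outline.
SCHEMA = """
CREATE TABLE work (id INTEGER PRIMARY KEY);
CREATE TABLE edition (
    id INTEGER PRIMARY KEY,
    work_id INTEGER NOT NULL REFERENCES work(id)
);
CREATE TABLE identifier (
    id INTEGER PRIMARY KEY,
    work_id INTEGER NOT NULL REFERENCES work(id),
    edition_id INTEGER REFERENCES edition(id),  -- assumed optional
    type TEXT NOT NULL,                         -- e.g. isbn, doab, doi
    value TEXT NOT NULL                         -- assumed column
);
CREATE TABLE ebook (
    id INTEGER PRIMARY KEY,
    edition_id INTEGER NOT NULL REFERENCES edition(id),
    format TEXT NOT NULL                        -- pdf, epub, mobi, html, online
);
CREATE TABLE relation (
    id INTEGER PRIMARY KEY,
    to_work_id INTEGER NOT NULL REFERENCES work(id),
    from_work_id INTEGER NOT NULL REFERENCES work(id),
    type TEXT NOT NULL
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with one work, one edition, two ebooks."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO work (id) VALUES (1)")
    conn.execute("INSERT INTO edition (id, work_id) VALUES (10, 1)")
    conn.executemany(
        "INSERT INTO identifier (work_id, edition_id, type, value) VALUES (?, ?, ?, ?)",
        [(1, 10, "isbn", "9781906924225"), (1, None, "doab", "12345")],  # sample rows
    )
    conn.executemany(
        "INSERT INTO ebook (edition_id, format) VALUES (?, ?)",
        [(10, "pdf"), (10, "epub")],
    )
    return conn


def formats_for(conn: sqlite3.Connection, id_type: str, value: str) -> List[str]:
    """List the ebook formats available for the work behind an identifier."""
    rows = conn.execute(
        """
        SELECT DISTINCT ebook.format
        FROM identifier
        JOIN edition ON edition.work_id = identifier.work_id
        JOIN ebook ON ebook.edition_id = edition.id
        WHERE identifier.type = ? AND identifier.value = ?
        ORDER BY ebook.format
        """,
        (id_type, value),
    ).fetchall()
    return [row[0] for row in rows]
```

Because identifiers hang off the work, a work-level identifier such as a DOAB id and an edition-level ISBN both resolve to the same set of ebook files.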

The doab identifier is probably most useful for OAeBU if the community is willing to have DOAB be a gatekeeper and definer of book-ness.

There have been a number of efforts to build identifier cross-connection services (I've been involved in some) but they've not been useful across applications because every application requires its own merge and de-dupe criteria. Not to mention inter-corporate management issues.

So for example, our application required us to be aware of underlying works for copyright purposes and of connections to diverse sites like Google, Project Gutenberg, OpenLibrary, Goodreads, and Librarything, whereas the HIRMEOS schema is simpler and more focussed.

Eric

Hawkins, Kevin

Aug 13, 2020, 10:51:41 AM

to OAeBU-DataTrust-N...@googlegroups.com

Javier, could you say more about why you don’t like Crossref’s co-access model? And when you say you want a model for Book -> Format -> Location, do you mean going from a single title of a work to a single (or multiple?) formats and then to single (or multiple?) locations?

Eric Hellman

Aug 13, 2020, 11:11:11 AM

to OAeBU-DataTrust-N...@googlegroups.com

I agree with Javier here. Crossref co-access is a step down a slippery slope.

What about citation splitting? The Crossref DOI is a citation identifier. This means that we identify content to enable accurate citation for scholarly content. This is different from other identifiers, like the ISBN, which are used to identify all the different formats - hardback, paperback, ePub. Therefore, a basic Crossref principle is that content, even if it’s available in different formats, should only have one Crossref DOI. For content that is part of Co-access, there will be multiple DOIs for the same content and this could mean that where systems and services use the DOI to track citations that all the citations will not be captured since they are spread across multiple DOIs. In addition, a service like Crossref Event Data (which collects post publication events), having multiple DOIs for the same content makes it harder to track activity.
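
A small Python sketch of the citation-splitting effect described above: counts recorded against several DOIs for the same content only add up once the DOIs are mapped back to one primary DOI. The DOIs, the counts and the alias table are all hypothetical, and nothing here calls a Crossref service.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical citation counts, keyed by the DOI that was actually cited.
CITATIONS_BY_DOI: Dict[str, int] = {
    "10.1234/pub.0001": 7,       # publisher's DOI
    "10.5678/platform.abc": 3,   # platform-minted DOI for the same book
    "10.1234/pub.0002": 4,       # an unrelated book
}

# Hypothetical alias table: secondary DOI -> primary DOI for the same content.
PRIMARY_DOI: Dict[str, str] = {
    "10.5678/platform.abc": "10.1234/pub.0001",
}


def merged_citations(citations: Dict[str, int], primary_of: Dict[str, str]) -> Dict[str, int]:
    """Sum citation counts after folding secondary DOIs into their primary DOI."""
    totals: Dict[str, int] = defaultdict(int)
    for doi, count in citations.items():
        totals[primary_of.get(doi, doi)] += count
    return dict(totals)


merged = merged_citations(CITATIONS_BY_DOI, PRIMARY_DOI)
```

A system that tracks only the publisher's DOI would report 7 citations for the first book; the full count is visible only to a system that knows about the alias.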

We'll soon have DOIs identifying teddy bears and toy figurines - to solve a "use case".

Year 5 Harry Potter Bust
ISBN-13: 9781593967574
Manufacturer: Diamond Comics
About This Item
Harry Potter and the Order of the Phoenix is set to be released in 2007, and this bust is based off of original scan data from the film. Harry is holding the Prophecy Orb and his signature wand. Harry stands 6 1/4" tall. Comes with a hand-numbered Certificate of Authenticity.

Eric Hellman
President, Free Ebook Foundation
Founder, Unglue.it https://unglue.it/
https://go-to-hellman.blogspot.com/
twitter: @gluejar

Javier Arias

Aug 17, 2020, 5:10:28 AM

to OAeBU-DataTrust-N...@googlegroups.com

DOIs were supposed to be a unique identifier for a particular object, regardless of its location(s). Crossref has checks to prevent minting multiple DOIs for the same object, but these are not perfect and so there are some cases in which you can find two or more DOIs assigned to the same object. Co-access was originally proposed as a way to map these errors together and inform the user about them.

The current aim of co-access seems to be related to platforms systematically ignoring DOIs provided by the publisher. The URL the DOI resolves to ought to be treated as the canonical location, and the DOI owner can also link other URLs where the object may be located to the DOI record ("multiple resolution"). However, platforms do not seem comfortable using an identifier that is also a location: a location that doesn't give them traffic. With co-access, these platforms have actually been encouraged to mint their own (redundant) DOIs instead of reusing any existing ones.

With Book -> Format -> Location, I was naively hoping that it would be interpreted as three entities with 1-M relations: a single title has many formats, each of them hosted in various locations. Usage is recorded at the Location level and reported at the Book level, but most metadata standards focus on the Format.
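
As a rough illustration of those three entities, here is a minimal Python sketch in which usage is stored on each Location and rolled up to the Book. The class names follow the message, but the fields, URLs and download figures are invented for the example and do not come from any metadata standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Location:
    url: str
    downloads: int = 0  # usage is recorded here


@dataclass
class Format:
    name: str  # e.g. "pdf", "epub"
    locations: List[Location] = field(default_factory=list)


@dataclass
class Book:
    title: str
    formats: List[Format] = field(default_factory=list)

    def total_downloads(self) -> int:
        """Roll usage up from every location of every format to the book."""
        return sum(loc.downloads for fmt in self.formats for loc in fmt.locations)


def example_book() -> Book:
    """Build a made-up book with two formats hosted at three locations."""
    return Book(
        title="Example OA Monograph",
        formats=[
            Format("pdf", [
                Location("https://publisher.example.org/book.pdf", downloads=120),
                Location("https://platform.example.org/book.pdf", downloads=30),
            ]),
            Format("epub", [
                Location("https://publisher.example.org/book.epub", downloads=50),
            ]),
        ],
    )


total = example_book().total_downloads()
```

A record keyed on the Format alone would split these downloads into separate figures, which is the reporting problem described above.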

Hawkins, Kevin

Aug 19, 2020, 4:17:38 PM

to OAeBU-DataTrust-N...@googlegroups.com

Very insightful, Javi – thank you!
