Metadata sources


Christina Drummond

Aug 12, 2020, 2:46:24 PM
to OA eBU Data Trust Technical Norms/Standards Working Group
Greetings All,

Building on the COPIM WP5 report's section on OA book metadata suppliers (3.11), which data sources would you recommend the OAeBU Data Trust reference for the best up-to-date background metadata on books at scale?

Christina

Eric Hellman

Aug 12, 2020, 4:33:33 PM
to OAeBU-DataTrust-N...@googlegroups.com
What do you mean by "background metadata"?
By "books" do you mean the works underlying the books, or some sort of "edition"?
What problem are you hoping that the metadata solves? (There are no wrong answers!)



Eric Hellman
President, Free Ebook Foundation
Founder, Unglue.it https://unglue.it/
https://go-to-hellman.blogspot.com/
twitter: @gluejar

--
This message was generated through one of the OA eBook Usage Data Trust community forums. Learn more about this Andrew W. Mellon supported 2020-2021 pilot project at https://educopia.org/data_trust/.
---
To post to this group, email OAeBU-DataTrust-N...@googlegroups.com
---
You received this message because you are subscribed to the Google Groups "OA eBU Data Trust Technical Norms/Standards Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to OAeBU-DataTrust-Norms-S...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/OAeBU-DataTrust-Norms-Standards-WG/b035cf51-5d67-40be-8e5b-0f49e81741f7n%40googlegroups.com.

Christina Drummond

Aug 12, 2020, 5:26:26 PM
to OAeBU-DataTrust-N...@googlegroups.com
Generally, I'm wondering which specific data sources members of this group see as important enough that they'd expect the data sources to be represented within the trust. 

The question about the definition of books is well taken. I tend to think of "books" as any (electronic) book object with a unique identifier, which could have relationships with other books.  But that's my love for relational databases coming through. 

What do others think? What would be necessary to have in the data trust? 

(I am not intentionally trying to play the question game, although I do quite enjoy it.)
-----------------------------
Christina Drummond, CIPP/US, M.A. International Science and Technology Policy
OAeBU Data Trust Program Officer
Educopia Institute
Working from Columbus, OH USA | EDT Timezone (GMT-4)


Eric Hellman

Aug 12, 2020, 6:35:18 PM
to OAeBU-DataTrust-N...@googlegroups.com
... and I don't mean to be pedantic, and "the question game" is hard to avoid, but the word "unique" is not an attribute of ISBN as applied to books, in which case your answer raises a bunch of other questions. Sometimes the best answer I've been able to work with is that a book is a row in my book table, which supplies foreign keys to my identifier table.

Amanda Ramalho

Aug 12, 2020, 7:20:16 PM
to OAeBU-DataTrust-N...@googlegroups.com
Hi Christina,

Here in Brazil, obtaining reliable book metadata has always been a challenge. The National Library, which managed ISBNs, did not have a reliable database that could be used. In March 2020 the management of ISBNs passed to the "Câmara Brasileira do Livro (CBL)", and we hope that the scenario will change in a few years (it is a big challenge, with a lot of wrong data!).
In 2012 SciELO launched the SciELO Books project, with the objective of creating a reliable database of peer-reviewed academic books. It started with Brazilian university publishers but was opened to other countries, and today it also indexes 2 Colombian publishers. The project has been successful, although uptake among publishers has been limited by a lack of resources. Today we have about 1,300 published books; 838 of these are open access, and we have their complete metadata in ONIX and KBART.

In Latin America, EULAC has been developing a project to standardize metadata for academic books called ULivros https://ulibros.com/

Att.,
--
Amanda Ramalho
SciELO Books Unit
FAPESP – CAPES – CNPq – BIREME – FapUNIFESP – ABEU
Programa SciELO/FAPESP
____________________________
São Paulo/SP | Brasil
livros.scielo.org | www.scielo.br


--

Lucy Montgomery

Aug 12, 2020, 10:25:45 PM
to oaebu-datatrust-n...@googlegroups.com

Thanks for sharing, Amanda. This is really helpful information for our team.

 

Lucy

 

-- 

Associate Professor Lucy Montgomery

PhD (QUT); BA (Asian Studies) (Hons 1) (Adelaide)

Program Lead | Innovation in Knowledge Communication, Centre for Culture and Technology

 

Co-Lead | Curtin Open Knowledge Initiative

Tel | +61 8 9266 4992 
Mobile | 0401 103 672 
Email | lucy.mo...@curtin.edu.au
Web | http://ccat.curtin.edu.au/

Latest Book - Open Knowledge Institutions: Reinventing Universities:

 https://wip.mitpress.mit.edu/oki

Ronald Snijder

Aug 13, 2020, 3:41:20 AM
to OAeBU-DataTrust-N...@googlegroups.com

Dear Christina,

 

Just like Eric we have been building our own metadata sources: the OAPEN Library (around 12,000 publications) and the Directory of Open Access Books (DOAB) with close to 30,000 descriptions. Both are reliant on data provided by publishers.

 

Quality control differs: DOAB is set up as a tool that is simple for publishers to use, and our main focus is whether the link to the publication actually works. In the OAPEN Library, records are only created – and updated – by our staff, so there is much more quality control. However, sometimes a correction on a larger scale is needed. For instance, we updated ~1,000 records based on CrossRef data (https://oapen.org/blog/;jsessionid=5CB9850EFF50162618A6BCB4F0381ED4?link=https%3A%2F%2Foapen.hypotheses.org%2F124). Obviously, this only works for books with a DOI.

 

So, in short:

  • DOAB could be used as an indicator of whether a certain book is OA, but the metadata is not tightly controlled
  • OAPEN Library has a smaller collection, but has better metadata
  • CrossRef might be a good source

 

Kind regards,

Ronald Snijder, PhD

 

OAPEN Foundation

Prins Willem-Alexanderhof 5

PO Box 90407

2509 LK The Hague

The Netherlands

 

email: r.sn...@oapen.org

www.oapen.org

 

ORCID: 0000-0001-9260-4941

--

Javier Arias

Aug 13, 2020, 5:43:14 AM
to OAeBU-DataTrust-N...@googlegroups.com

That's precisely the approach we took in the HIRMEOS project. We've got a small API used to 'translate' book identifiers (e.g. find the DOI associated with this ISBN). In case it helps, the schema of its database is at https://github.com/hirmeos/identifiers_db/blob/master/identifiers.png , where 'work' represents any type of publication (chapter, book, book collections, etc.) and all identifiers are stored as URIs (e.g. urn:isbn:9781906924225).
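To make the 'translate' idea concrete, here is a minimal sketch in Python. It is not the HIRMEOS API: the class and function names are hypothetical, the `info:doi:` prefix is an assumed URI form for DOIs, and the DOI value is a made-up placeholder. Only the `urn:isbn:` form comes from the message above.

```python
import re


def isbn_to_urn(isbn: str) -> str:
    """Strip hyphens and spaces, then return the ISBN as a urn:isbn: URI."""
    digits = re.sub(r"[\s-]", "", isbn)
    if not re.fullmatch(r"\d{13}|\d{9}[\dXx]", digits):
        raise ValueError(f"not an ISBN-10/13: {isbn!r}")
    return f"urn:isbn:{digits.upper()}"


def doi_to_uri(doi: str) -> str:
    """Lower-case a bare DOI and prefix it (the info:doi: form is an assumption)."""
    return f"info:doi:{doi.strip().lower()}"


class IdentifierIndex:
    """Hypothetical in-memory stand-in for an identifier translation service."""

    def __init__(self):
        self._work_of = {}  # URI -> work id
        self._uris_of = {}  # work id -> set of URIs

    def add(self, work_id, uri):
        self._work_of[uri] = work_id
        self._uris_of.setdefault(work_id, set()).add(uri)

    def translate(self, uri, scheme_prefix):
        """Return all URIs of the same work that start with scheme_prefix."""
        work_id = self._work_of.get(uri)
        if work_id is None:
            return []
        return sorted(u for u in self._uris_of[work_id] if u.startswith(scheme_prefix))


index = IdentifierIndex()
index.add("work-1", isbn_to_urn("978-1-906924-22-5"))
index.add("work-1", doi_to_uri("10.0000/EXAMPLE.0001"))  # placeholder DOI
dois = index.translate("urn:isbn:9781906924225", "info:doi:")
```

Storing every identifier as a normalised URI means a single string comparison is enough for lookup, whatever the identifier scheme.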

Overall, the main difficulty I'm finding within COPIM is precisely the different interpretations of the word book. Most metadata formats follow Christina's definition, which, I assume, is closer to what librarians want to catalogue. However, there are many occasions in which one would want to represent a single book metadata record, grouping all the various formats, especially when reporting metrics back to authors. ONIX, for example, implements Christina's definition, but it is also used by some platforms with Eric's definition. It gets even worse when the book is OA and not only does it have multiple formats, but there are also multiple locations for the digital ones. I've always thought that the metadata record associated with the book DOI could become a living master record, but Crossref's officially broken the DOI system through the addition of 'co-access'... So, does anyone know a metadata standard that implements this relation of Book -> Format -> Location?


Eric Hellman

Aug 13, 2020, 10:00:42 AM
to OAeBU-DataTrust-N...@googlegroups.com
For unglue.it, our schema looks like this:

work
identifier
    foreign key: work
    foreign key: edition
    type (olib, ltwk, goog, gdrd, thng, isbn, oclc, olwk, doab, gtbg, glue, doi, http)
edition
    foreign key: work
ebook
    foreign key: edition
    format (pdf,epub,mobi,html,online)
relation
    foreign key: "to" work
    foreign key: "from" work
    type
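
For anyone who wants to experiment with this shape, here is a rough sketch of the list above as SQLite tables, created from Python. It is not the actual unglue.it schema: the `id` and `value` columns, the nullability of `edition_id`, and all sample values are assumptions added for illustration.

```python
import sqlite3

# Tables and foreign keys follow the list above; everything else is assumed.
SCHEMA = """
CREATE TABLE work (id INTEGER PRIMARY KEY);
CREATE TABLE edition (
    id INTEGER PRIMARY KEY,
    work_id INTEGER NOT NULL REFERENCES work(id)
);
CREATE TABLE identifier (
    id INTEGER PRIMARY KEY,
    work_id INTEGER NOT NULL REFERENCES work(id),
    edition_id INTEGER REFERENCES edition(id),  -- nullable: assumption
    type TEXT NOT NULL,                         -- e.g. 'isbn', 'doi', 'doab'
    value TEXT NOT NULL                         -- assumed column
);
CREATE TABLE ebook (
    id INTEGER PRIMARY KEY,
    edition_id INTEGER NOT NULL REFERENCES edition(id),
    format TEXT NOT NULL
);
CREATE TABLE relation (
    id INTEGER PRIMARY KEY,
    to_work_id INTEGER NOT NULL REFERENCES work(id),
    from_work_id INTEGER NOT NULL REFERENCES work(id),
    type TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# One work with two editions, each with its own (made-up) ISBN and ebook file.
conn.execute("INSERT INTO work (id) VALUES (1)")
conn.execute("INSERT INTO edition (id, work_id) VALUES (1, 1), (2, 1)")
conn.executemany(
    "INSERT INTO identifier (work_id, edition_id, type, value) VALUES (?, ?, ?, ?)",
    [
        (1, 1, "isbn", "9780000000001"),
        (1, 2, "isbn", "9780000000002"),
        (1, None, "doab", "00000"),
    ],
)
conn.executemany(
    "INSERT INTO ebook (edition_id, format) VALUES (?, ?)",
    [(1, "pdf"), (2, "epub")],
)

# Start from one ISBN, go up to the work, then list every ebook format
# available across all editions of that work.
formats = [
    row[0]
    for row in conn.execute(
        """
        SELECT ebook.format
        FROM identifier
        JOIN edition ON edition.work_id = identifier.work_id
        JOIN ebook ON ebook.edition_id = edition.id
        WHERE identifier.type = 'isbn' AND identifier.value = ?
        ORDER BY ebook.format
        """,
        ("9780000000001",),
    )
]
```

The query shows why the work row, not the ISBN, acts as the "book": an ISBN attached to one edition still leads to the formats of its sibling editions.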

The identifier table has 1.2 million entries; the work table a quarter million.

The doab identifier is probably most useful for OAeBU if the community is willing to have DOAB be a gatekeeper and definer of book-ness.

There have been a number of efforts to build identifier cross-connection services (I've been involved in some) but they've not been useful across applications because every application requires its own merge and de-dupe criteria. Not to mention inter-corporate management issues.

So for example, our application required us to be aware of underlying works for copyright purposes and of connections to diverse sites like Google, Project Gutenberg, OpenLibrary, Goodreads, and Librarything, whereas the HIRMEOS schema is simpler and more focussed.

Eric

Hawkins, Kevin

Aug 13, 2020, 10:51:41 AM
to OAeBU-DataTrust-N...@googlegroups.com

Javier, could you say more about why you don’t like Crossref’s co-access model? And when you say you want a model for Book -> Format -> Location, do you mean going from a single title of a work to a single (or multiple?) formats and then to single (or multiple?) locations?

Eric Hellman

Aug 13, 2020, 11:11:11 AM
to OAeBU-DataTrust-N...@googlegroups.com
I agree with Javier here. Crossref co-access is a step down a slippery slope. 

What about citation splitting? The Crossref DOI is a citation identifier. This means that we identify content to enable accurate citation for scholarly content. This is different from other identifiers, like the ISBN, which are used to identify all the different formats - hardback, paperback, ePub. Therefore, a basic Crossref principle is that content, even if it’s available in different formats, should only have one Crossref DOI. For content that is part of Co-access, there will be multiple DOIs for the same content and this could mean that where systems and services use the DOI to track citations that all the citations will not be captured since they are spread across multiple DOIs. In addition, a service like Crossref Event Data (which collects post publication events), having multiple DOIs for the same content makes it harder to track activity.
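
As a small illustration of the splitting problem described above, the sketch below folds citation counts for several DOIs back onto one DOI per piece of content. Everything in it is hypothetical: the function name, the idea of choosing a "primary" DOI, and the placeholder DOIs and counts. It is not a Crossref API.

```python
from collections import defaultdict


def merge_citations(counts, same_content):
    """Sum citation counts per piece of content.

    counts: DOI -> citation count as seen by a tracking service.
    same_content: DOI -> the DOI chosen as primary for the same content.
    DOIs missing from same_content are treated as their own primary.
    """
    merged = defaultdict(int)
    for doi, n in counts.items():
        merged[same_content.get(doi, doi)] += n
    return dict(merged)


# Placeholder DOIs: a publisher DOI and a platform DOI for the same book,
# plus an unrelated second book.
counts = {"10.0000/pub.1": 7, "10.0000/platform.9": 3, "10.0000/pub.2": 5}
same_content = {"10.0000/platform.9": "10.0000/pub.1"}
merged = merge_citations(counts, same_content)
```

Without the `same_content` mapping, a service looking only at the publisher DOI would report 7 citations for the first book rather than 10.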

We'll soon have DOIs identifying teddy bears and toy figurines - to solve a "use case".

Year 5 Harry Potter Bust
 ISBN-13: 9781593967574
    Manufacturer: Diamond Comics

About This Item

Harry Potter and the Order of the Phoenix is set to be released in 2007, and this bust is based off of original scan data from the film. Harry is holding the Prophecy Orb and his signature wand. Harry stands 6 1/4" tall. Comes with a hand-numbered Certificate of Authenticity.


Eric Hellman
President, Free Ebook Foundation
Founder, Unglue.it https://unglue.it/
https://go-to-hellman.blogspot.com/
twitter: @gluejar

Javier Arias

Aug 17, 2020, 5:10:28 AM
to OAeBU-DataTrust-N...@googlegroups.com
DOIs were supposed to be a unique identifier for a particular object, regardless of its location(s). Crossref has checks to prevent minting multiple DOIs for the same object, but these are not perfect and so there are some cases in which you can find two or more DOIs assigned to the same object. Co-access was originally proposed as a way to map these errors together and inform the user about them.


The current aim of co-access seems to be related to platforms systematically ignoring DOIs provided by the publisher. The URL the DOI resolves to ought to be treated as the canonical location, and the DOI owner can also link other URLs where the object may be located to the DOI record ("multiple resolution"). However, platforms do not seem comfortable using an identifier that is also a location. A location that doesn't give them traffic. With co-access, these platforms have actually been encouraged to mint their own (redundant) DOIs instead of reusing any existing ones.


With the Book -> Format -> Location I was naively hoping that it would be interpreted as three entities with 1-M relations: a single title has many formats, each of them hosted in various locations. Usage is recorded at Location and reported at the Book level, but most metadata standards focus on the Format.
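
A minimal sketch of that three-entity model in Python, assuming Python 3.9+ and using hypothetical class names, field names, URLs and download counts. It only illustrates recording usage at the Location and reporting it at the Book level; it does not implement any existing metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class Location:
    """One place where a given format of the book is hosted."""
    url: str
    downloads: int = 0  # usage is recorded here


@dataclass
class Format:
    """One format (pdf, epub, ...) with one-to-many locations."""
    name: str
    locations: list[Location] = field(default_factory=list)


@dataclass
class Book:
    """A single title with one-to-many formats."""
    title: str
    formats: list[Format] = field(default_factory=list)

    def total_downloads(self) -> int:
        # Usage is reported here, summed over every location of every format.
        return sum(loc.downloads for fmt in self.formats for loc in fmt.locations)

    def downloads_by_format(self) -> dict[str, int]:
        return {
            fmt.name: sum(loc.downloads for loc in fmt.locations)
            for fmt in self.formats
        }


book = Book(
    "Example Title",
    [
        Format(
            "pdf",
            [
                Location("https://publisher.example/book.pdf", 120),
                Location("https://repository.example/book.pdf", 80),
            ],
        ),
        Format("epub", [Location("https://publisher.example/book.epub", 40)]),
    ],
)
total = book.total_downloads()
per_format = book.downloads_by_format()
```

Keeping Location as its own entity lets each hosting platform report its own numbers while the author still sees one figure per title.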

Hawkins, Kevin

Aug 19, 2020, 4:17:38 PM
to OAeBU-DataTrust-N...@googlegroups.com