COPIM (WP5) scoping report is out!


Javier Arias

Aug 11, 2020, 5:52:14 AM
to oaebu-datatrust-n...@googlegroups.com
Dear All,

I'd like to share a milestone recently reached by the Metadata and
Dissemination Work Package (WP5) of the COPIM project
(https://www.copim.ac.uk/). The goal of this WP is to develop metadata
management software, which we have called Thoth (after the Egyptian god
of writing :). The first step in the design of Thoth has been the
release of a scoping report, authored by Graham Stone (JISC), which you
can find at https://doi.org/10.21428/785a6451.939caeab

Even though COPIM doesn't have a WP dedicated to metrics per se,
dissemination and discovery are closely linked to usage, and metadata
integrity is a key aspect of usage metrics collection. I therefore hope
you will find some of the recommendations in the scoping report
relevant to the Data Trust project, and that the report gives good
insight into the current outlook for OA books.

All the best,
Javi



Eric Hellman

Aug 12, 2020, 10:58:08 AM
to OAeBU-DataTrust-N...@googlegroups.com
Thanks for this!

A couple of comments.

1. The concept of a "Data Lake" is deeply rooted in the print metadata landscape, and doesn't reflect the reality of today's digital world. You can't clean your digital metadata and then carefully monitor the inflow to keep it pristine. Managing metadata for the digital world is like managing a river with many sources and many different downstream usages. There's new metadata every day; things that were true yesterday are false today. The most poisonous thing about the "lake" model of metadata is that it requires gatekeeping, leaving large populations excluded from contributing to the lake.

2. I'm sure the WP5 authors left many other community participants unexamined, but Unglue.it is in an adjacent position and has tackled most of the problems considered in the report. COPIM can learn from its successes and shortcomings, and the code is all free open source software. Unglue.it was built as a crowd-funding platform and has pivoted to being primarily a cataloguing and distribution platform for free ebooks. The catalogue has over 80,000 entries and has distributed almost 2 million ebooks. It's worth learning from what we built:
i. We implemented a metadata harvesting, editing and creation facility.
ii. We implemented faceted ONIX, OPDS and MARC feeds, as well as custom APIs. The OPDS feed was built to supply NYPL's SimplyE server.
iii. We implemented a FRBR model so that editions are grouped into works; many different ebook formats can be grouped onto editions if needed.
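
The work/edition/format grouping in item iii can be pictured with a small data model. This is only an illustrative sketch, not Unglue.it's actual schema: the class names (Work, Edition, Ebook), their fields, and the example ISBNs and URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Ebook:
    """One downloadable file (hypothetical fields)."""
    format: str  # e.g. "epub" or "pdf"
    url: str


@dataclass
class Edition:
    """One published edition; may carry several ebook formats."""
    title: str
    isbn: Optional[str] = None  # free ebooks often have no ISBN
    ebooks: List[Ebook] = field(default_factory=list)


@dataclass
class Work:
    """The abstract work that groups all of its editions."""
    title: str
    editions: List[Edition] = field(default_factory=list)

    def formats(self) -> Set[str]:
        # Collect every ebook format available across all editions.
        return {e.format for ed in self.editions for e in ed.ebooks}


# Placeholder data: two editions of one work, three files in total.
first = Edition("Example Book", isbn="0000000000001",
                ebooks=[Ebook("pdf", "https://example.org/1.pdf")])
second = Edition("Example Book (2nd ed.)",
                 ebooks=[Ebook("epub", "https://example.org/2.epub"),
                         Ebook("pdf", "https://example.org/2.pdf")])
work = Work("Example Book", editions=[first, second])
```

Grouping at the work level lets a catalogue show one entry per book while still exposing each edition and file underneath it.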

3. Deduplication and edition management are REALLY hard. Free licenses make this much harder than otherwise, because books can appear in so many guises, in so many places. For many types of works, OA books can be revised, sometimes quite often, and for these works, the lake is irrelevant. It would be a shame if WP5 precluded support for revision metadata. Ironically, the only contemplation of revisions in the report is self-referential!
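
To make the difficulty concrete, here is a deliberately naive sketch of candidate matching on a normalized title/author key. It is not the method used by Unglue.it or proposed in the report; the function names and the record layout (dicts with "title" and "author" keys) are assumptions for illustration. It catches trivial variations in case, accents and punctuation, and nothing more.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z0-9]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def match_key(record):
    # Hypothetical record layout: a dict with "title" and "author".
    return (normalize(record["title"]), normalize(record["author"]))


def candidate_duplicates(records):
    """Return groups of records that share the same normalized key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

A key like this cannot tell a revised edition from a mirrored copy, and it misses retitled or remixed versions entirely, which is where most of the real work lies.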

4. The discussion of chapter metadata seems to omit consideration of the impact of OA licenses. The question is one of scope. When is an article a chapter and not an article? Should COPIM be managing articles? How do you deal with book remixes? The OA world has long had the concept of "overlay journals"; perhaps this idea will come into its own with "overlay" books. This is not a hard issue, but it's easy to get wrong, and easy to get tripped up by the lack of support in legacy metadata supply chains.

5. The only problem ISBN solves for Free Ebooks is fitting them into a sales-oriented system. Not that that's a bad thing. 

Always happy to discuss!

Eric


Eric Hellman
President, Free Ebook Foundation
Founder, Unglue.it https://unglue.it/
https://go-to-hellman.blogspot.com/
twitter: @gluejar

--
This message was generated through one of the OA eBook Usage Data Trust community forums. Learn more about this Andrew W. Mellon supported 2020-2021 pilot project at https://educopia.org/data_trust/.
---
To post to this group, email OAeBU-DataTrust-N...@googlegroups.com
---
You received this message because you are subscribed to the Google Groups "OA eBU Data Trust Technical Norms/Standards Working Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to OAeBU-DataTrust-Norms-S...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/OAeBU-DataTrust-Norms-Standards-WG/ba8789ae-f241-51b3-d417-0e4eca86ef10%40openbookpublishers.com.

Brian O'Leary

Aug 12, 2020, 11:27:08 AM
to OAeBU-DataTrust-N...@googlegroups.com
Thanks; look forward to reviewing the WP.

--
Brian F. O'Leary
Executive Director, Book Industry Study Group
232 Madison Avenue, Suite 1400
New York, NY 10016
