use case: automated workflow for harvesting, transforming and indexing of bibliographic metadata with OpenRefine

Felix Lohmeier

Apr 22, 2018, 5:05:16 PM
to OpenRefine
Hi all,

I would like to share my use case in the library domain. Maybe it's helpful for someone else.

We at the State and University Library Hamburg are building a website that will aggregate Open Access content from local universities. To that end, we are building an automated workflow for harvesting, transforming and indexing metadata using metha, OpenRefine and Solr, tied together with simple bash scripts.

Workflow:
  1. Harvest metadata in different standards (Dublin Core, DataCite, ...) from multiple OAI-PMH endpoints
  2. Transform harvested data with specific rules for each source to produce normalized and enriched data
  3. Load transformed data into a Solr search index (which serves as a backend for a discovery system)
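
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the linked repository, which uses metha for harvesting, OpenRefine for the transformations and bash to glue them together. The function names, the sample record and the output field names (`id`, `source`, `title`, `creator`) are hypothetical; only the OAI-PMH and Dublin Core namespaces and the `ListRecords` request parameters come from the respective standards.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH responses that carry Dublin Core records.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Step 1: build the OAI-PMH ListRecords request URL for one endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def values(record, tag):
    """Collect the non-empty, whitespace-trimmed values of one Dublin Core field."""
    return [e.text.strip() for e in record.iterfind(".//" + tag, NS)
            if e.text and e.text.strip()]


def transform(xml_text, source):
    """Step 2: map harvested records to flat, normalized documents.

    The output field names are hypothetical; a real index schema will differ.
    """
    root = ET.fromstring(xml_text)
    docs = []
    for record in root.iterfind(".//oai:record", NS):
        docs.append({
            "id": record.findtext("oai:header/oai:identifier",
                                  default="", namespaces=NS),
            "source": source,
            "title": values(record, "dc:title"),
            "creator": values(record, "dc:creator"),
        })
    return docs


def solr_update_payload(docs):
    """Step 3: serialize documents as a JSON array, ready to send to a Solr index."""
    return json.dumps(docs, ensure_ascii=False)


# Made-up sample response with a single record, standing in for harvested data.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>  An Open Access Article </dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""
```

Calling `transform(SAMPLE, "example")` yields one flat document per record, and `solr_update_payload` turns that list into the JSON body for an index update. No network access is needed to try the transformation, since the sample record is embedded.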

Flowchart: (image not preserved)

Why OpenRefine?

Non-tech-savvy library staff are able to use a graphical user interface for exploring the data, creating the transformation rules and checking the results.

Source code (including example data):

https://github.com/subhh/HOS-MetadataTransformations

Thad Guidry

Apr 22, 2018, 5:39:41 PM
to openr...@googlegroups.com
That's great, Felix!

Did you consider Kylo?
It offers search and discovery plus many other features, such as transformations and data wrangling via Spark, in a nice browser interface.

More videos here: https://www.youtube.com/channel/UCwXhRbDtW--SXdlKQj0Ls9A

I think Kylo could have simplified things for you and your team, and probably still could. You should try a simple proof of concept against one of your OAI endpoints.

All the best,
-Thad



Felix Lohmeier

Apr 22, 2018, 5:55:17 PM
to OpenRefine
Thanks Thad, I will have a look at Kylo.