What's your favorite OpenRefine tutorial?


Florian Giroud

Mar 4, 2021, 9:19:42 AM
to OpenRefine
Hello everyone,
We are looking for inspiration to write end-to-end tests for OpenRefine, and for a "perfect" OpenRefine project that covers the important features of the software.
We were thinking of taking popular OpenRefine tutorials and turning them into e2e tests, which would also ensure that the tutorials are not broken.

So, what is your favorite OpenRefine tutorial?

Best,
- Florian Giroud

Owen Stephens

Mar 4, 2021, 10:00:28 AM
to OpenRefine
Since I wrote the initial material for this, it seems immodest to say it's my "favourite", but my intention in writing it was to cover the important features of the software (at least for the audience I was writing for). The result was the Library Carpentry OpenRefine lesson https://librarycarpentry.org/lc-open-refine/

This tutorial is made up of 13 "episodes" and these introduce functionality as follows:

Create project from a CSV file (the data imported at this step is used throughout the rest of the tutorial and has been set up to include issues that need 'fixing')

Split multi-value cells
Records mode
Join multi-value cells

Create a text facet
Filter data rows by selecting value
Use "Include" to select multiple values
Use "Invert" to invert selection
Use "Facet by blank" menu option
Text filters and other facet types are mentioned in this episode but the tutorial doesn't give direct examples of their use

Split multi-valued cells (again)
Use clusters via Edit cells -> Cluster and edit
Use key collision + fingerprint to find clusters
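
For anyone turning the clustering step into a test, it may help to have a rough picture of what key collision with the fingerprint method does. The Python sketch below is only an approximation of the idea, not OpenRefine's actual implementation; the function names `fingerprint` and `key_collision_clusters` and the sample values are made up for illustration.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate a fingerprint key: normalise, tokenise, dedupe, sort."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII equivalents.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop ASCII punctuation, then split on whitespace.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def key_collision_clusters(values):
    """Group values whose keys collide; keep groups with 2+ distinct values."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Hypothetical sample data, not taken from the tutorial's CSV.
clusters = key_collision_clusters(["Smith, John", "John Smith", "Jane Doe"])
```

Values that differ only in case, punctuation, spacing, or word order end up with the same key, which is why they are offered together as one cluster to merge.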

Reorder columns via All menu, Edit columns->Re-order / remove columns
Rename column via Edit column -> Rename this column
Sort data

Use Edit cells->Common transforms->Collapse consecutive whitespace to apply transformation to data

Use Edit cells->Transform… to access GREL expression editor
Apply value.toTitleCase() expression

Undo / Redo panel
Extract and Apply history

Use Edit cells->Transform… to access GREL expression editor
Apply value.toDate() expression
Use Edit column->Add column based on this column and apply value.toString() expression to a Date value

Create an array using value.split(",")
Extract value from array using square brackets notation e.g. value.split(",")[0]
Sorting arrays
Create custom text Facet->Custom text facet...
GREL "contains" function
GREL "match" function
Reversing arrays
Joining arrays
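
The array steps above (split, index, sort, reverse, join) have close Python analogues, which could be handy for computing expected values in a test. This is a sketch in Python rather than GREL, and the cell value is a made-up example, not data from the tutorial.

```python
# Hypothetical cell value; the tutorial's own data will differ.
value = "banana,apple,cherry"

# Analogue of value.split(",")
parts = value.split(",")

# Analogue of value.split(",")[0]
first = parts[0]

# Analogues of sorting or reversing the array and joining it back together.
sorted_joined = ",".join(sorted(parts))
reversed_joined = ",".join(reversed(parts))
```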

Export options introduced but no worked example

Star rows
Facet by star
Add column by fetching URLs
Use HTTP header fields
String concatenation using "+"
Using parseJson()
Using a reconciliation service to reconcile data
Using reconciliation facets Judgement and best candidate’s score
Using "double tick" to reconcile a cell and all identical cells
Using Reconcile->Actions->Match each cell to its best candidate
Extracting ID from reconciliation source using cell.recon.match.id
The "cross" function is introduced but no worked example
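
The fetch-and-parse steps in this episode (build a URL by string concatenation, then pull a field out of the JSON response) can be sketched in Python as follows. The endpoint, the response shape, and the helper names `build_url` and `extract_title` are all assumptions made for illustration; they are not the service or fields the tutorial actually uses, and no network request is made.

```python
import json

# Hypothetical endpoint, used only to show the concatenation step.
BASE_URL = "https://api.example.org/journals/"


def build_url(cell_value: str) -> str:
    """Build a request URL from a cell value using "+" concatenation."""
    return BASE_URL + cell_value


def extract_title(response_text: str) -> str:
    """Parse a JSON response and return one nested field from it."""
    data = json.loads(response_text)
    return data["message"]["title"]


# Hypothetical response body standing in for a fetched URL's content.
sample_response = '{"message": {"title": "Example Journal"}}'
url = build_url("1234-5678")
title = extract_title(sample_response)
```

In an end-to-end test, a canned response like `sample_response` would avoid depending on an external service being available.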

I think this is a pretty comprehensive introduction to all the main functionality. It works from a single CSV import, with exercises written specifically for that data.
The scenarios used are, I believe, realistic and based on my experience of working with this type of data (in this case data describing articles published in scientific journals/serials).

Hopefully it gives a useful idea of what might be covered in the tests. The content of the tutorial is managed via GitHub https://github.com/LibraryCarpentry/lc-open-refine

Best wishes

Owen

Florian Giroud

Mar 10, 2021, 4:45:06 AM
to OpenRefine
Thank you very much, that's exactly what we needed, and sorry for the delay in answering.

Issue created: https://github.com/OpenRefine/OpenRefine/issues/3709

Best,
- Florian Giroud