text annotation/relationships in OpenRefine?


Erik Paulson

Mar 12, 2021, 1:29:12 PM
to openr...@googlegroups.com
I'm interested in being able to do some text annotation in OpenRefine - it's a middle step in some workflows, and so it'd be nice to be able to do that right with a UI hosted in OpenRefine. 

I'm thinking of something like putting the Brat interface on a cell (as used by the Stanford NLP tools):
[image: Stanford-CoreNLP-small.png]
(from https://brat.nlplab.org/examples.html - but there are lots of other example annotators, some of which are likely better than Brat)

So there'd be a UI to select, annotate, and create relationships between entities in a cell. Then the resulting annotation data structure could be put into another cell - ideally with a nice set of GREL/Jython functions to do common operations, like spitting out the entities of a given type. Ideally it'd support nested entities.
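
A rough sketch of what such a helper might look like, assuming the annotation is stored in a second cell as a JSON string. The layout here (`entities`, `children`, `relations`, character offsets) is purely hypothetical and is not an OpenRefine or Brat format; it only illustrates nested entities and a "spit out the entities of a given type" operation.

```python
import json

# Hypothetical annotation layout (an assumption, not an existing format):
# character offsets into the source text, nested entities under "children",
# and relations that point at entities by id.
annotation_json = json.dumps({
    "text": "Acme Corp hired Jane Doe in Madison.",
    "entities": [
        {"id": "e1", "type": "Organization", "start": 0, "end": 9, "children": []},
        {"id": "e2", "type": "Person", "start": 16, "end": 24, "children": [
            {"id": "e3", "type": "GivenName", "start": 16, "end": 20, "children": []},
        ]},
        {"id": "e4", "type": "Place", "start": 28, "end": 35, "children": []},
    ],
    "relations": [{"type": "employer_of", "source": "e1", "target": "e2"}],
})


def entities_of_type(cell_value, wanted_type):
    """Return the text of every entity of wanted_type, nested ones included."""
    doc = json.loads(cell_value)
    found = []

    def walk(entities):
        for entity in entities:
            if entity["type"] == wanted_type:
                found.append(doc["text"][entity["start"]:entity["end"]])
            # Recurse so entities nested inside other entities are also seen.
            walk(entity.get("children", []))

    walk(doc["entities"])
    return found


print(entities_of_type(annotation_json, "Person"))     # ['Jane Doe']
print(entities_of_type(annotation_json, "GivenName"))  # ['Jane']
```

Because the annotation lives in an ordinary string cell, a function like this could in principle be called on the cell's value from a Jython expression; how it would be packaged as a GREL function is left open here.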

Is there anything like this in an OpenRefine extension? 

Is this something that can even be done through an extension? - I kinda assume that popping up a dialog and doing the annotation for a cell is easy enough to do in a new dialog from the extension, but I'm less clear if an extension can somehow give a cell a new visual representation for the main OpenRefine display grid, or if an extension can attach extra data to a cell/add new GREL functions that can access that extra data. (and maybe there's more that needs to happen)

Happy to open this as a GitHub issue instead of a mailing list discussion. I'm not an NLP person, so I'm not an expert and don't have a firm idea about how something like this should work, but I think I see how a base annotation use case could work.

Thanks,

-Erik

Thad Guidry

Mar 12, 2021, 2:05:17 PM
to openr...@googlegroups.com
Hi Erik!

I think a different kind of editing interface would probably be better for annotation than what is provided out of the box by our current roadmap for OpenRefine.
OpenRefine's columnar grid layout is probably not the nicest for the kind of workflow you are suggesting. But of course that could be changed, given lots of user feedback and interest.

I can, however, envision an extension that isolates and exposes an annotation workflow on a single column of text cells (with a nice wider view in the browser),
while it builds a relationship graph on the backend, perhaps outside of the OpenRefine server backend, using other technologies (Wikibase, graph, db, etc.) to store the relationships.

The workflow that most language professionals like to use involves both some automatic annotation, along with manual review and editing.

Relatedly, there are a few nice projects (non-exhaustive listing of a few of my favorites), that it might help to look over, or even use alternatively:



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/CAKJO4n5oNrGyY156geu%3DrLSZSuP-2Nnot4EWOYEP552tegv9Dg%40mail.gmail.com.

Erik Paulson

Mar 12, 2021, 2:33:27 PM
to openr...@googlegroups.com
Oh, I wasn't clear - the relationships I was talking about were just "relationships" inside of the annotation, like "X is the target of Y in this sentence", so it wouldn't need a new backend; it's just part of the annotation.

I'm not exactly doing NLP work, and the columnar interface is actually OK for me.

I'm looking to be able to add annotations to the data in some cells and have that annotation data available in the row - right now I'd have to kick out of OpenRefine and come back into OpenRefine, so to simplify that step I'd like to have the UI right in OpenRefine.

I agree that for editing a cell, you'd want to have a dialog or something else that gave you a wider view of the cell.

Using OpenRefine might be a bit clunky for a lot of cells, but it's better than not being able to do it on any cells :)

-Erik



Thad Guidry

Mar 12, 2021, 2:51:21 PM
to openr...@googlegroups.com
Hmm, curious...why are you coming BACK into OpenRefine?
Can you explain the workflow in more detail?

Tom Morris

Mar 12, 2021, 2:58:18 PM
to openr...@googlegroups.com
There is an NLP extension that uses the Stanford NLP package: https://github.com/stkenny/Refine-NER-Extension

This is all software, so anything's possible, but brat consists of not only the web front end, but a backend server. In principle, you could integrate the two disparate tools, but it seems like a lot of work for minimal gain. Also, the brat standoff annotation format isn't particularly friendly for post-processing, so you'd probably need a set of dedicated functions to support it. I'm having a hard time envisioning what type of workflow would have this in the middle.
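
To give a feel for the kind of dedicated function this implies, here is a minimal sketch of a reader for the brat standoff format. It assumes only the two simplest line types: text-bound annotations (`T` lines, tab-separated id, `Type start end`, and surface text) and binary relations (`R` lines with `Arg1:`/`Arg2:` references). Discontinuous spans, events, attributes and notes are deliberately not handled, and the function and field names are made up for illustration.

```python
def parse_standoff(ann_text):
    """Parse T (entity) and R (relation) lines of a brat .ann file.

    Assumption: every entity span is contiguous ("Type start end").
    Other annotation kinds (events, attributes, notes) are skipped.
    """
    entities = {}
    relations = []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        ann_id, body = line.split("\t", 1)
        if ann_id.startswith("T"):
            header, surface = body.split("\t", 1)
            ent_type, start, end = header.split(" ")
            entities[ann_id] = {
                "type": ent_type,
                "start": int(start),
                "end": int(end),
                "text": surface,
            }
        elif ann_id.startswith("R"):
            rel_type, arg1, arg2 = body.split()
            relations.append({
                "type": rel_type,
                "arg1": arg1.split(":", 1)[1],  # "Arg1:T1" -> "T1"
                "arg2": arg2.split(":", 1)[1],
            })
    return entities, relations


# Annotations for the text "Sony hired Jane Doe".
sample = (
    "T1\tOrganization 0 4\tSony\n"
    "T2\tPerson 11 19\tJane Doe\n"
    "R1\tEmployer Arg1:T1 Arg2:T2\n"
)

entities, relations = parse_standoff(sample)
people = [e["text"] for e in entities.values() if e["type"] == "Person"]
print(people)  # ['Jane Doe']
```

Even this small reader has to resolve ids and offsets by hand before the data can be used, which is the post-processing cost being described.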

Tom


Erik Paulson

Mar 12, 2021, 3:14:12 PM
to openr...@googlegroups.com
I can use the annotations in downstream OpenRefine operations - I can use the results of the annotations in facets and filters, and I can pass it along and customize my reconciliation service to be able to take advantage of it. There's a ton of useful things that OpenRefine can do downstream.

Rather than writing a bunch of GREL and regexes to try to pull data out of a cell, it's sometimes just easier to have a GUI to mark up the cell. Again, I get that it doesn't scale to having a human do that for 1000s of cells, but I'd like to start somewhere.


Erik Paulson

Mar 12, 2021, 3:30:20 PM
to openr...@googlegroups.com
The NER extension is the opposite of what I want. :)

Brat is an example of the sort of annotation UI I was thinking of; I'm not necessarily looking to integrate Brat and OpenRefine.

My questions really are: can an extension change what's displayed in an OpenRefine cell (for annotated cells, can I give back a custom HTML representation for that cell), and can an extension add additional data to a cell that I can pull out in a GREL function? Could I write an extension so I can write GREL that does something like
 
cells['AnnotatedColumn'].annotatedDataStructure 
(or maybe cells['AnnotatedColumn'].customdata['AnnotationExtension']['annotatedDataStructure']  - some kind of scratch pad that extensions can use for each cell?)

i.e., how can an extension extend a cell object and make it available through GREL?

-Erik


Tom Morris

Mar 12, 2021, 4:00:27 PM
to openref...@googlegroups.com, openr...@googlegroups.com
[The details of extension writing are more appropriate to the openrefine-dev list, so I'm moving the main list to cc (because I seem to remember Google Groups will refuse to deliver bccs).]

On Fri, Mar 12, 2021 at 3:30 PM Erik Paulson <epau...@unit1127.com> wrote:

> can an extension change what's displayed in an OpenRefine cell - for annotated cells, can I give back a custom HTML representation for that cell,

There's not a defined interface for pluggable cell renderers/editors, but extensions can basically do whatever they want. Mucking with the internals of the cell renderer might be fragile and not terribly supportable though.
 
> and can an extension add additional data to a cell that I can pull out in an GREL function - could I write an extension so I can write GREL that does something like
>
> cells['AnnotatedColumn'].annotatedDataStructure
> (or maybe cells['AnnotatedColumn'].customdata['AnnotationExtension']['annotatedDataStructure']  - some kind of scratch pad that extensions can use for each cell?)
>
> ie how can an extension extend a cell object and make it available through GREL?

There's no defined interface for extensions to add cell metadata (or data). While such a mechanism could be defined, we'd need to figure out how it interacts with serialization, what happens when you try to load the project without the extension available, how to minimize the performance/storage impact, etc. I'm not sure any of the core team has the time for such a project right now, but we'd be happy to review design proposals.
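
One way to sidestep the serialization question, sketched here under stated assumptions, is to keep the annotation as a plain string in a sibling column rather than as cell metadata. The cell is then ordinary text, so the project should still load when the extension is absent. The `FORMAT_TAG` marker and the `to_cell`/`from_cell` names are hypothetical, not part of any OpenRefine API.

```python
import json

# Hypothetical marker so a reader can tell annotation cells apart from
# other JSON-looking text. Not an OpenRefine convention.
FORMAT_TAG = "annotation/v0"


def to_cell(annotation):
    """Serialize an annotation dict to a compact string for a sibling column."""
    payload = {"format": FORMAT_TAG, "data": annotation}
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def from_cell(cell_value):
    """Return the annotation dict, or None for blank, malformed or foreign cells."""
    if not cell_value:
        return None
    try:
        payload = json.loads(cell_value)
    except (ValueError, TypeError):
        # Not JSON at all: treat the cell as ordinary, un-annotated text.
        return None
    if not isinstance(payload, dict) or payload.get("format") != FORMAT_TAG:
        return None
    return payload.get("data")


original = {"entities": [{"type": "Person", "start": 16, "end": 24}]}
stored = to_cell(original)
assert from_cell(stored) == original   # round-trips through a string cell
assert from_cell("just some text") is None
```

The trade-off is that the grid shows the raw string unless something renders it, and every read pays a parse cost, which touches the performance and storage concerns raised above.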