Extension Writing Advice


Michael Bowie

Nov 21, 2017, 11:03:47 AM
to OpenRefine
Dear Community, 

I hope all is well. 

My project is to build a semi-automatic data cleaning extension for OpenRefine. My background is in C++, so I'm not very familiar with Java or with writing extensions for OpenRefine. I have read through GiTorto's guide and Owen Stephens's GREL guide, and I have looked through a handful of basic extensions.

However, I'm still not clear on how exactly the command classes are called, how they interact with the backend data, and how they return values to the client.

Any advice for writing a "Hello World" extension that sums up the values in a column?

Thank you!

My best,
Michael 

Ettore Rizza

Nov 21, 2017, 11:17:54 AM
to OpenRefine
Hi Michael,

I am not a developer, so I cannot answer your main question. But when I want statistics on a column of numbers, such as a sum, I use this extension.




Best regards,

Owen Stephens

Nov 21, 2017, 11:30:20 AM
to OpenRefine
Hi Michael,

I'm still learning some of this stuff myself, so I'm not sure how much I can help, but the following extension might be worth looking at: https://github.com/sparkica/refine-stats

This collects all the values in a column that can be interpreted as numbers and then computes a series of stats across the column (including the total).

As you should be able to see from the extension, on the UI side this is implemented as a command that can be accessed via the column menu.
The Java class com.tribapps.refine.stats.Summarize implements:

* A 'RowVisitor', which collects a list of values across the column
* doGet/doPost methods, which trigger the calculations and return the results. doGet gets the current state of the project in OpenRefine so it can ensure only the currently filtered rows are passed to the row visitor
* And finally 'computeStatistics', which does the calculations on the list of values collected by the row visitor
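
As a rough illustration of the visitor idea described above, here is a minimal, self-contained Java sketch. It does not use OpenRefine's classes: SimpleRowVisitor, NumberCollector, and ColumnSum are hypothetical stand-ins, and rows are modelled as plain lists of cells. The real RowVisitor in OpenRefine works with Project and Row objects and has a richer interface, so treat this only as the shape of the approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a row visitor; NOT OpenRefine's RowVisitor interface.
interface SimpleRowVisitor {
    void visit(int rowIndex, List<Object> cells);
}

// Collects every value in one column that can be interpreted as a number.
class NumberCollector implements SimpleRowVisitor {
    private final int cellIndex;
    final List<Double> values = new ArrayList<>();

    NumberCollector(int cellIndex) {
        this.cellIndex = cellIndex;
    }

    @Override
    public void visit(int rowIndex, List<Object> cells) {
        if (cellIndex >= cells.size()) {
            return; // this row has no cell in the column
        }
        Object value = cells.get(cellIndex);
        if (value instanceof Number) {
            values.add(((Number) value).doubleValue());
        } else if (value instanceof String) {
            try {
                values.add(Double.parseDouble(((String) value).trim()));
            } catch (NumberFormatException e) {
                // not numeric: skip it, as a stats command would
            }
        }
    }
}

class ColumnSum {
    // Visits every row, then sums whatever the visitor collected.
    static double sum(List<List<Object>> rows, int cellIndex) {
        NumberCollector collector = new NumberCollector(cellIndex);
        for (int i = 0; i < rows.size(); i++) {
            collector.visit(i, rows.get(i));
        }
        double total = 0;
        for (double d : collector.values) {
            total += d;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList("apples", 1),
            Arrays.<Object>asList("pears", "2.5"),
            Arrays.<Object>asList("plums", "n/a"));
        System.out.println(sum(rows, 1)); // prints 3.5
    }
}
```

In a real extension the loop over rows would be driven by OpenRefine's engine, so that only the rows matching the current facets reach the visitor; the collecting and summing steps stay the same.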

Hope that gives some more clues as to how you would approach this

Owen

Michael Bowie

Nov 21, 2017, 5:06:17 PM
to OpenRefine
Hi Ettore, 

Yes, this is actually one of the "basic extensions" I was referring to. 

My best,
Michael 

Michael Bowie

Nov 21, 2017, 5:06:17 PM
to OpenRefine
Hi Owen, 

I appreciated your GREL write-up.

Yes, that extension is one that I've been trying to reverse engineer.

I understand what is happening at a high level, as you mentioned; however, I get confused when I try to implement my own version line by line. Those functions contain a lot of small details whose purpose I'm not sure of. So I was wondering what the bare-bones essentials are for summing up the values in a column.

Or are there any other resources you would recommend for learning how to write extensions? The client side I understand and can implement. I'm just stuck trying to work with the backend.

My best,
Michael 

Owen Stephens

Nov 22, 2017, 5:11:15 AM
to OpenRefine
Hi Michael,

Probably stretching the limits of my knowledge here - I'm still working out some of this stuff as I need to - I'm probably at the "knows enough to be dangerous" stage at the moment, as I now understand enough to make changes & enhancements, but don't feel I have a full understanding of the codebase yet.

Anyway - this page explains some of the mechanics (in terms of facets, but actually it is key to understanding the process in place I think) https://github.com/OpenRefine/OpenRefine/wiki/Faceted-Browsing-Architecture

I'm guessing you've already seen https://github.com/OpenRefine/OpenRefine/wiki/Extension-Points. It was the lack of straightforward documentation that led me to write that post on adding simple GREL functions - which is much easier as you don't have to understand the Engine etc. Now I've started to do contributions to the core codebase I've started to get to grips with the Engine and RowVisitor constructs, but I'm still learning.

If there are specific questions you can ask about that stats extension then I (or someone) might be able to answer - so it might be the best approach if you can try to narrow down to the bits that are unclear - "why is this line there"; "what does this thing do" etc.

Improving the documentation on the wiki is on my to-do list (both for users and developers) - and if there is anything you can contribute (whether that's some documentation, notes, or just a list of the things you'd like to see in the documentation) - that would be very helpful.

Best wishes

Owen

Michael Bowie

Dec 5, 2017, 1:09:34 PM
to OpenRefine
Hi Owen, 

Looking at the ExtraCTU-plugin, for example, where does the "createOperation" method of the "ExtractionCommand" class get called? I can't see where the JavaScript files call this method. It appears to be the method that actually begins the extraction process.

My best,
Michael 

Antonin Delpeuch (lists)

Dec 5, 2017, 4:16:58 PM
to openr...@googlegroups.com
Hi Michael,

A client-side JavaScript function cannot call a Java method directly:
they live in different universes! So, here is the process:

* An HTTP request is triggered by the JavaScript code
* The Java server receives the HTTP request and matches it to one of its
known routes
* The Java server instantiates the Java class associated with that route
* It calls the relevant method on that Command object, for instance
doPost if the HTTP request used the POST method
* The method returns a response
* The Java server wraps that response in HTTP and sends it back to the
browser
* The JavaScript callback is called with the result of that request
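
The steps above can be sketched in plain Java without any web server. This is only a toy model: SimpleCommand and SimpleCommandRouter are hypothetical names, the "request" is reduced to a path string plus a parameter map, and commands are registered as ready-made objects for brevity. The servlet machinery that OpenRefine actually uses is considerably more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a server-side command; not OpenRefine's Command class.
interface SimpleCommand {
    String doPost(Map<String, String> params);
}

// Matches an incoming path to a registered command and calls its doPost.
class SimpleCommandRouter {
    private final Map<String, SimpleCommand> routes = new HashMap<>();

    void register(String moduleName, String commandName, SimpleCommand command) {
        routes.put(moduleName + "/" + commandName, command);
    }

    // path looks like "command/<module>/<command>"
    String handlePost(String path, Map<String, String> params) {
        String prefix = "command/";
        if (!path.startsWith(prefix)) {
            return "error: not a command route";
        }
        SimpleCommand command = routes.get(path.substring(prefix.length()));
        if (command == null) {
            return "error: unknown command";
        }
        // The returned string plays the role of the HTTP response body.
        return command.doPost(params);
    }
}

class RouterDemo {
    public static void main(String[] args) {
        SimpleCommandRouter router = new SimpleCommandRouter();
        // "my-extension" and "hello" are made-up names for this example.
        router.register("my-extension", "hello",
            params -> "Hello, " + params.get("name"));

        Map<String, String> params = new HashMap<>();
        params.put("name", "Michael");
        System.out.println(router.handlePost("command/my-extension/hello", params));
    }
}
```

The browser side never sees the command object; it only knows the route string and receives whatever the command wrote into the response.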

Although OpenRefine is generally run as a local service, there is a
clear separation between its server side (the Java application) and its
client side (the web application that you run in your browser).

This client/server separation is pretty standard. It might be useful to
read some intro course to building web apps if that sounds crazy to you.

Cheers,
Antonin

Owen Stephens

Dec 5, 2017, 6:07:34 PM
to OpenRefine
Hi Michael,

So, trying to delve into some of the specifics for the ExtraCTU plugin:

In the extension, module/MOD-INF/lib/controller.js registers the command and the associated Java class (this is mentioned in https://github.com/OpenRefine/OpenRefine/wiki/Extension-Points):
  register("estrazioni", new commands.ExtractionCommand()); // command called via POST
(I haven't entirely got my head around the mechanics of this registration process yet, I'm afraid.)

controller.js also injects other JS files into the project page, including dialogs/extraction.js, which contains the "extract" function.
The extract function includes a call to Refine.postProcess:
  Refine.postProcess('extraction-extension', 'estrazioni', data, {},
    { rowsChanged: true, modelsChanged: true });

Refine.postProcess is defined in /main/webapp/modules/core/scripts/project.js (in OpenRefine, not in the extension) - it's this postProcess which sends the request to the Java backend - basically a route and a JSON package:
  $.post(
    "command/" + moduleName + "/" + command + "?" + $.param(params),
    body,
    onDone,
    "json"
  );
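
To make the route format concrete, here is a small Java sketch that assembles the same kind of URL as the $.post call above. CommandUrl is a hypothetical helper written for this example, and the "project" parameter with the id 123 is an assumed value chosen only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds "command/<module>/<command>?<query>".
class CommandUrl {
    static String build(String moduleName, String command, Map<String, String> params) {
        StringBuilder query = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (query.length() > 0) {
                query.append('&');
            }
            query.append(encode(entry.getKey()))
                 .append('=')
                 .append(encode(entry.getValue()));
        }
        return "command/" + moduleName + "/" + command + "?" + query;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("project", "123"); // assumed example id
        System.out.println(build("extraction-extension", "estrazioni", params));
        // prints command/extraction-extension/estrazioni?project=123
    }
}
```

The module name and command name in the path are what let the server pick the right registered command.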

Then we are into the second part of the process described by Antonin:
* The Java server instantiates the Java class associated with that route
* It calls the relevant method on that Command object, for instance
doPost if the HTTP request used the POST method

This is exactly what happens in this case. However, the ExtractionCommand class extends EngineDependentCommand, which is where doPost is defined (it isn't overridden in ExtractionCommand), and it is this doPost that calls createOperation:

  try {
    Project project = getProject(request);

    AbstractOperation op = createOperation(project, request, getEngineConfig(request));
    Process process = op.createProcess(project, new Properties());

    performProcessAndRespond(request, response, project, process);
  } catch (Exception e) {
    respondException(response, e);
  }
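
This is an instance of the template method pattern: the base class owns doPost and the subclass only supplies createOperation. The stripped-down Java sketch below shows just that relationship. All the types in it (SimpleOperation, SimpleEngineDependentCommand, SimpleExtractionCommand) are hypothetical stand-ins with simplified signatures, not the real OpenRefine classes.

```java
// Hypothetical stand-in for an operation; not OpenRefine's AbstractOperation.
interface SimpleOperation {
    String run();
}

// Base class: defines doPost once and delegates one step to subclasses.
abstract class SimpleEngineDependentCommand {
    final String doPost(String requestBody) {
        try {
            SimpleOperation op = createOperation(requestBody);
            return op.run();
        } catch (Exception e) {
            return "error: " + e.getMessage();
        }
    }

    // The only method a concrete command has to provide.
    protected abstract SimpleOperation createOperation(String requestBody) throws Exception;
}

// Subclass: never mentions doPost, yet its createOperation runs on every POST.
class SimpleExtractionCommand extends SimpleEngineDependentCommand {
    @Override
    protected SimpleOperation createOperation(String requestBody) {
        return () -> "extracted from: " + requestBody;
    }
}

class TemplateMethodDemo {
    public static void main(String[] args) {
        SimpleEngineDependentCommand command = new SimpleExtractionCommand();
        System.out.println(command.doPost("some text"));
        // prints extracted from: some text
    }
}
```

This is why searching the JavaScript files, or ExtractionCommand itself, for a call to createOperation turns up nothing: the call sits in the inherited doPost.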

Hope that helps

Owen