API Command for Templating Exporter

Dave Angel

Jun 23, 2013, 6:25:21 AM
to openr...@googlegroups.com
Hi there,

I want to use Refine as part of a batch ETL process: take a CSV from an Oracle table, clean and transform it as necessary with Refine expressions, transform it to XML, and then load it into an XML DB, where I will denormalise the multiple normalised relational tables. The tables could run into millions of rows.

I can see how, using the Python/Ruby scripts, I could batch up a large CSV into 100K-row projects, then load and refine them. I can't see how I could then export using a templating exporter to turn the output into XML. Is there a URL I can POST to in order to run a custom template export? Do I need to write a custom extension?
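
The batching step described here could be sketched as follows. This is a minimal illustration, not part of any OpenRefine client library: the function names `split_csv` and `_render` are hypothetical, and it assumes the source CSV has a single header row that each chunk should repeat so that every chunk can be loaded as its own project.

```python
import csv
import io
from typing import Iterator, List, TextIO


def _render(header: List[str], rows: List[List[str]]) -> str:
    """Serialise one header row plus a batch of data rows back to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def split_csv(source: TextIO, rows_per_chunk: int = 100_000) -> Iterator[str]:
    """Yield CSV text chunks of at most rows_per_chunk data rows each.

    Assumption: the first row of the source is a header, repeated in
    every chunk.
    """
    reader = csv.reader(source)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    batch: List[List[str]] = []
    for row in reader:
        batch.append(row)
        if len(batch) == rows_per_chunk:
            yield _render(header, batch)
            batch = []
    if batch:
        # Final, possibly shorter, chunk.
        yield _render(header, batch)
```

Because the function is a generator, only one chunk is held in memory at a time, which matters when the tables run into millions of rows.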

I've given the whole process I am trying to perform above so that if anyone has any other thoughts on this then please shout!

Thanks,
Dave

Martin Magdinier

Jun 24, 2013, 8:21:01 PM
to openrefine
Hi Dave,

I am not aware of any automated way to export data from OpenRefine.

OpenRefine is not designed to work as an automated ETL tool, and I would recommend another solution for your process, such as Pentaho or Talend (to mention other open source projects). You can use Refine to profile your data and define the steps you need to perform, and then implement them in an ETL tool.

Martin



Dave Angel

Jun 26, 2013, 6:16:18 AM
to openr...@googlegroups.com
Thanks Martin.

I've seen other posts commenting that OpenRefine isn't an ETL tool, and I understand. It is, however, a potentially very useful step when putting your own custom ETL process together, which one often has to do. To add a little more colour to my previous post, this library:


wraps HTTP calls into OpenRefine, such as:
/command/core/apply-operations?project=#{@project_id}

If there was a URI for the templater I could use this in a batch process. That's all I need. I guess I can go work that out from the source code, however it would be great if the project published these URLs as an API.

Cheers,
Dave

Tom Morris

Jun 26, 2013, 4:08:16 PM
to openr...@googlegroups.com
Hi Dave.  I didn't understand what you were asking for before, so thought I had to put aside some time to go do research.  The name of the command is much simpler to answer.


On Wed, Jun 26, 2013 at 6:16 AM, Dave Angel <batwa...@gmail.com> wrote:
Thanks Martin.

I've seen other posts commenting that OpenRefine isn't an ETL tool, and I understand. It is, however, a potentially very useful step when putting your own custom ETL process together, which one often has to do. To add a little more colour to my previous post, this library:


wraps HTTP calls into OpenRefine, such as:
/command/core/apply-operations?project=#{@project_id}

If there was a URI for the templater I could use this in a batch process. That's all I need.

You can see the command registry by searching on GitHub (or your own copy of the sources):


Different modules register their commands in different namespaces, but most stuff is in 'core'.  The command you want is 'export-rows'.

The other way to discover this stuff, if you're more of a front-end web developer, is to inspect the JavaScript in your browser near the bit of the UI that issues the command that you are trying to find.
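
As a rough illustration of what a call to the 'export-rows' command might look like, here is a minimal sketch that only builds the HTTP request without sending it. The thread confirms just the command name, the 'core' namespace, and the `/command/core/...` URL pattern. Everything else is an assumption: the form field names (`project`, `format`, `template`, `prefix`, `suffix`, `separator`, `engine`) are modelled on what reverse-engineered client libraries appear to send, the base URL assumes a local instance on the default port, and the function name `build_template_export` and the column name `id` are hypothetical. Since this is an internal protocol that may change, check the fields against your own version, for example by inspecting the request the browser issues. Newer releases may also require a CSRF token.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_template_export(
    base_url: str,
    project_id: int,
    row_template: str,
    prefix: str = "",
    suffix: str = "",
    separator: str = "\n",
) -> Request:
    """Build (but do not send) a POST request for a templated export.

    Assumption: the field names below are not documented anywhere in
    this thread and may differ between OpenRefine versions.
    """
    url = f"{base_url.rstrip('/')}/command/core/export-rows"
    form = {
        "project": str(project_id),
        "format": "template",        # assumed name of the templating exporter
        "template": row_template,    # rendered once per row
        "prefix": prefix,            # emitted before the first row
        "suffix": suffix,            # emitted after the last row
        "separator": separator,      # emitted between rows
        "engine": '{"facets":[],"mode":"row-based"}',  # assumed: no filtering
    }
    return Request(url, data=urlencode(form).encode("utf-8"), method="POST")


# Hypothetical usage: wrap each row in a <row> element.
request = build_template_export(
    "http://127.0.0.1:3333",
    123,
    '<row><id>{{cells["id"].value}}</id></row>',
    prefix="<rows>\n",
    suffix="\n</rows>",
)
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body would then be the rendered export, if the assumed field names match the running version.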
 
I guess I can go work that out from the source code, however it would be great if the project published these URLs as an API.

Well, that's kind of the point.  It's not documented because it's an internal protocol, not an external API.  The folks who wrote the various client libraries reverse engineered the protocol for their use, but we reserve the right to change it. 

We do recognize that some users have a need to be able to do scripted repetitive operations, but we don't have a good out-of-the-box solution right now.  In your scenario, it sounds like a streaming solution would be desirable too (or are you doing global operations e.g. clustering on the whole data set?).

Tom

Dave Angel

Jul 3, 2013, 12:49:33 PM
to openr...@googlegroups.com
Hi Tom, 

Thanks for this. Put me on the right track and I think I can do what I want now. I appreciate all your comments. 

Thanks,
Lee

eric....@canada.ca

Apr 16, 2017, 12:21:16 AM
to OpenRefine
I am trying to get https://github.com/felixlohmeier/openrefine-batch#options working for JSON processing, but will also want to interact with the templater. Can anyone share some solutions they have found?

Felix Lohmeier

Dec 11, 2017, 4:27:24 PM
to OpenRefine
On Sunday, 16 April 2017 06:21:16 UTC+2, eric....@canada.ca wrote:
I am trying to get https://github.com/felixlohmeier/openrefine-batch#options working for JSON processing, but will also want to interact with the templater. Can anyone share some solutions they have found?

I added support for templating export today. The new version is available at https://github.com/felixlohmeier/openrefine-batch.