JDBC in OpenRefine?

798 views
Skip to first unread message

MRB

unread,
May 21, 2019, 5:37:03 PM5/21/19
to OpenRefine
Greetings,

I'm a brand new OR user and am excited about learning more and using it to examine and help me clean up our data.  I see that OR 3.1 enables the user to login to 3 databases, but I don't know how to enable it to use other database JDBC jarfiles such as Oracle and MSSQL.  I did a search in here before asking the question, but didn't see any mention.  Are there plans to let us users copy jars into a folder that OR can pick up and provide a dropdown for?  It would be incredibly helpful to analyze our data without having to export to a CSV first.

Thanks,
Mark

MRB

unread,
May 21, 2019, 6:03:43 PM5/21/19
to OpenRefine
I forgot to mention that a lot of our legacy data that now lives in Oracle tables that we post for public consumption via text files is full of special characters, and we have trouble generating clean csv and have had to resort to using fixed-with text files.  So it's difficult to even get a complete csv to use for profiling, analyzing the data. I'm hoping that we could read a table directly from a database into OR, clean up, or at least identify the records with weird data and clean it up.   Thanks, Mark

Thad Guidry

unread,
May 21, 2019, 7:09:55 PM5/21/19
to openr...@googlegroups.com
Hi Mark,

If you know a bit of Java or can get a student or co-worker to hack a bit, its pretty easy to extend what we already provide with a few bits of additional code...

2. Provide the connection and service classes, look at the PostgreSQL one or MySQL one as examples: https://github.com/OpenRefine/OpenRefine/tree/master/extensions/database/src/com/google/refine/extension/database
4. Add drivers manually to the classpath, or update the pom file to provide them as dependencies as other DB libraries are done: https://github.com/OpenRefine/OpenRefine/blob/master/extensions/database/pom.xml

Hopefully I didn't forget anything.
And yes, someone should probably write this up on our Wiki pages.



--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/28a9bbfd-8940-4045-9320-f9036522c9af%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mark Bustin

unread,
May 21, 2019, 9:05:04 PM5/21/19
to openr...@googlegroups.com
Hi Thad,
Thank you for your response but  I am no programmer and I cannot do this. If OR modeled other java projects like dbeaver then users like myself that are database proficient could add jar files and profile structured data - would be incredibly helpful.  I’ll wait and hope someone can do this. Thanks! Mark

Thad Guidry

unread,
May 21, 2019, 9:55:34 PM5/21/19
to openr...@googlegroups.com
Yes Mark,

I understand your needs very well.  We will get there, it just requires a bit more work in the future and we have it planned.
I also am used to tools that work similarly, but with our limited budget we had to make adjustments to what could be built quickly with existing developer help that was paid for some of it and the rest donated some of their time.



Ettore Rizza

unread,
May 22, 2019, 4:44:38 AM5/22/19
to OpenRefine
Hello Mark,

In the meantime, a workaround could be to import your fixed-width text file into OpenRefine (it reads them as well as CSVs), clean it up, then use the templating feature to put your data into an SQL script in order to recreate your table. I don't know if that's an option for you. 


On Wednesday, 22 May 2019 03:05:04 UTC+2, MRB wrote:
Hi Thad,
Thank you for your response but  I am no programmer and I cannot do this. If OR modeled other java projects like dbeaver then users like myself that are database proficient could add jar files and profile structured data - would be incredibly helpful.  I’ll wait and hope someone can do this. Thanks! Mark
On Tue, May 21, 2019 at 7:09 PM Thad Guidry <thadg...@gmail.com> wrote:
Hi Mark,

If you know a bit of Java or can get a student or co-worker to hack a bit, its pretty easy to extend what we already provide with a few bits of additional code...

2. Provide the connection and service classes, look at the PostgreSQL one or MySQL one as examples: https://github.com/OpenRefine/OpenRefine/tree/master/extensions/database/src/com/google/refine/extension/database
4. Add drivers manually to the classpath, or update the pom file to provide them as dependencies as other DB libraries are done: https://github.com/OpenRefine/OpenRefine/blob/master/extensions/database/pom.xml

Hopefully I didn't forget anything.
And yes, someone should probably write this up on our Wiki pages.



On Tue, May 21, 2019 at 5:03 PM MRB <mrbu...@gmail.com> wrote:
I forgot to mention that a lot of our legacy data that now lives in Oracle tables that we post for public consumption via text files is full of special characters, and we have trouble generating clean csv and have had to resort to using fixed-with text files.  So it's difficult to even get a complete csv to use for profiling, analyzing the data. I'm hoping that we could read a table directly from a database into OR, clean up, or at least identify the records with weird data and clean it up.   Thanks, Mark

On Tuesday, May 21, 2019 at 5:37:03 PM UTC-4, MRB wrote:
Greetings, 

I'm a brand new OR user and am excited about learning more and using it to examine and help me clean up our data.  I see that OR 3.1 enables the user to login to 3 databases, but I don't know how to enable it to use other database JDBC jarfiles such as Oracle and MSSQL.  I did a search in here before asking the question, but didn't see any mention.  Are there plans to let us users copy jars into a folder that OR can pick up and provide a dropdown for?  It would be incredibly helpful to analyze our data without having to export to a CSV first.

Thanks,
Mark

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openr...@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openr...@googlegroups.com.

Mark Bustin

unread,
May 22, 2019, 5:36:21 AM5/22/19
to openr...@googlegroups.com
Good to know Thad - Thank you.
Mark

Mark Bustin

unread,
May 22, 2019, 6:34:36 AM5/22/19
to openr...@googlegroups.com
Thank you Ettore, thinking about doing that.   - Mark

To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/055833fb-2abe-4185-9659-f1a43e05cc28%40googlegroups.com.

Thad Guidry

unread,
May 22, 2019, 8:37:24 PM5/22/19
to openr...@googlegroups.com, a.j....@open.ac.uk
You can use the Database exporter instead of our Templating exporter.

The Database export actually gives you Templating options essentially.  It does not connect to your DB table, but instead just creates a SQL file.  You then import the SQL file manually.

Tony Hirst (cc'd) has a good blog post on our feature: https://blog.ouseful.info/2018/06/19/quick-notes-openrefine-working-with-databases/



MRB

unread,
Jun 6, 2019, 9:29:29 PM6/6/19
to OpenRefine
Thanks Thad.  Have you ever used the open source java dba tool dbeaver?  Handles jdbc very nicely and even helps download the proper jdbc drivers.  Something like that would be great, I wonder if that code can be borrowed? :)

--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openr...@googlegroups.com.

Thad Guidry

unread,
Jun 6, 2019, 10:07:35 PM6/6/19
to openr...@googlegroups.com
No I have not.
But I guess you are talking about their Apache 2 licensed source under their plugins folder?
 


To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/95e38869-4132-4b4a-ae2e-c84c0bdeb51a%40googlegroups.com.

MRB

unread,
Jun 7, 2019, 10:35:03 AM6/7/19
to OpenRefine
I guess, just downloaded and installed, did not pay attention to code in Github, not a developer.  But the way it handles multiple dbs and even assists in the driver download/installation is nice.  Thanks, Mark

Roguejackal99 Jackalees

unread,
May 17, 2021, 10:45:50 AM5/17/21
to OpenRefine
Hi Thad. Congratulations for this grate tool. 
I was wondering if there is any progress in adding Oracle database to the list of databases to rear write from?. I have a team that would be more that thankful for a tool capable of doing that. Right now we download data to CSV files and then upload again but it wastes a lot of time....

Thanks in advance.
David.

Thad Guidry

unread,
May 17, 2021, 11:33:27 AM5/17/21
to openr...@googlegroups.com
Hi David,

Yes, it should be fairly easy to add Oracle support via JDBC.  How you go about that should be discussed on our developer mailing list and not this user mailing list.
It would be expected that folks would download the Oracle JDBC driver .jar file and place it into the folder
Take a look at the extension folders and files where you can create a new subfolder for `oracle` under /extension/database


Here's some technical docs to get started...but likely you can just copy/paste from one of the other database vendor folders to have a starting point.
https://docs.openrefine.org/technical-reference/writing-extensions#introduction
The general steps were mentioned by me earlier on May 21, 2019 in this same thread.

The final caveat, I am not sure on Oracle licensing nowadays to include its JDBC driver into our packaging via a POM dependency.  I don't think you can do that.
What can be done is update some of our docs to instruct users who want to use Oracle db extension to first manually download the JDBC .jar file and add it to the classpath for OpenRefine to use upon startup.
(or provide a nice little shell script that folks could run easily and give instructions for that)


Thad Guidry

unread,
May 17, 2021, 11:53:26 AM5/17/21
to openr...@googlegroups.com
I've just made a PR to update the README.md for the database extension to help you further.

Reply all
Reply to author
Forward
0 new messages