Handling large projects & memory issues in OpenRefine

Thad Guidry

Jan 29, 2016, 1:13:20 PM
to openrefine
Java 8 has recently arrived and offers a few benefits that can directly improve OpenRefine's performance without the development team having to change any code. (Code-level performance work is still on our road map.)

Java 8 offers a new feature called String Deduplication that is built into the JVM itself. From what I have seen in practice, the performance hit for using it ranges from marginal to none at all.

To turn on String Deduplication (and, if you like, also print its statistics in the command console), uncomment the JAVA_OPTIONS line in your refine.ini file and add the following options to it:

-XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics

Enjoy!

Owen Stephens

Feb 1, 2016, 10:24:01 AM
to OpenRefine
Hi Thad,

I found that to get this to work I had to put each option on a separate line in refine.ini:

JAVA_OPTIONS=-XX:+UseG1GC
JAVA_OPTIONS=-XX:+UseStringDeduplication
JAVA_OPTIONS=-XX:+PrintStringDeduplicationStatistics

(Not sure if this is expected or not, but when I put these in a single JAVA_OPTIONS line I got errors along the lines of: /var/folders/50/vz8ngr_13gj1syf7wdwhlvkh0000gn/T/refine.XXXXXXX.OJo5icDJ: line 3: export: `-XX:+PrintStringDeduplicationStatistics': not a valid identifier)
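The error message suggests that the launcher script turns each refine.ini line into a shell `export` statement. Assuming that is what happens (I have not confirmed it against the script itself), the behaviour can be reproduced in plain bash. The sketch below is only an illustration of the shell's word splitting, not the launcher's actual code:

```shell
#!/usr/bin/env bash
# Hypothetical reproduction: assumes each ini line ends up as "export <line>".

# Unquoted: the shell splits on spaces, so only the first flag is assigned and
# the remaining word is treated as a variable name to export, which is invalid.
unquoted_err=$(bash -c 'export JAVA_OPTIONS=-XX:+UseG1GC -XX:+UseStringDeduplication' 2>&1 || true)
echo "$unquoted_err"

# Quoted: the whole string becomes a single assignment.
quoted=$(bash -c 'export JAVA_OPTIONS="-XX:+UseG1GC -XX:+UseStringDeduplication"; echo "$JAVA_OPTIONS"')
echo "$quoted"

# Repeated assignments to the same name: in plain shell, the last one wins.
repeated=$(bash -c 'export JAVA_OPTIONS=-XX:+UseG1GC; export JAVA_OPTIONS=-XX:+UseStringDeduplication; echo "$JAVA_OPTIONS"')
echo "$repeated"
```

If the launcher really does export each line as-is, the last case is worth keeping in mind: separate JAVA_OPTIONS lines would avoid the error, but it may be worth checking the console output to confirm that all three flags actually reach the JVM.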

Owen

Owen Stephens

Feb 1, 2016, 10:24:37 AM
to OpenRefine
That's on OS X 11.3

Thad Guidry

Feb 1, 2016, 11:22:48 PM
to openrefine
Weird. Dunno about OS X. :) Thanks for the update, though, Owen!

qi cui

Feb 1, 2016, 11:29:57 PM
to OpenRefine
Not sure if wrapping the value in double quotes will help. Alternatively, a lazy way to parse the ini file is to run it through sed and then source the result as a shell script that exports the variables.
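A minimal sketch of that sed idea, combined with the double quotes, might look like the following. The temporary file and its contents are made-up stand-ins for refine.ini, and this is not how the OpenRefine launcher is known to work; it only shows that quoting the value keeps a multi-flag JAVA_OPTIONS line in one piece:

```shell
#!/usr/bin/env bash
# Hypothetical sample file standing in for refine.ini.
ini=$(mktemp)
cat > "$ini" <<'EOF'
# comment lines are skipped
REFINE_MEMORY=1400M
JAVA_OPTIONS=-XX:+UseG1GC -XX:+UseStringDeduplication
EOF

# Rewrite each KEY=value line as: export KEY="value"
# -n with the p flag prints only the lines that matched, dropping comments and blanks.
exports=$(sed -n 's/^\([A-Z_][A-Z0-9_]*\)=\(.*\)$/export \1="\2"/p' "$ini")

# Run the generated export statements in the current shell.
eval "$exports"
rm -f "$ini"

echo "$JAVA_OPTIONS"
```

Note that this is deliberately lazy: a value that itself contains double quotes, backticks, or `$` would be interpreted by the shell, so it is only suitable for a config file you control.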