OpenRefine stuck at 0% complete


Gregory Deych

Jun 18, 2016, 5:58:50 PM
to OpenRefine
I'm using OpenRefine together with reconcile-csv to get matches from two large files containing strings (500,000 rows, ~30 MB in size). I've successfully started the reconcile-csv service on the first file and imported the second file into OpenRefine. However, after several hours of computation, CPU utilization is at 78%, but the progress is still at 0% and memory utilization is at baseline (1.5 GB overall). The server has 16 GB of memory, of which I allocated 4 GB to the reconcile-csv JVM and 7.5 GB to the OpenRefine JVM. The memory shows as committed but not utilized. I suspect the program is not going to complete, but I'm rather at a loss as to what else to try.


Gregory Deych

Jun 19, 2016, 2:05:19 PM
to OpenRefine

Addendum: after 24 hours, the progress bar is at 2%. So some activity is going on, but rather slowly. Memory usage is still at 10%, CPU utilization at 75%. The system is a Xeon E5-2670 running at 2.5 GHz. Would a faster CPU result in a significant improvement?
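
To put the 2% figure in perspective, here is a rough back-of-the-envelope sketch. It assumes progress stays linear, which nothing in this thread confirms, and the function name `estimate_total_hours` is purely illustrative.

```python
def estimate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate total runtime from the progress observed so far.

    Assumption: the job keeps advancing at the same average rate.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


# Figures from the post above: 2% complete after 24 hours.
total_hours = estimate_total_hours(24, 0.02)
remaining_days = (total_hours - 24) / 24

print(f"estimated total: {total_hours:.0f} h (~{total_hours / 24:.0f} days)")
print(f"estimated remaining: ~{remaining_days:.0f} days")
```

Under that assumption the job would need on the order of 50 days in total, so even a CPU twice as fast would still leave it running for weeks.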

Thad Guidry

Jun 19, 2016, 8:50:37 PM
to openrefine
Gregory,

You're running into constant garbage collection cycles in Java, i.e. it might never complete. So just quit it and let's look at some things.

This is possibly due to a number of things:
1. Too many options checked for pre-processing your file. Don't do any extra operations at import time; they will just slow things down.
2. An Excel file that is slightly malformed. If so, save it as a CSV file and then try importing that into OpenRefine.
3. Using UTF-8 might not be the best course and could double the amount of memory needed. If you can, try the Windows encoding CP-1252 in OpenRefine's importer encoding options. (This might throw away important characters, however, so it depends on the data you originally have.)
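
Before switching encodings as suggested in point 3, it may help to check which characters would actually be lost. The following is a minimal sketch, not part of OpenRefine or reconcile-csv: the helper names `unencodable_chars` and `scan_file`, and the idea of scanning the source file up front, are assumptions made for illustration.

```python
from collections import Counter


def unencodable_chars(text: str, encoding: str = "cp1252") -> Counter:
    """Count the characters in `text` that `encoding` cannot represent."""
    missing = Counter()
    # Test each distinct character once, then carry over its frequency.
    for ch, n in Counter(text).items():
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing[ch] = n
    return missing


def scan_file(path: str, encoding: str = "cp1252") -> Counter:
    """Scan a UTF-8 text file line by line (hypothetical helper)."""
    missing = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            missing.update(unencodable_chars(line, encoding))
    return missing


# Small in-memory example; a real run would call scan_file() on the CSV.
sample = "café, naïve, Łódź, 東京"
print(unencodable_chars(sample))
```

If the returned counter is empty, the data should survive a conversion to CP-1252 unchanged; otherwise it lists the characters that would be dropped or replaced, and how often each occurs.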

Let us know further,
