Try removing all options for the CSV import; I bet there is some issue with your CSV. Also try importing it as just line-based and see if that works. We are here to help, so do not get discouraged.
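One way to test the "issue with your CSV" theory before re-importing is to count the fields per row outside of OpenRefine. What follows is only a rough sketch: the function name `field_count_histogram` and the sample data are made up for illustration, and it uses Python's standard `csv` module, which is not necessarily how OpenRefine parses the file. A clean 7-column file should produce a single bucket; extra buckets often point at a stray quote or delimiter.

```python
import csv
import io
from collections import Counter


def field_count_histogram(lines, delimiter=","):
    """Count how many rows have each number of fields.

    `lines` is any iterable of text lines (an open file works too).
    """
    counts = Counter()
    for row in csv.reader(lines, delimiter=delimiter):
        counts[len(row)] += 1
    return counts


# Tiny made-up sample: the row starting with 2 has an unclosed quote,
# so the parser swallows the rest of the input into that one field.
sample = io.StringIO(
    'id,name,city\n'
    '1,Ann,Oslo\n'
    '2,"Bob,Lima\n'
    '3,Cy,Rome\n'
)
histogram = field_count_histogram(sample)
# Two rows have 3 fields; the broken row and everything after it
# collapse into a single row with 2 fields.
```

For a real file, pass a file object opened with `newline=""` (for example `open("yourfile.csv", newline="", encoding="utf-8")`, where the path and encoding are placeholders) and print the histogram.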
I am a bit dismayed with OpenRefine at the moment. I am hoping it is something I am doing or not doing, and that fixing it will speed things up.

First, the computer specs: HP ProLiant DL160, Xeon 2.4 GHz x2, 32 GB RAM, Citrix Enterprise.

I was trying to import a CSV file that is 168 MB, with 1,767,334 rows and 7 columns. When I clicked Create Project it started just fine, then the estimate quickly went from 2 minutes to over 70 minutes. I edited the ini file and increased the RAM to 4 GB, with almost the same results. I changed it again, this time to 16 GB: same results. Finally I shut down ALL the VMs running on the server, assigned 16 CPUs, and gave the VM 31 GB of RAM; inside Refine I gave it 28 GB of RAM.

From 11:28 until 11:38 it used 2 CPUs at around 95%, full throttle, then dropped like a rock. At 12:43 it was still showing only around 1% to 2% of 1 CPU. With every minute that passes, the time remaining gets longer and longer.

240 minutes remaining
Heap usage: 26724/26724MB

I have the x64 JRE installed so that I can use more than 4 GB of RAM. The VM OS is Windows 7 Ultimate with nothing running except Windows core processes, Java, and Refine.

Here is my ini file:

# Launch4j runtime config
# initial memory heap size
-Xms2048M
# max memory memory heap size
-Xmx28672M
# Use system defined HTTP proxies
-Djava.net.useSystemProxies=true
#-XX:+UseLargePages
#-Dsomevar="%SOMEVAR%"

***************************************************************

I have Refine running from the VM's C: drive, which has 9 GB free; this VM has a 2 TB drive attached directly as removable SATA. The Refine default install folder is around 50 MB; when I checked it while it was running, it had gone up to 215 MB (roughly the 50 MB install plus the 168 MB file).

Inside the Refine command prompt window it simply says

[refine] POST /command/core/get-importing-job-status (1014ms)

That line is simply repeated about every second.

Some questions:
Refine really does NOT use more than 1 CPU?
Why would Refine go from 95% down to 1% when it is still processing the file?
Is there anything I can do to speed this thing up?
If I throw in 2 SSD drives, will this make Refine work any better?
Am I missing something here? Does a 168 MB CSV really take this long?

************
Citrix is telling me it is writing data to disk at 122.8kbps

Regards,
Bill
You received this message because you are subscribed to the Google Groups "Open Refine" group.
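The heap figures in the ini file quoted above can be sanity-checked with a short script. This is a sketch only: `heap_settings_mb` is a hypothetical helper, not part of OpenRefine or Launch4j, and it assumes one JVM option per line as in the quoted file. It simply confirms that `-Xmx28672M` works out to 28 GB, the figure Bill says he gave Refine.

```python
import re

# The heap-related lines from the ini file quoted in the post.
INI_TEXT = """\
# initial memory heap size
-Xms2048M
# max memory memory heap size
-Xmx28672M
"""

# Conversion factors from the JVM size suffix to megabytes.
_UNITS_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def heap_settings_mb(ini_text):
    """Return the uncommented -Xms/-Xmx values in MB, keyed by option name."""
    settings = {}
    for line in ini_text.splitlines():
        line = line.strip()
        if line.startswith("#"):
            continue  # commented-out options are not passed to the JVM
        match = re.fullmatch(r"-(Xms|Xmx)(\d+)([kKmMgG])", line)
        if match:
            name, number, unit = match.groups()
            settings[name] = int(number) * _UNITS_TO_MB[unit.upper()]
    return settings


settings = heap_settings_mb(INI_TEXT)
max_heap_gb = settings["Xmx"] / 1024
```

Since the "Heap usage" readout in the post shows used equal to total, the settings themselves appear to be taking effect; it may be worth checking whether the import is running out of heap rather than CPU.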
If you have several thousand files, you might want to look at ETL software, depending on your needs. I use Pentaho Data Integration, but you might also look at CloverETL or Talend Open Studio.
OpenRefine does not have the goal of replacing existing ETL tools. We primarily focus on cleanup and analysis with scripting support.
We also mention those tools and related software on the wiki under Related Software.