With 2 GB memory, it started to struggle at around 1.35 million rows, and failed to do the operations (although it loaded the data successfully) on 1.55 million rows
With 4 GB memory, there is no sudden uptick in the time taken to do the operations - it just gradually increases until it failed to do the operations on 2.625 million rows (n.b. the script allows you to do repeated loads of the same file size and take an average - I've not done this here, which is probably why those spikes are in there)
The script is very basic and ideally I'd like to try some other types of test - like increasing the number of columns as well as rows, and trying loads of the same data in different formats (e.g. xls vs csv) - feel free to take the script and adapt it if you are interested. For example, I've just done a test run with an 8 column file (1 GB memory) and in that case it failed slightly earlier, at 600k rows. Also, at the request of Scott Carlson on Twitter (https://twitter.com/scottythered), I ran a test where I loaded and exported the data, but didn't carry out any operations. I only did this with 1 GB memory assigned, and found that it would keep loading the data up to around 1.5 million rows (but good luck doing anything with the data once it is in OR!).
If anyone has done anything similar, I'd be interested to know whether you saw similar results. If there is a particular aspect of OpenRefine you'd like to see tested in terms of data volume, let me know or add an issue to the script's GitHub repository. Also feel free to take and develop the script if you want, or ask questions/make requests.
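For anyone wanting to adapt the approach, here is a minimal Python sketch of the two generic pieces: generating test files with a chosen number of rows and columns, and averaging repeated timings to smooth out spikes. It is not the script described above. The function names are hypothetical, and `count_rows` is only a stand-in for whatever load or operation step you actually want to time against OpenRefine.

```python
import csv
import random
import statistics
import string
import tempfile
import time
from pathlib import Path


def write_test_csv(path, rows, columns=4, seed=0):
    """Write a CSV with a header row plus `rows` rows of random 8-letter strings."""
    rng = random.Random(seed)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([f"col{i + 1}" for i in range(columns)])
        for _ in range(rows):
            writer.writerow(
                ["".join(rng.choices(string.ascii_lowercase, k=8)) for _ in range(columns)]
            )
    return path


def average_seconds(action, repeats=3):
    """Run `action` `repeats` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def count_rows(path):
    """Stand-in for the real step being timed: read the file and count data rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.reader(handle)) - 1


if __name__ == "__main__":
    # Small sizes so the sketch runs quickly; scale these up for a real test.
    with tempfile.TemporaryDirectory() as tmp:
        for rows in (1_000, 2_000, 4_000):
            path = write_test_csv(Path(tmp) / f"test_{rows}.csv", rows)
            seconds = average_seconds(lambda: count_rows(path))
            print(f"{rows} rows: {seconds:.4f}s on average")
```

Changing `columns` gives the wider-file variant mentioned above, and raising `repeats` trades run time for smoother numbers.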
Best wishes
Owen
--
You received this message because you are subscribed to the Google Groups "OpenRefine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
*If you inadvertently leave a p3.16xlarge running over a 3 day weekend, you'll have a $2,000 surprise on Tuesday morning. For a month: $17,000.
Caveat Emptor: Of course, in general, there are scenarios where "more" actually results in diminishing returns (e.g., Java garbage collection issues).
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/e4a67cb0-5155-4980-976b-0c247b1d1326n%40googlegroups.com.
You received this message because you are subscribed to a topic in the Google Groups "OpenRefine" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openrefine/-loChQe4CNg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openrefine+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/openrefine/492601bd-5f18-431f-98e7-518f7c66261fn%40googlegroups.com.
Hi Owen,

I start OpenRefine by clicking the OpenRefine.exe, and have used openrefine.l4j.ini to change the memory. See below for how I have it configured.

I'll try your recommendation and start OpenRefine by running refine.bat and configure the refine.ini file.

Thanks again!
Jennifer
--
Jennifer Newcomer
Research Director
PhD Candidate
University of Colorado, Denver
Urban & Regional Planning | Geography
Hi Owen,