File Size Limits on CSV


Allan Li

Dec 15, 2014, 4:30:16 AM
to openr...@googlegroups.com
Hi guys,

I'm trying to load a 1.2GB CSV file into OpenRefine. After loading it all in and waiting for it to do its thing, it throws the following error message: "Unknown error. No technical details" and that's it. Am I doing something wrong or is the file too big?


Thanks
Allan

Tom Morris

Dec 15, 2014, 9:56:48 AM
to openr...@googlegroups.com
On Mon, Dec 15, 2014 at 4:30 AM, Allan Li <allan...@gmail.com> wrote:

> Trying to load in a 1.2GB CSV file into OpenRefine and after loading it all in and waiting for it to do its thing, it throws the following error message: "Unknown error. No technical details" and that's it. Am I doing something wrong or is the file too big?

We strive to never provide such unhelpful error messages, so I'd consider that a bug.  Were there any messages logged on the console that the Refine server was started from?  If it was run detached, you could try starting it from a console to see if there's any additional error info available.

Whether the file is too big is a little difficult to determine.  How many rows and columns? How much JVM heap did you allocate? (It certainly won't work with the default heap size.)
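For anyone wanting to answer the rows-and-columns question without opening the file in OpenRefine first, here is a minimal Python sketch. It is not part of OpenRefine; the function name `count_rows_and_columns` and the UTF-8 default are assumptions made for illustration. It streams the file one record at a time, so it should not need much memory even for a file of this size.

```python
import csv


def count_rows_and_columns(path, encoding="utf-8"):
    """Return (record_count, widest_record) for a CSV file.

    The file is read one record at a time, so memory use stays small.
    The record count includes the header line, if the file has one.
    """
    records = 0
    max_columns = 0
    # newline="" lets the csv module handle quoted fields containing line breaks
    with open(path, newline="", encoding=encoding) as handle:
        for record in csv.reader(handle):
            records += 1
            max_columns = max(max_columns, len(record))
    return records, max_columns


# Hypothetical usage (the file name is a placeholder):
# records, columns = count_rows_and_columns("ABC_FILE.csv")
# print(records, "records,", columns, "columns")
```

The two numbers it returns are the ones asked for above, and they are more useful than the file size alone when judging how much heap an import might need.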

A file that big could be pushing the limits, but I would have thought we could handle it with sufficient available memory.

Tom

Thad Guidry

Dec 15, 2014, 11:35:54 AM
to openrefine
I did most of the large-file testing a while back with a 2GB CSV file that had about 6 columns in it, using 16GB of RAM.

As Tom says, though, it depends on the shape of the data. If your 1.2GB CSV file has a great many columns, it might well require over 20GB of RAM to process with the current data architecture underpinnings of OpenRefine.

When you say "loading it all in"... was this on the Preview? Or had it actually passed the preview and you had clicked on the Create Project button?


Allan Li

Dec 15, 2014, 3:48:53 PM
to openr...@googlegroups.com
> Were there any messages logged on the console that the Refine server was started from?

I'll have to check again and get back to you on this one.

> How many rows and columns? How much JVM heap did you allocate? (It certainly won't work with the default heap size.)

How do I change the JVM heap? It looked like it was only using 965MB. This might be the issue. I'm expecting the file to contain at least 20M to 30M rows.

> When you say "loading it all in"... was this on the Preview? Or had it actually passed the preview and you had clicked on the Create Project button?

That's correct, I finished the preview and had already clicked on the Create Project button. Then it showed the "Reading ABC_FILE" message with time remaining and heap usage.



Thanks
Allan 

Thad Guidry

Dec 15, 2014, 8:54:21 PM
to openrefine
Allan,

The FAQ has a section that describes how to allocate more memory.


Tom Morris

Dec 15, 2014, 10:16:11 PM
to openr...@googlegroups.com
On Mon, Dec 15, 2014 at 3:48 PM, Allan Li <allan...@gmail.com> wrote:

>> How many rows and columns? How much JVM heap did you allocate? (It certainly won't work with the default heap size.)

> How do I change the JVM heap? It looked like it was only using 965MB. This might be the issue. I'm expecting the file to contain at least 20M to 30M rows.

If you're using the default heap size, which I think is 1 GB, you definitely won't have enough for a file this size.  See the FAQ that Thad linked to for instructions on how to increase it. 

>> When you say "loading it all in"... was this on the Preview? Or had it actually passed the preview and you had clicked on the Create Project button?

> That's correct, I finished the preview and had already clicked on the Create Project button. Then it showed the "Reading ABC_FILE" message with time remaining and heap usage.

If the loading progress bar stays red for more than a second or two, indicating that memory is almost maxed out, you might as well kill the import because it's never going to complete.  Even for very large files, the progress indicator should show steady progress at a reasonable pace.  If it stalls, there's probably a problem that's going to keep the import from completing.

BTW, you didn't mention what version you are using.  If you aren't already, you should use the OpenRefine 2.6 beta, not 2.5. 

Please report back on how you make out.  We're always interested in hearing what works and what doesn't.

Tom