huge file not uploading in openrefine


Petros Liveris

Sep 24, 2020, 4:51:38 AM
to OpenRefine
Dear all,

I have a CSV with ~4,000,000 rows. When I try to upload it to OpenRefine, I get the following error:

Error uploading data
Processing of multipart/form-data request failed. No space left on device

I ran OpenRefine with the following command:

./refine -m 40G -i 0.0.0.0

and the partition where OpenRefine is installed has 20 GB free.

How should I fix this?

Thank you in advance

Thad Guidry

Sep 24, 2020, 7:22:22 AM
to openr...@googlegroups.com
I don't think you are going to need 40G of memory allocated for that file. Try using only 16G, unless your CSV file has very long strings (whole sentences or paragraphs of text) in it.
You might have some option checkboxes ticked that pre-process the CSV during import and cause more memory or storage to be consumed.
I'd suggest a few things to lower memory and disk storage overhead:
- Don't parse cell text into numbers and dates until after you have imported.
- If it is plain ASCII text and not UTF-8, select ASCII as the character encoding in the CSV importer.
- Don't store blank rows (they are likely unnecessary).
- Verify in a file editor that your CSV really is comma-separated. (It might use tabs, or some custom characters you are missing.)
- Double-check the quote character used inside the CSV file: it might be single quotes instead of regular double quotes, or some other character. If so, adjust the option "Use character [ ' ] to enclose cells containing column separators" accordingly.
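A quick way to check the separator, quoting, and encoding before importing is to inspect the raw bytes. This is a sketch, assuming a Unix shell; `data.csv` is a small stand-in sample created here, not your real file:

```shell
# Create a small tab-separated sample as a stand-in for the real file:
printf 'id\tname\n1\tAlice\n' > data.csv

# Show invisible characters: tabs appear as ^I and line ends as $,
# so the actual separator and quoting are visible at a glance.
# (cat -A is GNU coreutils; on macOS use cat -vet instead.)
head -n 1 data.csv | cat -A

# Rough guess at the character encoding (ASCII vs UTF-8 etc.):
file data.csv
```

If the first command shows `^I` between fields, the file is tab-separated, not comma-separated, and the importer option should be changed to match.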

But first, try using the option:

Load at most [10] row(s) of data

to see whether loading only 10 rows still produces the same error. If it does, something else is likely going on.
Check the OpenRefine console logs and tell us in a reply if you see more detailed errors you can copy/paste.




Thad Guidry

Sep 24, 2020, 7:23:02 AM
to openr...@googlegroups.com
Oh, and by the way, what's the file size for the CSV on disk before you try to import it?


Petros Liveris

Sep 24, 2020, 7:58:21 AM
to OpenRefine
The file is 1.4 GB, and I do not even get to the screen that asks me how to parse the data. I feel it is a disk issue; I do not know where OpenRefine stores the data while uploading it.

On a different machine I get past this part, but then it takes a long time, since that machine has only 6 GB of memory and runs at 100% memory usage, so it is very slow.

I tried the command

./refine -p 3333 -i 0.0.0.0 -m 24G -d /home/user/

and I get:

./refine: line 810: [: 24G: integer expression expected
You have 48138M of free memory.
Your current configuration is set to use 24G of memory.
OpenRefine can run better when given more memory. Read our FAQ on how to allocate more memory here:
/usr/bin/java -cp server/classes:server/target/lib/* -Xms256M -Xmx24G -Drefine.memory=24G -Drefine.max_form_content_size=1048576 -Drefine.verbosity=info -Dpython.path=main/webapp/WEB-INF/lib/jython -Dpython.cachedir=/home/user/.local/share/google/refine/cachedir -Drefine.data_dir=/home/user/ -Drefine.webapp=main/webapp -Drefine.port=3333 -Drefine.host=0.0.0.0 com.google.refine.Refine
Starting OpenRefine at 'http://0.0.0.0:3333/'

14:52:28.869 [            refine_server] Starting Server bound to '0.0.0.0:3333' (0ms)
14:52:28.871 [            refine_server] refine.memory size: 24G JVM Max heap: 22906667008 (2ms)
14:52:28.883 [            refine_server] Initializing context: '/' from '/data/openrefine-3.4/webapp' (12ms)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/openrefine-3.4/server/target/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/openrefine-3.4/webapp/WEB-INF/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14:52:29.509 [                   refine] Starting OpenRefine 3.4 [6443506]... (626ms)


The console shows this when the process stops:

14:55:31.921 [                   refine] POST /command/core/get-importing-job-status (1007ms)
14:56:30.916 [                   refine] GET /command/core/get-csrf-token (58995ms)
14:56:30.952 [                   refine] POST /command/core/cancel-importing-job (36ms)

Where should I look for the OpenRefine logs?

[attached screenshot: error.png]

Thad Guidry

Sep 24, 2020, 8:25:38 AM
to openr...@googlegroups.com
Yes, the -d option is how you set the data dir.
Notice in the logs it says
-Drefine.data_dir=/home/user

By default, the workspace dir (data_dir) is located in different places depending on whether you are on Windows, Linux, or macOS.
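Since the workspace location is configurable, one way around a full partition is to point -d at a disk with room. A sketch, assuming a Linux shell; the path here is just an example:

```shell
# Create a workspace directory on a partition with plenty of free space:
mkdir -p /tmp/refine-workspace-demo

# Confirm how much space is available on that partition:
df -h /tmp/refine-workspace-demo

# Then launch OpenRefine pointing at it (not run here):
#   ./refine -m 16G -d /tmp/refine-workspace-demo
```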

That console output is the log.
On Windows you can redirect command output using the > character, e.g. ./refine > mylogfile.txt.
The logs are not saved to disk by default, since there isn't much information in them that is useful to keep around.
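If you do want the console output in a file, shell redirection captures it. A sketch (using `echo` as a stand-in, since the real launcher is the long-running `./refine` process):

```shell
# Capture both stdout and stderr; with OpenRefine the real command would be:
#   ./refine > refine-console.log 2>&1
echo '14:52:28.869 [refine_server] Starting Server' > refine-console.log 2>&1

# The log file now holds everything that would have scrolled past:
cat refine-console.log
```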



Thad Guidry

Sep 24, 2020, 8:32:00 AM
to openr...@googlegroups.com
I think we use the temp folder (Java's internal default) to hold the chunks of the file while uploading, but I could be wrong; I would have to check. Normally that is System.getProperty("java.io.tmpdir").

I did a web search and saw others hitting a similar issue, with the same error message as yours, in other web applications that use Java.
They suggested clearing out your OS's temp folder and trying again.
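To check whether the system temp dir is the partition that filled up, the following sketch may help (assuming a Linux shell, where Java's java.io.tmpdir usually defaults to /tmp unless overridden):

```shell
# Resolve the temp dir the JVM would typically use by default:
TMP="${TMPDIR:-/tmp}"

# See which partition it lives on and how full that partition is;
# "No space left on device" during upload points at this filesystem:
df -h "$TMP"

# List the largest leftovers before deciding what to clear out
# (deletion is left to you -- don't remove files a running app still needs):
du -sh "$TMP"/* 2>/dev/null | sort -rh | head -n 5
```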


Petros Liveris

Sep 24, 2020, 8:45:36 AM
to OpenRefine
Thank you very much! In my case it was a disk issue; I cleaned up /home/user and all is fine now.
Thank you again for your kind help.
