OpenRefine fails on startup - workspace loading

Vojtěch Dostál

Dec 16, 2020, 10:21:44 AM
to OpenRefine
Hello all,

I have been using OpenRefine for several months, but yesterday I found that I suddenly could not open it. I tried several solutions and found that the problem likely lies in loading my workspace directory. When I empty it, or leave just three small projects in it, OpenRefine loads normally. However, it fails when bigger projects are present (and it seemingly isn't a problem with one specific project, but probably just a general problem with the amount of data that OpenRefine needs to load from the directory).

I have not changed my settings recently, and this wasn't the case a few days ago: on the same laptop I could open OpenRefine with all the project directories in place.

If I gradually decrease the number of files, the OpenRefine tab will at some point open, but with an "Aw, Snap!" error page (actually, it's an "Out of Memory" error). When I decrease the number even more, leaving just about 3 projects in, it opens successfully.

Does anyone have experience with this behavior - a sudden problem with directory loading? 

My current settings are:
# initial memory heap size
-Xms256M
# max memory memory heap size
-Xmx4096M
(changing those up or down has no effect on my problem as far as I can tell)

The full command line output on startup is:

16:10:56.171 [            refine_server] Starting Server bound to '127.0.0.1:3333' (0ms)
16:10:56.182 [            refine_server] Initializing context: '/' from 'C:\Users\vojte\Downloads\openrefine-3.4.1\webapp' (11ms)
16:10:56.967 [            refine_server] Failed to use jdatapath to detect user data path: resorting to environment variables (785ms)
16:10:56.969 [            refine_server] Failed to use jdatapath to detect user data path: resorting to environment variables (2ms)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/vojte/Downloads/openrefine-3.4.1/server/target/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/vojte/Downloads/openrefine-3.4.1/webapp/WEB-INF/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16:10:57.186 [                   refine] Starting OpenRefine 3.4.1 [437dc4d]... (217ms)
16:10:57.186 [                   refine] initializing FileProjectManager with dir (0ms)
16:10:57.186 [                   refine] C:\Users\vojte\AppData\Roaming\OpenRefine (0ms)
16:11:20.441 [                   refine] POST /command/core/load-language (23255ms)
16:11:20.470 [                   refine] GET /command/core/get-preference (29ms)
16:11:20.478 [                   refine] POST /command/core/load-language (8ms)
16:11:20.489 [                   refine] POST /command/core/load-language (11ms)
16:11:20.658 [                   refine] POST /command/core/get-importing-configuration (169ms)
16:11:20.675 [                   refine] GET /command/core/get-all-project-tags (17ms)
16:11:20.719 [                   refine] GET /command/core/get-all-project-metadata (44ms)

thank you for your help
Vojtěch

Thad Guidry

Dec 16, 2020, 1:00:36 PM
to openr...@googlegroups.com
Hi Vojtěch !

So sorry this is happening to you.  Hopefully this might help!

I suggest running your browser in Safe Mode (all extensions disabled) and seeing whether you experience things differently.
Also try a different browser (Firefox, Chrome, or Edge) to see whether things behave differently.

Also, double check that you indeed see your OpenRefine project files/folder under this path as shown in our logs...


16:10:57.186 [                   refine] initializing FileProjectManager with dir (0ms)
16:10:57.186 [                   refine] C:\Users\vojte\AppData\Roaming\OpenRefine (0ms)


Tom Morris

Dec 16, 2020, 9:44:12 PM
to openr...@googlegroups.com
Sorry to hear about your problems! 

What versions of OpenRefine, Java, browser, and operating system are you using?

What, if anything, has changed recently, vis-à-vis:
- OpenRefine version
- Java version 
- Browser version (or which browser)
- Operating system version

Do any of your project names, or their metadata, include non-ASCII (i.e. international) characters?

If you can make a zip/tar of your workspace directory available (publicly or privately), someone might be able to debug a little further.

Tom


Thad Guidry

Dec 18, 2020, 1:53:25 PM
to openr...@googlegroups.com
Trying it out now for you Vojtěch
I'll let you know how it goes...



On Fri, Dec 18, 2020 at 12:49 PM Vojtěch Dostál <vojtech...@wikimedia.cz> wrote:

Thank you Thad and Tom, I really appreciate your help on this.
Based on your suggestions, I've tried running the browser in Safe Mode and also using a different browser, but that didn't help.

>Also, double check that you indeed see your OpenRefine project files/folder under this path as shown in our logs...
The path seems alright and my OpenRefine seems to be able to understand it - based on the fact that I can open up to about three projects if I copy them to that directory.

>What versions of OpenRefine, Java, browser, and operating system are you using?
OpenRefine 3.4.1, Java Development Kit 1.8.0_271, Google Chrome 87.0.4280.88 (64-bit), Windows 10 Home (64-bit).

>What, if anything, has changed recently, vis-à-vis:
- OpenRefine version - nope
- Java version - nope 
- Browser version (or which browser) - nope
- Operating system version - Hmm, I seem to have had an automatic update which kind of coincides with the onset of my troubles:
https://support.microsoft.com/en-gb/help/4592438/windows-10-update-kb4592438 .... however, OpenRefine still fails after uninstall of this update.

There should no longer be any non-ASCII characters in my project names. This was a problem in the past, and I renamed every project that did not comply.

I've uploaded my project directory here:
I would be very happy if anyone could try to load it on their computer...

BTW when I try to open OpenRefine with all my projects present, I sometimes also get this longer error log in the command line:

10:45:44.157 [            refine_server] Starting Server bound to '127.0.0.1:3333' (0ms)
10:45:44.157 [            refine_server] Initializing context: '/' from 'C:\Users\vojte\Downloads\openrefine-3.4.1\webapp' (0ms)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/vojte/Downloads/openrefine-3.4.1/server/target/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/vojte/Downloads/openrefine-3.4.1/webapp/WEB-INF/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
10:45:45.251 [                   refine] Starting OpenRefine 3.4.1 [437dc4d]... (1094ms)
10:45:45.251 [                   refine] initializing FileProjectManager with dir (0ms)
10:45:45.251 [                   refine] C:\Users\vojte\AppData\Roaming\OpenRefine (0ms)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Unknown Source)
        at java.lang.String.<init>(Unknown Source)
        at java.lang.StringBuilder.toString(Unknown Source)
        at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:404)
        at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:280)
        at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:264)
        at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeArray(JsonNodeDeserializer.java:434)
        at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer$ArrayDeserializer.deserialize(JsonNodeDeserializer.java:141)
        at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer$ArrayDeserializer.deserialize(JsonNodeDeserializer.java:126)
        at com.fasterxml.jackson.databind.deser.impl.FieldProperty.deserializeAndSet(FieldProperty.java:138)
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:369)
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:159)
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4014)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3050)
        at com.google.refine.io.ProjectMetadataUtilities.loadFromFile(ProjectMetadataUtilities.java:170)
        at com.google.refine.io.ProjectMetadataUtilities.loadMetaDataIfExist(ProjectMetadataUtilities.java:117)
        at com.google.refine.io.ProjectMetadataUtilities.load(ProjectMetadataUtilities.java:99)
        at com.google.refine.io.FileProjectManager.loadProjectMetadata(FileProjectManager.java:124)
        at com.google.refine.io.FileProjectManager.recover(FileProjectManager.java:404)
        at com.google.refine.io.FileProjectManager.initialize(FileProjectManager.java:83)
        at com.google.refine.RefineServlet.init(RefineServlet.java:139)
        at javax.servlet.GenericServlet.init(GenericServlet.java:241)
        at edu.mit.simile.butterfly.Butterfly.init(Butterfly.java:180)
        at org.mortbay.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:440)
        at org.mortbay.jetty.servlet.ServletHolder.doStart(ServletHolder.java:263)
        at com.google.refine.RefineServer.configure(Refine.java:291)
        at com.google.refine.RefineServer.init(Refine.java:203)
        at com.google.refine.Refine.init(Refine.java:109)
        at com.google.refine.Refine.main(Refine.java:103)

Hope this helps
Vojtěch

On Thursday, December 17, 2020 at 3:44:12 UTC+1, tfmo...@gmail.com wrote:

Thad Guidry

Dec 18, 2020, 2:19:55 PM
to openr...@googlegroups.com
I reproduced the issue.

So it seems that we have an issue in 3.4.1, or possibly even earlier, where importing XLSX files sometimes creates enormous metadata in the "name" field of the project's metadata.json file.
If this happens across multiple projects that imported XLSX files, the aggregate amount of memory needed to build the Projects array exceeds the default Java heap size.
In your case, even 4096 MB is exceeded because of the sheer number of XLSX files previously imported.
It probably happened when you simply imported one more XLSX file :-) and the total memory needed just to open and view the project workspace (where OpenRefine has to read and parse gigabytes of metadata.json files because of this bug) was then exceeded.

All your projects are there, uncorrupted and without harm as far as I can tell.
I'll open a bug for us to fix this metadata.json issue with XLSX files specifically.

Good job finding this bug for us!

In the meantime, to get you productive again, and since you have a backup of your project workspace, you can:
1. Remove the workspace.json file.
2. Remove (or move to another local folder for safekeeping) the project folders in the workspace that you don't currently need to work with.
3. Start OpenRefine, which will then recover and should generate a new workspace.json file containing only the project IDs still found in the workspace folder.
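
The steps above could be scripted roughly as follows. This is only a sketch in Python, not an official OpenRefine tool, and it rests on several assumptions: that each project sits in a folder named "<id>.project" with a metadata.json inside it, that OpenRefine is not running, and that you have a backup of the workspace. The function name park_large_projects, the "parking" folder, and the 1 MB threshold are all made up for illustration.

```python
from pathlib import Path
import shutil


def park_large_projects(workspace, parking, max_metadata_bytes=1_000_000):
    """Move project folders with an oversized metadata.json out of the workspace.

    Assumes projects are folders named '<id>.project' (an assumption about
    the workspace layout). Returns the names of the folders that were moved.
    """
    workspace, parking = Path(workspace), Path(parking)
    parking.mkdir(parents=True, exist_ok=True)
    moved = []
    for project_dir in sorted(workspace.glob("*.project")):
        metadata = project_dir / "metadata.json"
        # Step 2: set aside projects whose metadata file is suspiciously large.
        if metadata.is_file() and metadata.stat().st_size > max_metadata_bytes:
            shutil.move(str(project_dir), str(parking / project_dir.name))
            moved.append(project_dir.name)
    # Step 1: remove the index so that it is regenerated on the next startup.
    index = workspace / "workspace.json"
    if index.exists():
        index.unlink()
    return moved
```

Calling park_large_projects(r"C:\Users\vojte\AppData\Roaming\OpenRefine", r"C:\Users\vojte\parked-projects") would use the workspace path from the log above; the second path is a hypothetical destination. Step 3 (starting OpenRefine) stays manual. Moving a folder back into the workspace later should make the project visible again once the underlying bug is fixed.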


Thad Guidry

Dec 18, 2020, 3:11:45 PM
to openr...@googlegroups.com
Hi Vojtěch
I created a new issue for this here: https://github.com/OpenRefine/OpenRefine/issues/3431

To help with it, would it be possible for you to also upload the original XLSX file for this to Google Drive and share it, if it's not private and you still have it?
DATA PRO FRETTIEHO-KOMPLET NAROZENI A UMRTI.xlsx



Yael Netzer

Mar 2, 2021, 7:18:16 AM
to openr...@googlegroups.com
Hi, 
I must admit this is something that has kept happening to me over the last few weeks, and I keep moving the projects from one directory to another, trying to detect project files that have grown too large.
I am on OpenRefine 3.4.1. It feels like it happens after importing large files, or perhaps it has to do with Hebrew?
Sometimes the failure shows in the cmd window, but sometimes there is no error and I get this screen:

Thanks!
Yael
image.png
I can create new projects, but when I try to open I get this screen (i.e. tags but no projects)
image.png

14:01:49.255 [            refine_server] Starting Server bound to '127.0.0.1:3333' (0ms)
14:01:49.266 [            refine_server] Initializing context: '/' from 'C:\OpenRefine\openrefine-3.4.1\webapp' (11ms)

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/OpenRefine/openrefine-3.4.1/server/target/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/OpenRefine/openrefine-3.4.1/webapp/WEB-INF/lib/slf4j-log4j12-1.7.18.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14:01:50.569 [                   refine] Starting OpenRefine 3.4.1 [437dc4d]... (1303ms)
14:01:50.569 [                   refine] initializing FileProjectManager with dir (0ms)
14:01:50.570 [                   refine] C:\OpenRefine\yaels_data\data (1ms)
14:02:30.109 [                   refine] POST /command/core/load-language (39539ms)
14:02:30.164 [                   refine] GET /command/core/get-preference (55ms)
14:02:30.213 [                   refine] POST /command/core/load-language (49ms)
14:02:30.229 [                   refine] POST /command/core/load-language (16ms)
14:02:30.933 [                   refine] POST /command/core/get-importing-configuration (704ms)
14:02:30.973 [                   refine] GET /command/core/get-all-project-tags (40ms)
14:02:31.098 [                   refine] GET /command/core/get-all-project-metadata (125ms)
14:03:50.044 [                   refine] GET /command/database/saved-connection (78946ms)


On Tue, 22 Dec 2020 at 22:59, Vojtěch Dostál <vojtech...@wikimedia.cz> wrote:
Thank you Thad! It's good to hear that my question actually turned out to be helpful.
I'll abide by your recommendations in the meantime.

On Friday, December 18, 2020 at 21:11:45 UTC+1, thadg...@gmail.com wrote:

Vojtěch Dostál

Mar 2, 2021, 7:57:34 AM
to OpenRefine
Hi Yael, 

I sometimes got similar "blue screen" errors too. In my case, filtering out the superhuge metadata files helped (see https://github.com/OpenRefine/OpenRefine/issues/3431). If you also have huge metadata files (several MB or even hundreds of MB), this is something worth looking into.
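
One way to spot such files is to list every project's metadata.json by size. Here is a minimal sketch in Python, assuming each project sits in a folder named "<id>.project" directly under the workspace directory; the helper name metadata_sizes is hypothetical and not part of OpenRefine.

```python
from pathlib import Path


def metadata_sizes(workspace):
    """Return (size_in_bytes, project_folder_name) pairs, largest first."""
    sizes = []
    for metadata in Path(workspace).glob("*.project/metadata.json"):
        sizes.append((metadata.stat().st_size, metadata.parent.name))
    return sorted(sizes, reverse=True)


# Example usage (the path is the workspace from Yael's log; adjust as needed):
# for size, name in metadata_sizes(r"C:\OpenRefine\yaels_data\data"):
#     print(f"{size / 1_000_000:10.2f} MB  {name}")
```

Anything in the range of several MB or more at the top of that list would be a candidate for moving out of the workspace before starting OpenRefine.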

cheers
Vojtech

On Tuesday, March 2, 2021 at 13:18:16 UTC+1, yael....@gmail.com wrote:

Tom Morris

Mar 2, 2021, 3:46:29 PM
to OpenRefine
Hi Yael. Sorry to hear you are having trouble. I'm not sure your problem is related, since there are no "Out of Memory" errors in your log, so it might be worth creating a separate thread/issue. If you go to the JavaScript console of your browser (usually under something like View -> Developer -> ...), do you see any error messages there?

Regards,
Tom
