Out of memory exception when working with large files and IDBFS


Nolan Darilek

Aug 20, 2018, 4:33:55 PM
to emscripte...@googlegroups.com
Hello,


I'm trying to process large files of geospatial data in a Rust wasm
binary. To do this, I'm trying to load 3.2 GB into IDBFS. Unfortunately,
I'm getting out-of-memory exceptions while writing the data, generally
at around the 50 MB mark but not always. One of my page elements lists
the files in my storage location, and when I reload it I see the file
with the size as it was at around the time of the exception. This to me
suggests that I'm correctly setting up IDBFS and am persisting data to
it. But I'm not sure where this out-of-memory exception is coming from.
Here's my code snippet:


                js! {
                    let filename = @{fh.to_str().unwrap()};
                    let element = document.querySelector("#import");
                    let file = element.files[0];
                    let size = file.size;
                    console.log(file, filename);
                    let chunk_size = 1024*10000;
                    let offset = 0;
                    let reader = new FileReader();
                    let blob = file.slice(offset, chunk_size);
                    let counter = 0;
                    let stream = FS.open(filename, "w");
                    reader.onloadend = function(e) {
                        if(e.target.error) {
                            FS.close(stream);
                            FS.unlink(filename);
                            return console.error(e.target.error);
                        }
                        // if(counter % 100 == 0) {
                            console.log("onload ", offset);
                        // }
                        let result = new Uint8Array(e.target.result);
                        offset += result.length;
                        if(offset >= size)
                            return FS.close(stream);
                        FS.write(stream, result, 0, result.length);
                        FS.syncfs(function(err) {
                            if(err) {
                                return console.error(err);
                            }
                            blob = file.slice(offset, offset+result.length);
                            e.target.readAsArrayBuffer(blob);
                            counter++;
                        });
                    };
                    console.log("offset=",offset);
                    reader.readAsArrayBuffer(blob);
                }


Maybe I'm hitting some sort of 50 MB database limit, but I don't see any permissions dialog to increase the size, and that wouldn't explain why I made it to 143 MB at one point. I've set chunk_size to ~1000000 rather than ~10000000, but that doesn't seem to help either. And since reloading gives me a file of the correct size, I assume I'm syncing correctly from Emscripten to IndexedDB, but maybe I'm missing something?
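
For what it's worth, one sanity check I can run from the devtools console is to ask the browser how much storage quota the page actually has, rather than guessing at a hard limit. Just a sketch, assuming the StorageManager API is available in the browser:

    // Log the origin's current storage usage and quota (if the API exists)
    // so a fixed quota can be ruled in or out as the cause of the failures.
    if (navigator.storage && navigator.storage.estimate) {
        navigator.storage.estimate().then(function(estimate) {
            console.log("usage:", estimate.usage, "quota:", estimate.quota);
        });
    } else {
        console.log("StorageManager API not available here");
    }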


Thanks for any help.

Alon Zakai

Aug 20, 2018, 7:13:17 PM
to emscripten-discuss
There is an open PR to split up IDBFS files to avoid browser-specific limitations.


50 MB sounds low, but it could be that. Which browser is it in, and does it occur in other browsers?



Nolan Darilek

Aug 23, 2018, 7:15:17 PM
to emscripte...@googlegroups.com

Hey, sorry for the delay, I was sidetracked with other projects.


This is under Firefox. I don't easily have a way to test under other browsers just yet.


It occurs to me that, as a web developer, I do lots of things at localhost. How would I clear browser storage for all of localhost, preferably via the filesystem? (I.e., where would I look in my profile directory if I wanted to just rm -rf all of localhost's IndexedDB storage?)
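
If deleting things from the profile directory turns out to be too fiddly, an alternative I could try is deleting the IDBFS-backed database from the page's own console. Just a sketch; the database name below is a placeholder, and the real name should be visible under Storage -> Indexed DB in the Firefox devtools:

    // Delete one IndexedDB database for the current origin from the console.
    // "DB_NAME_HERE" is a placeholder for whatever name shows up in the
    // devtools Storage panel for the IDBFS mount.
    var req = indexedDB.deleteDatabase("DB_NAME_HERE");
    req.onsuccess = function() { console.log("database deleted"); };
    req.onerror = function(e) { console.error("delete failed", e); };
    req.onblocked = function() { console.warn("delete blocked; close other tabs first"); };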


Thanks.

Nolan Darilek

Aug 24, 2018, 11:11:36 AM
to emscripte...@googlegroups.com

OK, I have a bit more to work with. This is definitely not a browser storage issue.


I commented out the code that interacts with Emscripten (i.e., as of now my code only slices and dices the 3.2 GB file). Without Emscripten, things work fine. When I add streaming and sync into the mix, things fail.


I do have an OOM dump if that helps. I don't know enough about these to know whether or not they contain sensitive information; if they don't, then I can post it. It looks like around 3 GB of arrays, which I suspect is Emscripten.


I'm curious as to what best practices are for processing large files and syncing filesystems. Right now I'm reading 10 MB chunks and running FS.syncfs(false, ...) after each one. Maybe I'm not giving the GC enough time to keep up with the incoming data? I'd like for this to work on low-end devices like phones, so I don't want to wait too long before running a syncfs, but maybe there's a lower bound on how often to run it. I'm not getting the "multiple syncs in flight" errors I used to see before I rewrote this function, so at least I know Emscripten thinks they're done.
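
To make that last point concrete, roughly what I have in mind is batching the syncs: write every chunk, but only call FS.syncfs every N chunks and once at the end. A sketch, with SYNC_EVERY picked arbitrarily:

    // Sync to IndexedDB only every SYNC_EVERY chunks instead of after each
    // write; `done` continues the read loop once the (possible) sync finishes.
    var SYNC_EVERY = 10;
    function maybeSync(counter, done) {
        if (counter % SYNC_EVERY === 0) {
            FS.syncfs(false, function(err) {
                if (err) return console.error(err);
                done();
            });
        } else {
            done();
        }
    }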


And the other possible issue is that I'm on Emscripten 1.37.36. I tried upgrading to the latest version, but something regarding dead code elimination strips out _main even though I explicitly include it, and I'm thinking it'd be wise to debug one showstopper at a time. :) But maybe something changed in the interim that would resolve my upload issue, and I should switch to figuring out why _main is being stripped.


This is under Rust nightly with the wasm32-unknown-emscripten target. Both 1.37.36 and 1.38.11 are running on the same Rust version, so I doubt that's the issue.


Thanks for any help. Here's my current code with Emscripten calls commented out.


                    let filename = @{fh.to_str().unwrap()};
                    let element = document.querySelector("#import");
                    let file = element.files[0];
                    let size = file.size;

                    let chunk_size = 1024*10000;
                    let offset = 0;
                    let reader = new FileReader();

                    let counter = 0;
                    // let stream = FS.open(filename, "w");
                    reader.onloadend = function() {
                        if(reader.error) {
                            // FS.close(stream);
                            // FS.unlink(filename);
                            return console.error(reader.error);
                        }
                        // if(counter % 100 == 0) {
                            console.log("onload ", offset);
                        // }

                        let result = new Uint8Array(reader.result);
                        // FS.write(stream, result, 0, result.length);
                        /*FS.syncfs(function(err) {
                            if(err)
                                return console.error(err);
                            offset += result.length;
                            read();
                            counter++;
                        });*/
                        offset += result.length;
                        read();
                    };
                    function read() {
                        if(offset >= file.size)
                            return; // fs.close(stream);
                        var blob = file.slice(offset, offset+chunk_size);
                        reader.readAsArrayBuffer(blob);
                    }
                    read();

Nolan Darilek

Aug 27, 2018, 8:27:05 AM
to emscripte...@googlegroups.com

FWIW, I also tried reopening the stream on each onload callback in append mode, writing my data, then closing it. Same result, though maybe it got a bit further?
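
Roughly, each onload callback now does something like this (just a sketch; writeChunk is an illustrative helper, not my exact code):

    // Per-chunk approach: reopen the file in append mode, write the chunk,
    // and close the stream again before reading the next slice.
    function writeChunk(filename, chunk) {
        var stream = FS.open(filename, "a");
        FS.write(stream, chunk, 0, chunk.length);
        FS.close(stream);
    }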


I know that IDBFS is backed by disk, but is the whole file kept in memory while its stream is open? That would explain why meaningfully persisting files is possible, but handling a single large file beyond the browser's memory limits isn't.


I'm trying an upgrade to 1.38.11 with the intention of filing a bug report. May have to resolve my dead code elimination issues first, though.

Alon Zakai

Aug 27, 2018, 5:50:57 PM
to emscripten-discuss
By "OOM dump" do you mean the entire browser crashes?

Nolan Darilek

Aug 27, 2018, 5:56:02 PM
to emscripte...@googlegroups.com

No, the browser doesn't crash. I get an out-of-memory message in the console, and if I set a preference in about:config, Firefox dumps a gzipped JSON file with memory profiling data at the time of the failure.


I've since rewritten this logic mostly in Rust, and that actually yields a stacktrace rather than just crashing with the not-too-helpful "out of memory" message. The stacktrace seems to indicate that the OOM happens on `write()`, though, so I'm not sure how informative that is.


I filed an issue this morning, to which I've attached the OOM dump and the stacktrace:


https://github.com/kripken/emscripten/issues/7050
