Recommended pattern for loading large fixed files once with --preload-file and re-running main() with small changing files

Ahmed Fasih

Sep 12, 2014, 11:45:53 AM

to emscripte...@googlegroups.com

I'm Emscriptenifying a dictionary-based application, which opens large dictionary files as well as a smaller input file when it runs. I'm targeting the browser, where I want to be able to edit the input, run the application (invoke "main()"), examine the output, tweak the input, and repeat. So this involves getting the user input into an FS "file", running a bunch of Javascript, displaying the results, and preparing for the next rinse-repeat.

Sorry if the following background is too verbose, I hope it'll make clear what I'm doing and asking:

I have emcc build a .js file and use the --preload-file option to place the large dictionary files in a .data file. The resulting Javascript file has the following format:

// code to read the .data file built due to the --preload-file flag

// (code from --pre-js, if any)

// and the rest of the code is Emscripten-generated

I wrap all this generated code inside a `function run(args) { ... }` block, with some pre-initialization like Module.TOTAL_MEMORY and creating the FS-based "file" that the application will operate on. (This step is manual since I don't know of any "--really-pre-js" flag that'll let me put code *before* the data-loading code generated due to --preload-file. Is there one?)

At this stage, I have a Javascript function that I can invoke to do everything I want (get input from the user and create a small file, load the large dictionary files, and run the application), over and over again if necessary by complete tear-down and re-initialization of main().

The major inefficiency with this pattern is that the large dictionary files are reloaded for every invocation of this all-in-one `run()` function, and I want to get rid of that network cost. If I run the data-loading code first, and then wrap the rest of the Emscripten-generated Javascript in my `function run(Module, ...) {...}` block that gets the Module object, the application will run correctly *once* (and the large data file is downloaded only *once,* at page load time), but subsequent calls to `run()` do not produce any output from the application. Is it possible that after `run` is called, the .data files that were loaded initially get freed? (I doubt that because my application doesn't complain that it couldn't find the input file.) Or what could be preventing the application from actually doing something if I remove the data-loading code from the `run` function and place it outside?

In any case, I wanted to solicit feedback on my general approach, and see if there are better approaches to getting "interactive" runs in the browser, where a small input file changes for each run, but the large data files in memory are fetched only once.

Many thanks for your help and your hard work!

Jukka Jylänki

Sep 14, 2014, 2:45:07 PM

to emscripte...@googlegroups.com

I would recommend adding some JS <-> C interop here, and running the C main() immediately at page startup, performing whatever necessary initialization that is needed for the generic runtime. Then, using the link flag -s NO_EXIT_RUNTIME=1, just return from main() function, and later, have a C API that you expose to JS to enable the individual runs. That allows you to avoid the C runtime startup and shutdown on every iteration, and directly call a C function from JavaScript to perform the per-iteration work. See the documentation on "Interacting with Code" for tips on how to do that, or if you are looking for C++ interop, check out the embind documentation.
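As a rough illustration of the pattern described above (not code from the actual application), the sketch below does the one-time setup in main() and exposes a separate function for the per-run work. The names `load_dictionary` and `process_input` are hypothetical, the "dictionary" is reduced to a flag, and the per-run work is a stand-in that merely counts whitespace-separated tokens. The `#ifdef` guards are only there so the same file also compiles with a native compiler.

```c
#include <ctype.h>
#include <stddef.h>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#else
#define EMSCRIPTEN_KEEPALIVE /* no-op when not building with Emscripten */
#endif

/* Hypothetical stand-in for the large dictionary: set once, kept for the page's lifetime. */
static int dictionary_loaded = 0;

/* One-time setup. A real application would open its preloaded dictionary files here. */
int load_dictionary(void) {
    dictionary_loaded = 1;
    return dictionary_loaded;
}

/* Per-run work, meant to be called from JavaScript as often as needed.
   Stand-in logic: returns the number of whitespace-separated tokens in text,
   or -1 if the dictionary has not been loaded or text is NULL. */
EMSCRIPTEN_KEEPALIVE
int process_input(const char *text) {
    if (!dictionary_loaded || text == NULL) {
        return -1;
    }
    int tokens = 0;
    int in_token = 0;
    for (const char *p = text; *p != '\0'; ++p) {
        if (isspace((unsigned char)*p)) {
            in_token = 0;
        } else if (!in_token) {
            in_token = 1;
            ++tokens;
        }
    }
    return tokens;
}

#ifdef __EMSCRIPTEN__
/* Runs once at page startup. When linked with -s NO_EXIT_RUNTIME=1, returning
   from main() leaves the runtime (and dictionary_loaded) alive for later calls. */
int main(void) {
    load_dictionary();
    return 0;
}
#endif
```

On the JavaScript side, one option is to call the exported function through `Module.ccall('process_input', 'number', ['string'], [text])` each time the user changes the input. Depending on the Emscripten version, the function may also need to be listed in `-s EXPORTED_FUNCTIONS`.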

Jukka

--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ahmed Fasih

Sep 17, 2014, 11:18:44 PM

to emscripte...@googlegroups.com

Jukka, thanks for the feedback, it was very helpful. I was trying to package this application into something like what Alon described in his blog post a couple of years ago [1], where a single function call from browser client code encapsulates all the Emscripten code, and where subsequent calls don't rely on any global Emscripten state being maintained. But this was foolish, since I have these large dictionary files that have to somehow be kept around between calls if I don't want to load them over and over again. So I bit the bullet and dove into the code and thankfully found that it was easy enough to circumvent main(). Hooray!

Please accept my thanks for making this possible: http://fasiha.github.io/mecab-emscripten/ !

[1] http://mozakai.blogspot.com/2012/03/howto-port-cc-library-to-javascript.html


Jukka Jylänki

Sep 18, 2014, 4:46:56 AM

to emscripte...@googlegroups.com

Nice! Looks like it works, although I've no idea what the text is that it produced! :)

Jukka
