Emscripten C/C++ headers

John Dallman

Oct 28, 2024, 1:25:12 PM
to emscripte...@googlegroups.com
Greetings, all!

I have a somewhat unusual first project with Emscripten. I need to get a domain-specific C-like language generating WebAssembly. This is not too bad, because the DSL compiles into C code, which I then feed to the C compiler for the relevant platform. At this level, WebAssembly is "just another platform" but as always, the details are more complicated.

I need to provide declarations for the platform's C run-time library functions, types, constants, and so on to the DSL. This is a fairly routine task, given the platform's C headers, but there are a few things about the Emscripten headers that are puzzling me.

To forestall the obvious question, no, I can't just use the provided headers. The DSL is C-like, but has some syntax differences. Unlike C++, many C programs are not valid programs in the DSL, which gives much more freedom in the language design. It's a separate development that split from normal C in the mid-eighties and is still very much worth using for its specialised role. Yes, new hires have to learn it, which takes about two days for someone who knows C or C++. Learning about the specialised application area it is used for takes much longer.

I'm running Emscripten on Linux. I started by looking at Emscripten 3.1.41, since that's the version used by a couple of other product teams that work on my site. That is fairly simple when I run a few emcc compiles with -H to get a report of what files are referenced. The top-level headers come from emsdk/upstream/emscripten/cache/sysroot/include. A few Clang-associated ones come from emsdk/upstream/lib/clang/17/include. That's all fine with me.

Then I decided to check the latest emsdk, and found that with 3.1.69, the headers come from a different place. It builds a cache of the headers and other files it uses under ~/.emscripten_cache. The problems with that are:

Storage: It will take 36MB in the user directory of everyone who ever compiles with Emscripten. We keep all our user directories on a server disk, because that's enormously convenient in many ways. But we really don't want to burn space with duplicates of that cache. 

Version lock: We need to be able to have several Emscripten versions in use simultaneously, by the same account, without conflicts. Our reason for this is that we plan to release products on WebAssembly, and from time to time, update the version of Emscripten we use, to get access to new C and C++ standards, compiler bug fixes, and so on. But we will not update the tools used to build a product version that's been released and is under maintenance, because we'd have to re-do a lot of the QA that we do at a release. So the service accounts that run our builds can't have caches of version-specific headers in their user directories. 

I need a way to tell Emscripten to put that cache somewhere else. I did some grep'ing of the Emscripten scripts, and found this line, in both 3.1.41 and 3.1.69:
upstream/emscripten/tools/config.py: CACHE = os.path.expanduser(os.path.join('~', '.emscripten_cache'))
I have not yet attempted to read the Python code and learn how it all works, because that could take ages; I don't know Python well at all and am not keen on it. I'm a naturally low-level programmer, much happier with assembly code than object orientation.

Is there an environment variable I can use to relocate that cache? 

Thanks very much,

John 

Sam Clegg

Oct 29, 2024, 5:28:27 PM
to emscripte...@googlegroups.com
The emscripten cache actually defaults to living inside the emscripten directory. The line where this happens is `CACHE = path_from_root('cache')`. The `os.path.expanduser(os.path.join('~', '.emscripten_cache'))` location is only used when the emscripten directory is read-only.

When using emsdk, the cache location should always be `emsdk/upstream/emscripten/cache`. emsdk also ships with a pre-populated cache, so most system libraries are already built there.

For those who want to use a different cache location, you can use the `EMCC_CACHE` environment variable or the `CACHE` key in your emscripten config file (both of these override the default). However, it doesn't sound like you need to do either of those things, and the default location inside the emsdk tree should work for you. Side note: emcc will also use the $HOME location if the in-tree location is not writable. You can set `EMCC_FROZEN_CACHE` if you want to accept this read-only location (obviously new libraries cannot be placed there in that case, though). See https://github.com/emscripten-core/emscripten/blob/31f9fb3c71b1aacf5edf65e35515d41b59c10391/tools/config.py#L82-L92.

cheers,
sam

--
You received this message because you are subscribed to the Google Groups "emscripten-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to emscripten-disc...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/emscripten-discuss/CAH1xqgm39n%2B-%3DAeT09zJ38Bp3SMkiWk_TxMK0cEMcinnaaQn4w%40mail.gmail.com.

Sam Clegg

Oct 29, 2024, 5:53:03 PM
to emscripte...@googlegroups.com
I decided to take this as a signal that it's probably time to remove the automatic $HOME/.emscripten_cache fallback: https://github.com/emscripten-core/emscripten/pull/22801

John Dallman

Oct 30, 2024, 8:46:52 AM
to emscripte...@googlegroups.com
This doesn't seem to be working. I'm using two accounts: "kerman" is my system management account, which can sudo, and "jgd" is my personal account. Both are members of the same primary group. 

I installed 3.1.70, the latest as of today, into /local/kernel_webasm/tools, and made emsdk/upstream/emscripten/cache group-writable. I then did a compile as jgd and found ~jgd/.emscripten_cache was populated and used. I removed that and thought again. 

I tried making the emsdk directory group-writable, and tried again. ~jgd/.emscripten_cache was populated and used. I removed it again. 

I created a cache directory, /local/kernel_webasm/tools/emcache/, made sure it was group-writable, and in my jgd session, did:

EMCC_CACHE=/local/kernel_webasm/tools/emcache
export EMCC_CACHE

I tried compiling again, and once again, ~jgd/.emscripten_cache was populated and used.

I'm running Rocky Linux 8.10. I really do want to have many accounts capable of using the same Emscripten installation, because putting it on a network drive makes standardizing development tools very much easier.     

Thanks,

John

Sam Clegg

Oct 30, 2024, 2:33:21 PM
to emscripte...@googlegroups.com
On Wed, Oct 30, 2024 at 5:46 AM John Dallman <jgdats...@gmail.com> wrote:
> This doesn't seem to be working. I'm using two accounts: "kerman" is my system management account, which can sudo, and "jgd" is my personal account. Both are members of the same primary group.
>
> I installed 3.1.70, the latest as of today, into /local/kernel_webasm/tools, and made emsdk/upstream/emscripten/cache group-writable. I then did a compile as jgd and found ~jgd/.emscripten_cache was populated and used. I removed that and thought again.
>
> I tried making the emsdk directory group-writable, and tried again. ~jgd/.emscripten_cache was populated and used. I removed it again.
>
> I created a cache directory, /local/kernel_webasm/tools/emcache/, made sure it was group-writable, and in my jgd session, did:
>
> EMCC_CACHE=/local/kernel_webasm/tools/emcache
> export EMCC_CACHE
>
> I tried compiling again, and once again, ~jgd/.emscripten_cache was populated and used.

Something strange must be going on here. Can you try running `emcc` with `EMCC_DEBUG=1` as well as `EMCC_CACHE` set? Do you see the `Using home-directory for emscripten cache due to read-only root` message?

The `CACHE = os.path.expanduser(os.path.join('~', '.emscripten_cache'))` should not even execute when you have an explicit cache configured.

Where is your emscripten config coming from?  Are you using `source emsdk_env.sh`?

Making `emsdk/upstream/emscripten/cache` group-writable is the correct solution; there should be no need for separate or alternate cache directories in this case, I think.

cheers,
sam

P.S. I am about to land a change that completely removes the fallback to `$HOME/.emscripten_cache`: https://github.com/emscripten-core/emscripten/pull/22801


John Dallman

Oct 31, 2024, 10:56:41 AM
to emscripte...@googlegroups.com
> Can you try running `emcc` with `EMCC_DEBUG=1` as well as `EMCC_CACHE` set?   

Done.

> Do you see the `Using home-directory for emscripten cache due to read-only root` message?

Yes, I do. I have: 

$ touch $EMCC_CACHE/test.jgd
$ ls -l $EMCC_CACHE
total 0
-rw-rw-r-- 1 jgd UK-Parasolid-GG 0 Oct 31 14:42 test.jgd
$ ls -ld $EMCC_CACHE
drwxrwxr-x 2 kerman UK-Parasolid-GG 4096 Oct 31 14:42 /local/kernel_webasm/tools/emcache/
$ python3 --version
Python 3.6.8

> Where is your emscripten config coming from?  Are you using `source emsdk_env.sh`?

Yes, I'm sourcing that script. I have not altered any config files. 

> Making `emsdk/upstream/emscripten/cache` group-writable is the correct solution; there
> should be no need for separate or alternate cache directories in this case, I think.

Checking that: 

$ whoami
kerman
$ pwd
/local/kernel_webasm/tools/emsdk
$ ls -ld upstream/emscripten/cache
drwxrwxr-x 4 kerman UK-Parasolid-GG 4096 Oct 24 23:29 upstream/emscripten/cache
$ ls -l upstream/emscripten/cache
total 12
drwxr-xr-x 2 kerman UK-Parasolid-GG 4096 Oct 24 23:29 build
-rwxr-xr-x 1 kerman UK-Parasolid-GG    0 Oct 24 23:29 cache.lock
drwxr-xr-x 5 kerman UK-Parasolid-GG 4096 Oct 24 23:18 sysroot
-rw-r--r-- 1 kerman UK-Parasolid-GG    1 Oct 24 23:18 sysroot_install.stamp

Those files were extracted with those timestamps: Oct 24 is well before I installed this Emscripten, and I don't work that late at night. Is the presence of that cache.lock file the problem? 

Thanks,

John
 

Sam Clegg

Oct 31, 2024, 2:43:13 PM
to emscripte...@googlegroups.com
Interesting.  It seems like `os.access(path, os.W_OK)` must be returning `False` in our python code for  `upstream/emscripten/cache`.

Are you able to touch a new file in `upstream/emscripten/cache`?

What does python show when you run: python3 -c "import os; print(os.access('upstream/emscripten/cache', os.W_OK))"

Sam Clegg

Oct 31, 2024, 2:44:23 PM
to emscripte...@googlegroups.com
Regarding overriding the cache directory (which I don't recommend you do): that didn't work because I gave you the wrong variable name. It should be `EM_CACHE`, not `EMCC_CACHE`. Sorry about that!
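
As a rough illustration of the lookup order described in this thread (explicit override first, then the in-tree `cache` directory, and, in releases before 3.1.71, the home-directory fallback when the in-tree location is not writable), here is a minimal Python sketch. It is not the actual code in `tools/config.py`; the function name `resolve_cache_dir` and its parameters are hypothetical, and the `CACHE` config-file key is left out for brevity.

```python
import os


def resolve_cache_dir(emscripten_root, environ=None, home_fallback=True):
    """Hypothetical sketch of the cache lookup order discussed in this thread.

    This is NOT Emscripten's real implementation, only an illustration.
    """
    environ = os.environ if environ is None else environ

    # 1. An explicit override wins (the variable is EM_CACHE, not EMCC_CACHE).
    override = environ.get('EM_CACHE')
    if override:
        return override

    # 2. Default: the cache lives inside the emscripten directory.
    in_tree = os.path.join(emscripten_root, 'cache')
    if os.access(in_tree, os.W_OK) or not home_fallback:
        return in_tree

    # 3. Older behaviour as described above: fall back to $HOME when the
    #    in-tree location is not writable.
    return os.path.expanduser(os.path.join('~', '.emscripten_cache'))
```

With `home_fallback=False` the sketch mirrors what the thread says about 3.1.71 and later: the in-tree location is returned even when it is not writable, so a permissions problem surfaces as an error instead of a silently created `~/.emscripten_cache`.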

John Dallman

Nov 1, 2024, 6:18:25 AM
to emscripte...@googlegroups.com
> Are you able to touch a new file in `upstream/emscripten/cache`?

Yes:

$ echo $EMSDK
/local/kernel_webasm/tools/emsdk
$ echo $EMCC_CACHE
<blank line> 
$ touch /local/kernel_webasm/tools/emsdk/upstream/emscripten/cache/test.jgd
$ ls -l /local/kernel_webasm/tools/emsdk/upstream/emscripten/cache/test.jgd
-rw-rw-r-- 1 jgd UK-Parasolid-GG 0 Nov  1 10:07 /local/kernel_webasm/tools/emsdk/upstream/emscripten/cache/test.jgd

> What does python show when you run: python3 -c "import os; print(os.access('upstream/emscripten/cache', os.W_OK))"

$ cat ../../pytest.sh
python3 -c "import os; print(os.access('upstream/emscripten/cache', os.W_OK))"
$ source ../../pytest.sh
False

Is my Python too old? It's the one that came with Rocky Linux 8.10.

$ python3 --version
Python 3.6.8

Thanks very much,

John 

Sam Clegg

Nov 1, 2024, 1:49:03 PM
to emscripte...@googlegroups.com
On Fri, Nov 1, 2024 at 3:18 AM John Dallman <jgdats...@gmail.com> wrote:
>> Are you able to touch a new file in `upstream/emscripten/cache`?
>
> Yes:
>
> $ echo $EMSDK
> /local/kernel_webasm/tools/emsdk
> $ echo $EMCC_CACHE
> <blank line>
> $ touch /local/kernel_webasm/tools/emsdk/upstream/emscripten/cache/test.jgd
> $ ls -l /local/kernel_webasm/tools/emsdk/upstream/emscripten/cache/test.jgd
> -rw-rw-r-- 1 jgd UK-Parasolid-GG 0 Nov  1 10:07 /local/kernel_webasm/tools/emsdk/upstream/emscripten/cache/test.jgd
>
>> What does python show when you run: python3 -c "import os; print(os.access('upstream/emscripten/cache', os.W_OK))"
>
> $ cat ../../pytest.sh
> python3 -c "import os; print(os.access('upstream/emscripten/cache', os.W_OK))"
> $ source ../../pytest.sh

Did you run this from the emsdk directory?

Can you add `print(os.stat('upstream/emscripten/cache')` too?
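
The working directory matters here because `os.access` resolves a relative path against the current directory and simply returns `False` when the path does not exist there, which looks identical to a permissions failure. A small sketch that sidesteps this by building an absolute path first and reporting existence separately; the helper name `check_writable` is made up for illustration, and `EMSDK` is the variable set by `emsdk_env.sh`.

```python
import os


def check_writable(path, base=None):
    """Return (absolute_path, exists, writable) for path.

    Hypothetical helper: relative paths are resolved against `base`
    (default: the EMSDK environment variable, else the current directory).
    """
    if base is None:
        base = os.environ.get('EMSDK', os.getcwd())
    full = os.path.abspath(os.path.join(base, path))
    exists = os.path.exists(full)
    # os.access() returns False for a missing path as well as for a
    # read-only one, so existence is reported separately.
    return full, exists, os.access(full, os.W_OK)
```

Calling `check_writable('upstream/emscripten/cache')` after sourcing `emsdk_env.sh` should then give the same answer from any directory.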

 

John Dallman

Nov 6, 2024, 9:39:43 AM
to emscripte...@googlegroups.com
Sorry to be slow replying, things have been busy. 

> Did you run this from the emsdk directory?

No. I don't read Python well and didn't realise it was necessary. Silly of me. 

> Can you add `print(os.stat('upstream/emscripten/cache')` too?

Sure, and the missing ')'. 

$ echo $EMSDK
/local/kernel_webasm/tools/emsdk
$ pushd $EMSDK
/local/kernel_webasm/tools/emsdk ~/regimes/tools_webasm/patssy/lx86

$ cat ~/regimes/tools_webasm/pytest.sh
python3 -c "import os; print(os.stat('upstream/emscripten/cache'))"

python3 -c "import os; print(os.access('upstream/emscripten/cache', os.W_OK))"

$ source ~/regimes/tools_webasm/pytest.sh
os.stat_result(st_mode=16893, st_ino=101453189, st_dev=64774, st_nlink=4, st_uid=5790, st_gid=5200, st_size=4096, st_atime=1730862972, st_mtime=1730455667, st_ctime=1730455667)
True
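
For anyone decoding the `os.stat` output: `st_mode=16893` is octal `0o40775`, i.e. a directory with mode `drwxrwxr-x`, which matches the `ls -ld` output earlier in the thread. A quick way to check this with Python's standard `stat` module:

```python
import stat

mode = 16893  # st_mode value taken from the os.stat output above

print(oct(mode))                    # 0o40775
print(stat.filemode(mode))          # drwxrwxr-x
print(stat.S_ISDIR(mode))           # True: it is a directory
print(bool(mode & stat.S_IWGRP))    # True: the group-write bit is set
```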

Thanks very much,

John

Sam Clegg

Nov 6, 2024, 1:23:44 PM
to emscripte...@googlegroups.com
On Wed, Nov 6, 2024 at 6:39 AM John Dallman <jgdats...@gmail.com> wrote:
> Sorry to be slow replying, things have been busy.
>
>> Did you run this from the emsdk directory?
>
> No. I don't read Python well and didn't realise it was necessary. Silly of me.
>
>> Can you add `print(os.stat('upstream/emscripten/cache')` too?
>
> Sure, and the missing ')'.
>
> $ echo $EMSDK
> /local/kernel_webasm/tools/emsdk
> $ pushd $EMSDK
> /local/kernel_webasm/tools/emsdk ~/regimes/tools_webasm/patssy/lx86
>
> $ cat ~/regimes/tools_webasm/pytest.sh
> python3 -c "import os; print(os.stat('upstream/emscripten/cache'))"
> python3 -c "import os; print(os.access('upstream/emscripten/cache', os.W_OK))"
>
> $ source ~/regimes/tools_webasm/pytest.sh
> os.stat_result(st_mode=16893, st_ino=101453189, st_dev=64774, st_nlink=4, st_uid=5790, st_gid=5200, st_size=4096, st_atime=1730862972, st_mtime=1730455667, st_ctime=1730455667)
> True

 
In that case I don't see how the code in cache.py could be detecting that directory as not writable.

Can you try again after installing the latest emsdk version (3.1.71) which completely removes the fallback to `$HOME/.emscripten_cache`?

 

John Dallman

Nov 7, 2024, 10:03:21 AM
to emscripte...@googlegroups.com
> In that case I don't see how the code in cache.py could be detecting that directory as not writable.
> Can you try again after installing the latest emsdk version (3.1.71) which completely removes the fallback to 
> `$HOME/.emscripten_cache`?

As kerman, I removed the directory tree that had 3.1.70 in it, which was /local/kernel_webasm/tools/emsdk. I installed and activated 3.1.71, then set upstream/emscripten/cache to group-writable:

$ pwd
/local/kernel_webasm/tools/emsdk
$ ls -ld upstream/emscripten/cache
drwxr-xr-x 4 kerman UK-Parasolid-GG 4096 Nov  4 22:14 upstream/emscripten/cache
$ chmod g+w upstream/emscripten/cache
$ ls -ld upstream/emscripten/cache
drwxrwxr-x 4 kerman UK-Parasolid-GG 4096 Nov  4 22:14 upstream/emscripten/cache

As jgd:

$ pwd
/u/jgd/regimes/tools_webasm/patssy/lx86
$ pushd /local/kernel_webasm/tools/emsdk/
/local/kernel_webasm/tools/emsdk ~/regimes/tools_webasm/patssy/lx86
$ cat /u/jgd/regimes/tools_webasm/pytest.sh
python3 -c "import os; print(os.stat('upstream/emscripten/cache'))"
python3 -c "import os; print(os.access('upstream/emscripten/cache', os.W_OK))"
$ /u/jgd/regimes/tools_webasm/pytest.sh
os.stat_result(st_mode=16893, st_ino=101454955, st_dev=64774, st_nlink=4, st_uid=5790, st_gid=5200, st_size=4096, st_atime=1730984300, st_mtime=1730758485, st_ctime=1730990265)
True
$ popd

That all looks OK. 

I run the setup script via a wrapper script of my own:

$ cat setup_environment
#!/bin/echo must_be_sourced:
source /local/kernel_webasm/tools/emsdk/emsdk_env.sh
$ source setup_environment
Setting up EMSDK environment (suppress these messages with EMSDK_QUIET=1) 
Adding directories to PATH:
PATH += /local/kernel_webasm/tools/emsdk
PATH += /local/kernel_webasm/tools/emsdk/upstream/emscripten
PATH += /local/kernel_webasm/tools/emsdk/node/20.18.0_64bit/bin

Setting environment variables:
PATH = /local/kernel_webasm/tools/emsdk:/local/kernel_webasm/tools/emsdk/upstream/emscripten:/local/kernel_webasm/tools/emsdk/node/20.18.0_64bit/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/puppetlabs/bin:/bin:/usr/site/devop_tools/bin:/usr/site/devop_tools/bin/lnx64:/usr/site/devop_tools/UDU/tools/bin/unx:/usr/site/bin:/usr/site/bin/lnx64:/Parasolid/tools/lx86/bin:/usr/local/sw_tools/bin:/Parasolid/tools/common/unix:/Parasolid/tools/common/vanilla:/Parasolid/pyscripts/bash
EMSDK = /local/kernel_webasm/tools/emsdk
EMSDK_NODE = /local/kernel_webasm/tools/emsdk/node/20.18.0_64bit/bin/node

That all looks good. Then I try to compile something and get Python errors:

$ cat buildrun.sh
#!/bin/bash -p
# --- utility "buildrun.sh" (m/c lx86) created 31-oct-2024
# --- from /sdl/prairie/users/jgd/regimes/tools_webasm/patssy/patssy_webas_support
# --- ------------------------------------------------
sopts="-sALLOW_MEMORY_GROWTH -sSTACK_SIZE=1024KB -sNODERAWFS -DNODERAWFS"
copts="-H -D_FORTIFY_SOURCE=2"
emcc test.c $sopts $copts -o a.out.js
node a.out.js
$ ./buildrun.sh
Traceback (most recent call last):
  File "/local/kernel_webasm/tools/emsdk/upstream/emscripten/emcc.py", line 1639, in <module>
    sys.exit(main(sys.argv))
  File "/usr/lib64/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/local/kernel_webasm/tools/emsdk/upstream/emscripten/emcc.py", line 1632, in main
    ret = run(args)
  File "/local/kernel_webasm/tools/emsdk/upstream/emscripten/emcc.py", line 578, in run
    shared.check_sanity()
  File "/usr/lib64/python3.6/contextlib.py", line 52, in inner
    return func(*args, **kwds)
  File "/local/kernel_webasm/tools/emsdk/upstream/emscripten/tools/shared.py", line 513, in check_sanity
    with cache.lock('sanity'):
  File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/local/kernel_webasm/tools/emsdk/upstream/emscripten/tools/cache.py", line 68, in lock
    acquire_cache_lock(reason)
  File "/local/kernel_webasm/tools/emsdk/upstream/emscripten/tools/cache.py", line 44, in acquire_cache_lock
    cachelock.acquire(60)
  File "/local/kernel_webasm/tools/emsdk/upstream/emscripten/tools/filelock.py", line 278, in acquire
    self._acquire()
  File "/local/kernel_webasm/tools/emsdk/upstream/emscripten/tools/filelock.py", line 391, in _acquire
    fd = os.open(self._lock_file, open_mode)
PermissionError: [Errno 13] Permission denied: '/local/kernel_webasm/tools/emsdk/upstream/emscripten/cache/cache.lock'

So I tried, as kerman:

$ chmod -R g+w upstream/emscripten/cache

That resolved the permissions issue when I tried to compile again. I got just one message about caching:

cache:INFO: generating system asset: symbol_lists/bab13e41f87696c3ec8f1fba6ce506c5cb5d7b6b.json... (this will be cached in "/local/kernel_webasm/tools/emsdk/upstream/emscripten/cache/symbol_lists/bab13e41f87696c3ec8f1fba6ce506c5cb5d7b6b.json" for subsequent builds)

That's OK, because it is under emsdk. 

Is there a way to turn off the colourisation of messages? That's wildly unhelpful given my vision problems.  

Did you always intend the chmod to be of the entire tree under upstream/emscripten/cache? 

Thanks very much,

John

Sam Clegg

Nov 7, 2024, 2:16:12 PM
to emscripte...@googlegroups.com
Great news!
 

> Is there a way to turn off the colourisation of messages? That's wildly unhelpful given my vision problems.

Sadly, we don't have an explicit way to turn it off today. However, if the output is not a tty, it won't add color. So you can add `2>&1 | tee /dev/null` to effectively turn it off, e.g. `./embuilder build libc --force 2>&1 | tee /dev/null`.

Luckily, these messages should only be printed when new files are cached there, which should be rare.
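
The tty check mentioned above is a common pattern. A minimal sketch of how a tool might decide whether to emit ANSI colour codes; this is an illustration only, not Emscripten's actual logging code, and `should_colorize` and `warn` are hypothetical names.

```python
import io
import sys


def should_colorize(stream):
    """Colour only when the stream is an interactive terminal."""
    return hasattr(stream, 'isatty') and stream.isatty()


def warn(message, stream=sys.stderr):
    if should_colorize(stream):
        # ANSI escape: yellow text, then reset.
        stream.write('\033[33m' + message + '\033[0m\n')
    else:
        stream.write(message + '\n')


# A pipe or an in-memory buffer is not a tty, so no escape codes are written.
buf = io.StringIO()
warn('cache:INFO: example message', stream=buf)
print(repr(buf.getvalue()))
```

Piping through `tee` works for the same reason: the tool's stderr becomes a pipe, so a check like `isatty()` reports `False`.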


> Did you always intend the chmod to be of the entire tree under upstream/emscripten/cache?

Yes, this is by design.   The emcc compiler needs full read/write access to that entire tree.
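
To see why making only the top-level directory group-writable was not enough (the `cache.lock` file inside it was still `-rwxr-xr-x`), here is a short sketch that lists every entry under a tree lacking the group-write bit. The function name `find_not_group_writable` is hypothetical, and the check looks only at the mode bits; it assumes the group ownership is already correct.

```python
import os
import stat


def find_not_group_writable(root):
    """Return paths under root (inclusive) that lack the group-write bit."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Check the directory itself, then the files directly inside it;
        # subdirectories are checked when os.walk descends into them.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            if not os.lstat(path).st_mode & stat.S_IWGRP:
                missing.append(path)
    return missing
```

Running `find_not_group_writable('upstream/emscripten/cache')` from the emsdk directory should return an empty list once `chmod -R g+w` has been applied to the tree.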
 