wasm2go: a Wasm to Go translator

Nuno Cruces

Mar 14, 2026, 7:37:55 PM
to golang-nuts
For the past few weeks I've been working on a Wasm to Go translator. It takes a Wasm module and converts it to a Go package. I plan to use it for my SQLite driver. I'm translating a Wasm build of SQLite into ~ 600 kLoC of Go. I've tested it across 20 GOOS/GOARCH combinations.

I have found that the gc compiler produces suboptimal code in some situations.

Wasm is a little-endian platform, so a memory load looks something like this (offset is a constant literal):

int32(binary.LittleEndian.Uint32(m.Memory[int64(ptr)+offset:]))

I've noticed that, on little-endian platforms, this runs much faster:

*(*int32)(unsafe.Pointer((*[4]byte)(m.Memory[int64(ptr)+offset:])))

I think it's because the first version has two bounds checks and the latter just one? This is unfortunate, as I have to generate two versions of the code (for little- and big-endian platforms).

Am I missing any trick to get the compiler to generate better code?

Thanks! 

Andy Balholm

Mar 16, 2026, 1:28:46 PM
to golang-nuts
That's something I've wanted to do for a long time, but not found the time to do!

Dario Castañé

Mar 27, 2026, 11:19:18 AM
to golang-nuts
I attempted this a few months ago, trying to translate directly to Go assembly. My premise was that it should be "simpler" to translate. It did work but memory management wasn't easy.

I need to check Nuno's implementation. It looks really promising!

Andy Balholm

Mar 27, 2026, 12:17:54 PM
to golan...@googlegroups.com

Go assembly is a very challenging target. Simple things are easy, but anything related to memory is hard. You're much better off translating to what I call "asm.go" (using Go as though it were assembly) and letting the compiler deal with it. 

Andy


Nuno Cruces

Mar 27, 2026, 7:34:26 PM
to Dario Castañé, golang-nuts
I have since been able to improve the output significantly. Instead of 600 kLoC, SQLite is now under 200 kLoC of Go. I've released a new version of my SQLite driver based on this, and so far feedback has been positive.

Others have similarly used wasm2go on a couple of other projects, and shared good results.

I still have the same issue with memory accesses, if anyone has any insight. I'd really love to avoid the code duplication, which I could do if I were able to make binary.LittleEndian as fast as unsafe.Pointer.

Regards,
Nuno Cruces

Andy Balholm

Mar 27, 2026, 7:49:03 PM
to Nuno Cruces, golang-nuts

I think I know what the issue is. It's doing one bounds check for int64(ptr)+offset, and then another to make sure the resulting slice has at least 4 elements. What happens if you do and discard a read of m.Memory[int64(ptr)+offset+3] first? (The compiler should know that if that bounds check succeeds, the other two will succeed too, and it can elide them.)
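Andy's suggestion, sketched as a hypothetical helper `loadHinted` (the name and the parameter types are assumptions; in the generated code offset is a constant). Whether the compiler can then drop the remaining checks depends on what its prove pass can infer about i, and Nuno reports in the next message that this made matters worse in wasm2go's output.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// loadHinted performs a discarded read of the last byte first, hoping the
// compiler can use that single check to elide the checks on mem[i:] and
// inside binary.LittleEndian.Uint32.
func loadHinted(mem []byte, ptr int32, offset int64) int32 {
	i := int64(ptr) + offset
	_ = mem[i+3] // bounds-check hint: panics here if i+3 is out of range
	return int32(binary.LittleEndian.Uint32(mem[i:]))
}

func main() {
	mem := []byte{0, 0, 0x78, 0x56, 0x34, 0x12}
	fmt.Printf("%#x\n", loadHinted(mem, 0, 2)) // 0x12345678
}
```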

Andy

Nuno Cruces

Mar 28, 2026, 8:12:10 PM
to Andy Balholm, golang-nuts
Thanks, Andy. I tried that; it made matters worse.

When I put small snippets like these into Godbolt, the results are virtually the same (one does <=3, the other <4):

func little(ptr int32) int32 {
    return int32(binary.LittleEndian.Uint32(mem[int64(ptr)+offset:]))
}

func pointer(ptr int32) int32 {
    return *(*int32)(unsafe.Pointer((*[4]byte)(mem[int64(ptr)+offset:])))
}

The problem, AFAICT, is that the functions in binary.LittleEndian are not compiler intrinsics. They need to be inlined; then BCE (bounds-check elimination) sees a discarded read and emits a single bounds check; then SSA pattern matching triggers and replaces 4 single-byte loads with a single 4-byte load.
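To make the inlining point concrete, this is essentially how binary.LittleEndian.Uint32 is written in the standard library, copied into a standalone function (`uint32LE` is a hypothetical name). After inlining, the `_ = b[3]` hint yields a single bounds check, and the shift-and-OR of four byte loads is the pattern the SSA rules can combine into one 4-byte load on targets that support it.

```go
package main

import "fmt"

// uint32LE mirrors binary.LittleEndian.Uint32: one bounds-check hint,
// then four single-byte loads combined with shifts and ORs.
func uint32LE(b []byte) uint32 {
	_ = b[3] // one bounds check covering b[0] through b[3]
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

func main() {
	fmt.Printf("%#x\n", uint32LE([]byte{0x78, 0x56, 0x34, 0x12})) // 0x12345678
}
```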

And some piece of this probably breaks down in the massive functions wasm2go generates (maybe BCE fails, and then so does the SSA matching?).
The main SQLite interpreter loop (sqlite3VdbeExec) is 8 kLoC long, with 1000 locals and 500 labels/blocks.

I think I may have answered my own question, though. bits.ReverseBytes is a compiler intrinsic, so combining it with unsafe.Pointer might work.

$ ~/go/bin/benchstat endian unsafe swap
goos: linux
goarch: amd64
pkg: github.com/ncruces/go-sqlite3/ext/stats
cpu: Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz
              │   endian    │               unsafe                │                swap                 │
              │   sec/op    │   sec/op     vs base                │   sec/op     vs base                │
_average-12     175.3n ± 2%   136.4n ± 3%  -22.19% (p=0.000 n=20)   137.6n ± 3%  -21.51% (p=0.000 n=20)
_variance-12    199.0n ± 2%   165.3n ± 3%  -16.89% (p=0.000 n=20)   160.5n ± 4%  -19.30% (p=0.000 n=20)
_math/sqrt-12   252.6n ± 2%   180.8n ± 6%  -28.41% (p=0.000 n=20)   180.3n ± 3%  -28.59% (p=0.000 n=20)
_math/tan-12    271.4n ± 2%   200.0n ± 2%  -26.28% (p=0.000 n=20)   205.5n ± 3%  -24.27% (p=0.000 n=20)
_math/cot-12    302.5n ± 4%   227.8n ± 3%  -24.68% (p=0.000 n=20)   234.1n ± 5%  -22.61% (p=0.000 n=20)
_math/cbrt-12   293.5n ± 2%   230.0n ± 2%  -21.65% (p=0.000 n=20)   231.8n ± 8%  -21.04% (p=0.000 n=20)
geomean         244.2n        187.0n       -23.44%                  188.2n       -22.94%

The last column (swap) uses helpers like this one, and its results seem close enough to unsafe:

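import (
  "math/bits"
  "runtime"
)

// swap32 reverses the bytes on big-endian GOARCH values and is a no-op
// elsewhere; runtime.GOARCH is a constant, so the switch folds away.
func swap32(x uint32) uint32 {
  switch runtime.GOARCH {
  case "armbe", "arm64be", "m68k", "mips", "mips64", "mips64p32", "ppc", "ppc64", "s390", "s390x", "shbe", "sparc", "sparc64":
    return bits.ReverseBytes32(x)
  default:
    return x
  }
}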
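A minimal sketch of how a load helper might combine swap32 with unsafe.Pointer; `loadLE32` is a hypothetical name, not wasm2go's generated code. Because the host-order read is followed by swap32, the result is the little-endian value on any host. The unsafe read still assumes the target tolerates unaligned 4-byte loads, which is not true of every architecture.

```go
package main

import (
	"fmt"
	"math/bits"
	"runtime"
	"unsafe"
)

// swap32 reverses the bytes on big-endian GOARCH values and is a no-op
// elsewhere; runtime.GOARCH is a constant, so the switch folds away.
func swap32(x uint32) uint32 {
	switch runtime.GOARCH {
	case "armbe", "arm64be", "m68k", "mips", "mips64", "mips64p32", "ppc", "ppc64", "s390", "s390x", "shbe", "sparc", "sparc64":
		return bits.ReverseBytes32(x)
	default:
		return x
	}
}

// loadLE32 does one length check (the slice-to-array-pointer conversion
// panics if len(b) < 4), a 4-byte read in host order, then swap32 to
// yield the little-endian value.
func loadLE32(b []byte) uint32 {
	return swap32(*(*uint32)(unsafe.Pointer((*[4]byte)(b))))
}

func main() {
	mem := []byte{0x78, 0x56, 0x34, 0x12, 0xff}
	fmt.Printf("%#x\n", loadLE32(mem)) // 0x12345678
}
```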

We really could use an exported constant for this in the standard library...

Nuno

Andy Balholm

Mar 28, 2026, 8:24:10 PM
to Nuno Cruces, golang-nuts

It would be really nice to be able to say "no unsafe", though. That could be a significant distinction between wasm2go and modernc's translations.

The reason there isn't a constant for the native byte order in the standard library is explained at https://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html

If you're needing to use unsafe, and bits.ReverseBytes, to get around a limitation in the compiler, I think it would be worth filing an issue.

Andy

Nuno Cruces

Mar 28, 2026, 8:35:23 PM
to Andy Balholm, golang-nuts
Yes. I currently have an "endian" flag that you can:
  • leave unset: I use binary.LittleEndian;
  • set to "little": I use unsafe.Pointer, guarded by build tags for known little-endian platforms;
  • set to something else: I emit portable code, but guarded by build tags for "not known little-endian".
Maybe with this I can move to just an "unsafe" boolean flag. As I say in the readme: despite the scary name, it should be safe.
Less so in this case, if I misclassify a CPU. I don't disagree with Rob there, but in this case an exported constant would help prevent that bug.
I may open that issue.
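For the "little" setting, a generated file might look like the sketch below. The build constraint is an assumption: a conservative list of GOARCH values that are both little-endian and on the unalignedOK list quoted later in the thread, not wasm2go's actual tag set; `readLE32` is a hypothetical name. Assuming a complementary portable file carries the negated constraint, leaving an architecture off this list only costs speed, while wrongly adding a big-endian one would silently byte-swap every load.

```go
//go:build 386 || amd64 || arm64 || loong64 || ppc64le || wasm

package main

import (
	"fmt"
	"unsafe"
)

// readLE32 reads 4 bytes in host order. The build constraint above limits
// this file to little-endian GOARCH values, so host order is Wasm's order.
func readLE32(b []byte) uint32 {
	return *(*uint32)(unsafe.Pointer((*[4]byte)(b)))
}

func main() {
	fmt.Printf("%#x\n", readLE32([]byte{0x78, 0x56, 0x34, 0x12})) // 0x12345678
}
```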

Thanks!

Nuno Cruces

Mar 29, 2026, 5:53:24 AM
to Andy Balholm, golang-nuts
So, sorry for spamming the list at this point, but this works too (it's plenty fast on at least amd64 and arm64):

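import (
  "encoding/binary"
  "math/bits"
  "runtime"
  "unsafe"
)

// Architectures that are unalignedOK:
// https://go.dev/src/cmd/compile/internal/ssa/config.go

// load32 decodes a little-endian uint32 from b, which must have len(b) >= 4.
//go:nosplit
func load32(b []byte) uint32 {
  switch runtime.GOARCH {
  case "386", "amd64", "arm64", "loong64", "ppc64", "ppc64le", "s390x", "wasm":
    // Unaligned loads are OK here: one length check, then a single
    // 4-byte load in host byte order.
    v := *(*uint32)(unsafe.Pointer((*[4]byte)(b)))
    switch runtime.GOARCH {
    case "ppc64", "s390x":
      // Big-endian hosts: swap into little-endian order.
      return bits.ReverseBytes32(v)
    default:
      return v
    }
  default:
    // Everything else: portable byte-by-byte decoding.
    return binary.LittleEndian.Uint32(b)
  }
}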

So I guess this should be my request: make binary.LittleEndian as fast as this, even when called from extremely large functions.
This would be one possible way to do it.

Nuno