The data structures are well documented, but AFAIK the encodings are
not. Mangling to pure ASCII could have worked, with some special
mangling convention for non-ASCII characters. But it's just UTF-8 for
the characters, plus, by default, mangling to encode the signature etc.
I found out about the UTF-8 encoding once when I coded up some traversal
of the structures for an SO answer, but my google-fu now fails me: I
can't find it again.
But here's a simple example, using Microsoft's toolchain. Here `chcp` is
a command to change the console window's active codepage, `cl` is the
Visual C++ compiler, probably named for its ancestor Lattice C, and
`dumpbin` is Microsoft's binary dump utility, corresponding to the GNU
toolchain's `objdump`. And as you can see, under the assumption that
`dumpbin` dumps the raw bytes of the name (and yes it does), the name is
UTF-8 encoded:
<example>
C:\my\forums\clc++\018> chcp 1252
Active code page: 1252
C:\my\forums\clc++\018> type dll.cpp
#include <windows.h>
__declspec( dllexport )
void bare_blåbærtøys()
{ MessageBoxW( 0, L"Oi", L"Message:", MB_SETFOREGROUND ); }
C:\my\forums\clc++\018> type name_in_utf8.txt
bare_blåbærtøys
C:\my\forums\clc++\018> cl dll.cpp user32.lib /LD
dll.cpp
Creating library dll.lib and object dll.exp
C:\my\forums\clc++\018> dumpbin /exports dll.dll | find "bare"
1 0 00001260 ?bare_blåbærtøys@@YAXXZ
C:\my\forums\clc++\018> _
</example>
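For reference, here's what those bytes should be if the name really is UTF-8. This is just a sketch of the two-byte UTF-8 form (code points U+0080 through U+07FF); the names `Utf8Pair` and `to_utf8_2byte` are made up for illustration and are not part of any toolchain. In Windows-1252 the letters å, æ and ø are the single bytes E5, E6 and F8, the same as their Unicode code points, so a raw dump with two bytes per letter points to UTF-8:

```cpp
// Sketch only: encodes one code point in U+0080..U+07FF as two UTF-8 bytes.
// Utf8Pair and to_utf8_2byte are hypothetical names for this illustration.
struct Utf8Pair
{
    unsigned char lead;     // 110xxxxx
    unsigned char trail;    // 10xxxxxx
};

constexpr Utf8Pair to_utf8_2byte( char32_t cp )
{
    return Utf8Pair{
        static_cast<unsigned char>( 0xC0 | (cp >> 6) ),     // Top 5 bits.
        static_cast<unsigned char>( 0x80 | (cp & 0x3F) )    // Low 6 bits.
    };
}

// Checked at compile time: a-ring (U+00E5), ae (U+00E6), o-slash (U+00F8).
static_assert( to_utf8_2byte( 0xE5 ).lead == 0xC3, "a-ring lead byte" );
static_assert( to_utf8_2byte( 0xE5 ).trail == 0xA5, "a-ring trail byte" );
static_assert( to_utf8_2byte( 0xE6 ).trail == 0xA6, "ae trail byte" );
static_assert( to_utf8_2byte( 0xF8 ).trail == 0xB8, "o-slash trail byte" );
```

That is, 15 bytes for `bare_blåbærtøys` in Windows-1252 versus 18 bytes in UTF-8, which is easy to tell apart in a hex view of the DLL.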
• • •
Standard C++ allows the Norwegian Æ, Ø and Å (lowercase æ, ø, and å) in
identifiers, as used above, but g++ doesn't accept them, at least not by
default :( :
<example>
C:\my\forums\clc++\018> g++ -c dll.cpp -finput-charset=cp1252
dll.cpp:4:1: error: stray '\303' in program
void bare_blåbærtøys()
^
dll.cpp:4:1: error: stray '\245' in program
dll.cpp:4:1: error: stray '\303' in program
dll.cpp:4:1: error: stray '\246' in program
dll.cpp:4:1: error: stray '\303' in program
dll.cpp:4:1: error: stray '\270' in program
dll.cpp:4:15: error: expected initializer before 'b'
void bare_blåbærtøys()
^
C:\my\forums\clc++\018> g++ --version | find "++"
g++ (tdm64-1) 5.1.0
C:\my\forums\clc++\018> _
</example>
There are no errors (i.e. MinGW g++ understands the Microsoft
`__declspec` specifier) when the file only contains ASCII characters.
I'm at a loss at how the compiler ends up with the specific values in
the error messages. They are not the raw Windows-1252 byte values (which
would be octal 345, 346 and 370), but they come in three distinct pairs,
each starting with `\303`, presumably corresponding to å, æ and ø, and
they look as if they're scooped from a translation to UTF-8.
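The octal values can at least be checked against UTF-8. Here's a small sketch, with the hypothetical helper names `is_utf8_2byte` and `from_utf8_2byte`, that treats each `\303`-plus-something pair from the diagnostics as a two-byte UTF-8 sequence and decodes it back to a code point:

```cpp
// Sketch only: helper names are made up for this illustration.

// True if the two bytes have the bit patterns 110xxxxx 10xxxxxx.
constexpr bool is_utf8_2byte( unsigned lead, unsigned trail )
{
    return (lead & 0xE0) == 0xC0 && (trail & 0xC0) == 0x80;
}

// Decodes a two-byte UTF-8 sequence; call only if is_utf8_2byte is true.
constexpr char32_t from_utf8_2byte( unsigned lead, unsigned trail )
{
    return ((lead & 0x1F) << 6) | (trail & 0x3F);
}

// The pairs from g++'s "stray" messages, written as octal literals.
static_assert( is_utf8_2byte( 0303, 0245 ), "valid 2-byte form" );
static_assert( from_utf8_2byte( 0303, 0245 ) == 0xE5, "a-ring, U+00E5" );
static_assert( from_utf8_2byte( 0303, 0246 ) == 0xE6, "ae, U+00E6" );
static_assert( from_utf8_2byte( 0303, 0270 ) == 0xF8, "o-slash, U+00F8" );
```

So the pairs are exactly the UTF-8 encodings of å, æ and ø, in the order they occur in the function name. That fits the guess that g++ first converts the cp1252 source to UTF-8 and then rejects the resulting bytes one by one, though I haven't verified that in the g++ sources.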
Anyway, due to g++'s lack of support for non-ASCII function names, I
can't show the DLL example with the GNU toolchain, sorry.
Cheers & hth.,
Alf