register and volatile qualifiers, preprocessing


Mat

Nov 30, 2009, 7:13:43 AM
to golang-nuts
Hello,

I find Go very interesting because of its coroutine concept and
syntax, but it lacks some details that I consider important for
systems programming. For example, ANSI C supports limited compilation
hints through the volatile and register qualifiers.

Is there any equivalent in Go?

I want to port a stack-based VM from C that caches the TOS (top of
stack) in a register and uses replicated-switch threading to gain
performance. In C I can simply create a preprocessor macro for the
switch construct; it looks like this:

switch (VmAdrSpace[VmPC++])
{
    case VmADD: VmTOS = VmTOS + VmDS[VmDSP++]; switch (VmAdrSpace[VmPC++]) ....

This matters for runtime performance because a single switch
construct not only adds run-time checks on every iteration but, at
best, compiles to a jump table in which every opcode is dispatched
through the same indirect jump. On almost all modern CPUs this causes
a high rate of BTB (branch target buffer) mispredictions, resulting in
really bad run-time performance. Replicated-switch threading fixes
this by replicating the switch construct at the end of each handler,
so every handler gets its own indirect jump whose target the BTB can
predict separately.
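For comparison, here is a minimal Go sketch of the single-switch dispatch loop described above, with TOS cached in a local variable. That local is the closest Go gets to a register hint: the compiler alone decides what lives in registers. The opcode names, the toy instruction set and runSwitch are hypothetical, chosen only for illustration.

```go
package main

import "fmt"

// Hypothetical opcodes for a toy stack VM (illustration only).
const (
	opPush = iota // push the following code word
	opAdd         // pop second, TOS = second + TOS
	opMul         // pop second, TOS = second * TOS
	opHalt        // stop; the result is TOS
)

// runSwitch interprets code with one switch inside a loop, so every
// opcode is dispatched from the same place. TOS is a local variable the
// compiler may keep in a register; Go offers no way to request that.
func runSwitch(code []int) int {
	stack := make([]int, 0, 16) // values below TOS
	tos, pc := 0, 0
	for {
		op := code[pc]
		pc++
		switch op {
		case opPush:
			stack = append(stack, tos) // spill the old TOS
			tos = code[pc]
			pc++
		case opAdd:
			n := len(stack) - 1
			tos = stack[n] + tos
			stack = stack[:n]
		case opMul:
			n := len(stack) - 1
			tos = stack[n] * tos
			stack = stack[:n]
		case opHalt:
			return tos
		default:
			panic(fmt.Sprintf("bad opcode %d at %d", op, pc-1))
		}
	}
}

func main() {
	// (2 + 3) * 4
	prog := []int{opPush, 2, opPush, 3, opAdd, opPush, 4, opMul, opHalt}
	fmt.Println(runSwitch(prog)) // 20
}
```

This is the baseline that replicated-switch threading tries to beat; any alternative should be benchmarked against it.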

Can I express this technique in Go without using something like Perl
(I find even False more intuitive) for preprocessing?

Thanks,
-Mat.

Ian Lance Taylor

Dec 1, 2009, 6:38:04 PM
to Mat, golang-nuts
Mat <dam...@web.de> writes:

> I find Go very interesting because of its coroutine concept and
> syntax, but it lacks some details that I consider important for
> systems programming. For example, ANSI C supports limited compilation
> hints through the volatile and register qualifiers.
>
> Is there any equivalent in Go?

There is no equivalent to volatile and register in Go, no.

The register qualifier in ISO C is nearly meaningless. Pretty much
every optimizing C or C++ compiler ignores it for purposes of
optimization. Its only meaning is that you cannot take the address
of a variable that has the qualifier.

The volatile qualifier is not meaningless, but it is widely misused.
Almost the only valid use of volatile in ISO C is to access memory-
mapped hardware that requires a precise pattern of reads and writes.
There are also a few limited uses of it to disable optimizations in
very low-level multi-threaded code. Go has no equivalent to this.


> I want to port a stack-based VM from C that caches the TOS (top of
> stack) in a register and uses replicated-switch threading to gain
> performance. In C I can simply create a preprocessor macro for the
> switch construct; it looks like this:
>
> switch (VmAdrSpace[VmPC++])
> {
>     case VmADD: VmTOS = VmTOS + VmDS[VmDSP++]; switch (VmAdrSpace[VmPC++]) ....
>
> This matters for runtime performance because a single switch
> construct not only adds run-time checks on every iteration but, at
> best, compiles to a jump table in which every opcode is dispatched
> through the same indirect jump. On almost all modern CPUs this causes
> a high rate of BTB (branch target buffer) mispredictions, resulting in
> really bad run-time performance. Replicated-switch threading fixes
> this by replicating the switch construct.
>
> Can I express this technique in Go without using something like Perl
> (I find even False more intuitive) for preprocessing?

I'm not certain, but I expect that the answer here is no.

Ian

Stefan Hajnoczi

Dec 2, 2009, 2:15:41 AM
to Ian Lance Taylor, Mat, golang-nuts
On Tue, Dec 1, 2009 at 11:38 PM, Ian Lance Taylor <ia...@google.com> wrote:
> Mat <dam...@web.de> writes:
>> I want to port a stack-based VM from C that caches the TOS (top of
>> stack) in a register and uses replicated-switch threading to gain
>> performance. In C I can simply create a preprocessor macro for the
>> switch construct; it looks like this:
>>
>> switch (VmAdrSpace[VmPC++])
>> {
>>    case VmADD: VmTOS = VmTOS + VmDS[VmDSP++]; switch (VmAdrSpace[VmPC++]) ....
>>
>> This matters for runtime performance because a single switch
>> construct not only adds run-time checks on every iteration but, at
>> best, compiles to a jump table in which every opcode is dispatched
>> through the same indirect jump. On almost all modern CPUs this causes
>> a high rate of BTB (branch target buffer) mispredictions, resulting in
>> really bad run-time performance. Replicated-switch threading fixes
>> this by replicating the switch construct.
>>
>> Can I express this technique in Go without using something like Perl
>> (I find even False more intuitive) for preprocessing?
>
> I'm not certain, but I expect that the answer here is no.

GCC supports a related C language extension: labels as values
(http://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Labels-as-Values.html).
It lets you take the address of a label with the unary && operator
and extends the goto statement to support computed jumps:

static void *array[] = { &&foo, &&bar, &&hack };
goto *array[i];

In Go you could put each opcode implementation into its own function
and keep an array of function pointers. At the end of each operation,
instead of returning, you perform a tail-call to the next opcode via
the array of function pointers. This requires tail-call optimization
and function calls to be cheap. I haven't looked at the gc or gccgo
output for this, but it's probably faster to use a loop with a single
switch statement inside it.
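A minimal sketch of the function-table idea in Go, assuming a hypothetical toy instruction set (opPush, opAdd, opMul, opHalt) and a hypothetical vm type. The Go specification does not require tail-call elimination, so a chain of tail calls from handler to handler could grow the goroutine stack without bound. This sketch therefore swaps in call threading: a small driver loop calls each handler through the table, and each handler simply returns.

```go
package main

import "fmt"

// Hypothetical opcodes for a toy stack VM (illustration only).
const (
	opPush = iota // push the following code word
	opAdd         // pop second, TOS = second + TOS
	opMul         // pop second, TOS = second * TOS
	opHalt        // stop; the result is TOS
)

// vm holds the interpreter state shared by all handlers.
type vm struct {
	code  []int
	stack []int // values below TOS
	tos   int   // cached top of stack
	pc    int   // index of the next code word
	done  bool
}

// handler implements one opcode; pc already points past the opcode.
type handler func(*vm)

func doPush(m *vm) {
	m.stack = append(m.stack, m.tos)
	m.tos = m.code[m.pc]
	m.pc++
}

func doAdd(m *vm) {
	n := len(m.stack) - 1
	m.tos = m.stack[n] + m.tos
	m.stack = m.stack[:n]
}

func doMul(m *vm) {
	n := len(m.stack) - 1
	m.tos = m.stack[n] * m.tos
	m.stack = m.stack[:n]
}

func doHalt(m *vm) { m.done = true }

// table maps each opcode to its handler.
var table = [...]handler{
	opPush: doPush,
	opAdd:  doAdd,
	opMul:  doMul,
	opHalt: doHalt,
}

// run dispatches from a loop instead of tail-calling between handlers.
func run(code []int) int {
	m := &vm{code: code}
	for !m.done {
		op := m.code[m.pc]
		m.pc++
		table[op](m)
	}
	return m.tos
}

func main() {
	prog := []int{opPush, 2, opPush, 3, opAdd, opPush, 4, opMul, opHalt}
	fmt.Println(run(prog)) // 20
}
```

Every opcode still goes through one shared indirect call site, so this mainly changes how the code is organized; only a benchmark against a switch loop will show which is faster with a given compiler.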

Since Go supports labels, it may be possible to get the equivalent of
GCC labels as values in Go with some compiler changes?
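Go's goto only takes static labels, but that is already enough to replicate the dispatch by hand, without compiler changes or an external preprocessor: each handler ends with its own copy of the switch and jumps straight to the next handler. Below is a minimal sketch, assuming a hypothetical toy instruction set (opPush, opAdd, opMul, opHalt) and a hypothetical runReplicated function. Go rejects a goto that jumps over a variable declaration or into a block, so all state is declared before the first label.

```go
package main

import "fmt"

// Hypothetical opcodes for a toy stack VM (illustration only).
const (
	opPush = iota // push the following code word
	opAdd         // pop second, TOS = second + TOS
	opMul         // pop second, TOS = second * TOS
	opHalt        // stop; the result is TOS
)

// runReplicated ends every handler with its own copy of the dispatch
// switch and moves between handlers with goto.
func runReplicated(code []int) int {
	stack := make([]int, 0, 16) // values below TOS
	tos, pc := 0, 0             // pc indexes the opcode being dispatched

	switch code[pc] { // initial dispatch
	case opPush:
		goto push
	case opAdd:
		goto add
	case opMul:
		goto mul
	case opHalt:
		goto halt
	}
	panic("bad opcode")

push:
	stack = append(stack, tos)
	tos = code[pc+1]
	pc += 2
	switch code[pc] { // dispatch copy owned by push
	case opPush:
		goto push
	case opAdd:
		goto add
	case opMul:
		goto mul
	case opHalt:
		goto halt
	}
	panic("bad opcode")

add:
	tos = stack[len(stack)-1] + tos
	stack = stack[:len(stack)-1]
	pc++
	switch code[pc] { // dispatch copy owned by add
	case opPush:
		goto push
	case opAdd:
		goto add
	case opMul:
		goto mul
	case opHalt:
		goto halt
	}
	panic("bad opcode")

mul:
	tos = stack[len(stack)-1] * tos
	stack = stack[:len(stack)-1]
	pc++
	switch code[pc] { // dispatch copy owned by mul
	case opPush:
		goto push
	case opAdd:
		goto add
	case opMul:
		goto mul
	case opHalt:
		goto halt
	}
	panic("bad opcode")

halt:
	return tos
}

func main() {
	prog := []int{opPush, 2, opPush, 3, opAdd, opPush, 4, opMul, opHalt}
	fmt.Println(runReplicated(prog)) // 20
}
```

Without macros the copies have to be written out by hand. Whether this actually reduces mispredictions depends on how the compiler lowers each small switch (it may emit a chain of comparisons rather than an indirect jump), so it needs to be measured.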

Stefan

Mat

Dec 2, 2009, 5:35:45 AM
to golang-nuts
> GCC supports a related C language extension: labels as values
> (http://gcc.gnu.org/onlinedocs/gcc-3.3.1/gcc/Labels-as-Values.html).
> It lets you take the address of a label with the unary && operator
> and extends the goto statement to support computed jumps:
>
> static void *array[] = { &&foo, &&bar, &&hack };
> goto *array[i];

My previous VM used this feature, but I found it very hard to work
around GCC's ever-changing optimizations, which otherwise resulted in
impressively bad code. For this reason I switched to replicated-switch
threading (incidentally, because of this dilemma the current VM
outperforms the old indirect-threaded one by a factor of 2).

> In Go you could put each opcode implementation into its own function
> and keep an array of function pointers.  At the end of each operation,
> instead of returning, you perform a tail-call to the next opcode via
> the array of function pointers.  This requires tail-call optimization
> and function calls to be cheap.  I haven't looked at the gc or gccgo
> output for this, but it's probably faster to use a loop with a single
> switch statement inside it.

Thanks, I will implement a VM version that uses continuation-passing
style for threading and benchmark it against a switch-threaded one.

> Since Go supports labels, it may be possible to get the equivalent of
> GCC labels as values in Go with some compiler changes?

That would be a huge effort, I think...

-Mat