CGP - CGoto Prederefed runloop
Leopold Toetsch  
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Thu, 06 Feb 2003 13:37:42 +0100
Local: Thurs, Feb 6 2003 7:37 am
Subject: [CVS ci] CGP - CGoto Prederefed runloop
This is one thing I always wanted to try ;-)

fast_core MOps: 11
Prederef:       17.5
CGoto MOps:     19.4
CGP MOps:       27.5
CGP -O3 MOps:   65 !!!1

This runloop combines the faster dispatch of opcodes via computed goto
and the clever register addressing due to predereferencing registers and
constants.
It is also compact, because all opcode variants with constants
collapse to just one implementation of the function body. It's so
compact that even my ancient gcc 2.95.2 can compile it with -O3, which
didn't succeed with core_ops_cg.c.

-rw-r--r--   1 lt    users   618496 Feb  6 12:33 core_ops.c
-rw-r--r--   1 lt    users   665012 Feb  6 13:10 core_ops.o
-rw-r--r--   1 lt    users   219169 Feb  6 12:33 core_ops_cg.c
-rw-r--r--   1 lt    users   339312 Feb  6 13:10 core_ops_cg.o
-rw-r--r--   1 lt    users   154457 Feb  6 13:05 core_ops_cgp.c
-rw-r--r--   1 lt    users   165520 Feb  6 13:27 core_ops_cgp.o
-rw-r--r--   1 lt    users   219446 Feb  6 12:33 core_ops_prederef.c
-rw-r--r--   1 lt    users   240592 Feb  6 13:10 core_ops_prederef.o
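
The combination described above can be sketched roughly as follows. This is
a minimal illustration, not the code from core_ops_cgp.c: the op set, the
op_sig table, prederef(), cgp_run() and run_cgp_demo() are all made up for
the example, constants are assumed to be read in place from the original
bytecode, and the computed goto needs GNU C (gcc or clang).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef intptr_t opcode_t;  /* one code slot; wide enough to hold a pointer */
typedef intptr_t INTVAL;    /* same width, so constants can be read in place */

/* Hypothetical mini op set, for illustration only. */
enum { OP_END, OP_SET_I_IC, OP_SUB_I_I_I, OP_SUB_I_I_IC, OP_COUNT };

/* Operand kinds per op: 'r' = integer register number, 'c' = constant. */
static const char *const op_sig[OP_COUNT] = { "", "rc", "rrr", "rrc" };

/* Prederef pass: every operand slot becomes the address of an INTVAL,
 * either a register or the constant sitting in the original bytecode. */
static void prederef(const opcode_t *code, size_t n, opcode_t *pre,
                     INTVAL *regs)
{
    size_t pc = 0;
    while (pc < n) {
        opcode_t op = code[pc];
        assert(op >= 0 && op < OP_COUNT);
        const char *sig = op_sig[op];
        pre[pc] = op;
        for (size_t i = 0; sig[i] != '\0'; i++) {
            const opcode_t *arg = &code[pc + 1 + i];
            pre[pc + 1 + i] = (sig[i] == 'r') ? (opcode_t)&regs[*arg]
                                              : (opcode_t)arg;
        }
        pc += 1 + strlen(sig);
    }
}

/* Computed-goto core over prederefed code.  The _i and _ic variants of an
 * op share one body, because both operands are now plain INTVAL pointers. */
static void cgp_run(opcode_t *cur_opcode)
{
    static void *const ops_addr[OP_COUNT] = {
        &&op_end, &&op_set, &&op_sub, &&op_sub
    };
    goto *ops_addr[*cur_opcode];

op_set:
    *(INTVAL *)cur_opcode[1] = *(INTVAL *)cur_opcode[2];
    goto *ops_addr[*(cur_opcode += 3)];
op_sub:
    *(INTVAL *)cur_opcode[1] =
        *(INTVAL *)cur_opcode[2] - *(INTVAL *)cur_opcode[3];
    goto *ops_addr[*(cur_opcode += 4)];
op_end:
    return;
}

/* I0 = 10; I1 = 3; I2 = I0 - I1; I2 = I2 - 2; end */
INTVAL run_cgp_demo(void)
{
    static const opcode_t bytecode[] = {
        OP_SET_I_IC, 0, 10,
        OP_SET_I_IC, 1, 3,
        OP_SUB_I_I_I, 2, 0, 1,
        OP_SUB_I_I_IC, 2, 2, 2,
        OP_END
    };
    enum { N = sizeof bytecode / sizeof bytecode[0] };
    INTVAL regs[3] = { 0, 0, 0 };
    opcode_t pre[N];

    prederef(bytecode, N, pre, regs);
    cgp_run(pre);
    return regs[2];
}
```

Note that OP_SUB_I_I_I and OP_SUB_I_I_IC map to the same label in ops_addr,
which is the "variants collapse to one body" effect in miniature.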

This runloop is now enabled with the -P switch. If you want to run the
"normal" prederefed runloop then use '-P -g'.

Have fun,
leo


 
gregor
Newsgroups: perl.perl6.internals
From: gre...@focusresearch.com
Date: Thu, 6 Feb 2003 09:15:18 -0500
Local: Thurs, Feb 6 2003 9:15 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop
leo++



 
Jerome Vouillon
Newsgroups: perl.perl6.internals
From: vouil...@pps.jussieu.fr (Jerome Vouillon)
Date: Thu, 6 Feb 2003 17:21:25 +0100
Local: Thurs, Feb 6 2003 11:21 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

On Thu, Feb 06, 2003 at 01:37:42PM +0100, Leopold Toetsch wrote:
> This is one thing I allways wanted to try ;-)

> fast_core MOps: 11
> Prederef:       17.5
> CGoto MOps:     19.4
> CGP MOps:       27.5
> CGP -O3 MOps:   65 !!!1

> This runloop combines the faster dispatch of opcodes via computed goto
> and the clever register addressing due to predereferencing registers and
> constants.

This generates some pretty good code (for instance, for sub_i_i_i):

      (*(INTVAL *)cur_opcode[1]) =
          (*(INTVAL *)cur_opcode[2]) - (*(INTVAL *)cur_opcode[3]);
      goto *ops_addr[*(cur_opcode += 4)];

      mov    0x4(%esi),%ecx
      mov    0x8(%esi),%edx
      mov    0xc(%esi),%eax
      mov    (%eax),%eax
      mov    (%edx),%edx
      sub    %eax,%edx
      mov    %edx,(%ecx)
      add    $0x10,%esi
      mov    (%esi),%eax
      shl    $0x2,%eax
      jmp    *0x80f75a0(%eax)

Why not also predereference the operation address?  This would save a
memory read.  The goto would become:

      goto **(cur_opcode += 4);
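
A minimal sketch of that idea, assuming GNU C (gcc or clang) for the
label-address extension. The names threaded_core() and run_threaded_demo()
and the two-op instruction set are hypothetical. Calling the core with NULL
to fetch its label table is just one common way to get at labels that are
local to a function; it is not claimed to be what Parrot does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t opcode_t;  /* one code slot; wide enough to hold a pointer */
typedef intptr_t INTVAL;

/* Hypothetical two-op instruction set, for illustration only. */
enum { OP_END, OP_SUB_I_I_I, OP_COUNT };

/* With cur_opcode == NULL the core only hands out its label table, so a
 * prederef step can store label addresses directly in the code stream. */
static void *const *threaded_core(opcode_t *cur_opcode)
{
    static void *const ops_addr[OP_COUNT] = { &&op_end, &&op_sub_i_i_i };

    if (cur_opcode == NULL)
        return ops_addr;

    /* The opcode slot already holds the address to jump to. */
    goto *(void *)*cur_opcode;

op_sub_i_i_i:
    *(INTVAL *)cur_opcode[1] =
        *(INTVAL *)cur_opcode[2] - *(INTVAL *)cur_opcode[3];
    goto *(void *)*(cur_opcode += 4);   /* no ops_addr[] lookup any more */
op_end:
    return NULL;
}

/* sub I0, I1, I2 ; end   (written out in already-prederefed form) */
INTVAL run_threaded_demo(void)
{
    INTVAL regs[3] = { 0, 50, 8 };
    void *const *labels = threaded_core(NULL);
    assert(labels != NULL);

    opcode_t code[] = {
        (opcode_t)labels[OP_SUB_I_I_I],
        (opcode_t)&regs[0], (opcode_t)&regs[1], (opcode_t)&regs[2],
        (opcode_t)labels[OP_END]
    };
    threaded_core(code);
    return regs[0];
}
```

Compared with a table-indexed goto, each dispatch here reads one word from
the code stream and jumps to it, which is the saved memory read.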

It seems hard to improve upon this.  One possibility is to have some
new instructions that use an accumulator:

      accu = accu - (*(INTVAL *)cur_opcode[1]);
      goto **(cur_opcode += 2);

      mov    0x4(%esi),%eax
      mov    (%eax),%eax
      sub    %eax,%ebx
      add    $0x8,%esi
      jmp    *(%esi)

-- Jérôme


 
Leopold Toetsch
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Thu, 06 Feb 2003 22:12:39 +0100
Local: Thurs, Feb 6 2003 4:12 pm
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

Jerome Vouillon wrote:
>>CGP -O3 MOps:   65 !!!1
> This generates some pretty good code (for instance, for sub_i_i_i) :

Yep. I had a look at it; really compact.

> Why not also predereference the operation address?  This would save a
> memory read.  The goto would become:

>       goto **(cur_opcode += 4);

This is left as an exercise to the reader :)

Improvements welcome - and I'm a really bad C programmer, I won't do it.

> It seems hard to improve upon this.  One possibility is to have some
> new instructions that uses an accumulator:

>       accu = accu - (*(INTVAL *)cur_opcode[1]);

This looks very similar to my proposal WRT micro ops. I had 3 "accu"
globals - pretty fast. But the problem is that they would have to live
somewhere in the interpreter because of re-entrancy, threads and so on.
So this seems impossible.

> -- Jérôme

leo

 
Melvin Smith
Newsgroups: perl.perl6.internals
From: mrjoltc...@mindspring.com (Melvin Smith)
Date: Thu, 06 Feb 2003 20:49:42 -0500
Local: Thurs, Feb 6 2003 8:49 pm
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop
At 10:12 PM 2/6/2003 +0100, Leopold Toetsch wrote:

>Improvements welcome - and I'm a really bad C programmer, I won't do it.

*cough*

If you are a "bad" C programmer, what is your "good" language? :)

-Melvin


 
Leopold Toetsch
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Fri, 07 Feb 2003 09:29:50 +0100
Local: Fri, Feb 7 2003 3:29 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

Melvin Smith wrote:
> At 10:12 PM 2/6/2003 +0100, Leopold Toetsch wrote:

>> Improvements welcome - and I'm a really bad C programmer, I won't do it.

> *cough*

> If you are a "bad" C programmer, what is your "good" language? :)

I don't have one. But IMHO I have a fair overview of the whole code base
(except io/ and icu/ :) And - as already done in imcc - I can take existing
(really good) pieces and knit them together, a la CGP.

> -Melvin

leo

 
Nicholas Clark
Newsgroups: perl.perl6.internals
From: n...@unfortu.net (Nicholas Clark)
Date: Sat, 8 Feb 2003 15:10:58 +0000
Local: Sat, Feb 8 2003 10:10 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

On Thu, Feb 06, 2003 at 01:37:42PM +0100, Leopold Toetsch wrote:
> This is one thing I allways wanted to try ;-)

> fast_core MOps: 11
> Prederef:       17.5
> CGoto MOps:     19.4
> CGP MOps:       27.5
> CGP -O3 MOps:   65 !!!1

> This runloop combines the faster dispatch of opcodes via computed goto
> and the clever register addressing due to predereferencing registers and
> constants.

Oooh. Shiny

> And it's compact due to the fact that all opcode variants with constants
>  collapse to just one implementation of the functions body. It's so
> compact, that my ancient gcc 2.95.2 even can compile it -O3, which
> didn't succeed with core_ops_cg.c.

Oooooh. Shinier

I had a (possibly) impractical idea - computed goto / JIT
(or even computed goto / prederef / jit) core

I don't know whether this is possible:

Inside a cgoto core have 1 extra op - enter JITted section.

The bytecode is "compiled" by the JIT (at some point) - if there is a run
of consecutive JIT-able ops, then issue a section (an isolated section) of
machine code for those ops, and replace those ops in the bytecode with an op
that calls that section. If there isn't a run of JIT-able ops, then just
leave the ops as ops, and use the regular computed goto core dispatch.

I'd envisage it becoming a win if PBC often ends up with tight, isolated
loops that use a lot of JITted ops, but most of the code is ops we've not
written JITted versions of. The sections with the loops are converted to
native code, and the rest of the ops still gets the benefit of your sterling
efforts at improving the "regular" bytecode dispatch.
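
The scanning step of this idea could look something like the sketch below.
Everything in it is an assumption made for illustration: the op numbers, the
has_jit table, the run_t type and find_jit_runs() are hypothetical, and each
op is treated as a single slot with no operands to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops; the first two are assumed to have JIT versions. */
enum { OP_ADD, OP_SUB, OP_PRINT, OP_CONCAT, OP_COUNT };
static const bool has_jit[OP_COUNT] = { true, true, false, false };

typedef struct {
    size_t start;   /* index of the first op in the run */
    size_t len;     /* number of consecutive JIT-able ops */
} run_t;

/* Collect maximal runs of consecutive JIT-able ops that are at least
 * min_len long.  Each such run would be compiled to one native section and
 * replaced in the bytecode by a single "enter JITted section" op; all other
 * ops stay as they are for the normal computed-goto dispatch.
 * Returns the number of runs stored in runs[] (at most max_runs). */
size_t find_jit_runs(const int *ops, size_t n, size_t min_len,
                     run_t *runs, size_t max_runs)
{
    size_t count = 0;
    size_t i = 0;

    while (i < n) {
        assert(ops[i] >= 0 && ops[i] < OP_COUNT);
        if (!has_jit[ops[i]]) {
            i++;
            continue;
        }
        size_t start = i;
        while (i < n && has_jit[ops[i]])
            i++;
        if (i - start >= min_len && count < max_runs) {
            runs[count].start = start;
            runs[count].len = i - start;
            count++;
        }
    }
    return count;
}
```

The min_len threshold stands in for the "is it worth a native section"
decision; a real implementation would presumably also have to respect
branch targets inside a run.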

Nicholas Clark


 
Jason Gloudon
Newsgroups: perl.perl6.internals
From: p...@gloudon.com (Jason Gloudon)
Date: Sat, 8 Feb 2003 10:35:07 -0500
Local: Sat, Feb 8 2003 10:35 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

On Sat, Feb 08, 2003 at 03:10:58PM +0000, Nicholas Clark wrote:
> The bytecode is "compiled" by the JIT (at some point) - if there are a run
> of consecutive JIT-able ops, then issue a section (an isolated section) of
> machine code for those ops, and replace those ops in the bytecode with an op
> that calls that section. If there isn't a run of JIT-able ops, then just
> leave the ops as ops, and use the regular computed goto core dispatch.

Yep. That's the sort of use I created the enternative op for. Right now it's
used by the compiled C code generator. Basically the entry points to basic
blocks are replaced with enternative calls so that the transition to compiled
basic blocks of ops code happens transparently to the interpreter.

--
Jason


 
Leopold Toetsch
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Sat, 08 Feb 2003 17:50:01 +0100
Local: Sat, Feb 8 2003 11:50 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

Nicholas Clark wrote:
> I had a (possibly) impractical idea - computed goto / JIT
> (or even computed goto / prederef / jit) core

> I don't know whether this is possible:

> Inside a cgoto core have 1 extra op - enter JITted section.

> The bytecode is "compiled" by the JIT (at some point) - if there are a run
> of consecutive JIT-able ops, then issue a section (an isolated section) of
> machine code for those ops, and replace those ops in the bytecode with an op
> that calls that section. If there isn't a run of JIT-able ops, then just
> leave the ops as ops, and use the regular computed goto core dispatch.

This would need non-trivial changes to JIT, to build only a section
of code (JIT isn't really (J)ust ;-)

But before going down a rather complicated path, I would implement more
JITed ops, as i386 already does with many vtable calls.

Or go the other way round: Run from JIT. If there is a sequence of non
JITable ops, convert these to a CGP section, which returns to JIT when
finished. This would save a lot of function calls to jit_normal_op.

> Nicholas Clark

leo

 
Gopal V
Newsgroups: perl.perl6.internals
From: gopal...@symonds.net (Gopal V)
Date: Sun, 9 Feb 2003 20:12:59 +0530
Local: Sun, Feb 9 2003 9:42 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop
If memory serves me right, Nicholas Clark wrote:

> I had a (possibly) impractical idea - computed goto / JIT
> (or even computed goto / prederef / jit) core

> I don't know whether this is possible:

> Inside a cgoto core have 1 extra op - enter JITted section.

> The bytecode is "compiled" by the JIT (at some point) - if there are a run
> of consecutive JIT-able ops, then issue a section (an isolated section) of
> machine code for those ops, and replace those ops in the bytecode with an op
> that calls that section. If there isn't a run of JIT-able ops, then just
> leave the ops as ops, and use the regular computed goto core dispatch.

Are you discussing some sort of unroller for native code for some
opcodes ? .. basic blocks ?. Anything too complicated to unroll is
replaced by a jump back into the interpreter core ?..

This strategy has been used in pnet's engine (which is still a souped-up
interpreter) ... the motivation was that unrollers are easier to port than
full JITs, i.e. opcode by opcode.

> I'd envisage it becoming a win if PBC often ends up with tight, isolated
> loops that use a lot of JITted ops, but most of the code is ops we've not

See the Unrolling section of Rhys's paper for a detailed discussion of
this idea <http://www.southern-storm.com.au/download/pnet-engine.pdf>

Speaking truthfully, all I know is that it lifts the pnetmark score for
loops by about 10 times.

Hope it helps,
Gopal
--
The difference between insanity and genius is measured by success


 
Leopold Toetsch
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Sun, 09 Feb 2003 17:51:02 +0100
Local: Sun, Feb 9 2003 11:51 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

Jerome Vouillon wrote:
> On Thu, Feb 06, 2003 at 01:37:42PM +0100, Leopold Toetsch wrote:

>>This is one thing I allways wanted to try ;-)

>>fast_core MOps: 11
>>Prederef:       17.5
>>CGoto MOps:     19.4
>>CGP MOps:       27.5
>>CGP -O3 MOps:   65 !!!1
> Why not also predereference the operation address?  This would save a
> memory read.  The goto would become:

>       goto **(cur_opcode += 4);

Many thanks for this hint. I did write "I won't do it" but ehem, yes,
here it is:

CGP MOps:       34.5
CGP -O3 MOps:   92.3

This is now the sub_i_i_i (-O3):

     0x80e9060 <cgp_core+9808>:   mov    0x4(%esi),%ecx
     0x80e9063 <cgp_core+9811>:   mov    0x8(%esi),%edx
     0x80e9066 <cgp_core+9814>:   mov    0xc(%esi),%eax
     0x80e9069 <cgp_core+9817>:   add    $0x10,%esi
     0x80e906c <cgp_core+9820>:   mov    (%eax),%eax
     0x80e906e <cgp_core+9822>:   mov    (%edx),%edx
     0x80e9070 <cgp_core+9824>:   sub    %eax,%edx
     0x80e9072 <cgp_core+9826>:   mov    %edx,(%ecx)
     0x80e9074 <cgp_core+9828>:   jmp    *(%esi)

So this saved 2 instructions, including the memory access.

This is the branch from mops.pasm:
     0x80e7ff0 <cgp_core+5600>:   mov    0x4(%esi),%eax
     0x80e7ff3 <cgp_core+5603>:   cmpl   $0x0,(%eax)
     0x80e7ff6 <cgp_core+5606>:   je     0x80e8010 <cgp_core+5632>
     0x80e7ff8 <cgp_core+5608>:   mov    0x8(%esi),%eax
     0x80e7ffb <cgp_core+5611>:   mov    (%eax),%eax
     0x80e7ffd <cgp_core+5613>:   shl    $0x2,%eax
     0x80e8000 <cgp_core+5616>:   add    %eax,%esi
     0x80e8002 <cgp_core+5618>:   jmp    *(%esi)

leo


 
Simon Glover
Newsgroups: perl.perl6.internals
From: s...@amnh.org (Simon Glover)
Date: Mon, 10 Feb 2003 15:52:49 -0500 (EST)
Local: Mon, Feb 10 2003 3:52 pm
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

 Hi all.

 The new CGP code causes compilation of interpreter.c to fail if
 HAVE_COMPUTED_GOTO is undefined. This is because it references
 Parrot_DynOp_core_cgp_0_0_9, which is defined in core_ops_cgp.h,
 and the latter is only included when HAVE_COMPUTED_GOTO is defined. This
 bug appears to be responsible for most of the current breakage in the
 tinderbox; however, it can also be provoked by a standard Linux/x86/gcc
 combination if you configure without computed goto.

 The patch below fixes the problem, but I'd prefer some comments from the
 experts before I commit it.

 Cheers,
 Simon

--- interpreter.c.old   Mon Feb 10 15:41:10 2003
+++ interpreter.c       Mon Feb 10 15:41:21 2003
@@ -15,13 +15,13 @@
 #include "parrot/interp_guts.h"
 #include "parrot/oplib/core_ops.h"
 #include "parrot/oplib/core_ops_prederef.h"
+#include "parrot/oplib/core_ops_cgp.h"
 #include "parrot/runops_cores.h"
 #ifdef HAS_JIT
 #  include "parrot/jit.h"
 #endif
 #ifdef HAVE_COMPUTED_GOTO
 #  include "parrot/oplib/core_ops_cg.h"
-#  include "parrot/oplib/core_ops_cgp.h"
 #endif
 #include "parrot/method_util.h"
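
The patch above fixes the build by including the header unconditionally.
The opposite approach is to keep the include guarded and put every use of
the CGP symbols behind the same guard. A minimal sketch of that pattern
follows; oplib_t, prederef_lib, cgp_lib, get_cgp_oplib() and pick_runcore()
are hypothetical stand-ins, not the names used in interpreter.c, and this is
not necessarily the fix that was eventually checked in.

```c
#include <assert.h>
#include <stddef.h>

/* Uncomment to simulate a compiler with computed goto support. */
/* #define HAVE_COMPUTED_GOTO 1 */

/* Hypothetical stand-in for an oplib descriptor. */
typedef struct { const char *name; } oplib_t;

static const oplib_t prederef_lib = { "core_prederef" };

#ifdef HAVE_COMPUTED_GOTO
/* Only exists when the CGP core is compiled in. */
static const oplib_t cgp_lib = { "core_cgp" };
#endif

/* Every mention of cgp_lib sits behind the same guard as its definition,
 * so the file still compiles when HAVE_COMPUTED_GOTO is undefined. */
const oplib_t *get_cgp_oplib(void)
{
#ifdef HAVE_COMPUTED_GOTO
    return &cgp_lib;
#else
    return NULL;
#endif
}

/* Fall back to the plain prederef core when CGP is unavailable. */
const char *pick_runcore(int want_cgp)
{
    const oplib_t *lib = want_cgp ? get_cgp_oplib() : NULL;

    if (lib == NULL)
        lib = &prederef_lib;
    assert(lib->name != NULL);
    return lib->name;
}
```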


 
Leopold Toetsch
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Tue, 11 Feb 2003 09:06:18 +0100
Local: Tues, Feb 11 2003 3:06 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

Simon Glover wrote:
>  Hi all.

>  The new CGP code causes compilation of interpreter.c to fail if
>  HAVE_COMPUTED_GOTO is undefined.

Ah, yes thanks. I have checked in a fix.

leo


 
Leopold Toetsch
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Tue, 11 Feb 2003 10:49:14 +0100
Local: Tues, Feb 11 2003 4:49 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

Leopold Toetsch wrote:
> Nicholas Clark wrote:
>> Inside a cgoto core have 1 extra op - enter JITted section.
> Or go the other way round: Run from JIT. If there is a sequence of non
> JITable ops, convert these to a CGP section, which returns to JIT when
> finished. This would save a lot of function calls to jit_normal_op.

I have thought about this, with a little help from ddd:

1)
opcode_t *
cgp_core(opcode_t *cur_op, struct Parrot_Interp *interpreter)
{
#ifdef __GNUC__
     register opcode_t *cur_opcode asm ("esi") = cur_op;
#else
     opcode_t *cur_opcode = cur_op;
#endif

Even unoptimized, this produces almost the same code quality and speed as -O3.

The cur_opcode is in %esi, all operand access is done like in the posted
-O3 example:
$ parrot -P mops.pbc
82.019517 M op/s

2) There is one new opcode:

    B<jmp_to_eip inconst INT>

The argument is the native_ptr in the JIT code to which execution returns
after a section of non-JITed code.

    goto **(cur_opcode + 1);

The address is filled in by the JIT emit functions.

3) The Parrot_jit_begin() emits code to call cgp_core (alas setting up
the same stack frame as cgp_core) and B<jmp_to_eip> back to the address
after the function call

4) When there is a sequence of (more than 1) non-JITed ops,
Parrot_jit_normal_op emits code to calculate %esi (the *cur_opcode) in
the prederefed jump table and *jumps* there. The end of the section is
the above jmp_to_eip instruction. This implies that the JITed section
following a non-JITed section must be at least two opcodes long, to have
room to fill in this jump.

5) non JITed branches do not fit very nicely in this scheme, but there
are several possible ways to handle these:
- make all branches JITted
- generate another core (cgp_jit_core), which does the right thing
- always emit code by Parrot_jit_cpcf_op() for the last opcode in the
section (cgp_core would only be used if the nonJITed section is >= 3
instructions)
- if both ends of the branch are non JITed sections do nothing, just
stay in the cgp_core and patch the ends of these branches to return to
JIT (brrr)

6) and finally the prederefed B<end> opcode gets jumped to by code
emitted from Parrot_end_jit, to clean up the cgp_core stack frame and
return from JIT.

So we would not have any function call overhead and would get the best
performance by combining the 2 fastest run cores.

This approach would of course need some architecture/compiler specific
hacks, but JIT is such a hack anyway. OTOH it is almost totally
encapsulated in the architecture jit file, so it *can* be implemented
but there is no need to do so.
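
The decision in step 4 can be sketched as a small planning pass over the
ops. This is only an illustration under assumptions: emit_t and plan_emit()
are hypothetical names, each op is reduced to a flag saying whether it has a
JIT implementation, a lone non-JITed op is assumed to keep using the
existing function call, and branches (step 5) are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    EMIT_JIT,   /* op has a native JIT implementation */
    EMIT_CALL,  /* lone non-JITed op: keep calling it as a function */
    EMIT_CGP    /* part of a run of 2+ non-JITed ops: jump into cgp_core */
} emit_t;

/* For each op, decide how the JIT would handle it.  has_jit[i] says whether
 * op i has a JIT implementation; plan[] receives one decision per op. */
void plan_emit(const bool *has_jit, size_t n, emit_t *plan)
{
    assert(has_jit != NULL && plan != NULL);

    size_t i = 0;
    while (i < n) {
        if (has_jit[i]) {
            plan[i++] = EMIT_JIT;
            continue;
        }
        /* Measure the run of consecutive non-JITed ops. */
        size_t start = i;
        while (i < n && !has_jit[i])
            i++;
        /* Only a sequence of more than one op is worth a CGP section. */
        emit_t how = (i - start > 1) ? EMIT_CGP : EMIT_CALL;
        for (size_t k = start; k < i; k++)
            plan[k] = how;
    }
}
```

In the scheme described above, each EMIT_CGP run would end in a jmp_to_eip
that leads back into the JIT code.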

Comments welcome,
leo


 
Nicholas Clark
Newsgroups: perl.perl6.internals
From: n...@unfortu.net (Nicholas Clark)
Date: Tue, 11 Feb 2003 20:41:14 +0000
Local: Tues, Feb 11 2003 3:41 pm
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

On Tue, Feb 11, 2003 at 10:49:14AM +0100, Leopold Toetsch wrote:
> Leopold Toetsch wrote:

> >Nicholas Clark wrote:

> >>Inside a cgoto core have 1 extra op - enter JITted section.

> >Or go the other way round: Run from JIT. If there is a sequence of non
> >JITable ops, convert these to a CGP section, which returns to JIT when
> >finished. This would save a lot of function calls to jit_normal_op.

I failed to follow most of the specific details, and all of the x86 specific
stuff.

> So we would not have any function call overhead and getting the best
> performance by combining the 2 fastest run cores.

> This approach would of course need some architecture/compiler specific
> hacks, but JIT is such a hack anyway. OTOH it is almost totally
> encapsulated in the architecture jit file, so it *can* be implemented
> but there is no need to do so.

> Comments welcome,

The idea actually works at all?
And it goes faster than the prederef computed goto core?

So, in comparison, how fast is Python...

Nicholas Clark


 
Leopold Toetsch
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Tue, 11 Feb 2003 23:13:14 +0100
Local: Tues, Feb 11 2003 5:13 pm
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

Nicholas Clark wrote:
> On Tue, Feb 11, 2003 at 10:49:14AM +0100, Leopold Toetsch wrote:
> I failed to follow most of the specific details, and all of the x86 specific
> stuff.

Basically, all non-JITed opcodes, which are currently called as functions
(via Parrot_jit_normal_op), would instead be jumped to in the cgp_core, and
from there execution would continue in JIT code again with another jump.

> The idea actually works at all?

I think so.

> And it goes faster than the prederef computed goto core?

Should be so, yes. When there are a lot of JITed ops (mainly integers
and floats, kept in registers) JIT is faster than native -O3 compiled C.
(I hope that the optimizer in imcc can turn a lot of plain perl code
into native ints/floats).
When there are many non-JITed function calls (e.g. string_concat), the
CGP core wins compared to JIT. So combining (again :) the 2 fastest
concepts gives you the best of both.

> So, in comparison, how fast is Python...

A little slower than perl5 (2.0 MOps)

$ python examples/mops/mops.py
M op/s:        1.70850480747

$ imcc -P examples/assembly/mops_p.pasm
M op/s:        40.195826

$ imcc -j examples/assembly/mops_p.pasm
M op/s:        68.099593

(I took the PMC based mops for comparison, and of course, this is
opcode dispatch only - or almost only)

parrot/imcc: -O3 compiled on i386/linux, Athlon 800.

> Nicholas Clark

leo

 
Leopold Toetsch
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Thu, 13 Feb 2003 16:43:37 +0100
Local: Thurs, Feb 13 2003 10:43 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

Nicholas Clark wrote:
> On Tue, Feb 11, 2003 at 10:49:14AM +0100, Leopold Toetsch wrote:

>>Leopold Toetsch wrote:
>>>Or go the other way round: Run from JIT. If there is a sequence of non
>>>JITable ops, convert these to a CGP section, which returns to JIT when
>>>finished. This would save a lot of function calls to jit_normal_op.
> The idea actually works at all?

I have it running now.
JIT/i386 uses the stack frame of CGP as its own. When there is a
sequence of non-JITed functions, JIT code branches directly into the
CGP core and executes the CGP ops. Going back to JIT is an asm("ret"),
which is a new (the 1000th !) opcode.

> And it goes faster than the prederef computed goto core?

Yep, though I don't have many test cases:
$ parrot -P life.pbc
5000 generations in 5.629500 seconds. 888.178341 generations/sec

$ parrot -j life.pbc
5000 generations in 5.290360 seconds. 945.115271 generations/sec      

$ cd languages/perl6 ; perl6 -C -O3 ../../life.pasm
5000 generations in 5.154097 seconds. 970.102045 generations/sec

parrot/imcc: -O3 compiled on i386/linux, Athlon 800.

Should I commit it or send to the list first?

> Nicholas Clark

leo

 
Nicholas Clark
Newsgroups: perl.perl6.internals
From: n...@ccl4.org (Nicholas Clark)
Date: Thu, 13 Feb 2003 15:50:37 +0000
Local: Thurs, Feb 13 2003 10:50 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

On Thu, Feb 13, 2003 at 04:43:37PM +0100, Leopold Toetsch wrote:
> Nicholas Clark wrote:
> > The idea actually works at all?

> I have it running now.

Can you write an opcode that solves the halting problem? :-)

> JIT/i386 uses the stackframe of CGP for its own. When there is a
> sequence of non-JITed functions, JIT code braanches directly into the
> CGP core and executes the CGP ops. Going back to JIT is a asm("ret"),
> which is a new (the 1000th !) opcode.

That sounds hairy. :-)

> > And it goes faster than the prederef computed goto core?

> Yep, though I don't have many test cases:
> $ parrot -P life.pbc
> 5000 generations in 5.629500 seconds. 888.178341 generations/sec

> $ parrot -j life.pbc
> 5000 generations in 5.290360 seconds. 945.115271 generations/sec      

> $ cd languages/perl6 ; perl6 -C -O3 ../../life.pasm
> 5000 generations in 5.154097 seconds. 970.102045 generations/sec

> parrot/imcc: -O3 compiled on i386/linux, Athlon 800.

Presumably life.pasm mainly uses ops that already have JIT implementations.
Do you have examples of code that doesn't have many OPs, and "currently"
goes faster on x86 under CGoto rather than JIT? These would be the
interesting ones.

> Should I commit it or send to the list first?

I don't know. Are you confident in it? Does it pass all Parrot's regression
tests?

Nicholas Clark


 
Leopold Toetsch
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Thu, 13 Feb 2003 18:34:55 +0100
Local: Thurs, Feb 13 2003 12:34 pm
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

Nicholas Clark wrote:
> On Thu, Feb 13, 2003 at 04:43:37PM +0100, Leopold Toetsch wrote:

>>Nicholas Clark wrote:
>>JIT/i386 uses the stackframe of CGP for its own. When there is a
>>sequence of non-JITed functions, JIT code braanches directly into the
>>CGP core and executes the CGP ops. Going back to JIT is a asm("ret"),
>>which is a new (the 1000th !) opcode.
> That sounds hairy. :-)

It's not utterly complex. Though single-stepping through a short program
with ddd might be helpful ;-)

> Presumably life.pasm mainly uses ops that already have JIT implementations.

Yep.

> Do you have examples of code that doesn't have many OPs, and "currently"
> goes faster on x86 under CGoto rather than JIT? These would be the
> interesting ones.

IIRC perl6/examples/life.p6 was slower with JIT. Now JIT wins, but of
course mainly due to more JITed opcodes.
No, I don't have a good test for this.

>>Should I commit it or send to the list first?

> I don't know. Are you confident in it? Does it pass all Parrot's regression
> tests?

It passes parrot's, imcc's and perl6 tests, with -g and with -O3.

> Nicholas Clark

leo

 
Nicholas Clark
Newsgroups: perl.perl6.internals
From: n...@unfortu.net (Nicholas Clark)
Date: Thu, 13 Feb 2003 22:20:27 +0000
Local: Thurs, Feb 13 2003 5:20 pm
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

On Thu, Feb 13, 2003 at 06:34:55PM +0100, Leopold Toetsch wrote:
> Nicholas Clark wrote:

> >On Thu, Feb 13, 2003 at 04:43:37PM +0100, Leopold Toetsch wrote:
> >>Should I commit it or send to the list first?

> >I don't know. Are you confident in it? Does it pass all Parrot's regression
> >tests?

> It passes parrot's, imcc's and perl6 tests, with -g and with -O3.

I'd suggest committing it, rather than sending it to the list.
Unless there are extra comments you'd want to make to the list. But even
then, I think it would be better to write them into a design doc (or whatever
the correct place is) and just tell the list where to read it from the
checkout.

Sorry. Thinking "aloud"

Nicholas Clark


 
Leopold Toetsch
Newsgroups: perl.perl6.internals
From: l...@toetsch.at (Leopold Toetsch)
Date: Fri, 14 Feb 2003 13:23:59 +0100
Local: Fri, Feb 14 2003 7:23 am
Subject: Re: [CVS ci] CGP - CGoto Prederefed runloop

Nicholas Clark wrote:

> I'd suggest committing it, rather than sending it to the list.

Ok, I'll do that after checking again on a second machine

> ... I think it would be better to write them into a design doc

I'll include docs/dev/jit_i386.dev, which has most of the gory details.

> Nicholas Clark

leo

 