
ARM Jit v2


Nicholas Clark

Jul 29, 2002, 6:03:05 PM
to perl6-i...@perl.org
Here's a very minimal ARM jit framework. It does work (at least as far as
passing all 10 t/op/basic.t subtests, and running mops.pbc)

As you can see from the patch all it does is implement the end and noop ops.
Everything else is being called. Interestingly, JITing like this is slower
than computed goto:

computed goto:

$ ./parrot examples/assembly/mops.pbc
Iterations: 100000000
Estimated ops: 200000000
Elapsed time: 37.209835
M op/s: 5.374923

no computed goto:

$ ./parrot -g examples/assembly/mops.pbc
Iterations: 100000000
Estimated ops: 200000000
Elapsed time: 71.245085
M op/s: 2.807211

JIT:

$ ./parrot -j examples/assembly/mops.pbc
Iterations: 100000000
Estimated ops: 200000000
Elapsed time: 53.474880
M op/s: 3.740074

JIT with ARM_K_BUG, to generate code that doesn't tickle the page faulting
related bug in the K StrongARM:

$ ./parrot -j examples/assembly/mops.pbc
Iterations: 100000000
Estimated ops: 200000000
Elapsed time: 56.142425
M op/s: 3.562368

I doubt in its current form this is quite ready to go in. Points I'd like to
raise:

0: I've only implemented generator code fully for 1 class of instructions
(load/store multiple registers), partially for a second (load/store
single registers), and hard coded the minimal set of other things I
needed. I'll replace these with fully featured versions, now that I'm
happy that the concept works.

1: The best code I could think of to call external functions sets
everything up by loading the arguments into registers and the function
address into the PC with a single load multiple instruction (plus setting
the return address in the link register, by using the link register as
the base register for the load). All that in 1 instruction, plus a second
to prime LR for the load. (This is why I like it)

However, this is the form of instruction that can trigger bugs on the
(very early) K version StrongARMs (if it page faults midway). Probably
the rest of the world doesn't have these (unless they have machines
dating from 1996 or so), but I do have one, so it is an important itch
for me. ARM_K_BUG is a symbol to define to generate code that cannot
cause the bug.

2: This code probably is the ARM assembler version of a JAPH, in that I've
not actually found the need (yet) to use any branch instructions. They
do exist! It's just that I find I can do it all so far with loads. :-)

3: The code as is issues casting warnings and 3 warnings about unprototyped
functions (which I think can be made static).

4: I'd really like the type of the pointer for the native code to be
machine chosen. char* isn't the most appropriate type for ARM code -
all instructions are word sized (32 bits) and must all be word aligned,
so I'd really like to be fabricating them in ints, and writing to an int*
in one blat.
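
To be clear, what I have in mind is something like this (just a sketch, not
what the patch currently does - jit_emit.h still pokes bytes through a char*):

    /* Sketch only: emit one 32 bit ARM instruction through a word pointer.
       Assumes the code arena is word aligned and unsigned int is 32 bits. */
    typedef unsigned int arm_instr_t;

    static arm_instr_t *
    emit_instruction(arm_instr_t *pc, arm_instr_t instruction)
    {
        *pc++ = instruction;   /* one aligned 32 bit store, no byte shuffling */
        return pc;
    }

    /* e.g. the canonical NOP, mov r0, r0, is 0xE1A00000:
       pc = emit_instruction(pc, 0xE1A00000); */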

5: The symbol TESTING was so that I could #include "jit_emit.h" in a test C
program to check my generator (by spitting a buffer out into a $file, and
then disassembling it with objdump -b binary -m arm -D $file).
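
The test program itself is nothing clever - roughly this (a sketch from
memory; it assumes everything parrot-specific in jit_emit.h is also hidden
behind #ifndef TESTING, and the buffer size and file name are arbitrary):

    /* Emit a few instructions into a buffer, dump them to test.bin, then
       disassemble with: objdump -b binary -m arm -D test.bin */
    #include <stdio.h>

    #define TESTING
    #include "jit_emit.h"

    int main(void)
    {
        char buffer[1024];
        char *pc = buffer;
        FILE *f;

        emit_nop(pc);                       /* mov r0, r0 */
        pc = emit_ldmstm(pc, cond_AL, is_load, dir_EA, 0, 0, REG11_fp,
                         reg2mask(4) | reg2mask(REG11_fp)
                         | reg2mask(REG13_sp) | reg2mask(REG15_pc));

        f = fopen("test.bin", "wb");
        fwrite(buffer, 1, pc - buffer, f);
        fclose(f);
        return 0;
    }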

6: ARMs with separate I and D caches need to sync them before running code.
(else it all goes SEGV shaped with really really weird backtraces)
I don't think there's any official Linux function wrapper round the
ARM Linux syscall necessary to do this, hence the function with the inline
assembler. I'm not sure if there is a better way to do this.
[optional .s file in the architecture's jit directory, which the jit
installer can copy if it finds?]

7: Debian defines the archname on their perl as "arm", whereas building from
the source tree gets me armv4l (from uname), hence the substitution of
armv[34]l? down to arm. I do have a machine with an ARM3 here (which I
think would be armv2), but it is 14 years old, and doesn't currently have
Linux on it (or a compiler for RISC OS, and I'm not feeling up to
attempting a RISC OS port of parrot just to experiment with JITs).
It's probably quite feasible to make the JIT work on everything back to
the ARM2 (ARM1 was the prototype and I believe was never used in any
hardware available outside Acorn, and IIRC the multiply instruction is
all that the ARM1 lacks, so it could be done).

Apart from all of that, the JIT version 2 looks much more flexible than
JIT version 1 - thanks Daniel.

I'll start writing some real JIT ops over the next few days, although
possibly only for the ops mops and life use :-)
[although I strongly suspect that JITting the ops the regexps compile down
to would be the first real world JIT priority. How fast would perl6 regexps
be with that?]

Oh, and I'll prepare an acceptable version of this patch once people decide
what is acceptable.

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/

--- /dev/null Mon Jul 16 22:57:44 2001
+++ jit/arm/core.jit Mon Jul 29 00:14:30 2002
@@ -0,0 +1,26 @@
+;
+; arm/core.jit
+;
+; $Id: core.jit,v 1.4 2002/05/20 05:32:58 grunblatt Exp $
+;
+
+Parrot_noop {
+ emit_nop(jit_info->native_ptr);
+}
+
+; ldmea fp, {r4, r5, r6, r7, fp, sp, pc}
+; but K bug Grr if I load pc direct.
+
+Parrot_end {
+ jit_info->native_ptr = emit_ldmstm (jit_info->native_ptr,
+ cond_AL, is_load, dir_EA, 0, 0,
+ REG11_fp,
+ reg2mask(4) | reg2mask(REG11_fp)
+ | reg2mask(REG13_sp)
+ #ifndef ARM_K_BUG
+ | reg2mask(REG15_pc));
+ #else
+ | reg2mask(REG14_lr));
+ emit_mov(jit_info->native_ptr, REG15_pc, REG14_lr);
+ #endif
+}
--- /dev/null Mon Jul 16 22:57:44 2001
+++ jit/arm/jit_emit.h Mon Jul 29 22:23:37 2002
@@ -0,0 +1,293 @@
+/*
+** jit_emit.h
+**
+** ARM (v3 and later - maybe this can easily be unified to v1)
+**
+** $Id: jit_emit.h,v 1.3 2002/07/04 21:32:12 mrjoltcola Exp $
+**/
+
+/* I'll use mov r0, r0 as my NOP for now. */
+
+typedef enum {
+ cond_EQ = 0x00,
+ cond_NE = 0x10,
+ cond_CS = 0x20,
+ cond_CC = 0x30,
+ cond_MI = 0x40,
+ cond_PL = 0x50,
+ cond_VS = 0x60,
+ cond_VC = 0x70,
+ cond_HI = 0x80,
+ cond_LS = 0x90,
+ cond_GE = 0xA0,
+ cond_LT = 0xB0,
+ cond_GT = 0xC0,
+ cond_LE = 0xD0,
+ cond_AL = 0xE0,
+/* cond_NV = 0xF0, */
+ cond_HS = 0x20,
+ cond_LO = 0x30
+} cont_t;
+
+typedef enum {
+ REG10_sl = 10,
+ REG11_fp = 11,
+ REG12_ip = 12,
+ REG13_sp = 13,
+ REG14_lr = 14,
+ REG15_pc = 15
+} arm_register_t;
+
+#define emit_nop(pc) emit_mov (pc, 0, 0)
+
+#define emit_mov(pc, dest, src) { \
+ *(pc++) = 0x00 | src; \
+ *(pc++) = dest << 4; \
+ *(pc++) = 0xA0; \
+ *(pc++) = cond_AL | 1; }
+
+#define emit_sub4(pc, dest, src) { \
+ *(pc++) = 0x04; \
+ *(pc++) = dest << 4; \
+ *(pc++) = 0x40 | src; \
+ *(pc++) = cond_AL | 2; }
+
+#define emit_add4(pc, dest, src) { \
+ *(pc++) = 0x04; \
+ *(pc++) = dest << 4; \
+ *(pc++) = 0x80 | src; \
+ *(pc++) = cond_AL | 2; }
+
+#define emit_dcd(pc, word) { \
+ *((int *)pc) = word; \
+ pc+=4; }
+
+#define reg2mask(reg) (1<<(reg))
+
+#define is_store 0x00
+#define is_load 0x10
+#define is_writeback 0x20
+#define is_caret 0x40 /* assembler syntax is ^ - load sets status flags in
+ USR mode, or load/store use user bank registers
+ in other mode. IIRC. */
+#define is_byte 0x40
+#define is_pre 0x01 /* pre index addressing. */
+#define is_post 0x00 /* post indexed addressing. ie arithmetic for free */
+
+/* multiple register transfer direction.
+ D = decrease, I = increase
+ A = after, B = before
+ or the stack notation
+ FD = full descending (the usual)
+ ED = empty descending
+ FA = full ascending
+ EA = empty ascending
+ values for stack notation are 0x10 | (ldm type) << 2 | (stm type)
+*/
+typedef enum {
+ dir_DA = 0,
+ dir_IA = 1,
+ dir_DB = 2,
+ dir_IB = 3,
+ dir_FD = 0x10 | (1 << 2) | 2,
+ dir_FA = 0x10 | (0 << 2) | 3,
+ dir_ED = 0x10 | (3 << 2) | 0,
+ dir_EA = 0x10 | (2 << 2) | 1
+} ldm_stm_dir_t;
+
+typedef enum {
+ dir_Up = 0x80,
+ dir_Down = 0x00
+} ldr_str_dir_t;
+
+char *
+emit_ldmstm(char *pc,
+ int cond,
+ int l_s,
+ ldm_stm_dir_t direction,
+ int caret,
+ int writeback,
+ int base,
+ int regmask) {
+ if ((l_s == is_load) && (direction & 0x10))
+ direction >>= 2;
+
+ *(pc++) = regmask;
+ *(pc++) = regmask >> 8;
+ /* bottom bit of direction is the up/down flag. */
+ *(pc++) = ((direction & 1) << 7) | caret | writeback | l_s | base;
+ /* binary 100x is code for stm/ldm. */
+ /* Top bit of direction is pre/post increment flag. */
+ *(pc++) = cond | 0x8 | ((direction >> 1) & 1);
+ return pc;
+}
+
+char *
+emit_ldrstr(char *pc,
+ int cond,
+ int l_s,
+ ldr_str_dir_t direction,
+ int pre,
+ int writeback,
+ int byte,
+ int dest,
+ int base,
+ int offset_type,
+ unsigned int offset) {
+
+ *(pc++) = offset;
+ *(pc++) = ((offset >> 8) & 0xF) | (dest << 4);
+ *(pc++) = direction | byte | writeback | l_s | base;
+ *(pc++) = cond | 0x4 | offset_type | pre;
+ return pc;
+}
+
+char *
+emit_ldrstr_offset (char *pc,
+ int cond,
+ int l_s,
+ int pre,
+ int writeback,
+ int byte,
+ int dest,
+ int base,
+ int offset) {
+ ldr_str_dir_t direction = dir_Up;
+#ifndef TESTING
+ if (offset > 4095 || offset < -4095) {
+ internal_exception(JIT_ERROR,
+ "Unable to generate offsets > 4095\n" );
+ }
+#endif
+ if (offset < 0) {
+ direction = dir_Down;
+ offset = -offset;
+ }
+ return emit_ldrstr(pc, cond, l_s, direction, pre, writeback, byte, dest,
+ base, 0, offset);
+}
+
+void Parrot_jit_dofixup(Parrot_jit_info *jit_info,
+ struct Parrot_Interp * interpreter)
+{
+ /* Todo. */
+}
+/* My entry code is create a stack frame:
+ mov ip, sp
+ stmfd sp!, {r4, fp, ip, lr, pc}
+ sub fp, ip, #4
+ Then store the first parameter (pointer to the interpreter) in r4.
+ mov r4, r0
+*/
+
+void
+Parrot_jit_begin(Parrot_jit_info *jit_info,
+ struct Parrot_Interp * interpreter)
+{
+ emit_mov (jit_info->native_ptr, REG12_ip, REG13_sp);
+ jit_info->native_ptr = emit_ldmstm (jit_info->native_ptr,
+ cond_AL, is_store, dir_FD, 0,
+ is_writeback,
+ REG13_sp,
+ reg2mask(4) | reg2mask(REG11_fp)
+ | reg2mask(REG12_ip)
+ | reg2mask(REG14_lr)
+ | reg2mask(REG15_pc));
+ emit_sub4 (jit_info->native_ptr, REG11_fp, REG12_ip);
+ emit_mov (jit_info->native_ptr, 4, 0);
+}
+
+/* I'm going to load registers to call functions in general like this:
+ adr r14, .L1
+ ldmia r14!, {r0, r1, r2, pc} ; register list built by jit
+ .L1: r0 data
+ r1 data
+ r2 data
+ <where ever> ; address of function.
+ .L2: ; next instruction - return point from func.
+
+ # here I'm going to do
+
+ mov r1, r4 ; current interpreter is arg 1
+ adr r14, .L1
+ ldmia r14!, {r0, pc}
+ .L1: address of current opcode
+ <where ever> ; address of function for op
+ .L2: ; next instruction - return point from func.
+*/
+
+/*
+XXX no.
+need to adr beyond:
+
+ mov r1, r4 ; current interpreter is arg 1
+ adr r14, .L1
+ ldmda r14!, {r0, ip}
+ mov pc, ip
+ .L1 address of current opcode
+ dcd <where ever> ; address of function for op
+ .L2: ; next instruction - return point from func.
+*/
+void
+Parrot_jit_normal_op(Parrot_jit_info *jit_info,
+ struct Parrot_Interp * interpreter)
+{
+ emit_mov (jit_info->native_ptr, 1, 4);
+#ifndef ARM_K_BUG
+ emit_mov (jit_info->native_ptr, REG14_lr, REG15_pc);
+#else
+ emit_add4 (jit_info->native_ptr, REG14_lr, REG15_pc);
+#endif
+ jit_info->native_ptr = emit_ldmstm (jit_info->native_ptr,
+ cond_AL, is_load, dir_IA, 0,
+ is_writeback,
+ REG14_lr,
+ reg2mask(0)
+#ifndef ARM_K_BUG
+ | reg2mask(REG15_pc)
+#else
+ | reg2mask(REG12_ip)
+#endif
+ );
+#ifdef ARM_K_BUG
+ emit_mov (jit_info->native_ptr, REG15_pc, REG12_ip);
+#endif
+ emit_dcd (jit_info->native_ptr, (int) jit_info->cur_op);
+ emit_dcd (jit_info->native_ptr,
+ (int) interpreter->op_func_table[*(jit_info->cur_op)]);
+}
+
+/* We get back address of opcode in bytecode.
+ We want address of equivalent bit of jit code, which is stored as an
+ address at the same offset in a jit table. */
+void Parrot_jit_cpcf_op(Parrot_jit_info *jit_info,
+ struct Parrot_Interp * interpreter)
+{
+ Parrot_jit_normal_op(jit_info, interpreter);
+
+ /* This is effectively the pseudo-opcode ldr - ie load relative to PC.
+ So offset includes pipeline. */
+ jit_info->native_ptr = emit_ldrstr_offset (jit_info->native_ptr, cond_AL,
+ is_load, is_pre, 0, 0,
+ REG14_lr, REG15_pc, 0);
+ /* ldr pc, [r14, r0] */
+ /* lazy. this is offset type 0, 0x000 which is r0 with zero shift */
+ jit_info->native_ptr = emit_ldrstr (jit_info->native_ptr, cond_AL,
+ is_load, dir_Up, is_pre, 0, 0,
+ REG15_pc, REG14_lr, 2, 0);
+ /* and this "instruction" is never reached, so we can use it to store
+ the constant that we load into r14 */
+ emit_dcd (jit_info->native_ptr,
+ ((long) jit_info->op_map) -
+ ((long) interpreter->code->byte_code));
+}
+
+/*
+ * Local variables:
+ * c-indentation-style: bsd
+ * c-basic-offset: 4
+ * indent-tabs-mode: nil
+ * End:
+ *
+ * vim: expandtab shiftwidth=4:
+ */
--- jit.c~ Tue Jul 23 19:18:41 2002
+++ jit.c Mon Jul 29 21:46:44 2002
@@ -128,6 +128,63 @@ optimize_jit(struct Parrot_Interp *inter
return optimizer;
}

+#ifdef ARM
+static void
+arm_sync_d_i_cache (void *start, void *end) {
+/* Strictly this is only needed for StrongARM and later (not sure about ARM8)
+ because earlier cores don't have separate D and I caches.
+ However there aren't that many ARM7 or earlier devices around that we'll be
+ running on. */
+#ifdef __linux
+#ifdef __GNUC__
+ int result;
+ /* swi call based on code snippet from Russell King. Description
+ verbatim: */
+ /*
+ * Flush a region from virtual address 'r0' to virtual address 'r1'
+ * _inclusive_. There is no alignment requirement on either address;
+ * user space does not need to know the hardware cache layout.
+ *
+ * r2 contains flags. It should ALWAYS be passed as ZERO until it
+ * is defined to be something else. For now we ignore it, but may
+ * the fires of hell burn in your belly if you break this rule. ;)
+ *
+ * (at a later date, we may want to allow this call to not flush
+ * various aspects of the cache. Passing '0' will guarantee that
+ * everything necessary gets flushed to maintain consistency in
+ * the specified region).
+ */
+
+ /* The value of the SWI is actually available by in
+ __ARM_NR_cacheflush defined in <asm/unistd.h>, but quite how to
+ get that to interpolate as a number into the ASM string is beyond
+ me. */
+ /* I'm actually passing in exclusive end address, so subtract 1 from
+ it inside the assembler. */
+ __asm__ __volatile__ (
+ "mov r0, %1\n"
+ "sub r1, %2, #1\n"
+ "mov r2, #0\n"
+ "swi 0x9f0002\n"
+ "mov %0, r0\n"
+ : "=r" (result)
+ : "r" ((long)start), "r" ((long)end)
+ : "r0","r1","r2");
+
+ if (result < 0) {
+ internal_exception(JIT_ERROR,
+ "Synchronising I and D caches failed with errno=%d\n",
+ -result);
+ }
+#else
+#error "ARM needs to sync D and I caches, and I don't know how to embed assembler on this C compiler"
+#endif
+#else
+/* Not strictly true - on RISC OS it's OS_SynchroniseCodeAreas */
+#error "ARM needs to sync D and I caches, and I don't know how to on this OS"
+#endif
+}
+#endif

/*
** build_asm()
@@ -214,6 +271,9 @@ build_asm(struct Parrot_Interp *interpre
}
}

+#ifdef ARM
+ arm_sync_d_i_cache (jit_info.arena_start, jit_info.native_ptr);
+#endif
return (jit_f)jit_info.arena_start;
}

--- config/auto/jit.pl.orig Sat Jul 13 22:39:40 2002
+++ config/auto/jit.pl Mon Jul 29 00:08:22 2002
@@ -42,11 +42,14 @@ sub runstep {
$cpuarch = 'i386';
}

+ $cpuarch =~ s/armv[34]l?/arm/i;
+
Configure::Data->set(
archname => $archname,
cpuarch => $cpuarch,
osname => $osname,
);
+

my $jitarchname = "$cpuarch-$osname";
$jitarchname =~ s/i[456]86/i386/i;

Daniel Grunblatt

Jul 29, 2002, 9:34:00 PM
to Nicholas Clark, perl6-i...@perl.org
On Mon, 29 Jul 2002, Nicholas Clark wrote:

> Here's a very minimal ARM jit framework. It does work (at least as far as
> passing all 10 t/op/basic.t subtests, and running mops.pbc)

Cool, I have also been playing with ARM but your approach is in better
shape. (I'll send you a copy of what I got here anyway because it's a bit
more documented and you might want to merge it).

> As you can see from the patch all it does is implement the end and noop ops.
> Everything else is being called. Interestingly, JITing like this is slower
> than computed goto:

Yes, function calls are generally slower than computing a goto.

Of course I didn't even notice all those problems; I'm using TD's
ARM.

>
> I'll start writing some real JIT ops over the next few days, although
> possibly only for the ops mops and life use :-)

Yay! The ARM will be the first one with string opcodes jitted; I'm
looking forward to seeing if we get a good speed up.

> [although I strongly suspect that JITting the ops the regexps compile down
> to would be the first real world JIT priority. How fast would perl6 regexps
> be with that?]

Yes, that should be one of the priorities.

Daniel Grunblatt

Jul 29, 2002, 10:04:58 PM
to Nicholas Clark, perl6-i...@perl.org
One thing I forgot to mention is that I have also added a constant pool,
which could be useful for the ARM too. It's on my local tree; I don't know
exactly when I'm going to finish it.

Daniel Grunblatt.

Dan Sugalski

Jul 30, 2002, 1:21:30 PM
to Daniel Grunblatt, Nicholas Clark, perl6-i...@perl.org
At 10:34 PM -0300 7/29/02, Daniel Grunblatt wrote:
>On Mon, 29 Jul 2002, Nicholas Clark wrote:
> > As you can see from the patch all it does is implement the end
>and noop ops.
>> Everything else is being called. Interestingly, JITing like this is slower
>> than computed goto:
>
>Yes, function calls are generally slower than computing a goto.

Yup. There's the function preamble and postamble that get executed,
which can slow things down relative to computed goto, which doesn't
have to execute them.

This brings up an interesting point. Should we consider making at
least some of the smaller utility functions JITtable? Not the opcode
functions, but things in string.c or pmc.c perhaps. (Or maybe getting
them inlined would be sufficient for us)
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Daniel Grunblatt

Jul 30, 2002, 2:14:48 PM
to Dan Sugalski, Nicholas Clark, perl6-i...@perl.org
Yes, we can do that; we can also try to go in and out of the computed
goto core if it's available.

Daniel Grunblatt.

Nicholas Clark

Jul 30, 2002, 5:20:51 PM
to Daniel Grunblatt, perl6-i...@perl.org
On Mon, Jul 29, 2002 at 10:34:00PM -0300, Daniel Grunblatt wrote:
> On Mon, 29 Jul 2002, Nicholas Clark wrote:
>
> > Here's a very minimal ARM jit framework. It does work (at least as far as
> > passing all 10 t/op/basic.t subtests, and running mops.pbc)
>
> Cool, I have also been playing with ARM but your approach is in better
> shape. (I'll send you a copy of what I got here anyway because it's bit
> more documented and you might want to merge it).

It's very well documented, and I did merge it. Thanks. Expect to recognise
large chunks of it in a day or two when I get to a suitable point to submit
a better patch.

> Yes, function calls are generally slower than computing a goto.

> > 7: Debian define the archname on their perl as "arm", whereas building from
> > the source tree gets me armv4l (from uname) hence the substitution for
> > armv[34]l? down to arm. I do have a machine with an ARM3 here (which I
> > think would be armv2) but it is 14 years old, and doesn't currently have
> > Linux on it (or a compiler for RISC OS, and I'm not feeling up to
> > attempting a RISC OS port for parrot just to experiment with JITs)
> > It's probably quite feasible to make the JIT work on everything back to
> > the ARM2 (ARM1 was the prototype and I believe was never used in any
> > hardware available outside Acorn, and IIRC all ARM1 doesn't have is the
> > multiply instruction, so it could be done)
>
> Ofcourse I didn't even noticed about all those problem, I'm using TD's
> ARM.

Well, I didn't notice it the first time I worked on the JIT. I found that
/usr/bin/perl decided I was on an "arm", and /usr/local/bin/perl decided
"armv4l" (version 4 instructions, plus long multiply).

> > I'll start writing some real JIT ops over the next few days, although
> > possibly only for the ops mops and life use :-)
>
> Yay!, the ARM will be the first one with string opcodes jitted, I'm
> looking forward to see if we get good speed up.

Er, because I'm going to be writing the string opcodes? :-)

> > Oh, and prepare an acceptable version of this patch once people decide what
> > is acceptable

Hint. Please. (Dan?)

Useful. I suspect I can live without it, with the temporary pain of extra
branches round inlined constants.

Nicholas Clark

Jul 30, 2002, 6:10:08 PM
to perl6-i...@perl.org, Daniel Grunblatt, Dan Sugalski
On Mon, Jul 29, 2002 at 09:47:53PM +0000, Angel Faus wrote:
> I've made a patch for the regex engine, designed with the single goal
> of seriously cheating for speed. :-)

And what would be unperlish about this "cheating" concept? :-)

> Anyway, this patch has brought me the personal conviction that parrot
> regexes can be as fast or faster than their perl equivalents, if we
> put a bit of effort on optimitzation.

This sounds very promising. Thanks for doing this.

On Mon, Jul 29, 2002 at 10:34:00PM -0300, Daniel Grunblatt wrote:

On Tue, Jul 30, 2002 at 01:21:30PM -0400, Dan Sugalski wrote:
> At 10:34 PM -0300 7/29/02, Daniel Grunblatt wrote:

> >On Mon, 29 Jul 2002, Nicholas Clark wrote:
> > > As you can see from the patch all it does is implement the end
> >and noop ops.
> >> Everything else is being called. Interestingly, JITing like this is
> >> slower
> >> than computed goto:
> >
> >Yes, function calls are generally slower than computing a goto.
>

> Yup. There's the function preamble and postamble that get executed,
> which can slow things down relative to computed goto, which doesn't
> have to execute them.
>
> This brings up an interesting point. Should we consider making at
> least some of the smaller utility functions JITtable? Not the opcode
> functions, but things in string.c or pmc.c perhaps. (Or maybe getting
> them inlined would be sufficient for us)

I'm not sure. Effectively we'd need to define them well enough that we
can support n parallel implementations - 1 in C for "everyone else", and
1 per JIT architecture.

On Tue, Jul 30, 2002 at 03:14:48PM -0300, Daniel Grunblatt wrote:
> Yes, we can do that, we can also try to go in and out from the computed
> goto core if available.

This sounds rather scary to me.


I've managed to delete a message (I think by Simon Cozens) which said that
perl5 used to have a good speed advantage over the competition, but they'd
all been adding optimisations to their regexp engines (that we had already)
and now they're as fast.


My thoughts are:

If on a particular platform we don't have a JIT implementation for a
particular op then we have to JIT a call to that op's C implementation.
Do enough of these (what proportion?) and the computed goto core is faster
than the JIT.
I think (I could look at the source code, but that would break a fine Usenet
tradition) that for ops not in the computed goto core we have to make a call
from either the JIT or the computed goto core, so there's no speed difference
on them.

*BUG* Also, we are currently failing to distinguish the size of INTVALs
and NUMVALs in our JIT names (so this makes 4 possible JIT variants on most
CPUs)

IIRC there are currently >500 ops in the core, which makes it unlikely that
we'll have sufficient people to JIT most ops on most CPUs. So we have the
danger of the JIT being slower in the general case.

IIRC the intent was that regexps would compile to regular parrot subs.
I would assume (please correct me) that the regexp engine is only going to
use a small subset of parrot's ops in the subroutines it generates.

Therefore, can we make it so that on a JIT-enabled platform the JIT can be
run optionally over individual subs, replacing them with JITted versions?
That way, perl could compile its regexps to a sub of opcodes, and then we can
JIT that sub into native code. We'd only have to ensure that the majority
of regexp related ops (ie the rx and core ops used) end up faster than the
computed goto core. (Rather than the majority of all the ops commonly used)

[We might want to make it a pragmatic hint flag on the regexp to say "please
try harder to optimise this regexp", with one form of "harder" being the JIT.
Also, we might like to provide an attribute to hint that a particular sub
would like to be optimised more]

Hopefully regexps run as native code would give perl6 a stonking great
advantage in the regexp speed arms race :-)

Nicholas Clark

Jul 31, 2002, 7:01:39 PM
to Daniel Grunblatt, perl6-i...@perl.org
On Tue, Jul 30, 2002 at 10:20:51PM +0100, Nicholas Clark wrote:
> On Mon, Jul 29, 2002 at 10:34:00PM -0300, Daniel Grunblatt wrote:
> > On Mon, 29 Jul 2002, Nicholas Clark wrote:
> >
> > > Here's a very minimal ARM jit framework. It does work (at least as far as
> > > passing all 10 t/op/basic.t subtests, and running mops.pbc)
> >
> > Cool, I have also been playing with ARM but your approach is in better
> > shape. (I'll send you a copy of what I got here anyway because it's bit
> > more documented and you might want to merge it).
>
> It's very documented, and I did merge it. Thanks. Expect to recognise large
> chunks of it in a day or two when I get to a suitable point to submit a
> better patch.

Here goes. This *isn't* functional - it's the least amount of work I could
get away with (before midnight) that gets the inner loop of mops.pasm JITted.

including this judicious bit of cheating:

--- ./examples/assembly/mops.pasm~ Wed Jan 2 13:48:40 2002
+++ ./examples/assembly/mops.pasm Wed Jul 31 23:29:46 2002
@@ -36,7 +36,8 @@ DONE: time N5
print N2
print "\n"

- if I4, BUG
+ set N1, I4
+ if N1, BUG

set N1, I5
div N1, N1, N2

because I need if_i_ic JITted but mops only needs backwards jumps (which are
easy, and don't require me to write the fixup routine yet)
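
(For anyone following along, "easy" just means the offset arithmetic: ARM's B
instruction takes a signed 24 bit word offset relative to the address of the
branch plus 8, because the PC reads two instructions ahead. A sketch, with a
made-up helper name:)

    /* Sketch: the immediate for a B/BL instruction from here to a target we
       have already emitted, i.e. a backwards jump whose offset is known. */
    static int
    branch_imm24(char *branch_addr, char *target_addr)
    {
        int byte_offset = target_addr - branch_addr; /* negative for backwards */
        return (byte_offset >> 2) - 2;  /* in words, minus 2 words for PC+8 */
    }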

$ ./parrot examples/assembly/mops.pbc
Iterations: 100000000
Estimated ops: 200000000

Elapsed time: 37.217333
M op/s: 5.373840

$ ./parrot -j examples/assembly/mops.pbc
Iterations: 100000000
Estimated ops: 200000000

Elapsed time: 4.963865
M op/s: 40.291184


$ ./parrot -j examples/assembly/mops.pbc
Iterations: 100000000
Estimated ops: 200000000

Elapsed time: 5.018550
M op/s: 39.852148


$ ./parrot -j examples/assembly/mops.pbc
Iterations: 100000000
Estimated ops: 200000000

Elapsed time: 5.002693
M op/s: 39.978467

This is about right. 202MHz chip, 4 instructions, 5 cycles for the sub_i_i_i,
3 instructions, 5 cycles for the if_i_ic (including the pipeline stall for
the branch)

TODO, from memory

1: fixups, to allow forward branches
2: code to emit MUL and MLA instructions
3: load integer constants
(which in turn means writing a routine to work out which values can be
expressed as (8 bit val) rotated by (2 * n), which others can be
represented by the logical not of that, and what to do with the rest.
This isn't that hard - there's a rough sketch of the check below)
4: remove the compiler warnings
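
Rough sketch of the check item 3 asks for, in plain C (name invented, and it
assumes a 32 bit unsigned int): an ARM data processing immediate is an 8 bit
value rotated right by twice a 4 bit count, so just rotate the candidate left
by 2 up to 16 times and see if it ever fits in 8 bits.

    /* Can value be encoded directly as an operand2 immediate? */
    static int
    fits_as_immediate(unsigned int value)
    {
        int rotation;
        for (rotation = 0; rotation < 16; rotation++) {
            if ((value & ~0xFFu) == 0)
                return 1;   /* 8 bit value, rotated right by 2 * rotation */
            /* rotate left by 2 to line up the next candidate window */
            value = (value << 2) | (value >> 30);
        }
        return 0;   /* try ~value or -value, or fall back to a constant pool */
    }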

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/

--- /dev/null Mon Jul 16 22:57:44 2001
+++ jit/arm/core.jit Wed Jul 31 23:35:52 2002
@@ -0,0 +1,71 @@


+;
+; arm/core.jit
+;
+; $Id: core.jit,v 1.4 2002/05/20 05:32:58 grunblatt Exp $
+;
+
+Parrot_noop {

+ jit_info->native_ptr = emit_nop(jit_info->native_ptr);


+}
+
+; ldmea fp, {r4, r5, r6, r7, fp, sp, pc}
+; but K bug Grr if I load pc direct.
+
+Parrot_end {

+ #ifndef ARM_K_BUG


+ jit_info->native_ptr = emit_ldmstm (jit_info->native_ptr,

+ cond_AL, is_load, dir_EA, no_writeback,


+ REG11_fp,
+ reg2mask(4) | reg2mask(REG11_fp)
+ | reg2mask(REG13_sp)

+ | reg2mask(REG15_pc));
+ #else


+ jit_info->native_ptr = emit_ldmstm (jit_info->native_ptr,

+ cond_AL, is_load, dir_EA, no_writeback,


+ REG11_fp,
+ reg2mask(4) | reg2mask(REG11_fp)
+ | reg2mask(REG13_sp)

+ | reg2mask(REG14_lr));
+ jit_info->native_ptr = emit_mov(jit_info->native_ptr, REG15_pc, REG14_lr);
+ #endif
+}
+
+; This shows why it would be nice in the future to have a way to have ops
+; broken into 1 to 3 of:
+;
+; -1) get values from parrot registers into CPU registers
+; 0) do stuff
+; +1) write values back to parrot registers
+;
+; that way, a JIT optimiser could punt -1 and +1 outside loops leaving
+; intermediate values in CPU registers. It could collate -1 and +1 [leaving
+; nothing :-)] and choose how to maximise use of as many real CPU registers as
+; possible.
+
+Parrot_set_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, 2, r0);
+ Parrot_jit_int_store(jit_info, interpreter, 1, r0);
+}
+
+Parrot_add_i_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, 3, r1);
+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,
+ ADD, 0, r2, r0, r1);
+ Parrot_jit_int_store(jit_info, interpreter, 1, r2);
+}
+
+Parrot_sub_i_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, 3, r1);
+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,
+ SUB, 0, r2, r0, r1);
+ Parrot_jit_int_store(jit_info, interpreter, 1, r2);
+}
+
+Parrot_if_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, 1, r0);
+ jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_AL,
+ CMP, 0, 0, r0, 0, 0);
+ emit_jump_to_op (jit_info, cond_NE, 0, *INT_CONST[2]);


+}
--- /dev/null Mon Jul 16 22:57:44 2001

+++ jit/arm/jit_emit.h Wed Jul 31 23:42:52 2002
@@ -0,0 +1,653 @@
+/*
+ * jit_emit.h
+ *
+ * ARM (I think this is all ARM2 or later, although it is APCS-32)
+ *
+ * $Id: $
+ */
+
+/* Registers
+ *
+ * r0 Argument/result/scratch register 0.
+ * r1 Argument/result/scratch register 1.
+ * r2 Argument/result/scratch register 2.
+ * r3 Argument/result/scratch register 3.
+ * r4 Variable register 1.
+ * r5 Variable register 2.
+ * r6 Variable register 3.
+ * r7 Variable register 4.
+ * r8 Variable register 5.
+ * r9 ARM State variable register 6. Static Base in PID, re-entrant
+ * shared-library variants.
+ * r10 ARM State variable register 7. Stack limit pointer in stack-checked
+ * variants.
+ * r11 ARM State variable register 8. ARM state frame pointer.
+ * r12 The Intra-Procedure call scratch register.
+ * r13 The Stack Pointer.
+ * r14 The Link Register.
+ * r15 The Program Counter.
+ *
+ * r0-r3 are used to pass in first 4 arguments, and are not preserved by a
+ * function. Results (that would fit) are returned in r0
+ * Other registers are preserved across calls, although (by implication) r14
+ * and r15 are used by the call process. I don't think that it is mandated
+ * that r14 on return must hold the link address.
+ * r12 (ip) is only used on subroutine entry for stack frame calculations -
+ * after then it is a useful scratch register. If you push r14 you get
+ * another scratch register quickly.
+ *
+ * Most things nowadays are StrongARM or later. StrongARM is v4 of the
+ * architecture. ARM6 and ARM7 cores are v3, which introduced the 32 bit
+ * address bus. Earlier cores (which you won't encounter) used a 26 bit address
+ * bus, with program counter and status register combined in r15
+ */
+
+typedef enum {
+ r0,
+ r1,
+ r2,
+ r3,
+ r4,
+ r5,
+ r6,
+ r7,
+ r8,
+ r9,
+ r10,
+ r11,
+ r12,
+ r13,
+ r14,
+ r15,
+ REG10_sl = 10, REG11_fp = 11,


+ REG12_ip = 12,
+ REG13_sp = 13,
+ REG14_lr = 14,
+ REG15_pc = 15
+

+} arm_register_t;


+
+typedef enum {
+ cond_EQ = 0x00,
+ cond_NE = 0x10,
+ cond_CS = 0x20,
+ cond_CC = 0x30,
+ cond_MI = 0x40,
+ cond_PL = 0x50,
+ cond_VS = 0x60,
+ cond_VC = 0x70,
+ cond_HI = 0x80,
+ cond_LS = 0x90,
+ cond_GE = 0xA0,
+ cond_LT = 0xB0,
+ cond_GT = 0xC0,
+ cond_LE = 0xD0,
+ cond_AL = 0xE0,
+/* cond_NV = 0xF0, */

+ /* synonyms for CS and CC: */


+ cond_HS = 0x20,
+ cond_LO = 0x30

+} arm_cond_t;
+
+/* I've deliberately shifted these right by 1 bit so that I can forcibly
+ set the status flag on ops such as CMP. It's easy to forget (the assembler
+ doesn't mandate you explicitly write CMPS, it just sets the bit for you).
+ I got an illegal instruction trap on a StrongARM for a CMP without S, but
+ I think some of the other comparison operators have legal weird effects
+ with no S flag. */
+typedef enum {
+ AND = 0x00,
+ EOR = 0x02,
+ SUB = 0x04, /* Subtract rd = rn - op2 */
+ RSB = 0x06, /* Reverse SUbtract rd = op2 - rn ; op2 is more flexible. */
+ ADD = 0x08,
+ ADC = 0x0A, /* ADd with Carry. */
+ SBC = 0x0C, /* SuBtract with Carry. */
+ RSC = 0x0E, /* Reverse Subtract with Carry. */
+ TST = 0x11, /* TeST rn AND op2 (sets flags). */
+ TEQ = 0x13, /* Test EQuivalence rn XOR op2 (won't set V flag). */
+ CMP = 0x15, /* CoMPare rn - op2 */
+ CMN = 0x17, /* CoMpare Negated rn + op2 */
+ ORR = 0x18,
+ MOV = 0x1A, /* MOVe rd = op2 */
+ BIC = 0x1C, /* BIt Clear rd = rn AND NOT op2 */
+ MVN = 0x1E /* MoV Not rd = NOT op2 */
+} alu_op_t;
+
+/* note MVN is move NOT (ie logical NOT, 1s complement), whereas
+ CMN is compare NEGATIVE (ie arithmetic NEGATION, 2s complement) */
+
+#define arith_sets_S 0x10
+
+#define INTERP_STRUCT_ADDR_REG r4
+
+/* B / BL
+ *
+ * +--------------------------------------------------------------------+
+ * | cond | 1 0 1 | L | signed_immed_24 |
+ * +--------------------------------------------------------------------+
+ * 31 28 27 25 24 23 0
+ *
+ *
+ * The L bit
+ *
+ * If L == 1 the instruction will store a return address in the link
+ * register (R14). Otherwise L == 0, the instruction will simply branch without
+ * storing a return address.
+ *
+ * The target address
+ *
+ * Specifies the address to branch to. The branch target address is calculated
+ * by:
+ *
+ * - Sign-extending the 24-bit signed (two's complement) immediate to 32 bits.
+ *
+ * - Shifting the result left two bits.
+ *
+ * - Adding this to the contents of the PC, which contains the address of the
+ * branch instruction plus 8.
+ *
+ * The instruction can therefore specify a branch of approximately ±32MB.
+ *
+ * [Not the full 32 bit address range of the v3 and later cores.]
+ */
+
+/* IIRC bx is branch into thumb mode, so don't name this back to bx */
+
+char *emit_branch(char *pc,
+ arm_cond_t cond,
+ int L,
+ int imm) {
+ *(pc++) = imm;
+ *(pc++) = ((imm) >> 8);
+ *(pc++) = ((imm) >> 16);
+ *(pc++) = cond | 0xA | L;


+ return pc;
+}
+

+#define emit_b(pc, cond, imm) \
+ emit_branch(pc, cond, 0, imm)
+
+#define emit_bl(pc, cond, imm) \
+ emit_branch(pc, cond, 1, imm)
+


+
+#define reg2mask(reg) (1<<(reg))
+

+typedef enum {
+ is_store = 0x00,
+ is_load = 0x10,
+ is_writeback = 0x20,
+ no_writeback = 0,
+ is_caret = 0x40, /* assembler syntax is ^ - load sets status flags in


+ USR mode, or load/store use user bank registers
+ in other mode. IIRC. */

+ no_caret = 0,
+ is_byte = 0x40,
+ no_byte = 0, /* It's a B suffix for a byte load, no suffix for
+ word load, so this is more natural than is_word */
+ is_pre = 0x01, /* pre index addressing. */
+ is_post = 0x00 /* post indexed addressing. ie arithmetic for free */
+} transfer_flags;

+emit_ldmstm_x(char *pc,
+ arm_cond_t cond,


+ int l_s,
+ ldm_stm_dir_t direction,
+ int caret,
+ int writeback,
+ int base,
+ int regmask) {
+ if ((l_s == is_load) && (direction & 0x10))
+ direction >>= 2;
+
+ *(pc++) = regmask;
+ *(pc++) = regmask >> 8;
+ /* bottom bit of direction is the up/down flag. */
+ *(pc++) = ((direction & 1) << 7) | caret | writeback | l_s | base;
+ /* binary 100x is code for stm/ldm. */
+ /* Top bit of direction is pre/post increment flag. */
+ *(pc++) = cond | 0x8 | ((direction >> 1) & 1);
+ return pc;
+}
+

+/* It is going to be rare to non-existent that anyone needs to use the ^
+ syntax on LDM or STM, so make it easy to generate the normal form: */
+#define emit_ldmstm(pc, cond, l_s, direction, writeback, base, regmask) \
+ emit_ldmstm_x(pc, cond, l_s, direction, 0, writeback, base, regmask)
+
+/* Load / Store
+ *
+ * +--------------------------------------------------------------------+
+ * | cond | 0 1 | I | P | U | B | W | L | Rn | Rd | offset |
+ * +--------------------------------------------------------------------+
+ * 31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+ *
+ *
+ * The P bit
+ *
+ * P == 0 indicates the use of post-indexed addressing. The base register value
+ * is used for the memory address, and the offset is then applied to the
+ * base register and written back to the base register.
+ *
+ * P == 1 indicates the use of offset addressing or pre-indexed addressing (the
+ * W bit determines which). The memory address is generated by applying
+ * the offset to the base register value.
+ *
+ * The U bit
+ *
+ * Indicates whether the offset is added to the base (U == 1) or is subtracted
+ * from the base (U == 0).
+ *
+ * The B bit
+ *
+ * Distinguishes between an unsigned byte (B == 1) and a word (B == 0) access.
+ *
+ * The W bit
+ *
+ * P == 0 If W == 0, the instruction is LDR, LDRB, STR or STRB and a normal
+ * memory access is performed. If W == 1, the instruction is LDRBT,
+ * LDRT, STRBT or STRT and an unprivileged (User mode) memory access is
+ * performed.
+ *
+ * P == 1 If W == 0, the base register is not updated (offset addressing). If
+ * W == 1, the calculated memory address is written back to the base
+ * register (pre-indexed addressing).
+ *
+ * The L bit
+ *
+ * Distinguishes between a Load (L == 1) and a Store (L == 0).
+ *
+ * <Rd> is the destination register.
+ * <Rn> is the base register.
+ *
+ * XXX need to detail addr mode, for I = 0 and I = 1
+ *
+ * Note that you can take advantage of post indexed addressing to get a free
+ * add onto the base register if you need it for some other purpose.
+ *
+ * Note that on StrongARM [and later? but not XScale :-(] if you don't use Rd
+ * next instruction then a load doesn't stall if it is from the cache (ie
+ * 1 cycle loads). You will want to re-order things where possible to take
+ * advantage of this.
+ *
+ * ARM1 had register shift register as possibilities for the offset (as the
+ * ALU ops still do. These took 1 more cycle, and were taken out as virtually
+ * no use was found for them. However, the bit patterns they represent
+ * certainly didn't used to fault as an illegal instruction on ARM2s, and
+ * probably later. So beware of generating illegal bit pattern offsets, as
+ * you'll get silent undefined behaviour.
+ */


+
+char *
+emit_ldrstr(char *pc,

+ arm_cond_t cond,


+ int l_s,
+ ldr_str_dir_t direction,
+ int pre,
+ int writeback,
+ int byte,
+ int dest,
+ int base,
+ int offset_type,
+ unsigned int offset) {
+
+ *(pc++) = offset;
+ *(pc++) = ((offset >> 8) & 0xF) | (dest << 4);
+ *(pc++) = direction | byte | writeback | l_s | base;
+ *(pc++) = cond | 0x4 | offset_type | pre;
+ return pc;
+}
+
+char *
+emit_ldrstr_offset (char *pc,

+ arm_cond_t cond,


+ int l_s,
+ int pre,
+ int writeback,
+ int byte,
+ int dest,
+ int base,
+ int offset) {
+ ldr_str_dir_t direction = dir_Up;

+ if (offset > 4095 || offset < -4095) {
+ internal_exception(JIT_ERROR,

+ "Unable to generate offset %d, larger than 4095\n",
+ offset);
+ }


+ if (offset < 0) {
+ direction = dir_Down;
+ offset = -offset;
+ }
+ return emit_ldrstr(pc, cond, l_s, direction, pre, writeback, byte, dest,
+ base, 0, offset);
+}
+

+/* Arithmetic
+ *
+ * +--------------------------------------------------------------------+
+ * | cond | 0 0 | I | ALU Opcode | S | Rn | Rd | shifted operand |
+ * +--------------------------------------------------------------------+
+ * 31 28 27 26 25 24 23 22 21 20 19 16 15 12 11 0
+ *
+ *
+ * The S bit
+ *
+ * Indicates if the CPSR will be updated (S == 1) or not (S == 0).
+ *
+ * Two types of CPSR updates can occur:
+ *
+ * - If <Rd> is not R15, the N and Z flags are set according to the result of
+ * of the addition, and C and V flags are set according to whether the
+ * addition generated a carry (unsigned overflow) and a signed overflow,
+ * respectively. The rest of the CPSR is unchanged.
+ *
+ * XXX shifted immediate values for the second operand can also set the C
+ * flag (and therefore presumably also clear it) when the S flag is set for
+ * certain ALU ops. (I think just the logical ops) This is obscure, but
+ * sometimes useful. No idea where this is documented.
+ *
+ * - If <Rd> is R15, the SPSR of the current mode is copied to the CPSR. This
+ * form of the instruction is UNPREDICTABLE if executed in User mode or
+ * System mode, because these do not have an SPSR.
+ *
+ *
+ */
+
+typedef enum {
+ shift_LSL = 0x00,
+ shift_LSR = 0x20,
+ shift_ASR = 0x40,
+ shift_ROR = 0x60,
+ shift_ASL = 0x00 /* Synonym - no sign extension (or not) on << */
+} barrel_shift_t;
+/* RRX (rotate right with extend - a 1 position 33 bit rotate including the
+ carry flag) is encoded as ROR by 0. */
+
+char *
+emit_arith(char *pc,
+ arm_cond_t cond,
+ alu_op_t op,
+ int status,
+ arm_register_t rd,
+ arm_register_t rn,
+ int operand2_type,
+ int operand2) {
+ *(pc++) = operand2;
+ *(pc++) = rd << 4 | ((operand2 >> 8) & 0xF);
+ *(pc++) = op << 4 | status | rn;
+ *(pc++) = cond | 0 | operand2_type | op >> 4;


+ return pc;
+}
+

+/* eg add r0, r3, r7 */
+#define emit_arith_reg(pc, cond, op, status, rd, rn, rm) \
+ emit_arith (pc, cond, op, status, rd, rn, 0, rm)
+
+/* eg sub r0, r3, r7 lsr #3 */
+#define emit_arith_reg_shift_const(pc, cond, op, status, rd, rn, rm, shift, by) \
+ emit_arith (pc, cond, op, status, rd, rn, 0, ((by) << 7) | shift | 0 | (rm))
+
+/* eg orrs r1, r2, r1 rrx */
+#define emit_arith_reg_rrx(pc, cond, op, status, rd, rn, rm) \
+ emit_arith (pc, cond, op, status, rd, rn, 0, shift_ROR | 0 | (rm))
+
+/* I believe these take 2 cycles (due to having to access a 4th register). */
+#define emit_arith_reg_shift_reg(pc, cond, op, status, rd, rn, rm, shift, rs) \
+ emit_arith (pc, cond, op, status, rd, rn, 0, ((rs) << 8) | shift | 0x10 | (rm))
+
+#define emit_arith_immediate(pc, cond, op, status, rd, rn, val, rotate) \
+ emit_arith (pc, cond, op, status, rd, rn, 2, ((rotate) << 8) | (val))


+
+/* I'll use mov r0, r0 as my NOP for now. */

+#define emit_nop(pc) emit_mov (pc, r0, r0)
+
+/* MOV ignores rn */
+#define emit_mov(pc, dest, src) emit_arith_reg (pc, cond_AL, MOV, 0, dest, 0, src)


+
+#define emit_dcd(pc, word) { \
+ *((int *)pc) = word; \
+ pc+=4; }
+

+static void Parrot_jit_int_load(Parrot_jit_info *jit_info,
+ struct Parrot_Interp *interpreter,
+ int param,
+ int hwreg)
+{
+ opcode_t op_type
+ = interpreter->op_info_table[*jit_info->cur_op].types[param];
+ int val = jit_info->cur_op[param];
+ int offset;
+
+ switch(op_type){
+ case PARROT_ARG_I:
+ offset = ((char *)&interpreter->ctx.int_reg.registers[val])
+ - (char *)interpreter;
+ if (offset > 4095) {
+ internal_exception(JIT_ERROR,
+ "integer load register %d generates offset %d, larger than 4095\n",
+ val, offset);
+ }


+ jit_info->native_ptr = emit_ldrstr_offset (jit_info->native_ptr,

+ cond_AL,
+ is_load,
+ is_pre,
+ 0, 0,
+ hwreg,
+ INTERP_STRUCT_ADDR_REG,
+ offset);
+ break;
+
+ case PARROT_ARG_IC:
+ default:
+ internal_exception(JIT_ERROR,
+ "Unsupported op parameter type %d in jit_int_load\n",
+ op_type);
+ }
+}
+
+static void Parrot_jit_int_store(Parrot_jit_info *jit_info,
+ struct Parrot_Interp *interpreter,
+ int param,
+ int hwreg)
+{
+ opcode_t op_type
+ = interpreter->op_info_table[*jit_info->cur_op].types[param];
+ int val = jit_info->cur_op[param];
+ int offset;
+
+ switch(op_type){
+ case PARROT_ARG_I:
+ offset = ((char *)&interpreter->ctx.int_reg.registers[val])
+ - (char *)interpreter;
+ if (offset > 4095) {
+ internal_exception(JIT_ERROR,
+ "integer store register %d generates offset %d, larger than 4095\n",
+ val, offset);
+ }


+ jit_info->native_ptr = emit_ldrstr_offset (jit_info->native_ptr,

+ cond_AL,
+ is_store,
+ is_pre,
+ 0, 0,
+ hwreg,
+ INTERP_STRUCT_ADDR_REG,
+ offset);
+ break;
+
+ case PARROT_ARG_N:
+ default:
+ internal_exception(JIT_ERROR,
+ "Unsupported op parameter type %d in jit_int_store\n",
+ op_type);
+ }
+}
+
+static void emit_jump_to_op(Parrot_jit_info *jit_info, arm_cond_t cond,
+ int L, opcode_t disp) {
+ opcode_t opcode = jit_info->op_i + disp;
+
+ if(opcode <= jit_info->op_i) {
+ int offset = jit_info->op_map[opcode].offset -
+ (jit_info->native_ptr - jit_info->arena_start);
+
+ jit_info->native_ptr
+ = emit_branch(jit_info->native_ptr, cond, L, (offset >> 2) - 2);
+ return;
+ }
+ internal_exception(JIT_ERROR, "Can't go forward yet\n");
+

+}
+
+void Parrot_jit_dofixup(Parrot_jit_info *jit_info,
+ struct Parrot_Interp * interpreter)
+{
+ /* Todo. */
+}
+/* My entry code is create a stack frame:
+ mov ip, sp
+ stmfd sp!, {r4, fp, ip, lr, pc}
+ sub fp, ip, #4
+ Then store the first parameter (pointer to the interpreter) in r4.
+ mov r4, r0
+*/
+
+void
+Parrot_jit_begin(Parrot_jit_info *jit_info,
+ struct Parrot_Interp * interpreter)
+{

+ jit_info->native_ptr = emit_mov (jit_info->native_ptr, REG12_ip, REG13_sp);


+ jit_info->native_ptr = emit_ldmstm (jit_info->native_ptr,
+ cond_AL, is_store, dir_FD,

+ is_writeback,
+ REG13_sp,
+ reg2mask(4) | reg2mask(REG11_fp)
+ | reg2mask(REG12_ip)
+ | reg2mask(REG14_lr)
+ | reg2mask(REG15_pc));

+ jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_AL,
+ SUB, 0, REG11_fp, REG12_ip,
+ 4, 0);
+ jit_info->native_ptr = emit_mov (jit_info->native_ptr, 4, 0);

+ jit_info->native_ptr = emit_mov (jit_info->native_ptr, r1, r4);
+#ifndef ARM_K_BUG
+ jit_info->native_ptr = emit_mov (jit_info->native_ptr, REG14_lr, REG15_pc);


+ jit_info->native_ptr = emit_ldmstm (jit_info->native_ptr,
+ cond_AL, is_load, dir_IA,

+ is_writeback,
+ REG14_lr,
+ reg2mask(0) | reg2mask(REG15_pc));
+#else
+ jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_AL,
+ ADD, 0, REG14_lr, REG15_pc,
+ 4, 0);


+ jit_info->native_ptr = emit_ldmstm (jit_info->native_ptr,
+ cond_AL, is_load, dir_IA,

+ is_writeback,
+ REG14_lr,
+ reg2mask(0) | reg2mask(REG12_ip));
+ jit_info->native_ptr = emit_mov (jit_info->native_ptr, REG15_pc, REG12_ip);

--- ../parrot-clean/jit.c Tue Jul 23 19:18:41 2002
+++ jit.c Tue Jul 30 23:32:06 2002
@@ -7,6 +7,12 @@
#include <parrot/parrot.h>
#include "parrot/jit.h"

+#ifdef ARM
+#ifdef __linux
+#include <asm/unistd.h>
+#endif
+#endif
+
/*
** optimize_jit()
** XXX Don't pay much attention to this yet.
@@ -128,6 +134,63 @@ optimize_jit(struct Parrot_Interp *inter

+ "swi " __sys1(__ARM_NR_cacheflush) "\n"


+ "mov %0, r0\n"
+ : "=r" (result)
+ : "r" ((long)start), "r" ((long)end)
+ : "r0","r1","r2");
+
+ if (result < 0) {
+ internal_exception(JIT_ERROR,
+ "Synchronising I and D caches failed with errno=%d\n",
+ -result);
+ }
+#else
+#error "ARM needs to sync D and I caches, and I don't know how to embed assembler on this C compiler"
+#endif
+#else
+/* Not strictly true - on RISC OS it's OS_SynchroniseCodeAreas */
+#error "ARM needs to sync D and I caches, and I don't know how to on this OS"
+#endif
+}
+#endif

/*
** build_asm()

@@ -214,6 +277,9 @@ build_asm(struct Parrot_Interp *interpre

Daniel Grunblatt

Aug 2, 2002, 12:06:27 AM
to Nicholas Clark, perl6-i...@perl.org

On Thu, 1 Aug 2002, Nicholas Clark wrote:

> Here goes. This *isn't* functional - it's the least amount of work I could
> get away with (before midnight) that gets the inner loop of mops.pasm JITted.
>

Applied, many many thanks.

> including this judicious bit of cheating:

> because I need if_i_ic JITted but mops only needs backwards jumps (which are
> easy, and don't require me to write the fixup routine yet)

We can wait, no problem.

Daniel Grunblatt.

Daniel Grunblatt

Aug 2, 2002, 12:25:13 AM
to Nicholas Clark, perl6-i...@perl.org, Dan Sugalski

On Tue, 30 Jul 2002, Nicholas Clark wrote:

> On Tue, Jul 30, 2002 at 01:21:30PM -0400, Dan Sugalski wrote:
> > At 10:34 PM -0300 7/29/02, Daniel Grunblatt wrote:
> > >On Mon, 29 Jul 2002, Nicholas Clark wrote:
> > > > As you can see from the patch all it does is implement the end
> > >and noop ops.
> > >> Everything else is being called. Interestingly, JITing like this is
> > >> slower
> > >> than computed goto:
> > >
> > >Yes, function calls are generally slower than computing a goto.
> >
> > Yup. There's the function preamble and postamble that get executed,
> > which can slow things down relative to computed goto, which doesn't
> > have to execute them.
> >
> > This brings up an interesting point. Should we consider making at
> > least some of the smaller utility functions JITtable? Not the opcode
> > functions, but things in string.c or pmc.c perhaps. (Or maybe getting
> > them inlined would be sufficient for us)
>
> I'm not sure. Effectively we'd need to define them well enough that we
> can support n parallel implementations - 1 in C for "everyone else", and
> 1 per JIT architecture.
>
> On Tue, Jul 30, 2002 at 03:14:48PM -0300, Daniel Grunblatt wrote:
> > Yes, we can do that, we can also try to go in and out from the computed
> > goto core if available.
>
> This sounds rather scary to me.

We can try and see, I'm not 100% sure if this will be faster.

>
>
> I've managed to delete a message (I think by Simon Cozens) which said
> that perl5 used to have a good speed advantage over the competition,
> but they'd all been adding optimisations to their regexp engines (that
> we had already)
> and now they're as fast.
>
>
> My thoughts are:
>
> If on a particular platform we don't have a JIT implementation for a
> particular op then we have to JIT a call to that op's C implementation.
> Do enough of these (what proportion?) and the computed goto core is faster
> than the JIT.
> I think (I could look at the source code, but that would break a fine Usenet
> tradition) that for ops not in the computed goto core we have to make a call
> from either JIT or computed goto core, so there's no speed difference on them.
>
> *BUG* Also, we are currently failing to distinguish the size of INTVALs
> and NUMVALs in our JIT names (so this makes 4 possible JIT variants on most
> CPUs)
>
> IIRC there are currently >500 ops in the core, which makes it unlikely that
> we'll have sufficient people to JIT most ops on most CPUs. So we have the
> danger of the JIT being slower in the general case.

Ok, I don't know if I should say this ... and I'm not saying I'm going to
do it, but there is a possibility to make an intermediate language.

Daniel Grunblatt.

Nicholas Clark

Aug 3, 2002, 5:40:01 PM
to Daniel Grunblatt, perl6-i...@perl.org

I wasn't actually expecting you to apply that :-)
It was more a "where I am at now" informational patch.

I think that this patch is at a good point to pause and take stock. I believe
it JITs just about every integer op (including some that i386 isn't JITting
yet). OK, it doesn't JIT the logical xor op, but that one scares me, and I'm
unsure how useful it is.

I've not done the floating point ops (or anything else) partly because I
don't have a good reference for the format of the floating point instructions.
[However, it's not that hard, as I have source code to both floating point
emulators supplied with ARM Linux, so I can see the decode code :-)]
But more because I'm not sure that it will give that much of a speed up.

I feel I've demonstrated to myself that it will be possible to generate all
forms of parrot ops without undue problems. However, I've hardly used any
registers in my ops so far (at most 3, but I only actually needed 2) when
there are up to 12 at my disposal. I've no real idea which will turn out to
be the most useful in real parrot programs, and hence where the effort in
JITting will get most reward, so I think it best to wait now and see what
is needed most.

My other thought is that with the current JIT architecture I'm loading
everything from RAM at the start of the ops (1 or 2 instructions), and saving
it back at the end (1 instruction), with only 1 or 2 instructions needed to
actually
do the work. With 10 registers spare, and 60% of my instructions shifting data
around like some job creation scheme for out of work electrons, I think that
it might be best to wait and see what we (*) learn from JITs on other
platforms, then use that to design a third generation JIT that is capable of
mapping parrot registers onto hardware CPU registers.

* Er, "we" is probably just Daniel as I confess I don't feel motivated to
attempt to learn other assembly language to write JITs for hardware I don't
own. Hey, all you Mac fans, where's the PPC JIT? <ducks>

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/

--- jit/arm/jit_emit.h.orig Fri Aug 2 04:25:05 2002
+++ jit/arm/jit_emit.h Sat Aug 3 18:39:12 2002
@@ -154,7 +154,8 @@ typedef enum {



/* IIRC bx is branch into thumb mode, so don't name this back to bx */

-char *emit_branch(char *pc,
+static char *
+emit_branch(char *pc,
arm_cond_t cond,
int L,
int imm) {
@@ -216,14 +217,16 @@ typedef enum {
dir_Down = 0x00
} ldr_str_dir_t;

-char *
+enum { JIT_ARMBRANCH };
+
+static char *
emit_ldmstm_x(char *pc,
arm_cond_t cond,
int l_s,
ldm_stm_dir_t direction,
int caret,
int writeback,
- int base,
+ arm_register_t base,
int regmask) {


if ((l_s == is_load) && (direction & 0x10))

direction >>= 2;
@@ -306,7 +309,7 @@ emit_ldmstm_x(char *pc,


* you'll get silent undefined behaviour.

*/

-char *
+static char *
emit_ldrstr(char *pc,
arm_cond_t cond,
int l_s,
@@ -314,8 +317,8 @@ emit_ldrstr(char *pc,
int pre,
int writeback,
int byte,
- int dest,
- int base,
+ arm_register_t dest,
+ arm_register_t base,
int offset_type,
unsigned int offset) {

@@ -326,15 +329,15 @@ emit_ldrstr(char *pc,
return pc;
}

-char *
+static char *
emit_ldrstr_offset (char *pc,
arm_cond_t cond,
int l_s,
int pre,
int writeback,
int byte,
- int dest,
- int base,
+ arm_register_t dest,
+ arm_register_t base,
int offset) {
ldr_str_dir_t direction = dir_Up;


if (offset > 4095 || offset < -4095) {

@@ -391,7 +394,7 @@ typedef enum {


/* RRX (rotate right with extend - a 1 position 33 bit rotate including the

carry flag) is encoded as ROR by 0. */

-char *
+static char *
emit_arith(char *pc,
arm_cond_t cond,
alu_op_t op,
@@ -407,6 +410,130 @@ emit_arith(char *pc,
return pc;
}

+static char *
+emit_mul(char *pc,
+ arm_cond_t cond,


+ int status,
+ arm_register_t rd,

+ arm_register_t rm,
+ arm_register_t rs) {
+ *(pc++) = 0x90 | rm;
+ *(pc++) = rs;
+ *(pc++) = status | rd;
+ *(pc++) = cond | 0;


+ return pc;
+}
+

+static char *
+emit_mla(char *pc,
+ arm_cond_t cond,


+ int status,
+ arm_register_t rd,

+ arm_register_t rm,
+ arm_register_t rs,
+ arm_register_t rn) {
+ *(pc++) = 0x90 | rm;
+ *(pc++) = rn << 4 | rs;
+ *(pc++) = 0x20 | status | rd;
+ *(pc++) = cond | 0;


+ return pc;
+}
+

+/* operand2 immediate constants are expressed as val rotated right by (2 * n),
+ where val is 8 bits, n is 4 bits. This uses the 12 bits available to
+ generate many useful common constants, far more than would be given by a
+ 12 bit number 0 - 0xFFF.
+ Often, you're trying to use the immediate constant in an operand that could
+ be replaced its complement. So if MOV rd, #const doesn't work,
+ MVN rn, #~const might. And ADD rd, rn, #const may be impossible, but
+ SUB rd, rn, #-const will. So allow the return struct to flag this.
+
+ I believe that the only case where a 32 bit value and its converse is
+ representable in 8 shift 4 is for 0 and -(0)
+ So it's perfectly valid to try the inverse every time. */
+
+
+enum constant_state {doesnt_fit, fits_as_not, fits_as_neg, fits_as_is};
+
+/* Deliberate 4th char to pad the struct and hence silence a warning. */
+struct constant {
+ unsigned char value;
+ unsigned char rotation;
+ unsigned char state;
+ unsigned char pad;
+};
+/* XXX Future work would be to try to find a fast compromise way of looking
+ for the double instruction constants. eg 0xFFF is 0xF00 + 0xFF, or
+ 0xF00 | 0xFF
+ The problem comes that for building an immediate constant all combinations
+ are on (ie MOV followed by anything, including hacks with setting the carry
+ flag using non-standard rotations) but for add_i_i_ic you probably only
+ want to break down into two halves that are in turn each added/subtracted.
+*/
+
+static void
+constant_neg (int value, struct constant *result) {
+ result->rotation = 0;
+ while (1) {
+ if ((value & ~0xFF) == 0) {
+ /* No bits spill out. */
+ result->state = fits_as_is;
+ result->value = value;
+ return;
+ }
+ if (((-value) & ~0xFF) == 0) {
+ result->value = -value;
+ result->state = fits_as_neg;
+ return;
+ }
+ if (++result->rotation == 16)
+ break;
+
+ /* There is no rotate op in C, and to do it with 2 shifts and an or
+ would mean casting to unsigned to prevent sign extensions, and it's
+ exactly 1 arm instruction I need, so it's clearer like this: */
+ __asm__ (
+ "mov %0, %1, ror #30\n"
+ : "=r" (value)
+ : "r" (value));
+
+ }
+ result->state = doesnt_fit;
+ return;


+}
+
+
+static void

+constant_not (int value, struct constant *result) {
+ result->rotation = 0;
+ while (1) {
+ if ((value & ~0xFF) == 0) {
+ /* No bits spill out. */
+ result->state = fits_as_is;
+ result->value = value;
+ return;
+ }
+ if (((~value) & ~0xFF) == 0) {
+ result->value = ~value;
+ result->state = fits_as_not;
+ return;
+ }
+ if (++result->rotation == 16)
+ break;
+
+ /* There is no rotate op in C, and to do it with 2 shifts and an or
+ would mean casting to unsigned to prevent sign extensions, and it's
+ exactly 1 arm instruction I need, so it's clearer like this: */
+ __asm__ (
+ "mov %0, %1, ror #30\n"
+ : "=r" (value)
+ : "r" (value));
+
+ }
+ result->state = doesnt_fit;
+ return;
+}
+


/* eg add r0, r3, r7 */

#define emit_arith_reg(pc, cond, op, status, rd, rn, rm) \

emit_arith (pc, cond, op, status, rd, rn, 0, rm)

@@ -432,14 +559,84 @@ emit_arith(char *pc,
/* MOV ignores rn */


#define emit_mov(pc, dest, src) emit_arith_reg (pc, cond_AL, MOV, 0, dest, 0, src)

-#define emit_dcd(pc, word) { \
- *((int *)pc) = word; \
- pc+=4; }
+static char *
+emit_word(char *pc, unsigned int word) {
+ *(pc++) = word;
+ *(pc++) = word >> 8;
+ *(pc++) = word >> 16;
+ *(pc++) = word >> 24;


+ return pc;
+}
+

+static void emit_jump_to_op(Parrot_jit_info *jit_info, arm_cond_t cond,

+ opcode_t disp) {


+ opcode_t opcode = jit_info->op_i + disp;

+ int offset = 0;


+ if(opcode <= jit_info->op_i) {

+ offset = jit_info->op_map[opcode].offset -


+ (jit_info->native_ptr - jit_info->arena_start);

+ } else {
+ Parrot_jit_newfixup(jit_info);
+ jit_info->fixups->type = JIT_ARMBRANCH;
+ jit_info->fixups->param.opcode = opcode;
+ }
+
+ jit_info->native_ptr
+ = emit_branch(jit_info->native_ptr, cond, 0, (offset >> 2) - 2);
+}
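As an aside (not part of the patch): the "(offset >> 2) - 2" is because B holds
a signed 24 bit count of words measured from PC, and PC reads as the branch's
own address plus 8 thanks to the pipeline. Spelled out in C, with names that
are only for this sketch:

    /* target = branch_addr + 8 + 4 * disp, so: */
    int
    arm_branch_disp(long branch_addr, long target_addr)
    {
        long byte_offset = target_addr - branch_addr;
        return (int) ((byte_offset >> 2) - 2);
    }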
+
+static char *
+emit_load_constant_from_pool (char *pc,
+ struct Parrot_Interp *interpreter,
+ arm_cond_t cond,
+ int value,
+ arm_register_t hwreg) {
+ /* can't do it in one. XXX this should use a constant pool.
+ ldr rd, [pc] ; pipelining makes this .L1
+ b L2
+ .L1 value
+ .L2 next
+ */
+
+ pc = emit_ldrstr_offset (pc, cond,
+ is_load, is_pre,


+ 0, 0,
+ hwreg,

+ REG15_pc, 0);
+ /* Must always jump round our inlined constant, even if we don't load it
+ (due to condition codes) */
+ pc = emit_b(pc, cond_AL, 0);
+ pc = emit_word (pc, value);


+ return pc;
+}
+

+static char *
+emit_load_constant (char *pc,
+ struct Parrot_Interp *interpreter,
+ arm_cond_t cond,
+ int value,
+ arm_register_t hwreg) {
+ struct constant immediate;
+
+ constant_not (value, &immediate);
+
+ if (immediate.state == fits_as_is) {
+ pc = emit_arith_immediate(pc, cond, MOV, 0, hwreg, 0,
+ immediate.value, immediate.rotation);
+ } else if (immediate.state == fits_as_not) {
+ pc = emit_arith_immediate(pc, cond, MVN, 0, hwreg, 0,
+ immediate.value, immediate.rotation);
+ } else {
+ pc = emit_load_constant_from_pool (pc, interpreter, cond, value, hwreg);
+ }
+ return pc;
+}
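A couple of worked examples (not part of the patch, and assuming a 32 bit
unsigned int) of the choices emit_load_constant can make:

    #include <assert.h>

    int main(void) {
        /* 0x3FC00 is 0xFF rotated right by 22, so a single MOV can load it. */
        assert(0x3FC00u == (0xFFu << 10));
        /* 0xFFFFFF00 has no such encoding, but its complement 0xFF does,
           so MVN rd, #0xFF loads it without touching the constant pool. */
        assert(~0xFFFFFF00u == 0xFFu);
        return 0;
    }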

static void Parrot_jit_int_load(Parrot_jit_info *jit_info,
- struct Parrot_Interp *interpreter,
- int param,
- int hwreg)
+ struct Parrot_Interp *interpreter,
+ arm_cond_t cond,
+ int param,
+ arm_register_t hwreg)
{
opcode_t op_type


= interpreter->op_info_table[*jit_info->cur_op].types[param];

@@ -456,7 +653,7 @@ static void Parrot_jit_int_load(Parrot_j
val, offset);


}
jit_info->native_ptr = emit_ldrstr_offset (jit_info->native_ptr,

- cond_AL,
+ cond,
is_load,
is_pre,
0, 0,
@@ -464,8 +661,13 @@ static void Parrot_jit_int_load(Parrot_j
INTERP_STRUCT_ADDR_REG,
offset);
break;
-
case PARROT_ARG_IC:
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond,
+ val,
+ hwreg);
+ break;
default:
internal_exception(JIT_ERROR,


"Unsupported op parameter type %d in jit_int_load\n",

@@ -474,9 +676,10 @@ static void Parrot_jit_int_load(Parrot_j
}

static void Parrot_jit_int_store(Parrot_jit_info *jit_info,
- struct Parrot_Interp *interpreter,
- int param,
- int hwreg)
+ struct Parrot_Interp *interpreter,
+ arm_cond_t cond,
+ int param,
+ arm_register_t hwreg)
{
opcode_t op_type


= interpreter->op_info_table[*jit_info->cur_op].types[param];

@@ -493,7 +696,7 @@ static void Parrot_jit_int_store(Parrot_
val, offset);


}
jit_info->native_ptr = emit_ldrstr_offset (jit_info->native_ptr,

- cond_AL,
+ cond,
is_store,
is_pre,
0, 0,
@@ -510,26 +713,118 @@ static void Parrot_jit_int_store(Parrot_
}
}

-static void emit_jump_to_op(Parrot_jit_info *jit_info, arm_cond_t cond,
- int L, opcode_t disp) {
- opcode_t opcode = jit_info->op_i + disp;
+static void
+Parrot_jit_arith_const_alternate (Parrot_jit_info *jit_info,
+ struct Parrot_Interp *interpreter,
+ arm_cond_t cond,
+ enum constant_state alternate_on,
+ alu_op_t normal, alu_op_t alternative,
+ int dest, int src, int const_val) {
+ struct constant val;
+
+ Parrot_jit_int_load(jit_info, interpreter, cond, src, r0);
+
+ constant_neg (const_val, &val);
+
+ if (val.state == fits_as_is || val.state == alternate_on) {
+ /* We can use an immediate constant. */
+ /* say plus is ADD, minus is SUB
+ Then if value fits into an immediate constant, we add r0, r0, #value
+ If -value fits, then we sub, r0, r0, #-value
+ */
+ jit_info->native_ptr
+ = emit_arith_immediate (jit_info->native_ptr, cond,
+ val.state == fits_as_is
+ ? normal : alternative,
+ 0, r0, r0, val.value, val.rotation);
+ } else {
+ /* Else we load it into a reg the slow way. */
+ jit_info->native_ptr
+ = emit_load_constant_from_pool (jit_info->native_ptr, interpreter,
+ cond, const_val, r1);
+ jit_info->native_ptr
+ = emit_arith_reg (jit_info->native_ptr, cond, normal, 0,
+ r0, r0, r1);
+ }
+ Parrot_jit_int_store(jit_info, interpreter, cond, dest, r0);
+}

- if(opcode <= jit_info->op_i) {
- int offset = jit_info->op_map[opcode].offset -
- (jit_info->native_ptr - jit_info->arena_start);
+#define Parrot_jit_arith_const_neg(ji, i, cond, plus, minus, dest, src, const_val) \
+ Parrot_jit_arith_const_alternate (ji, i, cond, fits_as_neg, \
+ plus, minus, dest, src, const_val)
+
+#define Parrot_jit_arith_const_not(ji, i, cond, plus, minus, dest, src, const_val) \
+ Parrot_jit_arith_const_alternate (ji, i, cond, fits_as_not, \
+ plus, minus, dest, src, const_val)
+#define Parrot_jit_arith_const(ji, i, cond, plus, dest, src, const_val) \
+ Parrot_jit_arith_const_alternate (ji, i, cond, fits_as_is, \
+ plus, plus, dest, src, const_val)
+
+
+/* branching on if cannot (in future) easily be made conditional, because we
+ want to set the flags.
+ Yes, for seriously advanced stuff you can
+ 1: chain compatible comparisons (eg something setting LE and something else
+ setting LE can be done with the second conditional)
+ 2: use TEQ which doesn't change the V flag (or C, IIRC), and chain that with
+ something else that did set the V flag
+ but that's JIT v5 or later (where v3 can hold intermediate values in CPU
+ registers, and v4 can do some things conditionally)
+*/
+static void
+Parrot_jit_jumpif_const (Parrot_jit_info *jit_info,
+ struct Parrot_Interp *interpreter,
+ int src, int const_val, int where_to,
+ arm_cond_t when) {
+ struct constant val;
+
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, src, r0);
+
+ constant_neg (const_val, &val);
+
+ if (val.state == fits_as_is || val.state == fits_as_neg) {
+ /* We can use an immediate constant. */
+ jit_info->native_ptr
+ = emit_arith_immediate (jit_info->native_ptr, cond_AL,
+ val.state == fits_as_is ? CMP : CMN, 0,
+ 0, r0, val.value, val.rotation);
+ } else {
+ /* Else we load it into a reg the slow way. */
jit_info->native_ptr
- = emit_branch(jit_info->native_ptr, cond, L, (offset >> 2) - 2);
- return;
+ = emit_load_constant_from_pool (jit_info->native_ptr, interpreter,
+ cond_AL, const_val, r1);
+ jit_info->native_ptr
+ = emit_arith_reg (jit_info->native_ptr, cond_AL, CMP, 0, 0, r0, r1);
}
- internal_exception(JIT_ERROR, "Can't go forward yet\n");
-
+ emit_jump_to_op (jit_info, when, where_to);
}

void Parrot_jit_dofixup(Parrot_jit_info *jit_info,
struct Parrot_Interp * interpreter)
{
- /* Todo. */
+ Parrot_jit_fixup *fixup = jit_info->fixups;
+
+ while(fixup){
+ switch(fixup->type){
+ case JIT_ARMBRANCH:
+ {
+ char *fixup_ptr = Parrot_jit_fixup_target(jit_info, fixup);
+ int offset = jit_info->op_map[fixup->param.opcode].offset
+ - fixup->native_offset;
+ int disp = (offset >> 2) - 2;
+ *(fixup_ptr++) = disp;
+ *(fixup_ptr++) = disp >> 8;
+ *(fixup_ptr) = disp >> 16;
+ break;
+ }
+ default:
+ internal_exception(JIT_ERROR, "Unknown fixup type:%d\n",
+ fixup->type);
+ break;
+ }
+ fixup = fixup->next;
+ }


}
/* My entry code is create a stack frame:

mov ip, sp
@@ -612,19 +907,17 @@ Parrot_jit_normal_op(Parrot_jit_info *ji
reg2mask(0) | reg2mask(REG12_ip));


jit_info->native_ptr = emit_mov (jit_info->native_ptr, REG15_pc, REG12_ip);

#endif
- emit_dcd (jit_info->native_ptr, (int) jit_info->cur_op);
- emit_dcd (jit_info->native_ptr,
- (int) interpreter->op_func_table[*(jit_info->cur_op)]);
+ jit_info->native_ptr
+ = emit_word (jit_info->native_ptr, (int) jit_info->cur_op);
+ jit_info->native_ptr
+ = emit_word (jit_info->native_ptr,


+ (int) interpreter->op_func_table[*(jit_info->cur_op)]);
}

-/* We get back address of opcode in bytecode.
- We want address of equivalent bit of jit code, which is stored as an
- address at the same offset in a jit table. */
-void Parrot_jit_cpcf_op(Parrot_jit_info *jit_info,
- struct Parrot_Interp * interpreter)
-{
- Parrot_jit_normal_op(jit_info, interpreter);
-
+static void
+Parrot_jump_to_op_in_reg(Parrot_jit_info *jit_info,
+ struct Parrot_Interp * interpreter,
+ arm_register_t reg) {


/* This is effectively the pseudo-opcode ldr - ie load relative to PC.

So offset includes pipeline. */

jit_info->native_ptr = emit_ldrstr_offset (jit_info->native_ptr, cond_AL,

@@ -633,13 +926,24 @@ void Parrot_jit_cpcf_op(Parrot_jit_info

/* ldr pc, [r14, r0] */

/* lazy. this is offset type 0, 0x000 which is r0 with zero shift */

jit_info->native_ptr = emit_ldrstr (jit_info->native_ptr, cond_AL,

- is_load, dir_Up, is_pre, 0, 0,
+ is_load, dir_Up, is_pre, 0, reg,
REG15_pc, REG14_lr, 2, 0);


/* and this "instruction" is never reached, so we can use it to store

the constant that we load into r14 */

- emit_dcd (jit_info->native_ptr,
- ((long) jit_info->op_map) -
- ((long) interpreter->code->byte_code));
+ jit_info->native_ptr
+ = emit_word (jit_info->native_ptr,
+ ((int) jit_info->op_map) -
+ ((int) interpreter->code->byte_code));
+}
+


+/* We get back address of opcode in bytecode.
+ We want address of equivalent bit of jit code, which is stored as an
+ address at the same offset in a jit table. */
+void Parrot_jit_cpcf_op(Parrot_jit_info *jit_info,
+ struct Parrot_Interp * interpreter)
+{
+ Parrot_jit_normal_op(jit_info, interpreter);

+ Parrot_jump_to_op_in_reg(jit_info, interpreter, r0);
}
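Roughly what that ldr pair computes, as C (an aside, not part of the patch;
the names are only for this sketch, and I'm assuming the op_map entries hold
absolute native addresses, as loading one straight into pc requires):

    /* returned_op points into the bytecode; the same offset into op_map
       holds the address of the equivalent JITted code. */
    static void *
    native_equivalent(char *returned_op, char *byte_code, char *op_map)
    {
        long delta = op_map - byte_code;
        return *(void **) (returned_op + delta);
    }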

/*
--- jit/arm/core.jit.orig Fri Aug 2 04:25:05 2002
+++ jit/arm/core.jit Sat Aug 3 18:33:50 2002
@@ -42,30 +42,598 @@ Parrot_end {


; nothing :-)] and choose how to maximise use of as many real CPU registers as

; possible.

+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+; set ops
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
Parrot_set_i_i {
- Parrot_jit_int_load(jit_info, interpreter, 2, r0);
- Parrot_jit_int_store(jit_info, interpreter, 1, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_set_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+; comparison ops
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+Parrot_eq_i_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 1, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r1);


+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

+ CMP, 0, 0, r0, r1);
+ emit_jump_to_op (jit_info, cond_EQ, *INT_CONST[3]);
+}
+Parrot_eq_i_ic_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 1, *INT_CONST[2],
+ *INT_CONST[3], cond_EQ);
+}
+Parrot_eq_ic_i_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 2, *INT_CONST[1],
+ *INT_CONST[3], cond_EQ);
+}
+Parrot_eq_ic_ic_ic {
+ if (*INT_CONST[1] == *INT_CONST[2])
+ emit_jump_to_op (jit_info, cond_AL, *INT_CONST[3]);
+}
+
+Parrot_ne_i_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 1, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r1);


+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

+ CMP, 0, 0, r0, r1);
+ emit_jump_to_op (jit_info, cond_NE, *INT_CONST[3]);
+}
+Parrot_ne_i_ic_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 1, *INT_CONST[2],
+ *INT_CONST[3], cond_NE);
+}
+Parrot_ne_ic_i_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 2, *INT_CONST[1],
+ *INT_CONST[3], cond_NE);
+}
+Parrot_ne_ic_ic_ic {
+ if (*INT_CONST[1] != *INT_CONST[2])
+ emit_jump_to_op (jit_info, cond_AL, *INT_CONST[3]);
+}
+
+Parrot_lt_i_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 1, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r1);


+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

+ CMP, 0, 0, r0, r1);
+ emit_jump_to_op (jit_info, cond_LT, *INT_CONST[3]);
+}
+Parrot_lt_i_ic_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 1, *INT_CONST[2],
+ *INT_CONST[3], cond_LT);
+}
+Parrot_lt_ic_i_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 2, *INT_CONST[1],
+ *INT_CONST[3], cond_GT);
+}
+Parrot_lt_ic_ic_ic {
+ if (*INT_CONST[1] < *INT_CONST[2])
+ emit_jump_to_op (jit_info, cond_AL, *INT_CONST[3]);
+}
+
+Parrot_le_i_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 1, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r1);


+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

+ CMP, 0, 0, r0, r1);
+ emit_jump_to_op (jit_info, cond_LE, *INT_CONST[3]);
+}
+Parrot_le_i_ic_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 1, *INT_CONST[2],
+ *INT_CONST[3], cond_LE);
+}
+Parrot_le_ic_i_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 2, *INT_CONST[1],
+ *INT_CONST[3], cond_GE);
+}
+Parrot_le_ic_ic_ic {
+ if (*INT_CONST[1] <= *INT_CONST[2])
+ emit_jump_to_op (jit_info, cond_AL, *INT_CONST[3]);
+}
+
+Parrot_gt_i_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 1, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r1);


+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

+ CMP, 0, 0, r0, r1);
+ emit_jump_to_op (jit_info, cond_GT, *INT_CONST[3]);
+}
+Parrot_gt_i_ic_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 1, *INT_CONST[2],
+ *INT_CONST[3], cond_GT);
+}
+Parrot_gt_ic_i_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 2, *INT_CONST[1],
+ *INT_CONST[3], cond_LT);
+}
+Parrot_gt_ic_ic_ic {
+ if (*INT_CONST[1] > *INT_CONST[2])
+ emit_jump_to_op (jit_info, cond_AL, *INT_CONST[3]);
}

+Parrot_ge_i_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 1, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r1);


+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

+ CMP, 0, 0, r0, r1);
+ emit_jump_to_op (jit_info, cond_GE, *INT_CONST[3]);
+}
+Parrot_ge_i_ic_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 1, *INT_CONST[2],
+ *INT_CONST[3], cond_GE);
+}
+Parrot_ge_ic_i_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 2, *INT_CONST[1],
+ *INT_CONST[3], cond_LE);
+}
+Parrot_ge_ic_ic_ic {
+ if (*INT_CONST[1] >= *INT_CONST[2])
+ emit_jump_to_op (jit_info, cond_AL, *INT_CONST[3]);
+}
+
+Parrot_if_i_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 1, 0, *INT_CONST[2],
+ cond_NE);
+}
+Parrot_unless_i_ic {
+ Parrot_jit_jumpif_const (jit_info, interpreter, 1, 0, *INT_CONST[2],
+ cond_EQ);
+}
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+; arithmetic ops
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+Parrot_abs_i_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter, cond_AL,
+ (*INT_CONST[2] < 0)
+ ? -*INT_CONST[2] : *INT_CONST[2],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+;
+; maybe not the best way:
+; cmp r0, #0
+; rsblt r0, r0, #0
+;
+Parrot_abs_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);


+ jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_AL,
+ CMP, 0, 0, r0, 0, 0);

+ jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_LT,
+ RSB, 0, r0, r0, 0, 0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+;
+; rsbs r0, r0, #0
+; strgt r0, [...]
+;
+;Parrot_abs_i {
+; Parrot_jit_int_load(jit_info, interpreter, cond_AL, 1, r0);


+; jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_AL,

+; RSB, arith_sets_S,
+; r0, r0, 0, 0);
+; Parrot_jit_int_store(jit_info, interpreter, cond_GT, 1, r0);
+;}
+
Parrot_add_i_i_i {
- Parrot_jit_int_load(jit_info, interpreter, 2, r0);
- Parrot_jit_int_load(jit_info, interpreter, 3, r1);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r1);


jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

ADD, 0, r2, r0, r1);

- Parrot_jit_int_store(jit_info, interpreter, 1, r2);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
}
+Parrot_add_i_i_ic {
+ Parrot_jit_arith_const_neg (jit_info, interpreter, cond_AL, ADD, SUB,
+ 1, 2, *INT_CONST[3]);
+}
+Parrot_add_i_ic_i {
+ Parrot_jit_arith_const_neg (jit_info, interpreter, cond_AL, ADD, SUB,
+ 1, 3, *INT_CONST[2]);
+}
+Parrot_add_i_ic_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ *INT_CONST[2] + *INT_CONST[3],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_add_i_ic {
+ Parrot_jit_arith_const_neg (jit_info, interpreter, cond_AL, ADD, SUB,
+ 1, 1, *INT_CONST[2]);
+}
+
+Parrot_dec_i {
+ Parrot_jit_arith_const_neg (jit_info, interpreter, cond_AL, SUB, ADD,
+ 1, 1, 1);
+}
+Parrot_inc_i {
+ Parrot_jit_arith_const_neg (jit_info, interpreter, cond_AL, ADD, SUB,
+ 1, 1, 1);
+}
+
+; mul can't do immediate constants, and there are restrictions on which
+; registers you can use. (IIRC rd and rm can't be the same register)
+Parrot_mul_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 1, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r1);
+ jit_info->native_ptr = emit_mul (jit_info->native_ptr, cond_AL, 0,
+ r2, r0, r1);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_mul_i_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r1);
+ jit_info->native_ptr = emit_mul (jit_info->native_ptr, cond_AL, 0,
+ r2, r0, r1);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_mul_i_ic_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ *INT_CONST[2] * *INT_CONST[3],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+
+Parrot_neg_i {
+ Parrot_jit_arith_const (jit_info, interpreter, cond_AL, RSB, 1, 1, 0);
+}
+Parrot_neg_i_i {
+ Parrot_jit_arith_const (jit_info, interpreter, cond_AL, RSB, 1, 2, 0);
+}
+Parrot_neg_i_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ -*INT_CONST[2],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+

Parrot_sub_i_i_i {
- Parrot_jit_int_load(jit_info, interpreter, 2, r0);
- Parrot_jit_int_load(jit_info, interpreter, 3, r1);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r1);


jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

SUB, 0, r2, r0, r1);

- Parrot_jit_int_store(jit_info, interpreter, 1, r2);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_sub_i_i_ic {
+ Parrot_jit_arith_const_neg (jit_info, interpreter, cond_AL, SUB, ADD,
+ 1, 2, *INT_CONST[3]);
+}
+Parrot_sub_i_ic_i {
+ Parrot_jit_arith_const_neg (jit_info, interpreter, cond_AL, RSB, ADD,
+ 1, 3, *INT_CONST[2]);
+}
+Parrot_sub_i_ic_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ *INT_CONST[2] - *INT_CONST[3],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_sub_i_ic {
+ Parrot_jit_arith_const_neg (jit_info, interpreter, cond_AL, SUB, ADD,
+ 1, 1, *INT_CONST[2]);
}

-Parrot_if_i_ic {
- Parrot_jit_int_load(jit_info, interpreter, 1, r0);
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+; bit ops
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+Parrot_band_i_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r1);


+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

+ AND, 0, r2, r0, r1);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_band_i_i_ic {
+ Parrot_jit_arith_const_not (jit_info, interpreter, cond_AL, AND, BIC,
+ 1, 2, *INT_CONST[3]);
+}
+Parrot_band_i_ic_i {
+ Parrot_jit_arith_const_not (jit_info, interpreter, cond_AL, AND, BIC,
+ 1, 3, *INT_CONST[2]);
+}
+Parrot_band_i_ic_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ *INT_CONST[2] & *INT_CONST[3],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_band_i_ic {
+ Parrot_jit_arith_const_not (jit_info, interpreter, cond_AL, AND, BIC,
+ 1, 1, *INT_CONST[2]);
+}
+
+;Parrot_bnot_i {
+; Parrot_jit_int_load(jit_info, interpreter, cond_AL, 1, r0);
+; jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,
+; MVN, 0, r0, 0, r0);
+; Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+;}
+Parrot_bnot_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);


+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

+ MVN, 0, r0, 0, r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_bnot_i_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ ~*INT_CONST[2],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+
+Parrot_bor_i_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r1);


+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

+ ORR, 0, r2, r0, r1);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_bor_i_i_ic {
+ Parrot_jit_arith_const (jit_info, interpreter, cond_AL, ORR,
+ 1, 2, *INT_CONST[3]);
+}
+Parrot_bor_i_ic_i {
+ Parrot_jit_arith_const (jit_info, interpreter, cond_AL, ORR,
+ 1, 3, *INT_CONST[2]);
+}
+Parrot_bor_i_ic_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ *INT_CONST[2] | *INT_CONST[3],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_bor_i_ic {
+ Parrot_jit_arith_const(jit_info, interpreter, cond_AL, ORR,
+ 1, 1, *INT_CONST[2]);
+}
+
+Parrot_bxor_i_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r1);


+ jit_info->native_ptr = emit_arith_reg (jit_info->native_ptr, cond_AL,

+ EOR, 0, r2, r0, r1);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_bxor_i_i_ic {
+ Parrot_jit_arith_const (jit_info, interpreter, cond_AL, EOR,
+ 1, 2, *INT_CONST[3]);
+}
+Parrot_bxor_i_ic_i {
+ Parrot_jit_arith_const (jit_info, interpreter, cond_AL, EOR,
+ 1, 3, *INT_CONST[2]);
+}
+Parrot_bxor_i_ic_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ *INT_CONST[2] ^ *INT_CONST[3],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_bxor_i_ic {
+ Parrot_jit_arith_const(jit_info, interpreter, cond_AL, EOR,
+ 1, 1, *INT_CONST[2]);
+}
+
+Parrot_shl_i_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r1);
+ jit_info->native_ptr
+ = emit_arith_reg_shift_reg (jit_info->native_ptr, cond_AL,
+ MOV, 0, r2, 0, r0, shift_LSL, r1);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_shl_i_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ jit_info->native_ptr
+ = emit_arith_reg_shift_const (jit_info->native_ptr, cond_AL,
+ MOV, 0, r2, 0, r0, shift_LSL,
+ *INT_CONST[3]);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_shl_i_ic_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r1);
+ jit_info->native_ptr
+ = emit_arith_reg_shift_reg (jit_info->native_ptr, cond_AL,
+ MOV, 0, r2, 0, r0, shift_LSL, r1);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_shl_i_ic_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ *INT_CONST[2] << *INT_CONST[3],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_shr_i_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r1);
+ jit_info->native_ptr
+ = emit_arith_reg_shift_reg (jit_info->native_ptr, cond_AL,
+ MOV, 0, r2, 0, r0, shift_ASR, r1);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_shr_i_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ jit_info->native_ptr
+ = emit_arith_reg_shift_const (jit_info->native_ptr, cond_AL,
+ MOV, 0, r2, 0, r0, shift_ASR,
+ *INT_CONST[3]);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_shr_i_ic_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r1);
+ jit_info->native_ptr
+ = emit_arith_reg_shift_reg (jit_info->native_ptr, cond_AL,
+ MOV, 0, r2, 0, r0, shift_ASR, r1);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r2);
+}
+Parrot_shr_i_ic_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ *INT_CONST[2] >> *INT_CONST[3],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+; logical ops
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+Parrot_and_i_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);


+ jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_AL,
+ CMP, 0, 0, r0, 0, 0);

+ Parrot_jit_int_load(jit_info, interpreter, cond_NE, 3, r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_and_i_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);


+ jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_AL,
+ CMP, 0, 0, r0, 0, 0);

+ Parrot_jit_int_load(jit_info, interpreter, cond_NE, 3, r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_and_i_ic_i {
+ if (*INT_CONST[2]) {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r0);
+ } else {
+ /* *INT_CONST[2] is going to be 0 anyway, but this does generate the
+ shortest code to load 0. */
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ }
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_and_i_ic_ic {
+ if (*INT_CONST[2]) {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r0);
+ } else {
+ /* *INT_CONST[2] is going to be 0 anyway, but this does generate the
+ shortest code to load 0. */
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ }
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+
+Parrot_or_i_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);


+ jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_AL,
+ CMP, 0, 0, r0, 0, 0);

+ Parrot_jit_int_load(jit_info, interpreter, cond_EQ, 3, r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_or_i_i_ic {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);


jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_AL,

CMP, 0, 0, r0, 0, 0);

- emit_jump_to_op (jit_info, cond_NE, 0, *INT_CONST[2]);
+ Parrot_jit_int_load(jit_info, interpreter, cond_EQ, 3, r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_or_i_ic_i {
+ if (*INT_CONST[2]) {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ } else {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r0);
+ }
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+Parrot_or_i_ic_ic {
+ if (*INT_CONST[2]) {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);
+ } else {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 3, r0);
+ }
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+;
+; no or_i_i :-(
+;
+;
+; RSBS r1, r0, #1
+; ie flags set for CMP #1, r0
+; think unsigned - if r0 is 0, #1 is HIgher
+; for all other cases it's Lower or Same.
+; if r0 is 0, then #1 - #0 is 1, and r1 is set correctly.
+; else
+; MOVLS r1, #0
+; (actually it's also set correctly for r0 is #1)
+Parrot_not_i_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 2, r0);


+ jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_AL,

+ RSB, arith_sets_S,
+ r1, r0, 1, 0);
+ jit_info->native_ptr = emit_arith_immediate (jit_info->native_ptr, cond_LS,
+ MOV, 0,
+ r1, 0, 0, 0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r1);
}
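The C-level effect of that RSBS/MOVLS pair, as a sketch (not part of the
patch; LS after "1 - r0" means unsigned 1 <= r0, ie r0 != 0):

    static int
    arm_logical_not(unsigned int r0)
    {
        unsigned int r1 = 1u - r0;   /* RSBS r1, r0, #1 (also sets the flags) */
        if (r0 >= 1u)                /* the LS condition holds */
            r1 = 0;                  /* MOVLS r1, #0 */
        return (int) r1;             /* 1 if r0 was 0, otherwise 0 */
    }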
+Parrot_not_i_ic {
+ jit_info->native_ptr = emit_load_constant (jit_info->native_ptr,
+ interpreter,
+ cond_AL,
+ !*INT_CONST[2],
+ r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+
+; XXX TODO - the non constant versions of this
+Parrot_xor_i_ic_ic {
+ jit_info->native_ptr
+ = emit_load_constant (jit_info->native_ptr, interpreter, cond_AL,
+ (*INT_CONST[2] && ! *INT_CONST[3])
+ ? *INT_CONST[2]
+ : (*INT_CONST[3] && ! *INT_CONST[2])
+ ? *INT_CONST[3] : 0, r0);
+ Parrot_jit_int_store(jit_info, interpreter, cond_AL, 1, r0);
+}
+
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+; branches
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+Parrot_branch_i {
+ Parrot_jit_int_load(jit_info, interpreter, cond_AL, 1, r0);
+ Parrot_jump_to_op_in_reg(jit_info, interpreter, r0);
+}
+Parrot_branch_ic {
+ emit_jump_to_op (jit_info, cond_AL, *INT_CONST[1]);

Daniel Grunblatt

unread,
Aug 4, 2002, 2:30:09 PM8/4/02
to Nicholas Clark, perl6-i...@perl.org
On Sat, 3 Aug 2002, Nicholas Clark wrote:

> I wasn't actually expecting you to apply that :-)
> It was more a "where I am at now" informational patch.

Sorry :)

>
> I think that this patch is at good point to pause and take stock. I believe
> it JITs just about every integer op (including some i386 isn't JITting yet)

Great job!

I'm working on it, as I'm working on the register allocator too.

Daniel Grunblatt.
