signal handler for unaligned memory accesses??


Rayson Ho

Apr 27, 2009, 3:24:40 PM
to loongs...@googlegroups.com
On Mon, Apr 27, 2009 at 12:54 PM, Zhang Le <r0b...@gentoo.org> wrote:
> I am thinking maybe we should implement a signal handler to emulate this
> instruction.

I think it is doable, since Lexra has done that before. But there are
at least 2 complications:

1) patent related
- a quick patent search will return tons of results on efficient
handling of unaligned memory load/store operations in software &
hardware.
- (US patent 4,814,976, owned by MIPS Technologies Inc., expired in 2006
(1986+20), so that's good!)

http://www.mba.intercol.edu/Entrepreneurship/UT Computer Science Course/mips-vs-lexra-MPR-dec1999.pdf
http://www.google.com/patents?id=6egZAAAAEBAJ&dq=4814976


2) Another issue is telling whether the load/store targets a valid but
unaligned memory location or an invalid pointer. Also, I *think* we
should be concerned about unaligned memory accesses in branch delay
slots. We will need to check what the processor (or the OS) tells us
when the instruction in the delay slot causes the bus error... and
what if control does not reach the load/store through the branch
before it, but through, for example, an unconditional jump that jumps
straight to the delay slot?? All of those can affect the restart
address after the signal handler finishes execution.
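
For reference, here is a minimal sketch of how a handler could locate the faulting instruction. It assumes the usual MIPS convention that, when an exception hits a branch delay slot, the BD bit (bit 31) of the Cause register is set and EPC points at the branch rather than at the load/store. It also assumes fixed 4-byte instructions (no MIPS16). The function and constant names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: standard CP0 Cause layout, where bit 31 (BD) is set when
// the exception was raised by an instruction in a branch delay slot.
constexpr std::uint32_t kCauseBD = 1u << 31;

// Hypothetical helper: address of the instruction that actually faulted.
// With BD set, EPC holds the address of the branch, and the faulting
// instruction is the one in its delay slot, 4 bytes later.
std::uint64_t faulting_insn_addr(std::uint64_t epc, std::uint32_t cause)
{
    assert((epc & 3) == 0);  // instructions are word aligned
    return (cause & kCauseBD) ? epc + 4 : epc;
}

// Hypothetical helper: true when the handler may resume at the next
// instruction. In the delay-slot case the branch itself must be
// emulated first to learn where execution continues.
bool can_resume_at_next_insn(std::uint32_t cause)
{
    return (cause & kCauseBD) == 0;
}
```

In the BD case the restart address is either the branch target or EPC + 8, depending on the outcome of the branch, so the handler has to decode the branch as well. As far as I understand, an instruction that is reached by jumping directly to it is not executing as a delay slot, so BD would be clear in that situation.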

SPARC has similar issues, but I could not find the code for handling
unaligned memory accesses at runtime. For compile time handling of
unaligned memory access in Solaris, see:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libbc/libc/crt/sparc/misalign.s

Rayson


>
> --
> Zhang, Le
> Gentoo/Loongson Developer
> http://zhangle.is-a-geek.org
> 0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973
>

Zhang Le

Apr 28, 2009, 1:50:28 AM
to loongs...@googlegroups.com
On 14:24 Mon 27 Apr , Rayson Ho wrote:
>
> On Mon, Apr 27, 2009 at 12:54 PM, Zhang Le <r0b...@gentoo.org> wrote:
> > I am thinking maybe we should implement a signal handler to emulate this
> > instruction.
>
> I think it is doable, since Lexra has done that before. But there are
> at least 2 complications:
>
> 1) patent related
> - a quick patent search will return tons of results on efficient
> handling of unaligned memory load/store operations in software &
> hardware.
> - (US patent 4,814,976 owned by MIPS technologies Inc expired in 2006
> (1986+20), so that's good!)
>
> http://www.mba.intercol.edu/Entrepreneurship/UT Computer Science
> Course/mips-vs-lexra-MPR-dec1999.pdf
> http://www.google.com/patents?id=6egZAAAAEBAJ&dq=4814976

I know your point is to avoid any potential lawsuit. But I really don't think
this is a thing to worry about.

If anyone infringes the patent, it would be CAS, ICT and ST. They implemented
ldl/ldr, sdl/sdr in the hardware in the first place. The MIPS world can't afford
to go through another lawsuit. I think MTI knows this.

Actually, even if it really matters, I don't give a shit.
All I care about is making Firefox work as expected.
So, please, don't bring this up any more. We all know it.

> 2) Another issue is to identify whether the load/store is dealing with
> a valid memory location, or it is an invalid pointer. Also, I *think*
> we should be concerned about unaligned memory accesses in branch delay
> slots. We will need to check what the processor (or the OS) tells us
> when the instruction in the delay slot is causing the bus error... and
> what if control is not passed from the branch before the load/store
> instruction, but for example an unconditional jump that jumps to the
> delay slot?? All those can affect the restart address after the signal
> handler finishes execution.
>
> SPARC has similar issues, but I could not find the code for handling
> unaligned memory accesses at runtime. For compile time handling of
> unaligned memory access in Solaris, see:
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libbc/libc/crt/sparc/misalign.s

Thanks, I will try whatever it takes to make it work.

Rayson Ho

Apr 28, 2009, 2:15:40 AM
to loongs...@googlegroups.com
On Tue, Apr 28, 2009 at 12:50 AM, Zhang Le <r0b...@gentoo.org> wrote:
> I know your point is to avoid any potential lawsuit. But I really don't think
> this is a thing to worry about.
>
> If someone infringes the patent, it would be CAS, ICT and ST. They implemented
> ldl/ldr, sdl/sdr in the hardware in the first place. MIPS world can't afford going
> through another lawsuit. I think MTI knows this.

No, the reason I brought it up is that there are many existing
patents, owned by MIPS and other hardware/software companies, on
unaligned loads/stores. The Lexra case is just one example of how a
software routine can infringe a hardware patent.

If we are not careful and put such a routine into the MIPS port, then
not ICT, but you and everyone else who distributes, or even just uses,
MIPS Linux could be in trouble.

And lawsuits like those can take years. Remember when the IBM vs. SCO
lawsuit started??

Rayson

Zhang Le

Apr 28, 2009, 10:36:31 AM
to loongs...@googlegroups.com
On 01:15 Tue 28 Apr , Rayson Ho wrote:
>
> On Tue, Apr 28, 2009 at 12:50 AM, Zhang Le <r0b...@gentoo.org> wrote:
> > I know your point is to avoid any potential lawsuit. But I really don't think
> > this is a thing to worry about.
> >
> > If someone infringes the patent, it would be CAS, ICT and ST. They implemented
> > ldl/ldr, sdl/sdr in the hardware in the first place. MIPS world can't afford going
> > through another lawsuit. I think MTI knows this.
>
> No, the reason why I brought it up is because there are many existing
> patents, owned by MIPS and other hardware/software companies, on
> unaligned load/stores, the Lexra case is just an example of how using
> a software routine can infringe a hardware patent.
>
> If we are not careful and put a routine for the MIPS port, then not
> ICT, but you and all others who distribute, or even those use MIPS
> Linux could be in trouble.

Thanks for your kind reminder.
However, I just found that the Linux kernel already has an unaligned access
emulation implementation.
http://lxr.linux.no/linux+v2.6.29/arch/mips/kernel/unaligned.c
So, I guess it should be fine. Otherwise, many people would have been in
trouble already.

And, my unaligned ldc1/sdc1 emulation patch works now.
http://repo.or.cz/w/linux-2.6/linux-loongson.git?a=commitdiff;h=ebf014b4782b57c98be20d849fe6de3748ac6d76;hp=9c54bef784d1401aa60a2a61f14665d80900361d
Besides this, cairo can't be compiled with -O2 enabled.

Now N32 Firefox's score in the Acid3 test reaches 70/100. Previously it crashed
around 51/100.
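
To make the emulation idea concrete, here is a rough sketch of the first step any such emulator needs: decoding the trapping instruction. This is not the code from the patch linked above. It only assumes the standard MIPS I-type encoding (opcode in bits 31..26, base register in 25..21, FPU register in 20..16, signed 16-bit offset in 15..0) with opcode 0x35 for ldc1 and 0x3d for sdc1. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kOpLdc1 = 0x35;  // load doubleword to FPU register
constexpr std::uint32_t kOpSdc1 = 0x3d;  // store doubleword from FPU register

// Hypothetical container for the decoded fields.
struct FpDoubleAccess {
    bool is_store;
    unsigned base;        // GPR number holding the base address
    unsigned ft;          // FPU register number
    std::int32_t offset;  // sign-extended 16-bit displacement
};

// Returns false when the instruction is neither ldc1 nor sdc1.
bool decode_fp_double_access(std::uint32_t insn, FpDoubleAccess& out)
{
    const std::uint32_t opcode = insn >> 26;
    if (opcode != kOpLdc1 && opcode != kOpSdc1)
        return false;
    out.is_store = (opcode == kOpSdc1);
    out.base = (insn >> 21) & 0x1f;
    out.ft = (insn >> 16) & 0x1f;
    std::int32_t off = static_cast<std::int32_t>(insn & 0xffff);
    if (off & 0x8000)
        off -= 0x10000;  // sign-extend the 16-bit field
    out.offset = off;
    return true;
}

// Address the instruction was trying to access: base register value
// plus the signed offset (wrap-around arithmetic, as in hardware).
std::uint64_t effective_address(std::uint64_t base_value, std::int32_t offset)
{
    assert(offset >= -32768 && offset <= 32767);
    return base_value + static_cast<std::uint64_t>(static_cast<std::int64_t>(offset));
}
```

After decoding, an emulator would still have to verify that the effective address is accessible, move the eight bytes with byte-wise or ldl/ldr style accesses, and advance the program counter.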

Zhang Le

Apr 28, 2009, 10:40:30 AM
to loongs...@googlegroups.com

A little test program for this patch:

#include <stdio.h>

int main(void)
{
    char buf[12];
    /* Deliberately misaligned: buf + 1 is normally not 8-byte aligned. */
    double *d = (double *)&buf[1];

    *d = 0xff;
    printf("%f\n", *d);
    return 0;
}

Without this patch, this program dies with a bus error.
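
For comparison, user-space code can avoid the trap entirely by never dereferencing a misaligned double pointer. A minimal sketch, using only std::memcpy (the helper names are made up):

```cpp
#include <cassert>
#include <cstring>

// Read a double from an address that carries no alignment guarantee.
// memcpy copies bytes, so no aligned 8-byte load is required of the
// source address.
double load_unaligned_double(const void* p)
{
    assert(p != nullptr);
    double value;
    std::memcpy(&value, p, sizeof value);
    return value;
}

// Write a double to an address that carries no alignment guarantee.
void store_unaligned_double(void* p, double value)
{
    assert(p != nullptr);
    std::memcpy(p, &value, sizeof value);
}
```

With these helpers, the test above would become store_unaligned_double(&buf[1], 0xff) followed by printf("%f\n", load_unaligned_double(&buf[1])). Compilers usually turn the memcpy into whatever unaligned access sequence the target supports, so this tends to be much cheaper than trapping into the kernel.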

Rayson Ho

Apr 28, 2009, 11:51:26 AM
to loongs...@googlegroups.com
On Tue, Apr 28, 2009 at 9:40 AM, Zhang Le <r0b...@gentoo.org> wrote:
>> And, my unaligned ldc1/sdc1 emulation patch works now.
>> http://repo.or.cz/w/linux-2.6/linux-loongson.git?a=commitdiff;h=ebf014b4782b57c98be20d849fe6de3748ac6d76;hp=9c54bef784d1401aa60a2a61f14665d80900361d

Good!!

(And thanks for pointing me to the Emulating Instructions section in
See MIPS Run.)

Are we going to fix unaligned memory accesses in Firefox and other
programs, or will we just leave them once we have this change in the
kernel??

One minor thing, in your patch:

+ /* Cannot handle 64-bit instructions in 32-bit kernel */
+ goto sigill;

I think load/store FP-double can execute on a 32-bit processor (See
MIPS Run, 7.5 Floating-Point Registers), so it should go to sigbus
instead of sigill.

Rayson

Zhang Le

Apr 28, 2009, 11:57:40 AM
to loongs...@googlegroups.com
On 10:51 Tue 28 Apr , Rayson Ho wrote:
>
> On Tue, Apr 28, 2009 at 9:40 AM, Zhang Le <r0b...@gentoo.org> wrote:
> >> And, my unaligned ldc1/sdc1 emulation patch works now.
> >> http://repo.or.cz/w/linux-2.6/linux-loongson.git?a=commitdiff;h=ebf014b4782b57c98be20d849fe6de3748ac6d76;hp=9c54bef784d1401aa60a2a61f14665d80900361d
>
> Good!!
>
> (And thanks for pointing me to the Emulating Instructions section in
> See MIPS Run.)
>
> Are we going to fix unaligned memory accesses in Firefox and other
> programs, or we will just leave them once we have this change in the
> kernel??
>
> One minor thing, in your patch:
>
> + /* Cannot handle 64-bit instructions in 32-bit kernel */
> + goto sigill;
>
> I think load/store FP-double can execute on a 32-bit processor (See
> MIPS Run, 7.5 Floating-Point Registers), so it should go to sigbus
> instead of sigill.

It doesn't matter actually.
Ralf refuses to accept this patch.

Chi

Apr 29, 2009, 12:31:31 AM
to loongson-dev
What were his reasons to reject the patch?

I could not find the discussions of this patch on the LKML.

--Chi


On Apr 28, 10:57 am, Zhang Le <r0be...@gentoo.org> wrote:
> It doesn't matter actually.
> Ralf refuse to accept this patch.
>
> --
> Zhang, Le
> Gentoo/Loongson Developer
> http://zhangle.is-a-geek.org
> 0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973

Zhang Le

Apr 29, 2009, 12:35:07 AM
to loongs...@googlegroups.com
On 21:31 Tue 28 Apr , Chi wrote:
>
> What were his reasons to reject the patch?
>
> I could not find the discussions of this patch on the LKML.

It happened on #mipslinux@freenode

20:48 <Ralf> Why not fix firefox?
20:52 <Ralf> is it hard?
20:56 <r0bertz> tried many way, can't fix it, it just can't be get aligned
20:57 <r0bertz> maybe the problem is in gcc
20:57 <Ralf> I doubt it.
20:57 <Ralf> How are these structures allocated?
20:58 <r0bertz> they are in a class, so should be 'new'ed
20:59 <Ralf> Presumably new is sitting on top of something like malloc.
20:59 <r0bertz> yup, allocated on heap
21:00 <Ralf> And that allocator in new will put some information to near the
beginning of th allocated header.
21:01 <r0bertz> so?
21:02 <Ralf> Presumably some math there is a bit off.
21:03 <Ralf> Btw, the emulated loads and stores should be 1000x - 2000x slower.
21:03 <Ralf> That's why I'm objecting so hard.

--
Zhang, Le
Gentoo/Loongson Developer

Rayson Ho

Apr 29, 2009, 1:24:40 AM
to loongs...@googlegroups.com
On Tue, Apr 28, 2009 at 11:35 PM, Zhang Le <r0b...@gentoo.org> wrote:
> 20:57 <Ralf> How are these structures allocated?
> 20:58 <r0bertz> they are in a class, so should be 'new'ed
> 20:59 <Ralf> Presumably new is sitting on top of something like malloc.

The address returned by glibc malloc() should always be at least
8-byte aligned in 32-bit mode and 16-byte aligned in 64-bit mode.
Given that struct/class member layout cannot be changed at runtime
(BTW, some JVMs do this kind of on-the-fly layout optimization, but
the current version of gcc/g++ does not), I strongly believe that the
memory returned by the overloaded new operator is allocated from an
internal memory pool -- otherwise it would be hard to explain why the
object is sometimes placed on an 8-byte boundary and sometimes not.

I have trouble finding the correct method that overloads class
nsSVGGlyphFrame's new operator -- I think I need better tools than vi
& grep to deal with multiple inheritance!!
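
To illustrate how an internal pool could produce this symptom, here is a toy bump-pointer pool. It is purely hypothetical and not taken from the Firefox source. It hands out offsets instead of real memory so the arithmetic is easy to follow: if request sizes are only rounded up to 4 bytes, a block that follows a 148-byte block starts at an offset that is 4-byte aligned but not 8-byte aligned.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bump-pointer pool; not Firefox code.
class BumpPool {
public:
    BumpPool(std::size_t capacity, std::size_t granularity)
        : capacity_(capacity), granularity_(granularity), used_(0)
    {
        // The rounding trick below needs a power-of-two granularity.
        assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    }

    // Returns the offset of the new block inside the pool, or the pool
    // capacity when there is not enough room left.
    std::size_t allocate(std::size_t size)
    {
        const std::size_t rounded =
            (size + granularity_ - 1) & ~(granularity_ - 1);
        if (rounded > capacity_ - used_)
            return capacity_;
        const std::size_t offset = used_;
        used_ += rounded;
        return offset;
    }

private:
    std::size_t capacity_;
    std::size_t granularity_;
    std::size_t used_;
};
```

With BumpPool pool(1024, 4), allocating 148 bytes and then 8 bytes places the second block at offset 148, which is not a multiple of 8. With a granularity of 8 the second block lands at offset 152 instead. If the pool rounds to sizeof(void*), the first behaviour is what one would expect under an ABI with 4-byte pointers.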

Rayson

Zhang Le

Apr 29, 2009, 2:01:00 AM
to loongs...@googlegroups.com
On 00:24 Wed 29 Apr , Rayson Ho wrote:
>
> On Tue, Apr 28, 2009 at 11:35 PM, Zhang Le <r0b...@gentoo.org> wrote:
> > 20:57 <Ralf> How are these structures allocated?
> > 20:58 <r0bertz> they are in a class, so should be 'new'ed
> > 20:59 <Ralf> Presumably new is sitting on top of something like malloc.
>
> The address returned by glibc malloc() should always be at least
> 8-byte aligned in 32-bit mode and 16-byte aligned in 64-bit mode.
> Given that struct/class member layout could not be changed at runtime
> (BTW, some JVMs do this kind of on-the-fly layout optimizations, but
> not in the current version of gcc/g++), I strongly believe that the
> memory returned by the overloading new operator is allocated from an
> internal memory pool -- otherwise it would be hard to explain why the
> object sometimes is not placed at the 8-byte boundry while sometimes
> it is.
>
> I have trouble finding the correct method that overloads class
> nsSVGGlyphFrame's new operator -- I think I need better tools than vi
> & grep to deal with multiple inheritance!!

Actually, it is a placement new.

In function NS_NewSVGGlyphFrame() in layout/svg/base/src/nsSVGGlyphFrame.cpp:
return new (aPresShell) nsSVGGlyphFrame(aContext);

I added a printf to print the value returned by this function, and found that
this is exactly the cause of the misalignment.

aPresShell: 0x1172c678
newFrame: 0x111f2cd0
aPresShell: 0x1172c678
newFrame: 0x111f34a0
aPresShell: 0x1157c1b0
newFrame: 0x116ae6f0
aPresShell: 0x1157c1b0
newFrame: 0x115886b8
aPresShell: 0x1157c1b0
newFrame: 0x11588710
aPresShell: 0x1157c1b0
newFrame: 0x115887a4 <-- 4-byte aligned but not 8-byte aligned
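
The check applied by eye to the addresses above can be written as a one-line helper. This is only a sketch with a made-up name; it assumes the alignment is a power of two.

```cpp
#include <cassert>
#include <cstdint>

// True when addr is a multiple of alignment. The alignment must be a
// power of two so that the low bits can be tested with a mask.
bool is_aligned(std::uintptr_t addr, std::uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}
```

For the last address in the log, is_aligned(0x115887a4, 4) holds but is_aligned(0x115887a4, 8) does not, because the low three bits are 100. All the other newFrame values end in 0 or 8 and pass the 8-byte check.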

Rayson Ho

Apr 29, 2009, 11:02:48 AM
to loongs...@googlegroups.com
On Wed, Apr 29, 2009 at 1:01 AM, Zhang Le <r0b...@gentoo.org> wrote:
> Actually, it is a placement new.
>
> In function NS_NewSVGGlyphFrame() in layout/svg/base/src/nsSVGGlyphFrame.cpp:
> return new (aPresShell) nsSVGGlyphFrame(aContext);

Yes, "placement new" and "overloaded new operator" refer to the same thing.

Anyway, this is not the C++ runtime's "new". If you put a gdb
breakpoint in NS_NewSVGGlyphFrame(), and then step (s, not n) into
new, you should be able to find the actual function deep in the call
stack that returns the unaligned memory.
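
For readers who have not met this syntax, here is a small self-contained sketch of what an expression like new (aPresShell) nsSVGGlyphFrame(aContext) relies on: a class-level operator new that takes an extra argument. Arena, Frame and make_frame are hypothetical stand-ins, not the real Firefox classes, and this toy arena rounds every request up to 8 bytes so that double members stay aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical arena standing in for whatever the pres shell hands out.
class Arena {
public:
    void* allocate(std::size_t size)
    {
        // Round up so that every block starts on an 8-byte boundary.
        const std::size_t rounded = (size + 7) & ~static_cast<std::size_t>(7);
        assert(rounded <= sizeof(storage_) - used_);
        void* p = storage_ + used_;
        used_ += rounded;
        assert(reinterpret_cast<std::uintptr_t>(p) % 8 == 0);
        return p;
    }

private:
    alignas(8) unsigned char storage_[256];
    std::size_t used_ = 0;
};

// Hypothetical frame class with a double member.
struct Frame {
    int id;
    double value;

    Frame(int i, double v) : id(i), value(v) {}

    // The expression `new (arena) Frame(...)` passes `arena` here.
    static void* operator new(std::size_t size, Arena& arena)
    {
        return arena.allocate(size);
    }
    // Matching placement delete, used only if the constructor throws.
    static void operator delete(void*, Arena&) noexcept {}
    // Usual delete: the arena owns the memory, so nothing is freed.
    static void operator delete(void*) noexcept {}
};

Frame* make_frame(Arena& arena, int id, double value)
{
    return new (arena) Frame(id, value);
}
```

Stepping into the new expression in a debugger, as suggested above, would land in the class's operator new and from there in the arena's allocation routine, which is where any rounding decision is made.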

Rayson

Zhang Le

Apr 29, 2009, 11:35:36 AM
to loongs...@googlegroups.com
On 10:02 Wed 29 Apr , Rayson Ho wrote:
>
> On Wed, Apr 29, 2009 at 1:01 AM, Zhang Le <r0b...@gentoo.org> wrote:
> > Actually, it is a placement new.
> >
> > In function NS_NewSVGGlyphFrame() in layout/svg/base/src/nsSVGGlyphFrame.cpp:
> > return new (aPresShell) nsSVGGlyphFrame(aContext);
>
> Yes, "placement new" and "overloaded new operator" refer to the same thing.
>
> Anyway, this is not the C++ runtime's "new". If you put a gdb
> breakpoint in NS_NewSVGGlyphFrame(), and then step (s, not n) into
> new, you should be able to find the actual function deep in the call
> stack that returns the unaligned memory.

Yes.

Honestly, this is the first time I have seen this kind of new (I am almost a C++ idiot).
And I just learned that it is just an overloaded new operator.
http://glenmccl.com./nd_cmp.htm

I remember that you have mentioned this before. I am sorry I didn't pay enough
attention to it.

I just found this overloaded new's definition. I will study it first.

hashao

Apr 29, 2009, 11:51:56 PM
to loongson-dev


On Apr 29, 11:35 pm, Zhang Le <r0be...@gentoo.org> wrote:
> On 10:02 Wed 29 Apr     , Rayson Ho wrote:
>
>
>
> > On Wed, Apr 29, 2009 at 1:01 AM, Zhang Le <r0be...@gentoo.org> wrote:
> > > Actually, it is a placement new.
>
> > > In function NS_NewSVGGlyphFrame() in layout/svg/base/src/nsSVGGlyphFrame.cpp:
> > >  return new (aPresShell) nsSVGGlyphFrame(aContext);
>
> > Yes, "placement new" and "overloaded new operator" refer to the same thing.
>
> > Anyway, this is not the C++ runtime's "new". If you put a gdb
> > breakpoint in NS_NewSVGGlyphFrame(), and then step (s, not n) into
> > new, you should be able to find the actual function deep in the call
> > stack that returns the unaligned memory.
>
> Yes.
>
> Honestly, it is the first time I see this kind of new (I am almost a C++ idiot).
> And I just learned that it is just overloaded new operator.
> http://glenmccl.com./nd_cmp.htm
>
> I remember that you have mentioned this before. I am sorry I didn't pay enough
> attention to it.
>
> I just found this overloaded new's definition. I will study it first.
>

See http://blog.pavlov.net/2008/03/11/firefox-3-memory-usage/

Zhang Le

Apr 30, 2009, 12:16:26 AM
to loongs...@googlegroups.com

Forgive me for being blunt, but your point is?
Thanks for providing this link, anyway.

hashao

Apr 30, 2009, 1:20:55 AM
to loongson-dev


On Apr 30, 12:16 pm, Zhang Le <r0be...@gentoo.org> wrote:
>
> Forgive me for being blunt, but your point is?
> Thanks for providing this link, anyway.
>

Firefox 3.x replaces the system allocator with jemalloc to minimize
stack fragmentation.

Rayson Ho

Apr 30, 2009, 2:05:32 AM
to loongs...@googlegroups.com
On Thu, Apr 30, 2009 at 12:20 AM, hashao <has...@gmail.com> wrote:
> firefox 3.x replace system allocator with the jemalloc to minimize
> stack fragmentation.

Stack space is allocated at runtime by instructions generated
(logically moving the address in register $29 up or down) by the
compiler to store things like call frames, local variables, return
addresses, memory allocated via alloca(), etc... jemalloc could not
help with stack fragmentation.

And jemalloc does not change the alignment requirements expected by
most applications, otherwise lots of code would break. In
memory/jemalloc/jemalloc.c :

/* Minimum alignment of allocations is 2^QUANTUM_2POW_MIN bytes. */
# define QUANTUM_2POW_MIN 4

(2^4 = 16 bytes)
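
The arithmetic, plus a quick way to check what a given allocator actually returns, can be sketched as follows. The constant mirrors the jemalloc define quoted above; the function name is made up, and the check is only meaningful for requests at least as large as the alignment being tested, since some allocators use smaller alignment for tiny requests.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// From the quoted jemalloc source: the quantum is 2^4 = 16 bytes.
constexpr std::size_t kQuantum2PowMin = 4;
constexpr std::size_t kQuantum = std::size_t{1} << kQuantum2PowMin;
static_assert(kQuantum == 16, "2^4 is 16");

// Hypothetical probe: allocates `size` bytes with malloc and reports
// whether the result is aligned for any fundamental type.
bool malloc_is_max_aligned(std::size_t size)
{
    assert(size >= alignof(std::max_align_t));
    void* p = std::malloc(size);
    if (p == nullptr)
        return false;
    const bool ok =
        reinterpret_cast<std::uintptr_t>(p) % alignof(std::max_align_t) == 0;
    std::free(p);
    return ok;
}
```

A probe like this only tests the general-purpose allocator. It says nothing about memory carved out of a pool layered on top of it, which is where the remaining suspicion lies.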

Since Robert said that he found the overloaded new operator that
returns non 8-byte aligned memory, we should be patient and just wait
for his new findings. :-)

Rayson

Zhang Le

Apr 30, 2009, 2:36:11 AM
to loongs...@googlegroups.com

I may have found the answer. Now I am compiling the source.
Please stay tuned.

Zhang Le

Apr 30, 2009, 3:59:18 AM
to loongs...@googlegroups.com
On 14:36 Thu 30 Apr , Zhang Le wrote:
> On 01:05 Thu 30 Apr , Rayson Ho wrote:
> >
> > On Thu, Apr 30, 2009 at 12:20 AM, hashao <has...@gmail.com> wrote:
> > > firefox 3.x replace system allocator with the jemalloc to minimize
> > > stack fragmentation.
> >
> > Stack space is allocated at runtime by instructions generated
> > (logically moving the address in register $29 up or down) by the
> > compiler to store things like call frames, local variables, return
> > addresses, memory allocated via alloca(), etc... jemalloc could not
> > help with stack fragmentation.
> >
> > And jemalloc does not change the alignment requirements expected by
> > most applications, otherwise lots of code would break. In
> > memory/jemalloc/jemalloc.c :
> >
> > /* Minimum alignment of allocations is 2^QUANTUM_2POW_MIN bytes. */
> > # define QUANTUM_2POW_MIN 4
> >
> > (2^4 = 16 bytes)
> >
> > Since Robert said that he found the overloaded new operator that
> > returns non 8-byte aligned memory, we should be patient and just wait
> > for his new findings. :-)
>
> I may have found the answer. Now I am compiling the source.
> Please stay tuned.

And it is working!

http://www.gentoo-cn.org/gitweb/?p=loongson.git;a=blob_plain;f=net-libs/xulrunner/files/xulrunner-mips-bus-error.patch;hb=e2eca4738cce8cb2d8adb78164794f982a388862

With this patch, there are no more unaligned ldc1/sdc1 accesses, so the kernel
patch is not needed.
