I think it is doable, since Lexra has done that before. But there are
at least 2 complications:
1) patent related
- a quick patent search will return tons of results on efficient
handling of unaligned memory load/store operations in software &
hardware.
- (US patent 4,814,976 owned by MIPS technologies Inc expired in 2006
(1986+20), so that's good!)
http://www.mba.intercol.edu/Entrepreneurship/UT Computer Science Course/mips-vs-lexra-MPR-dec1999.pdf
http://www.google.com/patents?id=6egZAAAAEBAJ&dq=4814976
2) Another issue is to identify whether the load/store is dealing with
a valid memory location, or it is an invalid pointer. Also, I *think*
we should be concerned about unaligned memory accesses in branch delay
slots. We will need to check what the processor (or the OS) tells us
when the instruction in the delay slot is causing the bus error... and
what if control is not passed from the branch before the load/store
instruction, but for example an unconditional jump that jumps to the
delay slot?? All those can affect the restart address after the signal
handler finishes execution.
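On MIPS, at least, the processor does report the delay-slot case: when an exception is taken on an instruction in a branch delay slot, the BD bit (bit 31) of the CP0 Cause register is set and EPC points at the branch rather than at the load/store. An instruction reached by jumping directly to it is not executing as a delay slot, so BD would be expected to be clear in that case. A minimal sketch of how a handler might use this, assuming it is handed the raw EPC and Cause values (both function names are hypothetical, and emulating the branch to find the real restart address is left out):

```c
#include <stdint.h>

/* Cause.BD (bit 31): exception was taken in a branch delay slot. */
#define CAUSE_BD (UINT32_C(1) << 31)

/* Hypothetical helper: address of the instruction that actually faulted. */
static uint32_t faulting_insn_addr(uint32_t epc, uint32_t cause)
{
    /* In a delay slot, EPC holds the branch; the load/store is the next word. */
    return (cause & CAUSE_BD) ? epc + 4 : epc;
}

/*
 * Hypothetical helper: compute where to resume after emulating the access.
 * Returns 0 and fills *next_pc in the simple case; returns -1 when the
 * branch itself would have to be emulated to find the restart address.
 */
static int restart_after_emulation(uint32_t epc, uint32_t cause,
                                   uint32_t *next_pc)
{
    if (cause & CAUSE_BD)
        return -1;          /* restart address depends on the branch outcome */
    *next_pc = epc + 4;     /* just skip the emulated load/store */
    return 0;
}
```

The -1 path is where the complication described above lives: the handler has to decode the branch and decide whether it is taken before it can resume.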
SPARC has similar issues, but I could not find the code for handling
unaligned memory accesses at runtime. For compile time handling of
unaligned memory access in Solaris, see:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libbc/libc/crt/sparc/misalign.s
Rayson
>
> --
> Zhang, Le
> Gentoo/Loongson Developer
> http://zhangle.is-a-geek.org
> 0260 C902 B8F8 6506 6586 2B90 BC51 C808 1E4E 2973
>
I know your point is to avoid any potential lawsuit, but I really don't think
this is something to worry about.
If anyone infringes the patent, it would be CAS, ICT and ST: they implemented
ldl/ldr and sdl/sdr in hardware in the first place. The MIPS world can't afford to go
through another lawsuit, and I think MTI knows this.
Actually, even if it really matters, I don't give a shit.
All I care about is making Firefox work as expected.
So please don't bring this up any more. We all know it.
> 2) Another issue is to identify whether the load/store is dealing with
> a valid memory location, or it is an invalid pointer. Also, I *think*
> we should be concerned about unaligned memory accesses in branch delay
> slots. We will need to check what the processor (or the OS) tells us
> when the instruction in the delay slot is causing the bus error... and
> what if control is not passed from the branch before the load/store
> instruction, but for example an unconditional jump that jumps to the
> delay slot?? All those can affect the restart address after the signal
> handler finishes execution.
>
> SPARC has similar issues, but I could not find the code for handling
> unaligned memory accesses at runtime. For compile time handling of
> unaligned memory access in Solaris, see:
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/lib/libbc/libc/crt/sparc/misalign.s
Thanks, I will try whatever it takes to make it work.
No, the reason I brought it up is that there are many existing
patents, owned by MIPS and other hardware/software companies, on
unaligned loads/stores. The Lexra case is just one example of how
a software routine can infringe a hardware patent.
If we are not careful and add such a routine to the MIPS port, then not
ICT, but you and everyone else who distributes, or even just uses, MIPS
Linux could be in trouble.
Lawsuits like those can take years. Remember when the SCO vs. IBM
lawsuit started?
Rayson
Thanks for your kind reminder.
However, I just found that the Linux kernel already has an unaligned access
emulation implementation.
http://lxr.linux.no/linux+v2.6.29/arch/mips/kernel/unaligned.c
So I guess it should be OK. Otherwise, many people would have been in
trouble already.
And my unaligned ldc1/sdc1 emulation patch works now.
http://repo.or.cz/w/linux-2.6/linux-loongson.git?a=commitdiff;h=ebf014b4782b57c98be20d849fe6de3748ac6d76;hp=9c54bef784d1401aa60a2a61f14665d80900361d
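To illustrate what such an emulation has to recognise first, here is a minimal decoding sketch. It is not taken from the patch above; the struct and function names are made up for illustration. It relies only on the standard MIPS I-type layout (6-bit opcode, 5-bit base register, 5-bit ft register, 16-bit signed offset) and the opcodes 0x35 for ldc1 and 0x3d for sdc1:

```c
#include <stdint.h>

#define OP_LDC1 0x35u   /* load doubleword to FP register */
#define OP_SDC1 0x3du   /* store doubleword from FP register */

/* Hypothetical holder for the decoded fields of an I-type instruction. */
struct fp_mem_insn {
    unsigned opcode;    /* bits 31..26 */
    unsigned base;      /* bits 25..21: GPR holding the base address */
    unsigned ft;        /* bits 20..16: FP register to load/store */
    int16_t  offset;    /* bits 15..0, sign-extended displacement */
};

/* Split a 32-bit instruction word into its I-type fields. */
static struct fp_mem_insn decode_i_type(uint32_t insn)
{
    struct fp_mem_insn d;
    d.opcode = insn >> 26;
    d.base   = (insn >> 21) & 0x1f;
    d.ft     = (insn >> 16) & 0x1f;
    d.offset = (int16_t)(uint16_t)(insn & 0xffff);
    return d;
}

/* Nonzero if the instruction word is an ldc1 or sdc1. */
static int is_ldc1_or_sdc1(uint32_t insn)
{
    unsigned op = insn >> 26;
    return op == OP_LDC1 || op == OP_SDC1;
}
```

For example, the word 0xD4820008 decodes as ldc1 $f2, 8($4). The effective address is the value of the base register plus the offset, and an emulator would then move the eight bytes with accesses that tolerate misalignment.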
Besides this, cairo can't be compiled with -O2 enabled.
Now N32 Firefox's score in the Acid3 test reaches 70/100. Previously it
crashed at around 51/100.
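A little test program for this patch:
#include <stdio.h>
int main(void)
{
    char buf[12];
    /* Deliberately misaligned: buf + 1 is normally not 8-byte aligned. */
    double *d = (double *)&buf[1];
    *d = 0xff;              /* unaligned double store */
    printf("%f\n", *d);     /* unaligned double load */
    return 0;
}
Without this patch, this program gets a bus error.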
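For comparison, a user-space program that controls its own source can avoid the trap entirely by copying through memcpy instead of dereferencing a misaligned double pointer. This is only a sketch of the general technique, not a proposed Firefox fix, and the helper names are hypothetical:

```c
#include <string.h>

/* Read a double from a possibly misaligned address. */
static double load_double_unaligned(const void *p)
{
    double v;
    memcpy(&v, p, sizeof v);    /* byte-wise copy, no alignment requirement */
    return v;
}

/* Write a double to a possibly misaligned address. */
static void store_double_unaligned(void *p, double v)
{
    memcpy(p, &v, sizeof v);
}
```

With these helpers, the test above would call store_double_unaligned(&buf[1], 0xff) and print load_double_unaligned(&buf[1]) without relying on any kernel emulation.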
Good!!
(And thanks for pointing me to the Emulating Instructions section in
See MIPS Run.)
Are we going to fix the unaligned memory accesses in Firefox and other
programs, or will we just leave them once we have this change in the
kernel?
One minor thing, in your patch:
+ /* Cannot handle 64-bit instructions in 32-bit kernel */
+ goto sigill;
I think load/store FP-double can execute on a 32-bit processor (See
MIPS Run, 7.5 Floating-Point Registers), so it should go to sigbus
instead of sigill.
Rayson
It doesn't matter, actually.
Ralf refuses to accept this patch.
This happened on #mipslinux@freenode:
20:48 <Ralf> Why not fix firefox?
20:52 <Ralf> is it hard?
20:56 <r0bertz> tried many way, can't fix it, it just can't be get aligned
20:57 <r0bertz> maybe the problem is in gcc
20:57 <Ralf> I doubt it.
20:57 <Ralf> How are these structures allocated?
20:58 <r0bertz> they are in a class, so should be 'new'ed
20:59 <Ralf> Presumably new is sitting on top of something like malloc.
20:59 <r0bertz> yup, allocated on heap
21:00 <Ralf> And that allocator in new will put some information to near the
beginning of th allocated header.
21:01 <r0bertz> so?
21:02 <Ralf> Presumably some math there is a bit off.
21:03 <Ralf> Btw, the emulated loads and stores should be 1000x - 2000x slower.
21:03 <Ralf> That's why I'm objecting so hard.
--
Zhang, Le
Gentoo/Loongson Developer
The address returned by glibc malloc() should always be at least
8-byte aligned in 32-bit mode and 16-byte aligned in 64-bit mode.
Given that struct/class member layout cannot be changed at runtime
(BTW, some JVMs do this kind of on-the-fly layout optimization, but
the current version of gcc/g++ does not), I strongly believe that the
memory returned by the overloaded new operator is allocated from an
internal memory pool -- otherwise it would be hard to explain why the
object is sometimes placed on an 8-byte boundary and sometimes
is not.
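To make the memory-pool hypothesis concrete, here is a minimal sketch of a bump-pointer pool. It is purely hypothetical and is not the Mozilla allocator; the names pool, round_up and pool_alloc are invented for illustration. It shows how padding requests to only 4 bytes can hand out an address that is 4-byte but not 8-byte aligned, and how padding to 8 bytes avoids that:

```c
#include <stddef.h>

/* Hypothetical bump-pointer pool; NOT the actual Mozilla allocator. */
struct pool {
    unsigned char *base;    /* backing storage, assumed 8-byte aligned */
    size_t used;            /* bytes handed out so far */
    size_t cap;             /* total size of the backing storage */
};

/* Round n up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Hand out size bytes, padding each request to a multiple of align. */
static void *pool_alloc(struct pool *p, size_t size, size_t align)
{
    size_t padded = round_up(size, align);
    if (padded < size || padded > p->cap - p->used)
        return NULL;        /* overflow or pool exhausted */
    void *ret = p->base + p->used;
    p->used += padded;
    return ret;
}
```

With align set to 4, a 148-byte request followed by any other request places the second object at offset 148, which is not a multiple of 8. With align set to 8, the first request is padded to 152 bytes and the second object stays 8-byte aligned.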
I have trouble finding the correct method that overloads class
nsSVGGlyphFrame's new operator -- I think I need better tools than vi
& grep to deal with multiple inheritance!!
Rayson
Actually, it is a placement new.
In function NS_NewSVGGlyphFrame() in layout/svg/base/src/nsSVGGlyphFrame.cpp:
return new (aPresShell) nsSVGGlyphFrame(aContext);
I added printf to print the returned value of this function. I found this was
exactly the cause of the misalignment.
aPresShell: 0x1172c678
newFrame: 0x111f2cd0
aPresShell: 0x1172c678
newFrame: 0x111f34a0
aPresShell: 0x1157c1b0
newFrame: 0x116ae6f0
aPresShell: 0x1157c1b0
newFrame: 0x115886b8
aPresShell: 0x1157c1b0
newFrame: 0x11588710
aPresShell: 0x1157c1b0
newFrame: 0x115887a4 <-- 4-byte aligned but not 8-byte aligned
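The alignment of each logged address can be checked mechanically by masking the low bits. A small sketch (the helper name is hypothetical):

```c
#include <stdint.h>

/* Nonzero if addr is a multiple of align (align must be a power of two). */
static int is_aligned(uintptr_t addr, uintptr_t align)
{
    return (addr & (align - 1)) == 0;
}
```

Applied to the log above, is_aligned(0x11588710, 8) is true, while is_aligned(0x115887a4, 8) is false and is_aligned(0x115887a4, 4) is true, since 0xa4 = 164 leaves a remainder of 4 when divided by 8.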
Yes, the "placement new" here is just a call to an overloaded new operator
(one that takes aPresShell as an extra argument).
Anyway, this is not the C++ runtime's "new". If you put a gdb
breakpoint in NS_NewSVGGlyphFrame(), and then step (s, not n) into
new, you should be able to find the actual function deep in the call
stack that returns the unaligned memory.
Rayson
Yes.
Honestly, this is the first time I have seen this kind of new (I am almost a C++ idiot).
And I just learned that it is simply an overloaded new operator.
http://glenmccl.com./nd_cmp.htm
I remember that you mentioned this before. I am sorry I didn't pay enough
attention to it.
I just found the definition of this overloaded new. I will study it first.
Stack space is allocated at runtime by compiler-generated instructions
(which logically move the address in register $29, the stack pointer, up
or down) to hold things like call frames, local variables, return
addresses, memory allocated via alloca(), etc. jemalloc cannot
help with stack fragmentation.
And jemalloc does not change the alignment requirements expected by
most applications, otherwise lots of code would break. In
memory/jemalloc/jemalloc.c :
/* Minimum alignment of allocations is 2^QUANTUM_2POW_MIN bytes. */
# define QUANTUM_2POW_MIN 4
(2^4 = 16 bytes)
Since Robert said that he found the overloaded new operator that
returns non 8-byte aligned memory, we should be patient and just wait
for his new findings. :-)
Rayson
And it is working!
With this patch, there will be no unaligned ldc1/sdc1, so the kernel patch is
not needed.