On 2014-06-25, Andrew Falanga <af30...@gmail.com> wrote:
> Hi everyone,
>
> Some code that my team produced generates SIGBUS occasionally (not at all at
> regular intervals). Since our release code is optimized, I decided to
> deliver an unoptimized build to the customer for a core file which would be
> easier to diagnose (or so was the theory). Well, the debug build didn't give
> an error so the theory was the optimization flags we were using were not good
> (is -Os and we changed to -O2 in gcc). The problem just happened again.
Optimization is usually not the root cause of a bug; but it can change
the behavior so that hidden bugs are revealed. Or possibly vice versa.
Basically, if there is a defect in the program, different optimization settings
change the conditions under which the defect manifests itself.
On architectures that enforce alignment, SIGBUS denotes alignment exceptions.
A SIGBUS also occurs when a memory mapping extends beyond the end of the
underlying file and a page past the end of the file is accessed.
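To see the mmap case in isolation, here is a minimal sketch, assuming Linux
and a writable /tmp. The helper name sigbus_past_eof and the temporary file
name are made up for illustration. It maps two pages of a one-page file and
touches the second page; the handler jumps back out so the program can report
what happened instead of dying.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;

/* Jump back to the sigsetjmp point instead of returning into the fault. */
static void past_eof_handler(int sig)
{
    (void)sig;
    siglongjmp(bus_env, 1);
}

/* Hypothetical helper: returns 1 if touching a mapped page that lies past
   the end of the file raised SIGBUS, 0 if the access succeeded, and -1 if
   the setup failed. */
int sigbus_past_eof(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";  /* assumed scratch location */
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    /* The file is one page long, but the mapping covers two pages. */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }
    volatile char *p = mmap(NULL, 2 * page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = past_eof_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) {
        munmap((void *)p, 2 * page);
        return -1;
    }

    int result;
    if (sigsetjmp(bus_env, 1) == 0) {
        char c = p[0];   /* first page is backed by the file: fine */
        c = p[page];     /* second page is past end of file */
        (void)c;
        result = 0;
    } else {
        result = 1;      /* we got here via the SIGBUS handler */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap((void *)p, 2 * page);
    return result;
}
```

The same thing can happen in real code when another process truncates a file
while you still have it mapped, so a mapping that was valid at mmap time is
not proof that this cause is ruled out.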
And of course, something can always raise SIGBUS by accident.
    int uninited_var; /* contains garbage equivalent to SIGBUS */
    raise(uninited_var);
> for the assembler code, I've even produced code which makes a bus error all
> the time). The problem is, as I understand it, the x86 architecture protects
> against this unless you specifically embed assembler to enable the trap
> (which we're not doing). Further more, the one core file I've examined which
> came from SIGBUS showed all pointers having DWORD aligned addresses (this was
> on a 32-bit Linux build).
So alignment is ruled out. If you can rule out mmap'ed files as being the
cause, then the next thing to investigate is various Linux-kernel-specific
SIGBUS situations.
A good way to proceed here is to hunt down kernel code that possibly generates
SIGBUS and add a printk and a dump_stack() call there.
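A less invasive first step, if patching the customer's kernel is not an
option, is to have the process itself report why the signal was sent. This is
only a sketch: the names bus_code_name and install_sigbus_reporter are my own,
and the si_code values are the ones POSIX defines for SIGBUS. On Linux I would
expect BUS_ADRALN for alignment traps and typically BUS_ADRERR for the
mmap-past-end-of-file case, but treat that mapping as something to verify on
your kernel.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Map the POSIX-defined SIGBUS si_code values to printable names. */
const char *bus_code_name(int code)
{
    switch (code) {
    case BUS_ADRALN: return "BUS_ADRALN (invalid address alignment)";
    case BUS_ADRERR: return "BUS_ADRERR (nonexistent physical address)";
    case BUS_OBJERR: return "BUS_OBJERR (object-specific hardware error)";
    default:         return "other (for example sent by kill or raise)";
    }
}

/* Write a string to stderr using only write(), which is signal-safe. */
static void put(const char *s)
{
    ssize_t n = write(STDERR_FILENO, s, strlen(s));
    (void)n;
}

/* Print a value in hex without going through stdio. */
static void put_hex(uintptr_t v)
{
    char buf[2 + 2 * sizeof v + 1];
    char *p = buf + sizeof buf - 1;

    *p = '\0';
    do {
        *--p = "0123456789abcdef"[v & 0xf];
        v >>= 4;
    } while (v != 0);
    *--p = 'x';
    *--p = '0';
    put(p);
}

static void report_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    put("SIGBUS: si_code = ");
    put(bus_code_name(info->si_code));
    put(", si_addr = ");
    put_hex((uintptr_t)info->si_addr);
    put("\n");

    /* Restore the default action and re-raise, so that the process still
       terminates and a core file is still produced. */
    signal(sig, SIG_DFL);
    raise(sig);
}

/* Hypothetical installer: call once early in main(). Returns 0 on success. */
int install_sigbus_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = report_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

The faulting address in si_addr can then be compared against /proc/<pid>/maps
or the core file to see whether it falls in a file mapping, a device mapping,
or near a stack.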
Are you accessing memory-mapped hardware? The "fault" virtual function
in a "struct vm_operations_struct" can return VM_FAULT_SIGBUS, which the
kernel will translate to a SIGBUS. Numerous places in the kernel return
this value.
Oh, and there is one situation in the kernel that looks like it can trigger a
SIGBUS: when a stack crashes into another memory mapping. Normally, the main
thread's stack will not crash into anything when you have runaway recursion:
it hits the process stack limit, and that's a SIGSEGV. But it looks like there
is code in Linux which turns the crashing situation into a SIGBUS. See the
do_anonymous_page function in mm/memory.c, where it calls
check_stack_guard_page.
Still, that does not give rise to easy theories about -Os versus -O2.