.raSearch in STACK WIN programs


Benjamin Smedberg

Feb 3, 2012, 4:21:30 PM
to google-br...@googlegroups.com
We just switched Firefox to use MSVC2010 and we've started seeing much
worse backtraces from the stack walker. I've done some investigating and
discovered at least one problem: the symbol files produced from these
binaries have stack walking programs which use a new builtin:

STACK WIN 4 119e 17 0 0 0 4 0 0 1 $T0 .raSearch = $eip $T0 ^ = $esp $T0
4 + = $23 $T0 4 - ^ =

Breakpad has .raSearchStart as a builtin directive but not .raSearch. I
have been combing through the MS docs and haven't found anything
explaining what .raSearch is. Has anyone else been looking at this or
have ideas on it?

Also I'm seeing programs such as this:

STACK WIN 4 7556 179 20 0 8 c a4 0 1 $T1 .raSearch = $T0 $T1 4 - 8 @ =
$ebp $T1 4 - ^ = $eip $T1 ^ = $esp $T1 4 + = $20 $T0 168 - ^ = $23 $T0
172 - ^ = $24 $T0 176 - ^ =

This contains the operator "@", which is new, unimplemented in breakpad,
and it isn't obvious to me what this operator is supposed to do.
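
For readers following along, here is a minimal Python sketch of how one of these STACK WIN lines splits into fields. The field names follow my understanding of the Breakpad symbol file layout (ten hexadecimal fields after "STACK WIN", then the program string); the class and function names are made up for illustration and are not Breakpad APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StackWinRecord:
    frame_type: int
    rva: int
    code_size: int
    prolog_size: int
    epilog_size: int
    parameter_size: int
    saved_register_size: int
    local_size: int
    max_stack_size: int
    program: Optional[str]  # None when the record carries no program string


def parse_stack_win(line: str) -> StackWinRecord:
    """Split one STACK WIN line into its fields (numbers are hexadecimal)."""
    parts = line.split(None, 12)
    if len(parts) < 13 or parts[:2] != ["STACK", "WIN"]:
        raise ValueError("not a STACK WIN record: %r" % line)
    numbers = [int(field, 16) for field in parts[2:12]]
    has_program = numbers.pop()  # tenth numeric field: 1 if a program follows
    program = " ".join(parts[12].split()) if has_program else None
    return StackWinRecord(*numbers, program=program)


if __name__ == "__main__":
    record = parse_stack_win(
        "STACK WIN 4 119e 17 0 0 0 4 0 0 1 "
        "$T0 .raSearch = $eip $T0 ^ = $esp $T0 4 + = $23 $T0 4 - ^ ="
    )
    print(record)
```

The program string itself is postfix: operands come first, and "=" assigns the value on top of the stack to the name beneath it.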

--BDS

Jim Blandy

Feb 4, 2012, 12:20:07 AM
to google-br...@googlegroups.com

I don't remember where I came across this, but I think @ is a round-to-multiple-of operator. I don't know if it rounds up or down, though. No idea on .raSearch.

--
You received this message because you are subscribed to the Google Groups "google-breakpad-dev" group.
To post to this group, send email to google-breakpad-dev@googlegroups.com.
To unsubscribe from this group, send email to google-breakpad-dev+unsub...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-breakpad-dev?hl=en.

Ted Mielczarek

Feb 5, 2012, 4:01:02 PM
to google-br...@googlegroups.com
On Sat, Feb 4, 2012 at 12:20 AM, Jim Blandy <ji...@red-bean.com> wrote:
> I don't remember where I came across this, but I think @ is a
> round-to-multiple-of operator. I don't know if it rounds up or down, though.
> No idea on .raSearch.

@ seems to be an alignment operator. From a small test program I compiled:

STACK WIN 4 11c0 85 1c 0 0 8 110 0 1 $T1 .raSearch = $T0 $T1 4 - 8 @
= $ebp $T1 4 - ^ = $eip $T1 ^ = $esp $T1 4 + = $23 $T0 276 - ^ = $24
$T0 280 - ^ =

The prologue looks like:
00E211C0 55 push ebp
00E211C1 8B EC mov ebp,esp
00E211C3 83 E4 F8 and esp,0FFFFFFF8h
00E211C6 81 EC 10 01 00 00 sub esp,110h
00E211CC A1 00 30 E2 00 mov eax,dword ptr [___security_cookie (0E23000h)]
00E211D1 33 C4 xor eax,esp
00E211D3 89 84 24 0C 01 00 00 mov dword ptr [esp+10Ch],eax
00E211DA 56 push esi
00E211DB 57 push edi

so that "and esp,0FFFFFFF8h" matches nicely with the "8 @" in there. I
have a patch that adds this to the stackwalker.
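
A small Python sketch of that reading of "@", assuming it rounds down the way the "and esp,0FFFFFFF8h" instruction does. The function name and the sample stack pointer value are made up for illustration; this is not the stackwalker patch itself.

```python
def align_down(value: int, alignment: int) -> int:
    """Round value down to a multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Clearing the low bits is what `and esp,0FFFFFFF8h` does for alignment 8.
    return value & ~(alignment - 1) & 0xFFFFFFFF


if __name__ == "__main__":
    esp = 0x0012FF44  # hypothetical stack pointer, not taken from the minidump
    print(hex(align_down(esp, 8)), hex(esp & 0xFFFFFFF8))
```

Under this reading, "$T1 4 - 8 @" in the program computes the same aligned address that the prologue leaves in esp right after the "and".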

I've been doing a combination of disassembly and looking at the
program string to try to sort out .raSearch, but I'm not entirely
sure. It seems to just be "the start of the frame", but I'm not sure
of the precise value it wants.

-Ted

Jim Blandy

Feb 5, 2012, 4:46:22 PM
to google-br...@googlegroups.com

One way to reverse-engineer this might be to take some crashes with stacks we unwound badly, and edit the .sym files until the unwind looks better. That's a quick edit-test cycle, and you'd figure out what the operators actually mean.


Ted Mielczarek

Feb 6, 2012, 7:07:35 AM
to google-br...@googlegroups.com
On Sun, Feb 5, 2012 at 4:46 PM, Jim Blandy <ji...@red-bean.com> wrote:
> One way to reverse-engineer this might be to take some crashes with stacks
> we unwound badly, and edit the .sym files until the unwind looks better.
> That's a quick edit-test cycle, and you'd figure out what the operators
> actually mean.

I'm trying to do something similar: I have a small test program
(mentioned previously) compiled with VC++ 2010, whose unwind data
contains the @ operator and the .raSearch builtin. I generated a
minidump from the VS debugger, and have also been studying the
disassembly. The minidump, symbol files, and a dump of the assembly
for the function at the top of the stack are here, if you're
interested:
http://people.mozilla.com/~tmielczarek/unwindtest.tar.bz2

The test program also calls into DbgHelp's StackWalk64 function, so
my fallback plan is to poke around inside DbgHelp and see if
I can find out where it's getting .raSearch from.

-Ted

Ted Mielczarek

Feb 13, 2012, 9:00:47 AM
to google-br...@googlegroups.com
I put a WIP patch up at
https://bugzilla.mozilla.org/show_bug.cgi?id=726570. I'm pretty sure
it's not correct, but it's a starting point.

-Ted

Ted Mielczarek

Feb 14, 2012, 11:11:57 AM
to google-br...@googlegroups.com
On Fri, Feb 3, 2012 at 4:21 PM, Benjamin Smedberg <benj...@smedbergs.us> wrote:
> We just switched Firefox to use MSVC2010 and we've started seeing much worse
> backtraces from the stack walker. I've done some investigating and
> discovered at least one problem: the symbol files produced from these
> binaries have stack walking programs which use a new builtin:
>
> STACK WIN 4 119e 17 0 0 0 4 0 0 1 $T0 .raSearch = $eip $T0 ^ = $esp $T0 4 +
> =  $23 $T0 4 - ^ =

Only tangentially related, but in the course of investigating this we
figured out what those $23 variables are: callee-saved registers. I
filed an issue on possibly recovering them, although it's not useful for
stack unwinding:
http://code.google.com/p/google-breakpad/issues/detail?id=466
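
As a reading aid, a short Python sketch that rewrites the numbered variables into register names. The numbering is an assumption on my part: it follows the CodeView x86 register enumeration as I understand it (20 = ebx, 23 = esi, 24 = edi), which is consistent with the push ebx / push esi / push edi sequence in the listing later in this thread. The table and function names are hypothetical.

```python
# Assumed CodeView x86 register numbers; only the general-purpose ones.
CV_X86_REGISTERS = {
    17: "$eax", 18: "$ecx", 19: "$edx", 20: "$ebx",
    21: "$esp", 22: "$ebp", 23: "$esi", 24: "$edi",
}


def name_registers(program: str) -> str:
    """Replace tokens such as $23 with a register name; leave the rest alone."""
    out = []
    for token in program.split():
        if token.startswith("$") and token[1:].isdigit():
            token = CV_X86_REGISTERS.get(int(token[1:]), token)
        out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(name_registers(
        "$T0 .raSearch = $eip $T0 ^ = $esp $T0 4 + = $23 $T0 4 - ^ ="
    ))
```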

-Ted

Benjamin Smedberg

Feb 14, 2012, 12:02:51 PM
to google-br...@googlegroups.com, Ted Mielczarek
After poking at several different frames and their stack programs in an
MSVC10 optimized build, I think that the way we are calculating .raSearch is
wrong in some important cases, and that .raSearchStart may also be
wrong, although I'm less sure about this.

The stackwalking frame that leads me to this conclusion is:

CrashReporter::CreatePairedMinidumps

STACK WIN 4 6c5e39 22d 26 0 14 c c4 0 1 $T1 .raSearch = $T0 $T1 4 - 8 @
= $ebp $T1 4 - ^ = $eip $T1 ^ = $esp $T1 4 + = $20 $T0 200 - ^ = $23
$T0 204 - ^ = $24 $T0 208 - ^ =

Translated:

$T1 = .raSearch
$T0 = ($T1 - 4) @ 8
$ebp = *($T1 - 4)
$eip = *$T1
$esp = $T1 + 4
... other registers relative to $T0

Incoming:

eip = 0x15c5fce
esp = 0x12c9c0
ebp = 0x12caa0

Outgoing should be:

eip = 0x015c7156
esp = 0x0012caa8
ebp = 0x0012cac8
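
The translation above can be checked mechanically. Below is a minimal Python sketch of a postfix evaluator, not Breakpad's implementation: the function name is made up, "@" is assumed to align down, and the two memory words are reconstructed from the expected outgoing values rather than read from the real minidump. The program is cut off after $esp because the stack slots for $20, $23 and $24 are not given here.

```python
def evaluate(program, variables, read_u32):
    """Evaluate a STACK WIN postfix program; returns the updated variables."""
    variables = dict(variables)
    stack = []

    def pop_value():
        token = stack.pop()
        if isinstance(token, int):
            return token
        if token in variables:
            return variables[token]
        return int(token)  # literals in these programs are decimal

    binary = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "@": lambda a, b: a & ~(b - 1),  # assumed: align a down to b
    }
    for token in program.split():
        if token == "=":
            value = pop_value()
            name = stack.pop()  # the assignment target stays a name
            variables[name] = value
        elif token == "^":
            stack.append(read_u32(pop_value()))  # dereference
        elif token in binary:
            rhs = pop_value()
            lhs = pop_value()
            stack.append(binary[token](lhs, rhs) & 0xFFFFFFFF)
        else:
            stack.append(token)
    return variables


if __name__ == "__main__":
    program = ("$T1 .raSearch = $T0 $T1 4 - 8 @ = $ebp $T1 4 - ^ = "
               "$eip $T1 ^ = $esp $T1 4 + =")
    # Reconstructed stack contents: saved ebp, then the return address.
    memory = {0x0012CAA0: 0x0012CAC8, 0x0012CAA4: 0x015C7156}
    result = evaluate(program, {".raSearch": 0x0012CAA4}, memory.__getitem__)
    for name in ("$eip", "$esp", "$ebp"):
        print(name, hex(result[name]))
```

With .raSearch set to 0x0012CAA4 the program yields the outgoing eip, esp and ebp listed above, so the whole problem reduces to choosing the right starting value for .raSearch.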

The function prolog is:

bool
CreatePairedMinidumps(ProcessHandle childPid,
                      ThreadId childBlamedThread,
                      nsAString* pairGUID,
                      nsILocalFile** childDump,
                      nsILocalFile** parentDump)
{
015C5E39 push ebp
015C5E3A mov ebp,esp
015C5E3C and esp,0FFFFFFF8h
015C5E3F sub esp,0C4h

Therefore .raSearch should be 0x0012CAA4. .raSearchStart is currently
0x0012CAA0. The difference is definitely caused by the 8-byte stack
alignment (and esp,0FFFFFFF8h). The @ operator is the
align-to-lower operator, which performs the same function in a
stackwalking program.

I believe that the "correct" way to set .raSearch in this case is to
check whether the function uses a frame pointer
(IDiaFrameData->get_allocatesBasePointer); if it does, .raSearch should
be EBP - 4. If there is no frame pointer, then the existing math (ESP +
callee-parameter-size + locals-size + saved-register-size) is correct.
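
The two candidate computations, sketched in Python. The function names are hypothetical, and the frame-pointer case uses EBP + 4 (the slot just above the saved frame pointer), which is what the follow-up messages in this thread settle on.

```python
def ra_search_from_sizes(esp, callee_param_size, local_size,
                         saved_register_size):
    """The existing math: skip everything this frame put on the stack."""
    total = esp + callee_param_size + local_size + saved_register_size
    return total & 0xFFFFFFFF


def ra_search_from_frame_pointer(ebp):
    """With a frame pointer, the return address sits above the saved EBP."""
    return (ebp + 4) & 0xFFFFFFFF


if __name__ == "__main__":
    # Incoming ebp of the CreatePairedMinidumps frame quoted above.
    print(hex(ra_search_from_frame_pointer(0x0012CAA0)))
```

For the incoming ebp of 0x12caa0 the frame-pointer form gives the 0x0012CAA4 that this frame needs.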

However, the current breakpad symbol file format does not save this
information, because it was apparently assumed that it doesn't matter
whether a function has a frame pointer if that function also has a
stackwalking program.

I am currently working on the following plan: add a new line type to
symbol files STACK WIN2 which saves the "allocates frame pointer"
information in all cases. Make the new processor accept the old STACK
WIN line assuming that the function does not allocate a base pointer.
This way caches of existing symbol files will not need to be
regenerated. If appropriate, I may try to add some actual search
semantics to .raSearch or .raSearchStart so that the old format
generates better results (it would search only up to 3 words "up" the
stack to check for doubleword and quadword alignment).

--BDS

Jim Blandy

Feb 14, 2012, 1:30:21 PM
to google-br...@googlegroups.com
This is great fun. Reading:

> $T1 = .raSearch
> $T0 = ($T1 - 4) @ 8
> $ebp = *($T1 - 4)
> $eip = *$T1
> $esp = $T1 + 4
> ... other registers relative to $T0

I was thinking, "This makes no sense at all --- how can you have some
things relative to an unaligned address, and others relative to an
aligned address? What, do they align the stack pointer and then save
stuff?"

And indeed:

> 015C5E3C  and         esp,0FFFFFFF8h

Jim Blandy

Feb 14, 2012, 1:42:45 PM
to google-br...@googlegroups.com
On Tue, Feb 14, 2012 at 9:02 AM, Benjamin Smedberg
<benj...@smedbergs.us> wrote:
> I believe that the "correct" way to set .raSearch in this case is to check
> whether the function uses a frame pointer
> (IDiaFrameData->get_allocatesBasePointer), then .raSearch should be EBP - 4.

Do you mean, EBP+4? It's pretty clear .raSearch is supposed to point
at the saved return address, which is after the saved frame pointer,
which is where ebp is pointing.

> However, the current breakpad symbol file format does not save this
> information, because it was apparently assumed that it doesn't matter
> whether a function has a frame pointer if that function also has a
> stackwalking program.
>
> I am currently working on the following plan: add a new line type to symbol
> files STACK WIN2 which saves the "allocates frame pointer" information in
> all cases. Make the new processor accept the old STACK WIN line assuming
> that the function does not allocate a base pointer. This way caches of
> existing symbol files will not need to be regenerated. If appropriate, I may
> try to add some actual search semantics to .raSearch or .raSearchStart so
> that the old format generates better results (it would search only up to 3
> words "up" the stack to check for doubleword and quadword alignment).

Are you sure that the "allocates frame pointer" data is even valid
when there is a program string? This page suggests that it isn't:

http://msdn.microsoft.com/en-us/library/ayc1cax3.aspx

Benjamin Smedberg

Feb 14, 2012, 2:07:55 PM
to google-br...@googlegroups.com, Jim Blandy
On 2/14/2012 1:42 PM, Jim Blandy wrote:
> On Tue, Feb 14, 2012 at 9:02 AM, Benjamin Smedberg
> <benj...@smedbergs.us> wrote:
>> I believe that the "correct" way to set .raSearch in this case is to check
>> whether the function uses a frame pointer
>> (IDiaFrameData->get_allocatesBasePointer), then .raSearch should be EBP - 4.
> Do you mean, EBP+4?  It's pretty clear .raSearch is supposed to point
> at the saved return address, which is after the saved frame pointer,
> which is where ebp is pointing.

Yeah, I meant EBP + 4.

But I've since tried this heuristic on a bunch of frames, and it turns out that the "allocates frame pointer" flag is garbage or incorrect too often for my solution to work. For instance, this frame, which we discussed yesterday on IRC, says that it allocates a base pointer even though it never modifies EBP:

--- e:\builds\moz2_slave\m-cen-w32-ntly\build\toolkit\crashreporter\google-breakpad\src\client\windows\handler\exception_handler.cc

STACK WIN 4 48375d 84 6 0 c 0 0 0 1 $T0 .raSearch = $eip $T0 ^ = $esp $T0 4 + =
range is 0x138375d - 0x13837e1


bool ExceptionHandler::WriteMinidumpOnHandlerThread(
    EXCEPTION_POINTERS* exinfo, MDRawAssertionInfo* assertion) {
0138375D  push        ebx

STACK WIN 4 48375e 29 5 0 c 4 0 0 1 $T0 .raSearch = $eip $T0 ^ = $esp $T0 4 + =  $20 $T0 4 - ^ =
range is 0x138375e - 0x1383787

 
0138375E  push        esi 

STACK WIN 4 48375f 25 4 0 c 8 0 0 1 $T0 .raSearch = $eip $T0 ^ = $esp $T0 4 + =  $20 $T0 4 - ^ =  $23 $T0 8 - ^ =
range is 0x138375f - 0x1383784


0138375F  mov         esi,dword ptr [esp+0Ch] 
01383763  push        edi 
  EnterCriticalSection(&handler_critical_section_);

STACK WIN 4 483764 1f 0 0 c c 0 0 1 $T0 .raSearch = $eip $T0 ^ = $esp $T0 4 + =  $20 $T0 4 - ^ =  $23 $T0 8 - ^ =  $24 $T0 12 - ^ =
range is 0x1383764 - 0x1383783

 
01383764  lea         edi,[esi+9Ch]
0138376A  push        edi 
0138376B  call        dword ptr ds:[5CDAC27Ch] 

  // There isn't much we can do if the handler thread
  // was not successfully created.
  if (handler_thread_ == NULL) {
01383771  xor         ebx,ebx 
01383773  cmp         dword ptr [esi+94h],ebx 
01383779  jne         google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread+2Dh (138378Ah) 
    LeaveCriticalSection(&handler_critical_section_);
0138377B  push        edi 
0138377C  call        dword ptr ds:[5CDAC280h] 
01383782  pop         edi 
01383783  pop         esi 
    return false;
01383784  mov         al,bl 
01383786  pop         ebx 

  LeaveCriticalSection(&handler_critical_section_);

  return status;
}
01383787  ret         0Ch 
  }

  // The handler thread should only be created when the semaphores are valid.
  assert(handler_start_semaphore_ != NULL);
  assert(handler_finish_semaphore_ != NULL);

  // Set up data to be passed in to the handler thread.
  requesting_thread_id_ = GetCurrentThreadId();
0138378A  call        dword ptr ds:[5CDAC26Ch] 
01383790  mov         dword ptr [esi+0BCh],eax 
  exception_info_ = exinfo;
01383796  mov         eax,dword ptr [esp+14h] 
  assertion_ = assertion;

  // This causes the handler thread to call WriteMinidumpWithException.
  ReleaseSemaphore(handler_start_semaphore_, 1, NULL);
0138379A  push        ebx 
0138379B  push        1 
0138379D  push        dword ptr [esi+0B4h] 
013837A3  mov         dword ptr [esi+0C0h],eax 
013837A9  mov         eax,dword ptr [esp+24h] 
013837AD  mov         dword ptr [esi+0C4h],eax 
013837B3  call        dword ptr ds:[5CDAC244h] 

  // Wait until WriteMinidumpWithException is done and collect its return value.
  WaitForSingleObject(handler_finish_semaphore_, INFINITE);
013837B9  push        0FFFFFFFFh 
013837BB  push        dword ptr [esi+0B8h] 
013837C1  call        dword ptr ds:[5CDAC294h] 
  bool status = handler_return_value_;

  // Clean up.
  requesting_thread_id_ = 0;
013837C7  mov         dword ptr [esi+0BCh],ebx 
  exception_info_ = NULL;
013837CD  mov         dword ptr [esi+0C0h],ebx 
  assertion_ = NULL;
013837D3  mov         dword ptr [esi+0C4h],ebx 
013837D9  mov         bl,byte ptr [esi+0C8h] 
013837DF  jmp         google_breakpad::ExceptionHandler::WriteMinidumpOnHandlerThread+1Eh (138377Bh)


Unless there is some other .raSearch magic I don't understand, this set of debug info indicates that the compiler is not generating enough debug information for this frame: everything from 0x0138378A to 0x013837DF should be covered by the last frame walker program, not the first. In particular, the "saved register size" is incorrect for this block, which means that we miscalculate .raSearch by 3 words. This clearly seems like a compiler bug, but MSVC seems to be able to walk this stack without incident, which means it is either searching very well or has information that isn't exposed to us via DIA.
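
To make the coverage problem concrete, here is a small Python sketch that picks the narrowest STACK WIN range containing an address, using the four records from the listing above. The load address is derived from the first "range is" line; the names and the narrowest-range rule are assumptions for illustration, not Breakpad's lookup code.

```python
MODULE_BASE = 0x0138375D - 0x48375D  # load address implied by the listing

# (rva, code_size, saved_register_size) from the four STACK WIN lines above.
RECORDS = [
    (0x48375D, 0x84, 0x0),
    (0x48375E, 0x29, 0x4),
    (0x48375F, 0x25, 0x8),
    (0x483764, 0x1F, 0xC),
]


def innermost_record(address):
    """Return the narrowest record whose range contains address, or None."""
    rva = address - MODULE_BASE
    covering = [r for r in RECORDS if r[0] <= rva < r[0] + r[1]]
    return min(covering, key=lambda r: r[1]) if covering else None


if __name__ == "__main__":
    for address in (0x01383771, 0x0138378A):
        rva, size, saved = innermost_record(address)
        print(hex(address), "->", hex(rva), "saved registers:", saved)
```

An address in the early block resolves to the last record, which accounts for 12 bytes of saved registers. 0x0138378A falls only inside the first record, which accounts for none, even though ebx, esi and edi are still on the stack at that point. That is the 3-word error.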

To fix just the alignment issue, I could just have us do a small ScanForReturnAddress search (1-3 words). I'm still trying to come up with a good way to determine which solution gives us the best overall results across our population of crash reports.

--BDS

Jim Blandy

Feb 14, 2012, 2:23:54 PM
to google-br...@googlegroups.com
On Tue, Feb 14, 2012 at 11:07 AM, Benjamin Smedberg
<benj...@smedbergs.us> wrote:
> But I've since tried this heuristic on a bunch of frames, and it turns out
> that the "allocates frame pointer" flag is either garbage or incorrect too
> much to make my solution work.

Right, the MSDN page I linked to in my last message says that
"allocates frame pointer" is garbage if there's a program.

> To fix just the alignment issue, I could just have us do a small
> ScanForReturnAddress search (1-3 words). I'm still trying to come up with a
> good way to determine which solution gives us the best overall results
> across our population of crash reports.

What if our initial value for .raSearch was the result of a return
address scan for max three words starting at our current .raSearch
value? It seems like that would let us skip over alignments to
eight-byte boundaries: zero or one alignment words; saved frame
pointer; return address.
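
A rough Python sketch of that bounded scan. The names are made up, and the predicate for "looks like a return address" is a stand-in (an address range check with an invented upper bound); a real stackwalker would use a stricter test. The memory words are the ones reconstructed for the CreatePairedMinidumps frame earlier in the thread.

```python
def scan_for_return_address(start, read_u32, looks_like_return_address,
                            max_words=3):
    """Check up to max_words stack slots, beginning at start."""
    for i in range(max_words):
        candidate = start + 4 * i
        if looks_like_return_address(read_u32(candidate)):
            return candidate
    return None


if __name__ == "__main__":
    # Saved ebp at 0x12caa0, return address at 0x12caa4 (reconstructed).
    memory = {0x0012CAA0: 0x0012CAC8, 0x0012CAA4: 0x015C7156,
              0x0012CAA8: 0x00000000}

    def in_code_range(value):
        # Hypothetical code range; the upper bound is invented.
        return 0x00F00000 <= value < 0x02000000

    found = scan_for_return_address(0x0012CAA0, memory.__getitem__,
                                    in_code_range)
    print(hex(found))
```

Starting from the current .raSearchStart of 0x0012CAA0, the scan skips the saved frame pointer and stops at the next slot.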

It irritates me that MSVC has the ability to say exactly what's going
on, but evidently doesn't... :(
