vtable corruption on mac when forgetting to declare a virtual destructor?

132 views
Skip to first unread message

Munjal Doshi

unread,
May 18, 2012, 4:03:36 PM5/18/12
to Chromium-dev
I spent three days debugging a test failure that only happened on mac in what-appeared-to-be platform independent code. It seems like the problem is that C++ compiler on mac is producing buggy vtable code in absence of a virtual destructor. But it is incredulous so I wanted to check what other think.

Here is what happened - it is a bit of a long story.


Note that I forgot to define a virtual destructor for that class.

It mocks two methods in WebAuthFlow::Delegate

All tests use the MockDelegate. See the first test, TEST_F(WebAuthFlowTest, SilentRedirectToChromiumAppUrl) {

The test failed with output that indicated that the destructor for MockDelegate was called twice. After debugging indirectly by using LOG statements and mac try bots, I narrowed it down to the following line in web_auth_flow.cc
delegate_->OnAuthFlowSuccess(result.spec());

And that line was calling the delegate destructor instead of OnAuthFlowSuccess method!

Caveat is that I didn't step into the code (I don't have a mac) but rather used LOG statements to find out.

After spending a couple of days trying many different things including using a TestDelegate class instead of a mock (which solved hte issue), I was pulling my hair out figuring out what is happening. Finally, today morning I noticed that I am missing a virtual destructor  As soon as I added that, the problem went away.

It also seems like the problem manifests itself only with a combination of missing virtual destructor and use of mocks; and only on mac.

Am I understanding this right? Is anyone aware of similar issues?

-Munjal

Ben Smith

unread,
May 18, 2012, 4:27:21 PM5/18/12
to chromi...@chromium.org
(from correct account)

Your MockDelegate is not in an anonymous namespace -- it seems that 
this MockDelegate isn't either. Maybe it's a ODR violation?

Jochen Eisinger

unread,
May 18, 2012, 4:43:46 PM5/18/12
to bi...@chromium.org, Nico Weber, chromi...@chromium.org
Not having a virtual destructor in a virtual class is a violation of the style guide anyway. I remember that there was some discussion about whether or not to have clang warn about this, but I forgot what the reason was we decided against?

-jochen

--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Munjal Doshi

unread,
May 18, 2012, 4:47:45 PM5/18/12
to joc...@chromium.org, bi...@chromium.org, Nico Weber, chromi...@chromium.org
Yes, I understand it was a mistake. But was taken by surprise by the way it manifested itself.

-Munjal

Fred Akalin

unread,
May 18, 2012, 4:49:42 PM5/18/12
to mun...@chromium.org, joc...@chromium.org, bi...@chromium.org, Nico Weber, chromi...@chromium.org
Once undefined behavior is invoked (via the ODR violation) all bets are off.

Drew Wilson

unread,
May 18, 2012, 4:49:52 PM5/18/12
to bi...@chromium.org, chromi...@chromium.org
BTW, I had a similar hair-raising issue with destructors and duplicate non-namespaced test class names a few weeks ago. I'm guessing that's exactly what this is.

Munjal Doshi

unread,
May 18, 2012, 4:55:40 PM5/18/12
to atwi...@chromium.org, bi...@chromium.org, chromi...@chromium.org
Right, ODR violation seems bad of course. I fixed that now.

But out of curiosity, shouldn't the compiler warn/error on that explicitly? And even for missing virtual destructors?

-Munjal

Victor Khimenko

unread,
May 18, 2012, 5:03:40 PM5/18/12
to mun...@chromium.org, atwi...@chromium.org, bi...@chromium.org, chromi...@chromium.org
On Sat, May 19, 2012 at 12:55 AM, Munjal Doshi <mun...@chromium.org> wrote:
Right, ODR violation seems bad of course. I fixed that now.

But out of curiosity, shouldn't the compiler warn/error on that explicitly?

It can't.
 
And even for missing virtual destructors?

This is not an error.

The problem here is not missing virtual destructor at all. This is just a red herring.

The problem is class with virtual functions and without even a single non-inline virtual function.

Usually compiler creates a vtable just once: in a compilation module where first non-inline virtual function is implemented (this may lead to funny linking errors but works fine in any correct program).

But if class does not have such an "anchor" function then compiler does the next best thing: it creates bazillion vtables (one per a compilation unit) and instructs linker that it can pick any one of them.

Combine this behavior with ODR violation - and you have a recipe for disaster.

Nico Weber

unread,
May 18, 2012, 5:09:19 PM5/18/12
to mun...@chromium.org, atwi...@chromium.org, bi...@chromium.org, chromi...@chromium.org
On Fri, May 18, 2012 at 1:55 PM, Munjal Doshi <mun...@chromium.org> wrote:
Right, ODR violation seems bad of course. I fixed that now.

But out of curiosity, shouldn't the compiler warn/error on that explicitly? And even for missing virtual destructors?

clang warns on missing virtual destructors if a polymorphic delete is used. I.e. clang will will warn on

class A {
  virtual void f() {}
};

void f() {
  A* a = FunctionReturningA();
  delete a;
}

test.cc:9:3: warning: delete called on 'A' that has virtual functions but non-virtual destructor [-Wdelete-non-virtual-dtor]
  delete a;
  ^
1 warning generated.


clang won't warn if A is never deleted, i.e. it won't warn on:

void f() {
  A a;  // Not an issue.
}


The compiler can't detect this ODR violation because it's happening at link time, in different translation units. Some linkers can detect them. The way linking usually works is:

1. The linker loads all object files on the link line and builds a list of missing symbols.
2. While that list is not empty:
* Pick one missing symbol
* Using the index at the front of every static library on the link line,
  determine if any static library contains that symbol
* Load the first object file from the first static library containing that symbol
* Update the list with newly defined and newly missing symbols from that object file

So if by bad luck the other Mock class is loaded first, ld might never realize that some other .o file defines that class too. OS X's ld can detect some more ODR violations if you pass -all_load, which causes it to load all .o files in all static libraries on the link line.

Nico

Munjal Doshi

unread,
May 18, 2012, 5:26:10 PM5/18/12
to Nico Weber, atwi...@chromium.org, bi...@chromium.org, chromi...@chromium.org
Thanks for the detailed explanation. It makes sense that the main issue was ODR violation and not missing virtual destructor.

But adding back missing virtual destructor fixed it (without fixing ODR violation). So I thought that was the main issue.

-Munjal

Evan Martin

unread,
May 18, 2012, 5:47:15 PM5/18/12
to tha...@chromium.org, mun...@chromium.org, atwi...@chromium.org, bi...@chromium.org, chromi...@chromium.org
On Fri, May 18, 2012 at 2:09 PM, Nico Weber <tha...@chromium.org> wrote:
> The compiler can't detect this ODR violation because it's happening at link
> time, in different translation units. Some linkers can detect them. The way
> linking usually works is:

FWIW, the gold linker we use on Linux is one such linker.
IIRC I experimented with turning on gold's ODR violation detector for
Chrome and found it slowed links by ~5 seconds, which is super
annoying when you're in an edit-compile cycle. It didn't seem worth
it given that our existing code had no (!) ODR violations.

It might make sense to turn on for release mode or something like that
except that that could lead to changes working locally and failing on
the bots, which is also super annoying.
Reply all
Reply to author
Forward
0 new messages