
Abort during delete


James Kuyper

Jan 21, 2018, 1:39:28 PM
I know a lot about C++98 in theory, and a little bit about more recent
versions of the language, but have relatively little practical
experience with it: For several years now, one of my responsibilities
has been the maintenance of several C++ programs linked to a set of C++
libraries that are poorly designed and undocumented. They're written to
C++2003. I've never had a problem working with this code due to my lack
of C++ experience - all of my problems have been due to the poor design
and the lack of documentation.
However, I've run into a problem that is new to me, and I wasn't sure
how to investigate further. One of these programs aborts under certain
circumstances during an attempt to execute a delete expression.

I've checked the obvious: The argument of the delete is the name of a
pointer object that was initialized by a corresponding new-expression.
Execution of the code can't reach the delete-expression without first
passing through the new expression, and it only gets deleted once. That
pointer's non-null value was not changed between the new and delete
expressions. The pointer's value is passed to one subroutine, which
doesn't delete it, either. The object being allocated is an integer
array, so the implicit destructor should be a nop, with no failure modes
of its own. What else can I check?

By analogy with C, which I'm more familiar with, I'd expect a delete
expression to have some relevant similarities to free() in C: the
new-expression needs to store information somewhere about what it
allocated, so the corresponding delete-expression can deallocate it
correctly. For some kinds of code that have undefined behavior, that
undefined behavior might take the form of corrupting the allocation
information - particularly if it's code that writes past the end (or
before the beginning) of other memory also allocated using 'new'. Such
corruption could cause deallocation failure when deleting a completely
unrelated allocation, even if there's nothing wrong with any of the code
involving that other allocation.
Is this in fact the case? Are there any other things that might cause a
delete-expression to fail?
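To make that concrete, here's a minimal sketch of the scenario I have in
mind (the names and sizes are made up, and since the behavior is
undefined, an abort at the unrelated delete[] is only a typical symptom,
not a guarantee):

#include <cstddef>

int main() {
    int *a = new int[4];
    int *b = new int[4];   // the allocator's bookkeeping typically sits right
                           // next to one of these blocks

    for (std::size_t i = 0; i <= 4; ++i)   // off-by-one: the last iteration
        a[i] = 42;                         // writes past the end of 'a'

    delete[] b;   // nothing is wrong with 'b' itself, but a typical allocator
    delete[] a;   // may detect its corrupted bookkeeping here and abort
}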

Melzzzzz

Jan 21, 2018, 1:41:53 PM
On 2018-01-21, James Kuyper <james...@verizon.net> wrote:
> I know a lot about C++98 in theory, and a little bit about more recent
> versions of the language, but have relatively little practical
> experience with it: For several years now, one of my responsibilities
> has been the maintenance of several C++ programs linked to a set of C++
> libraries, poorly designed and no documentation. They're written to
> C++2003. I've never had a problem working with this code due to my lack
> of C++ experience - all of my problems have been due to the poor design
> and the lack of documentation.
> However, I've run into a problem that is new to me, and I wasn't sure
> how to investigate further. One of these programs aborts under certain
> circumstances during an attempt to execute a delete expression.
>
> I've checked the obvious: The argument of the delete is the name of a
> pointer object that was initialized by a corresponding new-expression.
> Execution of the code can't reach the delete-expression without first
> passing through the new expression, and it only gets deleted once. That
> pointer's non-null value was not changed between the new and delete
> expressions. The pointer's value is passed to one subroutine, which
> doesn't delete it, either. The object being allocated is an integer
> array, so the implicit destructor should be a nop, with no failure modes
> of it's own. What else can I check?

Care to show the new/delete in question?

--
press any key to continue or any other to quit...

Ian Collins

Jan 21, 2018, 2:07:52 PM
On 01/22/2018 07:39 AM, James Kuyper wrote:
> I know a lot about C++98 in theory, and a little bit about more recent
> versions of the language, but have relatively little practical
> experience with it: For several years now, one of my responsibilities
> has been the maintenance of several C++ programs linked to a set of C++
> libraries, poorly designed and no documentation. They're written to
> C++2003. I've never had a problem working with this code due to my lack
> of C++ experience - all of my problems have been due to the poor design
> and the lack of documentation.
> However, I've run into a problem that is new to me, and I wasn't sure
> how to investigate further. One of these programs aborts under certain
> circumstances during an attempt to execute a delete expression.

The key is probably how it aborts.

Is there an unhandled exception?

Is there a memory access violation?

Can you trap it in a debugger?

> I've checked the obvious: The argument of the delete is the name of a
> pointer object that was initialized by a corresponding new-expression.
> Execution of the code can't reach the delete-expression without first
> passing through the new expression, and it only gets deleted once. That
> pointer's non-null value was not changed between the new and delete
> expressions. The pointer's value is passed to one subroutine, which
> doesn't delete it, either. The object being allocated is an integer
> array, so the implicit destructor should be a nop, with no failure modes
> of it's own. What else can I check?

Could it be the old favourite something else corrupting the heap?
Have/can you try running the code under something like valgrind?

> By analogy with C, which I'm more familiar with, I'd expect a delete
> expression to have some relevant similarities to free() in C: the
> new-expression needs to store information somewhere about what it
> allocated, so the corresponding delete-expression can deallocate it
> correctly. For some kinds of code that have undefined behavior, that
> undefined behavior might take the form of corrupting the allocation
> information - particularly if it's code that writes past the end (or
> before the beginning) of other memory also allocated using 'new'. Such
> corruption could cause deallocation failure when deleting a completely
> unrelated allocation, even if there's nothing wrong with any of the code
> involving that other allocation.
> Is this in fact the case? Are there any other things that might cause a
> delete-expression to fail?

Not for an array of int.

--
Ian.

Gareth Owen

Jan 21, 2018, 2:30:23 PM
James Kuyper <james...@verizon.net> writes:

Did you check that the

foo = new int[n];

was matched by a

delete[] foo;

rather than a

delete foo;

???

> By analogy with C, which I'm more familiar with, I'd expect a delete
> expression to have some relevant similarities to free() in C: the
> new-expression needs to store information somewhere about what it
> allocated, so the corresponding delete-expression can deallocate it
> correctly. For some kinds of code that have undefined behavior, that
> undefined behavior might take the form of corrupting the allocation
> information - particularly if it's code that writes past the end (or
> before the beginning) of other memory also allocated using 'new'. Such
> corruption could cause deallocation failure when deleting a completely
> unrelated allocation, even if there's nothing wrong with any of the code
> involving that other allocation.
> Is this in fact the case?

Yes. 100% yes. Seen it more times than I care to remember
(self-inflicted and otherwise)

> Are there any other things that might cause a delete-expression to
> fail?

Assuredly, but I'd check the two above first.

James Kuyper

Jan 21, 2018, 2:32:22 PM
On 01/21/2018 02:24 PM, Stefan Ram wrote:
> James Kuyper <james...@verizon.net> writes:
>> doesn't delete it, either. The object being allocated is an integer
>> array, so the implicit destructor should be a nop, with no failure modes
>> of it's own. What else can I check?
>
> I assume you know that there is a special delete operator
> "delete[]" for arrays?

Yes. In fact, my group was responsible (though I wasn't involved in the
process at that time) for adding [] where appropriate - the original
code lacked it.

Öö Tiib

Jan 21, 2018, 2:49:05 PM
On Sunday, 21 January 2018 20:39:28 UTC+2, James Kuyper wrote:
>
> By analogy with C, which I'm more familiar with, I'd expect a delete
> expression to have some relevant similarities to free() in C: the
> new-expression needs to store information somewhere about what it
> allocated, so the corresponding delete-expression can deallocate it
> correctly. For some kinds of code that have undefined behavior, that
> undefined behavior might take the form of corrupting the allocation
> information - particularly if it's code that writes past the end (or
> before the beginning) of other memory also allocated using 'new'. Such
> corruption could cause deallocation failure when deleting a completely
> unrelated allocation, even if there's nothing wrong with any of the code
> involving that other allocation.

Corrupted memory management information is one of the most probable
reasons for a program crashing during delete. Another is deleting the
same pointer twice. A less probable reason is an attempt to delete
something that wasn't allocated with new. There are also occasional
mismatches, like pairing new[] with delete (or new with delete[]), but
those don't usually result in a crash.

> Is this in fact the case? Are there any other things that might cause a
> delete-expression to fail?

Maybe. I would just debug it with a special tool, for example with a
memory sanitizer turned on.

Paavo Helde

Jan 21, 2018, 3:00:19 PM
On 21.01.2018 20:39, James Kuyper wrote:
> I know a lot about C++98 in theory, and a little bit about more recent
> versions of the language, but have relatively little practical
> experience with it: For several years now, one of my responsibilities
> has been the maintenance of several C++ programs linked to a set of C++
> libraries, poorly designed and no documentation. They're written to
> C++2003. I've never had a problem working with this code due to my lack
> of C++ experience - all of my problems have been due to the poor design
> and the lack of documentation.
> However, I've run into a problem that is new to me, and I wasn't sure
> how to investigate further. One of these programs aborts under certain
> circumstances during an attempt to execute a delete expression.
>
> I've checked the obvious: The argument of the delete is the name of a
> pointer object that was initialized by a corresponding new-expression.
> Execution of the code can't reach the delete-expression without first
> passing through the new expression, and it only gets deleted once. That
> pointer's non-null value was not changed between the new and delete
> expressions. The pointer's value is passed to one subroutine, which
> doesn't delete it, either. The object being allocated is an integer
> array, so the implicit destructor should be a nop, with no failure modes
> of it's own. What else can I check?

Looks like memory corruption. If they do things like 'arr = new int[N]'
they might also do things like 'for (int i=0; i<=N; ++i) arr[i] = 42;'.
Note that for your symptom this would have to happen not to 'arr' itself,
but to the allocation sitting before arr in memory, so that the memory
manager's control block for arr gets corrupted.

The correct C++ way is of course to use:

std::vector<int> arr(N, 42);

or

std::vector<int> arr(N);
std::fill(arr.begin(), arr.end(), 42);

or

std::vector<int> arr(N);
for (auto& ref : arr) {
    ref = 42;
}

which are all much better at avoiding buffer overrun errors.

hth
Paavo



James Kuyper

Jan 21, 2018, 3:28:10 PM
On 01/21/2018 02:07 PM, Ian Collins wrote:
> On 01/22/2018 07:39 AM, James Kuyper wrote:
>> I know a lot about C++98 in theory, and a little bit about more recent
>> versions of the language, but have relatively little practical
>> experience with it: For several years now, one of my responsibilities
>> has been the maintenance of several C++ programs linked to a set of C++
>> libraries, poorly designed and no documentation. They're written to
>> C++2003. I've never had a problem working with this code due to my lack
>> of C++ experience - all of my problems have been due to the poor design
>> and the lack of documentation.
>> However, I've run into a problem that is new to me, and I wasn't sure
>> how to investigate further. One of these programs aborts under certain
>> circumstances during an attempt to execute a delete expression.
>
> The key is probably how it aborts.

It exits with a Unix status of 34304=0x8600, indicating abnormal
termination by signal number 6, which is SIGABRT. When I said "abort",
I was being very specific.

> Is there an unhandled exception?
>
> Is there a memory access violation?

There's no indication of either possibility.

> Can you trap it in a debugger?

I'm just starting to investigate this problem, and am learning things
about the program that I hadn't previously needed to know since becoming
responsible for it. It's run from a complicated script that sets up its
running environment, and I haven't yet figured out how to set up the
environment so I can run it on its own. I may just create a modified
version of the script to run it inside the debugger.

>> I've checked the obvious: The argument of the delete is the name of a
>> pointer object that was initialized by a corresponding new-expression.
>> Execution of the code can't reach the delete-expression without first
>> passing through the new expression, and it only gets deleted once. That
>> pointer's non-null value was not changed between the new and delete
>> expressions. The pointer's value is passed to one subroutine, which
>> doesn't delete it, either. The object being allocated is an integer
>> array, so the implicit destructor should be a nop, with no failure modes
>> of it's own. What else can I check?
>
> Could it be the old favourite something else corrupting the heap?

That's what I'm assuming, by analogy with C, even though I know that
new/delete don't have any necessary connection with malloc() and free().
That's what I was implying in more generic form when talking about
writing past the end or before the beginning of a block of allocated
memory. I'm not looking forward to another heap corruption hunt through
unfamiliar code.

> Have/can you try running the code under something like valgrind?

I may have to - but my prior experience with valgrind suggests that it
produces too much information to make it easy to find the particular
issue that's actually causing a problem, particularly on poorly written
code like this.

James Kuyper

Jan 21, 2018, 3:36:27 PM
You know that, I know that, but the authors of this code were apparently
more C-bound than you or me. The part of the code I'm looking at has no
containers, no iterators, no algorithms. Just lots of C-style arrays,
pointers, and arrays. Other parts of the code make heavy use of
C++-specific language features, which is the kind of thing you get when
code has multiple authors.

I've thought of "fixing" this problem by doing such a re-write, but it
would take a lot of time, and I was hoping that a more focused approach
to debugging it would be possible.

James Kuyper

Jan 21, 2018, 4:00:22 PM
On 01/21/2018 03:36 PM, James Kuyper wrote:
...
> You know that, I know that, but the authors of this code were apparently
> more C-bound than you or me. The part of the code i'm looking at has no
> containers, no iterators, no algorithms. Just lots of C-style arrays,
> pointers, and arrays.

That was supposed to read "and for loops".

Christian Gollwitzer

Jan 21, 2018, 5:45:59 PM
Am 21.01.18 um 21:27 schrieb James Kuyper:
>> Have/can you try running the code under something like valgrind?
>
> I may have to - but my prior experience with valgrind suggests that it
> produces too much information to make it easy to find the particular
> issue that's actually causing a problem, particularly on poorly written
> code like this.

I can only second this suggestion. Valgrind has a very low false
positive rate. If it signals an error, you should usually fix it. An
"invalid read" means reading memory the program doesn't own, which can
sometimes be harmless in practice, but an "invalid write" always means
there is an error. It is well worth fixing all of them, even if it
appears that there are no problems.

Another tool, in case you build with clang, is -fsanitize=address
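For instance, an overrun of the sort suspected in this thread is the kind
of thing both tools report (the messages in the comments are approximate):

int main() {
    int *p = new int[4];
    p[4] = 1;      // one element past the end: valgrind reports roughly
                   // "Invalid write of size 4"; -fsanitize=address reports
                   // a "heap-buffer-overflow"
    delete[] p;
}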

Christian

Jerry Stuckle

Jan 21, 2018, 6:03:39 PM
Memory can be reused. For instance, it is also possible that some other
code allocated memory and got a block at "X" back. Later, when it
deletes the memory, "X" is now available for reuse.

Now your code allocates a new object and gets it at "X". No problem -
except that maybe pointers to that old object are still around, and
someone else tries to use or even delete that old object.
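As a bare sketch of that sequence (all names illustrative, and whether
the second allocation really reuses "X" is entirely up to the allocator):

int main() {
    int *p = new int[8];   // suppose this block lands at address "X"
    delete[] p;            // "X" goes back to the allocator...
    int *q = new int[8];   // ...and may be handed out again at the same address

    delete[] p;            // stale pointer: undefined behavior - if "X" was
                           // reused, this frees q's block out from under it
    delete[] q;            // the legitimate delete[] then becomes a double
                           // free, a likely point for the allocator to abort
}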

Problems like this are hard to find and almost always require a
debugger. OTOH, they occur more often than many programmers realize.

--
==================
Remove the "x" from my email address
Jerry Stuckle
jstu...@attglobal.net
==================

Manfred

Jan 21, 2018, 6:58:38 PM
On 01/21/2018 07:39 PM, James Kuyper wrote:
> I know a lot about C++98 in theory, and a little bit about more recent
> versions of the language, but have relatively little practical
> experience with it: For several years now, one of my responsibilities
> has been the maintenance of several C++ programs linked to a set of C++
> libraries, poorly designed and no documentation. They're written to
> C++2003. I've never had a problem working with this code due to my lack
> of C++ experience - all of my problems have been due to the poor design
> and the lack of documentation.
> However, I've run into a problem that is new to me, and I wasn't sure
> how to investigate further. One of these programs aborts under certain
> circumstances during an attempt to execute a delete expression.
>

I don't know if the following applies, however:
the C++ runtime that implements new[] and the one that implements
delete[] must be the same - I have had this problem with libraries
compiled with one version of the runtime linked to a program using a
different version.

Andrea Venturoli

Jan 22, 2018, 2:36:21 AM
On 01/21/18 21:27, James Kuyper wrote:

> I may have to - but my prior experience with valgrind suggests that it
> produces too much information to make it easy to find the particular
> issue that's actually causing a problem, particularly on poorly written
> code like this.

I strongly disagree here.
While I'm not questioning your experience, mine was the opposite: every
time valgrind has brought something up, it was something that was worth
fixing.

Paavo Helde

Jan 22, 2018, 4:07:50 AM
On 21.01.2018 22:36, James Kuyper wrote:

> The part of the code i'm looking at has no
> containers, no iterators, no algorithms. Just lots of C-style arrays,
> pointers, and arrays.
>
> I've thought of "fixing" this problem by doing such a re-write, but it
> would take a lot of time, and I was hoping that a more focused approach
> to debugging it would be possible.

If the bug is well reproducible, then it should be easy to find. If it
is always the same deallocation aborting, then first figure out where it
is allocated. Then put a breakpoint there; when it is reached, put a data
breakpoint 4 or 8 bytes *before* the array, where the memory allocator's
control block resides. Now continue the run and voilà, the data
breakpoint ought to be triggered exactly by the culprit buffer overrun code.

If your program is multi-threaded then it becomes more tricky as the
memory allocations will become much more randomly placed and the
location of crash/abort will change all the time.
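When a hardware data breakpoint isn't available, a crude in-code
substitute for the same idea is a guard band that you check yourself -
this is not the data-breakpoint approach above, just a sketch of the same
principle (the guard size and fill pattern are arbitrary):

#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t kGuard = 8;        // extra bytes planted after the payload
const unsigned char kFill = 0xAB;    // arbitrary pattern to recognize later

unsigned char *allocGuarded(std::size_t n) {
    unsigned char *p = new unsigned char[n + kGuard];
    std::memset(p + n, kFill, kGuard);     // plant the canary
    return p;
}

void freeGuarded(unsigned char *p, std::size_t n) {
    for (std::size_t i = 0; i < kGuard; ++i)
        assert(p[n + i] == kFill);         // fires if something wrote past the end
    delete[] p;
}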

hth
Paavo



Robert Wessel

Jan 22, 2018, 4:58:37 AM
I'm going to have to agree with James here. The first time you use
something like Valgrind or Purify on a big piece of existing code
that's never gotten that treatment is a lot like the first time a
decade-old, million-line project gets linted.

You really don't want to be around for that if you can help it.
Besides, getting management to buy into months or years of effort to
eliminate the lint or Valgrind warnings that are "obviously" not
causing any problems in the field can be a challenge.

Scott Lurndal

Jan 22, 2018, 10:22:21 AM
James Kuyper <james...@verizon.net> writes:
>On 01/21/2018 02:07 PM, Ian Collins wrote:

>> The key is probably how it aborts.
>
>It exits with a unix status of 34304=0x8600, indicating abnormal
>termination by signal number of 6, which is SIGABRT. When I said
>"abort", I was being very specific.

SIGABRT is never generated by the OS, but rather generated by
the abort(3) and/or raise(3) library functions. In other words,
the application (or one of its libraries) is causing the abort
(perhaps via an assertion failure).
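One easy way to reproduce that status, purely for illustration, is an
assert that fails:

#include <cassert>

int main() {
    assert(2 + 2 == 5);   // prints a diagnostic, calls abort(), and the
                          // process dies with SIGABRT (signal 6)
}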

James R. Kuyper

Jan 23, 2018, 9:29:03 AM
On 01/21/2018 01:41 PM, Melzzzzz wrote:
> On 2018-01-21, James Kuyper <james...@verizon.net> wrote:
...
>> I've checked the obvious: The argument of the delete is the name of a
>> pointer object that was initialized by a corresponding new-expression.
>> Execution of the code can't reach the delete-expression without first
>> passing through the new expression, and it only gets deleted once. That
>> pointer's non-null value was not changed between the new and delete
>> expressions. The pointer's value is passed to one subroutine, which
>> doesn't delete it, either. The object being allocated is an integer
>> array, so the implicit destructor should be a nop, with no failure modes
>> of it's own. What else can I check?
>
> Care to show the new/delete in question?

Sorry - yours was the very first response, and I just breezed past it
without replying. To be fair, I didn't see how the answer would
be particularly useful, but here it is:

// Allocate continuous 2-dim engdata array
uint8_t **engdata = new uint8_t *[maxsc];
engdata[0] = new uint8_t[9318*maxsc];
for (int i=1; i<maxsc; i++) engdata[i] = engdata[i-1] + 9318;
...

delete[] engdata[0];
delete[] engdata;

Note, in particular, the use of the "magic number" 9318. Lovely code.
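For what it's worth, the same contiguous layout can be had without any
manual delete[] at all - a sketch only, with the magic number at least
given a name (what 9318 actually means is anybody's guess):

#include <stdint.h>
#include <cstddef>
#include <vector>

const std::size_t kRowBytes = 9318;   // the magic number, named

// One flat buffer plus a row-pointer table; both clean up automatically.
void buildEngdata(std::size_t maxsc,
                  std::vector<uint8_t> &storage,
                  std::vector<uint8_t *> &rows)
{
    if (maxsc == 0) return;            // avoid &storage[0] on an empty vector
    storage.assign(kRowBytes * maxsc, 0);
    rows.resize(maxsc);
    for (std::size_t i = 0; i < maxsc; ++i)
        rows[i] = &storage[0] + i * kRowBytes;
}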

James R. Kuyper

Jan 23, 2018, 9:33:32 AM
I eventually learned that the abort() was the result of an assert() in a
free() wrapper, triggered when it detected memory corruption. I hadn't
known that the program was linked to such a utility, and wouldn't have
been sure that delete would be implemented in terms of free(). By the
time I found that out, I'd already located the corrupting statement.

James R. Kuyper

Jan 23, 2018, 9:43:20 AM
On 01/22/2018 04:07 AM, Paavo Helde wrote:
> On 21.01.2018 22:36, James Kuyper wrote:
>
>> The part of the code i'm looking at has no
>> containers, no iterators, no algorithms. Just lots of C-style arrays,
>> pointers, and arrays.
>>
>> I've thought of "fixing" this problem by doing such a re-write, but it
>> would take a lot of time, and I was hoping that a more focused approach
>> to debugging it would be possible.
>
> If the bug is well reproducible, then it should be easy to find. If it
> is always the same deallocation aborting, then first figure out where it
> is allocated. Then put a breakpoint there, when reached it, put a data
> breakpoint 4 or 8 bytes *before* the array, where the memory allocator
> control block resides. Now continue the run and voila, the data
> breakpoint ought to be triggered exactly by the culprit buffer overrun code.

I've used precisely that approach before with C code, but somehow I
didn't think of doing it when working with C++ code. It worked
perfectly. A different array with room for 513 packets was dynamically
allocated prior to the one whose deletion aborted. 513 is the absolute
maximum number of packets that could validly be collected during one
scan of data (and it's used as a "magic number" in several locations in
the code, with no corresponding named macro). The code added packets to
the array without bothering to test for the end of the array. Due to an
error in a different piece of code, there were actually 959 packets in
one particular scan, and the code just kept merrily adding them to it.
I'm amazed that execution of the code ever successfully reached the
delete statement; I would have expected something else to go
catastrophically wrong first.
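The missing check is only a few lines; a sketch of its shape, with purely
hypothetical names (the real code doesn't even have a named constant for
513):

#include <cstddef>

const std::size_t kMaxPackets = 513;   // hypothetical name for the magic limit

struct Packet;                         // stand-in for the real packet record

// Append a packet only if there is room; reject it (for the caller to log
// or drop) instead of writing past the end of the array.
bool addPacket(Packet *scan[], std::size_t &count, Packet *pkt)
{
    if (count >= kMaxPackets)
        return false;
    scan[count++] = pkt;
    return true;
}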

Paavo Helde

Jan 23, 2018, 10:07:45 AM
Glad to hear you found the bug!

C++ memory management is often just a thin wrapper around
malloc()/free(), and even if it isn't, there is a good chance that there
is still some kind of control block before the allocated array which can
get corrupted, so this debugging technique still works.
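As an illustration of the "thin wrapper" point (and nothing more - a real
replacement also has to honour std::set_new_handler, statistics and so
on), a minimal global operator new/delete pair forwarding to
malloc()/free() looks roughly like this, in C++11 spelling:

#include <cstdlib>
#include <new>

void *operator new(std::size_t n) {
    if (void *p = std::malloc(n ? n : 1))
        return p;
    throw std::bad_alloc();
}

void operator delete(void *p) noexcept {
    std::free(p);   // which is why a corrupted malloc arena tends to show up
                    // as an abort inside a perfectly innocent delete
}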

Cheers
Paavo

Gareth Owen

Jan 23, 2018, 4:35:27 PM
"James R. Kuyper" <james...@verizon.net> writes:

> I'm amazed that execution of the code ever successfully reached the
> delete statement, I would have expected something else to go
> catastrophically wrong first.

Oh, it did. Sometimes the most catastrophic thing that can possibly
happen is for the code to keep happily running :)

Vir Campestris

Jan 24, 2018, 4:48:50 PM
I've had ... discussions ... with some of my colleagues about this. Most
of them are in favour of "fail hard, fail early" so we get system dumps
to tell us what's wrong. We too have some aborts during delete - and they
only happen every couple of months on any given system. One or two
people do not want asserts in production code.

Andy

Jorgen Grahn

Jan 25, 2018, 9:17:55 AM
"Fail hard, fail early" is the norm around here, too.

Trends like microservice architecture point in that direction, too: if
you can just let a small component die and be replaced by a working
one, there's no point in trying to patch over problems and continue.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .