Potential memory leak

37 views
Skip to first unread message

Michael Alexeev

unread,
Mar 14, 2012, 9:50:04 PM3/14/12
to voltd...@googlegroups.com
Hi All,

Below is excerpt from the SeqScanExecutor::p_init method

        TupleSchema* schema = node->generateTupleSchema(true);
        int column_count = static_cast<int>(node->getOutputSchema().size());
        std::string* column_names = new std::string[column_count];                  // line 1 memory allocated
        for (int ctr = 0; ctr < column_count; ctr++)
        {
            column_names[ctr] = node->getOutputSchema()[ctr]->getColumnName();
        }
        node->setOutputTable(TableFactory::getTempTable(node->databaseId(),
                                                        node->getTargetTable()->name(),
                                                        schema,
                                                        column_names,
                                                        limits));
        delete[] column_names;                                                                      // line 2 memory released                                                                       


memory is allocated at line1 and released at line 2. If an exception is thrown in between there would be a leak. Maybe it's not the case in this particular method but still, possibility exists. Why not use either boost::scoped_array or std analog if we are going to adopt the new C++11 standard.

        TupleSchema* schema = node->generateTupleSchema(true);
        int column_count = static_cast<int>(node->getOutputSchema().size());
        boost::scoped_array<std::string> column_names(new std::string[column_count]);                  // line 1 memory allocated using RAII
        for (int ctr = 0; ctr < column_count; ctr++)
        {
            column_names[ctr] = node->getOutputSchema()[ctr]->getColumnName();
        }
        node->setOutputTable(TableFactory::getTempTable(node->databaseId(),
                                                        node->getTargetTable()->name(),
                                                        schema,
                                                        column_names.get(),
                                                        limits));
         // no need to delete  

There are actually quite a few similar examples in the code where resource is allocated in one place and released at another. RAII idiom is a much safer in such cases.

Thanks,
Mike

John Hugg

unread,
Mar 14, 2012, 11:17:00 PM3/14/12
to voltd...@googlegroups.com
I think what you've written is sensible. Thoughts:

One bit of history that might be pertinent is that very early h-store prototypes (that became VoltDB 1.0) used boost shared pointers and their ilk extensively. The judgement at the time was that it might have been overboard for the core performance code in VoltDB. A deep purge was made and there was measurable improvements. Boost shared pointers are not generally slow, and the near-total purge may have been an overcorrection.

But we've also gotten a little more careful about how we allocate memory since those early days. The number of allocations in fast paths is pretty small. I like the idea behind changes like this. I suspect we'd accept a patch that made such changes, with the assumption that it didn't have a measurable affect on performance.

Note that for measurement for things like this, we typically use the TPC-C-like benchmark we have, as it stresses the C++ code much more than our other benchmarks. It would be nice to have some C++ only micro benchmarks that exercised core loops, but we have a history of TPC-C numbers on common hardware. The results of daily performance runs aren't public, which is too bad (but also a bit of work).

-John

Ariel Weisberg

unread,
Mar 14, 2012, 11:22:54 PM3/14/12
to voltd...@googlegroups.com
Hi,

You can use boost pointers for RAII and still leak the internal pointer for fast lookups e.g. get more cake and still have some.

Ariel

Michael Alexeev

unread,
Mar 15, 2012, 7:21:50 PM3/15/12
to voltd...@googlegroups.com
On Wed, Mar 14, 2012 at 11:22 PM, Ariel Weisberg <awei...@voltdb.com> wrote:
Hi,

You can use boost pointers for RAII and still leak the internal pointer for fast lookups e.g. get more cake and still have some.

Yes, but it's much harder now unless you do explicit release or delete.
 

Ariel


On Wed, Mar 14, 2012 at 11:17 PM, John Hugg <jh...@voltdb.com> wrote:
I think what you've written is sensible. Thoughts:

One bit of history that might be pertinent is that very early h-store prototypes (that became VoltDB 1.0) used boost shared pointers and their ilk extensively. The judgement at the time was that it might have been overboard for the core performance code in VoltDB. A deep purge was made and there was measurable improvements. Boost shared pointers are not generally slow, and the near-total purge may have been an overcorrection.

Agree, shared pointer does have some bookkeeping overhead. But operations on boost::scoped_ptr or std::unique_ptr should be as fast as with naked one. 


But we've also gotten a little more careful about how we allocate memory since those early days. The number of allocations in fast paths is pretty small. I like the idea behind changes like this. I suspect we'd accept a patch that made such changes, with the assumption that it didn't have a measurable affect on performance.

Yes, performance is very important, no questions there. But going back to the original code, how do you know that no exception is thrown between allocation and release points? There are plenty of throw testaments in the code. Even if nothing is thrown now in between it doesn't guarantee that it will stay that way in the future.
 
Mike

Ryan Betts

unread,
Mar 15, 2012, 7:46:27 PM3/15/12
to voltd...@googlegroups.com
Hey Mike,

These are valid defects and become more significant as we use more
adhoc SQL. Perhaps we haven't been strict enough here as this is an
initialization path and a failed plan fragment would usually terminate
DB startup in most use cases.

Thanks for the code review -- we'd love a patch or we can file a
defect to make a pass through these p_init functions and fix these
error path leaks.

Ryan.

Michael Alexeev

unread,
Mar 15, 2012, 9:30:10 PM3/15/12
to voltd...@googlegroups.com
Sure, no problem.

Paul Martel

unread,
Mar 17, 2012, 3:12:49 PM3/17/12
to voltd...@googlegroups.com
A minor point: I believe that the original ' = ' syntax for initialization is equivalent to the changed syntax using '(...), and I, for one, find ' = ' more natural for smart pointers. That is, I prefer

        boost::scoped_array<std::string> column_names = new std::string[column_count]);

to

        boost::scoped_array<std::string> column_names(new std::string[column_count]));

for use with (smart) pointers, primitives, enums, and "value" classes.

I lean towards the '(...)' syntax for more involved classes whose single initializer represents only part of its state.
I'm on the fence when it comes to which form to use for copy constructors for such classes.

I don't know how much of this sentiment is widely shared or if it rates documentation as a suggestion/preference in our coding standards.

Also, AFAIK, there's no mention of support for C++11 features or similar/other "recent" gcc features in the C++ coding standards for VoltDB internals or in the user documentation for the C++ clients (and so for the provided client headers).

My recommendation would be that we document that we support the feature set of some minimum rev. of gcc.

We appear to be still using gcc 4.1.2 on some in-house machines and supporting the same in the field, which seems to rule out use of the C++0x/C++11 features for VoltDB internal code or any client app examples we may use for testing, until those machines -- and official support for those older platform revs -- gets phased out.

Naturally, users are free to build the community edition with other/newer compilers and to write client apps that use newer language features, but VoltDB should only accept patches that we can build and validate on all of our supported platforms.

--paul

On Wed, Mar 14, 2012 at 11:22 PM, Ariel Weisberg <awei...@voltdb.com> wrote:

Andrew Wilson

unread,
Mar 17, 2012, 3:34:08 PM3/17/12
to voltd...@googlegroups.com, voltd...@googlegroups.com
Hi,

This sounds pretty useful. I'm not sure I like the automatic deletion part though. 

Is this something better suited as an extension of views?

Do other traditional db's support this kind of table or some other construct?



Sent from my iPhone

Andrew Wilson

unread,
Mar 17, 2012, 3:34:54 PM3/17/12
to voltd...@googlegroups.com, voltd...@googlegroups.com
Sorry, wrong thread:)

Sent from my iPhone

On Mar 15, 2012, at 9:30 PM, Michael Alexeev <michael...@gmail.com> wrote:

Michael Alexeev

unread,
Mar 18, 2012, 6:19:27 PM3/18/12
to voltd...@googlegroups.com
Hi Paul,

On Sat, Mar 17, 2012 at 3:12 PM, Paul Martel <pma...@voltdb.com> wrote:
A minor point: I believe that the original ' = ' syntax for initialization is equivalent to the changed syntax using '(...), and I, for one, find ' = ' more natural for smart pointers. That is, I prefer

        boost::scoped_array<std::string> column_names = new std::string[column_count]);

to

        boost::scoped_array<std::string> column_names(new std::string[column_count]));

for use with (smart) pointers, primitives, enums, and "value" classes.

I lean towards the '(...)' syntax for more involved classes whose single initializer represents only part of its state.
I'm on the fence when it comes to which form to use for copy constructors for such classes.

The former is copy initialization and the latter is value initialization. in practice, they are equivalent since all modern compliers elide the intermediate copy constructor. I personally tend to use the second form all the time but it's just my personal preferences and nothing more. If VoltDb has C++ coding standards/guidelines, please let me know and I would gladly follow them

Mike

Paul Martel

unread,
Mar 18, 2012, 7:25:40 PM3/18/12
to voltd...@googlegroups.com
Mike,
  Thanks for the clarification on copy vs. value initialization -- as you say, the compilers are good at making them a wash. So much that I have come to think of them as equivalent for practical purposes.

The VoltDB coding standards are now documented on the developer wiki https://github.com/VoltDB/voltdb/wiki/Code-conventions-%26-licensing
They're pretty easy to live with.  Other standards may be added as we reach consensus about them -- most likely in the form of guidelines/direction/preferences rather than new "Hard Rules".

Your feedback on this or other developer wiki pages is welcome. The wiki is in its early days and is still very much a work in progress.

--paul

Michael Alexeev

unread,
Mar 18, 2012, 8:33:46 PM3/18/12
to voltd...@googlegroups.com
Paul,

On Sun, Mar 18, 2012 at 7:25 PM, Paul Martel <pma...@voltdb.com> wrote:

  Thanks for the clarification on copy vs. value initialization -- as you say, the compilers are good at making them a wash. So much that I have come to think of them as equivalent for practical purposes.

! have to correct myself. 

boost::shared_ptr<std::string> sp = new std::string();

won't compile for a simple reason of lucking conversion from pointer type to boost::shared_ptr. If it were, then the path would be Pointer to shared_ptr conversion - copy construction (which would be elided :))

Sorry for the confusion.


The VoltDB coding standards are now documented on the developer wiki https://github.com/VoltDB/voltdb/wiki/Code-conventions-%26-licensing
They're pretty easy to live with.  Other standards may be added as we reach consensus about them -- most likely in the form of guidelines/direction/preferences rather than new "Hard Rules".

Your feedback on this or other developer wiki pages is welcome. The wiki is in its early days and is still very much a work in progress.

Thanks for the reference. makes perfect sense.
Mike

Paul Martel

unread,
Mar 18, 2012, 9:26:51 PM3/18/12
to voltd...@googlegroups.com
Mike,
I'm sorry I steered us down this path.
I missed the fact that the shared_ptr(T*) constructor is declared explicit, which breaks my version.
I'm still a little surprised at that.
I must have only used the " = " syntax with other RAII implementations.
Note to self: "Compile code first, then share it."
Thanks (again) for the education.
--paul
Reply all
Reply to author
Forward
0 new messages