am I understanding this right?


Nathan Smith

Dec 11, 2018, 6:51:53 AM
to MOO Talk
Hi there,
So as you are probably aware, I am trying to track down a server crashing bug right now that seems to occur after checkpoints.
From a physical point of view, the server suddenly decides to dump to database.db.new.AEY, or some random letters, rather than database.db.new.
I ran valgrind, on the advice of various people, and got the log I have shown below.
If I am understanding this right, the error is memory related, at storage.cc line 79, which has to do with malloc.
I have looked at the line, compared it with the master branch of stunt, and they are EXACTLY THE SAME.
I looked on Google, and one reason that this error may occur is if a negative int is sent to it.
The line is:
    memptr = (char *) malloc(offs + size);
So for instance, according to what I read, if you do:
    malloc(-2);
It will potentially go nuts and cause a server crash due to returning NULL, which then is converted to "", which causes this code to execute:
    if (!memptr) {
        sprintf(msg, "memory allocation (size %u) failed!", size);
        panic(msg);
    }
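
A side note on the reasoning above, as a minimal sketch rather than the server's actual code: the valgrind trace shows `mymalloc(unsigned int, Memory_Type)`, so a negative `int` would not arrive as a negative number. It would first be converted to a very large unsigned value. A failed `malloc` returns a null pointer, which is not turned into `""`; `!memptr` is simply true for a null pointer. The helper names `to_alloc_size`, `allocation_failed`, and `describe_request` are hypothetical and exist only for illustration.

```cpp
#include <climits>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper: mimics passing an int to a parameter declared
// `unsigned`, as in the mymalloc(unsigned int, Memory_Type) signature
// seen in the valgrind trace.
unsigned to_alloc_size(int requested) {
    return static_cast<unsigned>(requested);  // -2 wraps to UINT_MAX - 1
}

// Hypothetical helper: the same null check as `if (!memptr)`.
bool allocation_failed(const void *memptr) {
    return !memptr;  // true only for a null pointer
}

// Prints the size the allocator would be asked for, without allocating.
void describe_request(int requested) {
    std::printf("requested %d -> size %u\n", requested, to_alloc_size(requested));
}
```

If that null check were what fired, the server log would presumably contain the "memory allocation (size ...) failed!" panic text rather than only a signal 11.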

In the logs of the database, it only ever says signal 11 received, and then panics.
I have pasted a part of the valgrind logs at the bottom, because I can't figure out how to attach files in Google Groups (it's one of those days, I think).
I would appreciate any help with this, as I may be barking up the wrong tree.
Thanks!
Nate

==23308== 450 bytes in 35 blocks are possibly lost in loss record 7 of 10
==23308==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==23308==    by 0x45F784: mymalloc(unsigned int, Memory_Type) (storage.cc:79)
==23308==    by 0x45F94E: str_dup(char const*) (storage.cc:129)
==23308==    by 0x42AAE4: dbpriv_build_prep_table() (db_verbs.cc:112)
==23308==    by 0x42128B: db_initialize(int*, char***) (db_file.cc:1259)
==23308==    by 0x45DCD5: main (server.cc:1640)
==23308== 
==23308== LEAK SUMMARY:
==23308==    definitely lost: 0 bytes in 0 blocks
==23308==    indirectly lost: 0 bytes in 0 blocks
==23308==      possibly lost: 534 bytes in 39 blocks
==23308==    still reachable: 74,816 bytes in 32 blocks
==23308==         suppressed: 0 bytes in 0 blocks
==23308== Reachable blocks (those to which a pointer was found) are not shown.
==23308== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==23308== 
==23308== For counts of detected and suppressed errors, rerun with: -v
==23308== ERROR SUMMARY: 5 errors from 5 contexts (suppressed: 0 from 0)

Littlefield, Tyler

Dec 11, 2018, 11:23:31 AM
to Nathan Smith, MOO Talk

On 12/11/2018 6:51 AM, Nathan Smith wrote:
> Hi there,
> So as you are probably aware, I am trying to track down a server crashing bug right now that seems to occur after checkpoints.
> From a physical point of view, the server suddenly decides to dump to database.db.new.AEY, or some random letters, rather than database.db.new.
> I ran valgrind, on the advice of various people, and got the log I have shown below.
> If I am understanding this right, the error is memory related, at storage.cc line 79, which has to do with malloc.
> I have looked at the line, compared it with the master branch of stunt, and they are EXACTLY THE SAME.
> I looked on Google, and one reason that this error may occur is if a negative int is sent to it.
> The line is:
>     memptr = (char *) malloc(offs + size);
> So for instance, according to what I read, if you do:
>     malloc(-2);
> It will potentially go nuts and cause a server crash due to returning NULL, which then is converted to "", which causes this code to execute:
>     if (!memptr) {
>         sprintf(msg, "memory allocation (size %u) failed!", size);
>         panic(msg);
>     }

Just because a function in storage.cc calls this does not mean that the function itself is wrong. Move farther up your call stack.
If this doesn't happen in core stunt, chances are it's whatever changes you're trying to make to your own copy.


> In the logs of the database, it only ever says signal 11 received, and then panics.
> I have pasted a part of the logs of valgrind at the bottom, because I can't figure out the way to attach files in google groups, (It's one of those days I think).

Signal 11 is segfault.
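
For reference, a small sketch of what "signal 11" maps to. It assumes Linux, where SIGSEGV has the value 11; the helper names `describe_signal` and `is_segfault` are made up for this example. `strsignal` is the POSIX call that returns a textual description of a signal number.

```cpp
#include <csignal>
#include <cstring>   // strsignal (POSIX, declared in <string.h>)
#include <string>

// Hypothetical helper: human-readable description of a signal number.
std::string describe_signal(int signo) {
    const char *text = ::strsignal(signo);
    return text ? std::string(text) : std::string("unknown signal");
}

// True when the number from the server log is the segmentation-fault signal.
bool is_segfault(int signo) {
    return signo == SIGSEGV;
}
```

A segfault means the process touched memory it should not have. On its own it is not evidence that `malloc` failed.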


Todd Sundsted

Dec 11, 2018, 6:53:49 PM
to MOO Talk
There should be more log than that. You are not really concerned about memory leaks here. There should be some errors -- the log suggests 5 were reported ("ERROR SUMMARY: 5 errors from 5 contexts (suppressed....").