Segfault in python using CPP implementation

shamer

Feb 21, 2012, 5:45:25 PM
to Protocol Buffers
I have found a segfault when using the C++ (cpp) implementation from
Python in protobuf 2.4.1. I can reproduce it in two different
environments with a small number of files.

The segfault is happening in google/protobuf/internal/cpp_message.py
in the ScalarProperty getter. There seems to be some interplay between
iterating through one repeated field and accessing a scalar property
in another message.

I have reduced the reproduction to a small set of .proto and py files.
I have bundled the files and uploaded the whole set here:
http://dl.dropbox.com/u/24148866/py_cpp_pbcrash.tgz
To reproduce the segfault, run the crashtest.py script with the cpp
protocol buffer implementation:

    PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp PYTHONPATH=./ python crashtest.py

Simply accessing the self.pb.s1 property multiple times isn't enough
to cause a segfault, nor is iterating over the repeated astr field on
its own.

I have made a change to the python_generator.cc file to include the
package name in the python imports. I have included the file in the
tarball and the diff is below. This modification was made to the 2.4.1
codebase. I believe that this should only change where python packages
are imported from.

I have tested this on two separate systems. Their respective
configurations are below. Both segfault, but after a different number
of iterations through the "for a in uar.astr:" loop. Adding imports
like "from datetime import date" to the top of the script also changes
the number of iterations through the loop before the segfault.

Any thoughts on what might be causing this or things I can do to help
narrow down the root cause?

Cheers,
Stephen


crashtest.py:
from api_pb.ui_pb2 import UIR
from api_pb.ua_pb2 import UAR
from datetime import date

import base64

activities_blob = ... large b64 blob ...

class Container(object):
    def __init__(self):
        b64_user_blob = '''EnwKDFN0ZXBoZW5IYW1lchIHU3RlcGhlbhoFSGFtZXIyTmh0dHA6Ly9hMS50d2ltZy5jb20vc3RpY2t5L2RlZmF1bHRfcHJvZmlsZV9pbWFnZXMvZGVmYXVsdF9wcm9maWxlXzBfbm9ybWFsLnBuZzi9gafrBEIASgBSAGgB'''
        uir = UIR()
        uir.ParseFromString(base64.decodestring(b64_user_blob))
        self.pb = uir.up

    def iterate_on_astr(self):
        uar = UAR()
        uar.ParseFromString(base64.decodestring(activities_blob))
        for a in uar.astr:
            self.pb.s1

container = Container()
container.iterate_on_astr()



Tested environments:
$ uname -a
Linux isengard 3.1-pf #1 SMP PREEMPT Mon Jan 9 02:15:02 EST 2012
x86_64 Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz GenuineIntel GNU/Linux
$ python2 --version
Python 2.7.2

And

shamer@prod5:~$ uname -a
Linux prod5.upverter.com 2.6.35-24-virtual #42-Ubuntu SMP Thu Dec 2
05:15:26 UTC 2010 x86_64 GNU/Linux
shamer@prod5:~$ python --version
Python 2.6.6


Changes to python_generator.cc
79,80c79,80
< string ModuleName(const string& filename) {
< string basename = StripProto(filename);
---
> string ModuleName(const FileDescriptor *file) {
> string basename = StripProto(file->name());
83c83,89
< return basename + "_pb2";
---
>
> string package = file->package();
> if (package.length() > 0) {
> return package + "." + basename + "_pb2";
> } else {
> return basename + "_pb2";
> }
245c251
< string module_name = ModuleName(file->name());
---
> string module_name = ModuleName(file);
286c292
< string module_name = ModuleName(file_->dependency(i)->name());
---
> string module_name = ModuleName(file_->dependency(i));
950c956
< name = ModuleName(descriptor.file()->name()) + "." + name;
---
> name = ModuleName(descriptor.file()) + "." + name;
962c968
< name = ModuleName(descriptor.file()->name()) + "." + name;
---
> name = ModuleName(descriptor.file()) + "." + name;
975c981
< name = ModuleName(descriptor.file()->name()) + "." + name;
---
> name = ModuleName(descriptor.file()) + "." + name;

Jeremiah Jordan

Feb 29, 2012, 9:14:41 PM
to prot...@googlegroups.com
The issue is here:
        uir = UIR()
        uir.ParseFromString(base64.decodestring(b64_user_blob))
        self.pb = uir.up

The Python cpp protobuf implementation doesn't support keeping references to a submessage once the parent message goes out of scope. Here uir is local to __init__, so when uir eventually gets garbage collected, accessing self.pb.s1 will crash.

You need to keep the top-level uir around if you want to use the cpp implementation.
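
A minimal sketch of that workaround, assuming the only change needed is to store the parent message on the instance. The parent_cls parameter and the _uir attribute name are made up for illustration; in crashtest.py the class passed in would be the generated UIR.

```python
import base64


class Container(object):
    """Keeps the parsed parent message alive alongside its submessage."""

    def __init__(self, parent_cls, b64_blob):
        # parent_cls is a hypothetical parameter standing in for the
        # generated UIR class from the original script.
        self._uir = parent_cls()
        self._uir.ParseFromString(base64.b64decode(b64_blob))
        # Because self._uir holds the parent, it is not garbage collected
        # while this Container (and therefore self.pb) is still in use.
        self.pb = self._uir.up
```

With the generated modules on the path, this would be used as Container(UIR, b64_user_blob), and iterate_on_astr would stay unchanged.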