Bad data causes segmentation fault in google::protobuf::DescriptorPool::FindFileByName


patrick...@googlemail.com

May 2, 2014, 9:29:48 AM
to prot...@googlegroups.com
Hi,

I am using a straightforward C++ client to fetch GTFS-realtime feeds. Feeds are read into a string with libcurl and then parsed:

GOOGLE_PROTOBUF_VERIFY_VERSION; 
FeedMessage fm;

/** fetching realtime feed with libcurl into readBuffer **/

if (!fm.ParseFromString(readBuffer)) {  // this line crashes
  LOG(ERROR) << "Failed to parse realtime GTFS.";
  return;
}
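
The fetch step is elided in the snippet above. As a minimal sketch, assuming the response body is collected through libcurl's CURLOPT_WRITEFUNCTION hook (the name appendToString is hypothetical and not taken from the original program), a callback that appends each chunk to a std::string could look like this:

```cpp
#include <cstddef>
#include <string>

// Same signature libcurl expects for CURLOPT_WRITEFUNCTION. libcurl may call
// it several times per transfer, each time with size * nmemb bytes.
size_t appendToString(char* ptr, size_t size, size_t nmemb, void* userdata) {
  std::string* buffer = static_cast<std::string*>(userdata);
  const size_t bytes = size * nmemb;
  // append(ptr, n) copies exactly n bytes, so '\0' bytes inside the binary
  // protobuf payload are preserved.
  buffer->append(ptr, bytes);
  return bytes;  // returning a different count signals an error to libcurl
}
```

It would be registered with curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, appendToString) and curl_easy_setopt(curl, CURLOPT_WRITEDATA, &readBuffer) before the transfer is performed.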

I experience random segfaults during parsing. All of them occur while parsing one specific feed. The segfault is caused by

google::protobuf::DescriptorPool::FindFileByName

Backtrace:

Program received signal SIGSEGV, Segmentation fault.
#0  0x00007ffff7901f3e in google::protobuf::DescriptorPool::FindFileByName(std::string const&) const () from /usr/lib/libprotobuf.so.7
#1  0x000000000047e045 in transit_realtime::protobuf_AssignDesc_headers_2fprotobuf_2fgtfs_2drealtime_2eproto() () at headers/protobuf/gtfs-realtime.pb.cc:80
#2  0x00007ffff7683400 in pthread_once ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:104
#3  0x0000000000471593 in transit_realtime::FeedMessage::GetMetadata() const
    () at /usr/include/google/protobuf/stubs/once.h:115
#4  0x00007ffff7932274 in google::protobuf::Message::GetTypeName() const ()
   from /usr/lib/libprotobuf.so.7
#5  0x00007ffff78e2f17 in ?? () from /usr/lib/libprotobuf.so.7
#6  0x00007ffff78e3714 in google::protobuf::MessageLite::ParseFromString(std::string const&) () from /usr/lib/libprotobuf.so.7


Since the segfaults are _always_ caused by the same feed, I strongly suspect that bad data causes the crash. The random nature of the crashes could be explained by the feed returning a bad message every few hours. Nevertheless, for bad data, ParseFromString() should return false rather than crash.

I am using g++ 4.8.1 with -O3 -g -std=c++0x
protoc --version returns "libprotoc 2.4.1"

Thanks for any help :) 






patrick...@googlemail.com

May 22, 2014, 12:28:48 PM
to prot...@googlegroups.com
I am still experiencing this problem. Any help is highly appreciated; I have tried everything I can think of.

The problem only occurs if bad data is fed to ParseFromString(). The following code crashes with a segmentation fault if readBuffer contains bad data (currently, the content of http://www.google.com). If readBuffer contains well-formed GTFS-realtime data, everything works as expected:

  if (!(fm.ParseFromString(readBuffer))) {
    LOG(WARNING) << "Failed to parse realtime GTFS.";
    return;
  }

This is the line in the gtfs-realtime.pb.cc (in method protobuf_AssignDesc_headers_2fprotobuf_2fgtfs_2drealtime_2eproto()) that causes the crash:

  const ::google::protobuf::FileDescriptor* file =
    ::google::protobuf::DescriptorPool::generated_pool()->FindFileByName(
      "headers/protobuf/gtfs-realtime.proto");

For some reason, it tries to read the original .proto file if bad data is received. I tried changing this line to an absolute path, but the problem persists. The gtfs-realtime.proto file resides in headers/protobuf. protoc is called like this:

protoc -I=headers/protobuf --cpp_out=. headers/protobuf/gtfs-realtime.proto 

I tried several locations for --cpp_out, but it didn't help.

This is the valgrind output after the crash:

==4015== Invalid read of size 8
==4015==    at 0x5108FF2: google::protobuf::DescriptorPool::FindFileByName(std::string const&) const (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.8.0.0)
==4015==    by 0x47F3F4: transit_realtime::protobuf_AssignDesc_gtfs_2drealtime_2eproto() (in /home/patrick/repos/trajserver/trajhttpserv)
==4015==    by 0x50DC2CF: google::protobuf::internal::FunctionClosure0::Run() (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.8.0.0)
==4015==    by 0x50DC4F0: google::protobuf::GoogleOnceInitImpl(long*, google::protobuf::Closure*) (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.8.0.0)
==4015==    by 0x474444: transit_realtime::FeedMessage::GetMetadata() const (in /home/patrick/repos/trajserver/trajhttpserv)
==4015==    by 0x513FF6F: google::protobuf::Message::GetTypeName() const (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.8.0.0)
==4015==    by 0x50EB090: ??? (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.8.0.0)
==4015==    by 0x50EB891: google::protobuf::MessageLite::ParseFromString(std::string const&) (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.8.0.0)
==4015==    by 0x442E83: GtfsRealTimeReader::getTripUpdates() (in /home/patrick/repos/trajserver/trajhttpserv)
==4015==    by 0x4824AB: updateRealtimeReaders(std::vector<GtfsRealTimeReader, std::allocator<GtfsRealTimeReader> >*, unsigned int) (in /home/patrick/repos/trajserver/trajhttpserv)
==4015==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==4015== 
[22/05/2014 15:21:02,241645] FATAL: CRASH HANDLED; Application has crashed due to [SIGSEGV] signal


Again, this crash happens every time I try to parse some random string as a GTFS-realtime protobuf feed.

I reproduced the segfault on three different machines: an Ubuntu 13.10 machine with libprotoc 2.4.1, an Ubuntu 14.04 machine with libprotoc 2.5.0, and a Debian machine with libprotoc 2.4.1. I tried different C++ versions; I tried -O1, -O2 and -O3; I tried fixing the gtfs-realtime.pb.cc line by hand; I played around with the protoc parameters; I compiled and installed libprotoc 2.5.0 by hand on all three machines; I tried parsing the feed from a stream and from a string; and I created an application that only reads http://www.google.com and feeds it to ParseFromString(). The behaviour is always exactly the same: no problem on well-formed data, segfault on bad data.

I am using the original gtfs-realtime.proto file from the GTFS realtime page.

Anyone?

Feng Xiao

May 22, 2014, 1:47:21 PM
to patrick...@googlemail.com, Protocol Buffers
Can you check whether the following code produces any meaningful output? I suggest trying it once in your original binary, and then in a simple program that calls only this code (linking only the code generated from the .proto file and nothing else from your project). If both fail (segfault), it might be a bug in protobuf's generated code. If only the former fails, the problem is probably somewhere else in your project (most likely a memory corruption bug).

LOG(INFO) << fm.GetDescriptor()->DebugString();

Note that you shouldn't change the generated code. It won't work.



patrick...@googlemail.com

May 22, 2014, 1:58:48 PM
to prot...@googlegroups.com, patrick...@googlemail.com
Thank you very much for your answer! The result of

  GOOGLE_PROTOBUF_VERIFY_VERSION;
  FeedMessage fm;
  LOG(INFO) << fm.GetDescriptor()->DebugString();

in both applications is

==13935== Invalid read of size 8
==13935==    at 0x510EFF2: google::protobuf::DescriptorPool::FindFileByName(std::string const&) const (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.8.0.0)
==13935==    by 0x47F3F4: transit_realtime::protobuf_AssignDesc_gtfs_2drealtime_2eproto() (in /home/patrick/secure/repo/trajserver-internal2/build/trajhttpserv)
==13935==    by 0x50E22CF: google::protobuf::internal::FunctionClosure0::Run() (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.8.0.0)
==13935==    by 0x50E24F0: google::protobuf::GoogleOnceInitImpl(long*, google::protobuf::Closure*) (in /usr/lib/x86_64-linux-gnu/libprotobuf.so.8.0.0)
==13935==    by 0x474464: transit_realtime::FeedMessage::GetMetadata() const (in /home/patrick/secure/repo/trajserver-internal2/build/trajhttpserv)
==13935==    by 0x442D40: GtfsRealTimeReader::getTripUpdates() (in /home/patrick/secure/repo/trajserver-internal2/build/trajhttpserv)
==13935==    by 0x4824AB: updateRealtimeReaders(std::vector<GtfsRealTimeReader, std::allocator<GtfsRealTimeReader> >*, unsigned int) (in /home/patrick/secure/repo/trajserver-internal2/build/trajhttpserv)
==13935==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==13935== 
[22/05/2014 19:55:47,646579] FATAL: CRASH HANDLED; Application has crashed due to [SIGSEGV] signal
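
Note that this trace contains no ParseFromString() frame: the invalid read happens during the one-time descriptor setup reached through GetMetadata() (GoogleOnceInitImpl, then protobuf_AssignDesc_..., then FindFileByName), before any input bytes are involved. The sketch below is only a rough model of that "set up once, on first use" pattern. It uses a C++11 function-local static instead of protobuf's GoogleOnceInit, and every name in it (FakeMessage, assignDescriptor, setupRuns) is hypothetical rather than protobuf's actual code:

```cpp
#include <string>

// Hypothetical stand-in for a generated message class; not protobuf code.
class FakeMessage {
 public:
  // Mirrors the call chain in the trace: asking for metadata triggers the
  // one-time setup, whatever bytes were (or were not) parsed beforehand.
  const std::string& GetTypeName() const { return descriptorName(); }

  // Number of times the setup has run so far.
  static int setupRuns() { return counter(); }

 private:
  static int& counter() {
    static int runs = 0;
    return runs;
  }

  // Stands in for the generated protobuf_AssignDesc_... function.
  static std::string assignDescriptor() {
    ++counter();
    return "transit_realtime.FeedMessage";
  }

  static const std::string& descriptorName() {
    // Initialized on first use only; later calls reuse the stored value.
    static const std::string name = assignDescriptor();
    return name;
  }
};
```

In this model the setup runs exactly once per process, on the first metadata request from any instance, which is consistent with the crash showing up on the first GetDescriptor() call.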

Feng Xiao

May 22, 2014, 2:07:43 PM
to patrick...@googlemail.com, Protocol Buffers
Can you post the code of the simple program you used to reproduce the error? (including the .pb.h/.pb.cc or the .proto file)

patrick...@googlemail.com

May 22, 2014, 2:56:22 PM
to prot...@googlegroups.com, patrick...@googlemail.com
Okay, I narrowed it down.

I am creating a pool of GTFS real-time readers. For this, I created a wrapper class GtfsRealTimeReader. GtfsRealTimeReader.h basically looks like this (not including header guards and basic includes):

// Reads GTFS realtime protocol buffer feeds
class GtfsRealTimeReader {
 public:
  GtfsRealTimeReader() {}

  // fetch new updates
  void getTripUpdates();
};

GtfsRealTimeReader.cpp looks like this (in the test application):

#include <gtfs-realtime.pb.h>
#include "GtfsRealTimeReader.h"
#include "easylogging.h"

using std::string;
using transit_realtime::FeedMessage;
using transit_realtime::FeedEntity;
using transit_realtime::TripUpdate_StopTimeEvent;
using transit_realtime::TripDescriptor;
using transit_realtime::TripUpdate_StopTimeUpdate_ScheduleRelationship;
using transit_realtime::TripUpdate_StopTimeUpdate;

void GtfsRealTimeReader::getTripUpdates() {
  GOOGLE_PROTOBUF_VERIFY_VERSION;
  FeedMessage fm;
  LOG(INFO) << fm.GetDescriptor()->DebugString();
}

The following code WORKS:

  GtfsRealTimeReader g;
  g.getTripUpdates();

The following code WORKS:

  GtfsRealTimeReader g;
  g.getTripUpdates();
  GtfsRealTimeReader gg;
  gg.getTripUpdates();
  GtfsRealTimeReader ggg;
  ggg.getTripUpdates();

The output is the DebugString() message.

The following code SEGFAULTS:

  vector<GtfsRealTimeReader> realTimeReaders;
  realTimeReaders.push_back(GtfsRealTimeReader());
  GtfsRealTimeReader g;
  g.getTripUpdates();

I can create an arbitrary number of GtfsRealTimeReader objects, they all work fine. As soon as I push one of them onto a vector and call getTripUpdates() on any of them, the segfault occurs.
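
One C++ detail matters when reproducing these snippets: a declaration of the form `GtfsRealTimeReader g();` is parsed as a function declaration, so a default-constructed object has to be written as `GtfsRealTimeReader g;`. The sketch below shows this and the vector case using a hypothetical stand-in class (Reader) that does not touch protobuf, so it illustrates only the language mechanics and not the crash itself:

```cpp
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the wrapper class; it does not touch protobuf.
struct Reader {
  int getTripUpdates() { return 1; }  // placeholder return value
};

// With parentheses this declares a function returning Reader, not an object.
Reader declaredFunction();
static_assert(std::is_function<decltype(declaredFunction)>::value,
              "Reader name(); is a function declaration");

// Builds the same objects as the failing snippet, with the local object
// declared without parentheses. Returns the sum of the placeholder results.
int runReaders() {
  std::vector<Reader> readers;
  readers.push_back(Reader());  // the vector stores its own copy of the temporary
  Reader g;                     // default-constructed object
  return g.getTripUpdates() + readers.front().getTripUpdates();
}
```

For an empty class like this one, copying into the vector is harmless by itself, which suggests the vector is a trigger rather than the root cause.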


patrick...@googlemail.com

May 23, 2014, 5:15:58 AM
to prot...@googlegroups.com, patrick...@googlemail.com
Fixed by holding a vector of pointers to GtfsRealTimeReader objects instead of the objects themselves. I still have no idea what causes the crash.
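
The workaround code was not posted. A minimal sketch of the idea, assuming C++11 and using a hypothetical stand-in class (Reader) plus a hypothetical helper (makeReaders), might look like this. It does not explain the crash; it only shows a container that owns the objects through pointers:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the wrapper class; it does not touch protobuf.
struct Reader {
  int id;
  explicit Reader(int i) : id(i) {}
};

// Owns readers through pointers: the vector may reallocate its pointer
// array, but each Reader stays at the address where it was created.
std::vector<std::unique_ptr<Reader>> makeReaders(std::size_t count) {
  std::vector<std::unique_ptr<Reader>> readers;
  for (std::size_t i = 0; i < count; ++i) {
    readers.push_back(
        std::unique_ptr<Reader>(new Reader(static_cast<int>(i))));
  }
  return readers;
}
```

Compared with raw pointers, std::unique_ptr releases each reader automatically when the vector is destroyed, and no reader object is ever copied or moved.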