Yet Another Protocol Buffers Implementation for C

David Rogers

Jul 24, 2015, 4:25:39 PM

to Protocol Buffers

sprotoc (short for "stack protocol buffer compiler") is a C code generator for protocol buffers.
It lives at https://github.com/frobnitzem/sprotoc

I coded it up a year ago and have been using it happily since.
What makes it unique is the ability to write your own copy in/out functions
for each message type so that you aren't stuck with creating a copy of a large recursive data structure.
It generates example copy functions so you can get going without reading excessive documentation.

Michael Haberler

Jul 25, 2015, 12:40:23 AM

to David Rogers, Protocol Buffers

Hi David,

How is that different from nanopb (http://koti.kapsi.fi/~jpa/nanopb/)?

- Michael

David Rogers

Jul 25, 2015, 1:04:02 AM

to Michael Haberler, Protocol Buffers

Nanopb has a heavier interface for including sub-messages. Here's an example, taken from their union-message example:

// This is an example of how to handle 'union' style messages
// with nanopb, without allocating memory for all the message types.
//
// There is no official type in Protocol Buffers for describing unions,
// but they are commonly implemented by filling out exactly one of
// several optional fields.

message MsgType1 {
    required int32 value = 1;
}

message MsgType2 {
    required bool value = 1;
}

message MsgType3 {
    required int32 value1 = 1;
    required int32 value2 = 2;
}

message UnionMessage {
    optional MsgType1 msg1 = 1;
    optional MsgType2 msg2 = 2;
    optional MsgType3 msg3 = 3;
}

It's encoded with:

/* This function is the core of the union encoding process. It handles
 * the top-level pb_field_t array manually, in order to encode a correct
 * field tag before the message. The pointer to MsgType_fields array is
 * used as an unique identifier for the message type.
 */
bool encode_unionmessage(pb_ostream_t *stream, const pb_field_t messagetype[], const void *message)
{
    const pb_field_t *field;
    for (field = UnionMessage_fields; field->tag != 0; field++)
    {
        if (field->ptr == messagetype)
        {
            /* This is our field, encode the message using it. */
            if (!pb_encode_tag_for_field(stream, field))
                return false;
            
            return pb_encode_submessage(stream, messagetype, message);
        }
    }
    
    /* Didn't find the field for messagetype */
    return false;
}

Their low-level code is necessary because "Usually, nanopb would allocate space to store all of the possible messages at the same time, even though at most one of them will be used at a time."

My library would encode such a thing by calling the top-level (auto-generated) function:
buf = UnionMessage_to_string(&len, (MY_UnionMessage *)&test);

and providing copy functions for each message. The auto-generated example code to use the interface is below:

void protowr_MsgType1(MsgType1 *out, MY_MsgType1 *a) {
    out->value = a->value;
}
void protowr_MsgType2(MsgType2 *out, MY_MsgType2 *a) {
    out->value = a->value;
}
void protowr_MsgType3(MsgType3 *out, MY_MsgType3 *a) {
    out->value1 = a->value1;
    out->value2 = a->value2;
}
// Of course, the programmer would rewrite this as a switch statement.
void protowr_UnionMessage(UnionMessage *out, MY_UnionMessage *a) {
    out->has_msg1 = a->has_msg1;
    if(a->has_msg1) {
        out->msg1 = a->msg1;
    }
    out->has_msg2 = a->has_msg2;
    if(a->has_msg2) {
        out->msg2 = a->msg2;
    }
    out->has_msg3 = a->has_msg3;
    if(a->has_msg3) {
        out->msg3 = a->msg3;
    }
}

There is no dealing with the byte-level encoding at this level, only copying between a shallow parsed format and your own output data structure, one message at a time. It also doesn't litter message-specific field tags or names anywhere - only high-level, per-message encoding/decoding functions. I don't know if nanopb does this, but some implementations do.

Hope that helps,
~ David.

Michael Haberler

Jul 26, 2015, 1:53:46 AM

to David Rogers, Protocol Buffers

David,

Not yet, as I do not fully understand the point you are making.

Is it the type- and name-specific per-field/per-message accessors?

I note that one of the key advantages of nanopb is that it can get by without malloc/free support. That means it can live in very restricted environments such as embedded/bare metal or even in-kernel, which explains some of the API choices.

- Michael

David Rogers

Jul 26, 2015, 2:57:59 PM

to Protocol Buffers, habe...@gmail.com

The point is that sprotoc works submessage-at-a-time, without global data structures.
You do not need to create a completely filled-out "struct Message" before calling protowr_UnionMessage.
You just call protowr_UnionMessage with your own data structure, whatever that may be. It uses your callback
to make a shallow copy of the fields, then serializes that copy.
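
To make the "callback, then serialize" flow concrete, here is a minimal hand-written sketch for MsgType3. It is not sprotoc's generated code: the function sketch_MsgType3_encode, the helpers put_varint and put_int32_field, and the xy layout of MY_MsgType3 are all assumptions made up for illustration. Only the protobuf wire format (varint keys and values) is standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shallow struct mirroring the .proto message MsgType3
 * (a stand-in for what a generator would emit). */
typedef struct {
    int32_t value1;
    int32_t value2;
} MsgType3;

/* Application-side type: any layout the program likes (hypothetical). */
typedef struct {
    int32_t xy[2];
} MY_MsgType3;

/* User-written copy-out callback, in the style of the protowr_* functions. */
void protowr_MsgType3(MsgType3 *out, MY_MsgType3 *a) {
    out->value1 = a->xy[0];
    out->value2 = a->xy[1];
}

/* Write v as a base-128 varint; returns the number of bytes written (at most 10). */
static size_t put_varint(uint8_t *buf, uint64_t v) {
    size_t n = 0;
    while (v >= 0x80) {
        buf[n++] = (uint8_t)(v | 0x80); /* low 7 bits, continuation bit set */
        v >>= 7;
    }
    buf[n++] = (uint8_t)v;
    return n;
}

/* Encode one int32 field: key = (field_number << 3) | wire type 0 (varint). */
static size_t put_int32_field(uint8_t *buf, uint32_t field_number, int32_t v) {
    size_t n = put_varint(buf, (uint64_t)field_number << 3);
    /* Negative int32 values are sign-extended to 64 bits on the wire. */
    return n + put_varint(buf + n, (uint64_t)(int64_t)v);
}

/* Hypothetical serializer: calls the user's callback to fill a shallow copy
 * on the stack, then encodes that copy. buf must hold at least 22 bytes. */
size_t sketch_MsgType3_encode(uint8_t *buf, MY_MsgType3 *a) {
    assert(buf != NULL && a != NULL);
    MsgType3 tmp;
    protowr_MsgType3(&tmp, a);
    size_t n = put_int32_field(buf, 1, tmp.value1);
    n += put_int32_field(buf + n, 2, tmp.value2);
    return n;
}
```

With xy = {150, 1} this writes the five bytes 08 96 01 10 01. The caller never builds a MsgType3 itself; the only message-specific code it supplies is the copy-out callback.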

In nanopb's case, you have to do your own conversion from your structs to nanopb's structs.
It helps by creating a bunch of global data structures like "UnionMessage_fields," but using these
to manually call "pb_encode_tag_for_field(stream, field)" is different from using a callback
and setting out->msg1 = a->msg1.

This is flexible enough to serialize the union data structures with the (to me) intuitive callback:

typedef struct {
    int type;
    union {
       MY_MsgType1 *msg1;
       MY_MsgType2 *msg2;
       MY_MsgType3 *msg3;
    };
} MY_UnionMessage;

void protowr_UnionMessage(UnionMessage *out, MY_UnionMessage *a) {
    // MY_UnionMessage has no has_* members, so each case sets the flag directly.
    switch(a->type) {
    case 1:
        out->has_msg1 = 1;
        out->msg1 = a->msg1; // sprotoc uses callback to serialize
        break;
    case 2:
        out->has_msg2 = 1;
        out->msg2 = a->msg2;
        break;
    case 3:
        out->has_msg3 = 1;
        out->msg3 = a->msg3;
        break;
    }
}

Also, you can easily use sprotoc without malloc, since it provides a complete allocator interface:
(sprotoc/sprotoc.h)
// Simple linked-list of memory regions for managing recursive objects.
// The max size that can be allocated in the first heap
// is ALLOCATOR_SIZE - sizeof(Allocator)
#define ALLOCATOR_SIZE (4096)
typedef struct allocator Allocator;
struct allocator {
    void *sp;
    size_t avail;
    Allocator *next;
};
Allocator *allocator_ctor();
void allocator_dtor(Allocator **);
void *allocate(Allocator *, size_t);

The implementation in sprotoc.c uses malloc, but there's no reason a fixed
stack couldn't be used instead.
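
As a rough illustration of that last point, here is a minimal sketch of the same three functions backed by one statically reserved region instead of malloc. It is not the code in sprotoc.c: the names arena_head, arena_bytes and arena_in_use are made up, and returning NULL when the region is exhausted (rather than chaining another region through next) is an assumption of this sketch. It needs a C11 compiler for _Alignas and max_align_t.

```c
#include <assert.h>
#include <stddef.h>

#define ALLOCATOR_SIZE (4096)
typedef struct allocator Allocator;
struct allocator {
    void *sp;         /* next free byte in the region */
    size_t avail;     /* bytes still available */
    Allocator *next;  /* always NULL here: the region never grows */
};

/* One statically reserved region replaces the malloc'd list of regions.
 * Header plus payload add up to ALLOCATOR_SIZE bytes. */
static Allocator arena_head;
static _Alignas(max_align_t) unsigned char
    arena_bytes[ALLOCATOR_SIZE - sizeof(Allocator)];
static int arena_in_use;

/* Hand out the single region; fails if it is already in use. */
Allocator *allocator_ctor(void) {
    if (arena_in_use) return NULL;
    arena_in_use = 1;
    arena_head.sp = arena_bytes;
    arena_head.avail = sizeof arena_bytes;
    arena_head.next = NULL;
    return &arena_head;
}

/* Release everything at once and clear the caller's pointer. */
void allocator_dtor(Allocator **a) {
    if (a == NULL || *a == NULL) return;
    assert(*a == &arena_head);
    arena_in_use = 0;
    *a = NULL;
}

/* Bump allocation: returns NULL when the request does not fit. */
void *allocate(Allocator *a, size_t n) {
    const size_t align = _Alignof(max_align_t);
    if (a == NULL || n > a->avail) return NULL;
    /* Round the step up so the next pointer stays suitably aligned. */
    size_t step = (n + align - 1) / align * align;
    if (step > a->avail) step = a->avail; /* final chunk uses up the tail */
    void *p = a->sp;
    a->sp = (unsigned char *)a->sp + step;
    a->avail -= step;
    return p;
}
```

There is no per-object free: as with the linked-list version, memory is reclaimed only when allocator_dtor is called, which suits the build-serialize-discard pattern described above.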
