how to get RepeatedField object

7,016 views
Skip to first unread message

Romain Francois

unread,
Feb 11, 2010, 8:57:08 AM2/11/10
to Protocol Buffers, Dirk Eddelbuettel
Hello,

How can I get hold of a RepeatedField object to manage a repeated field
in C++.

In RProtoBuf, we do a lot of :

for( int i=0; i<size; i++){
INTEGER(res)[i] = (int) ref->GetRepeatedInt32( *message, fieldDesc, i ) ;
}

where essentially the INTEGER macro gives a pointer to the beginning of
the int array we are filling.

I'd like to replace this using e.g std::copy

RepeatedField field ;
std::copy( field.begin(), field.end(), INTEGER(res) ) ;

but I can't find how to actually get hold of a RepeatedField object.

Is it possible ?

Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/NrTG : Rcpp 0.7.5
|- http://tr.im/MPYc : RProtoBuf: protocol buffers for R
`- http://tr.im/KfKn : Rcpp 0.7.2

Jason Hsueh

unread,
Feb 11, 2010, 12:50:12 PM2/11/10
to Romain Francois, Protocol Buffers, Dirk Eddelbuettel
No, there isn't a way to get the RepeatedField from the reflection interface. You can only do so via the generated interface.


--
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to prot...@googlegroups.com.
To unsubscribe from this group, send email to protobuf+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.


Romain Francois

unread,
Feb 12, 2010, 5:51:07 AM2/12/10
to Jason Hsueh, Protocol Buffers

Why not ? It seems reasonnable to want to use e.g. std::copy and friends.

On the documentation it says :

"
Most users will not ever use a RepeatedField directly; they will use the
get-by-index, set-by-index, and add accessors that are generated for all
repeated fields
"

What if I do want to use RepeatedField ?

Romain

On 02/11/2010 06:50 PM, Jason Hsueh wrote:
> No, there isn't a way to get the RepeatedField from the reflection
> interface. You can only do so via the generated interface.
>
> On Thu, Feb 11, 2010 at 5:57 AM, Romain Francois
> <romain.francoi...@gmail.com

Jason Hsueh

unread,
Feb 12, 2010, 3:11:38 PM2/12/10
to Romain Francois, Protocol Buffers, Kenton Varda
+kenton

Kenton may have a better answer, but I surmise that it's to avoid tying the Reflection interface to implementation details. A Message implementation might not use RepeatedField at all. The original version of protobufs used a different class to represent repeated fields, so it wouldn't have been possible to implement Reflection for the original version if the interface required access to RepeatedField. So maybe the reason is historical. Recent changes have been made that would make this technically possible. However, adding methods to get direct access to the RepeatedField would still expand the Reflection interface quite a bit. I'll defer to Kenton on that.

Kenton Varda

unread,
Feb 12, 2010, 4:37:39 PM2/12/10
to Jason Hsueh, Romain Francois, Protocol Buffers
Yeah, it would add a lot of complication and reduce the flexibility of the Reflection interface.  We don't want to require repeated fields to be implemented in terms of RepeatedField or RepeatedPtrField.

Romain Francois

unread,
Feb 13, 2010, 4:16:51 AM2/13/10
to Kenton Varda, Jason Hsueh, Protocol Buffers, Dirk Eddelbuettel
Hello,

Thanks for the answers.

Maybe I should give some more background on why this is of interest to
me. In RProtoBuf, we essentially only use the reflection api so that we
can dynamically load new proto message types at runtime, etc ... we
don't use protoc and therefore have no access to the generated classes.

In the class "GeneratedMessageReflection", there are templates such as :

template <typename Type>
inline Type GetRepeatedField(const Message& message,
const FieldDescriptor* field,
int index) const;

but they are private ? Why ?

From what I can read of the code, methods like GetRepeatedInt32 get
expanded out of :

PASSTYPE GeneratedMessageReflection::GetRepeated##TYPENAME( \
const Message& message,
\
const FieldDescriptor* field, int index) const {
\
USAGE_CHECK_ALL(GetRepeated##TYPENAME, REPEATED, CPPTYPE);
\
if (field->is_extension()) {
\
return GetExtensionSet(message).GetRepeated##TYPENAME(
\
field->number(), index);
\
} else {
\
return GetRepeatedField<TYPE>(message, field, index);
\
}
\
}
\

so doing things like this code

for( int i=0; i<size; i++){
INTEGER(res)[i] = (int) ref->GetRepeatedInt32( *message,
fieldDesc,
i ) ;
}

is going to be not as efficient as if I could directly iterate over the
repeated field using RepeatedField::iterator

Instead of extending the Reflection interface, what about making the
templates public in GeneratedMessageReflection and then maybe use some
sort of trait to indicate whether that the instance of Reflection I have
access to has these templates.


Something like :


template <typename _T, _T _V> struct integral_constant {
static const _T value = _V;
typedef _T value_type;
typedef integral_constant<_T,_V> type;
};
typedef integral_constant<bool, true> true_type;
typedef integral_constant<bool, false> false_type;

struct reflection_has_templates{} ;
struct reflection_no_templates{} ;

template <typename T> struct has_templates : public false_type{};
template<> struct has_templates<GeneratedMessageReflection> : public
true_type{} ;

So that using this trait, I can dispatch between :
- suboptimal implementation when the templates are not there
- optimal implementation using RepeatedField::iterator when they are
available.

It adds nothing to the Reflection interface.

Romain


On 02/12/2010 10:37 PM, Kenton Varda wrote:
> Yeah, it would add a lot of complication and reduce the flexibility of
> the Reflection interface. We don't want to require repeated fields to
> be implemented in terms of RepeatedField or RepeatedPtrField.
>
> On Fri, Feb 12, 2010 at 12:11 PM, Jason Hsueh <jas...@google.com
> <mailto:jas...@google.com>> wrote:
>
> +kenton
>
> Kenton may have a better answer, but I surmise that it's to avoid
> tying the Reflection interface to implementation details. A Message
> implementation might not use RepeatedField at all. The original
> version of protobufs used a different class to represent repeated
> fields, so it wouldn't have been possible to implement Reflection
> for the original version if the interface required access to
> RepeatedField. So maybe the reason is historical. Recent changes
> have been made that would make this technically possible. However,
> adding methods to get direct access to the RepeatedField would still
> expand the Reflection interface quite a bit. I'll defer to Kenton on
> that.
>
> On Fri, Feb 12, 2010 at 2:51 AM, Romain Francois
> <romain.francoi...@gmail.com

> <mailto:romain.francoi...@gmail.com>> wrote:
>
>
> Why not ? It seems reasonnable to want to use e.g. std::copy and
> friends.
>
> On the documentation it says :
>
> "
> Most users will not ever use a RepeatedField directly; they will
> use the get-by-index, set-by-index, and add accessors that are
> generated for all repeated fields
> "
>
> What if I do want to use RepeatedField ?
>
> Romain
>
>
> On 02/11/2010 06:50 PM, Jason Hsueh wrote:
>
> No, there isn't a way to get the RepeatedField from the
> reflection
> interface. You can only do so via the generated interface.
>
> On Thu, Feb 11, 2010 at 5:57 AM, Romain Francois
> <romain.francoi...@gmail.com
> <mailto:romain.francoi...@gmail.com>

> <mailto:romain.francoi...@gmail.com

Kenton Varda

unread,
Feb 16, 2010, 4:44:50 PM2/16/10
to Romain Francois, Jason Hsueh, Protocol Buffers, Dirk Eddelbuettel
GeneratedMessageReflection is an internal class used by the protobuf implementation.  Currently, users are not allowed to use it directly, because we may change it at any time.  You're suggesting that we promote it to a public interface, which has implications with regard to maintenance costs and implementation agility.

I'm open to considering making this change for performance purposes.  However, even them I'm hesitant to expose Repeated[Ptr]Field references directly via this interface.  I'd like to see what happens if we simply make all of the existing public accessor methods "inline", so you could then do something like:

  int size = reflection->FieldSize(message, field);
  for (int i = 0; i < size; i++) {
    const Message& sub_message =
        reflection->GeneratedMessageReflection::GetRepeatedMessage(message, field, i);
    // Do something with sub_message.
  }

If GetRepeatedMessage is inline then I believe the above loop would be nearly as efficient as iterating directly over a RepeatedPtrField.  Note that the funny syntax for the method call avoids making a virtual call.

BTW, the "has_templates" template you suggest would not work as you think -- how would you actually use it?  What you would really need to use is dynamic_cast.  My golden rule of dynamic_cast is that it should only be used for optimization, and you must provide an implementation for the case where dynamic_cast always returns NULL.  In your case, you are doing this, so that should be fine.

Romain Francois

unread,
Feb 16, 2010, 5:47:53 PM2/16/10
to Kenton Varda, Jason Hsueh, Protocol Buffers, Dirk Eddelbuettel
Thanks for the feedback.

On 02/16/2010 10:44 PM, Kenton Varda wrote:
> GeneratedMessageReflection is an internal class used by the protobuf
> implementation. Currently, users are not allowed to use it directly,
> because we may change it at any time. You're suggesting that we promote
> it to a public interface, which has implications with regard to
> maintenance costs and implementation agility.

I understand that.

I'm not necessarily suggesting to make this particular class public, but
am looking for some way to iterate over the elements of a repeated field.

I see that this is available for classes that are generated by protoc,
which is what is mostly used, but unfortunately we don't/can't really do
that with R.


On the same score, if I want to increase the number of elements in a
repeated field, I have to do it one by one right ? I can't do something
like first reserve space, and then fill the generated space. Does that
mean that memory is reallocated each time ?

> I'm open to considering making this change for performance purposes.
> However, even them I'm hesitant to expose Repeated[Ptr]Field
> references directly via this interface. I'd like to see what happens if
> we simply make all of the existing public accessor methods "inline", so
> you could then do something like:
>
> int size = reflection->FieldSize(message, field);
> for (int i = 0; i < size; i++) {
> const Message& sub_message =
>
> reflection->GeneratedMessageReflection::GetRepeatedMessage(message,
> field, i);
> // Do something with sub_message.
> }
>
> If GetRepeatedMessage is inline then I believe the above loop would be
> nearly as efficient as iterating directly over a RepeatedPtrField. Note
> that the funny syntax for the method call avoids making a virtual call.
>
> BTW, the "has_templates" template you suggest would not work as you
> think -- how would you actually use it?

implement some sort of TMP dispatch :

template <typename T>
foo( T t ){
foo_dispatch( t, typename has_templates<T>::value_type() ) ;
}

and then have

template <typename T>
foo( T t , false_type){
/* the less efficient version */
}

template <typename T>
foo( T t, true_type){
/* the more efficient version using the RepeatedField */
}

This is similar to how e.g std::distance dispatches depending on whether
it deals with random access iterator or some other iterator category.

> What you would really need to
> use is dynamic_cast. My golden rule of dynamic_cast is that it should
> only be used for optimization, and you must provide an implementation
> for the case where dynamic_cast always returns NULL.

But dynamic_cast is a runtime thing, where TMP dispatch happens at
compile time.

> In your case, you
> are doing this, so that should be fine.
>
> On Sat, Feb 13, 2010 at 1:16 AM, Romain Francois
> <romain.francoi...@gmail.com

> <mailto:jas...@google.com <mailto:jas...@google.com>>> wrote:
>
> +kenton
>
> Kenton may have a better answer, but I surmise that it's to
> avoid
> tying the Reflection interface to implementation details. A
> Message
> implementation might not use RepeatedField at all. The original
> version of protobufs used a different class to represent
> repeated
> fields, so it wouldn't have been possible to implement
> Reflection
> for the original version if the interface required access to
> RepeatedField. So maybe the reason is historical. Recent changes
> have been made that would make this technically possible.
> However,
> adding methods to get direct access to the RepeatedField
> would still
> expand the Reflection interface quite a bit. I'll defer to
> Kenton on
> that.
>
> On Fri, Feb 12, 2010 at 2:51 AM, Romain Francois
> <romain.francoi...@gmail.com
> <mailto:romain.francoi...@gmail.com>

|- http://tr.im/OcQe : Rcpp 0.7.7
|- http://tr.im/O1wO : highlight 0.1-5
`- http://tr.im/O1qJ : Rcpp 0.7.6

Kenton Varda

unread,
Feb 16, 2010, 6:51:55 PM2/16/10
to Romain Francois, Jason Hsueh, Protocol Buffers, Dirk Eddelbuettel
On Tue, Feb 16, 2010 at 2:47 PM, Romain Francois <romain.francoi...@gmail.com> wrote:
Thanks for the feedback.


On 02/16/2010 10:44 PM, Kenton Varda wrote:
GeneratedMessageReflection is an internal class used by the protobuf
implementation.  Currently, users are not allowed to use it directly,
because we may change it at any time.  You're suggesting that we promote
it to a public interface, which has implications with regard to
maintenance costs and implementation agility.

I understand that.

I'm not necessarily suggesting to make this particular class public, but am looking for some way to iterate over the elements of a repeated field.

The Reflection interface already provides a way -- FieldSize() and GetRepeatedX().  The only problem is that it's a bit slower than the generated accessors because these methods aren't inlineable.

BTW, have you observed an actual performance problem or are you just speculating that this performance difference may be a problem for you?
 
On the same score, if I want to increase the number of elements in a repeated field, I have to do it one by one right ? I can't do something like first reserve space, and then fill the generated space. Does that mean that memory is reallocated each time ?

No, adding an element is O(1), even if you haven't reserved space.  It works the same way std::vector works.
 

What you would really need to
use is dynamic_cast.  My golden rule of dynamic_cast is that it should
only be used for optimization, and you must provide an implementation
for the case where dynamic_cast always returns NULL.

But dynamic_cast is a runtime thing, where TMP dispatch happens at compile time.

GetReflection() returns Reflection*, NOT GeneratedMessageReflection*.  Therefore the compile has no idea which type was returned, therefore you *must* use a runtime check.  That's my point -- your template dispatch doesn't accomplish anything in this context.

Romain Francois

unread,
Feb 17, 2010, 3:19:23 AM2/17/10
to Kenton Varda, Jason Hsueh, Protocol Buffers, Dirk Eddelbuettel
On 02/17/2010 12:51 AM, Kenton Varda wrote:
> On Tue, Feb 16, 2010 at 2:47 PM, Romain Francois
> <romain.francoi...@gmail.com
> <mailto:romain.francoi...@gmail.com>> wrote:
>
> Thanks for the feedback.
>
>
> On 02/16/2010 10:44 PM, Kenton Varda wrote:
>
> GeneratedMessageReflection is an internal class used by the protobuf
> implementation. Currently, users are not allowed to use it
> directly,
> because we may change it at any time. You're suggesting that we
> promote
> it to a public interface, which has implications with regard to
> maintenance costs and implementation agility.
>
>
> I understand that.
>
> I'm not necessarily suggesting to make this particular class public,
> but am looking for some way to iterate over the elements of a
> repeated field.
>
>
> The Reflection interface already provides a way -- FieldSize() and
> GetRepeatedX(). The only problem is that it's a bit slower than the
> generated accessors because these methods aren't inlineable.

Sure. I meant "STL algorithms" iterating.

> BTW, have you observed an actual performance problem or are you just
> speculating that this performance difference may be a problem for you?

In similar (non protobuf-related) settings, we have observed quite a bit
of difference between using plain loops and accessors as opposed to
iterators.

But this is mainly speculation on the performance.


Also, this looks nicer :

double sum = std::accumulate( y.begin(), y.end(), 0.0 ) ;

than this:

double x = 0.0 ;
for( int i=0; i<n; i++){
x += ref.GetRepeatedDouble( message, field, i ) ;
}

... but I suppose we can wrap these in our own iterators, but then I
guess we'd have to resort to some sort of proxy to deal with
GetRepeatedDouble and SetRepeatedDouble.

> On the same score, if I want to increase the number of elements in a
> repeated field, I have to do it one by one right ? I can't do
> something like first reserve space, and then fill the generated
> space. Does that mean that memory is reallocated each time ?
>
>
> No, adding an element is O(1), even if you haven't reserved space. It
> works the same way std::vector works.

But when you use std::vector, it is best to first reserve the target
size as opposed to create an empty vector and push_back each element.


> What you would really need to
> use is dynamic_cast. My golden rule of dynamic_cast is that it
> should
> only be used for optimization, and you must provide an
> implementation
> for the case where dynamic_cast always returns NULL.
>
>
> But dynamic_cast is a runtime thing, where TMP dispatch happens at
> compile time.
>
>
> GetReflection() returns Reflection*, NOT GeneratedMessageReflection*.
> Therefore the compile has no idea which type was returned, therefore
> you *must* use a runtime check. That's my point -- your template
> dispatch doesn't accomplish anything in this context.

Ah ok. Sorry about that.

> In your case, you
> are doing this, so that should be fine.
>
> On Sat, Feb 13, 2010 at 1:16 AM, Romain Francois
> <romain.francoi...@gmail.com
> <mailto:romain.francoi...@gmail.com>

> <mailto:romain.francoi...@gmail.com
> <mailto:romain.francoi...@gmail.com>>> wrote:
>
> Hello,
>

Kenton Varda

unread,
Feb 18, 2010, 4:44:35 PM2/18/10
to Romain Francois, Jason Hsueh, Protocol Buffers, Dirk Eddelbuettel
On Wed, Feb 17, 2010 at 12:19 AM, Romain Francois <romain.francoi...@gmail.com> wrote:
On 02/17/2010 12:51 AM, Kenton Varda wrote:
The Reflection interface already provides a way -- FieldSize() and
GetRepeatedX().  The only problem is that it's a bit slower than the
generated accessors because these methods aren't inlineable.

Sure. I meant "STL algorithms" iterating.

You could easily write an STL-like iterator on top of these if you really need it.
 

BTW, have you observed an actual performance problem or are you just
speculating that this performance difference may be a problem for you?

In similar (non protobuf-related) settings, we have observed quite a bit of difference between using plain loops and accessors as opposed to iterators.

It would obviously depend on the data structure involved.  For example, on an stl vector, the following two loops will have equivalent performance:

  vector<int> v;
  for (int i = 0; i < v.size(); ++i) {
    DoSomehting(v[i]);
  }

  for (vector<int>::const_iterator i = v.begin(); i != v.end(); ++i) {
    DoSomehting(*i);
  }

In fact, a good compiler may even produce identical assembly code for both loops.
 
But this is mainly speculation on the performance.

It's generally a bad idea to try to fix theoretical performance problems.

But when you use std::vector, it is best to first reserve the target size as opposed to create an empty vector and push_back each element.

It's slightly faster, but either way is still O(n).  Reserving is often not worth the effort, especially if you are good about reusing objects, in which case they will already have space reserved from the previous use.
Reply all
Reply to author
Forward
0 new messages