Built a simplified XPath-like syntax for Protocol Buffers

1,677 views
Skip to first unread message

Scott Stafford

unread,
Dec 4, 2009, 5:17:33 PM12/4/09
to Protocol Buffers
Hi -

We built a simplified XPath-like syntax (and implementation for C++)
to control the Reflection interface via strings. For instance,

message Substructure
{
required int32 id = 10;

required int32 field1 = 1;
optional int32 field2 = 2;
repeated int32 field3 = 3;
}

message Structure
{
repeated int32 field1 = 1;
repeated Substructure field2 = 2;
}

PBPath::clearField( myMessage, "field2[0]" );

would basically run myMessage.clear_field2();

and

PBPath::GetInt32( myMessage, "field2[id=50].field3[1]" );

would search myMessage.field2() for a message with id()==50, then get
the second element of Substructure.field3() and return it.

Is this something people would be interested in if I cleaned it up for
release as a separate library? Is there any interest in me submitting
a patch to build it in to the protobuf library directly in some form?
If so, what form?

The implementation involves digesting the xpath-like string into a
protobuf object that contains field ids and so on.

In case you're interested, our use case involved having a large local
Protobuf object containing configuration data. We needed a way to
have a remote process send in commands to clear and update various
parts of the configuration data object. So it sent the digested xpath
strings to clear various subtrees or elements. Updating was easy, we
could simply send in a new object with all the updates set and Merge
it in...

Kenton Varda

unread,
Dec 4, 2009, 6:06:03 PM12/4/09
to Scott Stafford, Protocol Buffers
How does this differ from net/proto2/util/public/field_path.h?


--

You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to prot...@googlegroups.com.
To unsubscribe from this group, send email to protobuf+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.



Kenton Varda

unread,
Dec 4, 2009, 6:13:09 PM12/4/09
to Scott Stafford, Protocol Buffers
Ooops!  I mixed up my mailing lists.  Sigh.

Someone recently wrote something similar internally but it has not made it into a public release.  That's what I was referring to.  We should probably get more of our random utility code out into the open.  :/

Scott Stafford

unread,
Dec 7, 2009, 9:49:00 AM12/7/09
to Protocol Buffers
Ah, then I guess you are (well, would have been) interested. ;) Are
you allowed to forward that header you referenced to me? I'm curious
how the implementations overlap. I'll let you know if ours has any
value to add.

On Dec 4, 6:13 pm, Kenton Varda <ken...@google.com> wrote:
> Ooops!  I mixed up my mailing lists.  Sigh.
>
> Someone recently wrote something similar internally but it has not made it
> into a public release.  That's what I was referring to.  We should probably
> get more of our random utility code out into the open.  :/
>
>
>
> On Fri, Dec 4, 2009 at 3:06 PM, Kenton Varda <ken...@google.com> wrote:
> > How does this differ from net/proto2/util/public/field_path.h?
>
> >> protobuf+u...@googlegroups.com<protobuf%2Bunsubscribe@googlegroups.c om>
> >> .

Kenton Varda

unread,
Dec 9, 2009, 3:33:31 PM12/9/09
to Scott Stafford, Protocol Buffers
Well, the reason it's not in the open already is because it would take some work to clean up for release, and no one has time.  Sending it to you would require the same work.  :/

But I suppose pasting the interface here can't hurt!

It looks like you have a syntax for selectors (e.g. "field2[id=50]") which our code doesn't have.  I think these seem useful, but also add significant complication which many users wouldn't need.  I wonder if it makes sense for both of these tools to exist, so that programmers can choose the one that suits their needs.

class FieldPath {
 public:
  // root_type is the descriptor for the type of messages to be passed as input
  // to Follow*(). Append*() and Parse() also use this for validity checking.
  explicit FieldPath(const Descriptor* root_type);

  // Replaces this path with one described like: "field_name[0].foo.bar[13]".
  // Returns false if an error occurs with either the format of the string or
  // with mismatches of type (e.g. a subfield of an integer) or label (i.e. a
  // repeated field lacks an index or vice versa).
  bool Parse(const string& path_string);

  // Replaces the contents of this path with the contents of another.
  void CopyFrom(const FieldPath& new_path);

  // Clear all fields from path.
  void Clear();

  // Clear all fields and change root type.
  void Reset(const Descriptor* new_root_type);

  inline const Descriptor* GetRootType() { return root_type_; }

  // Number of fields on this path. Does not count the root as a field.
  inline int GetLength() const { return path_.size(); }

  // True if GetLength() == 0.
  inline bool IsEmpty() const { return GetLength() == 0; }

  // Gets one field descriptor and index from path. If the field is not
  // repeated, then repeated_field_index is set to -1.
  void GetPart(int part_index, const FieldDescriptor** field,
               int* repeated_field_index) const;

  // Gets the descriptor for the last part -- i.e. what you'd get if you called
  // GetPart() with part_index = GetLength() - 1.
  inline const FieldDescriptor* GetLastField() const {
    CHECK(!path_.empty());
    return path_.back().field;
  }

  // Gets the index for the last part -- i.e. what you'd get if you called
  // GetPart() with part_index = GetLength() - 1.
  inline int GetLastIndex() const {
    CHECK(!path_.empty());
    return path_.back().index;
  }

  // If empty, then returns the root type.  Otherwise, returns
  // GetLastField()->message_type(), unless the last field is not a
  // message-typed field, in which case this returns NULL.
  inline const Descriptor* GetMessageType() const {
    if (path_.empty()) {
      return root_type_;
    } else if (GetLastField()->cpp_type() == FieldDescriptor::CPPTYPE_MESSAGE) {
      return GetLastField()->message_type();
    } else {
      return NULL;
    }
  }

  // Adds a subfield to get from the innermost field in the current path.
  // Index is ignored if field is not repeated.
  // REQUIRES:  field->containing_type() == GetMessageType()
  void AppendPart(const FieldDescriptor*, int index);

  // Removes and deletes the last num_to_remove items from the path.
  void RemoveLast(int num_to_remove);

  // Same as Parse, but appends to this path instead of replacing it.
  bool ParseAndAppend(const string& path_string);

  // Appends the fields contained by sub_path to this path.
  // REQUIRES:  sub_path.GetRootType() == GetMessageType()
  void Append(const FieldPath& sub_path);

  // Follows the path starting from the given base message.  *message is filled
  // in with the immediate parent of the field at the end of the path, *field is
  // filled in with the final field's descriptor, and *index with the index if
  // the final field is repeated (-1 otherwise).  You can then use reflection
  // to query the field value.
  //
  // Returns false if the field path contains an out-of-bounds index (which is
  // the only error that can't be detected at parsing time); in this case
  // *parent, *field, and *index are filled in with the values for the
  // out-of-bounds field.
  bool Follow(const Message& base, const Message** parent,
              const FieldDescriptor** field, int* index) const;

  // Like Follow() but creates each sub-message in the path if it isn't already
  // there. Note that if a repeated field has an out-of-bounds index, this
  // returns false -- it does not attempt to resize the array.
  bool FollowMutable(Message* base, Message** parent,
                     const FieldDescriptor** field, int* index) const;

 private:
  struct PathPart {
    PathPart() : field(NULL), index(-1) {}
    PathPart(const FieldDescriptor* new_field, int new_index)
      : field(new_field), index(new_index) {}
    const FieldDescriptor* field;
    int index;
  };

  std::vector<PathPart> path_;
  const Descriptor* root_type_;

  DISALLOW_COPY_AND_ASSIGN(FieldPath);
};


To unsubscribe from this group, send email to protobuf+u...@googlegroups.com.

Pavel Shramov

unread,
Dec 18, 2009, 2:21:52 AM12/18/09
to Scott Stafford, Protocol Buffers
On Fri, Dec 04, 2009 at 02:17:33PM -0800, Scott Stafford wrote:
> Hi -
>
> We built a simplified XPath-like syntax (and implementation for C++)
> to control the Reflection interface via strings. For instance,
... skip ...

> PBPath::GetInt32( myMessage, "field2[id=50].field3[1]" );
>
> would search myMessage.field2() for a message with id()==50, then get
> the second element of Substructure.field3() and return it.
May You share details about query syntax?


> Is this something people would be interested in if I cleaned it up for
> release as a separate library? Is there any interest in me submitting
> a patch to build it in to the protobuf library directly in some form?
> If so, what form?

> The implementation involves digesting the xpath-like string into a
> protobuf object that contains field ids and so on.
So You have formal (protobuf) description of this syntax?

I would be greatful if you share these parts :)

Pavel

Scott Stafford

unread,
Dec 18, 2009, 10:01:36 AM12/18/09
to Protocol Buffers
We don't have a formal spec of it or anything, but from what I can
tell we evolved the same syntax as the internal Google effort. No
frills, just a sequence of N field names separated by dots (.). I'll
try to write out the rules in English, but I might mess this up since
I'm just doing it from memory...

The 1st through N-1th field can be:

a required/optional Message field_name
a repeated Message field_name subscripted by [INDEX|
SUBFIELD_NAME=value]

and the Nth field could be:

a repeated field_name subscripted by [INDEX|SUBFIELD_NAME=value]
a required, optional, or repeated field_name

INDEX is any integer, where values >=0 indicate the sequence number of
the element in the repeated field, or <0 in which case it counts from
the last item, Python-style, so [-1] is the last element.

SUBFIELD_NAME can be any [required|optional] [string|int*] field in
the field_name Message.

Does that make sense? Sorry it's not a formal syntax, I never took
that class. ;) I just wrote the parser directly, using
boost::tokenizer.

If you do write up the formal syntax, please post it!

Scott

On Dec 18, 2:21 am, Pavel Shramov <shra...@mexmat.net> wrote:
> On Fri, Dec 04, 2009 at 02:17:33PM -0800, Scott Stafford wrote:
> > Hi -
>

> > We built a simplifiedXPath-like syntax (and implementation for C++)


> > to control the Reflection interface via strings.  For instance,
> ... skip ...
> > PBPath::GetInt32( myMessage, "field2[id=50].field3[1]" );
>
> > would search myMessage.field2() for a message with id()==50, then get
> > the second element of Substructure.field3() and return it.
>
> May You share details about query syntax?
>
> > Is this something people would be interested in if I cleaned it up for
> > release as a separate library?  Is there any interest in me submitting
> > a patch to build it in to the protobuf library directly in some form?
> > If so, what form?

> > The implementation involves digesting thexpath-like string into a

Pavel Shramov

unread,
Dec 18, 2009, 1:04:52 PM12/18/09
to Scott Stafford, Protocol Buffers
On Fri, Dec 18, 2009 at 07:01:36AM -0800, Scott Stafford wrote:
> We don't have a formal spec of it or anything, but from what I can
> tell we evolved the same syntax as the internal Google effort. No
> frills, just a sequence of N field names separated by dots (.). I'll
> try to write out the rules in English, but I might mess this up since
> I'm just doing it from memory...
>
> The 1st through N-1th field can be:
>
> a required/optional Message field_name
> a repeated Message field_name subscripted by [INDEX|
> SUBFIELD_NAME=value]
>
> and the Nth field could be:
>
> a repeated field_name subscripted by [INDEX|SUBFIELD_NAME=value]
> a required, optional, or repeated field_name
>
> INDEX is any integer, where values >=0 indicate the sequence number of
> the element in the repeated field, or <0 in which case it counts from
> the last item, Python-style, so [-1] is the last element.
>
> SUBFIELD_NAME can be any [required|optional] [string|int*] field in
> the field_name Message.
>
> Does that make sense? Sorry it's not a formal syntax, I never took
> that class. ;) I just wrote the parser directly, using
> boost::tokenizer.
>
> If you do write up the formal syntax, please post it!
As I understand formal syntax for Your 'xpath' in BNF like form
looks like:

query ::= query-part | query "." query-part
query-part ::= field-name | field-name "[" selector "]"
selector ::= number | attr-query
attr-query ::= field-name "=" field-value
field-value ::= number | '"' text '"'
field-name ::= text

Here is form with extended predicates that I'd like to see:

query ::= query-part | query "." query-part
query-part ::= field-name | field-name "[" selector "]"
selector ::= number | attr-query
attr-query ::= attr-match | "!" attr-query | "(" attr-query ") | \
attr-query "or" attr-query | attr-query "and" attr-query
attr-match ::= field-name | field-name "=" field-value
field-value ::= number | '"' text '"'
field-name ::= text

Is it acceptable? I'll try to create python implementation of last form.

Pavel

Pavel Shramov

unread,
Sep 17, 2010, 1:19:40 AM9/17/10
to Greg Adair, Scott Stafford, Protocol Buffers
On Thu, Sep 16, 2010 at 05:37:03PM -0700, Greg Adair wrote:
> Hello Pavel and Scott,
>
> Did this XPath-like functionality for protocol buffers end up going
> anywhere?
>
> I would definitely be interested in using and extending what you guys
> described. I need what is described on the message board, plus I
> would add logical, arithmetic, and comparison expressions.
My python implementation may be found at [1]. It has comparison and simple
logical tests (see tests at [2]). Currently I'm not using it but if you have
suggestions/patches how to improve it I'll try to improve it.

Pavel

P.S. Added list to CC to return thread back there.

--
[1] http://psha.org.ru/git?p=psha/pbpath;a=summary
[2] http://psha.org.ru/git?p=psha/pbpath;a=tree;f=tests;hb=HEAD

Greg Adair

unread,
Sep 17, 2010, 4:50:54 PM9/17/10
to Protocol Buffers
Pavel,

Just curious, is your code licensed under the New BSD license like
Protocol Buffers is?
My employer's lawyers are very particular about licensing when it
comes to what I can use or contribute to for use in a product.

--Greg Adair

Pavel Shramov

unread,
Sep 19, 2010, 5:53:09 AM9/19/10
to Greg Adair, Protocol Buffers
On Fri, Sep 17, 2010 at 01:50:54PM -0700, Greg Adair wrote:
> Just curious, is your code licensed under the New BSD license like
> Protocol Buffers is?
> My employer's lawyers are very particular about licensing when it
> comes to what I can use or contribute to for use in a product.
Since most of my code is quiet useless I've not bothering with licenses :)
Personally I prefer GPL but if it's not suitable for Your needs then let
it be New BSD. I'll add LICENSE file today.
Pavel

Greg Adair

unread,
Sep 20, 2010, 10:11:59 PM9/20/10
to Protocol Buffers
I'm not one that really cares much about licenses, but I work for a
very large open source friendly company. Unfortunately their lawyers
won't let us use or contribute to anything GPL, but are very friendly
to MIT, BSD, and other non-viral licenses.

I'm developing primarily in C++, but there is a chunk of my product
that is in python. I think I'll start by porting your python stuff to
C++, then I'll start adding new operators. When i'm satisfied, I will
backport the additions to python.

--Greg
Reply all
Reply to author
Forward
0 new messages