2.1.0 release is up

Kenton Varda

May 13, 2009, 7:04:25 PM

to Protocol Buffers

http://code.google.com/p/protobuf/downloads/list

Aaaand, I just realized that CHANGES.txt still has the release date as ????. :(

/me is not very good at release engineering.

Oh well.

Kenton Varda

May 13, 2009, 7:06:04 PM

to Protocol Buffers

Here's the major changes (from CHANGES.txt):

General

* Repeated fields of primitive types (types other than string, group, and
  nested messages) may now use the option [packed = true] to get a more
  efficient encoding. In the new encoding, the entire list is written
  as a single byte blob using the "length-delimited" wire type. Within
  this blob, the individual values are encoded the same way they would
  be normally except without a tag before each value (thus, they are
  tightly "packed").
* For each field, the generated code contains an integer constant assigned
  to the field number. For example, the .proto file:
    message Foo { optional int32 bar_baz = 123; }
  would generate the following constants, all with the integer value 123:
    C++: Foo::kBarBazFieldNumber
    Java: Foo.BAR_BAZ_FIELD_NUMBER
    Python: Foo.BAR_BAZ_FIELD_NUMBER
  Constants are also generated for extensions, with the same naming scheme.
  These constants may be used as switch cases.
* Updated bundled Google Test to version 1.3.0. Google Test is now bundled
  in its verbatim form as a nested autoconf package, so you can drop in any
  other version of Google Test if needed.
* optimize_for = SPEED is now the default, by popular demand. Use
  optimize_for = CODE_SIZE if code size is more important in your app.
* It is now an error to define a default value for a repeated field.
  Previously, this was silently ignored (it had no effect on the generated
  code).
* Fields can now be marked deprecated like:
    optional int32 foo = 1 [deprecated = true];
  Currently this does not have any actual effect, but in the future the code
  generators may generate deprecation annotations in each language.
* Cross-compiling should now be possible using the --with-protoc option to
  configure. See README.txt for more info.
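
To make the packed encoding described above concrete, here is a rough sketch in Python. It is not code from the release: the helper names (encode_varint, make_tag, encode_unpacked, encode_packed) are made up for illustration, and it only handles non-negative varint values. It encodes the same repeated field both ways so the saved tag bytes are visible.

```python
WIRETYPE_VARINT = 0
WIRETYPE_LENGTH_DELIMITED = 2


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def make_tag(field_number, wire_type):
    """A tag is the field number and wire type combined into one varint."""
    return encode_varint((field_number << 3) | wire_type)


def encode_unpacked(field_number, values):
    """Traditional encoding: one tag in front of every value."""
    tag = make_tag(field_number, WIRETYPE_VARINT)
    return b"".join(tag + encode_varint(v) for v in values)


def encode_packed(field_number, values):
    """Packed encoding: one tag, one byte length, then the bare values."""
    payload = b"".join(encode_varint(v) for v in values)
    tag = make_tag(field_number, WIRETYPE_LENGTH_DELIMITED)
    return tag + encode_varint(len(payload)) + payload


values = [3, 270, 86942]
unpacked = encode_unpacked(4, values)
packed = encode_packed(4, values)
print(unpacked.hex(), len(unpacked))  # 2003208e02209ea705 9
print(packed.hex(), len(packed))      # 2206038e029ea705 8
```

With only three values the saving is a single byte, but the packed form pays for the tag once rather than once per element, so the gap grows with the length of the list.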

protoc

* --error_format=msvs option causes errors to be printed in Visual Studio
  format, which should allow them to be clicked on in the build log to go
  directly to the error location.
* The type name resolver will no longer resolve type names to fields. For
  example, this now works:
    message Foo {}
    message Bar {
      optional int32 Foo = 1;
      optional Foo baz = 2;
    }
  Previously, the type of "baz" would resolve to "Bar.Foo", and you'd get
  an error because Bar.Foo is a field, not a type. Now the type of "baz"
  resolves to the message type Foo. This change is unlikely to make a
  difference to anyone who follows the Protocol Buffers style guide.
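
The resolver change above can be pictured with a small Python model. This is only a sketch of the idea, not protoc's actual implementation: the function resolve_type, the flat symbols table, and the skip_fields flag are all hypothetical, and it ignores details such as packages and partially qualified names. It searches from the innermost scope outward and, in the new behavior, keeps looking when the first hit is a field.

```python
def resolve_type(name, scope, symbols, skip_fields=True):
    """Resolve a simple type name relative to a scope.

    symbols maps fully qualified names to a kind, "message" or "field".
    Returns (qualified_name, kind), or None if nothing matches.
    """
    parts = scope.split(".") if scope else []
    while True:
        candidate = ".".join(parts + [name])
        kind = symbols.get(candidate)
        if kind is not None and (kind != "field" or not skip_fields):
            return candidate, kind
        # Either nothing was found here, or the hit was a field that the
        # new behavior skips; try the enclosing scope.
        if not parts:
            return None
        parts.pop()


# Symbols for the Foo/Bar example from the changelog entry.
symbols = {
    "Foo": "message",
    "Bar": "message",
    "Bar.Foo": "field",
    "Bar.baz": "field",
}

old_result = resolve_type("Foo", "Bar", symbols, skip_fields=False)
new_result = resolve_type("Foo", "Bar", symbols)
print(old_result)  # ('Bar.Foo', 'field')
print(new_result)  # ('Foo', 'message')
```

The old lookup stops at the first symbol with a matching name, which is why the field Bar.Foo shadowed the message Foo.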

C++

* Several optimizations, including but not limited to:
  - Serialization, especially to flat arrays, is 10%-50% faster, possibly
    more for small objects.
  - Several descriptor operations which previously required locking no longer
    do.
  - Descriptors are now constructed lazily on first use, rather than at
    process startup time. This should save memory in programs which do not
    use descriptors or reflection.
  - UnknownFieldSet completely redesigned to be more efficient (especially in
    terms of memory usage).
  - Various optimizations to reduce code size (though the serialization speed
    optimizations increased code size).
* Message interface has method ParseFromBoundedZeroCopyStream() which parses
  a limited number of bytes from an input stream rather than parsing until
  EOF.
* GzipInputStream and GzipOutputStream support reading/writing gzip- or
  zlib-compressed streams if zlib is available.
  (google/protobuf/io/gzip_stream.h)
* DescriptorPool::FindAllExtensions() and corresponding
  DescriptorDatabase::FindAllExtensions() can be used to enumerate all
  extensions of a given type.
* For each enum type Foo, protoc will generate functions:
    const string& Foo_Name(Foo value);
    bool Foo_Parse(const string& name, Foo* result);
  The former returns the name of the enum constant corresponding to the given
  value while the latter finds the value corresponding to a name.
* RepeatedField and RepeatedPtrField now have back-insertion iterators.
* String fields now have setters that take a char* and a size, in addition
  to the existing ones that took char* or const string&.
* DescriptorPool::AllowUnknownDependencies() may be used to tell
  DescriptorPool to create placeholder descriptors for unknown entities
  referenced in a FileDescriptorProto. This can allow you to parse a .proto
  file without having access to other .proto files that it imports, for
  example.
* Updated gtest to latest version. The gtest package is now included as a
  nested autoconf package, so it should be possible to drop new versions into
  the "gtest" subdirectory without modification.

Java

* Fixed bug where Message.mergeFrom(Message) failed to merge extensions.
* Message interface has new method toBuilder() which is equivalent to
  newBuilderForType().mergeFrom(this).
* All enums now implement the ProtocolMessageEnum interface.
* Setting a field to null now throws NullPointerException.
* Fixed tendency for TextFormat's parsing to overflow the stack when
  parsing large string values. The underlying problem is with Java's
  regex implementation (which unfortunately uses recursive backtracking
  rather than building an NFA). Worked around by making use of possessive
  quantifiers.
* Generated service classes now also generate pure interfaces. For a service
  Foo, Foo.Interface is a pure interface containing all of the service's
  defined methods. Foo.newReflectiveService() can be called to wrap an
  instance of this interface in a class that implements the generic
  RpcService interface, which provides reflection support that is usually
  needed by RPC server implementations.
* RPC interfaces now support blocking operation in addition to non-blocking.
  The protocol compiler generates separate blocking and non-blocking stubs
  which operate against separate blocking and non-blocking RPC interfaces.
  RPC implementations will have to implement the new interfaces in order to
  support blocking mode.
* New I/O methods parseDelimitedFrom(), mergeDelimitedFrom(), and
  writeDelimitedTo() read and write "delimited" messages from/to a stream,
  meaning that the message size precedes the data. This way, you can write
  multiple messages to a stream without having to worry about delimiting
  them yourself.
* Throw a more descriptive exception when build() is double-called.
* Add a method to query whether CodedInputStream is at the end of the input
  stream.
* Add a method to reset a CodedInputStream's size counter; useful when
  reading many messages with the same stream.
* equals() and hashCode() now account for unknown fields.
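
For anyone who wants the same framing as the delimited I/O methods above outside of Java, here is a minimal Python sketch. The size prefix is written as a varint, which is what the Java methods use; everything else is an assumption for illustration: write_delimited and read_delimited are made-up helper names, and opaque byte strings stand in for serialized messages.

```python
import io


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def write_delimited(stream, payload):
    """Write the payload size as a varint, then the payload itself."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream):
    """Read one size-prefixed payload; return None at a clean end of stream."""
    size = 0
    shift = 0
    first_byte = True
    while True:
        chunk = stream.read(1)
        if not chunk:
            if first_byte:
                return None  # no more messages
            raise EOFError("stream ended inside a size prefix")
        first_byte = False
        size |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            break
        shift += 7
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("stream ended inside a message")
    return payload


# Round trip: several "messages" share one stream with no other separator.
messages = [b"first", b"", b"x" * 300]
buffer = io.BytesIO()
for message in messages:
    write_delimited(buffer, message)

buffer.seek(0)
decoded = []
while True:
    item = read_delimited(buffer)
    if item is None:
        break
    decoded.append(item)
print(decoded == messages)  # True
```

In real use the payload would be the serialized bytes of a message, and the reader would parse each returned payload into a message object.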

Python

* Added slicing support for repeated scalar fields. Added slice retrieval and
  removal of repeated composite fields.
* Updated RPC interfaces to allow for blocking operation. A client may
  now pass None for a callback when making an RPC, in which case the
  call will block until the response is received, and the response
  object will be returned directly to the caller. This interface change
  cannot be used in practice until RPC implementations are updated to
  implement it.
* Changes to input_stream.py should make protobuf compatible with appengine.
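
The None-callback convention in the Python RPC entry above can be sketched roughly as follows. This is a toy model, not the protobuf service API: LoopbackChannel, call_method, and join are hypothetical names, the "server" just upper-cases a string on a worker thread, and a real RPC implementation would define its own transport.

```python
import threading


class LoopbackChannel:
    """Hypothetical channel that answers requests on a worker thread."""

    def __init__(self):
        self._workers = []

    def call_method(self, request, callback=None):
        """Non-blocking if a callback is given, blocking if it is None."""
        finished = threading.Event()
        box = {}

        def serve():
            box["response"] = request.upper()  # stand-in for real server work
            if callback is not None:
                callback(box["response"])
            finished.set()

        worker = threading.Thread(target=serve)
        self._workers.append(worker)
        worker.start()

        if callback is None:
            # Blocking mode: wait, then hand the response straight back.
            finished.wait()
            return box["response"]
        # Non-blocking mode: the response arrives through the callback.
        return None

    def join(self):
        """Wait for all outstanding calls to finish."""
        for worker in self._workers:
            worker.join()
        self._workers.clear()


channel = LoopbackChannel()

blocking_result = channel.call_method("ping")
print(blocking_result)  # PING

received = []
async_result = channel.call_method("pong", received.append)
channel.join()
print(async_result, received)  # None ['PONG']
```

The point of the convention is that one method signature serves both styles, so callers who do not need concurrency can skip writing a callback.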

Kenton Varda

May 13, 2009, 7:21:05 PM

to Protocol Buffers

Updated documentation covering all this has been submitted and should go live in a couple hours.

http://code.google.com/apis/protocolbuffers/docs/overview.html

Henner Zeller

May 13, 2009, 7:41:01 PM

to Kenton Varda, Protocol Buffers

Thanks for releasing!

Good enough, these things happen. Thanks for your continuous support
and hard work!

-h

Peter K.

May 13, 2009, 8:35:32 PM

to Protocol Buffers

Good job, Kenton!

Thanks for your efforts.

Ciao,

Peter K.

clint.foster

May 14, 2009, 10:18:28 AM

to Protocol Buffers

It's very nice to see support in the API for length-prefixed messages
and blocking RPCs. Both will reduce the amount of boilerplate code
needed for many protobuf applications.

Antony Dovgal

May 14, 2009, 10:27:21 AM

to clint.foster, Protocol Buffers

On 14.05.2009 18:18, clint.foster wrote:
> It's very nice to see support in the API for length-prefixed messages

Yes, native support for this kind of feature would be very welcome.

--
Wbr,
Antony Dovgal

Kenton Varda

May 14, 2009, 4:02:47 PM

to Antony Dovgal, clint.foster, Protocol Buffers

Yep, it's there in Java. I didn't get the chance to add the equivalent support to C++ or Python yet, but if someone wants to submit a patch, go for it.

Chris

May 15, 2009, 11:51:11 AM

to Protocol Buffers

Kenton Varda wrote:
> Here's the major changes (from CHANGES.txt):
>
> General
> * Repeated fields of primitive types (types other than string, group, and
> nested messages) may now use the option [packed = true] to get a more
> efficient encoding. In the new encoding, the entire list is written
> as a single byte blob using the "length-delimited" wire type. Within
> this blob, the individual values are encoded the same way they would
> be normally except without a tag before each value (thus, they are
> tightly "packed").

I see http://code.google.com/apis/protocolbuffers/docs/proto.html has
been updated.
I will add Haskell support for this.

> * For each field, the generated code contains an integer constant
> assigned
> to the field number. For example, the .proto file:
> message Foo { optional int32 bar_baz = 123; }
> would generate the following constants, all with the integer value
> 123:
> C++: Foo::kBarBazFieldNumber
> Java: Foo.BAR_BAZ_FIELD_NUMBER
> Python: Foo.BAR_BAZ_FIELD_NUMBER
> Constants are also generated for extensions, with the same naming
> scheme.
> These constants may be used as switch cases.

Currently the wire layer has the field number baked in; it is never
exposed except through the reflection API.

Not hard to define a bunch of Int values. But in Haskell these cannot
be used as case targets. To do that I have to create the Int values as
Enum constructors. Which is less good.

For now I'll just ignore them, and add a note about it to the user.
This will be a demand-driven feature.

> other version of Google Test if needed.

> * It is now an error to define a default value for a repeated field.
> Previously, this was silently ignored (it had no effect on the
> generated
> code).

easy

> * Fields can now be marked deprecated like:
> optional int32 foo = 1 [deprecated = true];
> Currently this does not have any actual effect, but in the future
> the code
> generators may generate deprecation annotations in each language.

easy

> protoc

> * The type name resolver will no longer resolve type names to
> fields. For
> example, this now works:
> message Foo {}
> message Bar {
> optional int32 Foo = 1;
> optional Foo baz = 2;
> }
> Previously, the type of "baz" would resolve to "Bar.Foo", and
> you'd get
> an error because Bar.Foo is a field, not a type. Now the type of
> "baz"
> resolves to the message type Foo. This change is unlikely to make a
> difference to anyone who follows the Protocol Buffers style guide.

Ack, the Haskell version needs to be updated to track this change.
This means I have to go back and understand the name resolution module
in my Haskell implementation.
Hmmm....
It currently has a "resolve in environment" that returns the first hit.
I'll have to update that.

> C++

> * DescriptorPool::AllowUnknownDependencies() may be used to tell
> DescriptorPool to create placeholder descriptors for unknown entities
> referenced in a FileDescriptorProto. This can allow you to parse
> a .proto
> file without having access to other .proto files that it imports, for
> example.

hmmm....odd.

> Java

> * New I/O methods parseDelimitedFrom(), mergeDelimitedFrom(), and
> writeDelimitedTo() read and write "delimited" messages from/to a
> stream,
> meaning that the message size precedes the data. This way, you
> can write
> multiple messages to a stream without having to worry about delimiting
> them yourself.

This will help responding to that FAQ.

Chris Kuklewicz

May 17, 2009, 9:57:36 AM

to Protocol Buffers

I am patching the Haskell implementation and I have a follow up
question to this:

On May 14, 12:06 am, Kenton Varda <ken...@google.com> wrote:
> * The type name resolver will no longer resolve type names to fields. For
> example, this now works:
> message Foo {}
> message Bar {
> optional int32 Foo = 1;
> optional Foo baz = 2;
> }
> Previously, the type of "baz" would resolve to "Bar.Foo", and you'd get
> an error because Bar.Foo is a field, not a type. Now the type of "baz"
> resolves to the message type Foo. This change is unlikely to make a
> difference to anyone who follows the Protocol Buffers style guide.

You did not fix this similar case, where the "int32 Baz" field causes
an error when trying to extend the "message Baz":

package test_resolve;

message Foo {
  optional int32 Baz = 2;

  extend Baz {
    optional int32 nonsense = 76335;
  }
}

message Baz {
  extensions 100 to max;
}

I will make the Haskell version compatible with protoc-2.1.0 but
perhaps you want to make the above a legal proto file in the future.

What do people think?

Kenton Varda

May 18, 2009, 2:06:20 PM

to Chris Kuklewicz, Protocol Buffers

You're right, this should have been handled too. Oh well, I'll stick it on my TODO list for a later release.

Hopefully most open source users follow the style guide, making this irrelevant. Inside Google we unfortunately have a lot of code that predates the style guide and uses CamelCase field names. People kept getting confused as to why code like this didn't work:

optional Foo Foo = 1;

even though similar code also would not work in any of our main programming languages (C++, Java, Python). Eventually I caved and made it a non-error so that people would stop complaining.

Chris

May 19, 2009, 3:08:51 AM

to Protocol Buffers

As for the improved name resolution:

Kenton Varda wrote:
> On Sun, May 17, 2009 at 6:57 AM, Chris Kuklewicz <turin...@gmail.com> wrote:
>
> What do people think?
>
> You're right, this should have been handled too. Oh well, I'll stick
> it on my TODO list for a later release.

I am quite happy to have helped. The two name resolution functions were
side by side in my code; making the decision to fix only one looked
odd. I will immediately support resolving extendee names to Messages,
ignoring Fields and other things.

As for the "packed" fields, I just now got my Haskell version to the
next stage:
(1) the new runtime and converter both compile with "packed" support
(2) it can convert the new unittest.proto into Haskell code with
"packed" support
(3) the generated Haskell code compiles against new runtime with
"packed" support
(4) it has regenerated its own descriptor.proto and been recompiled
(enums needed an extra line to get packed fields efficiently)

So the next stage is to test the behavior and see if it can
inter-operate with itself and with packed files from protobuf-2.1.0.

Making the extension fields also "packable" was tedious but did not
require redesigning anything. Whew. The "unknown" field support did
not need updating at all.

As for the newly exposed field number constants: I cannot make them a
proper enum data type in Haskell because those are closed definitions
and so could not include any of the extension fields outside the
message's own proto file. I could still make them type safe constants,
but these could not be used as targets of a case statement. The data is
available through reflection, so I will wait to implement anything else
until an actual person comes to me with a use case that I can make
design decisions for.

As for delimiting messages by prepending the length: I already had these
commands, so all I did was change the documentation from "author's
extension" to "compatible with protobuf-2.1.0". Not that I actually
tested it...

--
Chris
