TODO list


Kenton Varda

Jul 15, 2008, 3:53:43 PM
to Protocol Buffers
Here's my TODO list.  I hope to burn through all the small things this week.  The medium things will hopefully come before protobuf leaves beta (within a month or two).  The large things are things I probably won't have time to do for months, but if someone else wants to take ownership of some of these, please let me know!

Note that all assigned issues through issue 19 in the issues list should be covered here.

Small things:
- PyPI source package needs better readme.
- Add links to various per-language binding projects
  - (did I miss any?)
- "package foo;package bar;" should leave package as "foo" or "bar"
  (but this is a parse error anyway so not very important)
- Add FAQ about "protocol buffers" name
- Throw InvalidProtocolBufferException when parsing invalid UTF-8 in Java.
- Fix fully-qualified output paths in Windows. (issue 13)
- "bytes"-typed fields should not have set_foo(const char*) accessor.  (issue 19)
- Apply Johan's patch to fix Python CallMethod().  (issue 16)
- Fix static initialization ordering bug that causes crashes on OSX when
  compiling with static linking. (issue 17)
- Publish Alex Storm's .proto grammar.
- Apply Kevin Ko's patch to allow trailing slashes in proto_path names,
  or do something equivalent.
- Add suggestion in README that people use --prefix=/usr. (issue 1)
- Fix Python text_format_test on platforms which print redundant zeros
  in exponents.  (issue 5)
- Improve Java README to explain how to compile without Maven. (issue 7)
- Improve error message when two enums defined in the same scope contain
  values with the same name.  (issue 12)
- Mention vsprojects directory in top-level readme.

Medium things:
- Create a tool for converting between text and binary formats from the
  command line (probably just a new flag to protoc).
- Add a flag to protoc which writes the FileDescriptorProtos for all parsed
  files to a single output file.  (issue 15)
- Create a realistic public benchmark by obfuscating an internal Google protocol.
- Consider using static linking in more cases.

Large things:
- Make code generators deal with valid .proto definitions that do not
  compile in the target language (e.g. if you name a field "descriptor").
  (issue 4)
- Create a better test suite to help people writing implementations in
  other languages.
- Make protoc retain doc comments when parsing, and perhaps write them out
  again as doc comments in the generated code.
- Create protoc Ant task.
- Provide a [ctype=WSTRING] option allowing strings to be represented using
  std::wstring in C++.
- Port other ctype options currently missing from open source release.
- Devise a better way to compile the same .proto file with differing
  options.
- Support maps.
- Support "packed" repeated fields.
- Support lazy parsing of sub-messages.
- Allow people to define their own options by defining extensions of
  the *Options protos in descriptor.proto.

Alek Storm

Jul 15, 2008, 4:11:17 PM
to Protocol Buffers
On Jul 15, 2:53 pm, "Kenton Varda" <ken...@google.com> wrote:
> - Publish Alex Storm's .proto grammar.

Very small issue: my name is *Alek* Storm. Don't worry about it, I'm
used to it :). And thanks for publishing it.

Alkis Evlogimenos ('Αλκης Ευλογημένος)

Jul 15, 2008, 8:03:23 PM
to Kenton Varda, Protocol Buffers
I put a patch up for issue 19 for you to review.
--

Alkis

Henner Zeller

Jul 16, 2008, 6:13:20 AM
to Kenton Varda, Protocol Buffers
> Small things:
> - PyPI source package needs better readme.
> - Add links to various per-language binding projects
>   - Ruby: http://code.google.com/p/protobuf-ruby/
>   - Haskell: http://darcs.haskell.org/packages/protocol-buffers/
>   - Erlang: http://github.com/tim/erlang-protobuf/tree/master
>   - Perl: http://groups.google.com/group/protobuf-perl
>   - C#: http://github.com/jskeet/dotnet-protobufs/tree/master
>   - (did I miss any?)

Lisp: http://code.google.com/p/cl-protobuf/

Matt

Jul 16, 2008, 9:36:21 PM
to Protocol Buffers
> Here's my TODO list.  I hope to burn through all the small things this week.
>  The medium things will hopefully come before protobuf leaves beta (within a
> month or two).  The large things are things I probably won't have time to do
> for months, but if someone else wants to take ownership of some of these,
> please let me know!

I think that another thing that would be useful is to somehow detect
the problem that two folks have mentioned having on Windows. (Where
they are using a debug build of the program and a release build of
libprotobuf.dll.) It would be nice if this either raised a runtime
error or just worked. (I don't see why it shouldn't just work...)

Matt

Kenton Varda

Jul 16, 2008, 9:50:47 PM
to Matt, Protocol Buffers
On Wed, Jul 16, 2008 at 6:36 PM, Matt <ma...@tolton.com> wrote:
> I think that another thing that would be useful is to somehow detect
> the problem that two folks have mentioned having on Windows.  (Where
> they are using a debug build of the program and a release build of
> libprotobuf.dll.)  It would be nice if this either raised a runtime
> error or just worked. (I don't see why it shouldn't just work...)

I don't understand why it doesn't "just work" either.  It would be great if someone who knows MSVC better could help figure this out.  I vaguely suspect that MSVC uses different STL implementations in debug vs. opt mode and thus passing STL objects across the DLL boundary doesn't work, but I have no direct evidence to support this hypothesis at present.  If this is the problem, it's unfixable -- the protobuf library uses STL heavily in its interface (particularly the string class).

Matt

Jul 17, 2008, 12:17:55 AM
to Protocol Buffers
> I don't understand why it doesn't "just work" either.  It would be great if
> someone who knows MSVC better could help figure this out.  I vaguely suspect
> that MSVC uses different STL implementations in debug vs. opt mode and thus
> passing STL objects across the DLL boundary doesn't work, but I have no
> direct evidence to support this hypothesis at present.  If this is the
> problem, it's unfixable -- the protobuf library uses STL heavily in its
> interface (particularly the string class).

Ok. I asked a friend who knows about Windows. He thinks it's the MSVC
runtime library being linked to, just like you mentioned. See:
http://msdn.microsoft.com/en-us/library/abx4dbyh(VS.80).aspx

So, since it can't "just work", do you think that perhaps the
GOOGLE_PROTOBUF_VERIFY_VERSION macro should also check to make sure
the runtime libs are the same? It looks like, at the very least, you
could pass in whether the _DEBUG macro was defined in the client code
(according to the aforementioned article, it needs to be defined when
linking to a debug dll) and then compare it to whether it was defined
when compiling the dll.
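
A minimal sketch of that kind of check, not anything that exists in protobuf today. The names SKETCH_IS_DEBUG_BUILD, SKETCH_VERIFY_BUILD_MODE, LibraryIsDebugBuild and VerifyBuildMode are made up for illustration, and the sketch assumes that _DEBUG alone is enough to tell the two runtimes apart. The macro expands in the client's code, so it captures the client's _DEBUG state, while LibraryIsDebugBuild() would be compiled into the DLL and so report the library's state.

```cpp
#include <stdexcept>
#include <string>

// Evaluates to true in any translation unit compiled with _DEBUG defined
// (MSVC defines it for debug-runtime builds), false otherwise.
#ifdef _DEBUG
#define SKETCH_IS_DEBUG_BUILD true
#else
#define SKETCH_IS_DEBUG_BUILD false
#endif

namespace sketch {

// Hypothetical library-side function. In a real build it would be defined in
// a source file compiled into the DLL, so it reflects the library's flags.
// In this single-file sketch it trivially matches the client.
bool LibraryIsDebugBuild() { return SKETCH_IS_DEBUG_BUILD; }

// Hypothetical check: throws when client and library build modes differ.
void VerifyBuildMode(bool client_is_debug, bool library_is_debug) {
  if (client_is_debug != library_is_debug) {
    throw std::runtime_error(
        std::string("build mode mismatch: client is ") +
        (client_is_debug ? "debug" : "release") + ", library is " +
        (library_is_debug ? "debug" : "release"));
  }
}

}  // namespace sketch

// Expands in the client's translation unit, so the first argument captures
// the client's _DEBUG state rather than the library's.
#define SKETCH_VERIFY_BUILD_MODE()                          \
  ::sketch::VerifyBuildMode(SKETCH_IS_DEBUG_BUILD,          \
                            ::sketch::LibraryIsDebugBuild())
```

A client would call SKETCH_VERIFY_BUILD_MODE() once at startup, in the same place it calls GOOGLE_PROTOBUF_VERIFY_VERSION.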

Kenton Varda

Jul 17, 2008, 12:58:08 AM
to Matt, Protocol Buffers
On Wed, Jul 16, 2008 at 9:17 PM, Matt <ma...@tolton.com> wrote:
> Ok. I asked a friend who knows about Windows.  He thinks it's the MSVC
> runtime library being linked to, just like you mentioned.  See:
> http://msdn.microsoft.com/en-us/library/abx4dbyh(VS.80).aspx

The page suggests the problem would go away if libprotobuf were a static library instead of a DLL.  It seems like it might be a good idea to compile it as such anyway.  It's likely that future versions of libprotobuf will not be binary-compatible with old versions due to the nature of C++, so it's probably best that people statically link regardless.  The only problem is that this means people cannot pass protobuf objects across DLL boundaries.  But it's not hard for people to recompile the source as a DLL if they really need it.

Alek Storm

Jul 17, 2008, 2:40:46 AM
to Protocol Buffers
I think the documentation needs to be updated to answer some questions
that keep being asked. Succinctly:
Q. "Do Protocol Buffers support streaming?"
A. No, not yet; it would require a complete redesign.
Q. "How do I specify the type of a message?"
A. Any of the following: use your own container format, specify a type
field in the message that's an enum value or string of the type, use a
container message with one of the message fields filled in.

These seem to lend themselves easily to FAQs, but the current FAQs are
much more high-level.
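
To illustrate the first option in the second answer (your own container format), here is a minimal sketch. The frame layout (one type byte, a 4-byte big-endian length, then the payload) and the names MessageType, Envelope, Wrap and Unwrap are assumptions made up for this example, not part of protobuf. The payload is treated as opaque bytes, for instance whatever a message's SerializeToString() produced.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical type tags; a real application would define its own.
enum class MessageType : std::uint8_t { kPing = 1, kChat = 2 };

struct Envelope {
  MessageType type;
  std::string payload;  // serialized message bytes, opaque to the framing
};

// Assumed frame layout: 1 byte type tag, 4 bytes big-endian payload length,
// then the payload itself.
std::string Wrap(MessageType type, const std::string& payload) {
  if (payload.size() > 0xFFFFFFFFu) {
    throw std::length_error("payload too large for a 4-byte length");
  }
  const std::uint32_t length = static_cast<std::uint32_t>(payload.size());
  std::string frame;
  frame.reserve(5 + payload.size());
  frame.push_back(static_cast<char>(type));
  for (int shift = 24; shift >= 0; shift -= 8) {
    frame.push_back(static_cast<char>((length >> shift) & 0xFF));
  }
  frame.append(payload);
  return frame;
}

// Parses exactly one frame; throws if the header or length is inconsistent.
Envelope Unwrap(const std::string& frame) {
  if (frame.size() < 5) {
    throw std::runtime_error("frame shorter than its header");
  }
  std::uint32_t length = 0;
  for (int i = 1; i <= 4; ++i) {
    length = (length << 8) | static_cast<unsigned char>(frame[i]);
  }
  if (frame.size() - 5 != length) {
    throw std::runtime_error("payload length does not match header");
  }
  return Envelope{
      static_cast<MessageType>(static_cast<unsigned char>(frame[0])),
      frame.substr(5)};
}

}  // namespace sketch
```

The receiver reads the type tag first and then picks which message class to parse the payload with. The other two options in the answer move the same tag inside a protobuf message instead of outside it.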

Matt Tolton

Jul 17, 2008, 2:41:40 AM
to Kenton Varda, Protocol Buffers
> The page suggests the problem would go away if libprotobuf were a static
> library instead of a DLL. It seems like it might be a good idea to compile

Out of curiosity, how do you arrive at this? If the libprotobuf code
is compiled using headers with _DEBUG defined, and your code isn't (or
vice versa), isn't it easily possible that the structure of objects
and such could change in the headers due to that flag?

Scott Woods

Jul 17, 2008, 3:48:35 AM
to prot...@googlegroups.com
Hi,

New to this list but maybe this is useful.

If you are allocating an object (new) in one loadable module (e.g. EXE or
DLL) and deallocating (delete) in another, then that is your problem. Each
module is running its own heap.

That's not a problem attributable to STL or protocol buffers. It's
lower-level than that.

There is a workaround but it's limited. You have to be the owner+author of
the DLL because you need to revector the global new+delete operators to use
the parent load module (i.e. EXE).
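
For anyone who cannot revector new+delete, a different and commonly used pattern is to make sure every object is destroyed by the module that created it. The sketch below is only an illustration of that pattern, assuming each module really does end up with its own heap. Widget, CreateWidget, DestroyWidget and MakeWidget are hypothetical names, and in a real build the first two functions would be exported from the DLL.

```cpp
#include <memory>

namespace sketch {

// Hypothetical type owned by the library.
struct Widget {
  int id;
};

// Hypothetical factory and destroyer. In a real build both would be exported
// from, and compiled into, the same DLL, so new and delete use the same heap.
Widget* CreateWidget(int id) { return new Widget{id}; }
void DestroyWidget(Widget* widget) { delete widget; }

// Deleter that routes destruction back through the library.
struct WidgetDeleter {
  void operator()(Widget* widget) const { DestroyWidget(widget); }
};

using WidgetPtr = std::unique_ptr<Widget, WidgetDeleter>;

// Client-side convenience wrapper; the client never calls delete itself.
inline WidgetPtr MakeWidget(int id) { return WidgetPtr(CreateWidget(id)); }

}  // namespace sketch
```

Client code holds a WidgetPtr and lets it go out of scope, which ends up calling DestroyWidget inside the library. This does not help with types such as std::string that are passed by value across the boundary.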

Cheers.

zwetan

Jul 17, 2008, 10:47:33 AM
to Protocol Buffers


Hi Kenton,

On Jul 15, 8:53 pm, "Kenton Varda" <ken...@google.com> wrote:
[...]
I could plan a port for AS3. Is there a preferred way to do that?

Is it OK if I work on my own branch and submit one big patch,
or is it preferable to submit small patches one after another?

cheers,
zwetan

Kenton Varda

Jul 17, 2008, 12:41:18 PM
to zwetan, Protocol Buffers
On Thu, Jul 17, 2008 at 7:47 AM, zwetan <zwe...@gmail.com> wrote:
> I could plan a port for AS3. Is there a preferred way to do that?

You just have to write a new CodeGenerator and hook it into a custom compiler.  Check out this page:

 
> Is it OK if I work on my own branch and submit one big patch,
> or is it preferable to submit small patches one after another?

It's probably best for you to create a separate project on googlecode so that you have your own SVN tree for now.  Once the code is mature we can think about merging it into the core project.  Send me a link to your project once it is up and I'll add it to the list.

Kenton Varda

Jul 17, 2008, 12:44:35 PM
to Matt Tolton, Protocol Buffers
On Wed, Jul 16, 2008 at 11:41 PM, Matt Tolton <ma...@tolton.com> wrote:
> Out of curiosity, how do you arrive at this?  If the libprotobuf code
> is compiled using headers with _DEBUG defined, and your code isn't (or
> vice versa), isn't it easily possible that the structure of objects
> and such could change in the headers due to that flag?

The page suggested that static libraries originally built against the old msvcrt.dll will work when linked against newer C runtimes, even though having multiple DLLs linked against different C runtimes apparently doesn't work.  I was hypothesizing that this applies to debug vs. release too.  But I haven't actually tried it; maybe I'm wrong.

Kenton Varda

Jul 17, 2008, 12:48:40 PM
to Scott Woods, prot...@googlegroups.com
On Thu, Jul 17, 2008 at 12:48 AM, Scott Woods <scott....@gmail.com> wrote:
> If you are allocating an object (new) in one loadable module (e.g. EXE or
> DLL) and deallocating (delete) in another, then that is your problem. Each
> module is running its own heap.

Is that really still a problem?  I would have expected the tests to reveal this, as I'm pretty sure there are multiple places where memory allocated in libprotobuf.dll is deleted in tests.exe, or vice versa.  If it is still a problem, I think we should definitely just switch to static linking.

I do not understand why Microsoft made this design decision.

Scott Woods

Jul 17, 2008, 1:18:22 PM
to prot...@googlegroups.com
The issue is very real. Once you start playing with complex combinations
of libraries it's not uncommon to have one static library and one dynamic
library and both of them 3rd party, i.e. you have no ability to produce
different versions.

It's not just the heap but anything that is of global scope in the
C runtime - each loadable module will have its own copy.

Why did MS do it that way? How is the loader going to know the
addresses of C runtime objects in an EXE and set them in each
freshly loaded DLL? Loaders don't know about C runtimes or even
C. I've found the symptoms of this pretty inconvenient too but I
can't see that there is any design/implementation question to
answer here.

ps:
This is a re-send of mail I sent direct to Kenton. Think I've been
treating this list the wrong way. Sorted now.

Kenton Varda

Jul 17, 2008, 1:21:56 PM
to Scott Woods, prot...@googlegroups.com
On Thu, Jul 17, 2008 at 10:18 AM, Scott Woods <scott....@gmail.com> wrote:
> Why did MS do it that way? How is the loader going to know the
> addresses of C runtime objects in an EXE and set them in each
> freshly loaded DLL? Loaders don't know about C runtimes or even
> C. I've found the symptoms of this pretty inconvenient too but I
> can't see that there is any design/implementation question to
> answer here.

OK, but I don't know of any other OS that has this problem.

gsxr

Jul 17, 2008, 1:32:08 PM
to Protocol Buffers


On Jul 18, 5:21 am, "Kenton Varda" <ken...@google.com> wrote:
> On Thu, Jul 17, 2008 at 10:18 AM, Scott Woods <scott.suz...@gmail.com>
> wrote:
>
> > Why did MS do it that way? How is the loader going to know the
> > addresses of C runtime objects in an EXE and set them in each
> > freshly loaded DLL? Loaders don't know about C runtimes or even
> > C. I've found the symptoms of this pretty inconvenient too but I
> > can't see that there is any design/implementation question to
> > answer here.
>
> OK, but I don't know of any other OS that has this problem.

Having written a bunch of loader code in the past I'm curious. What other
way is there for OSs to do it? I spent years on *nix platforms a long time
ago but I don't remember them pulling off anything fundamentally different
(cleaner for sure ;-)

Do you have complex activity running across loadable modules on *nix?

Cheers.

Kenton Varda

Jul 17, 2008, 1:47:28 PM
to gsxr, Protocol Buffers
On Thu, Jul 17, 2008 at 10:32 AM, gsxr <scott....@gmail.com> wrote:
> Having written a bunch of loader code in the past I'm curious. What other
> way is there for OSs to do it?

I'm no expert on the subject, but my guess is that as long as all the modules link against a single libc.so, there's no problem sharing the heap.  If any one module chooses to statically link against libc.so (but why would you statically link a dynamic library?) then you have a problem.

In fact I'm having a hard time figuring out how the Windows malloc() implementation manages to have different heaps for each DLL.  Wouldn't that mean that it has to explicitly look up the call stack to find out which module called malloc() in order to place the memory on that module's heap?  Or is a separate copy of malloc() linked into each DLL?

This is getting pretty off-topic...

gsxr

Jul 17, 2008, 4:06:11 PM
to Protocol Buffers


On Jul 18, 5:47 am, "Kenton Varda" <ken...@google.com> wrote:
> On Thu, Jul 17, 2008 at 10:32 AM, gsxr <scott.suz...@gmail.com> wrote:

> In fact I'm having a hard time figuring out how the Windows malloc()
> implementation manages to have different heaps for each DLL. Wouldn't that
> mean that it has to explicitly look up the call stack to find out which
> module called malloc() in order to place the memory on that module's heap?
> Or is a separate copy of malloc() linked into each DLL?
>
> This is getting pretty off-topic...

True. I suppose the relevant thing here is the problem that arises
when allocating and deallocating across (Windows) loadable modules.

I know that this problem exists but I cannot confirm that this is the
underlying cause of your problems. Simple to test I would guess.

Cheers.