Robotics Application

Florian Enner

Jan 9, 2015, 1:59:33 PM

to flatb...@googlegroups.com

Hi,

I'm currently evaluating FlatBuffers for use in a soft-real-time robotics project. In the past we have been using protobufs, but we are hitting limits with respect to the determinism of the Google-provided libraries.

Project

We are working on a set of modular actuators/sensors that can be combined in various ways, such as a hexapod (https://www.youtube.com/watch?v=AMVO6rI5mL4) or a snake robot (https://www.youtube.com/watch?v=wMsDg_1TQxA). We would like to be able to support simultaneous communications with hundreds of modules at rates up to 1 kHz. Although protobuf works fine for most use cases, there are some corner cases that need to be very deterministic and won't allow use of dynamic memory in Java.

Constraints

- C++: must be compatible with C++98 and run on an ARM7 embedded device

- Java: limited to JDK6, <150 MB heap, no optimization flags, needs to support all major operating systems

- Determinism: no dynamic memory, no garbage collection

- Languages: at least C++ and Java, but Python and C# would be good

- Bandwidth: almost unconstrained. We will likely hit CPU limits far before we hit any bandwidth limit.

Options

So far I've been looking at a few potential solutions:

1) 3rd party Java protobuf library with object pooling

Protostuff won't work due to the JDK6 requirement. The other libraries I've found seem outdated and don't provide a reflective API. I'm open to suggestions.

2) JNI to the C++ protobuf API

Providing compiled binaries for all supported systems, and auto-generating JNI wrappers is simply too much work.

3) CapnProto

Very nice concept, but we need to be able to check whether a field has been set.

4) FlatBuffers

The protocol is almost exactly what we would like to see, but the current implementation unfortunately doesn't quite fit.

Overall you guys have done a nice job with this library. I've tried to compile a list of the things I've thought about over the past couple of days. I'd like to get your opinions on whether these could be integrated into the library. Note that so far I've only looked at the Java bindings.

Questions

A) What is the second size uint16_t after voffset_t in sub-tables used for? I understand the use case for deserializing from streams, but that would only require a size in the root object. Are those sizes used anywhere? Is it possible to modify the protocol definition to allow not setting this size for nested objects (the root-size should be enough for streaming)?
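For readers less familiar with the layout, here is the background. A table starts with a signed 32-bit offset back to its vtable. The vtable starts with two uint16 values: the vtable's own size in bytes, then the table's inline size in bytes (the "second size" asked about). One uint16 field offset per field follows. The minimal Java sketch below reads both header values from a hand-built, little-endian buffer. The class name `VTableHeader` and the exact byte layout are illustrative assumptions, not generated code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Reads the two uint16 vtable header fields of a hand-built FlatBuffer (illustrative layout). */
final class VTableHeader {
    private VTableHeader() {}

    /** Root offset, one vtable at byte 4, one table at byte 12 holding a single short field. */
    static ByteBuffer sampleBuffer() {
        ByteBuffer bb = ByteBuffer.allocate(20).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(0, 12);              // root uoffset: the table starts at byte 12
        bb.putShort(4, (short) 8);     // vtable[0]: vtable size (2 header + 2 field entries, 2 bytes each)
        bb.putShort(6, (short) 8);     // vtable[1]: inline table size (4-byte soffset + short + padding)
        bb.putShort(8, (short) 4);     // field 0 is stored at table + 4
        bb.putShort(10, (short) 0);    // field 1 is absent
        bb.putInt(12, 12 - 4);         // table: soffset = table position - vtable position
        bb.putShort(16, (short) 150);  // value of field 0
        return bb;
    }

    static int rootTable(ByteBuffer bb) {
        return bb.getInt(0);
    }

    static int vtableOf(ByteBuffer bb, int table) {
        return table - bb.getInt(table);
    }

    /** First uint16 of the vtable: the vtable's own size in bytes. */
    static int vtableSize(ByteBuffer bb, int table) {
        return bb.getShort(vtableOf(bb, table)) & 0xFFFF;
    }

    /** Second uint16 of the vtable: the table's inline size in bytes. */
    static int tableSize(ByteBuffer bb, int table) {
        return bb.getShort(vtableOf(bb, table) + 2) & 0xFFFF;
    }
}
```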

B) Why are bytes in the ByteBuffer considered immutable once written (during the building phase)? The protocol can't be streamed, so I don't see the reasoning behind this decision.

Barriers to our Adoption

a) FlatBufferBuilder uses dynamic memory

Currently each object creates a temporary byte[] to store the vtable. This creates so much garbage that the library is unusable for us. There would need to be a way to either recycle them, or the buffer would need to be built without temporary objects (preferred).

b) Writing data

As mentioned above, the current implementation uses a temporary array, which generates lots of garbage. Additionally, it feels a bit rigid for such a flexible wire format. I'd like to see a more dynamic way to generate the message, similar to protobuf. I've added a text file showing some ideas (writing.txt).

c) Reading data

The current way of reading data using recycled views is good, but it's not ideal for multithreaded subscribers. In order not to lose determinism, there has to be either a global recycle bin (potential memory leak) or each reader has to keep separate object instances. Unfortunately value types are still a long way off in Java :( I'd propose an additional approach that is closer to C-style pointer manipulation, using addresses and static methods. Addresses could be stored as ints on the stack. I've attached a small sample of how I imagine this could look (reading.txt). It should also result in better performance due to fewer indirections.
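Below is a minimal sketch of the static, pointer-style reading API proposed above. It is not the contents of reading.txt, which isn't reproduced here. Table addresses are plain `int`s and every accessor is a static method, so reading allocates nothing and keeps no per-thread state. The schema `Monster { hp:short = 100; mana:short = 150; }`, the class `MonsterPtr`, and the hand-built buffer are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical static accessors for: table Monster { hp:short = 100; mana:short = 150; } */
final class MonsterPtr {
    private MonsterPtr() {}

    /** Address of the root table, assuming the FlatBuffer starts at index 0. */
    static int root(ByteBuffer bb) {
        return bb.getInt(0);
    }

    /** Offset of a field relative to its table, or 0 if the field is absent. */
    static int fieldOffset(ByteBuffer bb, int table, int slot) {
        int vtable = table - bb.getInt(table);
        int vtableSize = bb.getShort(vtable) & 0xFFFF;
        int entry = 4 + 2 * slot;              // skip the two uint16 header fields
        return entry < vtableSize ? bb.getShort(vtable + entry) & 0xFFFF : 0;
    }

    static short hp(ByteBuffer bb, int table) {
        int o = fieldOffset(bb, table, 0);
        return o != 0 ? bb.getShort(table + o) : 100;
    }

    static short mana(ByteBuffer bb, int table) {
        int o = fieldOffset(bb, table, 1);
        return o != 0 ? bb.getShort(table + o) : 150;
    }

    /** Hand-built buffer containing a Monster with hp = 80 and mana left at its default. */
    static ByteBuffer sample() {
        ByteBuffer bb = ByteBuffer.allocate(20).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(0, 12);             // root table at 12
        bb.putShort(4, (short) 8);    // vtable size
        bb.putShort(6, (short) 8);    // table size
        bb.putShort(8, (short) 4);    // hp at table + 4
        bb.putShort(10, (short) 0);   // mana absent
        bb.putInt(12, 8);             // soffset: vtable is at 12 - 8 = 4
        bb.putShort(16, (short) 80);
        return bb;
    }
}
```

Usage: `int m = MonsterPtr.root(bb); short hp = MonsterPtr.hp(bb, m);`. The trade-off is type safety. Nothing stops an `int` address of one table type from being passed to another type's accessors.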

Minor Issues

d) Don't omit default values

We have different types of modules that respond with the same message, but only populate the subset of fields that they need. Because of this, we need to know whether a field has actually been set, e.g., not sending default values makes it ambiguous whether a sensor doesn't exist or whether it's just reporting the default value. Another concern is that some modules may be running old firmware that was compiled with varying default values. What makes this even worse is that every field has a default value, even if none has been defined in the IDL file. Either there should be no default-default values, or there needs to be a way to opt-out of this.

e) Add hasField() accessors

It would be very nice to be able to check whether a field is populated. This makes most sense in combination with (d).
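In the wire format, presence is already visible: a field is stored exactly when its vtable slot is non-zero. The ambiguity in (d) comes from the builder skipping values that equal the default, which leaves the slot at 0. The minimal sketch below puts a `hasField()`-style accessor next to the plain getter. The class `SensorPresence`, the field `temperature` (default 0), and the buffers are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical accessors for: table Sensor { temperature:short = 0; } */
final class SensorPresence {
    private SensorPresence() {}

    /** Builds a Sensor table; if stored is false the field is omitted, as a default-skipping builder would. */
    static ByteBuffer build(boolean stored, short temperature) {
        ByteBuffer bb = ByteBuffer.allocate(20).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(0, 12);                             // root table at 12
        bb.putShort(4, (short) 6);                    // vtable size: header + one slot
        bb.putShort(6, (short) (stored ? 8 : 4));     // table size
        bb.putShort(8, (short) (stored ? 4 : 0));     // slot 0: offset, or 0 if omitted
        bb.putInt(12, 8);                             // soffset: vtable at 12 - 8 = 4
        if (stored) {
            bb.putShort(16, temperature);
        }
        return bb;
    }

    /** Vtable entry for slot 0, or 0 if the vtable is too short to contain it. */
    private static int slot0(ByteBuffer bb) {
        int table = bb.getInt(0);
        int vtable = table - bb.getInt(table);
        return (bb.getShort(vtable) & 0xFFFF) > 4 ? bb.getShort(vtable + 4) & 0xFFFF : 0;
    }

    static boolean hasTemperature(ByteBuffer bb) {
        return slot0(bb) != 0;
    }

    static short temperature(ByteBuffer bb) {
        int o = slot0(bb);
        return o != 0 ? bb.getShort(bb.getInt(0) + o) : 0;
    }
}
```

`build(false, (short) 0)` and `build(true, (short) 0)` both read back a temperature of 0. Only the second reports `hasTemperature() == true`, and that distinction survives only if the builder writes default values.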

Suggestions

f) Add setField() into the builder

Is there a good reason why fields written to the write-buffer are considered immutable? A set() could check whether a field is already set, and then either call addField() or overwrite it.

g) Java bindings should conform to the Java guidelines

Since this is a Google project, the code should conform to the Google guidelines (https://google-styleguide.googlecode.com/svn/trunk/javaguide.html). Accessors without "get" and all the underscores look weird in a Java project.

h) Bool fields should return a boolean, not a byte

"addBoolField((byte)1)" should become "addBoolField(true)"

"boolean x = getBoolField() !=0" should become "boolean x = getBoolField()"
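The wire format stores a bool as a single byte, so the conversion could live in the generated code instead of user code. The sketch below shows helpers that such generated accessors might delegate to. `BoolFields` and its methods are hypothetical, and treating any non-zero byte as `true` is an assumption.

```java
import java.nio.ByteBuffer;

/** Hypothetical helpers for exposing one-byte bool fields as Java booleans. */
final class BoolFields {
    private BoolFields() {}

    /** Encodes a boolean as the single byte stored on the wire. */
    static byte toByte(boolean value) {
        return (byte) (value ? 1 : 0);
    }

    /** Decodes a stored byte; any non-zero value is read as true. */
    static boolean getBool(ByteBuffer bb, int pos) {
        return bb.get(pos) != 0;
    }

    /** Writes a boolean at an absolute position. */
    static void putBool(ByteBuffer bb, int pos, boolean value) {
        bb.put(pos, toByte(value));
    }
}
```

With these, `addBoolField(true)` could forward to `toByte(true)`, and `getBoolField()` could return `getBool(...)` directly.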

Overall I'm fascinated by the concept and I'm glad that you've open-sourced this. I really appreciate the work you've been putting into it. If you think that these barriers/issues could be eliminated, I'd be happy to switch and offer some help and ideas with the Java implementation.

Thanks,

- Florian

Florian Enner, Principal Systems/Software Engineer

Robotics Institute: Carnegie Mellon University

Attachments: reading.txt, writing.txt

Wouter van Oortmerssen

Jan 12, 2015, 7:13:05 PM

to Florian Enner, flatb...@googlegroups.com

Wow, those are some really cool robots! :)

I'd be happy to help to see if FlatBuffers can be made to work for you.

We are currently not C++98 compatible. See e.g. https://github.com/google/flatbuffers/issues/120

We have a C# implementation, and Python is being worked on by external contributors.

The Java interface was designed to be usable with minimal garbage generated. We can see if it can be further improved.

Your questions:

A) We were envisioning streaming really large amounts of data (as is typical in game development), which would require sizes on a per-object level. I agree that it's not needed for most use cases. At this point it is hard to remove it from the format, since we want to retain backwards & forwards compatibility. At best we could have a #define that optionally excludes this field.

B) You could modify existing fields after construction if you wanted to; it's just not that useful. You can't set any fields that weren't set before (including fields at their default value), and you can't insert new objects or grow strings/vectors, etc. (that can be done in theory, but is rather complicated and inefficient).

a) I would say that is an oversight. I can easily change that code to only allocate a new byte[] if a longer one than the existing one is needed.
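The fix described in (a) is a grow-only scratch array. The builder keeps one array, clears the used prefix for each new table, and allocates only when a table has more fields than any seen before. Below is a minimal sketch. `VTableScratch` is hypothetical, and holding the slots in an `int[]` of field offsets rather than a `byte[]` is an assumption.

```java
import java.util.Arrays;

/** Hypothetical reusable vtable scratch space that only reallocates when it must grow. */
final class VTableScratch {
    private int[] slots = new int[16];
    private int numFields;

    /** Prepares the scratch area for a table with numFields fields. */
    void start(int numFields) {
        if (numFields > slots.length) {
            // Grow geometrically so a sequence of ever-larger tables allocates rarely.
            slots = new int[Math.max(numFields, slots.length * 2)];
        }
        Arrays.fill(slots, 0, numFields, 0);   // 0 means "field not set"
        this.numFields = numFields;
    }

    /** Records the offset of a field that was just written. */
    void set(int slot, int offset) {
        if (slot < 0 || slot >= numFields) {
            throw new IndexOutOfBoundsException("slot " + slot);
        }
        slots[slot] = offset;
    }

    int get(int slot) {
        return slots[slot];
    }

    int capacity() {
        return slots.length;
    }
}
```

Once the largest table has been seen (for example, the same schemas every control cycle), `start()` stops allocating.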

b) writing.txt: I agree that would be a nice API. The problem is that currently vtables are shared between all objects that have the same layout, amortizing the cost of vtables greatly. In your system, not only would they have to exist per object, they would probably have to have 32-bit entries (since a new field can be all the way at the end of a buffer, and we support FlatBuffers of up to 2 GB). That adds up to a lot of space.
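The sharing mentioned in (b) works by comparing each new vtable with the ones already written. When an identical earlier copy exists, the table points at it. This is also why two different table types can end up sharing a vtable. The simplified sketch below uses a hypothetical `VTableCache` with a linear scan and a clone of each new vtable. It is not the actual FlatBufferBuilder code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical vtable de-duplication: identical vtables are written once and shared. */
final class VTableCache {
    private final List<short[]> vtables = new ArrayList<short[]>();
    private final List<Integer> positions = new ArrayList<Integer>();

    /**
     * Returns the position of an identical, previously written vtable if there is one;
     * otherwise records this vtable as written at newPos and returns newPos.
     */
    int intern(short[] vtable, int newPos) {
        for (int i = 0; i < vtables.size(); i++) {
            if (Arrays.equals(vtables.get(i), vtable)) {
                return positions.get(i);
            }
        }
        vtables.add(vtable.clone());   // copy, since callers may reuse their array
        positions.add(newPos);
        return newPos;
    }

    int distinctCount() {
        return vtables.size();
    }
}
```

Given the garbage concerns in this thread, the clone and the boxed `Integer` positions would need to go in a production version.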

c) reading.txt: This was my original design, but early feedback made it obvious this was a bit too low level for Java programmers. The cost of reusable accessor objects seemed low compared to the gain in type-safety, readability, and shorter code. I'm not against making this an alternative API hiding behind a command-line flag if need be.

As for accessor use, I would say that 1 accessor object per table type per thread should be acceptable, but I don't know your use case. Then again, if memory is really tight, the best optimisation of all would be to .. not use Java :P
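One way to get "one accessor object per table type per thread" without a global recycle bin is a `ThreadLocal` that holds a reusable view. The view is re-pointed at each new table instead of being reallocated. Below is a minimal sketch. `MonsterView`, its `init` method, and the sample layout are hypothetical stand-ins for a generated accessor. The anonymous `ThreadLocal` subclass keeps the sketch JDK6-compatible.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical reusable accessor for: table Monster { hp:short = 100; } */
final class MonsterView {
    private static final ThreadLocal<MonsterView> PER_THREAD = new ThreadLocal<MonsterView>() {
        @Override
        protected MonsterView initialValue() {
            return new MonsterView();
        }
    };

    private ByteBuffer bb;
    private int table;

    /** Returns this thread's instance; allocated once per thread, then reused. */
    static MonsterView forCurrentThread() {
        return PER_THREAD.get();
    }

    /** Re-points the view at another table instead of allocating a new object. */
    MonsterView init(ByteBuffer bb, int table) {
        this.bb = bb;
        this.table = table;
        return this;
    }

    short hp() {
        int vtable = table - bb.getInt(table);
        int o = (bb.getShort(vtable) & 0xFFFF) > 4 ? bb.getShort(vtable + 4) & 0xFFFF : 0;
        return o != 0 ? bb.getShort(table + o) : 100;
    }

    /** Hand-built buffer whose root table (at index 12) has the given hp. */
    static ByteBuffer sample(short hp) {
        ByteBuffer bb = ByteBuffer.allocate(20).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(0, 12);              // root table at 12
        bb.putShort(4, (short) 6);     // vtable size: header + one slot
        bb.putShort(6, (short) 8);     // table size
        bb.putShort(8, (short) 4);     // hp at table + 4
        bb.putInt(12, 8);              // soffset: vtable at 12 - 8 = 4
        bb.putShort(16, hp);
        return bb;
    }
}
```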

d) The C++ implementation already has ForceDefaults(), but this was never ported to Java. This should be easy to fix.

e) I originally planned for HasField, but it is kind of meaningless in the context of the defaults optimisation. It could be added, but probably only as an option.

f) see B. I guess if you're ok in this only succeeding when the field is present, this could be added. This is a larger change since we'd have to make a LOT of code non-const in C++.
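What (B) and (f) allow amounts to overwriting a scalar that was actually written. That covers cases such as re-sending a message with only a timestamp changed, as long as that field was stored. Below is a minimal sketch of such a setter, which refuses absent fields. `InPlaceMutation`, its method names, and the two-field layout are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical in-place scalar mutation that only succeeds for fields present in the buffer. */
final class InPlaceMutation {
    private InPlaceMutation() {}

    static int fieldOffset(ByteBuffer bb, int table, int slot) {
        int vtable = table - bb.getInt(table);
        int entry = 4 + 2 * slot;
        return entry < (bb.getShort(vtable) & 0xFFFF) ? bb.getShort(vtable + entry) & 0xFFFF : 0;
    }

    static short getShort(ByteBuffer bb, int table, int slot, short defaultValue) {
        int o = fieldOffset(bb, table, slot);
        return o != 0 ? bb.getShort(table + o) : defaultValue;
    }

    /** Overwrites a stored short; returns false, writing nothing, if the field was not stored. */
    static boolean mutateShort(ByteBuffer bb, int table, int slot, short value) {
        int o = fieldOffset(bb, table, slot);
        if (o == 0) {
            return false;   // no storage exists; adding it would mean rebuilding the buffer
        }
        bb.putShort(table + o, value);
        return true;
    }

    /** Table at 12 with slot 0 stored (value 1) and slot 1 absent. */
    static ByteBuffer sample() {
        ByteBuffer bb = ByteBuffer.allocate(20).order(ByteOrder.LITTLE_ENDIAN);
        bb.putInt(0, 12);
        bb.putShort(4, (short) 8);     // vtable size: header + two slots
        bb.putShort(6, (short) 8);     // table size
        bb.putShort(8, (short) 4);     // slot 0 at table + 4
        bb.putShort(10, (short) 0);    // slot 1 absent
        bb.putInt(12, 8);              // soffset: vtable at 4
        bb.putShort(16, (short) 1);
        return bb;
    }
}
```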

g) Hah. I don't think adding "get" to all fields would be helpful in an object that is pretty much entirely accessors. And while the implementation code follows Google standards, for the generated code I try to be a little bit more agnostic, as it is used by people that could be using all sorts of standards.

And... underscores? We convert them to camelCase in Java. Where do you see them?

--
You received this message because you are subscribed to the Google Groups "FlatBuffers" group.

Wouter van Oortmerssen

Jan 12, 2015, 7:14:18 PM

to Florian Enner, flatb...@googlegroups.com

(missed one):

h): Agreed, this can be fixed.

Florian Enner

Jan 12, 2015, 10:56:42 PM

to flatb...@googlegroups.com, floria...@gmail.com

Thank you for your elaborate response. We've done some tests and it seems that we can get C++11 running. FlatBuffers looks like the best candidate so far.

A) Would the current C++ Api break if the table payload size were not set?

B) Most of the time we need to re-send the same message with only a single value changed (e.g. a timestamp). It'd be a waste to create a new message every time. Pointers could be initialized (e.g. initName()) to NULL (=0) and set later. It really depends on the use case.

a) Yes, that would work. I can't believe I forgot about the simplest option.

c) I've had some fun over the weekend working on a proof of concept for an alternative (deterministic and mutable) Java Api. While trying to implement the pointer-style syntax, I came to the same conclusion: it's just too inconvenient and error-prone. Instead I've tried to go for a protobuf-like syntax, which so far seems to work quite well. I have some ideas on how to share vtables, but they would require some deliberate action by users. In case you want to take a look at the current status:

Github: https://github.com/ennerf/flatbuffers-java-poc/

Sample: https://github.com/ennerf/flatbuffers-java-poc/blob/master/src/test/java/org/enner/flatbuffers/validation/PocTestData.java

b) I guess keeping uint16 offsets is fine as long as users have a way to initialize pointers ahead of time. I think I like the table-length field after all. I could use it to share vtables, e.g., once a length is set, a vector table becomes immutable/shareable.

g) Don't take this critique the wrong way, but I believe that an Api should match the language it's written in. Java has a large set of guidelines that practically everyone follows, which causes deviations to look very weird. In C++ this is probably less of an issue because there is no consistent standard. The Python-style underscore prefixes are all over the generated files and the internal Api.

Overall I really like the protocol. The only really annoying issue is that reference pointers are uint32, even though the protocol only supports a 2 GB limit. I think that changing this is probably even worth breaking backwards compatibility. You could add a "--with-deprecated-uint32-pointers" flag into the generator to mitigate complaints.

- Florian

Wouter van Oortmerssen

Jan 16, 2015, 5:12:38 PM

to Florian Enner, flatb...@googlegroups.com

Florian,

> Thank you for your elaborate response. We've done some tests and it seems that we can get C++11 running. FlatBuffers looks like the best candidate so far.

Cool!

> A) Would the current C++ Api break if the table payload size were not set?

Yes. We support both forwards and backwards compatibility, so old code must be able to read new binary files.

Hence, a #define is the best we can do. It would make you binary incompatible with code that does not set that flag, and with data produced before that flag was set, so it will have to be used with care.

I will look into adding this #define

> B) Most of the time we need to re-send the same message with only a single value changed (e.g. a timestamp). It'd be a waste to create a new message every time. Pointers could be initialized (e.g. initName()) to NULL (=0) and set later. It really depends on the use case.

You can't really set a pointer later, unless we add functionality to add this to the buffer later, which is non-trivial.

Nulls are not actually stored in the buffer, since it means the field is simply absent. ForceDefaults() does not change this, it only works on scalars.

Mutable scalars is something I was considering, but like I said it is a large refactor.

> a) yes, that would work. I can't believe I forgot about the simplest option

I'll add this.

> c) I've had some fun on the weekend working on a proof of concept for an alternative (deterministic and mutable) Java Api. While trying to implement the pointer-style syntax, I came to the same conclusion in that it's just too inconvenient and error prone. Instead I've tried to go for a protobuf-like syntax, which so far seems to work quite well. I have some ideas on how to share vtables, but it would require some deliberate action by users. In case you want to take a look at the current status:
>
> Github: https://github.com/ennerf/flatbuffers-java-poc/
> Sample: https://github.com/ennerf/flatbuffers-java-poc/blob/master/src/test/java/org/enner/flatbuffers/validation/PocTestData.java

Wow, you really went to town!

But yes, you are writing the vtable for every object, which is allowed in the format, but not something we want to do.

Interestingly, my very first design for FlatBuffers had a vtable for each object, but used bytes for offsets, and did not have the vtable pointer (since they were always at the start of the object, saving a further 4 bytes). This was nice and simple, but ultimately too limiting for object sizes.

You've also abstracted some of the FlatBuffers internals further. While making for nicer code, this scares me a bit in Java, since these things have a cost. In particular, I am not understanding the need for Addressable, which seems expensive (with the switch etc).

> b) I guess keeping uint16 offsets is fine as long as users have a way to initialize pointers ahead of time. I think I like the table-length field after all. I could use it to share vtables, e.g, once a length is set, a vector table becomes immutable/shareable

Note that by writing the vtable inside of the object, you are pretty much guaranteeing it can't be shared, since another object you'd want to share it with wouldn't have it, and thus have all different offsets.

> g) Don't take this critique the wrong way, but I believe that an Api should match the language it's written in. Java has a large set of guidelines that practically everyone follows, which causes deviations to look very weird. In C++ this is probably less of an issue because there is no consistent standard. The Python-style underscore prefixes are all over the generated files and the internal Api.

Where are they in the generated files?

The internal ones (e.g. __offset) are named that way precisely so that it's obvious they're not user-facing (I'd make them private if I could), i.e. a Java programmer should never use them.

> Overall I really like the protocol. The only really annoying issue is that reference pointers are uint32, even though the protocol only supports a 2 GB limit. I think that changing this is probably even worth breaking backwards compatibility. You could add a "--with-deprecated-uint32-pointers" flag into the generator to mitigate complaints.

Not sure what the problem here is. The 2 GB limitation exists because vtables can be either before or after an object. The reference pointers are unsigned because I want to document that they HAVE to point forward only, by design. What would you have done differently?

Wouter

Florian Enner

Jan 18, 2015, 10:38:56 PM

to flatb...@googlegroups.com, floria...@gmail.com

c) I initially started with purely static functions. Since that API didn't work very well, I've tried to create a purely object-oriented and type-safe API. The Addressable "switch" is there because I found it weird to mix objects and untyped integer offsets in one API. As far as I know, inheritance is relatively cheap in Java. If that ever becomes a bottleneck, users will probably switch to a custom implementation using "Unsafe" (no bounds checks) anyway.

Another reason for it was that I did like the idea of treating vectors as separate data structures instead of embedding them into generated accessors. I could imagine that in the future there may be a need for more dynamic collections, such as a linked list. I was also planning on making String a separate class that implements "CharSequence". That being said, Addressable may not be the best way to go about it.

b) You are correct. The order could easily be changed. Vtables could be shared either through deliberate reuse at table creation (e.g. "monster.createWithExistingVectorTable(existing)"), or through the use of temporary memory similar to the official implementation.

uint32) I thought that I've read other threads where people were asking about signed32 pointers, but I may be remembering that wrong. My problem with them is that they are tough to deal with when creating custom implementations. E.g. assume that I want to build the buffer in ascending order and have a list of monsters with the same group-name. The workflow would have to look something like this:

1) reserve space for vector

2) iteratively create monsters

2a) set hp/mana etc.

2b) set group-name pointer to NULL to reserve space

2c) create next monster

3) create group-name string

4) reiterate through vector

4a) set group-name to string

If the pointer were allowed to be negative, the group-name string could be created before the monsters. Note that only the uint32 pointers are problematic and that the uint16 vtable offsets are easy to deal with. The reason why I'd want to use an ascending API is purely for readability. An alternative would be to create a stateful-write-object, but that would defeat the purpose of zero encoding.
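Steps 2b to 4a amount to backpatching. Each monster reserves a zero uoffset slot. Once the string exists at a higher address, every slot is rewritten with the positive distance to it. The simplified sketch below shows only the offset mechanics. `AscendingBackpatch` and its fixed 8-byte record (short hp, 2 bytes padding, uoffset to the name) are hypothetical, and vtables and step 1 (the vector) are left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;

/** Hypothetical ascending writer: monsters first, shared name string after, then backpatch. */
final class AscendingBackpatch {
    static final int MONSTER_SIZE = 8;   // short hp, 2 bytes padding, uoffset to name

    static ByteBuffer build(short[] hps, String groupName) {
        byte[] name = groupName.getBytes(Charset.forName("UTF-8"));
        int stringPos = hps.length * MONSTER_SIZE;            // string goes after all monsters
        ByteBuffer bb = ByteBuffer.allocate(stringPos + 4 + name.length + 1)
                .order(ByteOrder.LITTLE_ENDIAN);
        // Steps 2-2c: write monsters front to back, leaving the name slot as a 0 placeholder.
        for (int i = 0; i < hps.length; i++) {
            bb.putShort(i * MONSTER_SIZE, hps[i]);
            bb.putInt(i * MONSTER_SIZE + 4, 0);
        }
        // Step 3: write the shared string (uint32 length, bytes; NUL comes from allocate's zeroing).
        bb.putInt(stringPos, name.length);
        for (int k = 0; k < name.length; k++) {
            bb.put(stringPos + 4 + k, name[k]);
        }
        // Steps 4-4a: revisit every monster and patch its slot with the forward distance.
        for (int i = 0; i < hps.length; i++) {
            int slot = i * MONSTER_SIZE + 4;
            bb.putInt(slot, stringPos - slot);
        }
        return bb;
    }

    static String groupName(ByteBuffer bb, int monsterIndex) {
        int slot = monsterIndex * MONSTER_SIZE + 4;
        int str = slot + bb.getInt(slot);
        byte[] out = new byte[bb.getInt(str)];
        for (int k = 0; k < out.length; k++) {
            out[k] = bb.get(str + 4 + k);
        }
        return new String(out, Charset.forName("UTF-8"));
    }
}
```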

Why did you make this design decision? Was it to mitigate potential infinite loops / attacks?

Florian

Wouter van Oortmerssen

Jan 21, 2015, 3:11:49 PM

to Florian Enner, flatb...@googlegroups.com

Florian,

On Sun, Jan 18, 2015 at 7:38 PM, Florian Enner <floria...@gmail.com> wrote:

> c) I initially started with purely static functions. Since that API didn't work very well, I've tried to create a purely object oriented and type safe API. The Addressable "switch" is there because I found it weird to mix objects and untyped integer offsets in one API. As far as I know inheritance is relatively cheap in Java. If that ever becomes a bottleneck, users will probably switch to a custom implementation using "Unsafe" (no boundary checks) anyways.

Inheritance is probably "cheap" in Java because you're paying for its cost regardless of whether you actually use it or not, unlike C++.

What I am more worried about in Java is the GC cost of allocating additional objects.

> Another reason for it was that I did like the idea of treating vectors as separate data structures instead of embedding them into generated accessors. I could imagine that in the future there may be a need for more dynamic collections, such as a linked list. I was also planning on making String a separate class that implements "CharSequence". That being said, Addressable may not be the best way to go about it.
>
> b) You are correct. The order could easily be changed. Vtables could be shared through either deliberate reuse at table creation (e.g. "monster.createWithExistingVectorTable(existing)"), or through the use of temporary memory similar as in the official implementation.

The problem with putting the user in charge of vtable reuse is that it is very hard to know which two tables can share a vtable. For example, leaving out a field or writing fields in a different order causes vtables to differ within one table type. On the flip side, vtables are sometimes reused across table types because their layout happens to be the same.

That, and it would complicate the API, with more work for the user. I don't think that's a good idea.

> uint32) I thought that I've read other threads where people were asking about signed32 pointers, but I may be remembering that wrong. My problem with them is that they are tough to deal with when creating custom implementations. E.g. assume that I want to build the buffer in ascending order and have a list of monsters with the same group-name. The workflow would have to look something like this:
>
> 1) reserve space for vector
> 2) iteratively create monsters
> 2a) set hp/mana etc.
> 2b) set group-name pointer to NULL to reserve space
> 2c) create next monster
> 3) create group-name string
> 4) reiterate through vector
> 4a) set group-name to string

Do note that currently buffers are built backwards (starting at the high memory address), so sharing a string in multiple objects is actually very easy, you first write the string, and then objects refer to it directly.
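For contrast, here is a toy back-to-front builder in the spirit of this description. Data is written from the end of the array toward the start, so the shared string is written first. Every later object can then store a positive, forward-pointing offset to it immediately, with no backpatching. `BackwardBuilder` and its simplified record layout are hypothetical and much simpler than FlatBufferBuilder.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;

/** Hypothetical back-to-front writer; no overflow checks, so capacity must be large enough. */
final class BackwardBuilder {
    private final ByteBuffer bb;
    private int head;   // index of the most recently written byte; writing moves it toward 0

    BackwardBuilder(int capacity) {
        bb = ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
        head = capacity;
    }

    /** Writes a length-prefixed, NUL-terminated string and returns its position. */
    int createString(String s) {
        byte[] bytes = s.getBytes(Charset.forName("UTF-8"));
        // Round down so the length prefix is 4-byte aligned; padding ends up after the NUL.
        head = (head - (bytes.length + 1)) & ~3;
        for (int k = 0; k < bytes.length; k++) {
            bb.put(head + k, bytes[k]);   // NUL terminator is already 0 from allocate()
        }
        head -= 4;
        bb.putInt(head, bytes.length);
        return head;
    }

    /** Writes a record {short hp; 2 bytes padding; uoffset name} and returns its position. */
    int createMonster(short hp, int namePos) {
        head -= 8;
        bb.putShort(head, hp);
        bb.putInt(head + 4, namePos - (head + 4));   // positive: the name sits at a higher index
        return head;
    }

    ByteBuffer buffer() {
        return bb;
    }

    static String readName(ByteBuffer bb, int monsterPos) {
        int slot = monsterPos + 4;
        int str = slot + bb.getInt(slot);
        byte[] out = new byte[bb.getInt(str)];
        for (int k = 0; k < out.length; k++) {
            out[k] = bb.get(str + 4 + k);
        }
        return new String(out, Charset.forName("UTF-8"));
    }
}
```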

> If the pointer were allowed to be negative, the group-name string could be created before the monsters. Note that only the uint32 pointers are problematic and that the uint16 vtable offsets are easy to deal with. The reason why I'd want to use an ascending API is purely for readability. An alternative would be to create a stateful-write-object, but that would defeat the purpose of zero encoding.
>
> Why did you make this design decision? Was it to mitigate potential infinite loops / attacks ?

Yes, it was deemed that cycles in serialized data were not a frequent use case, and eliminating them would make the format and code more robust.
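The robustness property described here can be checked cheaply while reading. Interpret each uoffset as unsigned and require it to be non-zero and in bounds; a traversal can then only move forward. Below is a minimal sketch. `ForwardOnly` is hypothetical and checks far less than a real verifier would. Signed vtable offsets (soffsets), which may point either way, are deliberately not handled here.

```java
import java.nio.ByteBuffer;

/** Hypothetical check that a uoffset points strictly forward and stays inside the buffer. */
final class ForwardOnly {
    private ForwardOnly() {}

    /**
     * Resolves the uoffset stored at pos. Because the offset is unsigned and must be non-zero,
     * every successful hop moves to a strictly higher position, so a reader that only follows
     * uoffsets cannot loop forever: it runs out of buffer after at most limit() hops.
     */
    static int follow(ByteBuffer bb, int pos) {
        long offset = bb.getInt(pos) & 0xFFFFFFFFL;   // interpret the 32 bits as unsigned
        long target = pos + offset;
        if (offset == 0 || target + 4 > bb.limit()) {
            throw new IllegalArgumentException("offset " + offset + " at " + pos + " is invalid");
        }
        return (int) target;
    }
}
```

A value that would mean "-8" as a signed int reads as roughly 4 billion when unsigned. It is therefore rejected as out of bounds rather than followed backwards.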