Implementation details and focus


Maxim Zaks

Dec 13, 2015, 7:34:27 AM
to FlatBuffers
Hi guys,

First of all, I want to say that FlatBuffers is a great concept, and I am really thankful that you are making it happen as an open source project.

I guess now you are expecting a BUT part and here it comes :)

It seems to me that there was some degree of historical evolution which resulted in some strange implementation details.

I would like to list them:
  1. JSON support. flatc enables us to translate binary into JSON and JSON into binary. However, even though binaries are backwards and forwards compatible, JSONs are not. As far as I can see, this is only due to the fact that flatc parses JSON and schemas through the same code. This is an implementation detail which cripples JSON support.
  2. Memory footprint vs. efficient access. The beauty of FlatBuffers is that it can support both. We can lay out data in the most thrifty way by reusing definitions. This is already done with the reuse of vTables; I was, however, a bit puzzled why it is not done for strings. This implementation detail also implies that you can't have really lazy data access, where you can stream the data without the need of filling up the whole buffer first: it could happen that I reuse a vTable which is somewhere at the end of the buffer.
  3. The base classes which are used by the generated code are very C++ centric. This somewhat correlates with the 2nd point: they favour efficient access over ease of use. One questionable feature is the min alignment and padding, which I guess has a much bigger impact in C++ than in other languages. However, to be honest, I am not sure how big an impact it makes. I would really appreciate it if you could elaborate on this implementation detail. I picked on it because it makes serialisation into binary much more complex and also affects the memory footprint of the binary.
I am sorry for being so critical, but I truly believe that FlatBuffers is a great project, and that it could be very beneficial if we could make the following adjustments to address the problems listed above.
  1. Decouple the JSON and schema parsers from each other.
  2. Provide a way to choose between the smallest memory footprint and efficient access (maybe even streaming access).
  3. Make it easy to write your own code generators for different languages. I guess it was on the table at some point anyway (looking at attribute_decl = attribute string_constant ;).
I started (https://github.com/mzaks/FlatBuffersSchemaEditor) to address the third point. I will have a bit more time after the 20th of December; I hope to make some progress there.

Cheers,

Maxim

mikkelfj

Dec 14, 2015, 2:28:58 PM
to FlatBuffers


  1. JSON support. flatc enables us to translate binary into JSON and JSON into binary. However, even though binaries are backwards and forwards compatible, JSONs are not. As far as I can see, this is only due to the fact that flatc parses JSON and schemas through the same code. This is an implementation detail which cripples JSON support.
I don't see any inherent reason why JSON isn't forward or backward compatible - the current parser may require all fields to be present, which is sometimes preferable. It was discussed in another thread, and it seems Wouter is open to PRs in this area.

One thing JSON can never handle with a newer schema is extended enums: if an older parser receives a more recent enum symbolic name it doesn't know about, it is forced to reject the input. Unions can be handled by ignoring unknown types.
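The asymmetry could be sketched like this (a Python sketch with an illustrative enum table - not any real FlatBuffers API):

```python
# Sketch: why symbolic enum names in JSON break forward compatibility,
# while raw numeric values (as in the binary format) do not.
# The enum table below is illustrative, not from any real schema.

OLD_READER_ENUM = {"Red": 0, "Green": 1}  # reader compiled before "Blue" was added

def parse_enum_json(symbol):
    """An old parser given a newer symbolic name has no choice but to reject it."""
    if symbol not in OLD_READER_ENUM:
        raise ValueError(f"unknown enum symbol: {symbol}")
    return OLD_READER_ENUM[symbol]

def parse_enum_binary(value):
    """The binary format stores the integer, so an unknown value still round-trips."""
    return value  # the reader can pass it through or treat it as a default

# The old reader handles a known name...
assert parse_enum_json("Green") == 1
# ...but must reject "Blue" (value 2) from a newer writer,
# whereas the binary path keeps the raw value intact.
assert parse_enum_binary(2) == 2
```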

Personally, I would prefer to have JSON completely separated from the schema text format, and my C parser does not support JSON. I do plan on releasing a generated JSON parser at some point.

JSON could also be parsed by a library taking a binary schema and a JSON text as input, but currently the binary schema does not have full scope support, which puts some constraints on such a parser - mostly concerning enum symbols.
  2. Memory footprint vs. efficient access. The beauty of FlatBuffers is that it can support both. We can lay out data in the most thrifty way by reusing definitions. This is already done with the reuse of vTables; I was, however, a bit puzzled why it is not done for strings. This implementation detail also implies that you can't have really lazy data access, where you can stream the data without the need of filling up the whole buffer first: it could happen that I reuse a vTable which is somewhere at the end of the buffer.
I wondered as well - but after I implemented the C flatbuffer builder I learned to appreciate this: for one thing, you sometimes do not want a DAG - you can use object identity, although you shouldn't really rely on it for portability. More importantly, the C builder never actually builds a complete flatbuffer - it streams chunks of buffers that can be sent over the network, depending on the emitter object attached. This makes it possible to stream flatbuffers. If compression were enabled, the builder would be slower, require much more memory, and require a full buffer representation or something equivalent.

With the design I have in the C builder, an emitter object can be attached which acts as a compressor. Feel free to implement such a compressor for the C backend; alternatively, a PR for the C++ builder with an option to compress the buffer would probably also be of value, although I cannot speak for Wouter.
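The emitter idea might be sketched like this (Python; class and method names are illustrative stand-ins, not the flatcc API):

```python
# Sketch of the emitter design described above: the builder hands finished
# chunks to a pluggable emitter instead of accumulating one big buffer,
# and a "compressor" emitter can share identical chunks.

class CollectingEmitter:
    """Baseline emitter: just concatenates chunks (equivalent to a full buffer)."""
    def __init__(self):
        self.chunks = []
    def emit(self, chunk: bytes):
        self.chunks.append(chunk)
    def result(self) -> bytes:
        return b"".join(self.chunks)

class DedupEmitter(CollectingEmitter):
    """A 'compressor' emitter: skips chunks identical to ones already emitted."""
    def __init__(self):
        super().__init__()
        self.seen = set()
    def emit(self, chunk: bytes):
        if chunk in self.seen:
            return  # the earlier copy would be shared instead of emitted again
        self.seen.add(chunk)
        super().emit(chunk)

def build(emitter, records):
    """Stand-in for a builder driving whichever emitter is attached."""
    for r in records:
        emitter.emit(r)
    return emitter.result()

plain = build(CollectingEmitter(), [b"abc", b"abc", b"xyz"])
dedup = build(DedupEmitter(), [b"abc", b"abc", b"xyz"])
assert len(dedup) < len(plain)  # sharing shrinks the output
```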

On a related note: I would actually have preferred if the vtables were not shared across different table types. I don't recall the use case exactly - but it makes hacks to mutate buffers more difficult, or something like that.

(Sorry about numbering, it seems my web editor is too clever about enumeration)
  3. The base classes which are used by the generated code are very C++ centric. This somewhat correlates with the 2nd point: they favour efficient access over ease of use.
I don't really get this point. The C++ generated code is C++ centric?

  One questionable feature is the min alignment and padding, which I guess has a much bigger impact in C++ than in other languages. However, to be honest, I am not sure how big an impact it makes. I would really appreciate it if you could elaborate on this implementation detail. I picked on it because it makes serialisation into binary much more complex and also affects the memory footprint of the binary.
It is a real pain, but it is absolutely necessary if you want a portable, high-performance binary format. The C builder handles all of this, so hopefully some languages can reuse it and focus on higher-level abstractions - but it's hard to tell; most languages want to use their own language for everything.
 
I am sorry for being so critical, but I truly believe that FlatBuffers is a great project, and that it could be very beneficial if we could make the following adjustments to address the problems listed above.
  1. Decouple the JSON and schema parsers from each other.
Yes that I agree with. 
  2. Provide a way to choose between the smallest memory footprint and efficient access (maybe even streaming access).
That is possible in the C builder via the emitter object - although only one emitter is available currently.
  3. Make it easy to write your own code generators for different languages. I guess it was on the table at some point anyway (looking at attribute_decl = attribute string_constant ;).
I started with (https://github.com/mzaks/FlatBuffersSchemaEditor) to address the third point. I will have a bit more time after 20th of December. I hope to make some progress there.

Great. Also keep in mind the considerations I have about a future extended binary schema that includes more scope information - more suited to code generation than runtime reflection. And probably also a schema editor. I probably won't get back to that for a while - but I think it is needed in some form.

 

mikkelfj

Dec 14, 2015, 2:53:33 PM
to FlatBuffers

> On a related note: I would actually have preferred if the vtables were not shared across different table types. I don't recall the use case exactly -
> but it makes hacks to mutate buffers more difficult, or something like that.

Now I recall: I want to tag all entries in the vtable which are offsets - that is, tables, strings and vectors. Each vtable entry is 16 bits wide, and the high bit could be used for that. Incompatible readers would see this as a future field, because the entry is out of range, and just use defaults. This isn't possible when the vtable is shared across types.

Such a tagged vtable would enable many operations not otherwise possible without deep type information. For example, a flatbuffer compressor would only need to know the basic type of an object (table, string, vector). It can then clone objects and merge identical objects in the process.

As it is, such a backend would need more type information.
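The tagging scheme could look roughly like this (Python sketch; the bit layout follows the proposal above and is not part of the actual format):

```python
# Sketch of the vtable tagging idea: each vtable entry is a 16-bit offset,
# and the high bit could mark entries that refer to sub-objects
# (tables, strings, vectors). Illustrative only, not the real format.

OFFSET_TAG = 0x8000          # high bit of a 16-bit vtable entry
MAX_PLAIN_OFFSET = 0x7FFF    # field offsets must now fit in 15 bits

def tag_as_offset(voffset: int) -> int:
    """Mark a field's vtable entry as holding an offset to a sub-object."""
    assert 0 <= voffset <= MAX_PLAIN_OFFSET
    return voffset | OFFSET_TAG

def is_offset_field(entry: int) -> bool:
    """A compressor could test this without any deep type information."""
    return bool(entry & OFFSET_TAG)

def field_offset(entry: int) -> int:
    """Strip the tag to recover the in-table byte offset."""
    return entry & MAX_PLAIN_OFFSET

entry = tag_as_offset(12)       # field at byte offset 12 holds an offset
assert is_offset_field(entry)
assert field_offset(entry) == 12
assert not is_offset_field(12)  # untagged scalar field
```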

Wouter van Oortmerssen

Dec 14, 2015, 3:46:14 PM
to Maxim Zaks, FlatBuffers
Maxim,

On Sun, Dec 13, 2015 at 4:34 AM, Maxim Zaks <maxim...@googlemail.com> wrote:
Hi guys,

first of all, I want to tell that FlatBuffers is a great concept and I am really thankful you making it happen as open source project.

I guess now you are expecting a BUT part and here it comes :)

It seems to me that there was some degree of historical evolution which resulted in some strange implementation details.

I would like to list them 
  1. JSON support. flatc enables us to translate binary into JSON and JSON into binary. However, even though binaries are backwards and forwards compatible, JSONs are not. As far as I can see, this is only due to the fact that flatc parses JSON and schemas through the same code. This is an implementation detail which cripples JSON support.
I'm not sure what about JSON is not forwards/backwards compatible. One issue is that, by default, the FlatBuffers parser does not accept unknown fields. This is for good reason: the JSON input was originally conceived as a friendly way to get data into FlatBuffers, not necessarily to absorb arbitrary JSON.

As such, it is important that you get error feedback on misspelled fields, rather than difficult-to-diagnose errors at run-time due to missing data.

That said, people have asked for an option to ignore unknown fields before, and I think it should be added.
  2. Memory footprint vs. efficient access. The beauty of FlatBuffers is that it can support both. We can lay out data in the most thrifty way by reusing definitions. This is already done with the reuse of vTables; I was, however, a bit puzzled why it is not done for strings. This implementation detail also implies that you can't have really lazy data access, where you can stream the data without the need of filling up the whole buffer first: it could happen that I reuse a vTable which is somewhere at the end of the buffer.
Reuse of strings (and vectors, and tables) is at the discretion of the user, and can easily be implemented by the user (FlatBuffers are allowed to be a DAG, so if you want to reuse the result of CreateString twice, no-one is stopping you).

There are consequences to reusing objects, related to identity and mutation, so this should always be under user control.

That said, specifically for strings, which are a common use case, we could easily add a CreateSharedString() function that does the heavy lifting for you automatically.
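Such a CreateSharedString() could conceptually work like this (a Python sketch with made-up builder internals, not the real C++ API):

```python
# Sketch of string pooling: each distinct string is written once, and later
# calls return the earlier offset. Builder internals here are illustrative.

class Builder:
    def __init__(self):
        self.buf = bytearray()
        self.string_pool = {}  # string -> offset of its first copy
    def create_string(self, s: str) -> int:
        """Always writes a fresh copy and returns its offset."""
        off = len(self.buf)
        self.buf += s.encode("utf-8") + b"\x00"
        return off
    def create_shared_string(self, s: str) -> int:
        """Reuses a previously written identical string."""
        if s not in self.string_pool:
            self.string_pool[s] = self.create_string(s)
        return self.string_pool[s]

b = Builder()
a1 = b.create_shared_string("orc")
a2 = b.create_shared_string("orc")
assert a1 == a2                       # same offset: the string is stored once
assert b.create_string("orc") != a1   # plain create still duplicates
```

This is exactly the kind of reuse that turns the buffer into a DAG, so (as noted above) it should stay under user control.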
  3. The base classes which are used by the generated code are very C++ centric. This somewhat correlates with the 2nd point: they favour efficient access over ease of use. One questionable feature is the min alignment and padding, which I guess has a much bigger impact in C++ than in other languages. However, to be honest, I am not sure how big an impact it makes. I would really appreciate it if you could elaborate on this implementation detail. I picked on it because it makes serialisation into binary much more complex and also affects the memory footprint of the binary.
C++ centric? Which language are you using?

FlatBuffers was entirely designed to give maximum performance through in-place memory layouts, something which is indeed very foreign to most languages other than C++. We can't change those languages, but I believe that optimising how a CPU accesses memory benefits any language, though admittedly some of that is lost when not using C++.

Whenever in FlatBuffers there is a mutually exclusive choice between fast and convenient, we go for the former. We just hope that in most cases it is not mutually exclusive.

Games, codecs, scientific computing - lots of fields use data that needs specific alignment to be fast (e.g. for SIMD). Having to copy this data out of FlatBuffers to be maximally efficient would defeat the point of FlatBuffers.
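The alignment rule itself is simple to sketch (Python; illustrative, not the actual builder code):

```python
# Sketch of the alignment/padding rule: before writing a value of a given
# alignment, the builder inserts padding so the value's offset is a
# multiple of that alignment (e.g. 16 bytes for SIMD-friendly data).

def padding_needed(offset: int, align: int) -> int:
    return (align - offset % align) % align

def write_aligned(buf: bytearray, data: bytes, align: int) -> int:
    """Pad, then write; returns the offset where the data landed."""
    buf += b"\x00" * padding_needed(len(buf), align)
    off = len(buf)
    buf += data
    return off

buf = bytearray(b"\x01\x02\x03")        # 3 bytes already written
off = write_aligned(buf, b"\x00" * 16, 16)
assert off % 16 == 0                    # value lands on a 16-byte boundary
assert padding_needed(3, 16) == 13      # 13 padding bytes were inserted
```

The padding bytes are the memory-footprint cost Maxim mentions; the aligned offset is what makes in-place SIMD access possible.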
 
I am sorry for being so critical, but I truly believe that FlatBuffers is a great project and that it could be very beneficial to the project if we could do following adjustments to address the problems that I listed before.
  1. Decouple the JSON and schema parsers from each other.
What problem does this solve? I believe any compatibility issues can be solved with the existing parser.

It is also a very compact and very fast parser that is useful for those who want to read JSON at run-time, but at low cost. It parses straight into a FlatBuffer with very little intermediate memory usage. I've looked around at existing JSON parsers, and they're all orders of magnitude less efficient (particularly in memory usage).

If people wish to create a JSON -> FlatBuffers converter using a different parser of their choosing, this would not be hard to do.
  2. Provide a way to choose between the smallest memory footprint and efficient access (maybe even streaming access).
Beyond pooling strings, what are you suggesting here? 
  3. Make it easy to write your own code generators for different languages. I guess it was on the table at some point anyway (looking at attribute_decl = attribute string_constant ;).
Generally, a language implementation is a complicated thing. Because FlatBuffers is such a thin layer over raw memory access, there's relatively a lot to implement, and I am not sure to what extent that can be simplified beyond the current generators, since most of it is so language-specific.

I started with (https://github.com/mzaks/FlatBuffersSchemaEditor) to address the third point. I will have a bit more time after 20th of December. I hope to make some progress there.

Definitely sounds cool :)

Wouter
 
Message has been deleted

Maxim Zaks

Dec 20, 2015, 6:00:22 AM
to FlatBuffers
Hi guys, thanks for the replies and sorry for my slow response :)

I want to address everything one by one, because otherwise it becomes super hard to read.

Let's start with JSON and binary compatibility.

I am currently working on a strategy game where we use FlatBuffers for all client-backend communication; the client is a game running on a mobile device, implemented with Unity3D.

The communication works as follows: the backend sends the client a config file and the player's game state, and the client sends the backend commands which reflect changes in the game state.

Now, the config is initially done in Google Spreadsheets (the preferred tool of game designers) and then transformed into JSON. The backend then translates the JSON into a FlatBuffers binary and sends it to the client. If the handling of JSON were equal to the binary, we could make additive changes to the config without needing to redeploy the backend with the new schema.

Now here is the tricky part, best described by the game state example. Game states are also saved as JSON in Postgres, mainly so that we can query them.
Here it is also desirable that the game state of a new client can be read by an (old) backend which serves old clients. But there is another detail which makes JSON behave differently from the binary representation: when you rename a property in the schema, the binary does not need any migration, as property names are not stored there. This is, however, not the case in the current JSON representation.

A schema like:

table Monster {
  mana:short = 150;
  hp:short = 100;
  name:string;
}

Would be represented as:

{
  mana: 100,
  hp: 50,
  name: "Max"
}

But to be resilient against property name changes, as the binary is, it should be something like:

[100, 50, "max"]


This is a direct equivalent of the binary representation, even though it is less human-readable and makes the JSON much harder to query.
Honestly, I am uncertain myself whether it is a good idea to have such cryptic JSON, but it would be the most resilient for migrations.
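The mapping between named and positional JSON could be sketched like this (Python; field order taken from the Monster example above):

```python
# Sketch of the positional JSON idea: use the schema's field order to map
# between named objects and bare arrays, so renaming a field in the schema
# does not invalidate stored data.

MONSTER_FIELDS = ["mana", "hp", "name"]  # order as declared in the schema

def to_positional(obj: dict) -> list:
    """Named object -> bare array, one slot per schema field."""
    return [obj.get(f) for f in MONSTER_FIELDS]

def from_positional(arr: list) -> dict:
    """Bare array -> named object, using the current schema's names."""
    return dict(zip(MONSTER_FIELDS, arr))

assert to_positional({"mana": 100, "hp": 50, "name": "Max"}) == [100, 50, "Max"]
assert from_positional([100, 50, "Max"])["hp"] == 50
# Note the trade-off: every field now needs a slot, so defaults can no
# longer simply be omitted the way the binary vtable allows.
```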

What do you think?

Maxim Zaks

Dec 20, 2015, 9:24:26 AM
to FlatBuffers
Now to the representation of data.

The FlatBufferBuilder as implemented in the main FlatBuffers repo (I guess this applies to all languages) currently builds a DAG which reuses most of the nodes, except for strings.
As Wouter mentioned, a CreateSharedString method would then let the user reuse strings as well, resulting in the most efficient memory footprint.
If you don't know the internals of FlatBuffers, you might think of it as a compressed version.
However, as mikkelfj mentioned, a "compressed" DAG is inefficient to stream and almost impossible to mutate.
It is better to have an unfolded DAG, which then results in a tree, where you can read only certain branches and not even load other branches into memory.
I guess mikkelfj implemented it in his C port. (I didn't look at the implementation, sorry.)

In my opinion these are two strategies which are both totally valid, and it would be nice to be able to select which strategy is important to you.

Let's take again my use case, the city-builder strategy game I am working on.
The player's game state is something that we have to read completely and turn into runtime objects, because most of it describes your city and the current state of its economy.
The config, however, is much bigger and describes things that we don't even need: it contains information about all the building types and all their levels.
So let's say you have 50 different building types and they can all progress to level 10. That gives 500 different cases, but you are able to build only 100 buildings (after months of playing the game) in your city.

So, taking this into account, the player state should be a compressed DAG, because I want it to be as small as possible to download.
I will go through all of it anyway and unfold it into a runtime object tree.

The config, however, I want to read lazily, maybe even stream, and not download completely.
For this I would need the config to be represented as an unfolded tree and not as a DAG.

With this I wanted to showcase that it is possible to have both use cases in a single application, and it would be nice if the FlatBuffers implementation would embrace them in a user-friendly manner.



mikkelfj

Dec 20, 2015, 12:26:58 PM
to FlatBuffers


On Sunday, December 20, 2015 at 12:00:22 PM UTC+1, Maxim Zaks wrote: 
The communication works as follows: the backend sends the client a config file and the player's game state, and the client sends the backend commands which reflect changes in the game state.

Now, the config is initially done in Google Spreadsheets (the preferred tool of game designers) and then transformed into JSON. The backend then translates the JSON into a FlatBuffers binary and sends it to the client. If the handling of JSON were equal to the binary, we could make additive changes to the config without needing to redeploy the backend with the new schema.

Interesting!

If you hang tight a little longer, and if you can live with a JSON parser compiled from schema to C (it doesn't read binary bfbs and can only parse the compiled schema), I am very close to releasing one in flatcc, but it needs some more testing. Printing JSON is not necessarily supported in the first version.

This parser will handle schema evolution, or not, depending on compile-time flags, and very preliminary tests (it may be fast due to bugs) suggest 700 bytes of JSON parsed at 230MB/s, or 300K ops/sec (the monsterdata_test.golden test file, which is very light on float data - floats will likely slow down the parse).

 
Now here is the tricky part, best described by the game state example. Game states are also saved as JSON in Postgres, mainly so that we can query them.
... 
Would be represented as:

{
  mana: 100,
  hp: 50,
  name: "Max"
}

But to be resilient against property name changes, as the binary is, it should be something like:

[100, 50, "max"]

This would be possible for structs, but not for tables, because it would force you to include all fields. FlatBuffers is designed to support lots of default values that only materialize when you read the data. I don't really think it is a good idea for structs either, even though I initially had the same idea - it was mostly a performance concern, but I can now parse names so fast it really doesn't matter unless there are a lot of fields.

First of all, I suggest not renaming at all if possible; second, try to deprecate old fields and use new fields with new ids and new names instead; third, if you absolutely must rename a field, do the following:

Compile a JSON parser for each known schema version where there are breaking changes, i.e. renames. Now either send the data to the right parser, or implement a fallback parse on error. Make sure the schema does not change in such a way that a rename maps to a different old name, so that the fallback parse fails instead of misinterpreting the data.
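That fallback strategy might look like this (Python sketch; the per-version "parsers" are stand-ins for compiled schema-specific parsers, and the rename is a made-up example):

```python
# Sketch of versioned fallback parsing: keep one parser per schema version
# with breaking renames, and try them newest-first.

def parse_v2(obj: dict) -> dict:
    """v2 schema: field was renamed hp -> hit_points (hypothetical rename)."""
    if "hit_points" not in obj:
        raise KeyError("hit_points")
    return {"hp": obj["hit_points"]}

def parse_v1(obj: dict) -> dict:
    """v1 schema: original field name."""
    if "hp" not in obj:
        raise KeyError("hp")
    return {"hp": obj["hp"]}

def parse_with_fallback(obj: dict) -> dict:
    for parser in (parse_v2, parse_v1):  # newest schema first
        try:
            return parser(obj)
        except KeyError:
            continue
    raise ValueError("no schema version matched")

assert parse_with_fallback({"hit_points": 50}) == {"hp": 50}  # new data
assert parse_with_fallback({"hp": 50}) == {"hp": 50}          # old data
```

Note the caveat from the text: this only works safely if an old name never collides with a different field's new name, otherwise the fallback silently misinterprets data instead of failing.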
 
Honestly I am uncertain myself if it is a good idea to have such cryptic JSON, but it would be the most resilient one for migrations.

What do you think?

Bad idea, also for practical reasons - someone is going to mess it up due to lack of transparency. The same can be said for renaming fields.


mikkelfj

Dec 20, 2015, 12:47:42 PM
to FlatBuffers


On Sunday, December 20, 2015 at 3:24:26 PM UTC+1, Maxim Zaks wrote:

It is better to have an unfolded DAG, which then results in a tree, where you can read only certain branches and not even load other branches into memory.
I guess mikkelfj implemented it in his C port. (I didn't look at the implementation, sorry.)

I was thinking a bit more about the problem - it is fairly easy to implement a new emitter object as a backend to the builder (by design). But I am not sure it will actually support compression without first building a complete buffer - I believe a minor interface change would allow the emitter object to return an old node instead of the new data added, so the builder can place this during construction on the fly. This way an emitter may choose to perform limited caching while streaming. Mind you, due to the flatbuffer addressing logic, streaming uses negative offsets that must be recombined at the receiving end of a stream, but that is not difficult.
 
In my opinion those are two strategies which are totally valid and it would be nice to be able to select which strategy is important to you.
Yes - there are endless scenarios: UDP packets with retransmission, ...
 
Let's take again my use case with the city builder strategy game I am working on.
interesting
 
The config, however, I want to read lazily, maybe even stream, and not download completely.
For this I would need the config to be represented as an unfolded tree and not as a DAG.

By this I wanted to showcase that it is possible to have both use cases in a single application.
And it would be nice if the FlatBuffer implementation would embrace those in a user friendly manner.

It is not clear what your constraints are, other than the network. Can you afford a large local storage buffer on disk? If so, LZ4 compression (fast), or some better but slower compression, may also work. You can then expand locally and later read only what you need from the buffer. It will not be as good as a DAG, or DAG + compression, but it is simple and something that can be plugged directly into the current flatcc C emitter - I just didn't want to clutter the builder with this.

A more flexible compression scheme would create a DAG that preserves identity - but that would be somewhat complicated and definitely not standard flatbuffers. It would store a traversal index along with the offset to a shared node. This index could be stored separately and looked up in a hash table to preserve the flatbuffer format.

Another option is to query only what you need over the network. GraphQL by Facebook is designed for problems like this - it is something I consider longer-term in some form or another. Then you can get local views of a larger remote flatbuffer.

Yet another approach is to use delta compression against earlier data, but that is only of interest for updates, not static game layout. For delta compression you need some sort of overlay over a flatbuffer, and a modified flatbuffer reader that checks offsets (pointers) against a hash table and reads either old or new data. This can be very fast, but of course not as fast as FlatBuffers alone. It is also useful for mutating data without ever touching the original data, but it may not solve the DAG problem due to identity confusion, unless the extra index is provided.
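The overlay idea could be sketched like this (Python; a toy byte-range reader, not a real flatbuffer reader):

```python
# Sketch of a delta overlay: a modified reader checks each offset against
# a patch table and serves either the original or the new bytes, so the
# base buffer is never touched.

class OverlayReader:
    def __init__(self, base: bytes):
        self.base = base
        self.patches = {}           # offset -> replacement bytes
    def patch(self, offset: int, data: bytes):
        """Record new data for an offset without mutating the base buffer."""
        self.patches[offset] = data
    def read(self, offset: int, size: int) -> bytes:
        if offset in self.patches:  # the delta wins over the base buffer
            return self.patches[offset][:size]
        return self.base[offset:offset + size]

r = OverlayReader(b"\x00\x01\x02\x03")
r.patch(2, b"\xff\xfe")
assert r.read(0, 2) == b"\x00\x01"   # unpatched region: base data
assert r.read(2, 2) == b"\xff\xfe"   # patched region: overlay data
assert r.base == b"\x00\x01\x02\x03" # original data untouched
```

The hash-table lookup per offset is the cost mentioned above: fast, but not as fast as reading a plain flatbuffer.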

mikkelfj

Dec 21, 2015, 3:19:12 AM
to FlatBuffers

> Here it is also desirable that the game state of a new client can be read by an (old) backend which serves old clients. But there is another detail which makes JSON
> behave differently from the binary representation: when you rename a property in the schema, the binary does not need any migration, as property names are not stored there.
> This is, however, not the case in the current JSON representation.

See also my other answer - I gave this some more thought. I suggested using fallback parsing in JSON if there are renames - but this requires unknown fields to fail, and prevents safe forward compatibility when only adding fields. So a parser is needed for every schema version. Therefore it is better not to rename at all.

However, I think it would be possible to add an attribute (alias: "old_name") to table and struct fields. The attribute can safely be ignored, but a JSON parser could understand it and also accept the old name. Unfortunately this doesn't handle repeated renames cleanly, so you end up with something like (alias: "old_name1", alias: "old_name2"), or (alias: "old_name1, old_name2"). This could also apply to enum names, and even type names.
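The proposed alias attribute could be consumed by a parser roughly like this (Python sketch; the attribute and the field names are hypothetical):

```python
# Sketch of alias-aware field lookup: a JSON parser could accept old field
# names by resolving aliases to the current schema name. The field table
# below is a made-up example of a rename.

FIELDS = {
    "hit_points": {"aliases": ["hp"]},  # renamed field keeps its old name as alias
    "name": {"aliases": []},
}

def resolve_field(json_key: str):
    """Map a JSON key to the current schema field name, or None if unknown."""
    for current, meta in FIELDS.items():
        if json_key == current or json_key in meta["aliases"]:
            return current
    return None  # unknown field: skip or reject, depending on parser flags

assert resolve_field("hit_points") == "hit_points"
assert resolve_field("hp") == "hit_points"   # the old name is still accepted
assert resolve_field("mana") is None
```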

I am not really sure it is worthwhile though.

Wouter van Oortmerssen

Dec 21, 2015, 6:39:26 PM
to Maxim Zaks, FlatBuffers
Note that someone just made a PR to allow JSON to be read while ignoring unknown fields, which will allow your use case of adding to the JSON without changing the server:


As for JSON not being friendly for changing field names, this is a problem inherent in JSON that cannot be fixed by FlatBuffers without going to a different text format.

On Sun, Dec 20, 2015 at 2:50 AM, Maxim Zaks <maxim...@googlemail.com> wrote:
Hi Wouter, thanks for the reply and sorry for my slow response :)


On Monday, December 14, 2015 at 9:46:14 PM UTC+1, Wouter van Oortmerssen wrote:
I'm not sure what about JSON is not forwards/backwards compatible. One issue is that, by default, the FlatBuffers parser does not accept unknown fields. This is for good reason: the JSON input was originally conceived as a friendly way to get data into FlatBuffers, not necessarily to absorb arbitrary JSON.

I think it is better to have a use case for how the JSON might be used.
I am currently working on a strategy game where we use FlatBuffers for all client-backend communication; the client is a game running on a mobile device, implemented with Unity3D.

The communication works as follows: the backend sends the client a config file and the player's game state, and the client sends the backend commands which reflect changes in the game state.

Now, the config is initially done in Google Spreadsheets (the preferred tool of game designers) and then transformed into JSON. The backend then translates the JSON into a FlatBuffers binary and sends it to the client. If the handling of JSON were equal to the binary, we could make additive changes to the config without needing to redeploy the backend with the new schema.

Now here is the tricky part, best described by the game state example. Game states are also saved as JSON in Postgres, mainly so that we can query them.
Here it is also desirable that the game state of a new client can be read by an (old) backend which serves old clients. But there is another detail which makes JSON behave differently from the binary representation: when you rename a property in the schema, the binary does not need any migration, as property names are not stored there. This is, however, not the case in the current JSON representation.

A schema like:

table Monster {
  mana:short = 150;
  hp:short = 100;
  name:string;
}

Would be represented as:
{

}





Wouter van Oortmerssen

Dec 21, 2015, 6:42:42 PM
to Maxim Zaks, FlatBuffers
As I said, FlatBuffers already gives you complete choice over tree vs. DAG representation. This will always be the choice of the user. We will definitely add CreateSharedString, as that's a very common use case.


Wouter van Oortmerssen

Dec 21, 2015, 6:47:04 PM
to Maxim Zaks, FlatBuffers
Ahh, this part of your message was cut off:

Would be represented as:

{
  mana: 100,
  hp: 50,
  name: "Max"
}

But to be resilient against property name changes, as the binary is, it should be something like:

[100, 50, "max"]


This is a direct equivalent of the binary representation, even though it is less human-readable and makes the JSON much harder to query.
Honestly, I am uncertain myself whether it is a good idea to have such cryptic JSON, but it would be the most resilient for migrations.

What do you think?

I see the point, but I don't think it would work so well. At the binary level, vtables are used to allow values to be omitted, etc. The above representation is therefore even more restrictive than the binary one. That's not a good fit for a text format for a table.