Flatbuffers generator and parser using grammar and language tool like ANTLR


Hải Nguyễn

Nov 25, 2015, 10:46:40 PM
to FlatBuffers
Hi there,
This may seem like a silly question, but we want to try writing a FlatBuffers compiler in pure Java instead of using the C++ version of flatc (for various reasons, using JNI makes it difficult).
Do you think it is possible to use a language tool like ANTLR to generate Java source code from a schema file (*.fbs)?
We have an alternative: porting the C++ code of flatc to Java, but we want to evaluate all the possibilities.
Thank you.
/Hai

mikkelfj

Nov 26, 2015, 2:20:35 AM
to FlatBuffers
Hi Hải,

I did it for C using a hand written top-down parser: https://github.com/dvidelabs/flatcc

It is very easy to parse, but even with the detailed grammar, there are a lot of issues hidden in the details, such as what a namespace actually means when you include files. And after parsing there are a lot of semantic rules to handle: for example, an id attribute must be contiguous if present, except for unions, which have a gap, and then there are symbolic constants that may have a namespace prefix.
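
To illustrate the kind of semantic check mentioned above, here is a minimal Java sketch of the id rule. `FieldDecl` and `idsAreContiguous` are hypothetical names, not part of flatc or flatcc, and the sketch assumes that the "gap" for a union means the union field with id N also claims id N-1 for its implicit type field. Treat it as a sketch under those assumptions, not as the actual rule set of either compiler.

```java
import java.util.List;

final class FieldIdCheck {
    /** Hypothetical parsed field: name, explicit id, and whether its type is a union. */
    static final class FieldDecl {
        final String name;
        final int id;
        final boolean isUnion;

        FieldDecl(String name, int id, boolean isUnion) {
            this.name = name;
            this.id = id;
            this.isUnion = isUnion;
        }
    }

    /** Returns true if the ids cover 0..N-1 exactly once, counting a union as two slots. */
    static boolean idsAreContiguous(List<FieldDecl> fields) {
        int slots = 0;
        for (FieldDecl f : fields) {
            slots += f.isUnion ? 2 : 1;
        }
        boolean[] used = new boolean[slots];
        for (FieldDecl f : fields) {
            // Assumption: a union with id N also claims N-1 for its implicit type field.
            int first = f.isUnion ? f.id - 1 : f.id;
            for (int id = first; id <= f.id; id++) {
                if (id < 0 || id >= slots || used[id]) {
                    return false; // out of range, or the slot is already taken
                }
                used[id] = true;
            }
        }
        // Every slot is in range and claimed once, so 0..slots-1 is fully covered.
        return true;
    }
}
```

With fields a (id 0), a union u (id 2), and b (id 3), the check passes because u occupies slots 1 and 2. Giving the union id 1 instead would collide with a and fail.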

So it is a lot more work than one would think up front. If it does not have to be native, I would recommend going with an existing parser, either integrating flatc or linking with the flatcc library, which I wrote specifically to simplify integration for other languages. JNI might be kinder to the C version?

Volkan Yazıcı

Nov 26, 2015, 2:40:44 AM
to FlatBuffers
Hello Hải,

I had a similar problem as well. As mikkelfj pointed out, given the corner cases, parsing FBS files is not simple, but not impossible either. Then I followed a different path: I type the FBS definitions in XML (see the XSD in the attachment) and generate FBS files and Java (de)serializers.

<?xml version="1.0" encoding="utf-8" ?>
<schema xmlns="http://vlkan.com/schema/flatbuffers">

    <version>1</version>

    <namespace>com.vlkan.flatbuffers.data</namespace>

    <attributes>
        <attribute>priority</attribute>
    </attributes>

    <struct name="Vec3">
        <field name="x" type="float"/>
        <field name="y" type="float"/>
        <field name="z" type="float"/>
    </struct>

    <table name="Monster">
        <field name="pos" type="Vec3"/>
        <field name="mana" type="short" default="150"/>
        <field name="hp" type="short" default="100"/>
        <field name="name" type="string"/>
        <field name="friendly" type="bool" default="false">
            <attribute name="deprecated"/>
            <attribute name="priority" value="1"/>
        </field>
        <field name="inventory" type="[ubyte]"/>
    </table>

</schema>

Long story short, if you want to write a native Java code generator in Java (which would be awesome!), you can leverage a similar XML structure too. If it is still needed in later phases of the project, you can add an FBS schema reader too.

And you do not need to mess with JNI. Did you see my Maven artifacts for flatc? You can just plug them into your build system and you are done.

Cheers!
flatbuffers-v1.xsd

Hải Nguyễn

Nov 26, 2015, 5:22:53 AM
to FlatBuffers
Thanks, Mikkelfj and Volkan.
The Maven flatc artifact is a cool approach, but it does not seem to fit our requirements. I would like a solution that can be packaged as a jar library and that generates Java code at run time. The library could then be used in a service that runs continuously (with no need to restart the service): it would parse the schema and then generate the appropriate Java code.
Hmm, a hand-written schema parser may be the easier option.
Thanks.
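
For a sense of what a hand-written parser involves, here is a minimal recursive-descent sketch in Java. It is only an illustration under strong assumptions: `MiniFbsParser` is a hypothetical name, and it handles only `table Name { field:type; ... }` declarations with scalar, named, or vector types. Namespaces, includes, attributes, default values, comments, enums, unions, and all the semantic checks discussed earlier in the thread are left out, and unrecognized characters are silently skipped by the tokenizer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MiniFbsParser {
    // Identifiers (optionally dotted) or single punctuation characters.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_.]*|[{}:;\\[\\]]");

    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private MiniFbsParser(String src) {
        Matcher m = TOKEN.matcher(src);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    /** Parses table declarations into: table name -> (field name -> type). */
    static Map<String, Map<String, String>> parse(String src) {
        MiniFbsParser p = new MiniFbsParser(src);
        Map<String, Map<String, String>> tables = new LinkedHashMap<>();
        while (p.pos < p.tokens.size()) {
            p.expect("table");
            String name = p.next();
            p.expect("{");
            Map<String, String> fields = new LinkedHashMap<>();
            while (!p.peek().equals("}")) {
                String field = p.next();
                p.expect(":");
                fields.put(field, p.parseType());
                p.expect(";");
            }
            p.expect("}");
            tables.put(name, fields);
        }
        return tables;
    }

    /** type := identifier | '[' identifier ']' */
    private String parseType() {
        if (peek().equals("[")) {
            next();
            String inner = next();
            expect("]");
            return "[" + inner + "]";
        }
        return next();
    }

    private String peek() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return tokens.get(pos);
    }

    private String next() {
        String t = peek();
        pos++;
        return t;
    }

    private void expect(String s) {
        String t = next();
        if (!t.equals(s)) {
            throw new IllegalArgumentException("expected '" + s + "' but got '" + t + "'");
        }
    }
}
```

Calling `MiniFbsParser.parse("table Monster { name:string; inventory:[ubyte]; }")` returns a map with one table whose fields keep their declaration order. Everything beyond this subset is where most of the real work lies.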

mikkelfj

Nov 26, 2015, 12:14:56 PM
to FlatBuffers
If there are more independent code generation tools, I think we should look into an alternative binary schema format that all parsers or XML readers feed into and that code generators work from, decoupled from the parser and analyzer. I need something like this if my C backend is to be supported by the main flatc tool; otherwise the code generator cannot be shared. The current binary schema is good for reflection, but it is not sufficient for code generation.

I don't want to take it on right now, but it is something to keep in mind. This means we should be able to apply code generators to a binary schema that has been verified and has had scope resolution etc. done. I don't expect the flatc tool to support this directly, but it may be used as a user-supplied intermediate backend.

Hải Nguyễn

Nov 27, 2015, 4:32:50 AM
to FlatBuffers
Hmm, so one thing that can be done right now, at least, is to use reflection in Java.
About reflection: I saw that lookup by key is implemented with binary search. Can we improve the speed by using a hash map?
Can you share how to get started with this hack?
Thanks.

mikkelfj

Nov 27, 2015, 10:20:14 AM
to FlatBuffers
On Friday, November 27, 2015 at 10:32:50 UTC+1, Hải Nguyễn wrote:
> Hmm, so one thing that can be done right now, at least, is to use reflection in Java.
> About reflection: I saw that lookup by key is implemented with binary search. Can we improve the speed by using a hash map?

Not likely. I just saw a fast JSON parser in Java that stored keys and values in two arrays and did a linear scan, because that was faster than constructing a hash map for small sets.

And, since you are generating code, the performance is not critical and you are guaranteed fast termination even with large inputs.
But if you want, it is extremely simple to load all the names into an external hash map that maps to the object and enum vector indices 0..N-1.
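
A minimal Java sketch of that external index, assuming the names have already been read out of the binary schema's object (or enum) vector in order. `NameIndex` is a hypothetical helper, and the plain `List<String>` stands in for whatever accessor the reflection code provides.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class NameIndex {
    /** Maps each name to its position 0..N-1 in the given list. */
    static Map<String, Integer> build(List<String> names) {
        Map<String, Integer> index = new HashMap<>(names.size() * 2);
        for (int i = 0; i < names.size(); i++) {
            index.put(names.get(i), i);
        }
        return index;
    }
}
```

Building one such map per vector (one for objects, one for enums, and one per table for its fields) keeps names from different scopes from overwriting each other.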

For a schema dedicated to code generation, I would consider a prebuilt hash table index - but it really isn't necessary.
 
> Can you share how to get started with this hack?

The following C example shows how I traverse a binary schema and convert it to JSON output, for no particular reason other than to test the interface.
It takes advantage of testing if a field is present or not. I don't know if you will need this or if Java supports this, but otherwise you can probably write a low-level check in the vtable yourself.


(Note that reflection contains a DAG (a shared root object), so in JSON there is duplication.)

Hải Nguyễn

Nov 27, 2015, 7:10:15 PM
to FlatBuffers
Thanks, will check it.

Wouter van Oortmerssen

Nov 30, 2015, 12:46:51 PM
to Hải Nguyễn, FlatBuffers
I'm not sure if I understand your requirements. If this is meant to be part of a longer running service, then if it compiles new Java accessors from a schema, surely it doesn't know how to access those new fields?

Maybe you're better off implementing more reflection functionality in Java (similar to what reflection.h does in C++) ?



Hải Nguyễn

Dec 1, 2015, 1:38:51 AM
to FlatBuffers, orio...@gmail.com
Yes, you are right. I am trying to implement reflection in Java (based on the C++ implementation in reflection.h).

Hải Nguyễn

Dec 7, 2015, 5:51:35 AM
to FlatBuffers, orio...@gmail.com
Hi Wouter,
I am experimenting with reflection (in Java).
From the C++ implementation, I know there is a lookup utility (LookupByKey), but I really don't know how to port it to Java because of differences between the two languages: the C++ implementation can represent the buffer as a vector of data (which makes binary search easy), whereas the Java one cannot; it only supports iterating over the list by index, so searching for a field takes O(N).
Do you have any ideas about doing reflection (especially for nested objects) in Java?
Thanks.

Wouter van Oortmerssen

Dec 7, 2015, 12:59:57 PM
to Hải Nguyễn, FlatBuffers
There's no reason why you can't implement the same kind of binary search with indices instead of pointers.. it's the same algorithm.

One complication is that if the keys are strings (as they are in reflection), that you don't want to access them as String, as that would make the binary search very slow. Better to write a custom comparison function that works directly on the ByteBuffer.
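
A rough Java sketch of that idea: binary search by index, with a comparison that reads key bytes straight from the ByteBuffer instead of creating String objects. The layout here is a simplification chosen for the example, not the real FlatBuffers vector and vtable layout: each key is assumed to be stored as a 4-byte length followed by UTF-8 bytes, and `stringOffsets` is assumed to hold the absolute positions of the keys in sorted (unsigned byte) order. `ByteBufferKeySearch` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteBufferKeySearch {
    /** Compares the length-prefixed string at 'offset' with 'key', byte by byte (unsigned). */
    static int compareKey(ByteBuffer bb, int offset, byte[] key) {
        int len = bb.getInt(offset);
        int start = offset + 4;
        int n = Math.min(len, key.length);
        for (int i = 0; i < n; i++) {
            int a = bb.get(start + i) & 0xFF;
            int b = key[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return len - key.length; // on a shared prefix, the shorter string sorts first
    }

    /** Binary search over sorted key offsets; returns the matching index or -1. */
    static int lookupByKey(ByteBuffer bb, int[] stringOffsets, String key) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8); // encoded once per lookup
        int lo = 0;
        int hi = stringOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareKey(bb, stringOffsets[mid], keyBytes);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    /** Demo helper: writes already-sorted keys and records where each one starts. */
    static int[] writeKeys(ByteBuffer bb, String... sortedKeys) {
        int[] offsets = new int[sortedKeys.length];
        for (int i = 0; i < sortedKeys.length; i++) {
            byte[] bytes = sortedKeys[i].getBytes(StandardCharsets.UTF_8);
            offsets[i] = bb.position();
            bb.putInt(bytes.length);
            bb.put(bytes);
        }
        return offsets;
    }
}
```

The only allocation per lookup is the encoded search key; each probe touches the buffer directly. In a real port, the offsets would come from following the vector and table offsets in the buffer rather than from a separate int array.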

Hải Nguyễn

Dec 7, 2015, 8:37:46 PM
to FlatBuffers, orio...@gmail.com
Sorry for not explaining my idea clearly.
I mean using something like a hash table to store the mapping from field name to offset, as Mikkelfj mentioned before. This causes a conflict when the same field name appears in different locations: for example, field X of table A and field X of table B (where both tables A and B are inside root Z) may have different offset values (the offset is relative to the parent position, right?), but in a single hash table keyed by field name they overwrite each other's value. I think it is better to cache the offset of each field in a DAG if I want to reduce the search time.
What do you think about that?

Hải Nguyễn

Dec 8, 2015, 12:59:05 AM
to FlatBuffers, orio...@gmail.com
I am looking back at the code of LookupByKey (C++). It works on flatbuffers::Vector; however, to do reflection on a nested type, the object returned by LookupByKey (a table) must be converted to a Vector in order to call LookupByKey again (I posted this question before: https://groups.google.com/forum/#!topic/flatbuffers/nAi8MQu3A-U). I followed Mikkelfj's guidance without success.
Please give me some advice on this.
Thank you.

Maxim Zaks

Dec 9, 2015, 1:08:33 PM
to FlatBuffers
Hi,

I started working on FlatBufferSchemaEditor based on Xtext.


It is early, but the progress I am making is pretty good.
On the full_schema_grammer_support branch I already have a parser which is feature complete.

Otherwise it can parse all the examples from the repository.

Code generation is also pretty simple.
I initially started this project because I am writing a Swift implementation of FlatBuffers and I didn't want to extend the C++ code generator.
But now I have stopped developing the Swift code generator in favour of C# eager FlatBuffers serialisation, because I actually need it at work :).

Hope it can help someone

Wouter van Oortmerssen

Dec 9, 2015, 3:03:05 PM
to Maxim Zaks, FlatBuffers
Hai: I'm not sure why you want to create a hash table. The binary schemas have been laid out such that they can be used with an in-place binary search, and such that there's no ambiguity between table types.

See reflection/reflection.fbs: if you look up a field in a table, and that field happens to refer to a nested table, then the type.base_type will be Obj, and schema.objects[type.index] will refer to the nested table.
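
The resolution step described above can be sketched in Java as follows. The classes are hypothetical plain-Java stand-ins that mirror the names in reflection.fbs (`objects`, `fields`, `type`, `base_type`, `index`); they are not the accessors flatc would generate, and the field lookup uses a linear scan for brevity where a real port could use the binary search discussed earlier.

```java
import java.util.List;

final class ReflectionSketch {
    /** Only a few base types are modeled here; the real enum has more. */
    enum BaseType { SHORT, STRING, OBJ }

    static final class Type {
        final BaseType baseType;
        final int index; // for OBJ: position in Schema.objects

        Type(BaseType baseType, int index) {
            this.baseType = baseType;
            this.index = index;
        }
    }

    static final class Field {
        final String name;
        final Type type;

        Field(String name, Type type) {
            this.name = name;
            this.type = type;
        }
    }

    /** Stand-in for the reflection "Object" table (renamed to avoid java.lang.Object). */
    static final class ObjectDef {
        final String name;
        final List<Field> fields;

        ObjectDef(String name, List<Field> fields) {
            this.name = name;
            this.fields = fields;
        }
    }

    static final class Schema {
        final List<ObjectDef> objects;

        Schema(List<ObjectDef> objects) {
            this.objects = objects;
        }
    }

    /** Returns the nested table definition a field refers to, or null if there is none. */
    static ObjectDef nestedTable(Schema schema, ObjectDef parent, String fieldName) {
        for (Field f : parent.fields) {
            if (f.name.equals(fieldName)) {
                return f.type.baseType == BaseType.OBJ
                        ? schema.objects.get(f.type.index)
                        : null;
            }
        }
        return null; // no such field
    }
}
```

Because each table definition carries its own field list, a field named X in table A and a field named X in table B are looked up separately and cannot collide.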


Wouter van Oortmerssen

Dec 9, 2015, 3:06:30 PM
to Maxim Zaks, FlatBuffers
Maxim: very cool project, the schema editor!

As for Swift code generation, if this is ever to be part of the main project, it would be a lot better if it were integrated with the C++ parser. That way it can evolve with the other languages much more easily.


Maxim Zaks

Dec 10, 2015, 4:40:05 AM
to FlatBuffers, maxim...@googlemail.com
Hi Wouter,

I'm happy you like the schema editor project.
I understand that it is important to tie the language support together.
I just have some questions about the current implementation that I would like to clarify.
But I guess I will create a separate forum thread for them.

Cheers,

Maxim

Hải Nguyễn

Dec 10, 2015, 5:24:24 PM
to FlatBuffers
Thank you Wouter. I just figured out the use of type.index when type.base_type is Obj :)
As for the hash table, I just think it will give faster lookups than binary search for medium and large objects (say, more than 50 fields and more than 3 levels of nesting). I will try both and choose whichever suits best.

Hải Nguyễn

Dec 10, 2015, 5:26:54 PM
to FlatBuffers, maxim...@googlemail.com
I like your project.
Please start a new topic about the schema editor.
I also want to have a utility to make FB schemas interchangeable with JSON Schema, http://json-schema.org/
That way, users can use the JSON grammar they are familiar with, with the speed of FlatBuffers.