Questions about grammar (using ANTLR4 and Java)


ironchefpython

Jul 13, 2018, 4:14:21 AM
to Jsonnet
I'm working on a Java port of Jsonnet, and I'm currently passing about half the specs from https://github.com/google/jsonnet/tree/master/test_suite (most of what's left is making the error-reporting strings match).


My question is about the grammar. I'm using ANTLR4 to lex and parse, as using a visitor to build an AST is stupid-simple. The translation was pretty straightforward, and I like how the g4 file is nearly a 1-to-1 match with the documentation.

 I've made only a couple changes that seemed to be necessary.
  1. I added a rule alternative to explicitly match parenthetical expressions, which *seemed* to be missing from the official specification (added on line 9)
  2. I made the presence of objinside an optional expression to allow empty objects: {} (lines 10 and 33)
  3. I made the second expression inside a slice optional to match [expr::] and [::] variants (lines 14 and 15)
  4. I broke up and reordered the binary and unary operator rule alternatives to enforce operator precedence. (lines 22 through 32)
Is there anything I've messed up when making those changes? Did I misread the spec?
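
For readers without the gist at hand, changes 1 and 4 can be pictured with a small hand-written parser. This is only a sketch under stated assumptions: it is not the ANTLR4 grammar discussed here, it handles just integers with +, -, *, / and unary minus, and the class and method names (PrecedenceSketch, additive, multiplicative, unary, primary) are invented for illustration. Each precedence level gets its own rule, and the parenthesized form is an explicit alternative of the innermost rule.

```java
// Toy recursive-descent evaluator: one method per precedence level.
// Hypothetical illustration only; not the ANTLR4 grammar from the gist.
final class PrecedenceSketch {
    private final String src;
    private int pos = 0;

    private PrecedenceSketch(String src) {
        this.src = src.replace(" ", "");
    }

    static long eval(String text) {
        PrecedenceSketch p = new PrecedenceSketch(text);
        long value = p.additive();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    // Lowest precedence: additive := multiplicative (('+' | '-') multiplicative)*
    private long additive() {
        long left = multiplicative();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            long right = multiplicative();
            left = (op == '+') ? left + right : left - right;
        }
        return left;
    }

    // multiplicative := unary (('*' | '/') unary)*
    private long multiplicative() {
        long left = unary();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            long right = unary();
            left = (op == '*') ? left * right : left / right;
        }
        return left;
    }

    // unary := '-' unary | primary
    private long unary() {
        if (peek() == '-') {
            pos++;
            return -unary();
        }
        return primary();
    }

    // primary := NUMBER | '(' additive ')'  (the explicit parenthesized alternative)
    private long primary() {
        if (peek() == '(') {
            pos++;
            long inner = additive();
            expect(')');
            return inner;
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a number at " + pos);
        }
        return Long.parseLong(src.substring(start, pos));
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("expected '" + c + "' at " + pos);
        }
        pos++;
    }
}
```

With this layering, PrecedenceSketch.eval("1 + 2 * 3") returns 7 and PrecedenceSketch.eval("(1 + 2) * 3") returns 9. In an ANTLR4 grammar the same effect typically comes from the order of alternatives in a left-recursive rule rather than from separate methods.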

Additionally, I made a couple of changes for aesthetics.
  1. I changed the definition of forspec and removed compspec, so that I could build my ForSpec AstNode object using a fold left reduce (changes on lines 12, 46, and 69)
  2. I renamed the rule "hidden" to be "visibility" (line 58)
The changes to forspec seem to match all of the test suite, and it seems simpler. Did I miss a corner case?
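
Since the gist itself is not reproduced in this thread, the following is only a guess at what "building a ForSpec with a left fold" might look like. It assumes the parser yields a flat list of `for` and `if` clauses; the Clause and ForSpec types and their fields are hypothetical, and clause bodies are kept as plain strings to keep the sketch short.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: node and field names are invented, not taken from the gist.
final class ForSpecFoldSketch {

    // One parsed clause of a comprehension: "for x in xs" or "if cond".
    static final class Clause {
        final String keyword; // "for" or "if"
        final String body;    // source text after the keyword, kept as a string here

        Clause(String keyword, String body) {
            this.keyword = keyword;
            this.body = body;
        }
    }

    // AST node: a clause plus the chain of clauses to its left (null for the first one).
    static final class ForSpec {
        final ForSpec previous;
        final Clause clause;

        ForSpec(ForSpec previous, Clause clause) {
            this.previous = previous;
            this.clause = clause;
        }

        @Override
        public String toString() {
            String self = clause.keyword + " " + clause.body;
            return previous == null ? self : "(" + previous + ") " + self;
        }
    }

    // Left fold: the accumulator is the ForSpec built so far.
    static ForSpec fold(List<Clause> clauses) {
        ForSpec acc = null;
        for (Clause c : clauses) {
            acc = new ForSpec(acc, c);
        }
        return acc;
    }

    public static void main(String[] args) {
        List<Clause> clauses = Arrays.asList(
                new Clause("for", "x in xs"),
                new Clause("if", "x > 1"),
                new Clause("for", "y in ys"));
        // Prints the left-nested chain built by the fold.
        System.out.println(fold(clauses));
    }
}
```

The point of the shape is that no separate compspec node is needed: every clause, whether `for` or `if`, is attached to the chain in the same way.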

-Chris

Dave Cunningham

Jul 19, 2018, 4:26:06 PM
to Jsonnet
> My question is about the grammar; I'm using ANTLR4 to lex and parse, as using a visitor to build an AST is stupid-simple. The translation was pretty straightforward, and I like how the g4 file is nearly a 1 to 1 match with the documentation.
>
> I've made only a couple changes that seemed to be necessary.
>   1. I added a rule alternative to explicitly match parenthetical expressions, which *seemed* to be missing from the official specification (added on line 9)
I suppose we should add that. It doesn't need to be in the core syntax, but it's probably clearer if we include it in the "abstract syntax" (which would then become concrete syntax).

 
>   2. I made the presence of objinside an optional expression to allow empty objects: {} (lines 10 and 33)
That seems like a bug in the spec to me!
 
>   3. I made the second expression inside a slice optional to match [expr::] and [::] variants (lines 14 and 15)
And that!
 
>   4. I broke up and reordered the binary and unary operator rule alternatives to enforce operator precedence. (lines 22 through 32)
> Is there anything I've messed up when making those changes? Did I misread the spec?

Sounds fine.  The tests have pretty good coverage so I think you'd notice it if it was wrong.

 

> Additionally, I made a couple of changes for aesthetics.
>   1. I changed the definition of forspec and removed compspec, so that I could build my ForSpec AstNode object using a fold left reduce (changes on lines 12, 46, and 69)
>   2. I renamed the rule "hidden" to be "visibility" (line 58)
> The changes to forspec seem to match all of the test suite, and it seems simpler. Did I miss a corner case?

I think it's good (and also simpler)

 
Thanks for machine-checking the spec's syntax :)

I wonder if we should have your grammar file in the core repo and either link to it or generate the spec from it, though that might be overkill. We should at least have your fixes / improvements in the spec.


Also, I'm curious why you're making a Java version, and whether some sort of Go -> Java transliteration might have worked for you.

I think you may have to implement your own stack to avoid blowing the Java stack when executing tail-recursive Jsonnet code. Just recursive expansion of the AST probably won't cut it. It'll still be much simpler than the C++ version though, because you can use the Java garbage collector.

ironchefpython

Jul 19, 2018, 10:27:04 PM
to Jsonnet
On Thursday, July 19, 2018 at 4:26:06 PM UTC-4, Dave Cunningham wrote:

> Thanks for machine-checking the spec's syntax :)

It was a pleasure. I updated the gist: I added support for the optional tailstrict keyword on line 18, and rearranged and simplified the square-bracket index/slice notation on lines 14 & 15.
 
> I wonder if we should not have your grammar file in the core repo and either link to it or generate the spec from it. It might be overkill. We should at least have your fixes / improvements in it.

Totally up to you. I'm going to be putting this on an open Git repository shortly, and you can decide if you would rather link to it, or own it in your repo and I can bring it in as a submodule. (If you're interested in a preview, I can add you to the Bitbucket repo; I just don't want to release it publicly until I pass all the unit tests.)

Also, I'm sure you know you can use ANTLR4 to generate parsers that can be consumed in Go and C++. This probably wouldn't make sense for you, as there's already a hand-written lexer, but if anyone wants to write a Swift, JavaScript, C# or Python implementation, this grammar could be used as a starting point as well.
 
There is a huge advantage to me in using a parser generator. By using multiple grammar files I can easily support many different variations of a grammar, and parse them all to a common set of AST nodes. This would allow me to open-source a 100% compatible Java implementation of Jsonnet, and just use a different grammar file to create an interpreter that's backward compatible with the JSON templates I have running in production today.

> Also I'm curious why you're making a Java version,

I have a number of use-cases that have accreted over the last 7 years that involve generating parameterized JSON. Across many production applications I'm running a stupidly heterogeneous collection of JSON templating utilities, which evolved as my understanding of the problem and the complexity requirements increased. From string substitution to Mustache templating to a custom Jackson ObjectCodec to Jolt transformations to lists of JSONPath expressions... culminating in an unholy monstrosity that uses a patched Spring Expression Language engine to compile JSON templates into Java bytecode. Don't get me wrong, the current solution is *fast*, but it's... inelegant.

So I have been looking for a cleaner solution, one that I can use inside Kafka Streams (on the JVM), Elasticsearch Plugins (also on the JVM), Storm bolts (again JVM), inside an Oracle database (Java stored procedures), and many more.

I have two goals: one short-term and one more ambitious.

First, I want to be able to use Jsonnet as a Java library, inside Java applications, and (for example) write JSON templates that call Java code that can accept a Jsonnet object comprehension that can execute a Java function to get data... and so on. Complete interoperability between Jsonnet and Java.

I'm 90% of the way there; now I need to spend the remaining 90% of the time to finish the last 10%. The annoying parts will be eliding the differences between Java-style printf and Python-style string formatting, and stuff like that.

When I can pass 100% of the unit tests, I'll consider that goal complete.

Second, I want to implement Jsonnet in Truffle. This will allow Jsonnet templates to be compiled directly into bytecode, and then, using Graal, directly into native executables. I'm not sure yet exactly how this will be useful, but maybe someone one day would like to be able to embed a Jsonnet template in their Ruby or R or Rust code or whatever.
 
> I think you may have to implement your own stack to avoid blowing the Java stack when executing tail recursive Jsonnet code. Just recursive expansion of the AST probably won't cut it. It'll still be much simpler than the C++ version though, because you can use the Java garbage collector.

You are absolutely right, my current implementation doesn't handle tail recursion. I'm currently trying to determine if I can come up with a set of rules to desugar calls to functions marked for TCO into thunks, or if I have to update my (currently pleasantly simple) tree-walking interpreter to use a stack that knows how to reuse stack frames for functions marked with the tailstrict keyword.
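
One common way to picture the "desugar into a thunk" option is a trampoline: a tail call returns a description of the next call instead of making it, and a loop runs those descriptions so the Java stack stays flat. The following is a generic sketch of that technique, not code from the interpreter under discussion; Step, done, more, run and sumTo are all made-up names.

```java
import java.util.function.Supplier;

// Hypothetical trampoline sketch; names are invented for illustration.
final class TrampolineSketch {

    // A step is either a final value or a thunk producing the next step.
    static final class Step<T> {
        final T value;                // set when the computation is finished
        final Supplier<Step<T>> next; // set when there is another tail call to make

        private Step(T value, Supplier<Step<T>> next) {
            this.value = value;
            this.next = next;
        }

        static <T> Step<T> done(T value) {
            return new Step<>(value, null);
        }

        static <T> Step<T> more(Supplier<Step<T>> next) {
            return new Step<>(null, next);
        }

        // Runs the steps in a loop, so the chain of tail calls does not grow the Java stack.
        T run() {
            Step<T> step = this;
            while (step.next != null) {
                step = step.next.get();
            }
            return step.value;
        }
    }

    // Tail-recursive sum of 1..n, written so the recursive call is returned as a thunk.
    static Step<Long> sumTo(long n, long acc) {
        if (n == 0) {
            return Step.done(acc);
        }
        return Step.more(() -> sumTo(n - 1, acc + n));
    }

    public static void main(String[] args) {
        // Deep enough that plain recursion would typically overflow the default stack.
        System.out.println(sumTo(1_000_000L, 0L).run());
    }
}
```

The trade-off is one small heap allocation per tail call in exchange for constant stack depth. The alternative mentioned above, an explicit interpreter stack that reuses frames, avoids those allocations but is a larger change to a tree-walking design.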

This would be an interim solution, as the Oracle HotSpot JVM has support for tail-call optimization (for non-Java languages), and Graal will give me a way to easily leverage that.



Burak Emre Kabakcı

Feb 25, 2019, 5:09:02 AM
to Jsonnet
Truffle sounds like a great choice. 👍 Did you publish the project yet?

ironchefpython

Feb 25, 2019, 11:05:04 AM
to Jsonnet
I'm still working on licensing issues with my employer.

Burak Emre Kabakcı

Mar 1, 2019, 7:54:46 AM
to Jsonnet
I'm eagerly waiting for you to open-source the project; I hope you can get it resolved soon. :)

Burak Emre Kabakcı

Mar 1, 2019, 7:57:23 AM
to Jsonnet
To clarify, this version will not only let us compile Jsonnet in Java; we will also be able to traverse the AST, make modifications, and create Jsonnet files from scratch easily, thanks to ANTLR.

Dan Compton

Mar 2, 2019, 7:02:54 PM
to Jsonnet
Will this implementation include a self-evaluator that allows for the language to operate upon and output itself (in the sugared form)?   I've petitioned for this and my use-case is mass-refactoring of manifests.  The current tooling is a little cumbersome for this use case (e.g. adding fields to deeply nested objects in arrays, etc). 

haoy...@databricks.com

Mar 7, 2019, 6:43:47 PM
to Jsonnet
For what it's worth, our https://github.com/databricks/sjsonnet implementation is pure-JVM and can be easily used in any JVM application. It's basically 100% compatible, really fast (https://databricks.com/blog/2018/10/12/writing-a-faster-jsonnet-compiler.html), and passes the bulk of the google/jsonnet test suite, as well as the 100,000s of lines of jsonnet we have in our codebase.

Our entire lexer/parser lives in one smallish file (https://github.com/databricks/sjsonnet/blob/master/sjsonnet/src/sjsonnet/Parser.scala), and can be easily extended with new syntax if necessary. The standard library (living in https://github.com/databricks/sjsonnet/blob/master/sjsonnet/src/sjsonnet/Std.scala) is also pluggable with arbitrary JVM functions. It's not the fastest implementation from a theoretical point of view (it could easily be beaten by compilation/JIT/fast-interpreter techniques), but it's much faster than google/jsonnet, which is itself fast enough for most things.

ironchefpython

Mar 8, 2019, 4:12:40 AM
to Jsonnet
Li,

I've been very interested in the Databricks code and have been watching your commits.

It's very similar to my first (non-Truffle) implementation, except that rather than using my own Json class, I evaluated to a Jackson node. I didn't have implicit memoization, however.
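
For context, "memoization" in a lazy evaluator usually means caching a value the first time it is forced, so later uses don't recompute it. A minimal sketch of that idea in Java follows; LazySketch is a hypothetical name, the class is not taken from either implementation, and it makes no attempt at thread safety.

```java
import java.util.function.Supplier;

// Hypothetical memoizing thunk; not thread-safe, for illustration only.
final class LazySketch<T> implements Supplier<T> {
    private Supplier<T> compute; // dropped after the first evaluation
    private T value;
    private boolean forced;

    LazySketch(Supplier<T> compute) {
        this.compute = compute;
    }

    @Override
    public T get() {
        if (!forced) {
            value = compute.get();
            forced = true;
            compute = null; // release whatever the closure captured
        }
        return value;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        LazySketch<Integer> field = new LazySketch<>(() -> {
            calls[0]++;
            return 6 * 7;
        });
        // The supplier body runs once even though the value is read twice.
        System.out.println(field.get() + field.get());
        System.out.println(calls[0]);
    }
}
```

Wrapping each object field or local binding in such a thunk keeps evaluation lazy while ensuring an expression referenced many times is evaluated at most once.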

My next improvement was to remove the extra step between the AST and the lazy (unevaluated) data structure and use an ANTLR visitor to construct the evaluation nodes directly, and from there to add the Truffle annotations. I'm not sure if you'll be able to go down that path, as the Truffle APIs are only documented for pure Java, and use a prepare-sources step that introspects the Java code to generate specialized type and function information. If you can do that, however, you can completely remove your client-server implementation.

Chris

Haoyi Li

Mar 8, 2019, 4:53:58 AM
to ironchefpython, Jsonnet
Sjsonnet also evaluates to an existing JSON library: uJson is widely used as a standalone project =)

Truffle is somewhat orthogonal in terms of needing a client/server to avoid startup overhead: what you need is SubstrateVM, which works well enough on any Java/Scala application that doesn't use reflection (Sjsonnet does not). Someone ran SubstrateVM on Sjsonnet and it just worked, spitting out a self-contained executable that's only a few megabytes and boots and runs in a few milliseconds. Truffle would be an additional steady-state performance boost on top of that.


ironchefpython

Mar 8, 2019, 7:58:19 PM
to Jsonnet
> Sjsonnet also evaluates to an existing JSON library: uJson is widely used as a standalone project =)

Whoops, my mistake, I naively assumed that it was a project of yours given the org name in the uJson package.