Version info API?

Graham Wideman

Nov 28, 2012, 7:56:44 AM

to antlr-di...@googlegroups.com

Is there any thinking on sensible/standard approaches whereby an application can learn useful version info?

· Version of grammar file(s) used? Currently, the grammar author could establish some convention involving some embedded code. Is there already a convention for this, or thoughts on a native grammar-language feature and API?
· Version of ANTLR Tool used. I don't think that info is embedded anywhere in the generated code, is it?
· Version of ANTLR runtime currently loaded (might currently be discoverable from jar metadata, not sure).

Thoughts?

-- Graham

Sam Harwell

Nov 28, 2012, 9:32:08 AM

to antlr-di...@googlegroups.com

Hi Graham,

I’ve used #1 and #2 since ANTLR 3, and added #3 since version 4, and have been extremely happy with the results.

1. Include and enforce the required ANTLR version as part of your build. Maven allows you to do this easily. For Ant or other systems, you’ll have to find another way. I know the NetBeans build (Ant-based) automatically downloads the correct version of dependencies from a NetBeans server at build time.

2. Always build your grammars as part of the compile process. Again Maven makes this easy. Among other things, this ensures that if you switch branches in a VCS like Git and recompile, the result will be correct.

3. I use the rule versioning mechanism described in issue antlr/antlr4#35 (in particular the rule annotations method described first). I use an annotation processor to verify all dependencies at compile time (GoWorks currently has 690, ANTLRWorks 2 currently has ~250). NetBeans evaluates the annotation processor in the IDE, providing feedback about mismatched dependencies even before the build. This feature is not available in the reference distribution of ANTLR 4, but has truly been instrumental in allowing me to rapidly build applications like GoWorks with dramatically reduced likelihood of regression bugs as I update the grammar over time.

a. Even though the C# compiler doesn’t support annotation processors like javac, I’ll be implementing this feature as a Visual Studio extension plus a static verifier for .NET assemblies that can be used through MSBuild.
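
To make the rule-dependency idea in item 3 concrete, here is a minimal sketch in Java. It is not the mechanism from antlr/antlr4#35: the `@DependsOnRule` annotation, the `SyntaxHighlighter` client, the `RuleDependencyCheck` class, and the version map are all hypothetical names invented for illustration, and the check runs by reflection at run time rather than in an annotation processor at compile time.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical annotation: client code records which rule version it was written against.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DependsOnRule {
    String rule();
    int version();
}

// Hypothetical client code that depends on the shape of two grammar rules.
class SyntaxHighlighter {
    @DependsOnRule(rule = "expr", version = 2)
    void highlightExpr() {}

    @DependsOnRule(rule = "statement", version = 1)
    void foldStatement() {}
}

final class RuleDependencyCheck {
    // Returns one message per annotated method whose expected rule version
    // differs from the version the grammar currently declares.
    static List<String> findMismatches(Class<?> client, Map<String, Integer> currentVersions) {
        List<String> problems = new ArrayList<>();
        for (Method m : client.getDeclaredMethods()) {
            DependsOnRule dep = m.getAnnotation(DependsOnRule.class);
            if (dep == null) continue; // method declares no rule dependency
            Integer actual = currentVersions.get(dep.rule());
            if (actual == null || actual != dep.version()) {
                problems.add(m.getName() + " expects " + dep.rule() + " v" + dep.version()
                        + " but grammar has " + (actual == null ? "no such rule" : "v" + actual));
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<String, Integer> grammar = new HashMap<>();
        grammar.put("expr", 2);
        grammar.put("statement", 2); // the grammar author changed this rule
        findMismatches(SyntaxHighlighter.class, grammar).forEach(System.out::println);
    }
}
```

The point of the sketch is only that a per-rule version number turns "the grammar changed" into a specific list of client methods to re-examine.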

--

Sam Harwell

Owner, Lead Developer

http://tunnelvisionlabs.com

--

Graham Wideman

Nov 28, 2012, 10:00:35 PM

to antlr-di...@googlegroups.com

Wow Sam, you've certainly thought through this territory quite a bit, and your exploration of rule-level versioning is very interesting. And I hear you on making use of Maven's features where possible to require matching versions of everything used in the build process. However, I think that doesn't entirely overlap with what I was after:

A. The developer of the grammar/lexer/parser is not necessarily the same party using the lexer/parser. So downstream builds won't necessarily start with running ANTLR Tool, and so can't enforce what version of ANTLR Tool was used. (At least, there's hope of that separation -- perhaps wishful thinking?) Hence my interest in having the ANTLR Tool version somehow automatically stamped into the generated code so it can be checked.

B. I was also interested in overall versioning of the grammars (i.e., the .g4 files). I think your initiatives didn't cover that (though your rule-level versioning does something much finer-grained).

C. I'm interested in application run time access to Tool and grammar versioning info, to be able to support dynamic loading of different parsers/lexers and ANTLR runtimes, or at least run time detection of problems.

-- Graham

Sam Harwell

Nov 28, 2012, 11:06:10 PM

to antlr-di...@googlegroups.com

In general, I approach this entire issue asking “How can I ensure that changes to my code base do not introduce regression bugs or unexpected behavior?” Traditional “informational” version numbers are almost useless in this regard, which is why the rule versioning I’m using allows for tight control over dependencies and includes a static verifier. If you have specific suggestions for altering the build or generated code related to this topic, be prepared to explain how that particular method is best suited for ensuring correct behavior in a team development environment.

For part B:

For file-level versioning, you have the unique commit hashes or version numbers in the version control system.

For part A:

The Maven plugin for ANTLR 4 will automatically download the correct version of the tool in order to build the parsers from a .g4 file at compile time, even if the person does not already have the ANTLR 4 tool installed (or has a different version of it). If the parsers are generated as an integrated part of the compilation process, then you have absolute control over which version of the tool is used to generate the parsers. The only question becomes which version of the v4 runtime is in the classpath when they run their program, so for that we have…

For parts A and C:

The ATN serializer/deserializer does not currently put a “format” version number in the encoded string. I’ll definitely be adding that before we release v4. Any truly breaking change in the runtime after the v4 release that would cause a grammar to become non-functional will likely result in a change to the serialization format, so you’ll get an exception when such a grammar is loaded if the runtime library is mismatched.
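
The general technique of putting a format version at the front of a serialized string can be sketched as follows. This is not ANTLR's actual ATN serialization: the `VersionedPayload` class, its `FORMAT_VERSION` constant, and the one-char-per-value encoding are assumptions chosen only to show how a mismatched reader fails fast with an exception instead of misinterpreting the data.

```java
final class VersionedPayload {
    // Hypothetical format version; a writer would bump it on any breaking layout change.
    static final int FORMAT_VERSION = 1;

    // Encodes the version as the first char, followed by one char per value.
    // Values must fit in a char (0..65535) for this toy encoding.
    static String serialize(int[] values) {
        StringBuilder sb = new StringBuilder();
        sb.append((char) FORMAT_VERSION);
        for (int v : values) {
            sb.append((char) v);
        }
        return sb.toString();
    }

    // Rejects data written with a different format version before reading any payload.
    static int[] deserialize(String data) {
        if (data.isEmpty()) {
            throw new IllegalArgumentException("empty payload");
        }
        int version = data.charAt(0);
        if (version != FORMAT_VERSION) {
            throw new UnsupportedOperationException(
                    "format version " + version + " not supported (expected " + FORMAT_VERSION + ")");
        }
        int[] values = new int[data.length() - 1];
        for (int i = 1; i < data.length(); i++) {
            values[i - 1] = data.charAt(i);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] roundTrip = deserialize(serialize(new int[] {3, 7, 42}));
        System.out.println(roundTrip.length + " values read back");
    }
}
```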

I updated the Maven build to include the following in the MANIFEST.MF for all of the upcoming ANTLR jars, which you can access as described in the following link:

http://stackoverflow.com/questions/5204297/put-version-to-my-java-application-netbeans

Implementation-Title: ANTLR 4 Tool

Implementation-Version: 4.0-SNAPSHOT

Implementation-Vendor-Id: org.antlr
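
One way an application might read the Implementation-Version entry at run time is through `java.lang.Package`, which exposes it when a class was loaded from a jar whose manifest declares it. A minimal sketch, assuming a hypothetical helper named `ManifestVersion`; the lookup yields null (reported here as "unknown") when the class did not come from such a jar, for example when running from a plain class directory.

```java
final class ManifestVersion {
    // Returns the Implementation-Version recorded for the package of the given
    // class, or "unknown" if no manifest information is available.
    static String of(Class<?> anchor) {
        Package pkg = anchor.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        return (version == null) ? "unknown" : version;
    }

    public static void main(String[] args) {
        // In a real application the anchor would be a class from the ANTLR runtime jar;
        // this class is used here only to keep the sketch free of external dependencies.
        System.out.println("Implementation-Version: " + of(ManifestVersion.class));
    }
}
```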

I’d like to include the @Generated annotation in the generated code and include the version of the tool that produced it, but that annotation is marked with the SOURCE retention policy so I think that means you won’t have access to it at runtime.

--

Sam Harwell

Owner, Lead Developer

http://tunnelvisionlabs.com

--

Graham Wideman

Nov 29, 2012, 12:57:52 AM

to antlr-di...@googlegroups.com

Hi Sam,
Your work in this area sounds valuable and is clearly well thought out. And of course your criteria that versioning should defend against regression bugs and unexpected behavior are most agreeable.

Addition of version info to MANIFEST sounds good.

One note: I really like the idea of the @Generated annotation, and it looks to me like you can get that to retain through to runtime by preceding it with a (meta) annotation of @Retention(RetentionPolicy.RUNTIME). At least, so says Wikipedia :-).

-- Graham

Sam Harwell

Nov 29, 2012, 5:27:19 AM

to antlr-di...@googlegroups.com

Unfortunately I can’t change the retention policy, because the @Generated annotation is not ours to modify: it has been part of the platform since Java SE 6 as javax.annotation.Generated (I hyperlinked it in the last email).

You should know I’m not here to shoot down your ideas. If you come up with any other specific suggestions to address your needs in this area, please let us know. :)

--

Sam Harwell

Owner, Lead Developer

http://tunnelvisionlabs.com


--

Graham Wideman

Nov 29, 2012, 6:51:11 AM

to antlr-di...@googlegroups.com

Sam, yes I realize that Java SE 6 defines @Generated, but I mistakenly thought that Java allowed you to override annotation default parameters by applying an annotation @Retention(RetentionPolicy.RUNTIME) to the @Generated annotation where it is used. Evidently not. And I also see there's no subclassing of annotations.

OK, looks like that's a dead end. Nothing to stop one making up one's own "@Generated2" annotation I suppose, but presumably that loses the benefit that might be gained from IDEs knowing about the standard @Generated, and showing/hiding generated code and so forth.
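
A minimal sketch of such a home-made annotation, assuming hypothetical names (`@GeneratedBy`, `DemoParser`, `GeneratedByDemo`) and an illustrative version string. Because the annotation type itself is declared with RUNTIME retention, the tool name and version stay available to reflection, which the standard @Generated does not allow.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical stand-in for a runtime-visible variant of @Generated.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface GeneratedBy {
    String tool();
    String version();
}

// Stand-in for a generated parser class; the values are illustrative only.
@GeneratedBy(tool = "org.antlr.v4.Tool", version = "4.0-SNAPSHOT")
class DemoParser {}

final class GeneratedByDemo {
    // Reads the generator information back at run time via reflection.
    static String describe(Class<?> type) {
        GeneratedBy info = type.getAnnotation(GeneratedBy.class);
        return (info == null) ? "not annotated" : info.tool() + " " + info.version();
    }

    public static void main(String[] args) {
        System.out.println(describe(DemoParser.class));
    }
}
```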

What level of granularity were you considering applying @Generated to?

-- Graham

Sam Harwell

Nov 29, 2012, 6:56:42 AM

to antlr-di...@googlegroups.com

It could be applied to the following:

· The top level lexer and parser classes

· The generated listener and visitor interfaces and base class implementations

This should cover the complete contents of those files, including the nested context classes in the parser.

I strongly feel that a fixed version of a code generator with identical input on two runs should produce the same result, so there definitely won’t be a timestamp included in the @Generated annotation.

--

Sam Harwell

Owner, Lead Developer

http://tunnelvisionlabs.com


--

Graham Wideman

Nov 29, 2012, 5:32:24 PM

to antlr-di...@googlegroups.com

Hi Sam,

Thanks again for sharing your thoughts and plans regarding versioning. Couple of questions:

1. Your plan to apply @Generated at the granularity of top level classes was sort of what I expected. I'm assuming @Generated would include the code generator name, perhaps the qualified class of ANTLR Tool? Would you include the ANTLR version number here?

2. Currently, ANTLR Tool produces a comment line at the top of lexer and parser generated code like this, literally:
// $ANTLR ANTLRVersion> XMLParser.java generatedTimestamp>
It looks like the idea was to insert actual ANTLRVersion and generation-time time stamp. Do you know if that's still in the plan?

Thanks,

-- Graham

Terence Parr

Nov 29, 2012, 7:02:36 PM

to antlr-di...@googlegroups.com

I'll have to think about this more when i get a chance…semester is almost over!
T


Sam Harwell

Nov 29, 2012, 8:46:24 PM

to antlr-di...@googlegroups.com

1. Yes, it would be appropriate to include the version number of the tool used to generate the code in the @Generated annotation.

2. I hope to modify this notice to include the ANTLR version used to generate code, but not include a timestamp.

--

Sam Harwell

Owner, Lead Developer

http://tunnelvisionlabs.com


--

Graham Wideman

Nov 30, 2012, 1:39:03 AM

to antlr-di...@googlegroups.com

Hi Sam,

> 1. ANTLR version number in @Generated:
Sounds good.

> 2. Omit timestamp from initial // $ANTLR comment

I had a suspicion you might say that. It seems you'd like to avoid having any timestamp within the generated file(s).

For my part, I think there is considerable value to retaining at least one timestamp, and hope you would reconsider this. It's also not clear to me the benefit of omitting the timestamp, but perhaps you have a compelling argument.

What you've said previously in this thread is that a particular version of ANTLR, presented with the same input, should always generate the same output. This is certainly what is supposed to happen. But the value I see in a timestamp is to troubleshoot why this expectation wasn't met, or some other aberration occurred. Having worked with pipelines of many kinds and sophistications over decades, even minimal provenance info like timestamps has often been crucial to understanding the cause of unexpected outcomes.

Specific to ANTLR, several situations come to mind:

1. Someone re-ran ANTLR Tool with a different grammar file that was put in place temporarily by mistake: a timestamp helps track down who did what, when. Overlaps somewhat with the generated file's file-system date, but not affected by file operations setting a different date (and allows checking whether such a file date change has occurred).

2. ANTLR Tool was run at a juncture where ANTLR's version number was not incremented, or has a generic component like SNAPSHOT. Timestamp is not a substitute for sophisticated versioning, but at least provides some capability to double check the history of generated code.

3. Somebody has edited the generated code. In this case the file system date-time will be substantially more recent than the ANTLR-inserted timestamp. Not a perfect detection, but way better than nothing.
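
The check in item 3 could look roughly like the sketch below. The header format ("// Generated by ANTLR ... at <ISO-8601 instant>"), the `EditDetector` class, and the tolerance parameter are all assumptions made for illustration; ANTLR does not emit such a header. As the item says, this is a heuristic and not a proof of editing.

```java
import java.time.Duration;
import java.time.Instant;

final class EditDetector {
    // Extracts the instant from a hypothetical header line such as
    // "// Generated by ANTLR 4.0 at 2012-11-30T01:39:03Z".
    static Instant embeddedTimestamp(String headerLine) {
        int at = headerLine.lastIndexOf(" at ");
        if (at < 0) {
            throw new IllegalArgumentException("no timestamp in header");
        }
        return Instant.parse(headerLine.substring(at + 4).trim());
    }

    // True when the file's modification time is later than the embedded
    // generation time by more than the given tolerance.
    static boolean possiblyEdited(String headerLine, Instant fileModified, Duration tolerance) {
        return fileModified.isAfter(embeddedTimestamp(headerLine).plus(tolerance));
    }

    public static void main(String[] args) {
        String header = "// Generated by ANTLR 4.0 at 2012-11-30T01:39:03Z";
        Instant modified = Instant.parse("2012-12-05T00:00:00Z");
        System.out.println(possiblyEdited(header, modified, Duration.ofMinutes(1)));
    }
}
```

In practice the modification time would come from the file system, for example via `java.nio.file.Files.getLastModifiedTime`.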

In short, at the moment I don't see a great cost to a timestamp, and feel that it has considerable benefit in increasing confidence that the expected things are happening, and troubleshooting when they do not.

I realize that this timestamp is a relatively small part of the overall versioning picture, one that I've not fully thought through for sure, but just thought I should register my support for the lowly timestamp as a complement to more sophisticated version relationships, build tools etc.

-- Graham

Sam Harwell

Nov 30, 2012, 2:02:59 AM

to antlr-di...@googlegroups.com

Item 1 cannot happen if the grammar generation process is part of the build itself.

In dealing with item 2, I force regeneration of all grammars after certain changes to the ANTLR tool itself, to verify that changes which were intended not to affect the generated code in fact did not. If a timestamp is included in the output, then the tools always report that changes occurred, forcing me to manually verify the output of every grammar, every time. The timestamp causes false positives to reach 100% when the tool is working correctly.
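
The false-positive effect can be shown with a toy regeneration check. Everything here is a stand-in: `RegenerationCheck`, its `render` method, and the header text are hypothetical and far simpler than a real code generator. The sketch only shows that hashing two runs over identical input reports "unchanged" when no timestamp is embedded and "changed" when one is.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class RegenerationCheck {
    // Toy "generator": a header line plus a body; the timestamp is optional.
    static String render(String toolVersion, String body, String timestampOrNull) {
        String header = "// Generated by ANTLR " + toolVersion
                + (timestampOrNull == null ? "" : " at " + timestampOrNull);
        return header + "\n" + body;
    }

    // Hex-encoded SHA-256 of the text, used as a cheap change detector.
    static String sha256(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    static boolean unchanged(String before, String after) {
        return sha256(before).equals(sha256(after));
    }

    public static void main(String[] args) {
        String body = "class P {}";
        System.out.println("no timestamp, unchanged: "
                + unchanged(render("4.0", body, null), render("4.0", body, null)));
        System.out.println("with timestamp, unchanged: "
                + unchanged(render("4.0", body, "2012-11-29T10:00:00Z"),
                            render("4.0", body, "2012-11-30T10:00:00Z")));
    }
}
```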

Item 3 on your list is a bit misleading. If the generated code is in source control, then simply switching branches will touch the files so the last modified time will never be “stable”.

If after all this you still want to keep the generated parsers in source control, you have the additional consideration that most developers do not hand-edit files before committing to strip out non-essential changes, a step that would reduce commit sizes and complexity for reviewers. If each developer gets new timestamps in the generated code even when nothing changed, these files will be recommitted frequently without actual changes, which makes it much harder to look through the file history if necessary.

--

Sam Harwell

Owner, Lead Developer

http://tunnelvisionlabs.com


--

Graham Wideman

Nov 30, 2012, 7:10:02 PM

to antlr-di...@googlegroups.com

Hi Sam,
I'm so glad I asked -- you've raised an objection to timestamps that I can see is important: avoiding gratuitous changes to generated files, which would result in reports indicating the file had changed when in fact the changes are insignificant. That is all the more important if it's an impediment to development at ANTLR World HQ. Fair enough.

How about an option?

The arguments you make that timestamps don't provide value seem reliant on:

a) Projects building from scratch, starting with the ANTLR Tool and the grammar file.
b) ANTLR being used consistently with a build system and version control.

I think the argument in favor of timestamps is to help troubleshoot what happened when things don't work. Perhaps that's more frequent when not using, for example, Maven, and not using version control religiously. For example, for the tens of thousands of readers of Ter's new book, at least while they are doing the exercises using command-line antlr4 and javac and experimenting with variations: "Why does this parser behave unexpectedly, and oh wait, is this the one I generated yesterday, or just now?"

Also, I think there's a use case where one development group is responsible for development of a lexer/parser, which they hand off, with generated source, but grammar-file-not-included (or at least no expectation of rerunning ANTLR), to other groups. A parallel case is where the target language is in a non-Java/Maven/etc. ecosystem. For this I'm pretty sure that having a date stamp would at least provide some lowest-common-denominator basis for communication about problems.

So I think there are valuable arguments both ways. Would you consider a flag to enable/disable timestamps? Perhaps for ad hoc or less-disciplined use, timestamps are on by default, but a -no-timestamps flag turns them off for those who've progressed to using ANTLR in a more disciplined way and want to avoid the gratuitous changes.

Terence Parr

Nov 30, 2012, 7:34:49 PM

to antlr-di...@googlegroups.com

hi. one could argue file timestamps from OS are good enough, though if you copy a file the stamps don't always get copied.
Ter


Graham Wideman

Nov 30, 2012, 8:36:29 PM

to antlr-di...@googlegroups.com

"Fred, you say you need a parser for SimpleXYZ? I got one I made recently with ANTLR, and I'll email it right over". Goodbye OS file timestamps. (Yeah, should be done using archive-unarchive which retains file times, but that's hit and miss.)

Hey, if you guys hate timestamps, I don't want to burn up goodwill lobbying for them :-). And Sam's case for suppressing them is a strong one. Just sayin', in years of working with processes large and small that don't always go according to plan, timestamps have allowed troubleshooting problems, and giving additional confidence that we could solve them if need be. The fact that @Generated contains a slot for generation timestamp reflects that experience, I think, even if @Generated may not be the perfect place to put timestamps for ANTLR. So, a timestamp option would be welcomed by me, but not at the cost of annoying you guys.
