C++ target

dasc...@gmail.com

May 10, 2015, 10:49:11 AM
to antlr-di...@googlegroups.com
Hi all,

I am well aware of the current C++ target attempt, but unfortunately it's taking way too long for me (no commits for months), and I really don't like the "emulate the Java API" approach, which misses a ton of opportunities for a fast and idiomatic C++14 runtime.

So what to do... I have a few options: 1) antlr3, but I target Java too and my current grammar is a huge g4 file and I really don't wanna go back to the olden days, 2) work on Dan's project, but I don't think I would find motivation to work on a target whose design I fundamentally disagree with, 3) start my own C++ target. This one would be C++14 only and the first cut would be exactly enough to fulfill my own pressing needs.

Are there other C++ target attempts out there I should be aware of?

thanks

Terence Parr

May 10, 2015, 5:27:21 PM
to antlr-di...@googlegroups.com
hi. there are a few people that would like to work on an idiomatic C++11/14 version but they can’t start for a couple more weeks apparently. Hmm…ok, the hell with it. Here is the start of a repository:

https://github.com/antlr/antlr4-cpp

People can fork this and play around to their heart's content; eventually we will have something from the current C++ team or another. I have filled in a couple of sample files to show a potential directory structure and have included the Java code needed for ANTLR to generate C++ code and to run all of the unit tests; I copied from the C# target so none of it would actually work, but it shows what needs to be done.

Based upon my limited understanding of the latest C++, we would use the STL and do memory management using smart pointers like unique_ptr, auto_ptr, weak_ptr, and shared_ptr.
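
As a rough illustration of that ownership model (not code from either repository), a parse tree node could own its children through std::unique_ptr and refer back to its parent with a plain non-owning pointer. The names `Node`, `add_child`, `count_nodes`, and `build_sample_tree` are hypothetical. Note that std::auto_ptr has been deprecated since C++11 and was removed in C++17, so a C++11/14 runtime would normally rely on the other three.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parse tree node: each parent owns its children exclusively.
struct Node {
    std::string text;
    Node* parent = nullptr;                       // non-owning back pointer
    std::vector<std::unique_ptr<Node>> children;  // owning pointers

    explicit Node(std::string t) : text(std::move(t)) {}

    // Takes ownership of the child and returns a non-owning handle to it.
    Node* add_child(std::unique_ptr<Node> child) {
        child->parent = this;
        children.push_back(std::move(child));
        return children.back().get();
    }
};

// Counts the node and all of its descendants.
std::size_t count_nodes(const Node& n) {
    std::size_t total = 1;
    for (const auto& c : n.children) total += count_nodes(*c);
    return total;
}

// Builds a small tree for "1 + 2": expr -> (1, +, 2).
std::unique_ptr<Node> build_sample_tree() {
    auto root = std::make_unique<Node>("expr");
    root->add_child(std::make_unique<Node>("1"));
    root->add_child(std::make_unique<Node>("+"));
    root->add_child(std::make_unique<Node>("2"));
    return root;  // moved out; destroying the root frees the whole tree
}
```

Because every child has exactly one owner, nothing has to be deleted by hand; shared_ptr and weak_ptr would only come into play where several structures genuinely share a node.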

As we have one group going the automatic translation path, it seems to me a new path should look at all of the user facing objects and interfaces and decide on what their names and basic implementations should be in C++. Then, a basic scaffolding could be had with some include files. The only really complicated part is the parsing engine itself but there are plenty of unit tests that you can use to tweak the code until it gets the same answer as the Java version. Eric Vergnaud was able to get multiple targets together in a matter of weeks.

Is CMake the best build for C++? Either way, the target needs to fit within the cross repository architecture that we have, one repository per target. The python bild.py file in antlr4 main repository knows how to build and test all of the targets and package it all up.

With any luck, this initial repository can get people started going through a manual translation path.

Ter

Sam Harwell

May 10, 2015, 5:57:15 PM
to antlr-di...@googlegroups.com
I started one earlier this year for experimenting. It was a private repository, but if people are interested in taking it somewhere I can make it public.
https://github.com/sharwell/antlr4cpp_exp

I'm just glad that I can show this off: ;)
https://github.com/sharwell/antlr4cpp_exp/blob/master/antlr4cpp/antlr/v4/runtime/misc/unordered_ptr_map.hpp

Sam

Sam Harwell

May 10, 2015, 5:58:49 PM
to antlr-di...@googlegroups.com
Note that this repository is based on my "optimized" fork of the runtime, which is more complex than the original Java release and contains certain undocumented functionality which I needed for my IDE development work.

Terence Parr

May 10, 2015, 6:00:06 PM
to antlr-di...@googlegroups.com
hahahaha. that is one crazy declaration. C++ at its finest.
Ter

Terence Parr

May 10, 2015, 6:00:52 PM
to antlr-di...@googlegroups.com
Thanks for making it public, Sam. That should give people a big head start.
Ter

Sam Harwell

May 10, 2015, 6:01:11 PM
to antlr-di...@googlegroups.com
The *really* crazy part is what it looked like before C++11...

Ruslan Zasukhin

May 10, 2015, 6:17:38 PM
to antlr-di...@googlegroups.com
On 5/11/15, 1:00 AM, "Terence Parr" <pa...@cs.usfca.edu> wrote:

>> version but they can’t start for a couple more weeks apparently. Hmm…ok, the

So now somebody should check Sam's files and copy them to the public repository
created by Terence?
--
Best regards,

Ruslan Zasukhin
VP Engineering and New Technology
Paradigma Software, Inc

Valentina - Joining Worlds of Information
http://www.paradigmasoft.com

[I feel the need: the need for speed]


dasc...@gmail.com

May 10, 2015, 6:17:40 PM
to antlr-di...@googlegroups.com
On Sunday, May 10, 2015 at 11:57:15 PM UTC+2, Sam Harwell wrote:
> I started one earlier this year for experimenting. It was a private repository, but if people are interested in taking it somewhere I can make it public.
> https://github.com/sharwell/antlr4cpp_exp
>
> I'm just glad that I can show this off: ;)
> https://github.com/sharwell/antlr4cpp_exp/blob/master/antlr4cpp/antlr/v4/runtime/misc/unordered_ptr_map.hpp
>
> Sam

Excellent news, I'll look at the repos put up in this thread.

Your declaration raises a good point: let's not make this a template-heavy C++ target. I am moving away from Boost Spirit in one project because the extensive use of templates increases compile times by a HUGE factor and makes certain compile errors almost opaque.

That's right: this is still a problem with 2015 compilers, even on fast hardware (especially now that CLion and other smart IDEs try to make sense of your C++ code while you edit).

Nothing against templates, just against template-heavy parsers/generators.

Thanks again! 

dasc...@gmail.com

May 10, 2015, 6:22:05 PM
to antlr-di...@googlegroups.com


On Sunday, May 10, 2015 at 11:27:21 PM UTC+2, the_antlr_guy wrote:
> Is CMake the best build for C++?

Absolutely. That's what most open source C++ projects gravitate towards these days, and it's gaining excellent IDE support (CLion, Xcode, and Visual Studio all work with or are targets for CMake).

Sam Harwell

May 10, 2015, 6:23:53 PM
to antlr-di...@googlegroups.com
I'm working directly with Terence to figure this out and get proper license notices added to the code I created.

dasc...@gmail.com

May 10, 2015, 6:41:52 PM
to antlr-di...@googlegroups.com


On Sunday, May 10, 2015 at 11:57:15 PM UTC+2, Sam Harwell wrote:
> https://github.com/sharwell/antlr4cpp_exp/blob/master/antlr4cpp/antlr/v4/runtime/misc/unordered_ptr_map.hpp


FWIW, this is exactly the approach I was hoping for. Lots of work ahead, but I hope this will be the starting point of the official C++ target.

Sam, will you be putting some time into this in the near future? It would be great if you at least made sure any pull requests follow the idiomatic route :)

Thanks again for making this public!

Dan McLaughlin

May 10, 2015, 8:05:16 PM
to antlr-di...@googlegroups.com
For the record, while I think it makes good sense for interested parties to work together on a single C++ target, if it turns out a different approach gets there faster, that's fine with me too. Certainly Sam is better qualified than anybody (except Ter), so I fully support anything he does.

To OP, just for sake of argument, I don't think the automated approach is missing much in the way of optimization. The runtime isn't 'thick' enough for that kind of issue IMO, and certainly it is easy enough to optimize after the fact as we always do. 

dasc...@gmail.com

May 11, 2015, 2:43:08 AM
to antlr-di...@googlegroups.com

On Monday, May 11, 2015 at 2:05:16 AM UTC+2, Dan McLaughlin wrote:
> To OP, just for sake of argument, I don't think the automated approach is missing much in the way of optimization. The runtime isn't 'thick' enough for that kind of issue IMO, and certainly it is easy enough to optimize after the fact as we always do.

Stuff like two dynamic_casts in the common path of a tight tree-walker definitely concerns me, as does the lack of move semantics everywhere. Moreover, I find it hard to reason about memory ownership, since std::shared_ptr/std::unique_ptr isn't used. How easy is it to fix these issues after the fact?
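
To make the concern concrete, here is a minimal sketch (hypothetical types, not taken from either C++ target) of a tree walker that relies on virtual double dispatch, so the hot path needs no dynamic_cast. Children are held in std::unique_ptr so that ownership is explicit, and subtrees are moved rather than copied. All names (`TreeNode`, `RuleNode`, `TokenNode`, `Visitor`, `TokenCounter`) are assumptions made for illustration.

```cpp
#include <memory>
#include <utility>
#include <vector>

struct RuleNode;
struct TokenNode;

// Hypothetical visitor interface: one method per concrete node type.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit_rule(const RuleNode& node) = 0;
    virtual void visit_token(const TokenNode& node) = 0;
};

struct TreeNode {
    virtual ~TreeNode() = default;
    // Each node reports its own concrete type to the visitor.
    virtual void accept(Visitor& v) const = 0;
};

struct TokenNode : TreeNode {
    int token_type;
    explicit TokenNode(int t) : token_type(t) {}
    void accept(Visitor& v) const override { v.visit_token(*this); }
};

struct RuleNode : TreeNode {
    std::vector<std::unique_ptr<TreeNode>> children;
    void accept(Visitor& v) const override { v.visit_rule(*this); }
};

// Walks the tree and counts tokens; no casts are needed anywhere.
struct TokenCounter : Visitor {
    int tokens = 0;
    void visit_rule(const RuleNode& node) override {
        for (const auto& child : node.children) child->accept(*this);
    }
    void visit_token(const TokenNode&) override { ++tokens; }
};

// Builds rule(token, rule(token, token)) and returns the token count.
int count_tokens_in_sample() {
    auto inner = std::make_unique<RuleNode>();
    inner->children.push_back(std::make_unique<TokenNode>(1));
    inner->children.push_back(std::make_unique<TokenNode>(2));

    RuleNode root;
    root.children.push_back(std::make_unique<TokenNode>(0));
    root.children.push_back(std::move(inner));  // ownership is moved, not copied

    TokenCounter counter;
    root.accept(counter);
    return counter.tokens;
}
```

The cost per node is two virtual calls instead of a run-time type lookup; whether that matters in practice would have to be measured on a real grammar.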

Thanks.

Dan McLaughlin

May 11, 2015, 8:35:08 AM
to antlr-di...@googlegroups.com
The dynamic_casts were produced by the machine translation and we've gotten rid of most of them. As for the rest of the memory management, we've been too busy fixing errors (mostly container API) to worry much about it yet. Shrug... cross-cutting concerns are hard to fix in a large released system (though I've done it commercially many times), but in an unreleased API it isn't a big deal IMO.

Andy Somogyi

May 13, 2015, 1:16:00 PM
to antlr-di...@googlegroups.com
Hi, 

I'm starting to work on a parser for a language based on Python, using a C++ target.

Currently, I'm using the ANTLR3 C++ target, and porting a grammar that was originally a Java target. 

I know the ANTLR4 C++ target is very early, but PLEASE, PLEASE, for the love of anything that is holy, please do not follow the approach used in ANTLR3 C++ target. 

The Java target code is simple, easy to read, and easy to follow; I can figure out what's going on in a few minutes. The Java target API is very well designed.

On the other hand, I've spent the last 3 days trying to figure out what all these template meta-programming templates expand to, and I still have no real clue which vars are of which type. Now, I'm a reasonably intelligent guy with 20 years of experience with Scheme/C/C++/Objective-C/assembler etc., published articles on JIT compilers, and a PhD in physics, but I'm still completely lost wading through the ANTLR3 template meta-programming C++ target API.

So, my request is that the new ANTLR4 C++ target api follows the Java target API fairly closely. 

And if you guys do need some help with the ANTLR4 C++ target, I might have some time in a month or so to help out.

Terence Parr

May 13, 2015, 2:21:32 PM
to antlr-di...@googlegroups.com
Hi. I also am opposed to templates gone wild. Sam’s experimental repo seems ok from a quick peek.

https://github.com/sharwell/antlr4cpp_exp/blob/master/antlr4cpp/antlr/v4/runtime/tree/parse_tree.hpp

T

Ruslan Zasukhin

May 14, 2015, 12:12:13 AM
to antlr-di...@googlegroups.com
On 5/11/15, 9:43 AM, "dasc...@gmail.com" <dasc...@gmail.com> wrote:

> Stuff like two dynamic_cast's in the common path of a tight tree-walker
> definitely concerns me, as does the lack of move semantics everywhere.
> Moreover, I find it hard to reason about memory ownership, since
> std::shared/unique_ptr isn't used. How easy is it to fix these issues after
> the fact?

Hi Guys,

1) I think you know that there are in fact two mainstream approaches, both
described in C++ books.

1.1. Interfaces. COM/ActiveX uses them heavily, based on the I_Unknown
interface (IUnknown in COM itself).

1.2. Templates.

There are projects that tend to use only interfaces, and there are projects
that try to solve everything with templates. The golden mean is to mix both.

2) Smart pointers already existed in the COM classes from MS.

I suggest always defining a typedef in the code, because this makes it easy
to switch from one kind of smart pointer to another without touching the
rest of the code.

In our project we use notation such as

I_File_Ptr foo( );

This works thanks to a set of macros (see below) and our own smart_ptr<>
class with an intrusive counter.

3) Please note that the STL does not offer a smart pointer class with an
intrusive counter, and this is the fastest kind of smart pointer.

shared_ptr<> is the more general case: it can work with classes for which
you have only the headers and not the .cpp files.

But you are going to develop your own set of C++ classes, so an intrusive
smart pointer is the best choice.


//**************************************************************************
// Smart pointer for Valentina kernel.
//
// Each SMART_* macro forward-declares a type and defines the typedefs
// name_Ptr and Const_name_Ptr for it.
// Note: `interface` is not a C++ keyword; it is assumed to be a macro for
// `struct`, as in the Windows COM headers.
//
#define SMART_PTR(ptr_class, name)\
typedef ptr_class<name> name##_Ptr

#define CONST_SMART_PTR(ptr_class, name)\
typedef ptr_class<const name> Const_##name##_Ptr

#define FULL_SMART_PTR(ptr_class, name) \
SMART_PTR(ptr_class, name);\
CONST_SMART_PTR(ptr_class, name)

#define FBL_SMART_PTR(name) \
FULL_SMART_PTR( FBL::smart_ptr, name )

#define SMART_INTERFACE(name) \
interface name; \
FBL_SMART_PTR(name)

#define SMART_CLASS(name) \
class name; \
FBL_SMART_PTR(name)

#define SMART_STRUCT(name) \
struct name; \
FBL_SMART_PTR(name)

#define SMART_VALUE_INTERFACE(name) \
interface name; \
FULL_SMART_PTR(fbl_smart_value_ptr, name)

//**************************************************************************
// Usage: declares I_File_Ptr, Const_I_File_Ptr, Predicate_Field_Ptr and
// Const_Predicate_Field_Ptr.
SMART_INTERFACE( I_File );
SMART_CLASS( Predicate_Field );
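
For readers unfamiliar with the idea, the following is a simplified sketch of an intrusive reference-counted pointer hidden behind a typedef. It is not Valentina's FBL::smart_ptr, whose implementation is not shown in this thread; the names `RefCounted`, `IntrusivePtr`, `File`, and `File_Ptr` are hypothetical, and the counter is deliberately not thread-safe.

```cpp
#include <cstddef>
#include <utility>

// Base class that stores the reference count inside the object itself.
class RefCounted {
public:
    void add_ref() const { ++count_; }
    void release() const { if (--count_ == 0) delete this; }
    std::size_t use_count() const { return count_; }
protected:
    RefCounted() = default;
    RefCounted(const RefCounted&) : count_(0) {}  // a copy starts with a fresh count
    RefCounted& operator=(const RefCounted&) { return *this; }
    virtual ~RefCounted() = default;
private:
    mutable std::size_t count_ = 0;
};

// Minimal intrusive smart pointer: no separate control block is allocated.
template <typename T>
class IntrusivePtr {
public:
    IntrusivePtr() = default;
    explicit IntrusivePtr(T* p) : ptr_(p) { if (ptr_) ptr_->add_ref(); }
    IntrusivePtr(const IntrusivePtr& o) : ptr_(o.ptr_) { if (ptr_) ptr_->add_ref(); }
    IntrusivePtr(IntrusivePtr&& o) noexcept : ptr_(o.ptr_) { o.ptr_ = nullptr; }
    ~IntrusivePtr() { if (ptr_) ptr_->release(); }

    // Copy-and-swap handles both copy and move assignment.
    IntrusivePtr& operator=(IntrusivePtr o) noexcept {
        std::swap(ptr_, o.ptr_);
        return *this;
    }

    T* get() const { return ptr_; }
    T* operator->() const { return ptr_; }
    T& operator*() const { return *ptr_; }
    explicit operator bool() const { return ptr_ != nullptr; }
private:
    T* ptr_ = nullptr;
};

// Example class plus the typedef that hides the pointer kind from callers.
class File : public RefCounted {
public:
    explicit File(int id) : id_(id) {}
    int id() const { return id_; }
private:
    int id_;
};
typedef IntrusivePtr<File> File_Ptr;
```

Code that only ever spells `File_Ptr` would not need to change if the typedef were later pointed at std::shared_ptr<File> or another pointer class, which is the point of the typedef suggestion above.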

Ruslan Zasukhin

May 14, 2015, 2:43:42 AM
to antlr-di...@googlegroups.com
On 5/14/15, 7:12 AM, "Ruslan Zasukhin" <ruslan_...@valentina-db.com>
wrote:

> 2) Smart pointers was yet in COM classes from MS.

Besides, interfaces based on the I_Unknown interface can be:

A) Simple: I_Unknown provides only a counter and two virtual methods,
Add() and Release().

B) I_Unknown contains one more method, QueryInterface()...
It has proved to be faster than dynamic_cast<>().

We started our project with case A), and only about 1-2 years later did we
decide to add case B) as well, when we ran into complex inheritance in one
branch of classes. This was the only effective way.
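
A minimal sketch of what case B) might look like, assuming a hand-rolled interface identifier rather than COM's real GUID-based IUnknown::QueryInterface. The names `InterfaceId`, `query_interface`, `query`, `I_Readable`, `I_Writable`, and `ReadOnlyFile` are hypothetical, and the sketch makes no measurement of its own about speed relative to dynamic_cast.

```cpp
#include <cstdint>

// Hypothetical interface identifiers; COM itself uses GUIDs instead.
enum class InterfaceId : std::uint32_t { Readable, Writable };

struct I_Unknown {
    virtual ~I_Unknown() = default;
    // Returns a pointer to the requested interface, or nullptr if unsupported.
    virtual void* query_interface(InterfaceId wanted) = 0;
};

struct I_Readable {
    static constexpr InterfaceId id = InterfaceId::Readable;
    virtual ~I_Readable() = default;
    virtual int read() = 0;
};

struct I_Writable {
    static constexpr InterfaceId id = InterfaceId::Writable;
    virtual ~I_Writable() = default;
    virtual void write(int value) = 0;
};

// A class that supports reading only.
class ReadOnlyFile : public I_Unknown, public I_Readable {
public:
    void* query_interface(InterfaceId wanted) override {
        // An explicit comparison replaces the RTTI lookup of dynamic_cast.
        if (wanted == InterfaceId::Readable) return static_cast<I_Readable*>(this);
        return nullptr;
    }
    int read() override { return 42; }
};

// Typed convenience wrapper around query_interface.
template <typename T>
T* query(I_Unknown& obj) {
    // Safe because query_interface returned exactly a T* converted to void*.
    return static_cast<T*>(obj.query_interface(T::id));
}
```

A caller would write `query<I_Readable>(file)` and check the result for nullptr, much as it would after a dynamic_cast.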

Andy Somogyi

May 14, 2015, 11:29:05 AM
to antlr-di...@googlegroups.com
I took a look at Sam's repo, and that approach looks completely and totally acceptable. He uses standard C++ types and standard class designs, and the code is fairly clean and readable. I think it's an excellent start.

Now, I'm not saying templates are evil. They certainly have their place, namely what they were intended for: generic programming such as collection classes, smart pointers, etc.

Yiqing Yang

May 27, 2015, 4:15:52 PM
to antlr-di...@googlegroups.com
Hi,

I am very happy to see the C++ target effort was kicked off. Please make tree parser a first class citizen this time. I am still waiting for the tree parser support in C++ target for Antlr 3.

Is the Antlr4 grammar a lot different than Antlr3? I need to start a new parser project right now. I can't wait for the Antlr4 C++ support to be completed before starting my project. I wonder, if I use Antlr3 for now, how much effort will be needed to convert it to Antlr4 later on?

Thanks,

Yiqing Yang

Jim Idle

May 27, 2015, 10:05:43 PM
to antlr-di...@googlegroups.com
On Thu, May 28, 2015 at 4:15 AM, Yiqing Yang <yiqin...@gmail.com> wrote:
> Hi,
>
> I am very happy to see the C++ target effort was kicked off. Please make tree parser a first class citizen this time. I am still waiting for the tree parser support in C++ target for Antlr 3.

I don't really know the state of the C++ target in v3; I personally would not go that route.

> Is the Antlr4 grammar a lot different than Antlr3?

It depends on what things you have used in v3. If you can avoid predicates, then moving to v4 is easy, but even if you cannot, a lot of the time removing the predicates allows v4 to "just work".

> I need to start a new parser project right now. I can't wait for the Antlr4 C++ support to be completed before starting my project. I wonder if I use Antlr3 for now, how much effort will be needed to convert it to Antlr4 later on?

Well, as you are starting out with moving to v4 in mind, you should be able to keep the grammar in check by making incremental changes and then trying it with v4. If you use a source code control system, you can create a stub for the v3 grammar, then branch it to the v4 grammar. Then as you add things to the v3 grammar, integrate the changes to the v4 branch and make small changes. At the end you will have both grammars.

That said, the grammar probably is not your biggest issue. If you just start with a v3 grammar it isn't too difficult to convert to v4. But if you are relying on a v3 built AST, then all your other code will center around that and will be tied in to the v3 C target if you are not careful.

If I were you I would skip building the v3 tree and generate your own AST directly. Google will show you lots of posts from me about integrating C++ with C parsers. Basically, just make sure you generate a thin interface that accepts ANTLR3 tokens (et al), extracts the string or other information you need, and passes that on to a C++ interface. This thin interface uses extern "C" linkage. Then your grammar file contains minimal actions to call this interface directly and build your AST.

Your grammar need then only specify the include file for your thin interface.

When you can move to v4, then you can use a generated walker to call your C++ interface (which will otherwise not change).

So basically:

  • Don't bother with the C generated AST and tree parser
  • Create a very thin extern "C" interface for building your own AST (thousands of C++ examples of a standard accept/visit tree are online)
  • Make grammar actions as simple as possible, with all manipulation etc. in the interface and not in the action code
  • Watch out for null pointers in the event of syntax errors

I have done this lots of times. There are few issues other than what to do with tree building if you recover from a syntax error, but that is always solvable.
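
A rough sketch of the thin-interface idea follows. Every name in it (`AstNode`, `AstHandle`, `ast_make_leaf`, `ast_make_binary`, `ast_free`, `ast_to_string`) is hypothetical, and plain C strings stand in for ANTLR3 tokens, since the token accessors of the v3 C runtime are not shown in this thread. The point is only that the grammar actions see a handful of C-linkage functions while the AST itself is ordinary C++.

```cpp
#include <memory>
#include <string>
#include <utility>

// C++ side: the AST that the rest of the application works with.
struct AstNode {
    std::string text;
    std::unique_ptr<AstNode> left;
    std::unique_ptr<AstNode> right;
};

// Renders the tree in prefix form, e.g. "(+ 1 2)".
std::string ast_to_string(const AstNode* n) {
    if (!n) return "<error>";          // a null child marks a syntax error
    if (!n->left && !n->right) return n->text;
    return "(" + n->text + " " + ast_to_string(n->left.get()) + " " +
           ast_to_string(n->right.get()) + ")";
}

// C side: the only functions the grammar actions would call.
extern "C" {

// Opaque handle type as seen from the generated C parser.
typedef struct AstNode* AstHandle;

AstHandle ast_make_leaf(const char* text) {
    if (!text) return nullptr;         // tolerate missing tokens after an error
    return new AstNode{text, nullptr, nullptr};
}

AstHandle ast_make_binary(const char* op, AstHandle lhs, AstHandle rhs) {
    // Take ownership of both children even when the operator is missing,
    // so nothing leaks on the error path.
    std::unique_ptr<AstNode> l(lhs), r(rhs);
    if (!op) return nullptr;
    return new AstNode{op, std::move(l), std::move(r)};
}

void ast_free(AstHandle node) { delete node; }

}  // extern "C"
```

A grammar action would then reduce to a single call such as `ast_make_binary("+", lhs, rhs)`, and the null checks keep the builder usable when error recovery hands it missing pieces. A production version would also need to stop C++ exceptions from escaping through the C boundary.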

Jim

Sam Harwell

Jun 4, 2015, 10:10:43 PM
to antlr-di...@googlegroups.com

Hi Yiqing,

ANTLR 4 does not use tree parsers. Instead, it provides automatically-generated listener and visitor data structures that walk a parse tree, and the parse tree shape is always determined by the grammar. We have found this strategy to be much more maintainable in projects ranging in size from small school assignments to large-scale applications such as IDE integrations and data processing pipelines.

Including support for parse trees, listeners, and visitors will certainly be a high priority for the target. However, keep in mind that the C++ target is still not ready for use, and there is no timeframe established for when that will change.

Thank you,

--
Sam Harwell
Owner, Lead Developer
http://tunnelvisionlabs.com


dasc...@gmail.com

Aug 15, 2015, 2:20:52 PM
to antlr-discussion
Are you still working on the (very promising) C++ target, Sam?

Thanks.
dcg

dasc...@gmail.com

Dec 21, 2015, 9:21:51 AM
to antlr-discussion
Bump.

Sam, are you still going to work on this? I want to put some (but not much) work into the C++ target, but both attempts seem dead. Any news?

Thanks.
dcg