Using Boost.Spirit in production code

Richard Smith

Nov 10, 2009, 4:26:21 PM

I would be interested to know the opinion of this newsgroup on whether
it is sensible using Boost.Spirit in production code. In a project
I'm working on, I'm likely to need to produce three non-trivial C++
parsers: one for a network protocol quite similar in structure to
HTTP, one for an expression language (not unlike the one used by the
Unix tool, bc, but for domain-specific data-types), and one for a
particular SGML language (considerably simpler than XML, assuming
there are no unanticipated complications).

All of these components are things for which I would be comfortable
implementing hand-crafted parsers, but equally if there are better
ways of generating moderately efficient and, critically, easily
maintainable parsers, I would be keen to use them. Boost.Spirit seems
to be one of the more obvious possibilities.
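
For comparison, here is a minimal sketch of what the hand-crafted route might look like for a bc-like expression language. The grammar, the class name ExprParser, and the helper evaluate are assumptions made for illustration (integers only, no variables); the point is only that each grammar rule maps onto one member function.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>

// Assumed grammar (one member function per rule):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class ExprParser {
public:
    explicit ExprParser(const std::string& text) : text_(text), pos_(0) {}

    // Parses the whole input; throws std::runtime_error on a syntax error.
    long parse() {
        long value = expr();
        skip_spaces();
        if (pos_ != text_.size()) throw std::runtime_error("trailing input");
        return value;
    }

private:
    long expr() {
        long value = term();
        for (;;) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }
    long term() {
        long value = factor();
        for (;;) {
            if (accept('*')) {
                value *= factor();
            } else if (accept('/')) {
                long divisor = factor();
                if (divisor == 0) throw std::runtime_error("division by zero");
                value /= divisor;
            } else {
                return value;
            }
        }
    }
    long factor() {
        if (accept('(')) {
            long value = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        skip_spaces();
        if (!at_digit()) throw std::runtime_error("expected number");
        long value = 0;
        while (at_digit()) value = value * 10 + (text_[pos_++] - '0');
        return value;
    }
    // True if the current character exists and is a decimal digit.
    bool at_digit() const {
        return pos_ < text_.size() &&
               std::isdigit(static_cast<unsigned char>(text_[pos_])) != 0;
    }
    // Consumes c (after optional spaces) if it is next; otherwise consumes nothing
    // except the spaces.
    bool accept(char c) {
        skip_spaces();
        if (pos_ < text_.size() && text_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    void skip_spaces() {
        while (pos_ < text_.size() && text_[pos_] == ' ') ++pos_;
    }

    std::string text_;
    std::string::size_type pos_;
};

// Convenience wrapper (hypothetical name).
long evaluate(const std::string& text) { return ExprParser(text).parse(); }
```

Calling evaluate("(1 + 2) * 3") walks expr, term and factor in the same order as the grammar comment. The maintenance question is whether that comment and the code stay in step as the specification changes.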

However, having experimented with Boost.Spirit a bit, I have a number
of concerns about its appropriateness for use in production code and I
would be interested in others' opinions.

* Documentation. Given the size of the library, its documentation is
really fairly lightweight and I've invariably found myself reading the
code to find out how things work. Just to take two examples, where is
it documented what characters alpha_p matches, and where is it
documented which headers I should include to use it?

* Compile times. Perhaps there are implementation techniques that I'm
missing, but most of the non-trivial examples I've experimented with
take seriously long times to compile: in one case, over an hour for a
single translation unit. I prefer to work with a rapid modify-
recompile-test development cycle, but I don't see that being feasible
if I use significant Boost.Spirit components.

* Error messages. Introduce an error into the code and, frankly, the
resulting verbiage emitted by the compiler is utterly impenetrable.
This is, of course, true of many complex template libraries in C++,
and maybe when C++ (eventually) gains concepts, it will improve. But
it doesn't help with today's language.

* Poor IOStream interoperability. There are two aspects here. First,
it would be nice if, when I produced an LALR(1) parser, it would work
with InputIterators without my needing to adapt them with multi_pass.
(Admittedly, I'm not sure exactly how that could work as I cannot see
how the compiler can work out at compile time whether the grammar is
LALR(1).) Careless buffering by multi_pass could easily kill one of
the applications I have in mind. Secondly, it would be nice if there
were some easy way to keep input and output in sync, if not by having
a single function that does both (in simple cases, I've seen the %
operator overloaded to reasonable effect to implement both << and >>),
then by having similar-looking input and output functions that leave
it easy to verify by eye their compatibility. Maybe that's something
I can still build on top of Boost.Spirit, but that sounds a daunting
task.
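
To make the "single function that does both" idea concrete, here is a minimal sketch in plain standard C++, not using Spirit. The Writer and Reader classes, the Header record, and the transfer function are all hypothetical names invented for this illustration; the format is assumed to be whitespace-separated fields.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical archive types: both expose operator%, so one description
// of a record's fields serves for writing and for reading.
class Writer {
public:
    explicit Writer(std::ostream& os) : os_(os) {}
    template <typename T>
    Writer& operator%(const T& value) {
        os_ << value << ' ';   // each field is followed by a single space
        return *this;
    }
private:
    std::ostream& os_;
};

class Reader {
public:
    explicit Reader(std::istream& is) : is_(is) {}
    template <typename T>
    Reader& operator%(T& value) {
        is_ >> value;          // whitespace-separated, mirroring Writer
        return *this;
    }
    bool ok() const { return !is_.fail(); }
private:
    std::istream& is_;
};

struct Header {
    std::string name;   // assumed to contain no whitespace
    int length;
};

// The single place where the field order of Header is spelled out.
template <typename Archive, typename H>
void transfer(Archive& ar, H& h) {
    ar % h.name % h.length;
}

std::string to_text(const Header& h) {
    std::ostringstream os;
    Writer w(os);
    transfer(w, h);     // H is deduced as const Header
    return os.str();
}

bool from_text(const std::string& text, Header& h) {
    std::istringstream is(text);
    Reader r(is);
    transfer(r, h);
    return r.ok();
}
```

Because the field list lives only in transfer, the input and output sides cannot drift apart. The cost is that the format must be expressible as a flat sequence of fields, which a real protocol grammar often is not.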

However, in other ways, I like the look of Spirit. The BNF-form of
the code is much closer to the specification I'm working to -- this
sounds like a good way of making sure the two stay in sync as the
underlying specifications evolve (which I expect them to do). To my
pleasant surprise, the object code produced by Spirit is concise and
efficient. And I'm sure that as I get more familiar with it, I'll get
better at writing correct code faster. Looking around, I also see
many quite positive comments about it.

So, what is the opinion here? Is it worth pursuing Boost.Spirit?

--
Richard Smith

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Chris Morley

Nov 11, 2009, 7:59:09 PM

> So, what is the opinion here? Is it worth pursuing Boost.Spirit?

I would definitely recommend you use machine written parsers and not hand
written ones, even if the language is pretty trivial. The grammar files are
much easier to review at a later date than C++ when you look at the system
in 6, 12, or 24 months' time. You are quite right that the grammar is essentially
documentation.

I've not used Boost.Spirit; in the intro they say it is simpler than Bison
or ANTLR. I wonder what functions you might need down the line which the
likes of Bison/ANTLR have... I use Bison and it is actually very easy to use
with either a machine written scanner (e.g. Flex) or hand written. Bison
doesn't do C++ very well (its roots are C) but the skeleton is ok and easy
enough to handle. Building a parse tree or running semantic actions directly
in the grammar is equally simple. Now I know it, I don't think Bison is
overkill for even the most trivial parsers - in fact I wish I'd used Flex
too for some trivial scanners despite the fact it doesn't play brilliantly with
C++.

To answer your concerns about Spirit with my Bison experience... (I'm sure
most would apply to ANTLR too)
> * Documentation
There are stacks on the internet and in paper books. Anything related to YACC is
applicable too. comp.compilers users are very helpful if you truly get stuck
with your grammar. In fact you can often find ANTLR/YACC/Bison grammars on
the internet for common languages/formats.

>* Compile times
Fast. Set up the dependency in the project; Bison turns the .y/.yy grammar
into C/C++ quickly and you compile/link as normal. It is a matter of seconds for
Bison to process a grammar of C-like complexity on a modern machine.

>* Error messages
Basic errors are easy enough but bugs in your grammar can take a bit of
learning. The debug output is there though to find out what you did!
Shift/reduce and reduce/reduce conflicts etc. are not complicated to learn about
but can be confusing if you're new to e.g. LALR(1) parsing (it sounds like you
aren't, though). As with most things there are newbie pitfalls.

> * Poor IOStream interoperability
Roll your own scanner and interface it how you like. Bison eats tokens and will
reduce what it can based on the input up to that point, so it is in sync.
Even use different scanners at runtime (wide char support? no problem!).
Parsing human input (e.g. calculator) line by line, no problem.
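
A minimal sketch of such a hand-rolled scanner, written as a pull interface over std::istream. The token set, the Token struct, and the function names are hypothetical, and the hookup to a generated parser (which would normally call something like this from its lexer entry point) is left out.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token set for a small calculator language.
enum class TokenKind { Number, Operator, EndOfLine, EndOfInput, Invalid };

struct Token {
    TokenKind kind;
    std::string text;
};

// Returns the next token, consuming only the characters that belong to it,
// so the stream position always matches what the parser has seen so far.
Token next_token(std::istream& in) {
    // Skip blanks, but not '\n', which is a token of its own.
    while (in.peek() == ' ' || in.peek() == '\t') in.get();

    const int c = in.peek();
    if (c == std::istream::traits_type::eof()) return {TokenKind::EndOfInput, ""};
    if (c == '\n') { in.get(); return {TokenKind::EndOfLine, "\n"}; }

    if (std::isdigit(c)) {
        std::string digits;
        while (std::isdigit(in.peek())) digits += static_cast<char>(in.get());
        return {TokenKind::Number, digits};
    }

    in.get();  // consume the single character we are about to classify
    const std::string text(1, static_cast<char>(c));
    if (std::string("+-*/()").find(static_cast<char>(c)) != std::string::npos)
        return {TokenKind::Operator, text};
    return {TokenKind::Invalid, text};
}

// Convenience helper: scans a whole string, dropping the EndOfInput marker.
std::vector<Token> scan_all(const std::string& text) {
    std::istringstream in(text);
    std::vector<Token> tokens;
    for (;;) {
        Token t = next_token(in);
        if (t.kind == TokenKind::EndOfInput) break;
        tokens.push_back(t);
    }
    return tokens;
}
```

Because next_token never reads past the token it returns, an interactive calculator can hand each EndOfLine to the parser as soon as the user presses return.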

So my advice boils down to: yes, definitely use a machine parser. Don't worry
if it seems overkill: if your "specifications evolve" as you suggest they
might, you may save yourself a lot of effort later on. Bison/ANTLR are well
established and well used, fast and reliable. I'd suggest you use one of
those two (or similar); it sounds like you already have too many question
marks about Spirit for your application.

Chris

Jeff Flinn

Nov 12, 2009, 1:33:12 PM

Richard Smith wrote:
> I would be interested to know the opinion of this newsgroup on whether
> it is sensible using Boost.Spirit in production code.

I've done just that in several high end engineering applications for
several different companies. These were all using what is now known as
Spirit Classic; I haven't been on a platform lately where I could use
the soon-to-be-released Spirit 2.1.

[snip]

> * Documentation. Given the size of the library, its documentation is
> really fairly lightweight and I've invariably found myself reading the
> code to find out how things work. Just to take two examples, where is
> it documented what characters alpha_p matches, and where is it
> documented which headers I should include to use it?

I thought the docs were fabulous, they were my 1st exposure to parsing
technology and got me up and running quickly. alpha_p is in the "More
character parsers" section at the bottom of:

http://www.boost.org/doc/libs/1_40_0/libs/spirit/classic/doc/primitives.html

"alpha_p Matches alphabetic characters"

> * Compile times. Perhaps there are implementation techniques that I'm
> missing, but most of the non-trivial examples I've experimented with
> take serious long times to compile. In one case, over an hour for a
> single translation unit. I prefer to work with a rapid modify-
> recompile-test development cycle, but I don't see that being feasible
> if I use significant Boost.Spirit components.

I'm using spirit in an agile environment with no problem. Parser compile
times are comparable to some of the existing hand coded crap, oops I
mean parsers, that are spread over several files. I extensively use
functor parsers which allow fine grain development and unit testing of
components which can then be reused in larger components.

> * Error messages. Introduce an error into the code and, frankly, the
> resulting verbiage emitted by the compiler is utterly impenetrable.
> This is, of course, true of many complex template libraries in C++,
> and maybe when C++ (eventually) gains concepts, it will improve. But
> it doesn't help with today's language.

I'm no error message whiz, but after a few of these you recognize
patterns w/out needing to read the verbiage. 9 out of 10 times compile
errors are due to using boost::bind with improper number/type of arguments.

> * Poor IOStream interoperability. There are two aspects here. First,
> it would be nice if, when I produced an LALR(1) parser, it would work
> with InputIterators without my needing to adapt them with multi_pass.
> (Admittedly, I'm not sure exactly how that could work as I cannot see
> how the compiler can work out at compile time whether the grammar is
> LALR(1).) Careless buffering by multi_pass could easily kill one of
> the applications I have in mind. Secondly, it would be nice if there
> were some easy way to keep input and output in sync, if not by having
> a single function that does both (in simple cases, I've seen the %
> operator overloaded to reasonable effect to implement both << and >>),
> then by having similar-looking input and output functions that leave
> it easy to verify by eye their compatibility. Maybe that's something
> I can still build on top of Boost.Spirit, but that sounds a daunting
> task.

IIRC, the boost mailing list had a posting that spirit 2.1 now accepts
forward iterators at least. I've either directly used memory mapped
addresses or string input.
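
As an illustration of the iterator question (independent of Spirit, whose iterator requirements are not reproduced here), this is a minimal sketch of a hand-written parser that needs only single-pass InputIterators, because the grammar never looks more than one character ahead. The function names and the grammar "digits (',' digits)*" are assumptions for the example; overflow of the unsigned values is not checked.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Parses "digits (',' digits)*" using only *it, ++it and comparison with end,
// so any InputIterator over char will do. No character is ever re-read.
template <typename InputIt>
bool parse_number_list(InputIt it, InputIt end, std::vector<unsigned>& out) {
    for (;;) {
        if (it == end || *it < '0' || *it > '9') return false;  // need a digit
        unsigned value = 0;
        while (it != end && *it >= '0' && *it <= '9') {
            value = value * 10 + static_cast<unsigned>(*it - '0');
            ++it;
        }
        out.push_back(value);
        if (it == end) return true;      // list ended cleanly
        if (*it != ',') return false;    // unexpected character
        ++it;                            // consume the separator
    }
}

// Reads straight from a stream with no intermediate buffer of our own.
bool parse_from_stream(std::istream& in, std::vector<unsigned>& out) {
    return parse_number_list(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>(), out);
}
```

The same template accepts std::string iterators or pointers into a memory-mapped region. Backtracking grammars are where a buffering adaptor becomes necessary.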

> However, in other ways, I like the look of Spirit. The BNF-form of
> the code is much closer to the specification I'm working to -- this
> sounds like a good way of making sure the two stay in sync as the
> underlying specifications evolve (which I expect them to do). To my
> pleasant surprise, the object code produced by Spirit is concise and
> efficient. And I'm sure that as I get more familiar with it, I'll get
> better at writing correct code faster. Looking around, I also see
> many quite positive comments about it.
>
> So, what is the opinion here? Is it worth pursuing Boost.Spirit?

Yes, raise your specific needs on the Boost.Spirit mailing list. The
support from Joel, Hartmut, et al. is phenomenal.

Jeff

Joe

Nov 13, 2009, 1:59:22 AM

On Nov 10, 3:26 pm, Richard Smith <rich...@ex-parrot.com> wrote:
> I would be interested to know the opinion of this newsgroup on whether
> it is sensible using Boost.Spirit in production code.

I have had the same questions but, unlike you, have not tested the library
yet. I am very interested in the results of this thread. BTW, was your review
of Spirit based upon version 2.x of the library, which I think is about to
be released in the next version of Boost? It is supposed to be much better.

Also, have you looked at the Boost library Xpressive?

Joe

SeanW

Nov 13, 2009, 3:42:52 PM

On Nov 10, 4:26 pm, Richard Smith <rich...@ex-parrot.com> wrote:
> So, what is the opinion here? Is it worth pursuing Boost.Spirit?

I think Spirit is very slick, but decided against
using it when I considered what would happen if
I got into trouble. It's one thing to see a 100KB
error message and try to sort it into one of a few
categories as Jeff Flinn says above, but what if
you've got to actually stick your hand in that toilet
with a debugger when you have some problem in the
field? I couldn't bear the thought, so went with
one of the old-school parser generators.

Sean

CornedBee

Nov 13, 2009, 3:41:40 PM

On Nov 10, 10:26 pm, Richard Smith <rich...@ex-parrot.com> wrote:
>
> However, having experimented with Boost.Spirit a bit, I have a number
> of concerns about its appropriateness for use in production code and I
> would be interested in others' opinions.
>
> * Documentation. Given the size of the library, its documentation is
> really fairly lightweight and I've invariably found myself reading the
> code to find out how things work. Just to take two examples, where is
> it documented what characters alpha_p matches, and where is it
> documented which headers I should include to use it?

I agree about the headers, but other than that I found the docs to be
quite good.

>
> * Compile times. Perhaps there are implementation techniques that I'm
> missing, but most of the non-trivial examples I've experimented with
> take serious long times to compile. In one case, over an hour for a
> single translation unit. I prefer to work with a rapid modify-
> recompile-test development cycle, but I don't see that being feasible
> if I use significant Boost.Spirit components.

If you're using GCC, upgrade to 4.4. It should have greatly increased
the speed here. But yes, Spirit is a very metaprogramming-heavy
library and takes a long time to compile. You should make sure that
you separate spirit parsers into their own source files.
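
A minimal sketch of that separation, assuming a hypothetical key=value parser. The file names in the comments and the names KeyValue and parse_key_value are invented for illustration, and a trivial hand-written body stands in for the grammar so that the example compiles without Boost.

```cpp
// ---- config_parser.hpp (hypothetical): all that client code ever sees ----
#include <string>

struct KeyValue {
    std::string key;
    std::string value;
};

// Plain declaration: no parser-library headers leak into files that include this.
bool parse_key_value(const std::string& line, KeyValue& out);

// ---- config_parser.cpp (hypothetical) ----
// In a real project this would be the only file that includes the heavy
// parser headers and instantiates the grammar. The body below is a
// hand-written stand-in, not Spirit code.
bool parse_key_value(const std::string& line, KeyValue& out) {
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos || eq == 0) return false;  // need "key=..."
    out.key = line.substr(0, eq);
    out.value = line.substr(eq + 1);
    return true;
}
```

With this layout, editing client code never recompiles the grammar, and editing the grammar recompiles exactly one translation unit.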

>
> * Error messages. Introduce an error into the code and, frankly, the
> resulting verbiage emitted by the compiler is utterly impenetrable.
> This is, of course, true of many complex template libraries in C++,
> and maybe when C++ (eventually) gains concepts, it will improve. But
> it doesn't help with today's language.

Yes. It's the fate of any template library in C++.

> Secondly, it would be nice if there
> were some easy way to keep input and output in sync, if not by having
> a single function that does both (in simple cases, I've seen the %
> operator overloaded to reasonable effect to implement both << and >>),
> then by having similar-looking input and output functions that leave
> it easy to verify by eye their compatibility. Maybe that's something
> I can still build on top of Boost.Spirit, but that sounds a daunting
> task.

Spirit 2 contains Karma and Qi, one for producing output, the other
for parsing. They use extremely similar syntax specifications.

Sebastian

Maxim Yegorushkin

Nov 15, 2009, 4:37:57 PM

On 13/11/09 20:42, SeanW wrote:
> On Nov 10, 4:26 pm, Richard Smith<rich...@ex-parrot.com> wrote:
>> So, what is the opinion here? Is it worth pursuing Boost.Spirit?
>
> I think Spirit is very slick, but decided against
> using it when I considered what would happen if
> I got into trouble. It's one thing to see a 100KB
> error message and try to sort it into one of a few
> categories as Jeff Flinn says above, but what if
> you've got to actually stick your hand in that toilet
> with a debugger when you have some problem in the
> field? I couldn't bear the thought, so went with
> one of the old-school parser generators.

Been there.

Debugging Spirit parsing is fairly trivial: define the BOOST_SPIRIT_DEBUG
macro.
http://www.boost.org/doc/libs/1_40_0/libs/spirit/classic/doc/debugging.html

--
Max

shoosh

Nov 23, 2009, 6:29:56 AM

I have had some experience using Spirit for production code. Looking
back, I might have made a different decision mainly due to two very
painful points:

- Compile times are a real issue. It got to the point where I changed
the structure of my code specifically to isolate the dependencies of
the .cpp files which contain the Spirit code, in order to avoid changes
which force a recompile. The worst I've seen in my code is a 10 minute
compile for a single file, and this is really unbearable when you're
debugging.

- Impossible to decipher compile errors. When you make a mistake
somewhere, it can often be impossible to find the exact place where you
made it, since the error message points to some template
instantiation deep inside the Spirit code. The chain of "as referenced
from" templates can span 20 hops until you reach your own code. If
you want any fighting chance of figuring out what's going on, getting
inside the Spirit code is inevitable and is often not pretty.