[Boost-users] Simple fast parsing

26 views
Skip to first unread message

Christopher Jefferson

unread,
Jun 21, 2009, 4:28:54 PM6/21/09
to boost...@lists.boost.org
I am currently using in a project a custom-writen recursive-decent
parser, which uses an filtering_istream so I can support compressed
input.

The only functionality used is peekchar, getchar and integer
extraction with 'stream >> i'.

I am finding that C++ streams, particularly when it comes to integer
extraction, are giving terrible performance. In a small test,
uncompressing to a buffer and then writing a simple custom reader on
that memory buffer gives 10x performance.

I would perfer not to write and support such a tool if possible. Is
there a boost library which easily supports a parser with 1 character
lookahead and simple things like integer extraction?

Looking at the documentation, Boost::Spirit seems like a very big
hammer to crack this quite small nut, and it is unclear to me how well
it would fit into an existing recursive decent parser. Has anyone ever
used it as such? Is there a simple alternative?

Chris
_______________________________________________
Boost-users mailing list
Boost...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Hartmut Kaiser

unread,
Jun 21, 2009, 5:18:59 PM6/21/09
to boost...@lists.boost.org
> I am currently using in a project a custom-writen recursive-decent
> parser, which uses an filtering_istream so I can support compressed
> input.
>
> The only functionality used is peekchar, getchar and integer
> extraction with 'stream >> i'.
>
> I am finding that C++ streams, particularly when it comes to integer
> extraction, are giving terrible performance. In a small test,
> uncompressing to a buffer and then writing a simple custom reader on
> that memory buffer gives 10x performance.
>
> I would perfer not to write and support such a tool if possible. Is
> there a boost library which easily supports a parser with 1 character
> lookahead and simple things like integer extraction?
>
> Looking at the documentation, Boost::Spirit seems like a very big
> hammer to crack this quite small nut, and it is unclear to me how well
> it would fit into an existing recursive decent parser. Has anyone ever
> used it as such? Is there a simple alternative?

Spirit (which is recursive decent) is definitely your friend. Especially the
new version (V2.1), which is to be released with Boost V1.41, but in SVN
trunk now and quite mature already. Parsing an integer from a buffer of
characters is as easy as:

std::string buffer("1234");
int value = 0;

using namespace boost::spirit;
bool r = qi::parse(buffer.begin(), buffer.end(), int_, value);
assert(r && value == 1234);

And BTW: measurements show that the Spirit int_ parser is faster than atoi
(gcc/vc using their respective crtlib)!

Regards Hartmut

Cory Nelson

unread,
Jun 21, 2009, 5:48:02 PM6/21/09
to boost...@lists.boost.org
Just write the parser to use forward iterators, then it is trivial to support any source you have.

-----Original Message-----
From: Christopher Jefferson <ch...@bubblescope.net>
Sent: Sunday, June 21, 2009 1:28 PM
To: boost...@lists.boost.org
Subject: [Boost-users] Simple fast parsing

I am currently using in a project a custom-writen recursive-decent
parser, which uses an filtering_istream so I can support compressed
input.

The only functionality used is peekchar, getchar and integer
extraction with 'stream >> i'.

I am finding that C++ streams, particularly when it comes to integer
extraction, are giving terrible performance. In a small test,
uncompressing to a buffer and then writing a simple custom reader on
that memory buffer gives 10x performance.

I would perfer not to write and support such a tool if possible. Is
there a boost library which easily supports a parser with 1 character
lookahead and simple things like integer extraction?

Looking at the documentation, Boost::Spirit seems like a very big
hammer to crack this quite small nut, and it is unclear to me how well
it would fit into an existing recursive decent parser. Has anyone ever
used it as such? Is there a simple alternative?

Chris

Joel de Guzman

unread,
Jun 21, 2009, 6:49:35 PM6/21/09
to boost...@lists.boost.org
Christopher Jefferson wrote:

> Looking at the documentation, Boost::Spirit seems like a very big hammer
> to crack this quite small nut, and it is unclear to me how well it would
> fit into an existing recursive decent parser. Has anyone ever used it as
> such? Is there a simple alternative?

Spirit is well tuned for small parsing tasks like this. It is a
modular RD parser. What you need is what you pay for. The code is
as tight as it can be. Try the int parsing examples and see the
generated assembler. One of Spirit's original goal is for such
micro-parsing. You don't have to write an RD parser, Spirit
is an RD parser. However, if you need to use an existing RD
parser, then good! You can use any or all of Spirit facilities
as-is by calling Spirit's parse functions (which accept forward
iterators) from your RD environment. Or, the other way around,
you can write a custom-parser in spirit that calls your RD parser.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

Alan M. Carroll

unread,
Jun 22, 2009, 10:21:34 AM6/22/09
to boost...@lists.boost.org
At 05:49 PM 6/21/2009, you wrote:
>Christopher Jefferson wrote:
>
>>Looking at the documentation, Boost::Spirit seems like a very big hammer to crack this quite small nut, and it is unclear to me how well it would fit into an existing recursive decent parser. Has anyone ever used it as such? Is there a simple alternative?
>
>Spirit is well tuned for small parsing tasks like this. It is a
>modular RD parser. What you need is what you pay for. The code is
>as tight as it can be.

I don't think this solves his problem. Note that he got a 10X speed up by changing to a buffer with his existing parser, so the _parsing_ code isn't the bottle neck. It's better I/O that's needed. I suspect it's the locking done by streams on each operation, so he's basically doing a lock on every character. I think he'd be better off using the forward iterator idea and then either writing a small wrapper class on streams to block read or use the lower level read buffer interface.

Joel de Guzman

unread,
Jun 22, 2009, 11:11:50 AM6/22/09
to boost...@lists.boost.org
Alan M. Carroll wrote:
> At 05:49 PM 6/21/2009, you wrote:
>> Christopher Jefferson wrote:
>>
>>> Looking at the documentation, Boost::Spirit seems like a very big
>>> hammer to crack this quite small nut, and it is unclear to me how
>>> well it would fit into an existing recursive decent parser. Has
>>> anyone ever used it as such? Is there a simple alternative?
>> Spirit is well tuned for small parsing tasks like this. It is a
>> modular RD parser. What you need is what you pay for. The code is
>> as tight as it can be.
>
> I don't think this solves his problem.

What is the problem? Speed, right? Then I don't see why it does
not solve the problem.

> Note that he got a 10X speed
> up by changing to a buffer with his existing parser, so the _parsing_
> code isn't the bottle neck. It's better I/O that's needed. I suspect

And Spirit uses a better I/O through generic Forward Iterators.

> it's the locking done by streams on each operation, so he's basically
> doing a lock on every character. I think he'd be better off using the
> forward iterator idea and then either writing a small wrapper class
> on streams to block read or use the lower level read buffer
> interface.

I think you are under-estimating the complexity of writing a
humble number parser from a Forward Iterator. It's not as simple
as it looks and gets pretty hairy when you get past the toy
examples.

_______________________________________________

Christopher Jefferson

unread,
Jun 22, 2009, 11:25:32 AM6/22/09
to boost...@lists.boost.org

Yes, this is my concern. As a proof of principle I wrote my own number
generator, but it's only passing about 30% of our internal tests. This
is enough to convince me, possibly falsely, it works "in principle"
and could be fixed without too much loss of speed, but I worry how
much work that might be.

(For those curious, a quick glance suggests the two main problems are
not handling negative numbers or recognising overflow. Both easily
fixable, but I don't feel necessary for an experiment).

Chris

Reply all
Reply to author
Forward
0 new messages