Gmail Calendar Documents Reader Web more »
Recently Visited Groups | Help | Sign in
Google Groups Home
Simple fast parsing
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  7 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Christopher Jefferson  
View profile  
 More options Jun 21, 4:28 pm
From: Christopher Jefferson <ch...@bubblescope.net>
Date: Sun, 21 Jun 2009 21:28:54 +0100
Local: Sun, Jun 21 2009 4:28 pm
Subject: [Boost-users] Simple fast parsing
I am currently using in a project a custom-writen recursive-decent  
parser, which uses an filtering_istream so I can support compressed  
input.

The only functionality used is peekchar, getchar and integer  
extraction with 'stream >> i'.

I am finding that C++ streams, particularly when it comes to integer  
extraction, are giving terrible performance. In a small test,  
uncompressing to a buffer and then writing a simple custom reader on  
that memory buffer gives 10x performance.

I would perfer not to write and support such a tool if possible. Is  
there a boost library which easily supports a parser with 1 character  
lookahead and simple things like integer extraction?

Looking at the documentation, Boost::Spirit seems like a very big  
hammer to crack this quite small nut, and it is unclear to me how well  
it would fit into an existing recursive decent parser. Has anyone ever  
used it as such? Is there a simple alternative?

Chris
_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Hartmut Kaiser  
View profile  
 More options Jun 21, 5:18 pm
From: "Hartmut Kaiser" <hartmut.kai...@gmail.com>
Date: Sun, 21 Jun 2009 16:18:59 -0500
Local: Sun, Jun 21 2009 5:18 pm
Subject: Re: [Boost-users] Simple fast parsing

Spirit (which is recursive decent) is definitely your friend. Especially the
new version (V2.1), which is to be released with Boost V1.41, but in SVN
trunk now and quite mature already. Parsing an integer from a buffer of
characters is as easy as:

    std::string buffer("1234");
    int value = 0;

    using namespace boost::spirit;
    bool r = qi::parse(buffer.begin(), buffer.end(), int_, value);
    assert(r && value == 1234);

And BTW: measurements show that the Spirit int_ parser is faster than atoi
(gcc/vc using their respective crtlib)!

Regards Hartmut

_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Cory Nelson  
View profile  
 More options Jun 21, 5:48 pm
From: Cory Nelson <phro...@gmail.com>
Date: Sun, 21 Jun 2009 14:48:02 -0700
Local: Sun, Jun 21 2009 5:48 pm
Subject: Re: [Boost-users] Simple fast parsing
Just write the parser to use forward iterators, then it is trivial to support any source you have.


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Joel de Guzman  
View profile  
 More options Jun 21, 6:49 pm
From: Joel de Guzman <j...@boost-consulting.com>
Date: Mon, 22 Jun 2009 06:49:35 +0800
Local: Sun, Jun 21 2009 6:49 pm
Subject: Re: [Boost-users] Simple fast parsing

Christopher Jefferson wrote:
> Looking at the documentation, Boost::Spirit seems like a very big hammer
> to crack this quite small nut, and it is unclear to me how well it would
> fit into an existing recursive decent parser. Has anyone ever used it as
> such? Is there a simple alternative?

Spirit is well tuned for small parsing tasks like this. It is a
modular RD parser. What you need is what you pay for. The code is
as tight as it can be. Try the int parsing examples and see the
generated assembler. One of Spirit's original goal is for such
micro-parsing. You don't have to write an RD parser, Spirit
is an RD parser. However, if you need to use an existing RD
parser, then good! You can use any or all of Spirit facilities
as-is by calling Spirit's parse functions (which accept forward
iterators) from your RD environment. Or, the other way around,
you can write a custom-parser in spirit that calls your RD parser.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Alan M. Carroll  
View profile  
 More options Jun 22, 10:21 am
From: "Alan M. Carroll" <a...@network-geographics.com>
Date: Mon, 22 Jun 2009 09:21:34 -0500
Local: Mon, Jun 22 2009 10:21 am
Subject: Re: [Boost-users] Simple fast parsing
At 05:49 PM 6/21/2009, you wrote:

>Christopher Jefferson wrote:

>>Looking at the documentation, Boost::Spirit seems like a very big hammer to crack this quite small nut, and it is unclear to me how well it would fit into an existing recursive decent parser. Has anyone ever used it as such? Is there a simple alternative?

>Spirit is well tuned for small parsing tasks like this. It is a
>modular RD parser. What you need is what you pay for. The code is
>as tight as it can be.

I don't think this solves his problem. Note that he got a 10X speed up by changing to a buffer with his existing parser, so the _parsing_ code isn't the bottle neck.  It's better I/O that's needed. I suspect it's the locking done by streams on each operation, so he's basically doing a lock on every character. I think he'd be better off using the forward iterator idea and then either writing a small wrapper class on streams to block read or use the lower level read buffer interface.

_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Joel de Guzman  
View profile  
 More options Jun 22, 11:11 am
From: Joel de Guzman <j...@boost-consulting.com>
Date: Mon, 22 Jun 2009 23:11:50 +0800
Local: Mon, Jun 22 2009 11:11 am
Subject: Re: [Boost-users] Simple fast parsing

Alan M. Carroll wrote:
> At 05:49 PM 6/21/2009, you wrote:
>> Christopher Jefferson wrote:

>>> Looking at the documentation, Boost::Spirit seems like a very big
>>> hammer to crack this quite small nut, and it is unclear to me how
>>> well it would fit into an existing recursive decent parser. Has
>>> anyone ever used it as such? Is there a simple alternative?
>> Spirit is well tuned for small parsing tasks like this. It is a
>> modular RD parser. What you need is what you pay for. The code is
>> as tight as it can be.

> I don't think this solves his problem.

What is the problem? Speed, right? Then I don't see why it does
not solve the problem.

> Note that he got a 10X speed
> up by changing to a buffer with his existing parser, so the _parsing_
> code isn't the bottle neck.  It's better I/O that's needed. I suspect

And Spirit uses a better I/O through generic Forward Iterators.

> it's the locking done by streams on each operation, so he's basically
> doing a lock on every character. I think he'd be better off using the
> forward iterator idea and then either writing a small wrapper class
> on streams to block read or use the lower level read buffer
> interface.

I think you are under-estimating the complexity of writing a
humble number parser from a Forward Iterator. It's not as simple
as it looks and gets pretty hairy when you get past the toy
examples.

Regards,
--
Joel de Guzman
http://www.boostpro.com
http://spirit.sf.net

_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Christopher Jefferson  
View profile  
 More options Jun 22, 11:25 am
From: Christopher Jefferson <ch...@bubblescope.net>
Date: Mon, 22 Jun 2009 16:25:32 +0100
Local: Mon, Jun 22 2009 11:25 am
Subject: Re: [Boost-users] Simple fast parsing

On 22 Jun 2009, at 16:11, Joel de Guzman wrote:

Yes, this is my concern. As a proof of principle I wrote my own number  
generator, but it's only passing about 30% of our internal tests. This  
is enough to convince me, possibly falsely, it works "in principle"  
and could be fixed without too much loss of speed, but I worry how  
much work that might be.

(For those curious, a quick glance suggests the two main problems are  
not handling negative numbers or recognising overflow. Both easily  
fixable, but I don't feel necessary for an experiment).

Chris
_______________________________________________
Boost-users mailing list
Boost-us...@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »

Create a group - Google Groups - Google Home - Terms of Service - Privacy Policy
©2009 Google