Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Message from discussion wired question mark

Received: by 10.50.46.194 with SMTP id x2mr13108009igm.5.1325054873472;
        Tue, 27 Dec 2011 22:47:53 -0800 (PST)
X-BeenThere: citrus-users@googlegroups.com
Received: by 10.231.29.8 with SMTP id o8ls18959599ibc.6.gmail; Tue, 27 Dec
 2011 22:47:53 -0800 (PST)
Received: by 10.42.158.4 with SMTP id f4mr20064490icx.4.1325054873104;
        Tue, 27 Dec 2011 22:47:53 -0800 (PST)
Received: by 10.42.158.4 with SMTP id f4mr20064489icx.4.1325054873095;
        Tue, 27 Dec 2011 22:47:53 -0800 (PST)
Return-Path: <clifford.he...@gmail.com>
Received: from mail-iy0-f179.google.com (mail-iy0-f179.google.com [209.85.210.179])
        by gmr-mx.google.com with ESMTPS id gx2si1724265igb.1.2011.12.27.22.47.53
        (version=TLSv1/SSLv3 cipher=OTHER);
        Tue, 27 Dec 2011 22:47:53 -0800 (PST)
Received-SPF: pass (google.com: domain of clifford.he...@gmail.com designates 209.85.210.179 as permitted sender) client-ip=209.85.210.179;
Authentication-Results: gmr-mx.google.com; spf=pass (google.com: domain of clifford.he...@gmail.com designates 209.85.210.179 as permitted sender) smtp.mail=clifford.he...@gmail.com; dkim=pass (test mode) header...@gmail.com
Received: by mail-iy0-f179.google.com with SMTP id e16so20881438iaa.38
        for <citrus-users@googlegroups.com>; Tue, 27 Dec 2011 22:47:53 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=content-type:mime-version:subject:from:in-reply-to:date
         :content-transfer-encoding:message-id:references:to:x-mailer;
        bh=tBCZJdG+nXAbkO7DwU5DfdPulaPjpW4bZIMUXejjbMc=;
        b=s+7n+0yjjZpjr6Z4yBd7HGmnB/FD/YeiB3VWbL13LWxm4nV5Gndq+kiMtmflkkvKp5
         lz4pG+ImgxUAklykF4Gcyecfd/G0cYT9g36cnTpoixiXI/F79oXj1jiFrWWsyUW6ddA0
         MU0XN3nqIfzw1cWDGyUJRyHXL2/rtE6RoKQqc=
Received: by 10.50.186.226 with SMTP id fn2mr34813984igc.25.1325054872996;
        Tue, 27 Dec 2011 22:47:52 -0800 (PST)
Return-Path: <clifford.he...@gmail.com>
Received: from helios.0.0.10.in-addr.arpa (c220-239-40-136.eburwd6.vic.optusnet.com.au. [220.239.40.136])
        by mx.google.com with ESMTPS id lu10sm49336087igc.0.2011.12.27.22.47.50
        (version=TLSv1/SSLv3 cipher=OTHER);
        Tue, 27 Dec 2011 22:47:52 -0800 (PST)
Content-Type: text/plain; charset=iso-8859-1
Mime-Version: 1.0 (Apple Message framework v1251.1)
Subject: Re: wired question mark
From: Clifford Heath <clifford.he...@gmail.com>
In-Reply-To: <9b882ff5-d776-4e97-82ec-16af18408...@h7g2000prn.googlegroups.com>
Date: Wed, 28 Dec 2011 17:47:46 +1100
Content-Transfer-Encoding: 7bit
Message-Id: <572FF74C-6A7E-458E-9DED-0C1BA5868...@gmail.com>
References: <5f7787ae-c144-4504-a8fa-3e0aaa3fb...@y25g2000prg.googlegroups.com> <DBFF3306-277B-46C6-898C-2D086CBE4...@gmail.com> <81886796-c885-417d-bb07-f922b8935...@b14g2000prn.googlegroups.com> <CEFFE6F0-9C3F-45EC-8C44-174EFA9EF...@gmail.com> <9b882ff5-d776-4e97-82ec-16af18408...@h7g2000prn.googlegroups.com>
To: citrus-users@googlegroups.com
X-Mailer: Apple Mail (2.1251.1)

On 28/12/2011, at 5:38 PM, Ken wrote:
> On Dec 28, Clifford Heath wrote:
>> Both parts of the sequence match.
>> 
> You are right. I have another example,
> 
>  [a-d] | [c-f]+
> 
> If I use this rule to parse "cc" with option :consume=>true,
> I hope it consumes all the chars, [a-d] matches the first "c",
> but it can't match the 2nd "c", so it can't consume all, why it
> will not try the 2nd rule since it can't match all chars, PEG is
> designed this way?

Yes. It is. This might seem strange, but the packrat algorithm
relies on it to know what to memoize. In order to avoid polynomial
time performance, it assumes that anything that has ever matched
is valid and never goes back on that (memo-ised) decision.


> To fix the issue, we have to move the longer rule to the first,
> 
>   [c-f]+ | [a-d]
> 
> then this will work, or rewrite the rule by removing the common
> part.
> 
>  [a-b] | [c-f]+
> 
> however, when there are lots of common and noterminal rules
> are combined together for reusing purpose, we have to consider
> above similar cases such as common part, writing longer rule
> matching first, this will be difficult.

Now you have it. That's the cost of using a PEG. The benefit
is linear performance in both time and memory.

> Since first rule can't consume all chars why it won't try others?

No rule has to consume all input, just as much as it can.
The requirement to match all is done at the parser level.

You're asking why the parser doesn't try a second alternative
of this rule. What if it was a 2nd, 3rd, or 4th level rule, nesting
up such failures? You'd have polynomial (N^2, N^3, N^4)
performance, and PEG/packrat avoids that.

Clifford Heath.