capture last element of a repeated sequence separately


谢非

Jul 25, 2015, 4:52:46 AM
to parboiled2.org User List
Hello, here is the simplified code for my case. There is a repeated sequence of `'/' ~ id`; the last one is interpreted as the Maven artifact name, and the preceding ones are interpreted as the group id.

I learned from here that `oneOrMore` is greedy, so I understand why the current implementation does not succeed while the one in the comment does. But I wonder whether this kind of capture is possible in parboiled2, i.e. capturing the last match separately from the preceding ones.

I know I can just capture a `zeroOrMore` and split the resulting Seq[String] into `init` and `last`, but that feels neither elegant nor direct.
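
For reference, the capture-then-split fallback mentioned above might look roughly like the sketch below. It assumes parboiled2 2.1.x with shapeless on the classpath; the class name `SplitParser` and the alphanumeric `id` rule are hypothetical stand-ins for the real code, which is not shown in this thread. It uses `oneOrMore` rather than `zeroOrMore` so that `last` is never called on an empty sequence.

```scala
import org.parboiled2._
import shapeless.{::, HNil}

// Hypothetical parser class; `id` is an assumed stand-in for the real identifier rule.
class SplitParser(val input: ParserInput) extends Parser {
  def id: Rule0 = rule { oneOrMore(CharPredicate.AlphaNum) }

  // Capture every segment greedily, then split the collected Seq afterwards:
  // everything but the last element is the group id, the last is the artifact name.
  def groupIdsAndArtifactName: Rule2[Seq[String], String] = rule {
    oneOrMore('/' ~ capture(id)) ~> ((ids: Seq[String]) => ids.init :: ids.last :: HNil)
  }
}
```

Because the split happens in the action, the grammar itself stays greedy; the group id sequence is empty when the input has only one segment.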

Regards

Alexander Myltsev

Jul 25, 2015, 7:50:43 AM
to parboiled2.org User List, heliu...@gmail.com
Hi,

Please try this beast and let me know if it works for you:

oneOrMore('/' ~ id ~ &('/' ~ id)) ~ '/' ~ id

Note that this is pretty inefficient, because each `'/' ~ id` sequence gets matched twice: once by the lookahead and once for real.
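
Placed in a complete parser with captures, the lookahead rule could be sketched as follows. This assumes parboiled2 2.1.x; the class name `LookaheadParser` and the alphanumeric `id` rule are hypothetical and not taken from the original code.

```scala
import org.parboiled2._

// Hypothetical parser class; `id` is an assumed stand-in for the real identifier rule.
class LookaheadParser(val input: ParserInput) extends Parser {
  def id: Rule0 = rule { oneOrMore(CharPredicate.AlphaNum) }

  // A segment only counts as a group id if another '/' ~ id follows it.
  // The positive lookahead &(...) checks for that without consuming input,
  // so the final segment is left over for the trailing capture.
  def groupIdsAndArtifactName: Rule2[Seq[String], String] = rule {
    oneOrMore('/' ~ capture(id) ~ &('/' ~ id)) ~ '/' ~ capture(id)
  }
}
```

As written, `oneOrMore` requires at least one group id segment, so an input with a single segment does not match; swapping in `zeroOrMore` would allow an empty group id.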

Also, I am thinking of extending the `times` operator to support:

times(1, inf) that is equal to oneOrMore
times(0, inf) that is equal to zeroOrMore
times(1, inf - 1) that covers your case: match one or more, but leave the last match unparsed

Let me know if that looks delicious to you.

谢非

Jul 25, 2015, 11:42:01 AM
to parboiled2.org User List, alexande...@phystech.edu
Thanks for the reply. It really helped a parboiled newbie like me.

`oneOrMore('/' ~ id ~ &('/' ~ id)) ~ '/' ~ id` is a really good example of the `&` combinator, but it only solves this particular case and is not a general solution. For example, what if we need to capture the last two matches separately?

The proposed extension of the `times` operator is interesting; it looks like a more general "non-greedy" implementation. But I could not relate it to the `times` described in the documentation, which has the form `(1 to 5).times(rule)` or `3.times(rule)`. Some elaboration would be greatly appreciated.

Alexander Myltsev

Jul 25, 2015, 11:48:39 AM
to parboiled2.org User List, heliu...@gmail.com
On Saturday, 25 July 2015 18:42:01 UTC+3, 谢非 wrote:
> Thanks for the reply. It really helped a parboiled newbie like me.
>
> `oneOrMore('/' ~ id ~ &('/' ~ id)) ~ '/' ~ id` is a really good example of the `&` combinator, but it only solves this particular case and is not a general solution. For example, what if we need to capture the last two matches separately?

oneOrMore('/' ~ id ~ &(2.times('/' ~ id))) ?

> The proposed extension of the `times` operator is interesting; it looks like a more general "non-greedy" implementation. But I could not relate it to the `times` described in the documentation, which has the form `(1 to 5).times(rule)` or `3.times(rule)`. Some elaboration would be greatly appreciated.

The current implementation of `times` is not capable of this. Somebody would have to implement it.
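
With the operators that already exist, one way to express the "last two matches" case might be to put the documented `n.times(rule)` form inside the lookahead, so that a segment is consumed only while at least two more segments follow. A minimal sketch, assuming parboiled2 2.1.x; the class name `LastTwoParser` and the alphanumeric `id` rule are hypothetical.

```scala
import org.parboiled2._

// Hypothetical parser class; `id` is an assumed stand-in for the real identifier rule.
class LastTwoParser(val input: ParserInput) extends Parser {
  def id: Rule0 = rule { oneOrMore(CharPredicate.AlphaNum) }

  // Consume a segment as a "leading" one only if two further segments follow it.
  // The lookahead consumes nothing, so the last two segments remain available
  // for the two trailing captures.
  def leadingAndLastTwo: Rule3[Seq[String], String, String] = rule {
    oneOrMore('/' ~ capture(id) ~ &(2.times('/' ~ id))) ~
      '/' ~ capture(id) ~ '/' ~ capture(id)
  }
}
```

The cost noted earlier grows with the lookahead length: here each segment may be scanned up to three times.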

谢非

Jul 30, 2015, 4:19:26 AM
to parboiled2.org User List, heliu...@gmail.com
For anyone looking for a general 'non-greedy' strategy in parboiled2, I finally found one [here](http://stackoverflow.com/a/2225208/448141). In my case, the rule can be defined like this:

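def groupIdsAndArtifactName: Rule2[Seq[String], String] = rule {
  // Recursive case: this segment is a group id; prepend it to the ids found further right.
  ('/' ~ capture(id) ~ groupIdsAndArtifactName ~> { (oldStr: String, seq: Seq[String], newStr: String) =>
    (oldStr +: seq) :: newStr :: HNil
  }) |
  // Base case: no further segment follows, so this one is the artifact name.
  ('/' ~ capture(id) ~> { str: String => Seq.empty[String] :: str :: HNil })
}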
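
The same right-recursive pattern seems to extend to the earlier "last two matches" question by widening the base case. The following is only a sketch under assumptions: parboiled2 2.1.x with shapeless on the classpath, and a hypothetical class name `RecursiveLastTwoParser` and alphanumeric `id` rule that are not part of the original code.

```scala
import org.parboiled2._
import shapeless.{::, HNil}

// Hypothetical parser class; `id` is an assumed stand-in for the real identifier rule.
class RecursiveLastTwoParser(val input: ParserInput) extends Parser {
  def id: Rule0 = rule { oneOrMore(CharPredicate.AlphaNum) }

  def leadingAndLastTwo: Rule3[Seq[String], String, String] = rule {
    // Recursive case: take one segment, then let the rest of the input decide
    // which segments are the final two.
    ('/' ~ capture(id) ~ leadingAndLastTwo ~> {
      (head: String, rest: Seq[String], a: String, b: String) =>
        (head +: rest) :: a :: b :: HNil
    }) |
    // Base case: exactly two segments are left, so there are no leading ones.
    ('/' ~ capture(id) ~ '/' ~ capture(id) ~> { (a: String, b: String) =>
      Seq.empty[String] :: a :: b :: HNil
    })
  }
}
```

Only the base case changes with the number of trailing matches, which is what makes this approach more general than adjusting a lookahead. The recursion depth grows with the number of segments, which should be harmless for short paths like these.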