Thoughts on position tracking?

285 views
Skip to first unread message

Bob Ziuchkovski

unread,
Jun 13, 2014, 2:35:32 PM6/13/14
to parboil...@googlegroups.com
Hi,

I implemented a couple parsers with parboiled2 and they are working fantastically.  My sincere thanks to everyone who has contributed to this project!

Now I'm working on semantic analysis of the parsed AST trees and wish to report positional information when I detect semantic errors.  Ideally I would like to track the cursor position every time a value is pushed onto the parser value stack, so that I can report that position if semantic analysis detects an error while processing the AST.  I'm aware that Parser classes can query the cursor position, but that seems to be the current extent of position support.

I have a couple half-baked implementation ideas in the back of my mind, but figured this is probably a common problem.  Has anyone else attempted to track and store cursor position for each parsed value?  If so, how did you end up implementing the functionality?  Any thoughts, suggestions, or feedback on the topic?

Thanks,
Bob

Chad Retz

unread,
Jun 15, 2014, 9:44:35 PM6/15/14
to parboil...@googlegroups.com
My best guess is to get the cursor (via cursor def on parser class) in your action and to get line and col from it, count the newlines between start of input and the cursor. At least that's what the error handler appears to do.

Mathias Doenitz

unread,
Jun 16, 2014, 3:05:45 AM6/16/14
to parboil...@googlegroups.com
Bob,

sorry for the late response.
I agree with Chad that the best solution for having your parser record position information along with AST nodes is to look at the ‘cursor’ member of the Parser and use it to track the line position.
Since parboiled doesn’t know about line ending by itself you should probably at a `line` counter to your parser, which you increment whenever you have matched a line ending.
In the same place you can record the `cursor` position of the line start and use it to determine the current cursor column in the line.

One you have these things in place you simply need to make place for an instance of something like this in your AST nodes:

case class Position(index: Int, line: Int, column: Int)

HTH and cheers,
Mathias

---
mat...@parboiled.org
http://www.parboiled.org
> --
> You received this message because you are subscribed to the Google Groups "parboiled2.org User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to parboiled-use...@googlegroups.com.
> Visit this group at http://groups.google.com/group/parboiled-user.
> To view this discussion on the web visit https://groups.google.com/d/msgid/parboiled-user/3660c121-a8e3-4f07-91f1-00374b1be4fc%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Bob Ziuchkovski

unread,
Jun 16, 2014, 1:07:52 PM6/16/14
to parboil...@googlegroups.com
Thanks for the feedback, Mathias and Chad.

For now, I'll investigate polling the cursor position within my actions, but I've created a feature request for adding a general mechanism: https://github.com/sirthias/parboiled2/issues/79

Ideally it would be nice to support something like scala.util.parsing.input.Position, where the push action would automatically set the Position on any value object that includes that trait.  Then an AST tree could look something like:

sealed trait Expr extends Position
case class Add(left: Expr, right: Expr) extends Expr
case class Subtract(left: Expr, right: Expr) extends Expr

And the end user could query position without worrying about setting it manually from each rule action.

Bob Ziuchkovski

unread,
Jun 16, 2014, 1:09:02 PM6/16/14
to parboil...@googlegroups.com
Err...my apologies, I meant scala.util.parsing.input.Positional.

Mathias Doenitz

unread,
Jun 20, 2014, 9:18:51 AM6/20/14
to parboil...@googlegroups.com
> Ideally it would be nice to support something like scala.util.parsing.input.Position, where the push action would automatically set the Position on any value object that includes that trait. Then an AST tree could look something like:
>
> sealed trait Expr extends Position
> case class Add(left: Expr, right: Expr) extends Expr
> case class Subtract(left: Expr, right: Expr) extends Expr
>
> And the end user could query position without worrying about setting it manually from each rule action.

As this proposal introduces mutability into cases classes it is not something that I’d recommend.
Why not go for fully immutable AST nodes?:

case class Position(index: Int, line: Int, column: Int)

sealed trait Expr {
def pos: Position
}
case class Add(pos: Position, left: Expr, right: Expr) extends Expr
case class Subtract(pos: Position, left: Expr, right: Expr) extends Expr

class MyParser extends Parser(val input: ParserInput) {
private var line = 1 // the current line number
private var lineStartIndex = 0 // the start index of the current line


}

Cheers,
> To view this discussion on the web visit https://groups.google.com/d/msgid/parboiled-user/dcb04997-cc84-43e1-b874-d4b450a2c159%40googlegroups.com.

Bob Ziuchkovski

unread,
Jun 20, 2014, 11:01:47 AM6/20/14
to parboil...@googlegroups.com
> Ideally it would be nice to support something like scala.util.parsing.input.Position, where the push action would automatically set the Position on any value object that includes that trait.  Then an AST tree could look something like:
>
> sealed trait Expr extends Position
> case class Add(left: Expr, right: Expr) extends Expr
> case class Subtract(left: Expr, right: Expr) extends Expr
>
> And the end user could query position without worrying about setting it manually from each rule action.

As this proposal introduces mutability into cases classes it is not something that I’d recommend.

Understood.  That's definitely a downside.  Extending Positional was just one idea on implementation, but even something as straightforward as a Map of AST nodes -> offset they were created, maintained by the Parser, would be helpful.  It gets very repetitive to attach actions to every rule to poll the cursor position and attach to created AST nodes.  I figured this is likely a common problem for users, but I understand if you'd like to keep that out of the core parser lib.
 
Why not go for fully immutable AST nodes?:

   case class Position(index: Int, line: Int, column: Int)

   sealed trait Expr {
     def pos: Position
   }
   case class Add(pos: Position, left: Expr, right: Expr) extends Expr
   case class Subtract(pos: Position, left: Expr, right: Expr) extends Expr

I have approx ~100 different case classes to model the AST (DCE/MSRPC), so it gets very repetitive to attach Position instances directly to each class.  Since it's really only metadata, I was looking to sidecar the position data to reduce noise in my code.

Mathias Doenitz

unread,
Jun 23, 2014, 5:20:34 AM6/23/14
to parboil...@googlegroups.com
> Extending Positional was just one idea on implementation, but even something as straightforward as a Map of AST nodes -> offset they were created, maintained by the Parser, would be helpful. It gets very repetitive to attach actions to every rule to poll the cursor position and attach to created AST nodes.

Have you tried building such a map of AST nodes -> Position?
It shouldn’t require more than a few lines of code from your side to get it done.
> --
> You received this message because you are subscribed to the Google Groups "parboiled2.org User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to parboiled-use...@googlegroups.com.
> Visit this group at http://groups.google.com/group/parboiled-user.
> To view this discussion on the web visit https://groups.google.com/d/msgid/parboiled-user/730d1c12-8416-4aa2-92e6-cb0e0fe215bf%40googlegroups.com.

Bob Ziuchkovski

unread,
Jun 23, 2014, 12:22:44 PM6/23/14
to parboil...@googlegroups.com
On Monday, June 23, 2014 3:20:34 AM UTC-6, Mathias wrote:
> Extending Positional was just one idea on implementation, but even something as straightforward as a Map of AST nodes -> offset they were created, maintained by the Parser, would be helpful.  It gets very repetitive to attach actions to every rule to poll the cursor position and attach to created AST nodes.

Have you tried building such a map of AST nodes -> Position?
It shouldn’t require more than a few lines of code from your side to get it done.

That's probably the route I'll take, but I wanted to request it as a public API feature before I pursued hooking into the private API -- at a glance it looks like I'll probably end up touching some of the areas that are marked "THIS IS NOT PUBLIC API".

Bob

Chad Retz

unread,
Jun 23, 2014, 1:16:43 PM6/23/14
to parboil...@googlegroups.com
I needed position of start and position of end. I tried out a hack like so: https://github.com/cretz/gulliver/blob/fae61614ff7961f582d8bf6cba10703efa820cc4/src/main/scala/gulliver/Parser.scala#L10. I haven't really used it on may of my ASTs yet, but I plan to. Basically I created a Rule0 that pushed the cursor on the value stack and then I make my AST's take an implicit position which pops the latest off of the value stack for the start, and the current for the end. Haven't tested with nesting, but I figured it'd be fine so long as you guarantee that you pop it off upon successful match. Note, I use the value stack directly instead of something like ~ push because I want it implicit, I don't want to accept the position in my actions.

Mathias Doenitz

unread,
Jun 23, 2014, 2:00:33 PM6/23/14
to parboil...@googlegroups.com
Bob,

I don’t see why you’d need to access internal methods for any of this.
Can you show your code?
> --
> You received this message because you are subscribed to the Google Groups "parboiled2.org User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to parboiled-use...@googlegroups.com.
> Visit this group at http://groups.google.com/group/parboiled-user.
> To view this discussion on the web visit https://groups.google.com/d/msgid/parboiled-user/16d4ca02-a7f1-40fa-ba6f-7fb9fb473654%40googlegroups.com.

Mathias Doenitz

unread,
Jun 23, 2014, 2:33:55 PM6/23/14
to parboil...@googlegroups.com
I’d recommend staying within the bounds of the value stack even for things like position tracking.
The type safety the DSL provides comes with very little cost, so there should be no real reason not to rely on it.

One way of managing the position map might be something like this (untested):

var nodes = Map.empty[AstNode, Position]

def someAstNode = rule {
startAstNode ~ … /* match AST node syntax */ … ~> MyAstNode ~ endAstNode
}

def startAstNode = rule { push(cursor) }

def endAstNode[A <: AstNode]: Rule[Int :: A :: HNil, A :: HNil] = rule {
run { (start: Int, node: A) =>
val pos = Position(start, cursor)
nodes = nodes.updated(node, pos)
node
> --
> You received this message because you are subscribed to the Google Groups "parboiled2.org User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to parboiled-use...@googlegroups.com.
> Visit this group at http://groups.google.com/group/parboiled-user.
> To view this discussion on the web visit https://groups.google.com/d/msgid/parboiled-user/c9ba0305-cf53-4fa2-a1fe-d284be87ebf7%40googlegroups.com.

Bob Ziuchkovski

unread,
Jun 23, 2014, 2:58:30 PM6/23/14
to parboil...@googlegroups.com
I don’t see why you’d need to access internal methods for any of this.
Can you show your code?

The code doesn't exist yet (waiting on the outcome of this discussion),  but my tentative plan was to hook into the Parser's __push method so that I don't need to modify the grammar to explicitly poll cursor position for each value that gets pushed.  Grabbing the cursor position will be common to every value that I push.  Since the grammar is moderately complex, I'm trying to avoid too much repetition.

Bob Ziuchkovski

unread,
Jun 23, 2014, 3:02:54 PM6/23/14
to parboil...@googlegroups.com
I’d recommend staying within the bounds of the value stack even for things like position tracking.
The type safety the DSL provides comes with very little cost, so there should be no real reason not to rely on it.

One way of managing the position map might be something like this (untested):

    var nodes = Map.empty[AstNode, Position]

    def someAstNode = rule {
      startAstNode ~ … /* match AST node syntax */ … ~> MyAstNode ~ endAstNode
    }

    def startAstNode = rule { push(cursor) }

    def endAstNode[A <: AstNode]: Rule[Int :: A :: HNil, A :: HNil] = rule {
      run { (start: Int, node: A) =>
        val pos = Position(start, cursor)
        nodes = nodes.updated(node, pos)
        node
      }
    }
 
Thanks for the example.  Ideally I'd love to avoid adding start + end matchings to every AST rule, as there are a fair number of them.  That said, I'll keep this in mind as a viable option.

Bob 

Chad Retz

unread,
Jun 23, 2014, 8:27:14 PM6/23/14
to parboil...@googlegroups.com
Mathias, thanks for that! I didn't even think about putting more on the RHS of the action, perfect. I wasted lots of time trying to override or compose "rule"

Mathias Doenitz

unread,
Jun 24, 2014, 8:47:54 AM6/24/14
to parboil...@googlegroups.com
Ideally you’d be able to factor out the startAstNode / endAstNode wrapping into something like an `astRule` helper, so you can say:

def someAstNode = astRule {
/* match AST node syntax */ … ~> MyAstNode
}

instead of

def someAstNode = rule {
startAstNode ~ … /* match AST node syntax */ … ~> MyAstNode ~ endAstNode
}

However, that requires that `astRule` is implemented as a macro and thus cannot live in the same compilation unit as the code using it, which is unfortunate.
We are still thinking about a good solution to this common problem of wanting to write custom rule transformation logic (which is what this is actually about).
> --
> You received this message because you are subscribed to the Google Groups "parboiled2.org User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to parboiled-use...@googlegroups.com.
> Visit this group at http://groups.google.com/group/parboiled-user.
> To view this discussion on the web visit https://groups.google.com/d/msgid/parboiled-user/55115788-408f-454f-a34d-7c6d3fd2c003%40googlegroups.com.

Bob Ziuchkovski

unread,
Jun 24, 2014, 11:22:29 AM6/24/14
to parboil...@googlegroups.com
On Tuesday, June 24, 2014 6:47:54 AM UTC-6, Mathias wrote:
Ideally you’d be able to factor out the startAstNode / endAstNode wrapping into something like an `astRule` helper, so you can say:

    def someAstNode = astRule {
      /* match AST node syntax */ … ~> MyAstNode
    }

instead of

    def someAstNode = rule {
      startAstNode ~ … /* match AST node syntax */ … ~> MyAstNode ~ endAstNode
    }

However, that requires that `astRule` is implemented as a macro and thus cannot live in the same compilation unit as the code using it, which is unfortunate.
We are still thinking about a good solution to this common problem of wanting to write custom rule transformation logic (which is what this is actually about).

Interesting.  I hadn't considered implementing a helper macro.  Separate compilation is a pain, but I think that might end up the cleanest solution to keep repetition out of the grammar without using the Parser's private API.

I'm thrilled to hear custom rule transformation logic is something that you're trying to solve.  Have you been following scala.meta (formerly project Palladium)?  I think removing the macro separate compilation restriction is one of their goals, though the project is still really young.

Bob

Mathias Doenitz

unread,
Jun 25, 2014, 8:37:49 AM6/25/14
to parboil...@googlegroups.com
FWIW I have just added another test demonstrating how you can already do rule transformation using `val`s and `run`:
https://github.com/sirthias/parboiled2/commit/def74c99f6fe5a2bd55232ec7a36f1a597ff23da

It’s certainly not as nice as it could be but it works and it doesn’t require per-rule-invocation allocations.
> --
> You received this message because you are subscribed to the Google Groups "parboiled2.org User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to parboiled-use...@googlegroups.com.
> Visit this group at http://groups.google.com/group/parboiled-user.
> To view this discussion on the web visit https://groups.google.com/d/msgid/parboiled-user/9044780c-d015-449e-b3e8-d262f8190372%40googlegroups.com.

Bob Ziuchkovski

unread,
Jun 25, 2014, 11:14:14 AM6/25/14
to parboil...@googlegroups.com
Thanks.  I think that approach will work just fine.  I really appreciate the help you've given on this.

Bob

Alexander Myltsev

unread,
Jul 26, 2015, 3:27:04 PM7/26/15
to parboiled2.org User List, bob.ziu...@gmail.com
Hi Bob, 

Would that rule cover your demands: https://github.com/sirthias/parboiled2/pull/152?
Reply all
Reply to author
Forward
0 new messages