recovering syntax from an AST

29 views
Skip to first unread message

Dan Kortschak

unread,
Apr 23, 2024, 2:40:32 AMApr 23
to 'Dan Kortschak' via CEL Go Discussion Forum
I would like to be able to use a CEL source formatter, so if this
already exists and I have missed it, the rest of this email can be
ignored.

To get this I need to made a source printer that can take an AST (of
either flavour), be able to walk it and obtain the syntax from the
source based on positions held by nodes in the AST. To investigate
this, I've been looking at what is available from the nodes during a
walk and it seems that end positions are not available; they always
match the node position.

https://go.dev/play/p/sYuS-1r2YJ_f

looks at
```
{
"aye": 12 <= 1,
"bee": {"dee": state.a.size()},
}
```

output:
```
expr id=1 MapKind {Start:0 Stop:0} ``: (0…0)
entry expr id=2 MapEntryKind {Start:9 Stop:9} ``: (7…7)
expr id=3 LiteralKind {Start:4 Stop:4} ``: (2…2)
expr id=5 CallKind {Start:14 Stop:14} ``: (12…12)
expr id=4 LiteralKind {Start:11 Stop:11} ``: (9…9)
expr id=6 LiteralKind {Start:17 Stop:17} ``: (15…15)
entry expr id=7 MapEntryKind {Start:27 Stop:27} ``: (7…7)
expr id=8 LiteralKind {Start:22 Stop:22} ``: (2…2)
expr id=9 MapKind {Start:29 Stop:29} ``: (9…9)
entry expr id=10 MapEntryKind {Start:35 Stop:35} ``: (15…15)
expr id=11 LiteralKind {Start:30 Stop:30} ``: (10…10)
expr id=14 CallKind {Start:49 Stop:49} ``: (29…29)
expr id=13 SelectKind {Start:42 Stop:42} ``: (22…22)
expr id=12 IdentKind {Start:37 Stop:37} ``: (17…17)
```

The documentation for GetStopLocation[1] says

> If the SourceInfo was generated from a serialized protobuf
representation, the stop location will be identical to the start
location for the expression.

but this is not the case here. The issue appears to be in
`(*github.com/google/cel-go/parser).parserHelper.id` which only
provides an informative end pos for `antlr.ParserRuleContext` types[2].

It looks like the antlr bindings don't provide a way to obtain this
information in a sensible way, though it does provide access to the
actual syntax, so if I make the following change, I can get the
information that is needed.

```
diff --git a/parser/helper.go b/parser/helper.go
index 182ff03..dc0c540 100644
--- a/parser/helper.go
+++ b/parser/helper.go
@@ -140,15 +140,12 @@ func (p *parserHelper) id(ctx any) int64 {
var offset ast.OffsetRange
switch c := ctx.(type) {
case antlr.ParserRuleContext:
- start, stop := c.GetStart(), c.GetStop()
- if stop == nil {
- stop = start
- }
+ start := c.GetStart()
offset.Start = p.sourceInfo.ComputeOffset(int32(start.GetLine()), int32(start.GetColumn()))
- offset.Stop = p.sourceInfo.ComputeOffset(int32(stop.GetLine()), int32(stop.GetColumn()))
+ offset.Stop = offset.Start + int32(len(c.GetText()))
case antlr.Token:
offset.Start = p.sourceInfo.ComputeOffset(int32(c.GetLine()), int32(c.GetColumn()))
- offset.Stop = offset.Start
+ offset.Stop = offset.Start + int32(len(c.GetText()))
case common.Location:
offset.Start = p.sourceInfo.ComputeOffset(int32(c.Line()), int32(c.Column()))
offset.Stop = offset.Start
```

new output:
```
expr id=1 MapKind {Start:0 Stop:1} `{`: (0…1)
entry expr id=2 MapEntryKind {Start:9 Stop:10} `:`: (7…8)
expr id=3 LiteralKind {Start:4 Stop:9} `"aye"`: (2…7)
expr id=5 CallKind {Start:14 Stop:16} `<=`: (12…14)
expr id=4 LiteralKind {Start:11 Stop:13} `12`: (9…11)
expr id=6 LiteralKind {Start:17 Stop:18} `1`: (15…16)
entry expr id=7 MapEntryKind {Start:27 Stop:28} `:`: (7…8)
expr id=8 LiteralKind {Start:22 Stop:27} `"bee"`: (2…7)
expr id=9 MapKind {Start:29 Stop:30} `{`: (9…10)
entry expr id=10 MapEntryKind {Start:35 Stop:36} `:`: (15…16)
expr id=11 LiteralKind {Start:30 Stop:35} `"dee"`: (10…15)
expr id=14 CallKind {Start:49 Stop:50} `(`: (29…30)
expr id=13 SelectKind {Start:42 Stop:43} `.`: (22…23)
expr id=12 IdentKind {Start:37 Stop:42} `state`: (17…22)
```

Is this a reasonable change and should I send a PR or some equivalent?

thanks
Dan

[1]https://pkg.go.dev/github.com/google/cel...@v0.20.1/common/ast#SourceInfo.GetStopLocation
[2]https://github.com/google/cel-go/blob/057d4c8318b3901a9d7769832752ef3df87da9ac/parser/helper.go#L139-L165

Tristan Swadell

unread,
Apr 23, 2024, 12:38:06 PMApr 23
to Dan Kortschak, 'Dan Kortschak' via CEL Go Discussion Forum
Hi Dan,

This is a pretty reasonable change. I recall having trouble getting meaningful source token stopping points in the past, but what you're proposing is reasonable.

The parser/unparser.go file has a very limited sense of formatting which also relies on enabling macro call tracking in order to recover the unexpanded macro call sites. If you are interested in updating the formatting logic, I'd be happy to review the PRs.

Cheers,

-Tristan

--
You received this message because you are subscribed to the Google Groups "CEL Go Discussion Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cel-go-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cel-go-discuss/b4333e8730ce7f614f911e2c3d483672d9cb2cb9.camel%40kortschak.io.

Dan Kortschak

unread,
Apr 23, 2024, 10:40:05 PMApr 23
to cel-go-...@googlegroups.com
On Tue, 2024-04-23 at 09:37 -0700, 'Tristan Swadell' via CEL Go
Discussion Forum wrote:
> This is a pretty reasonable change. I recall having trouble getting
> meaningful source token stopping points in the past, but what you're
> proposing is reasonable.
>
> The parser/unparser.go file has a very limited sense of formatting
> which also relies on enabling macro call tracking in order to recover
> the unexpanded macro call sites. If you are interested in updating
> the formatting logic, I'd be happy to review the PRs.

I'll take a look.

The initial change has hit a bump though, the common/ast/TestASTCopy
test fails because it relies on the old behaviour; end == start and
that matching the documented behaviour of ranges that have gone through
protobuf serialisation.

There are two approaches I can take here; 1. walk the ast and conform
the end pos to the start, or 2. use go-cmp and ask it to ignore the end
pos.

Are either of these OK with you (I'd prefer the latter), or is there
another approach you'd prefer.

cheers
Dan

Tristan Swadell

unread,
Apr 24, 2024, 3:54:03 PMApr 24
to Dan Kortschak, cel-go-...@googlegroups.com
Hi Dan,

Since the end position isn't serializable into the proto that CEL uses on the wire, I would use go-cmp and ignore the end position during comparison. Option 2, ftw!

Cheers,

-Tristan

--
You received this message because you are subscribed to the Google Groups "CEL Go Discussion Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cel-go-discus...@googlegroups.com.

Dan Kortschak

unread,
May 5, 2024, 2:15:27 AMMay 5
to cel-go-...@googlegroups.com
On Tue, 2024-04-23 at 09:37 -0700, 'Tristan Swadell' via CEL Go
Discussion Forum wrote:
> The parser/unparser.go file has a very limited sense of formatting
> which also relies on enabling macro call tracking in order to recover
> the unexpanded macro call sites. If you are interested in updating
> the formatting logic, I'd be happy to review the PRs.

I've had a look at this and in the first instance I'd like to make a
third-party package. If it looks like something that would be good to
bring back into to the cel-go repo, then that's something that can be
considered after. I think that probably the goals of the two things are
slightly different, but maybe they could live in parallel.

cheers
Dan


Dan Kortschak

unread,
Jul 26, 2024, 2:02:25 AM (yesterday) Jul 26
to cel-go-...@googlegroups.com
On Sun, 2024-05-05 at 06:15 +0000, 'Dan Kortschak' via CEL Go
I have put together something that works for out needs and could
possibly form the basis for discussion of a more general approach. It
is available here https://github.com/efd6/celfmt.

It performs a complete compilation rather that just a parsing of the
source (I found this necessary to get macros to format properly at some
stage, but I haven't revisited that), and I had some issues getting
token position information, so for examples I can't perfectly reliably
recover string literal formatting when a triple quote is at the start
of a new line.

Dan

Reply all
Reply to author
Forward
0 new messages