Towards an Incremental Racket Parser for better IDE experience?

110 views
Skip to first unread message

nicobao

unread,
Dec 2, 2020, 9:53:52 AM12/2/20
to Racket Users
Hi!

The Racket Reader and the Racket Expander always return "Error : blabla" when you send it a bad Racket source code.
As a consequence, when there is a source code error, DrRacket and the Racket LSP cannot provide IDE functionalities like "find references", "info on hover", "find definition"...etc.
This is an issue, because 99% of the time one write code, the code is incorrect. Other languages (Rust, Typescript/JS, Java, OCaml...etc) rely on an incremental parser than can provide a tree even if the source code is wrong. Basically it adds an "ERROR" node in the tree, and go on instead of stopping everything and returning at the first error.
Currently this compiler issue is blocking the Racket IDE to provide better user experience.
For my practical use case of Racket, it is important.

I would like to help working towards that direction.
I see two possible solutions to that:
1) improve the recursive descent parser of the Reader, as well as the Expander to make them incremental and fault-tolerant
2) re-writing the parser in something like tree-sitter or Menhir, at the cost of having to re-write the Reader/Expander logic (!!!)

Both solutions are daunting tasks.

For solution 1), could you point me to the Racket's recursive descent parser source code? What about the Expander ?

For solution 2), I was thinking of writing a tree-sitter grammar for racket. However, I can't find a formal description of the grammar, like Scheme did here:
https://www.scheme.com/tspl4/grammar.html#APPENDIXFORMALSYNTAX
Of course, the Racket documentation is still quite comprehensive, but it would be nice if anyone could tell me if there is such formal document somewhere?
Besides, I wonder whether Racket/Scheme could even be described using a LR(1) or a GLR grammar?

Finally, is any work have been started towards this direction?

Totally off-topic, but has anyone ever thought of compiling Racket down to OCaml, in order to reuse js_of_ocaml and produce optimized JS code from Racket?
I was wondering whether it would be feasible.

Final note: I know all of that is _very_ ambitious!

Kind regards,
Nicolas

Robby Findler

unread,
Dec 2, 2020, 10:08:30 AM12/2/20
to nicobao, Racket Users
I'm not sure this approach is going to work for Racket. Being able to run `read` when the input is malformed is going to only get you so far, as the macro expansion seems unlikely to work and this is a point of extension for programers using Racket.

Maybe a better approach would be to help DrRacket be better at keeping information from the last time the expansion was successful and apply that information to a program that has been edited since then?

Robby


--
You received this message because you are subscribed to the Google Groups "Racket Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/d77440e3-1876-44e5-b52b-323d5715df66n%40googlegroups.com.

Sorawee Porncharoenwase

unread,
Dec 2, 2020, 10:10:40 AM12/2/20
to Robby Findler, nicobao, Racket Users

IIUC, many tools (Racket Mode, drcomplete) do exactly what Robby said: cache the most recent successfully expanded code.



Robby Findler

unread,
Dec 2, 2020, 10:12:04 AM12/2/20
to Sorawee Porncharoenwase, nicobao, Racket Users
Oh, and I should have pointed out that, at least for the read level, DrRacket already is incremental and can recover from errors. This helps only with the syntax coloring, however.

Robby

Sam Tobin-Hochstadt

unread,
Dec 2, 2020, 1:56:58 PM12/2/20
to nicobao, Racket Users
A few thoughts on these topics, which I've been thinking about for a while.

First, let's distinguish two things. One is an _incremental_ system,
such as a parser, which is one which does less work in response to a
small change than it would need to do from scratch. The other is a
system with _error recovery_, which is one where in the presence of
one error, the system can still provide a useful answer and/or
continue on to discover other errors. tree-sitter, for example, aims
to do both of these, but they're quite different.

With that in mind, several points:

1. It would be relatively straightforward to build an incremental
_reader_ -- going from text to s-expressions. You could start from the
grammar here: https://github.com/racket/parser-tools/blob/master/parser-tools-lib/parser-tools/examples/read.rkt
which is just for Scheme, and the lexer here:
https://github.com/racket/syntax-color/blob/master/syntax-color-lib/syntax-color/racket-lexer.rkt
which is for full Racket, which as Robby says is already
error-tolerant. The read syntax (in the absence of reader extensions)
is definitely context-free and probably LR(1). The code for the reader
is here: https://github.com/racket/racket/tree/master/racket/src/expander/read

However, just calling `read` from scratch every time isn't a big
bottleneck -- the biggest Racket-syntax file I have around is about
86000 lines and takes 700ms to `read`.

2. As Robby points out, the big challenge is the macro expander, which
is (a) not a grammar, (b) large and complicated (the code is here:
https://github.com/racket/racket/tree/master/racket/src/expander and
it's about 35k lines) and (c) it runs arbitrary Racket code in the
form of macros. I'm definitely interested in thinking about what an
incremental expander would look like, but that's a big research
project and probably would require a different model of macros than
Racket has right now. It would not work to use some existing parsing
toolkit like tree-sitter. You could perhaps write a new macro expander
using an incremental computation framework such as Adapton
[https://docs.rs/adapton/0.3.31/adapton/] or write something like
Adapton for Racket. How well that would work is an interesting
question. You could also rewrite the macro expander to be incremental
more directly.

3. An error-tolerant macro expander is more plausible, but would again
require substantial changes to the expander. One possible idea is to
use the information the macro stepper already uses to reconstruct the
partial program right before it went wrong, and supply that to the IDE
to use for completion/etc. Another idea would be to replace pieces of
erroneous syntax with something that allows the expander to continue
(this is how error-tolerant parsers work). There are probably lots
more ideas that we could come up with.

4. Compiling to one of the OCaml intermediate languages is an
interesting idea -- I've thought about their flambda language as a
possible target before. The place to start is the `schemify` layer:
https://github.com/racket/racket/tree/master/racket/src/schemify that
turns fully-expanded Racket code into Scheme code for Chez Scheme.
Changing that to produce flambda would be plausible, although there
are a lot of mismatches between the languages that would be tricky to
overcome. Another possibility would be to directly produce JavaScript
from that layer. You might be interested in the RacketScript project:
https://github.com/vishesh/racketscript

If you're interested in thinking more about these topics, or working
on them, I'm happy to offer more advice.

Sam

nicobao

unread,
Dec 9, 2020, 12:46:40 PM12/9/20
to Racket Users
Hi all,

I've read with great attention your messages, especially Sam's very comprehensive answer.
I now clearly understand that it's a research-level work, and I was definitely too ambitious in trying to dig into that - as I have limited time after my day job (and probably too limited knowledge too, but that's a non-problem as it wouldn't be a one-person task anyway).
Nevertheless, I really appreciate the exchange.

Kind regards,
Nicolas

Sam Tobin-Hochstadt

unread,
Dec 9, 2020, 2:24:28 PM12/9/20
to nicobao, Racket Users
Hi Nicolas,

I do want to encourage you to keep thinking about this stuff -- some
of the things I described are definitely doable and would be
interesting projects, even if re-writing the entire expander is a big
task.

Sam
> To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/1a94fbdb-69f5-4c29-9dcf-4349de89ac16n%40googlegroups.com.

William G Hatch

unread,
Dec 14, 2020, 6:43:32 PM12/14/20
to Sam Tobin-Hochstadt, nicobao, Racket Users
On Wed, Dec 02, 2020 at 01:56:41PM -0500, Sam Tobin-Hochstadt wrote:
>A few thoughts on these topics, which I've been thinking about for a while.
>
>First, let's distinguish two things. One is an _incremental_ system,
>such as a parser, which is one which does less work in response to a
>small change than it would need to do from scratch. The other is a
>system with _error recovery_, which is one where in the presence of
>one error, the system can still provide a useful answer and/or
>continue on to discover other errors. tree-sitter, for example, aims
>to do both of these, but they're quite different.


I thought I might as well chime in here that I'm currently working on
an incremental parsing system for Racket that will support parser
combinators, BNF, and arbitrary ad-hoc parsing procedures. I already
have a parsing system[1] that supports expressive parsing with any mix
of those constructors that supports all context-free grammars and
beyond, so that much is already “successful”. Unfortunately, I
haven't been able to make it as performant as I would like. The
original goal wasn't to support incremental parsing, and I didn't
write incremental support into that implementation. However, while
trying to optimize it I also decided to figure out how to make it
incremental, so now I have the solution for that too.

I'm now working on another similar parsing system that is a little
less expressive (in particular it has more limited support for
ambiguous grammars, and thus doesn't support the full class of
context-free grammars, though it will still support useful things
outside of CFG, such as non-CFG things that Racket and Scribble
parsers do), but should be much faster (because it gets to avoid extra
work supporting the possibility of ambiguity). I'm adding my solution
to incremental parsing support to this version. If I can ever make
the original algorithm faster I'll add incremental support to it too.
Both systems leverage delimited continuations at their core -- it
turns out they are really useful for parsing, both for improving
expressiveness and for incrementality.

That said, I'm not currently aiming to solve the error recovery part
of the problem.

[1] https://github.com/willghatch/racket-chido-parse

Hendrik Boom

unread,
Dec 14, 2020, 11:35:17 PM12/14/20
to Racket Users
Error recovery on reparsing?? Just use the parse you had from the
last good parse, but mark the parse tree where it doesn't match
the parsed string.

-- hendrik

>
> [1] https://github.com/willghatch/racket-chido-parse
>
> --
> You received this message because you are subscribed to the Google Groups "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to racket-users...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/racket-users/20201214234324.GA13213%40conspirator.
Reply all
Reply to author
Forward
0 new messages