[erlang-questions] incremental parsing

Joel Reymont

unread,

Jun 26, 2011, 7:51:30 PM6/26/11

to erlang-questions@erlang.org Questions

Assuming an editor backend written in Erlang,
any suggestions on how to implement incremental parsing of Erlang code?

Thanks, Joel

--------------------------------------------------------------------------
- for hire: mac osx device driver ninja, kernel extensions and usb drivers
---------------------+------------+---------------------------------------
http://wagerlabs.com | @wagerlabs | http://www.linkedin.com/in/joelreymont
---------------------+------------+---------------------------------------

_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Vlad Dumitrescu

unread,

Jun 27, 2011, 2:49:26 AM6/27/11

to Joel Reymont, erlang-questions@erlang.org Questions

Hi Joel,

On Mon, Jun 27, 2011 at 01:51, Joel Reymont <joe...@gmail.com> wrote:
> Assuming an editor backend written in Erlang,
> any suggestions on how to implement incremental parsing of Erlang code?

We're doing a somewhat incremental parsing in erlide. The idea is that
you keep track of the token stream created by the lexer and when there
is a change in the text, you know that only some tokens can be
affected: in the worst case all tokens following the place where the
change was, but if you detect a token that is identical to what it was
before then you don't need to continue. In a similar way, the syntax
tree needs to keep track of the tokens it is created from and when
there are changes only these parts need to get parsed again.This works
ok when there are no syntax errors (which happen all the time while
typing, until one is done). We try to be smart about that, for example
keep track of all the forms in the module and not directly let the
retokenization pass the form boundary: if the user starts typing a
string literal, we don't want the rest of the file to become marked as
a string, but we guess that the string will be ended in the same form
that it was started in.

I said we do "somewhat incremental" parsing because we are using
coarser granularity than possible (i.e. at the form level). We found
it's fast enough and the additional work isn't worth it. The parser
would need deeper changes and I don't like having to maintain a
parallel lexer and parser. The code is at
https://github.com/erlide/erlide/tree/master/org.erlide.kernel.ide/src/,
in erlide_scan.erl and the parser/ directory, hopefully it's
understandable. If not and you want to know more, please ask.

regards,
Vlad

Toby Thain

unread,

Jun 27, 2011, 8:24:44 AM6/27/11

to erlang-q...@erlang.org

On 27/06/11 2:49 AM, Vlad Dumitrescu wrote:
> Hi Joel,
>
> On Mon, Jun 27, 2011 at 01:51, Joel Reymont <joe...@gmail.com> wrote:
>> Assuming an editor backend written in Erlang,
>> any suggestions on how to implement incremental parsing of Erlang code?
>
> We're doing a somewhat incremental parsing in erlide. The idea is that
> you keep track of the token stream created by the lexer and when there
> is a change in the text, you know that only some tokens can be
> affected: in the worst case all tokens following the place where the
> change was, but if you detect a token that is identical to what it was
> before then you don't need to continue. In a similar way, the syntax
> tree needs to keep track of the tokens it is created from and when
> there are changes only these parts need to get parsed again.This works
> ok when there are no syntax errors (which happen all the time while
> typing, until one is done). We try to be smart about that, for example
> keep track of all the forms in the module and not directly let the
> retokenization pass the form boundary: if the user starts typing a
> string literal, we don't want the rest of the file to become marked as
> a string, but we guess that the string will be ended in the same form
> that it was started in.

I wish all syntax highlighting editors were clever like this.

--Toby

>
> I said we do "somewhat incremental" parsing because we are using
> coarser granularity than possible (i.e. at the form level). We found

> it's fast enough and the additional work isn't worth it. ...

Reply all

Reply to author

Forward