Introducing Aulx — an Autocompletion Library for the Web.


Thaddee Tyl

Mar 7, 2013, 6:44:13 AM
to js-t...@googlegroups.com
Hi everyone!
I'd like to introduce the Aulx project.
An example of a running instance is at <http://espadrine.github.com/aulx/>,
the source code is at <https://github.com/espadrine/aulx>.

The purpose is to give any editor advanced autocompletion capabilities
for the languages of the Web: JS and CSS are available right now;
HTML is coming.
It is very easy to plug into an editor, and it has advanced features
(more about this below).

The idea is to reuse the autocompletion system so as to have, for instance,
autocompletion of JS and CSS inside HTML.
Also, as you might expect, the autocompleters for the different languages
share a fair amount of code.

Let me start by comparing my system to its competitors.

I started off this journey by looking at the system used in Orion and Scripted,
but was put off because it is difficult to reuse outside of the Orion project.
Furthermore, it relies on a fork of Esprima with very unpredictable
(and usually poor) performance. That fork attempts to produce an
abstract syntax tree even when the source code is not valid JS.

DoctorJS does the closest thing to a full type analysis short of
actually running the code (which I believe inspired SpiderMonkey's own
type inference algorithm).
It relies heavily on Narcissus, a JS interpreter whose parser is not
acclaimed for its speed.
That doesn't matter for that project, because the analysis itself is
quite expensive.
As a result, it is better suited to a one-off ctags run than to
constantly updated autocompletion.
Also, it only performs a static type analysis.
It may be defunct, but I still use it when coding in vim; I even
contributed to it to make the jsctags command work.

Unlike those, I took speed very seriously. The AST walker isn't a set
of recursive functions; it is a (surprisingly small) while loop.
The static analysis doesn't attempt a thorough type analysis.
It relies on property usage (if a variable A is used with a property B,
then B will show up as a candidate) and on the results of functions
(if a variable A is set to the result of calling B, e.g. `A = B()`,
or even `A = new B()`, then all variables set to the result of B()
will share the same candidate properties).
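Here is how those two rules play out on a tiny file (the names are
made up, and the comments describe what the rules above would offer,
not captured Aulx output):

    // Rule 1: property usage.
    function draw(shape) {
      shape.radius = 2;      // 'radius' becomes a candidate property of shape
    }

    // Rule 2: function results.
    function makeShape() { return {}; }
    var a = makeShape();
    a.color = 'red';         // 'color' becomes a candidate for a...
    var b = makeShape();
    b.color;                 // ...and for b, since both hold results of makeShape()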

It may seem simple until you compare working on the full jQuery source
with Aulx and with Orion. Orion is both slower and less precise.
The only completion for `jQuery.` that Orion's autocompletion gives is
the alphabetically (or rather, ASCII-betically) ordered list of the
properties of Array.prototype: that's the type it inferred.
Aulx gives you the full list of properties you might want, including
'fx', 'Event', 'ready', 'Animation', and so on.

The belief behind the type analysis that I do is that the user wants
autocompletion to help them explore the source code, rather than just
give them type information.
Keep in mind that Aulx is still being worked on to provide more subtle
information, but what it has is already quite useful.

But the project is slightly more than just a static analysis tool.

For starters, it only needs the source code and a {line:, ch:} caret
position to get started.
Knowing what type of completion is needed doesn't even rely on parsing
the file. That part of the system, which I call the contextualizer,
uses a fancy algorithm, proven correct for ES5, which is both very fast
and very resilient. At its heart is Esprima's tokenizer.
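As an illustration of the approach, here is a much-simplified
contextualizer in the same spirit, built on esprima.tokenize. This is
my own sketch, not Aulx's actual code, and it only distinguishes
plain-identifier and property contexts:

    var esprima = require('esprima');

    // Classify the completion the caret needs by looking only at the last
    // tokens before it. Nothing is parsed, so an unfinished statement or
    // invalid code after the caret cannot break it.
    function completionContext(source, caret) {
      var lines = source.split('\n').slice(0, caret.line + 1);
      lines[caret.line] = lines[caret.line].slice(0, caret.ch);
      var tokens;
      try {
        tokens = esprima.tokenize(lines.join('\n'));
      } catch (e) {
        return { type: 'unknown' };
      }
      var last = tokens[tokens.length - 1];
      var prev = tokens[tokens.length - 2];
      if (last && last.type === 'Punctuator' && last.value === '.') {
        return { type: 'property', prefix: '' };
      }
      if (last && last.type === 'Identifier') {
        if (prev && prev.value === '.') {
          return { type: 'property', prefix: last.value };
        }
        return { type: 'identifier', prefix: last.value };
      }
      return { type: 'identifier', prefix: '' };
    }

    // completionContext('foo.ba', {line: 0, ch: 6})
    //   → { type: 'property', prefix: 'ba' }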

It also includes side-effect-free dynamic analysis, given a JS environment.
I would love to see this feature combined with live development
(and V8's live edit features).
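One way to gather completions from a live environment without
triggering side effects is to rely purely on reflection. A minimal
sketch of that idea (my illustration, not Aulx's actual traversal):

    // List the candidate properties of a live object, including inherited
    // ones, without running any code: Object.getOwnPropertyNames and
    // Object.getPrototypeOf read property names without invoking getters.
    function liveCandidates(obj) {
      var names = Object.create(null);
      for (var o = obj; o !== null; o = Object.getPrototypeOf(o)) {
        Object.getOwnPropertyNames(o).forEach(function (name) {
          names[name] = true;
        });
      }
      return Object.keys(names);
    }

    // liveCandidates(document) completes `document.` from the real page
    // without side effects.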
It doesn't stop at JS, either: CSS is there and improving
(I've already started to work on CSS static analysis), HTML will come,
and I hope to have optional languages, too, like CoffeeScript.

I am also exploring ideas for the weighting system.
Right now, the heuristics are a simple combination of the following
two criteria:

- A candidate that is frequently used is heavier,
- A candidate that is used close to the caret is heavier.

A heavier candidate appears first. It yields subjectively good results so far.
I still have ideas to make it better and more dynamic, too.
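In code, the current combination amounts to something like the
following (the weights and data shapes are illustrative, not the
actual ones in Aulx):

    // Sketch of the two criteria: frequent candidates are heavier, and
    // candidates used near the caret are heavier still.
    function weigh(candidate, caretLine) {
      var frequency = candidate.uses.length;
      var nearest = Math.min.apply(null, candidate.uses.map(function (use) {
        return Math.abs(use.line - caretLine);
      }));
      return frequency - 0.1 * nearest;
    }

    var caret = { line: 12, ch: 4 };
    var candidates = [
      { name: 'forEach', uses: [{ line: 2 }, { line: 11 }] },
      { name: 'filter',  uses: [{ line: 40 }] }
    ];
    // Heavier candidates come first: 'forEach' (two uses, one nearby)
    // beats 'filter' (one use, far away).
    candidates.sort(function (a, b) {
      return weigh(b, caret.line) - weigh(a, caret.line);
    });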

This system is likely to make it into Firefox's DevTools,
and I'd love to see it used in other places.

Thanks a lot for reading this!
I'd love to have feedback.

Cheers!
Thaddée Tyl.

John J Barton

Mar 7, 2013, 11:02:42 AM
to js-t...@googlegroups.com
On Thu, Mar 7, 2013 at 3:44 AM, Thaddee Tyl <thadd...@gmail.com> wrote:
> Hi everyone!
> I'd like to introduce the Aulx project.
> An example of a running instance is at <http://espadrine.github.com/aulx/>,
> the source code is at <https://github.com/espadrine/aulx>.
>
> The purpose is to give any editor advanced autocompletion capabilities
> for the languages of the Web: JS and CSS are available right now;
> HTML is coming.
> It is very easy to plug into an editor, and it has advanced features
> (more about this below).
>
> The idea is to reuse the autocompletion system so as to have, for instance,
> autocompletion of JS and CSS inside HTML.

Isn't autocompletion of JS inside of HTML just autocompletion of JS? Does the surrounding HTML affect the completions?

> Also, as you might expect, the autocompleters for the different languages
> share a fair amount of code.
>
> Let me start by comparing my system to its competitors.

Another competitor that may be more important: Sublime Text 2. I find its autocompletion to be better than that of mature systems like Eclipse, for example.

> I started off this journey by looking at the system used in Orion and Scripted,
> but was put off because it is difficult to reuse outside of the Orion project.
> Furthermore, it relies on a fork of Esprima with very unpredictable
> (and usually poor) performance. That fork attempts to produce an
> abstract syntax tree even when the source code is not valid JS.

Did you intend to be critical of the attempt to deal with invalid JS?

> DoctorJS does the closest thing to a full type analysis short of
> actually running the code (which I believe inspired SpiderMonkey's own
> type inference algorithm).
> It relies heavily on Narcissus, a JS interpreter whose parser is not
> acclaimed for its speed.
> That doesn't matter for that project, because the analysis itself is
> quite expensive.
> As a result, it is better suited to a one-off ctags run than to
> constantly updated autocompletion.
> Also, it only performs a static type analysis.

This comparison implies that you use dynamic analysis.

> It may be defunct, but I still use it when coding in vim; I even
> contributed to it to make the jsctags command work.
>
> Unlike those, I took speed very seriously.

(Unfortunately, the editor in your demo is very sluggish; it feels as though it was deliberately styled to be slow.)

> The AST walker isn't a set of recursive functions;
> it is a (surprisingly small) while loop.
> The static analysis doesn't attempt a thorough type analysis.
> It relies on property usage (if a variable A is used with a property B,
> then B will show up as a candidate) and on the results of functions
> (if a variable A is set to the result of calling B, e.g. `A = B()`,
> or even `A = new B()`, then all variables set to the result of B()
> will share the same candidate properties).

Based on the demo, you are doing something more. For example, after 'i' you offer 'item' in the first function, and 'item' is an argument.

Do you compute and use scope (rather than just distance)?

On the other hand, after 't' it does not offer 'this', which seems unfortunate.

> It may seem simple until you compare working on the full jQuery source
> with Aulx and with Orion. Orion is both slower and less precise.
> The only completion for `jQuery.` that Orion's autocompletion gives is
> the alphabetically (or rather, ASCII-betically) ordered list of the
> properties of Array.prototype: that's the type it inferred.
> Aulx gives you the full list of properties you might want, including
> 'fx', 'Event', 'ready', 'Animation', and so on.
>
> The belief behind the type analysis that I do is that the user wants
> autocompletion to help them explore the source code, rather than just
> give them type information.

I guess the user wants autocompletion to guess what they will type next. So I think you mean: type analysis is only one means to the goal of good autocompletion, and thus it can be crude if that makes it fast.

> Keep in mind that Aulx is still being worked on to provide more subtle
> information, but what it has is already quite useful.
>
> But the project is slightly more than just a static analysis tool.

In my experience, the biggest flaw in autocompletion -- after speed, as you say -- is the reliance on single-file analysis. Modular code places the functions for its classes in different files. Using an object of a class from a different file means attempting to call its methods; with single-file analysis, those methods will not be visible to the autocompleter. This issue alone is why Eclipse succeeds despite its terrible performance and lame UI.

> For starters, it only needs the source code and a {line:, ch:} caret
> position to get started.
> Knowing what type of completion is needed doesn't even rely on parsing
> the file. That part of the system, which I call the contextualizer,
> uses a fancy algorithm, proven correct for ES5, which is both very fast
> and very resilient. At its heart is Esprima's tokenizer.

Now this is starting to sound like magic. I have a very hard time matching 'no parsing' with the rest of your description.

> It also includes side-effect-free dynamic analysis, given a JS environment.

Again, this does not match what you say earlier.

> I would love to see this feature combined with live development
> (and V8's live edit features).
> It doesn't stop at JS, either: CSS is there and improving
> (I've already started to work on CSS static analysis), HTML will come,
> and I hope to have optional languages, too, like CoffeeScript.
>
> I am also exploring ideas for the weighting system.
> Right now, the heuristics are a simple combination of the following
> two criteria:
>
> - A candidate that is frequently used is heavier,
> - A candidate that is used close to the caret is heavier.
>
> A heavier candidate appears first. It yields subjectively good results so far.
> I still have ideas to make it better and more dynamic, too.
>
> This system is likely to make it into Firefox's DevTools,
> and I'd love to see it used in other places.

> Thanks a lot for reading this!
> I'd love to have feedback.

Well, my feedback was intended to be supportive, even if my tone is sometimes critical!

jjb

> Cheers!
> Thaddée Tyl.

Andrew Eisenberg

Mar 8, 2013, 12:07:47 PM
to js-t...@googlegroups.com

I'm the author of the Esprima content assist plugin in Orion, as well as of the related work in Scripted. This sounds like an interesting project, and thanks for pointing it out, but so far your demo doesn't seem to do much of what you say it does. That's fine, since I understand this is an early release, but I'd like to know what is a limitation of your approach versus what is simply not yet implemented. For example, if I type:

    var xxx = { yyy : { zzz : 9 } };
    xxx.yyy.|

I would expect to get zzz as a completion, but I don't get anything.  Similarly, I would expect to see zzz as a completion over here:

    var xxx = { yyy : { zzz : 9 } };
    var aaa = xxx.yyy;
    aaa.|

You say that you are not using a parser, only a tokenizer. That's an interesting way of doing things, but it seems that it would limit you, and that may explain some of what I mention above. Also, do you have any plans for integrating cross-file awareness?

Thaddee Tyl

Mar 9, 2013, 1:16:20 PM
to js-t...@googlegroups.com
On Friday, March 8, 2013 6:07:47 PM UTC+1, Andrew Eisenberg wrote:
> I'm the author of the Esprima content assist plugin in Orion, as well as of the related work in Scripted. This sounds like an interesting project, and thanks for pointing it out, but so far your demo doesn't seem to do much of what you say it does. That's fine, since I understand this is an early release, but I'd like to know what is a limitation of your approach versus what is simply not yet implemented. For example, if I type:
>
>     var xxx = { yyy : { zzz : 9 } };
>     xxx.yyy.|
>
> I would expect to get zzz as a completion, but I don't get anything.

This works as of today.

> Similarly, I would expect to see zzz as a completion over here:
>
>     var xxx = { yyy : { zzz : 9 } };
>     var aaa = xxx.yyy;
>     aaa.|

That requires more work, but will happen eventually.

> You say that you are not using a parser, only a tokenizer. That's an interesting way of doing things, but it seems that it would limit you, and that may explain some of what I mention above.

I actually do use a parser, but only rarely, and just for the static analysis.
Obtaining information about the current position of the cursor, however,
relies solely on a tokenizer.

> Also, do you have any plans for integrating cross-file awareness?

I am looking into it. Do you have any advice on the subject?

Peter van der Zee

Mar 9, 2013, 2:57:30 PM
to js-t...@googlegroups.com
I can back this up. That is, assuming that by "only use a tokenizer"
you mean you're just using the token stream, and not "just a tokenizer,
no parser at all".

You'll find that for the vast majority of static analysis (in JS at
least, though I expect this holds more generally), it is sufficient to
just look at the tokens that precede or follow a given token. In JS, it
helps if each function keyword carries a reference to the start and the
end of its body, since that allows jumping over function headers and
bodies arbitrarily.
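For instance, something like this links each `function` token to its
body's boundaries in one pass over the token stream (my own naive
pairing, not ZeonJS code; it leans on the fact that in ES5 no brace can
appear between the `function` keyword and the `{` that opens its body):

    // Give each `function` keyword token pointers to the `{` and `}` of
    // its body, so later passes can skip whole functions in one jump.
    function linkFunctionBodies(tokens) {
      for (var i = 0; i < tokens.length; i++) {
        if (tokens[i].type === 'Keyword' && tokens[i].value === 'function') {
          // The first `{` after the parameter list opens the body.
          var j = i;
          while (tokens[j].value !== '{') j++;
          tokens[i].bodyStart = j;
          // Walk the brace depth to find the body's matching `}`.
          var depth = 0;
          do {
            if (tokens[j].value === '{') depth++;
            else if (tokens[j].value === '}') depth--;
            j++;
          } while (depth > 0);
          tokens[i].bodyEnd = j - 1;
        }
      }
    }

    // e.g. linkFunctionBodies(require('esprima').tokenize(source));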

I'm not counting scope-related checks here, because they only need the
hierarchy of scopes, starting from the current function. No need for a
parse tree there either; just stick the scope onto the function keyword
and you should be good to go. Of course, it is the parser that
generates this scope tree :)

Just wanted to put that out there. In my experience building ZeonJS,
I found that I did most things with the token stream. (Of course, there
are things for which you do need a parse tree.)

- peter

Andrew Eisenberg

Mar 9, 2013, 5:07:23 PM
to js-t...@googlegroups.com
>> Also, do you have any plans for integrating cross-file awareness?
> I am looking into it. Do you have any advice on the subject?

Take a look at Scripted. It tracks down (AMD and CommonJS) module
references by searching through the filesystem and (for AMD modules)
looking for require configuration blocks.

Thaddee Tyl

Aug 20, 2013, 2:02:35 PM
to js-t...@googlegroups.com
I did a (late) write-up of the techniques I used.

http://isawsomecode.tumblr.com/post/58801223806/aulx-the-tricks

I started to work on cross-file awareness, but that isn't really in yet.

Steven Roussey

Aug 21, 2013, 11:40:01 AM
to js-t...@googlegroups.com
You talk about hover information for type info; does that include JSDoc comments?

--
Steven Roussey

Thaddee Tyl

Aug 21, 2013, 12:36:48 PM
to js-t...@googlegroups.com
On Wed, Aug 21, 2013 at 3:40 PM, Steven Roussey <srou...@gmail.com> wrote:
> You talk about hover information for type info; does that include JSDoc
> comments?

Fetching that information would definitely be a great addition! It's
not in yet, though: I only look at the uses of each variable in the
source.

I believe Scripted does that.

Kevin Dangoor

Aug 21, 2013, 1:35:41 PM
to js-t...@googlegroups.com
Tern also looks at JSDoc comments when doing hinting.