Source positions, i.e. line and column numbers

johnjbarton

Aug 12, 2011, 5:50:52 PM

to UglifyJS

Hi. I need to track the positions of (at least) functions in the
source.

Currently uglify has the embed_tokens option which replaces the syntax
tree node name with a NodeWithToken object. The object has .name
giving the node name, plus .start and .end objects giving the boundary
of the corresponding source token.

This means that code written for embed_tokens === true is not
compatible with code written for the default output. You have to change
every access to ast[0] into ast[0].name.
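
To make the incompatibility concrete, here is a minimal sketch. The NodeWithToken constructor below is a stand-in written for this example from the description above (a .name plus .start and .end); it is not copied from the parser, and the line/col fields inside the positions are an assumption.

```javascript
// Hypothetical stand-in for the parser's NodeWithToken, based only on
// the description above: a name plus start/end token boundaries.
function NodeWithToken(name, start, end) {
    this.name = name;
    this.start = start;
    this.end = end;
}

// Default output: the node name is a plain string in slot 0.
var plainNode = ["name", "foo"];

// embed_tokens output: slot 0 holds an object instead (assumed positions).
var tokenNode = [
    new NodeWithToken("name", { line: 1, col: 4 }, { line: 1, col: 7 }),
    "foo"
];

// Code written for the default output compares slot 0 directly...
function isNameNode(node) {
    return node[0] === "name";
}

// ...while code written for embed_tokens has to reach through .name.
function isNameNodeWithTokens(node) {
    return node[0].name === "name";
}

var results = {
    plainWithPlainCheck: isNameNode(plainNode),
    tokenWithPlainCheck: isNameNode(tokenNode),
    tokenWithTokenCheck: isNameNodeWithTokens(tokenNode),
    plainWithTokenCheck: isNameNodeWithTokens(plainNode)
};
```

Each check only succeeds on the tree shape it was written for: the two matching combinations give true and the two mixed ones give false.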

On my fork I made a small change to move the NodeWithToken from ast[0]
to ast.loc. That way my parse tree analysis code written with
embed_tokens off works with embed_tokens true.

Of course my test is very minimal. Does anyone know whether this
should work or, if not, what problems it may cause? Or is there a way
to achieve this goal without changing the parser source? (I guess I
would parse with embed_tokens true and walk the tree to convert from
inline to .loc versions.)

Thanks,
jjb

Mihai Călin Bazon

Aug 13, 2011, 5:03:44 AM

to ugli...@googlegroups.com

On Sat, Aug 13, 2011 at 12:50 AM, johnjbarton <johnj...@johnjbarton.com> wrote:

> Hi. I need to track the positions of (at least) functions in the
> source.
>
> Currently uglify has the embed_tokens option which replaces the syntax
> tree node name with a NodeWithToken object. The object has .name
> giving the node name, plus .start and .end objects giving the boundary
> of the corresponding source token.

... and a toString() method, which means ast[0] == "defun" will work whether ast[0] is a NodeWithToken or a string. But this is an ugly hack, of course. UglifyJS AST processors used to work with embed_tokens, but I'm not sure this is still the case: some places now use „switch”, which compares with strict equality.
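
The difference between loose comparison, strict comparison and switch can be checked with a small sketch. The NodeWithToken here is again a hypothetical stand-in with the toString() method described above, and the position values are made up for the example.

```javascript
// Hypothetical stand-in for NodeWithToken, including the toString()
// method mentioned above.
function NodeWithToken(name, start, end) {
    this.name = name;
    this.start = start;
    this.end = end;
}
NodeWithToken.prototype.toString = function () {
    return this.name;
};

var head = new NodeWithToken("defun", { line: 1, col: 0 }, { line: 1, col: 8 });

// Loose equality converts the object to a primitive, which ends up
// calling toString(), so the comparison succeeds.
var looseMatch = (head == "defun");

// Strict equality never converts: an object is never === a string.
var strictMatch = (head === "defun");

// switch compares its cases with strict equality, so the "defun" case
// is skipped when the head is an object.
function classify(nodeHead) {
    switch (nodeHead) {
        case "defun":
            return "function declaration";
        default:
            return "unrecognized";
    }
}

var viaSwitch = classify(head);                // object head
var viaSwitchString = classify(String(head));  // normalized to a string first
```

Here looseMatch is true and strictMatch is false; classify(head) falls through to "unrecognized", while normalizing the head with String() first gives "function declaration".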

> On my fork I made a small change to move the NodeWithToken from ast[0]
> to ast.loc. That way my parse tree analysis code written with
> embed_tokens off works with embed_tokens true.

Does this mean that you turned all nodes into objects instead of arrays? Seems like a comprehensive change...

> Of course my test is very minimal. Does any one know whether this
> should work or if not what problems it may cause? Or is there a way
> to achieve this goal without changing the parser source? (I guess I
> would parse with embed_tokens true and walk the tree to convert from
> inline to .loc versions.)

I can't be sure about how safe your change is. But indeed, it would be rather trivial to write an AST processor that walks the tree and returns object nodes instead of arrays. I'd do this rather than modifying the parser.
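
A post-processing walk of this kind could look roughly like the sketch below. It is the variant that keeps nodes as arrays and moves the embedded token to a .loc property, rather than one that returns object nodes. The helper names isTokenHead and moveTokensToLoc are invented for this example, the token shape (.name, .start, .end) is taken from the description in this thread, and the sample tree is hand-built rather than real parser output.

```javascript
// True for a value shaped like the NodeWithToken described in this
// thread (assumed shape: a .name string plus .start and .end).
function isTokenHead(value) {
    return value !== null && typeof value === "object" &&
        !Array.isArray(value) && typeof value.name === "string" &&
        "start" in value && "end" in value;
}

// Hypothetical helper: returns a copy of the tree in which every
// embedded token head is replaced by its plain name, with the
// positions kept on the node under .loc.
function moveTokensToLoc(node) {
    if (!Array.isArray(node)) {
        return node; // strings, numbers, null and undefined pass through
    }
    var copy = node.map(function (child) {
        return moveTokensToLoc(child);
    });
    if (isTokenHead(node[0])) {
        copy[0] = node[0].name;
        copy.loc = { start: node[0].start, end: node[0].end };
    }
    return copy;
}

// Hand-built sample tree (not real parser output).
var tree = ["toplevel", [
    [
        { name: "defun", start: { line: 1, col: 0 }, end: { line: 1, col: 17 } },
        "f", [], []
    ]
]];

var converted = moveTokensToLoc(tree);
var defun = converted[1][0];
```

After the walk, defun[0] is the plain string "defun" and defun.loc.start holds the original start position, so code written for the default output can keep comparing ast[0] against strings. The input tree is left unmodified because the walk builds copies.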

Cheers,
--
Mihai Bazon,
http://mihai.bazon.net/blog

John J Barton

Aug 14, 2011, 12:05:39 PM

to ugli...@googlegroups.com

2011/8/13 Mihai Călin Bazon <mihai...@gmail.com>:

> On Sat, Aug 13, 2011 at 12:50 AM, johnjbarton <johnj...@johnjbarton.com>
> wrote:
>>
>> Hi. I need to track the positions of (at least) functions in the
>> source.
>>
>> Currently uglify has the embed_tokens option which replaces the syntax
>> tree node name with a NodeWithToken object. The object has .name
>> giving the node name, plus .start and .end objects giving the boundary
>> of the corresponding source token.
>
> ... and a toString() method, which means ast[0] == "defun" will work both if
> ast[0] is a NodeWithToken or a string. But this is an ugly hack, of
> course. UglifyJS AST processors used to work with embed_tokens, but I'm not
> sure this is the case anymore (using „switch” in some places, and that
> involves strict equality).

Yes, I guess I have === wired into my fingers now.

>
>>
>> On my fork I made a small change to move the NodeWithToken from ast[0]
>> to ast.loc. That way my parse tree analysis code written with
>> embed_tokens off works with embed_tokens true.
>
> Does this mean that you turned all nodes into objects instead of arrays?
> Seems like a comprehensive change...

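No. JavaScript arrays are themselves objects (typeof reports "object"), so a
node stays an array and can still carry an extra property: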

>>> var anArray = ['decl',['name','foo'],undefined];
undefined
>>> typeof(anArray);
"object"
>>> anArray instanceof Array
true
>>> anArray.loc = {line:1, col:4};
Object { line=1, col=4}
>>> typeof(anArray);
"object"
>>> anArray instanceof Array
true
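
As for what problems this may cause, a few limits of properties attached to arrays are worth checking. The sketch below relies only on standard JavaScript behavior and says nothing about how UglifyJS itself treats its nodes; whether any of this matters depends on whether the code handling the tree copies, serializes or enumerates nodes.

```javascript
var node = ["decl", ["name", "foo"], undefined];
node.loc = { line: 1, col: 4 };

// The extra property does not change the length or the indexed items.
var lengthUnchanged = node.length === 3;

// Copies made with slice() (likewise map, concat, filter) carry only
// the indexed elements, so .loc is missing on the copy.
var copy = node.slice();
var copyHasLoc = "loc" in copy;

// JSON.stringify serializes only the indexed elements of an array.
var json = JSON.stringify(node);

// for...in enumerates the extra property alongside the indices.
var keys = [];
for (var key in node) {
    keys.push(key);
}
```

Here copyHasLoc is false, json contains no trace of loc, and keys ends with "loc" after the indices "0", "1" and "2". Any pass that rebuilds nodes by copying would therefore need to carry .loc across explicitly.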

>
>>
>> Of course my test is very minimal. Does any one know whether this
>> should work or if not what problems it may cause? Or is there a way
>> to achieve this goal without changing the parser source? (I guess I
>> would parse with embed_tokens true and walk the tree to convert from
>> inline to .loc versions.)
>
> I can't be sure about how safe your change is. But indeed, it would be
> rather trivial to write an AST processor that walks the tree and returns
> object nodes instead of arrays. I'd do this rather than modifying the
> parser.