Walking a JavaScript source tree

Jeremy Gillick

Dec 28, 2005, 9:45:57 PM

I'm trying to write a tool, similar to JavaDoc, that will take JavaScript source code and find all the functions, classes, methods & properties to build documentation from.  I could use a whole bunch of regular expressions to do the job, but I would rather use either Rhino or SpiderMonkey for complete ECMAScript standard compliance (why reinvent the wheel?).

The most important thing I need to get is a list of ALL JS elements (functions, methods, classes & properties) *with* their line numbers.

So far I've been playing with Rhino and can get it to give me all the functions and their line numbers, but cannot get any of the class properties.

Is this possible to do with either Rhino or SpiderMonkey?  Can somebody explain how?  Code examples would be great!

Thanks,
Jeremy

Brendan Eich

Dec 29, 2005, 1:17:01 PM
to Jeremy Gillick
Jeremy Gillick wrote:
>
> The most important thing I need to get is a list of ALL JS elements
> (functions, methods, classes & properties) **with** their line numbers.

No classes in JS1.x -- what do you mean?

> Is this possible to do with either Rhino or SpiderMonkey? Can somebody
> explain how? Code examples would be great!

For SpiderMonkey, see jsparse.h. The JSParseNode discriminated union is
documented there in a comment laid out as a minimal mock table. Read for
a comma after a TOK_* type entry in the left column: it means that
column's token type entry continues onto the next row, and the table
cells to its right span the same rows (if you get what I mean).

The JS_FRIEND_API entry point of interest is js_ParseTokenStream. The
memory for the JSParseNodes is allocated from cx->tempPool, so you will
have to include jsarena.h too, and use JS_ARENA_MARK before parsing, and
JS_ARENA_RELEASE after successfully parsing and walking the parse node
tree to do whatever it is you want to do, to ultimately free that space.

Good luck. If you develop a separate jsparseapi.[ch] pair of files to
hide these details better, with some kind of visitor design pattern for
the tree walk, and abstraction from the messy, concrete details of the
JSParseNode struct, that would be winning.
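
One way to picture the visitor-style tree walk suggested above, as a minimal JavaScript sketch rather than C. The node shape (`type`, `lineno`, `children`) and the `walk` helper are hypothetical; they are not SpiderMonkey's JSParseNode layout, and they are not a proposed jsparseapi interface:

```javascript
// Hypothetical parse-node shape: { type, lineno, children }.
// Illustration only; this is not the JSParseNode struct.
function walk(node, visitor) {
    // Call the handler named after the node type, if the visitor defines one.
    if (Object.prototype.hasOwnProperty.call(visitor, node.type)) {
        visitor[node.type](node);
    }
    // Visit children in source order (pre-order traversal).
    var kids = node.children || [];
    for (var i = 0; i < kids.length; i++) {
        walk(kids[i], visitor);
    }
}

// A hand-built tree standing in for parser output.
var sampleTree = {
    type: "script", lineno: 1, children: [
        { type: "var", lineno: 15, children: [] },
        { type: "function", lineno: 32, children: [
            { type: "var", lineno: 33, children: [] }
        ] }
    ]
};

// The visitor names only the node types it cares about.
var seen = [];
walk(sampleTree, {
    "function": function (n) { seen.push("function@" + n.lineno); },
    "var": function (n) { seen.push("var@" + n.lineno); }
});
console.log(seen.join(", "));
```

The point of the pattern is that the traversal lives in one place, so a documentation tool only supplies handlers and never touches the node layout directly.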

I'd welcome such a new API module as a contribution to SpiderMonkey. We
should talk about the design of such an API first if you want to do it,
or you could dive in, get concrete to learn the details, then take a
stab at it. Your call.

/be

Jeremy Gillick

Dec 29, 2005, 2:29:03 PM
to Brendan Eich


Brendan Eich wrote:
> Jeremy Gillick wrote:
>> The most important thing I need to get is a list of ALL JS elements (functions, methods, classes & properties) **with** their line numbers.
>
> No classes in JS1.x -- what do you mean?

Sorry, I simply meant objects.  As I understand it, 'object' refers to something that has been instantiated, so I use the word 'class' to describe the constructor code.  I've also been playing with JSDoc (http://jsdoc.sourceforge.net/) and got into the habit of calling them classes.  Should I just call them objects, or is there a better term?


>> Is this possible to do with either Rhino or SpiderMonkey?  Can somebody explain how?  Code examples would be great!
>
> For SpiderMonkey, see jsparse.h.  The JSParseNode discriminated union is commented in a minimal mock-markup-table style, so you have to read for commas after TOK_* type entries in the left column to see that that column's token type entry continues down another row, and the table cells to its right also stack to the same row-height (if you get what I mean).
>
> The JS_FRIEND_API entry point of interest is js_ParseTokenStream.  The memory for the JSParseNodes is allocated from cx->tempPool, so you will have to include jsarena.h too, and use JS_ARENA_MARK before parsing, and JS_ARENA_RELEASE after successfully parsing and walking the parse node tree to do whatever it is you want to do, to ultimately free that space.
>
> Good luck.  If you develop a separate jsparseapi.[ch] pair of files to hide these details better, with some kind of visitor design pattern for the tree walk, and abstraction from the messy, concrete details of the JSParseNode struct, that would be winning.
>
> I'd welcome such a new API module as a contribution to SpiderMonkey.  We should talk about the design of such an API first if you want to do it, or you could dive in, get concrete to learn the details, then take a stab at it.  Your call.
>
> /be

Thanks for the info, this is very helpful.  I would love to discuss this further with you, if you have the time (and yes, I'm local).

Thanks,
Jeremy

Todd Fisher

Dec 29, 2005, 3:18:38 PM
You might look at the JSLint project; it's a source code validator for
JavaScript. You might be able to do what you want in JavaScript :-)

-Todd

Brendan Eich

Dec 29, 2005, 5:01:52 PM
to Jeremy Gillick
Jeremy Gillick wrote:

> Sorry, I simply meant objects. As I understand 'object' refers to
> something that has been instantiated, so I use the word 'class' to
> describe the constructor code. I've also been playing with JSDoc
> (http://jsdoc.sourceforge.net/) and got into the habit of calling them
> classes. Should I just call them objects, or is there a better term?


If you mean a constructor function, then constructor is the word to use.
If you mean an object initialiser (esp. assigned to a class prototype,
e.g., C.prototype = {m: function(){...}, ...}), then object initialiser
is the ECMA term (some use "object literal", but it's an abuse of
"literal").
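
To make the two terms concrete, here is a small sketch. The names `Point`, `norm`, and `p` are made up for illustration:

```javascript
// A constructor function: called with `new` to create objects.
function Point(x, y) {
    this.x = x;
    this.y = y;
}

// An object initialiser assigned to the constructor's prototype,
// in the C.prototype = {m: function(){...}, ...} style.
Point.prototype = {
    norm: function () {
        return Math.sqrt(this.x * this.x + this.y * this.y);
    }
};

var p = new Point(3, 4);
console.log(p.norm());            // 5
console.log(p instanceof Point);  // true
```

Note that replacing the prototype wholesale this way drops the default `constructor` property, so `p.constructor` is `Object` unless it is set again in the initialiser.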

> Thanks for the info, this is very helpful. I would love to discuss
> this further with you, if you have the time (and yes, I'm local).


I'm on vacation this week. Suggest we use wiki.mozilla.org to hash
things out so that others can see and participate. I'll make a page to
get things started and post to the newsgroup. I may not get to this
before returning to work next week, but I'll do it then if not sooner.

/be

Jeremy Gillick

Jan 3, 2006, 9:33:32 PM
I've been playing around with the code and so far it's really great.  I'm able to step through all the elements by using the JSParseNode struct.  The only thing I am now stuck on is how to map a ParseNode element to the actual (compiled?) JS element.  For example, it very nicely tells me there's a function on line 32 and a variable on line 15, but how do I get the names of the function and the variable?  Is there a way to see if the variable is set as a const?  Can I get the params defined in the function?  Is there some documentation (other than the embedder's guide) that can bring me up to speed on how this works?

Anyways, this is a great start and I'm very close to having what I need and hopefully something I can contribute back to the SpiderMonkey project.

Thanks,
Jeremy

Brendan Eich

Jan 4, 2006, 1:30:37 AM
to Jeremy Gillick
Jeremy Gillick wrote:
> I've been playing around with the code and so far it's really great.
> I'm able to step through all the elements by using the JSParseNodes
> struct.


Shaver had a thought: would you be even better off using Narcissus's
parser (http://lxr.mozilla.org/mozilla/source/js/narcissus/ -- look at
jsdefs.js and jsparse.js)? It should work as-is (unlike all of the
Narcissus metacircular interpreter) in Firefox.


> The only thing I am now stuck with is, how do I map the
> ParseNode element with the actual (compiled?) JS element? For example,
> it very nicely tells me there's a function on line 32 and a variable on
> line 15, however, how do I get the name of the function and variable?
> Is there a way to see if the variable is set as a const? Can I get the
> params defined in the function? Is there some documentation (other
> than the embedder's guide) that can bring me up to speed on how this works?


Again, the long comment in jsparse.h is the only guide. Then you also
have to unpack pointed-at data structures such as JSAtom and JSFunction.
Given a JSAtom *, js_AtomToPrintableString (JS_FRIEND_API in jsatom.h)
gives a const char * rendition. A JSFunction * can lead to the function
object (JS_GetFunctionObject, public API in jsapi.h), and you can then
use jsdbgapi.h's JS_GetPropertyDescArray to find out about local vars
and args.

But this is hairy C stuff, much of it private rather than friend, even.
So shaver's Narcissus idea is looking better all the time -- that way
everything is an easily inspected JS object.
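
As an illustration of why a tree of plain JS objects is easy to inspect, here is a hedged sketch that collects function names, parameters, and const-ness along with line numbers. The node fields used here (`type`, `lineno`, `name`, `params`, `readOnly`, `body`) are assumptions made for this example and should be checked against the real jsparse.js before relying on them:

```javascript
// Hypothetical node shapes, hand-built for this example:
//   function: { type: "function", lineno, name, params, body: [...] }
//   variable: { type: "var", lineno, name, readOnly }
function collectDeclarations(nodes, out) {
    out = out || [];
    for (var i = 0; i < nodes.length; i++) {
        var n = nodes[i];
        if (n.type === "function") {
            out.push(n.lineno + ": function " + n.name +
                     "(" + n.params.join(", ") + ")");
            // Recurse so nested declarations are reported too.
            collectDeclarations(n.body, out);
        } else if (n.type === "var") {
            out.push(n.lineno + ": " + (n.readOnly ? "const " : "var ") + n.name);
        }
    }
    return out;
}

// Stand-in for parser output; a real tree would come from the parser.
var program = [
    { type: "var", lineno: 15, name: "MAX", readOnly: true },
    { type: "function", lineno: 32, name: "area", params: ["w", "h"], body: [
        { type: "var", lineno: 33, name: "result", readOnly: false }
    ] }
];

var report = collectDeclarations(program);
console.log(report.join("\n"));
```

Each question from the earlier post (name, const or not, parameters) becomes a property lookup, with no unpacking of C structures.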

/be

Jeremy Gillick

Jan 4, 2006, 4:12:57 PM

Brendan Eich wrote:
> Jeremy Gillick wrote:
>> I've been playing around with the code and so far it's really great.
>> I'm able to step through all the elements by using the JSParseNodes
>> struct.
>
>
> Shaver had a thought: would you be even better off using Narcissus's
> parser (http://lxr.mozilla.org/mozilla/source/js/narcissus/ -- look at
> jsdefs.js and jsparse.js)? It should work as-is (unlike all of the
> Narcissus metacircular interpreter) in Firefox.
>

Narcissus does look cool and I'm in the process of setting it up now.


>
>> The only thing I am now stuck with is, how do I map the ParseNode
>> element with the actual (compiled?) JS element? For example, it very
>> nicely tells me there's a function on line 32 and a variable on line
>> 15, however, how do I get the name of the function and variable? Is
>> there a way to see if the variable is set as a const? Can I get the
>> params defined in the function? Is there some documentation (other
>> than the embedder's guide) that can bring me up to speed on how this
>> works?
>
>
> Again, the long comment in jsparse.h is the only guide. Then you also
> have to unpack pointed-at data structures such as JSAtom and
> JSFunction. Given a JSAtom *, js_AtomToPrintableString (JS_FRIEND_API
> in jsatom.h) gives a const char * rendition. A JSFunction * can lead
> to the function object (JS_GetFunctionObject, public API in jsapi.h),
> and you can then use jsdbgapi.h's JS_GetPropertyDescArray to find out
> about local vars and args.
>
> But this is hairy C stuff, much of it private rather than friend,
> even. So shaver's Narcissus idea is looking better all the time --
> that way everything is an easily inspected JS object.
>

I would prefer to get all the data I need through the C libraries. I've
been reading through the source code and trying to figure it out. The
only problem I have with Narcissus is that I'll be doing everything else
(template generation, etc) through C and would like to keep from
throwing data back and forth between languages. However, if it does
provide an easier interface, that may push me to use it. Thanks for
your comments and support.

Thanks,
Jeremy

Mike Shaver

Jan 4, 2006, 5:42:59 PM
On 1/4/06, Jeremy Gillick <j...@mozmonkey.com> wrote:
> I would prefer to get all the data I need through the C libraries. I've
> been reading through the source code and trying to figure it out. The
> only problem I have with Narcissus is that I'll be doing everything else
> (template generation, etc) through C and would like to keep from
> throwing data back and forth between languages. However, if it does
> provide an easier interface, that may push me to use it. Thanks for
> your comments and support.

I'm not the sort of person to advocate use of XML for XML's sake, but
it seems like narcissus+E4X could produce a pretty nice XML
representation of the parse tree for you, which you could then process
nicely with libxml2 or whatever you prefer on the C side.

You would also be able to manipulate it with XSLT or similar tools as an
intermediate phase, should such things be to your liking.
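
A rough sketch of the idea: serialise a parse tree of plain JS objects to XML text that a C-side XML library could then read. It uses plain string building rather than E4X, and the node shape (`type`, `lineno`, `name`, `children`) as well as the element and attribute names are invented for illustration:

```javascript
// Escape the characters that are special inside XML attribute values.
function escapeXml(text) {
    return String(text)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;");
}

// Hypothetical node shape: { type, lineno, name, children }.
function toXml(node, indent) {
    indent = indent || "";
    var attrs = ' type="' + escapeXml(node.type) + '"' +
                ' line="' + escapeXml(node.lineno) + '"';
    if (node.name !== undefined) {
        attrs += ' name="' + escapeXml(node.name) + '"';
    }
    var kids = node.children || [];
    if (kids.length === 0) {
        // Leaf nodes become self-closing elements.
        return indent + "<node" + attrs + "/>\n";
    }
    var xml = indent + "<node" + attrs + ">\n";
    for (var i = 0; i < kids.length; i++) {
        xml += toXml(kids[i], indent + "  ");
    }
    return xml + indent + "</node>\n";
}

// Hand-built tree standing in for parser output.
var xmlTree = {
    type: "script", lineno: 1, children: [
        { type: "var", lineno: 15, name: "MAX" },
        { type: "function", lineno: 32, name: "area" }
    ]
};
var xmlText = toXml(xmlTree);
console.log(xmlText);
```

Keeping line numbers as attributes means the C side can build documentation from the XML alone, without calling back into the JS engine.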

Mike

Jeremy Gillick

Jan 4, 2006, 11:32:29 PM

Brendan Eich wrote:
>
> Shaver had a thought: would you be even better off using Narcissus's
> parser (http://lxr.mozilla.org/mozilla/source/js/narcissus/ -- look at
> jsdefs.js and jsparse.js)? It should work as-is (unlike all of the
> Narcissus metacircular interpreter) in Firefox.
>

I've been working with Narcissus and have a couple thoughts:

1) This is a really cool library that is easy to use and it gives me ALL
the information I needed.

2) However, when testing it with a 24k JavaScript file it took 2.5
minutes to complete the my_load() call. My tool will be used over a
variety of scripts and might even be run in batch, so I'll need
something fast.

I'll continue working with the C libs and figure out how to do it that
way. Through the process I'll probably create a tree
building/organizing library that could be integrated back into the
SpiderMonkey project. All your input is appreciated and very helpful.

Thanks,
Jeremy

bki...@gmail.com

Jan 5, 2006, 3:35:42 AM
> I'll continue working with the C libs and figure out how to do it that
> way. Through the process I'll probably create a tree
> building/organizing library that could be integrated back into the
> SpiderMonkey project. All your input is appreciated and very helpful.
>

Jeremy, I wouldn't mind helping with your effort. You can lead the
effort and let me know what kind of assistance I can provide.

Kimman

Brendan Eich

Jan 5, 2006, 4:54:18 PM
to Jeremy Gillick
Brendan Eich wrote:

> I'm on vacation this week. Suggest we use wiki.mozilla.org to hash
> things out so that others can see and participate. I'll make a page to
> get things started and post to the newsgroup. I may not get to this
> before returning to work next week, but I'll do it then if not sooner.


As promised, http://wiki.mozilla.org/JavaScript:SpiderMonkey:Parser_API.
Edit well!

/be

Brendan Eich

Jan 5, 2006, 4:56:59 PM
to Jeremy Gillick
Jeremy Gillick wrote:
>
> 1) This is a really cool library that is easy to use and it gives me ALL
> the information I needed.

Good to hear.

> 2) However, when testing it with a 24k JavaScript file it took 2.5
> minutes to complete the my_load() call. My tool will be used over a
> variety of scripts and might even be run in batch, so I'll need
> something fast.

Yeah, Narcissus is poky right now. It will speed up, but perhaps not on
your schedule.

/be

Blake Kaplan

Jan 6, 2006, 1:42:08 AM
Jeremy Gillick wrote:
> 2) However, when testing it with a 24k JavaScript file it took 2.5
> minutes to complete the my_load() call. My tool will be used over a
> variety of scripts and might even be run in batch, so I'll need
> something fast.

Out of curiosity, what happens if you call parse() directly (instead of
calling my_load, which also evaluates the given script)? That is, what is
the rough breakdown of parse time vs. execution time?

There's been talk in the past of trying to point a JavaScript profiler
at Narcissus, but unfortunately nobody has had time to do so, yet.
--
Blake Kaplan

Jeremy Gillick

Jan 6, 2006, 4:06:13 AM

I created a couple of functions (below) to do this. When I used the big
file it took 1 minute and 27 seconds to parse and only 0.002 seconds to
execute. Is it really possible to speed up the parse time very much?
Is there any estimate of how much faster it could get, and when that
work is planned?

CODE:

/**
 * Profile the Narcissus parser
 * @param {String} filename The path to the script to use in profiling
 */
function profile(filename){

    // Parse the script; snarf(filename) supplies the source text for parse()
    print("Parsing...");
    var start1 = new Date().getTime();
    var tree = parse(snarf(filename), filename, 1);
    print("Done in "+ getTimeString((new Date().getTime()) - start1));

    // Execute the parsed script in a fresh global execution context
    print("Executing...");
    var start2 = new Date().getTime();

    var x = new ExecutionContext(GLOBAL_CODE);
    ExecutionContext.current = x;
    try {
        execute(tree, x);
    } catch(e){
        // Report the error instead of silently discarding it
        print("A JavaScript error occurred: " + e);
    }

    print("Done in "+ getTimeString((new Date().getTime()) - start2));
}

/**
 * Return a friendly string representation of the time in minutes and seconds.
 * @param {int} time The time in milliseconds
 * @return {String}
 */
function getTimeString(time){
    var out = "";
    time = time / 1000;

    // Minutes (>= so that exactly 60 seconds reads as 1 minute)
    if(time >= 60){
        out = Math.floor(time / 60) +" minute(s) ";
        // floor, not round, so the seconds part can never show 60
        time = Math.floor(time % 60);
    }

    // Seconds (left fractional when under a minute)
    out += time +" second(s)";

    return out;
}

Jeremy Gillick

Jan 19, 2006, 6:15:41 PM
Brendan,

Just wanted to give an update and let you know that Kimman Balakrishnan and I are working on this and will update the wiki with our notes.  Kimman has already made a great deal of progress, and the outlook for a robust jsparseapi.[ch] is very promising.

Thanks,
Jeremy