
Re: Improvements to the Sphinx in-tree documentation generation


Gijs Kruitbosch

Dec 16, 2013, 3:58:24 AM
to Gregory Szorc, Firefox Dev
From what I've been able to tell from the interwebs, extracting
documentation (autodoc-fashion) from JS/C++/IDL isn't possible with
Sphinx. Is that correct?

This makes me sad because it means I probably have to end up
copy-pasting my source code documentation elsewhere either way (in which
case I might as well copy/paste into MDN myself). :-(

~ Gijs

On 12/12/13, 09:08, Gregory Szorc wrote:
> After I announced the in-tree build docs powered by Sphinx a few
> months ago [1], a few people came to me and said "that's really cool -
> I want something like that for my module."
>
> I'm pleased to announce that as of bug 939367 landing in inbound a few
> hours ago, you can now deposit Sphinx docs anywhere in the tree and
> they will get picked up during `mach build-docs`. This feature is
> self-documenting and you can find the instructions in the output of
> `mach build-docs` or at [2]. See build/docs or services/docs in the
> tree for what this looks like in practice.
>
> Yes, the docs use the old MDN theme. They will use the new theme once
> someone updates [3]. For all I know, someone is already hard at work
> doing that. We also have bug 920314 tracking getting these docs
> published on MDN.
>
> Sphinx is extremely extensible. If you have ideas for better
> integrating it with anything, let me know!
>
> Please report any issues or questions you may have. Core :: Build
> Config for now.
>
> [1]
> https://groups.google.com/d/msg/mozilla.dev.platform/HQOL8YKiJmE/wlvktOlQSpIJ
> [2]
> https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/index.html
> [3] https://github.com/lmorchard/mozilla-mdn-sphinx-theme
> _______________________________________________
> firefox-dev mailing list
> firef...@mozilla.org
> https://mail.mozilla.org/listinfo/firefox-dev

Gregory Szorc

Dec 16, 2013, 1:17:34 PM
to Gijs Kruitbosch, Firefox Dev, dev-platform
On 12/16/13, 12:58 AM, Gijs Kruitbosch wrote:
> From what I've been able to tell from the interwebs, extracting
> documentation (autodoc-fashion) from JS/C++/IDL isn't possible with
> Sphinx. Is that correct?

It is possible.

Sphinx can consume Doxygen's XML output to generate C++ docs via Breathe
[1]. I experimented with this during Summit, but one of Doxygen or
Sphinx was allocating 10+ GB of memory trying to build the docs for all
of the tree and made my laptop at the time explode. We could probably make
things work by only reading part of the tree or incrementally generating
docs for the entire tree. There was also some XML Unicode encoding bug
in Doxygen I ran into. That's what happens when you attempt to construct
XML via string concatenation rather than going through an XML API (oh,
Doxygen). It would probably be pretty easy to write something against
the Clang API, since Clang's AST exposes parsed Doxygen doc entities [2].

As for JS and IDL, it's all /possible/. You just need a way to feed
stuff into Sphinx. If someone writes a tool that can extract JS
docblocks into a machine readable format, we can integrate them into
Sphinx. It's doable - I just don't think anyone has done it yet.

I agree our current mechanism for JS documentation is pretty bad. We
want docs both in the source and on MDN for obvious reasons, but nobody
wants to be burdened with writing them twice, so typically either the
in-tree or the MDN docs suffer. Neither is great for maintainability or
for consumers. IMO we should just write in-tree source docs and export
to MDN. Goodbye syncing problem.

Are there any JS doc tools that can export to a machine readable format?
Does SpiderMonkey expose documentation blocks to the AST? If not, should it?

[1] http://michaeljones.github.io/breathe/
[2] http://clang.llvm.org/doxygen/group__CINDEX__COMMENT.html
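For reference, the Breathe wiring Gregory mentions amounts to a few lines of Sphinx configuration. This is only a sketch: the project name and the Doxygen XML path below are made-up placeholders, and Doxygen has to be run separately beforehand with `GENERATE_XML = YES`.

```python
# Hypothetical conf.py fragment wiring Doxygen XML output into Sphinx
# via the Breathe extension. The project name and path are placeholders.
extensions = ["breathe"]

# Map a Breathe project name to the directory holding Doxygen's XML
# output (produced by a prior Doxygen run with GENERATE_XML = YES).
breathe_projects = {"mozilla-central": "_doxygen/xml"}
breathe_default_project = "mozilla-central"
```

An .rst page can then pull in a documented class with Breathe's `doxygenclass` directive.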


Gijs Kruitbosch

Dec 16, 2013, 1:28:13 PM
to Gregory Szorc, Firefox Dev
On 16/12/13, 18:17, Gregory Szorc wrote:
> Are there any JS doc tools that can export to a machine readable format?
> Does SpiderMonkey expose documentation blocks to the AST? If not, should
> it?

Other parsers certainly expose comments in the AST; I don't think they
explicitly have pointers from the functions/objects to their docblocks,
but it shouldn't be too hard to add a post-processing step that does
that. I don't know if there's agreement on building that kind of thing
into SpiderMonkey.

~ Gijs

Jeff Walden

Dec 16, 2013, 1:46:25 PM
to
On 12/16/2013 01:17 PM, Gregory Szorc wrote:
> Does SpiderMonkey expose documentation blocks to the AST? If not, should it?

No, and probably not. Comments are not tokens, so they're not in the AST. Right now SpiderMonkey pretty much just throws them away (except to the extent the comment includes a line break, in which case /*\n*/ and similar represent a line break for semicolon-insertion rules). And it'd be really unfortunate to have to include them -- then things like a DoWhileNode would have to be gunked up a bunch to store information like in |do { } /* fnord */ while /* quoi */ (true);|.

Maybe there's some sort of intelligent exposure that could nonetheless be done. But I doubt it's a good idea to build it atop a parser designed for executing code, and not for exactly faithfully representing it.

Jeff

Gregory Szorc

Dec 16, 2013, 3:09:24 PM
to Jeff Walden, dev-pl...@lists.mozilla.org
Perhaps Reflect.parse() could grow a new option to expose "comment"
nodes or could attach comment metadata to specific node types? This API
is SpiderMonkey proprietary (implying we can do what we want with it) right?

FWIW, someone could build a comment parser on top of Reflect.parse().
But you'd have to scan the lines before each node and parse out comment
blocks. Seems much more robust, easier (over the long run), and more
beneficial to have the forward-scanning parser in the engine just do it.
Perhaps someone should build a proof-of-concept docs parser on top of
Reflect.parse()?
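To illustrate, here is a rough pure-JS sketch of the "scan the lines before each node" approach described above. `extractDocblock` is a made-up helper, and a real tool would get declaration line numbers from Reflect.parse() or Esprima rather than hard-coding them.

```javascript
// Hypothetical sketch: given the source and the 1-indexed line a
// declaration starts on, walk upward collecting a contiguous
// /** ... */ block, if one immediately precedes the declaration.
function extractDocblock(source, declLine) {
  const lines = source.split("\n");
  let i = declLine - 2; // the line above the declaration (0-indexed)
  if (i < 0 || !lines[i].trim().endsWith("*/")) {
    return null; // no comment block directly above
  }
  const block = [];
  for (; i >= 0; i--) {
    block.unshift(lines[i].trim());
    if (lines[i].trim().startsWith("/**")) {
      return block.join("\n");
    }
  }
  return null; // never saw an opening /**, so not a docblock
}

const source = [
  "/**",
  " * Adds two numbers.",
  " */",
  "function add(a, b) { return a + b; }",
].join("\n");

// A real tool would get the declaration's line from a parser;
// here line 4 is hard-coded for illustration.
console.log(extractDocblock(source, 4));
```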

Eric Shepherd

Dec 16, 2013, 3:40:30 PM
to dev-platform, Gijs Kruitbosch, Firefox Dev, Gregory Szorc
On December 16, 2013 at 1:21:06 PM, Gregory Szorc (g...@mozilla.com) wrote:
> I agree our current mechanism for JS documentation is pretty bad. We
> want docs both in the source and on MDN for obvious reasons, but nobody
> wants to be burdened with writing them twice, so typically either the
> in-tree or the MDN docs suffer. Neither is great for maintainability or
> for consumers. IMO we should just write in-tree source docs and export
> to MDN. Goodbye syncing problem.
We have plans to add support to MDN for allowing content to be "pushed" onto MDN from other sites using scripts or the like. There are already the beginnings of a "write" API that allows content to be injected into MDN. With that, in concert with permissions management, it would be possible to set up the JSAPI section of MDN such that it is pushed onto the wiki by a tool that interprets in-source comments and outputs HTML formatted for the wiki.

This could be a "best of both worlds" scenario that you guys could be quite happy with.

The only reason we haven't finished implementing support for this is that no teams have stepped up to say they'd definitely use it; once one does, I don't think it would take terribly long to complete, since much of the underlying functionality is partly or even mostly implemented.

-- 
Eric Shepherd
Developer Documentation Lead
Mozilla
Blog: http://www.bitstampede.com/
Twitter: http://twitter.com/sheppy

Benjamin Smedberg

Dec 16, 2013, 3:50:10 PM
to Eric Shepherd, dev-platform, Gijs Kruitbosch, Firefox Dev, Gregory Szorc
On 12/16/2013 3:40 PM, Eric Shepherd wrote:
> This could be a "best of both worlds" scenario that you guys could be quite happy with.
>
> The only reason we haven't finished implementing support for this is that no teams have stepped up to say they'd definitely use it; once one does, I think it wouldn't take all that terribly long to complete, since much of the underlying functionality is partly or even mostly implemented.
Is there any objection on your side to just having sphinx `mach
build-docs` push directly to MDN? If not, that seems slightly preferable
to posting them at
https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/index.html

--BDS

Gregory Szorc

Dec 16, 2013, 3:52:44 PM
to Benjamin Smedberg, Eric Shepherd, dev-platform, Gijs Kruitbosch, Firefox Dev
This is what we're tracking in
https://bugzilla.mozilla.org/show_bug.cgi?id=920314. I'm just waiting
for someone to tell me how to publish to MDN and it will happen.

Andrew Sutherland

Dec 16, 2013, 3:57:09 PM
to dev-pl...@lists.mozilla.org
On 12/16/2013 03:09 PM, Gregory Szorc wrote:
> Perhaps Reflect.parse() could grow a new option to expose "comment"
> nodes or could attach comment metadata to specific node types? This
> API is SpiderMonkey proprietary (implying we can do what we want with
> it) right?
>
> FWIW, someone could build a comment parser on top of Reflect.parse().
> But you'd have to scan the lines before each node and parse out
> comment blocks. Seems much more robust, easier (over the long run),
> and more beneficial to have the forward-scanning parser in the engine
> just do it. Perhaps someone should build a proof-of-concept docs
> parser on top of Reflect.parse()?

The Esprima JS parser can already generate comment nodes. The API
otherwise conforms to the same output standards as SpiderMonkey's Parser
API.

ex: esprima.parse(code, { comment: true });

See:
http://esprima.org/doc/
http://esprima.org/demo/parse.html (and click "include comments")

Andrew

Eric Shepherd

Dec 16, 2013, 4:00:22 PM
to dev-platform, Gijs Kruitbosch, Benjamin Smedberg, Firefox Dev, Gregory Szorc
On December 16, 2013 at 3:50:13 PM, Benjamin Smedberg (benj...@smedbergs.us) wrote:
> Is there any objection on your side to just having sphinx `mach
> build-docs` push directly to MDN? If not, that seems slightly preferable
> to posting them at
> https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/index.html
Once we've got the feature finished? No, I can't see why. As long as we can customize the output to use the appropriate styles and the like, it should work well and look great.

Brandon Benvie

Dec 16, 2013, 4:30:24 PM
to dev-pl...@lists.mozilla.org
On 12/16/2013 12:57 PM, Andrew Sutherland wrote:
> The Esprima JS parser can already generate comment nodes. The API
> otherwise conforms to the same output standards as SpiderMonkey's
> Parser API.

There's also acorn, which is an ES5 parser written in JS that's in the
tree
(http://mxr.mozilla.org/mozilla-central/source/toolkit/devtools/acorn).
The downside to both acorn and Esprima is that neither supports
SpiderMonkey-specific extensions and syntax. Esprima supports most or
all of ES6 syntax (in its harmony branch), but some of the features in
SM aren't compatible with ES6 (for example, array/generator expressions
have the reverse order in SM from ES6).

We added acorn to the tree for devtools because we needed a tokenizer
and access to comments.

Jeff Walden

Dec 17, 2013, 4:57:07 PM
to Gregory Szorc
On 12/16/2013 03:09 PM, Gregory Szorc wrote:
> Perhaps Reflect.parse() could grow a new option to expose "comment" nodes or could attach comment metadata to specific node types?

It's really not possible to do the latter. Comments don't appertain to specific nodes at all. They're just random non-token things interspersed in the token stream. Note that the Esprima stuff mentioned in the other email doesn't attempt to put comments in the parse tree -- comment text and locations are stored in a side-along array. The user has to re-associate the comments with particular nodes after the fact. Fundamentally, comments are a token-level feature, not a parse tree-level feature.

We can sort of do whatever we want with the parser API, except for Esprima compatibility, for one. And also our parser doesn't track comments at all, so we'd have to spend extra time and memory to record locations and such, beyond what we do now. The side-along Esprima approach is probably the only way this can work sensibly, given just how many places comments can be put; I don't think any API we might have should attempt associations itself.
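As a rough illustration of the side-along approach, here is a sketch that re-associates comments with nodes after the fact. The node and comment objects are made-up stand-ins that only mimic the range-carrying shape of Esprima's output.

```javascript
// Hypothetical sketch: attach each comment in a side-along array to
// the nearest node that starts after the comment ends. Nodes and
// comments carry character ranges, Esprima-style; everything here is
// illustrative data, not real parser output.
function attachLeadingComments(nodes, comments) {
  const byNode = new Map();
  for (const comment of comments) {
    // A comment "leads" the first node that starts after it ends.
    const target = nodes
      .filter((n) => n.range[0] >= comment.range[1])
      .sort((a, b) => a.range[0] - b.range[0])[0];
    if (target) {
      if (!byNode.has(target)) byNode.set(target, []);
      byNode.get(target).push(comment);
    }
  }
  return byNode;
}

// Source being mimicked: /* doc for f */ function f() {}  function g() {}
const nodes = [
  { name: "f", range: [16, 34] },
  { name: "g", range: [36, 52] },
];
const comments = [{ value: " doc for f ", range: [0, 15] }];

const attached = attachLeadingComments(nodes, comments);
console.log(attached.get(nodes[0]));
```

The re-association step is exactly the work a consumer has to do on top of Esprima's side-along output; nothing here requires parser support.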

Jeff

Jason Orendorff

Dec 19, 2013, 12:38:46 PM
to Gregory Szorc, dev-platform
Moving back to dev-platform.

On 12/17/13 4:41 PM, Gregory Szorc wrote:
> I guess what I'm trying to say is that SpiderMonkey/JavaScript appears
> lacking in the "language services" arena.

The concrete features you mentioned are extracting doc comments and
minimization, so that's what I'll address here. (What else did you have
in mind? Tokenization, perhaps, for syntax highlighting? Anything else?)

Doc comments can be done two ways:

* In pure JS, using a Reflect.parse() implementation that provides
comment syntax. This requires either supporting SpiderMonkey extensions
in Esprima or supporting comment output in our parser. Neither sounds hard.

* We could bless a particular notion of what a doc comment is and
implement that in our parser, and surface the results via Reflect.parse.

The former sounds better. It's more flexible. It's less code. And
improving Esprima and/or our Reflect.parse implementation is valuable
for tools other than documentation tools.


> FWIW, this issue dovetails with us trying to minify chrome JS (bug
> 903149). One can argue that if we had access to robust JavaScript
> language services (possibly including minifying) from the engine
> itself, we wouldn't be stuck in a rut there either.

I interpret this as saying: We should have JS minimization, and
architecturally it makes sense for that to be implemented in the JS
engine. But I'm not sure I agree on either point.

According to the latest in bug 903149, we're not sure minimization is
worth pursuing.

Architecturally, I think the argument is that minimization will be
brittle in the face of new syntax, unless the JS engine hackers maintain
it. OK. But I don't think this balances all the arguments against.

There are already several good open-source minimizers for JS. We don't
have the bandwidth to stand up and maintain a new minimizer nearly as
good as (say) Google Closure Compiler. The reason existing minimizers
don't work with Mozilla code is SpiderMonkey extensions; I think the
long-term fix for that is to converge on ES6. (Short-term fixes are
possible too.)

Minimization comes in many flavors (consider bug 903149 comment 37) and
I tend to think the JS engine should provide powerful general APIs that
can be used to implement many kinds of minimization and many other
things too. Reflect.parse is that kind of API. In bug 903149, I got to
write a 7-line program with Reflect.parse that detected bugs in jsmin.
You could write a primitive minimizer by starting with the code in
Reflect_stringify.js, in
<https://bug590755.bugzilla.mozilla.org/attachment.cgi?id=558508>, and
just deleting the bits that output indentation and spacing. This is
really cool! It's pure JS. Anyone who can hack JS could do it. That's
the kind of API I want to support.
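To make the "just delete the bits that output indentation and spacing" idea concrete, here is a deliberately naive, self-contained sketch of a whitespace-and-comment stripper. It is nothing like a real minimizer (no renaming, no regex-literal awareness), and it is not the Reflect_stringify.js approach itself, just a toy in the same spirit.

```javascript
// Naive illustrative minifier: strips // and /* */ comments and
// collapses runs of whitespace, while copying string literals
// verbatim. A real minimizer would work from a parse tree instead.
function naiveMinify(source) {
  let out = "";
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (ch === '"' || ch === "'") {
      // Copy string literals verbatim, honoring backslash escapes.
      const quote = ch;
      out += ch;
      i++;
      while (i < source.length && source[i] !== quote) {
        if (source[i] === "\\") { out += source[i]; i++; }
        out += source[i];
        i++;
      }
      out += quote;
      i++;
    } else if (ch === "/" && source[i + 1] === "/") {
      while (i < source.length && source[i] !== "\n") i++; // line comment
    } else if (ch === "/" && source[i + 1] === "*") {
      i += 2; // block comment
      while (i < source.length && !(source[i] === "*" && source[i + 1] === "/")) i++;
      i += 2;
    } else if (/\s/.test(ch)) {
      // Collapse a whitespace run to at most one space, and drop it
      // entirely after punctuation where no separator is needed.
      while (i < source.length && /\s/.test(source[i])) i++;
      if (out && !/[\s{};,(]$/.test(out)) out += " ";
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

console.log(naiveMinify("function f( a ) {\n  // add one\n  return a + 1;\n}"));
```

Even this toy shows why the tree-based approach is better: the character-level version has to special-case strings and comments by hand, which a Reflect.parse-driven stringifier gets for free.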

-j
