Build-integrated API Reference HELP system Evaluation

Stephen Bohlen

Mar 29, 2009, 6:34:59 PM
to nhibernate-...@googlegroups.com
All:
 
As I committed to in a prior thread ( http://groups.google.com/group/nhibernate-development/browse_thread/thread/729819b625001217 ) I have now completed a preliminary review of the two possible approaches to auto-generating API docs for NHibernate from the XML code comments as part of our build process.
 
Recall that one suggestion (offered by Will) was to investigate James Gregory's new alpha build of 'Docu' ( http://docu.jagregory.com/ ), the light-weight code-comment-compiler he wrote for generating help content for the Fluent NHibernate project and offered as OSS to anyone wanting to use it for any other project.  I countered that we should also consider the more robust DocProject ( http://www.codeplex.com/DocProject ) app that shields the developer from having to interact with the incredible complexity that is the SandCastle + MSHelp compiler infrastructure from MS.  What follows are the results of my doing just that.
 
Test platform:
Dell D830 laptop, Intel Core 2 Duo, 32-bit WinXP/SP3, 4GB RAM
Visual Studio Pro 2008 SP1
NHibernate 1.2.1 GA release (binaries and XML comment files, no source needed)
 
Test platform notes: I used the 1.2.1 GA release of NH simply because it's what I happened to grab off my hard drive first; I have no reason to believe that the results of any of my tests would be materially affected by running them on any subsequent build/release/version of NH, so I don't think this choice affects the tests or their results.
 
***Docu Testing and Observations***
 
Docu works by simply firing off a command-line and passing it the path to your binary (nhibernate.dll in this case).  It then constructs pure HTML output that can be loaded/viewed in a browser without needing to be hosted on a webserver (though the content could of course be posted to a web server for others to view as desired).
 
From the get-go, I had a number of issues (unhandled null-reference exceptions) thrown by the Docu EXE itself when operating on the NHibernate.dll and its XML code comments.  I eventually grabbed the latest code from Docu's hosted location on GitHub and built it myself in VS.  The latest code had the same unhandled exceptions but at least with the code in hand I could troubleshoot the issue(s) myself :D
 
My 'fixes' to Docu probably aren't worth committing back to that project, since most of them simply check for null instances of variables at critical points and return from the method if a null is passed in (probably NOT the desired behavior, but certainly enough for me to get Docu to successfully produce output from the nhibernate.dll assembly without throwing exceptions).  I have no idea whether these exceptions are due to some strange (unexpected?) syntax we are using in the code comments within the NHibernate codebase or are simply the result of Docu being in early alpha and not properly handling otherwise legitimate code-comment syntax.  Nonetheless, we need to be aware that as it exists RIGHT NOW TODAY, Docu and the NHibernate project's XML code comments are fundamentally incompatible with each other unless changes are made to at least one of them :(
 
Once I tweaked the Docu source code to successfully run against the NHibernate.dll without throwing null-reference exceptions, I was able to produce the API reference docs that I have posted on my server for download by anyone interested at the following URL: http://unhandled-exceptions.com/downloads/NHibernate_121_Docu_Test.zip.  The good news is that this documentation is light-weight (pure HTML) and the ZIP file is barely 2MB in size for the entire help collection.  To view the results, unzip it somewhere and just click on the index.htm to load the 'site' in your browser of choice.
 
The generation of this output by Docu is *NOT* speedy; Docu took 15+ minutes to generate it, and most of that time was spent with my dual-core processor locked @ 50% utilization with near-zero disk activity, suggesting that Docu is processor-bound and uses just a single core to do its work (so throwing more hardware at it isn't likely to help much unless/until Docu becomes multi-threaded).  Significant disk activity only occurred briefly at the end of the 15 minutes when the final output was rendered to the files included in the aforementioned ZIP file, suggesting that the compilation isn't disk I/O-bound at all.
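 
As a back-of-the-envelope check on that single-core reading, here is a small Python sketch of the arithmetic. The function names are made up for illustration, and it assumes the utilization figure is the machine-wide average across all cores, which is how I read the 50% number on the dual-core laptop.

```python
def busy_cores(utilization_pct, core_count):
    """Convert machine-wide CPU utilization (%) into the number of fully busy cores."""
    return utilization_pct / 100 * core_count


def utilization_if_single_threaded(core_count):
    """Machine-wide utilization (%) when exactly one core is saturated."""
    return 100 / core_count


if __name__ == "__main__":
    # 50% on a dual-core machine works out to one saturated core.
    print(busy_cores(50, 2))
    # The same single-threaded workload on a quad-core machine.
    print(utilization_if_single_threaded(4))
```

If the single-threaded assumption holds, a machine with more cores would simply show a lower utilization percentage for roughly the same wall-clock time.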
 
This suggests that even though running Docu is as simple as passing a single-argument command-line to it, it's largely infeasible to invoke it as part of *every* build sequence while someone is working on the NH codebase, since the post-build documentation compilation step would take a prohibitively long time.  It might be reasonable to set up Docu to run remotely on some dedicated CI build-server (e.g., the codebetter teamcity installation, etc.) so that it happens post-checkin, but due to the long runtime of the doc-compilation process, it's nearly certain that this process would always have to run out-of-band as a developer works on the project and makes their check-ins.
 
Since all that's needed is a single command-line invocation, integrating Docu into the CI server's build process would be trivial and Docu's lack of dependency on any other infrastructure (e.g., SandCastle, help compilers, etc.) makes it trivial for anyone to run the thing themselves were they to check out the source to their own PC (although as mentioned, they would have to wait the 15+ minutes for the process to run to completion were they to invoke Docu against the project).
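 
To make the CI idea concrete, here is a minimal Python sketch of a timed, out-of-band doc step. It assumes only what is described above, namely that Docu is handed the assembly path as its single argument; the `docu.exe` and `NHibernate.dll` paths and both function names are placeholders for illustration, not anything taken from Docu itself.

```python
import subprocess
import sys
import time


def build_docu_command(docu_exe, assembly_path):
    """Build the command line for a Docu run.

    Assumption: Docu takes the assembly path as its only argument,
    as described in the post; both paths are placeholders.
    """
    return [docu_exe, assembly_path]


def run_doc_step(command):
    """Run a documentation command and return (exit_code, elapsed_seconds)."""
    started = time.perf_counter()
    completed = subprocess.run(command, check=False)
    elapsed = time.perf_counter() - started
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Stand-in command so the sketch runs on any machine; on a build server
    # this would be build_docu_command("docu.exe", "NHibernate.dll").
    code, seconds = run_doc_step([sys.executable, "-c", "print('docs generated')"])
    print(f"exit code {code}, took {seconds:.1f}s")
```

Recording the elapsed time on each run would also show whether the 15+ minute figure changes as the codebase or the tool evolves.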
 
 
 
***DocProject Testing and Observations***
 
DocProject is a significantly more complex and significantly more feature-rich XML code-comment compilation solution than Docu.  This is both a positive (better, more useful API reference compilation) and a negative (significant complexity and dependencies on other tools, etc.).
 
DocProject works by automating the MS SandCastle infrastructure and, optionally, the MS Help compiler v 1.x and/or v 2.x to produce its output.  As such, these dependencies have to be present (and properly installed) in order for DocProject to function properly.  The good news is that once this is accomplished, DocProject is capable of producing compiled help as a single .CHM file, a .HxS visual-studio-integrated help file that can be installed right into the VS help subsystem and accessed via F1 from within Visual Studio, and a complete ASP.NET web site that can be deployed to a server for wider access to the content.  As DocProject installs and is controlled as a new 'project type' in Visual Studio, you select as part of the New-Project-Wizard in Visual Studio which of these output targets you are interested in compiling to.
 
I had little trouble getting DocProject up and running on my system; after installation of the SandCastle infrastructure and the requisite MS help compilers, the DocProject installer capably interrogates the registry to discover the paths to these items and wires itself up to them just fine.  Once installed, I need not interact with the underlying components at all and can control/configure the behavior of the compiled help output solely from within Visual Studio by editing the DocProject settings for the custom VS project type.  This makes for a familiar UI (build property pages, etc.) for configuring the output of the system.
 
Performance of the DocProject system in generating the help output was no better (or worse!) than that of Docu, taking about the same 15+ minutes to produce its output.  This suggests that performance/speed isn't a factor in determining which of these directions to pursue.  There doesn't seem to be a significant change in the compilation time based on what output targets you select (e.g., CHM, ASP.NET web site, etc.) so I am strongly guessing that the vast bulk of the 15+ minutes is spent in processing the comments rather than spitting them out to actual help artifacts.  Since this is about the same 15+ minutes that Docu took, I'm going to conclude that there is little that could be done to reduce this processing time significantly.
 
The results of my running the nhibernate.dll and its related comments through the DocProject process are posted for download by anyone interested at the following URL: http://unhandled-exceptions.com/downloads/NHibernate_121_DocProject_Test.zip .  Because the DocProject output is a complete ASP.NET web site including graphics, icons, etc. instead of just standard HTML and because this download also contains the complete CHM file, this download is over 90 MB in size.  Since it's an ASP.NET web site, to view this content you will need to unzip it somewhere and then point an IIS virtual directory to it in order to view/consume it.  Once you do this, the website also contains a link (in the upper right) which leads to the compiled 15 MB .chm file if you are interested in seeing that content as well (but it looks almost 100% identical to the ASP.NET content, so not much need for that).
 
DocProject is (ultimately) invoked from MSBUILD, and so it would be possible to wire it up as well as a post-build event or a CI task that automatically happens out-of-band when code is checked into the repository (just as with Docu), but since it's MSBUILD this task-integration would probably be more complex than the simpler command-line invocation that Docu provides.  Also, since DocProject is dependent on Sandcastle and the MS help compilers to do its work, these dependencies would of course need to be installed/configured on whatever CI platform invoked the API Reference compilation step.
 
 
 
 
***SUMMARY OF COMPARISON***
 
Docu
---------
Pros:
  • simple to configure/invoke
  • no external dependencies on other tools
  • light-wt output (small output size 2MB+/-)
  • final output can be viewed in browser w/out a web server (e.g., just HTML files)
  • web output can be posted to a non-IIS/ASP.NET web server for public access
Cons:
  • early alpha tool
  • presently throws exceptions and crashes when pointed at the NH project :(
  • no search capability in the output (beyond CTRL+F on page-by-page basis); intended usage pattern seems to be BROWSE, not SEARCH
  • no single-file output target (e.g., CHM)
  • no integration of output with Visual Studio Help system
  • takes 15+ minutes to run
 
DocProject
----------------
Pros:
  • output looks/feels like rest of Microsoft (MSDN) help and offers familiar navigation of content
  • offers single-file output target (CHM)
  • output is searchable in its entirety at once (vs. page-at-a-time)
  • index automatically built and integrated into output
  • Visual Studio integrated help can be an output target
  • configuration is performed in a familiar environment (Visual Studio)
Cons:
  • external dependency on MS tools (sandcastle, help compilers, etc.)
  • significantly larger website output (90+ MB)
  • web content needs IIS/ASP.NET to host it for public access
  • more complex process of integrating it into build scripts
  • takes 15+ minutes to run
 
***RECOMMENDATION***
 
IMO the DocProject approach is the more robust of the two options, offering a more familiar presentation of content to the end-user and richer experience in interacting with the content (e.g., integrated search, indexed keywords, etc.).  If we are going to bother to do this, I think it would be most valuable to do it in a way that the resulting content is the most approachable and the most usable by as many people as possible and IMO that's the output provided by the DocProject approach.  It offers the web-based content that should be posted to the internet as well as the CHM file for those wanting offline reference to the content.  For the adventuresome, there is even the VS-integrated content making the NHibernate API reference a full-fledged participant in the VS help system (supporting valuable learning scenarios such as placing your cursor on an NH class/method and being able to jump to help on it via a simple F1 keystroke from inside Visual Studio -- followed, sadly, by the interminable 10-minute wait for the VS help system to spool up and load, of course!).
 
The biggest challenge to the DocProject approach IMO is the dependency on SandCastle, the MS help compilers, etc.  If the DocProject help generator VS project were added directly to the NHibernate trunk solution, then anyone interested in building NH would need to either unload the DocProject VS project from the solution or else get all of those dependencies installed just to build/compile NH.  IMO that's too high a burden to place on anyone who just wants to check out the core NH project and build/compile it for themselves.
 
One of the important things to understand about *either* of these help compilation tools is that neither of them actually requires access to *any* of the NH source code directly -- instead they simply require access to the compiled binaries and the XML code-comment files extracted from the source code by the C# compiler at build-time.  This means that I think the best way to accomplish the creation of a rich API reference for NH is to create a separate parallel solution (NH_API_Reference?) that is *not* part of the main NH solution but contains (relative) path-pointers to the location of the compiled NH binaries from the actual NH solution itself.  This way, the 'API Reference Project' can be completely separate and distinct from the actual NH source trunk.
 
This would support the following scenarios:
 
1) if you want to build just NH, you get that trunk and build it; the sln, nant scripts, etc. make no reference to the DocProject stuff at all and nobody is affected (nobody needs SandCastle, MS Help compilers, etc. to build the NH trunk just as is the case today)
 
2) if you want to build the API ref docs, you check out BOTH the NH trunk and the API_REF trunk, build the NH trunk, and then build the API_REF trunk that points to the bin output folder from the NH source trunk to get the binaries and the XML it needs to process; this scenario would (of course) require you to have installed SandCastle, the MS Help compilers, etc. in order to perform the compilation of the API reference docs but only such people would be affected
 
It seems to me that this would support the needs of everyone in a way that would have the least negative impact on the 'real' NH source trunk and yet still permit us to construct the most robust API reference content for any NH adopter.
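 
For scenario 2, the API_REF build could start with a quick pre-flight check that the NH trunk has actually been built and that each assembly has its XML comment file next to it. The Python sketch below is only an illustration of that idea: the function name is made up, and the assumption that each XML comment file sits beside its DLL with the same base name reflects the usual C# compiler output, not anything specific to DocProject or Docu.

```python
import tempfile
from pathlib import Path


def find_doc_inputs(bin_dir):
    """Pair each assembly in bin_dir with its XML comment file, if present.

    Returns (ready, missing): ready is a list of (dll_name, xml_name) pairs,
    missing is a list of DLL names that have no XML comment file beside them.
    """
    ready, missing = [], []
    for dll in sorted(Path(bin_dir).glob("*.dll")):
        xml = dll.with_suffix(".xml")
        if xml.is_file():
            ready.append((dll.name, xml.name))
        else:
            missing.append(dll.name)
    return ready, missing


if __name__ == "__main__":
    # Demo against a throwaway folder; a real script would point at the
    # bin output folder of the NH trunk via a relative path.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("NHibernate.dll", "NHibernate.xml", "Other.dll"):
            (Path(demo_dir) / name).write_text("")
        ready, missing = find_doc_inputs(demo_dir)
        print("ready:", ready)
        print("missing XML comments:", missing)
```

A check like this fails fast with a clear message, rather than letting the doc compiler run for 15+ minutes against incomplete inputs.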
 
Sorry to all for the (ridiculous) length of this thing, but as this is hardly the kind of decision I think I should (could!) make on my own, I wanted to try to summarize as much of my findings as I could so that everyone can understand the factors that will play into our decision and help form the basis for any discussion anyone wants to have about how best to proceed.
 
Thoughts (as always) welcome; I'm sure I'm overlooking several pros and cons for either solution so am hoping a discussion here about this will surface some of my oversights.

XOR

Mar 30, 2009, 6:32:20 AM
to nhibernate-development
+1 for DocProject

It's definitely a more standard way, and it produces more standard-
looking help. Remember that most .NET developers are used to MSDN-
style documentation. So all "deviations" would be considered
unprofessional.

I also think that use of Sandcastle is a good, rather than bad, thing.
Why can't we use nDoc anymore? IMHO, main reason is that they got
tired of following Microsoft additions to the framework. And this can
be the end of any community tool, while MS would most likely keep
Sandcastle up to date.

Best,
Andrew

Fabio Maulo

Mar 30, 2009, 9:43:50 AM
to nhibernate-...@googlegroups.com
+1 DocProject

About compilation we may use an explicit NAnt/MsBuild task.

P.S. don't be sorry for the length of the mail.


--
Fabio Maulo

Stephen Bohlen

Mar 30, 2009, 9:51:05 AM
to nhibernate-...@googlegroups.com
Great idea.

Something like...

nant reference-api-docs

...or similar.

Since I wasn't working w the NH source but rather the binaries, I was testing purely in VS and not considering that (of course) the build scripts = the way NH gets compiled rather than using 'pure' VS 'F6' to compile/build.

-Steve B.



Fabio Maulo

Mar 30, 2009, 9:56:50 AM
to nhibernate-...@googlegroups.com
Just so we don't forget it...
The other thing needed is a HowTo_CompileDocumentation.txt with all the info/links needed before running the NAnt/MsBuild task.


--
Fabio Maulo

Stephen Bohlen

Mar 30, 2009, 10:00:03 AM
to nhibernate-...@googlegroups.com
Yes, excellent pt.

I imagine with links to SandCastle, help compilers, etc as needed.

We might actually also consider KEEPING installs of all this stuff downloadable from us (NH) so that we can ensure correct versions of these things, etc. (e.g. the proper CTP build of Sandcastle, etc.) so long as this doesn't violate some distro lic agreement for these tools.

-Steve B.



Fabio Maulo

Mar 30, 2009, 10:08:26 AM
to nhibernate-...@googlegroups.com
Ok.
We can wait 2 days to evaluate votes. 
Note: noVote.Is.IdenticalTo(+1)


--
Fabio Maulo

Will Shaver

Mar 30, 2009, 10:40:03 AM
to nhibernate-...@googlegroups.com
Excellent work on the analysis, thanks for doing that. Is our nhforge.org server IIS? Could we get the docs on it from DocProject?

 -Will

Stephen Bohlen

Mar 30, 2009, 10:43:00 AM
to nhibernate-...@googlegroups.com
I did check that myself actually + the ans = yes (at least that's my guess based on the fact that the homepage is default.aspx <g>).

We should be good -- that's one thing I failed to mention in my war-and-peace-length analysis ;)

-Steve B.



Fabio Maulo

Mar 30, 2009, 10:52:09 AM
to nhibernate-...@googlegroups.com
2009/3/30 Stephen Bohlen <sbo...@gmail.com>

> We should be good -- that's one thing I failed to mention in my war-and-peace-length analysis ;)

LOL

Never is enough, old man ;)
--
Fabio Maulo

James Gregory

Mar 30, 2009, 12:24:45 PM
to nhibernate-development
Hello everyone,

Don't let me disturb your voting, I'm not here to try to sway your
opinions in even the slightest. I just wanted to say thank you for
even considering Docu, it's very flattering. Your comments have been
absolutely invaluable and I will do my best to try to improve Docu
based on what you've said.

I plan on bringing this up on our mailing list, and anyone who is
interested is more than welcome to weigh in on the discussion.

I hope whichever you choose (I think we know which one it'll be! :) )
works out for you guys.

James


Stephen Bohlen

Mar 30, 2009, 1:03:14 PM
to nhibernate-...@googlegroups.com
James:

Thanks for the input; I 100% realize that Docu and DocProject are only really *casually* trying to solve the same general problem and are approaching it in completely different ways with completely different value systems of course.

If you're interested, I'd be more than glad to share more details on exactly where the null-ref exceptions I stumbled upon in Docu occurred so that you could dig deeper into their cause; as I mentioned in my initial post my code-fixes to it were speedy band-aids rather than probably ideal fixes to the issues but they *do* point to at least the location of some of the problems even as they don't (probably) represent the 'right' way to fix them.

I'll jump over to the Docu discussion list and share them there since this is (a bit!) OT for this list, if that's the best way to do this (else I can just send you a patch of the changes and you can inspect them for yourself at your leisure).

For the record, I love what you're trying to do with Docu -- its simplicity (and lack of dependencies especially!) is a really nice value proposition.

Thanks again and I'll jump over to the Docu discussion list when I have a sec to elaborate on the technical details of what I found,

-Steve B.

Tuna Toksoz

Jul 2, 2009, 11:06:47 AM
to nhibernate-...@googlegroups.com
What is the progress on this? I can try integrating sandcastle command line thing into the trunk as ndoc2 is throwing on me at the moment.

Tuna Toksöz
Eternal sunshine of the open source mind.

http://devlicio.us/blogs/tuna_toksoz
http://tunatoksoz.com
http://twitter.com/tehlike

Tuna Toksoz

Jul 2, 2009, 11:07:28 AM
to nhibernate-...@googlegroups.com
and i know that ndoc2 is not for api documentation :)

Stephen Bohlen

Jul 2, 2009, 11:13:34 AM
to nhibernate-...@googlegroups.com
I have this basically completed using DocProject back-ended with SandCastle under the hood.

The remaining tasks are:
  • convert the vs-integrated help file compilation to a command-line invocation that's build-script-invokable
  • customize the graphics, color, etc. of the output HTML for the web-based version of the API reference
  • puzzle out the best way to include/deploy the dependencies (probably in \TOOLS\) needed for successful invocation of SandCastle-based projects
  • commit the above work to the 2.1 branch
I would anticipate being able to complete this over the coming weekend without any troubles.

Tuna Toksoz

Jul 2, 2009, 11:16:25 AM
to nhibernate-...@googlegroups.com
Awesome, thanks Stephen! And congrats for the MVP award once more!!