Writing a book: DocBook or HTML5/XHTML5?

Fabrizio Giudici

Feb 3, 2012, 5:07:51 AM
to java...@googlegroups.com
I've (laaaazily) started collecting my blog posts, reorganized, into a
book that I'd use for my mentoring activities. The choice so far has been
DocBook + Maven + a specific Maven plugin that lets me include live
portions of code samples. So far so good. But so far I've worked by
copy-pasting my existing blog posts, translated from HTML to DocBook XML,
and I realize that it would be much more efficient, for new blog posts,
to write them in DocBook XML and convert them to HTML. At this point, the
question: with HTML5, which supports microformats, is DocBook still
worthwhile? Cay Horstmann, in an interview, said he's using XHTML for writing
his book about Scala:

http://blog.eisele.net/2011/10/heroes-of-java-cay-horstmann.html


Actually, what I find annoying about DocBook is editing. I'm using XML Editor
from XMLmind (http://www.xmlmind.com/xmleditor/), which explicitly supports
DocBook XML, but it's not the most agile editing I've experienced so far.
Until a few weeks ago I was not satisfied with HTML editors either, so it
made little difference. But now I've found BlueGriffon and I work very
well with it. That's why I'm evaluating whether moving to HTML5 for good
is a good idea. In more detail, as I'm moving all my sites to a compact
CMS I've written, the idea for publishing DocBook stuff was to integrate
the XSLT transformation from DocBook XML to HTML into my CMS. The HTML5
approach could make this integration unnecessary.

And, BTW, I'm a bit confused about the relationship between HTML5 and
XHTML5, and whether the latter will be really adopted or not.

Thanks for any suggestion.

--
Fabrizio Giudici - Java Architect, Project Manager
Tidalwave s.a.s. - "We make Java work. Everywhere."
fabrizio...@tidalwave.it
http://tidalwave.it - http://fabriziogiudici.it

A McDowell

Feb 3, 2012, 5:36:38 AM
to java...@googlegroups.com

> And, BTW, I'm a bit confused about the relationship between HTML5 and
> XHTML5, and whether the latter will be really adopted or not.

Just my 2¢, but the value of XHTML isn't necessarily in delivering XML-compliant web pages. With XML namespacing it's easy to integrate XHTML into other XML formats to provide (for example) rich text elements.

Fabrizio Giudici

Feb 3, 2012, 8:10:26 AM
to java...@googlegroups.com, A McDowell

Good points, but it depends on the context. For instance, for my sites I
do some postprocessing on (X)HTML files (e.g. with XSLT), and XML can be
manipulated more easily (e.g. you don't need JTidy). On the other
hand, the idea of integrating elements from other namespaces is good, but
it fails if you want to use a WYSIWYG editor. In my experience, most HTML
editors won't handle non-HTML elements properly in visual mode.
What I'm trying is a different approach: staying with the
XHTML schema and using divs with special classes. For instance, I have the
problem of representing photos in a compact way such as:

<myns:photo>
  <id>20050817-0092</id>
  <caption>The lower Albegna valley and Isola del Giglio on the
    horizon.</caption>
</myns:photo>

The idea is that this compact information can be converted into something
more complex, including jQuery support for pop-ups, slideshows, etc.,
when rendered in a browser. For me this approach failed for visual
editing. What seems to work better is to use <div>s with classes that have a
semantic meaning:

<div class="nwXsltMacro_Photo">
  <p class="nwXsltMacro_Photo_photoId">20050817-0092</p>
  <p class="nwXsltMacro_Photo_caption">The lower Albegna
    valley and Isola del Giglio on the horizon.</p>
</div>

Those classes don't correspond to any CSS rules, but are used by an XSLT
transformation. Actually, with XHTML5 I could do the same thing staying in HTML:


<figure>
  <figcaption>The lower Albegna valley and Isola del Giglio on the
    horizon.</figcaption>
  <img src="20050817-0092" alt="" />
</figure>

which seems to be supported by editors such as BlueGriffon. Unfortunately,
that 20050817-0092 is not a full image path, just an id that is manipulated
later. With the second approach I can see it while editing because it's
a <p>; the <img> in this case is rendered as blank.
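
The class-based macro idea above can be sketched in a few lines. This is only a rough illustration in Python rather than the XSLT actually used on the site: the `IMAGE_BASE` path, the `.jpg` suffix and the function name are assumptions made for the example, and the input is assumed to be well-formed XML without the XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical location of the images; the thread only says that the photo id
# is turned into a real path at a later stage.
IMAGE_BASE = "/media/photos/"


def expand_photo_macros(root, image_base=IMAGE_BASE):
    """Replace each <div class="nwXsltMacro_Photo"> with a <figure>, in place."""
    # Take a snapshot of the elements so the tree can be edited while looping.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "div" or child.get("class") != "nwXsltMacro_Photo":
                continue
            # Map each <p class="..."> to its whitespace-normalised text.
            fields = {p.get("class"): " ".join((p.text or "").split())
                      for p in child.findall("p")}
            photo_id = fields["nwXsltMacro_Photo_photoId"]
            caption_text = fields["nwXsltMacro_Photo_caption"]

            figure = ET.Element("figure")
            ET.SubElement(figure, "img", {
                "src": image_base + photo_id + ".jpg",  # assumed naming scheme
                "alt": caption_text,
            })
            caption = ET.SubElement(figure, "figcaption")
            caption.text = caption_text
            figure.tail = child.tail  # keep the text that followed the div
            parent[index] = figure
    return root
```

The point of the sketch is that the editor only ever sees plain `<div>` and `<p>` elements, while the richer markup is produced at publishing time.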

clay

Feb 3, 2012, 11:28:30 AM
to The Java Posse
Have you considered LaTeX? I haven't used DocBook so I don't know how
that compares.

Fabrizio Giudici

Feb 3, 2012, 12:36:27 PM
to The Java Posse, clay
On Fri, 03 Feb 2012 17:28:30 +0100, clay <clayt...@gmail.com> wrote:

> Have you considered LaTeX? I haven't used DocBook so I don't know how
> that compares.


I did LaTeX in my youth. Now I'm too old :-)

Alex Buckley

Feb 3, 2012, 8:08:41 PM
to The Java Posse
DocBook is good for "real" books where you want to/have to produce a
PDF, thanks to the DocBook-XSL stylesheet package. How do you produce
PDF from HTML5 source?

DocBook is good if you can live within its schema. I store all kinds
of metadata in its attributes and haven't felt limited yet. There is a
fairly strong movement against modifying the schema, though I guess
that's better than HTML5 where you have no choice at all for elements
and very little choice (except at your own risk) for attributes.

The DocBook community is obviously much smaller than the HTML[5]
community, and composed primarily of writers rather than developers. I
get the feeling that most writers work in companies and are provided
with a toolchain which significantly insulates them from DocBook. I
edit in emacs, which isn't so unusual in web-world but is definitely
unusual in DocBook-land.

Overall I think DocBook is a worthy industrial alternative to LaTeX,
but I think that most people will find DocBook too heavyweight and
will instead be satisfied with [X]HTML5.

Alex Buckley

Feb 3, 2012, 8:13:56 PM
to The Java Posse
DocBook is good for "real" books where you want to/have to produce a
PDF, thanks to the DocBook-XSL stylesheet package. How do you produce
PDF from HTML5 source?

DocBook is good if you can live within its schema. I store all kinds
of metadata in its attributes and haven't felt limited yet. There is a
fairly strong movement against modifying the schema, though I guess
that's better than HTML5 where you have no choice at all for elements
and very little choice (except at your own risk) for attributes.

The DocBook community is obviously much smaller than the HTML[5]
community, and composed primarily of writers rather than developers. I
get the feeling that most writers use a toolchain which significantly
insulates them from DocBook. I edit in emacs, which isn't so unusual
in web-world, but is definitely unusual in DocBook-land.

Overall I think DocBook is a worthy industrial alternative to LaTeX,
but I have a feeling that most people will find DocBook too
"corporate" and will instead be satisfied with [X]HTML5.

ngocdaothanh

Feb 8, 2012, 2:13:39 AM
to The Java Posse
Sphinx is much easier than DocBook:
http://sphinx.pocoo.org/

Akka is using Sphinx, have a look at the source and the final result:
https://github.com/jboner/akka/tree/master/akka-docs
http://akka.io/docs/

Lea Hayes

Jul 25, 2012, 9:15:37 PM
to java...@googlegroups.com
Hi Alex


On Saturday, February 4, 2012 1:08:41 AM UTC, Alex Buckley wrote:
> DocBook is good for "real" books where you want to/have to produce a
> PDF, thanks to the DocBook-XSL stylesheet package. How do you produce
> PDF from HTML5 source?

I have been experimenting with using DocBook / DITA and XML-safe HTML5 and my requirements are:

- Produce PDF output with both clickable TOC pages and outline/bookmarks
- Produce vanilla HTML5 output files that can be assembled by some very basic PHP scripts
- Produce JSON TOC that can be used by the very basic PHP scripts for context sensitive sidebar

My initial thoughts were to use DocBook or a custom XML schema. On the plus side this would mean that I can take advantage of DocBook's powerful feature set, but on the minus side the schema has quite a learning curve given my limited time, plus WYSIWYG editing seems like a no go. Certainly the XML tools that I tried have terrible WYSIWYG for XML formats.

My next thought was to take advantage of the new semantic elements of HTML5 and make use of classes and data-* attributes when additional semantics are useful. I had a number of concerns with this approach, including those mentioned in this thread (inconsistent WYSIWYG experience with excess browser-generated junk, plus inability to produce a quality PDF file). The only CSS3 processor that I can find that fully supports paged media types (Prince) is way too expensive.

After a LOT of research here are my findings:

- wkhtmltopdf is absolutely fantastic at converting HTML5 to PDF. Whilst its support for paged media types is limited to the offerings of WebKit, the command line interface supports custom cover page(s), an automatically generated TOC (using the HTML5 outline), and custom headers and footers taken from custom HTML files (with JavaScript access to wkhtmltopdf properties). Plus all links (and the TOC) are clickable and the PDF outline is generated beautifully.

PDF Output: Tick

- WYSIWYG support that is consistent across browsers with clean HTML5 compliance is possible thanks to the Aloha Editor: http://aloha-editor.org/. With the addition of a very simple "static" content management system the process of creating, managing and editing static pages is very easy.

WYSIWYG: Tick

- A JSON TOC to support navigation on a PHP-powered website is easily generated with custom XSLT2 stylesheets by first concatenating the contents of all HTML pages in order, and then scanning the H1-6 tags (whilst respecting HTML5 section/article/aside/etc)

Easy to use website: Tick
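
As a rough illustration of the JSON TOC step, here is a minimal sketch in Python rather than XSLT2. It walks the pages in order and nests entries purely by heading level; unlike the approach described above it does not take the HTML5 section/article/aside outline into account. The function and key names (`build_toc`, `title`, `url`, `children`) are assumptions made for the example.

```python
import json
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect the level, id and text of every h1-h6 in document order."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._current = None  # heading currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._current = {"level": int(tag[1]),
                             "id": dict(attrs).get("id"),
                             "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag in HEADINGS and self._current is not None:
            # Collapse runs of whitespace inside the heading text.
            self._current["title"] = " ".join(self._current["title"].split())
            self.entries.append(self._current)
            self._current = None


def build_toc(pages):
    """Turn an ordered list of (url, html) pairs into a nested TOC."""
    root = {"level": 0, "children": []}
    stack = [root]  # chain of currently open ancestors
    for url, html in pages:
        collector = HeadingCollector()
        collector.feed(html)
        collector.close()
        for entry in collector.entries:
            anchor = "#" + entry["id"] if entry["id"] else ""
            node = {"level": entry["level"], "title": entry["title"],
                    "url": url + anchor, "children": []}
            # Close every open entry at the same or a deeper level.
            while stack[-1]["level"] >= node["level"]:
                stack.pop()
            stack[-1]["children"].append(node)
            stack.append(node)
    return root["children"]


def toc_as_json(pages):
    """Serialise the TOC so that a PHP script can read it."""
    return json.dumps(build_toc(pages), indent=2)
```

Because the open-ancestor stack is kept across pages, the result behaves as if all pages had been concatenated before scanning.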

For me the final part of the puzzle has been bringing all of these things together in a way that is easy to manage. I am considering using chromiumembedded within a C# Forms application using a simple embedded HTTP server to glue all of the above together. Note: I am not writing the CMS using PHP, but rather using C# for easier access to Saxon (XSLT2 processing).

After serious consideration this seems to be the easiest approach overall (whilst a little extra initial preparation is required). Though at this stage I am not committed to this approach, I am still in the experimental phase really. I am looking for something with flexibility over visual styles (which DocBook seems to lack), whilst maintaining good semantics, whilst having both HTML and PDF output that are both consistent in style and easy to use. And hopefully far easier to edit using WYSIWYG. Whilst I do not mind manually typing XML elements around my text when writing XML comments, I can see this becoming very tedious when writing large amounts of technical documentation.
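
For the PDF side, a wkhtmltopdf run along the lines described above might be scripted as follows. This is a sketch only: it assumes wkhtmltopdf is installed and on the PATH, all file names in the usage note are hypothetical, and the helper names are invented for the example. It relies on the `cover` and `toc` objects and the `--header-html` / `--footer-html` page options of the wkhtmltopdf command line.

```python
import subprocess


def wkhtmltopdf_command(chapters, output, cover=None, header=None,
                        footer=None, toc=True):
    """Build the argument list for one wkhtmltopdf run."""
    command = ["wkhtmltopdf"]
    if cover:
        command += ["cover", cover]   # cover page object
    if toc:
        command.append("toc")         # generated table of contents
    for chapter in chapters:
        command.append(chapter)
        # Header and footer are page options, so they follow each page.
        if header:
            command += ["--header-html", header]
        if footer:
            command += ["--footer-html", footer]
    command.append(output)            # the output file always comes last
    return command


def render_pdf(chapters, output, **options):
    """Run wkhtmltopdf and raise CalledProcessError if it fails."""
    subprocess.run(wkhtmltopdf_command(chapters, output, **options), check=True)
```

A call such as `render_pdf(["ch1.html", "ch2.html"], "book.pdf", cover="cover.html", header="header.html")` would then produce a single PDF from the listed pages.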

Mark Derricutt

Jul 25, 2012, 11:03:20 PM
to java...@googlegroups.com
On 26/07/12 1:15 PM, Lea Hayes wrote:
> Hi Alex
>
> On Saturday, February 4, 2012 1:08:41 AM UTC, Alex Buckley wrote:
>> DocBook is good for "real" books where you want to/have to produce a
>> PDF, thanks to the DocBook-XSL stylesheet package. How do you produce
>> PDF from HTML5 source?
>
> I have been experimenting with using DocBook / DITA and XML-safe HTML5 and my requirements are:
>
> - Produce PDF output with both clickable TOC pages and outline/bookmarks
> - Produce vanilla HTML5 output files that can be assembled by some very basic PHP scripts
> - Produce JSON TOC that can be used by the very basic PHP scripts for context sensitive sidebar

When I get home I'll put a small scaffold Maven project showing how I do my Maven->DocBook->PDF+custom fonts+styles projects up on GitHub as well.

Fabrizio Giudici

Jul 26, 2012, 3:23:18 AM
to java...@googlegroups.com, Lea Hayes
On Thu, 26 Jul 2012 03:15:37 +0200, Lea Hayes <leah...@gmail.com> wrote:

> Hi Alex

> After serious consideration this seems to be the easiest approach overall
> (whilst a little extra initial preparation is required). Though at this
> stage I am not committed to this approach, I am still in the experimental
> phase really. I am looking for something with flexibility over visual
> styles (which DocBook seems to lack), whilst maintaining good semantics,
> whilst having both HTML and PDF output that are both consistent in style
> and easy to use. And hopefully far easier to edit using WYSIWYG. Whilst I
> do not mind manually typing XML elements around my text when writing XML
> comments, I can see this becoming very tedious when writing large amounts
> of technical documentation.

Many thanks for this. Actually I'm going on holiday and, among other
things, I'd like to find a reasonable solution to the problem. I've
written some DocBook content (with a Maven plugin which embeds source
examples) but I'm tired of it because I didn't have a good experience with
any of the available editors. In the meantime, my tiny CMS is now
feature-ready and entering the beta stage; it runs all of my sites and
it's based on HTML5, which I appreciate and which I think should be enough
for decent document writing. The idea is to have a single platform for
writing articles and embedding code samples, both for my blog posts and
for eventually collecting them in book form. The missing piece was conversion
to PDF, which seems to be solved by the tool you pointed to.


Jon Kiparsky

Jul 26, 2012, 8:12:51 AM
to java...@googlegroups.com
I haven't done anything very fancy with it, but if you want a WYSIWYG editor for DocBook, Oxygen is pretty good and reasonably priced, and the support is very good. I've used it to generate small documentation sets, and I like it.
As I say, though, I haven't really put it through its paces.



Lea Hayes

Jul 26, 2012, 8:20:55 AM
to java...@googlegroups.com
I have an Oxygen 10.3 license and have tried both its DocBook and DITA capabilities; but I really hate the "Author" mode; too fiddly, it'd be easier to just write the markup imo. Perhaps this has improved in new versions of Oxygen.

But in all fairness, Oxygen excels at writing XSLT2 and RelaxNG.

I found a REALLY nice DITA editing platform, except it seems a little pricey to me: EasyDITA. I am not sure how they can say "$1000 a month" is affordable for the Lite version.

Josh Berry

Jul 26, 2012, 8:34:13 AM
to java...@googlegroups.com
Just curious, what do these options get you that a LaTeX workflow
lacks? I confess that any .tex direct to .html or .docx attempt I
have ever made was less than successful. However, with proper
macros/commands, it seems that it shouldn't be too tough to get
something decent going.

Righter

Aug 22, 2012, 11:11:14 AM
to java...@googlegroups.com
I am trying to reduce the size of the preface material in our docs. I was thinking of a + or - button that would reveal or hide this content. I did it once with some Flash code but wonder if I can use HTML5 to produce this with a simple button. Thoughts?
BTW I have used both Oxygen and XMLMind and I prefer the latter.
