>>>>> Marko Rauhamaa <ma...@pacujo.net> writes:
[I took the liberty of adding news:comp.infosystems.www.misc to
Newsgroups:, as the discussion would seem quite on-topic there.]
[...]
> I mean HTML only has a handful of element types, regardless of
> visuals. Compare it with (La)TeX, which allows you to define new
> element types with semantics and all. In HTML, you do it by defining
> a div and a class name and externalizing the semantics outside HTML.
> There's no <aphorism>, <adage>, <saw>, <definition>, <theorem>,
> <conjecture>, <joke>, <guess>, <lie>, <rumor> etc...
But of course you can define new HTML elements, much as you do
with TeX-based systems. As a "prior art" example, the Wayback
Machine routinely uses <wb_p /> in place of <p /> (and more)
for its markup on the archived pages^1.
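
As a minimal sketch of the idea, a page using a made-up element could be generated as follows. The <aphorism> element name, the stylesheet and the pageWithNewElement helper are assumptions chosen for illustration; none of them comes from the Wayback Machine or from the HTML specification. Browsers treat unknown elements as generic inline ones, so the rendering has to be supplied through CSS.

```javascript
// Escapes the characters that would otherwise be parsed as markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: builds a small HTML page that uses a made-up
// <aphorism> element, styled purely through CSS.
function pageWithNewElement(text) {
  return [
    "<!DOCTYPE html>",
    "<title>New element demo</title>",
    "<style>",
    "  /* Unknown elements are inline by default; make this one a block. */",
    "  aphorism { display: block; margin: 1em 2em; font-style: italic; }",
    "</style>",
    "<aphorism>" + escapeHtml(text) + "</aphorism>",
  ].join("\n");
}

console.log(pageWithNewElement("Less is more."));
```
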
There are, of course, a couple of drawbacks to this approach --
which, arguably, are not all that different to what one gets
with TeX.
First of all, such HTML has little chance of passing "validity"
tests^2. That said, TeX-based systems do not introduce the
concept of "validity" at all; the document is deemed "good" as
long as it renders the way it's intended. Or at least, I'm
not aware of any "LaTeX validators" currently in wide use.
Also, while CSS makes it possible to specify the rendering^3,
the semantics remain undefined. On the TeX side, using
\def\foo{\mathbf} doesn't convey semantics, either -- only the
presentation. And that's (partially) covered by CSS.
And of course, CSS-wise, using such new elements is hardly any
different to using the standard "blank" <div /> and <span />
elements with a 'class' attribute. A more sensible approach is
to use some standard element (thus "inheriting" its semantics
for any third-party processors of said document) -- along with
suitable 'class' and 'role' attributes, RDFa, etc.
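
A rough sketch of that approach follows. The mapping table, the asStandardElement helper and the particular tag and role choices are all assumptions made for this example, not a recommendation from any specification. Each "missing" element type becomes a standard element carrying a 'class' attribute and, where one seems to fit, an ARIA 'role'.

```javascript
// Hypothetical mapping from the "missing" element types to standard
// HTML elements; the tag and role choices are illustrative only.
const STANDARD = new Map([
  ["aphorism", { tag: "blockquote" }],
  ["theorem", { tag: "p", role: "note" }],
  ["definition", { tag: "p", role: "definition" }],
]);

// Escapes text for use in element content and attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function asStandardElement(kind, text) {
  // Kinds without a mapping fall back to the "blank" <div />.
  const spec = STANDARD.get(kind) ?? { tag: "div" };
  const role = spec.role ? ` role="${spec.role}"` : "";
  return `<${spec.tag} class="${escapeHtml(kind)}"${role}>` +
         `${escapeHtml(text)}</${spec.tag}>`;
}

console.log(asStandardElement("theorem", "a < b"));
console.log(asStandardElement("joke", "Knock, knock."));
```

A third-party processor that knows nothing about the 'class' values still sees an ordinary <blockquote /> or <p />, which is the point of reusing standard elements.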
Now, as an aside, while I can imagine specific documents that
would benefit from the elements above, I fail to see their
utility to the Web at large. Should the search engine really
treat <joke /> any differently to <theorem />, for instance?
How would <saw /> be any different to <definition /> when
interpreted by the Web user agent? (Other than for their
presentation -- but we've got that covered with CSS, right?)
If anything, it feels like over-engineering to me, alas.
Ultimately, however, I'd like to note that the flexibility of
TeX comes from it being a full-fledged programming language --
contrary to HTML and CSS, which are merely data languages.
Then, it was already noted that the modern Web ecosystem employs
both data languages, such as HTML and CSS (but also SVG, MathML,
RDFa, various "microformats", etc.), and JavaScript as the
programming language. And honestly, I'm not entirely sure that
comparing a data language with a programming language quite
makes sense. (So, if anything, shouldn't we rather be comparing
TeX to JavaScript here? Instead of HTML and CSS, that is.)
Hence, I claim that the power of TeX is also its weakness.
Yes, one can implement a seemingly-declarative "markup" language
in TeX (such as LaTeX), but will it be much different to
implementing such a language in JavaScript? Yes, one can
perform static analysis of a TeX document -- but will the
results /always/ be more useful than performing that same static
analysis on a pure JavaScript-based Web page^4? And no, one
does not "process" a TeX document to get a PDF -- one has to
"run" it instead. "Here be the halting problem."
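
To illustrate why a programmable document has to be run rather than merely processed, here is a small JavaScript sketch. The renderParagraph name and the step budget are assumptions for this example. The text it produces depends on a loop whose termination for every input is an open question (the Collatz iteration), so the generic way to learn the output is to execute the code, preferably under a budget.

```javascript
// Hypothetical "document" whose content depends on a computation.
// Whether the Collatz iteration reaches 1 for every positive integer
// is not known, so a step budget guards against a "page" that never
// finishes rendering.
function renderParagraph(n, maxSteps = 10000) {
  let steps = 0;
  for (let k = n; k !== 1; steps++) {
    if (steps >= maxSteps) return "<p>(gave up rendering)</p>";
    k = k % 2 === 0 ? k / 2 : 3 * k + 1;
  }
  return `<p>Reached 1 after ${steps} steps.</p>`;
}

console.log(renderParagraph(27));
```

The same concern applies to a TeX macro that loops: no amount of inspection short of running it tells the reader, in general, what ends up on the page.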
Somewhat less importantly, TeX code is even less isolated (by
default) from the underlying system than JavaScript^5. One can
easily \input /etc/passwd -- or, unless the installation restricts
it (as TeX Live does via its openout_any setting), write to any
file the user is permitted to write to. And given that there are
users who tend to be wary of running arbitrary JavaScript, how
should they feel about running arbitrary TeX code?
Now, to speak of the bright side. HTML5 possesses decent
"expressive power" and can be "specialized" as necessary by
means of (generally ad-hoc) 'class' values and (more
standardized) "microformats", RDFa, etc. The standardization of
such elements as <article />, <nav /> and <time /> in HTML5
allows for easier extraction of the "payload" content and
metadata from compliant documents. The inclusion of the DOM
interface specification makes it possible to provide a uniform
interface to "HTML objects" across programming languages.
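
As a small sketch of such extraction, the function below collects the machine-readable dates of <time /> elements found within <article /> elements, using only the standard DOM methods querySelectorAll() and getAttribute(). The articleDates name and the choice of selector are assumptions for this example.

```javascript
// Collects the 'datetime' values of all <time> elements located inside
// <article> elements. `root` is any object offering the standard DOM
// method querySelectorAll(), such as `document` in a browser.
function articleDates(root) {
  return Array.from(
    root.querySelectorAll("article time[datetime]"),
    (el) => el.getAttribute("datetime")
  );
}
```

In a browser, articleDates(document) would be the typical call; elsewhere, some DOM implementation has to supply the root object.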
There are some developments (such as RASH^6) aimed at making HTML
a suitable format for authoring scientific papers in.
An even older project, MathJax^7, allows one to include quality
mathematics in HTML documents. It supports several formats for
both "input" (MathML, TeX, AsciiMath) and "output" (HTML and
CSS, SVG, MathML). The formulae are rendered on the user's
side, which means that the user has a degree of control over the
final presentation. When the math is written in the TeX
notation, the user of a browser not implementing JavaScript, or
having it disabled, sees the unprocessed TeX -- which can be as
readable as the author manages to write it.
While the use of "client-side" JavaScript is questionable at
times, its /omnipresence/ can be regarded as an opportunity.
Frankly, I don't seem to recall there ever having been a development
environment covering the computers ranging from something one
can carry on one's palm, to desktops, to supercomputers^8.
Notes
^1 Presumably to avoid possible clashes with the archived pages' own
styling.
^2 Unless, of course, the newly introduced elements become so
commonly used by other parties as to warrant inclusion in
whatever new HTML TR the W3C decides to publish.
^3 Most commonly visual, but, while frequently overlooked, I'd like
to note that CSS 2.1 also offers properties to describe the
/aural/ presentation of the document -- think of users of speech
synthesizers, for instance. I'm unaware of any similar
facility for TeX-based publishing systems.
^4 http://circuits.im/ comes to mind.
^5 The isolation the JavaScript implementations offer is also
stronger than, say, the one implemented in Ghostscript (-dSAFER)
for PostScript -- which happens to be yet another common
"document programming language".
^6 http://rawgit.com/essepuntato/rash/master/documentation/
RASH: Research Articles in Simplified HTML
^7 http://mathjax.org/
^8 Disclaimer: I do not advocate in favor of portable computers in
general, and even less so for any and all devices running
non-free software, or implementing cellular network protocols.
Also, I really hope that one wouldn't actually use JavaScript
for any "number crunching", but will rely on something like C
instead. That said, should I ever have to choose between
JavaScript and, say, Python -- I'd go with JavaScript, sure.