On the unreasonable usefulness of XML

Dilawar Singh

Jun 4, 2014, 3:18:23 PM
to wncc...@googlegroups.com
XML looks dirty or unreadable to human eyes but is extremely friendly to
computers. In places where human intervention is minimal and ugliness counts for
little, e.g. two programs exchanging data, XML can be used with ease and to
great effect.

Before I dropped out of my Ph.D. at EE, IIT Bombay, I wrote a parser to
convert Verilog to equivalent XML, which can later be parsed easily by any other
program. XML files are extremely easy to parse and convert to something else. Let
me give you an example where XML is widely endorsed and used: computational and
systems biology/chemistry. People had their data and models but could not
exchange them easily with each other; there was no predefined standard. For a
long time biologists and chemists did not care, but since computers have become
as useful as test-tubes[1], there is a need.
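
To give a feel for how little code the "easy to parse" claim needs, here is a
minimal sketch using Python's standard `xml.etree.ElementTree`. The XML layout
(a `module` with `port` children) is a made-up illustration, not the output of
the Verilog parser mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML for a tiny hardware module; the tag and attribute
# names are invented for illustration only.
MODULE_XML = """
<module name="half_adder">
  <port name="a" direction="input"/>
  <port name="b" direction="input"/>
  <port name="sum" direction="output"/>
  <port name="carry" direction="output"/>
</module>
"""

def ports_by_direction(xml_text):
    """Return a dict mapping each port direction to a list of port names."""
    root = ET.fromstring(xml_text)
    result = {}
    for port in root.iter("port"):
        result.setdefault(port.get("direction"), []).append(port.get("name"))
    return result

if __name__ == "__main__":
    print(ports_by_direction(MODULE_XML))
```

Any other program that wants the same information only has to agree on the
tag names, not on a custom grammar.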

I have little working experience in chemistry. In biology, however, XML rules
the roost. Two widely used formats for data and model exchange, NeuroML and SBML,
are XML-based languages. Two large projects, OPEN-SOURCE-BRAIN and MODEL-DB,
maintain repositories of models; many of them are XML based.

Good models in biology are extremely hard to come by. XML makes it extremely
easy to reuse them. It is often joked that there are fewer than six people in
this world who can reuse the neuron models developed by Traub, who enjoys
great respect and admiration from his peers (he writes his models in non-standard
Fortran and each model is written twice). Only recently were they converted to
XML and GNU Fortran. He is definitely not very fond of all the standardization,
for modeling is an art and it's not a good thing to standardize it.

XML hurts everyone's eyes, but the ease with which one can parse and
manipulate it far outweighs its ugliness. Programmers have been dealing with
XML-like "ugliness" for a very long time. As someone said, "XML is just like Lisp
with < and > instead of ( and )".
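
The Lisp comparison can be made concrete with a small sketch that prints an XML
element as an S-expression. The `to_sexpr` helper and the sample element are
hypothetical, written only to illustrate the analogy.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem):
    """Render an element as a Lisp-style S-expression string.

    Attributes become :key "value" pairs; text that follows a child
    element (the "tail") is ignored to keep the sketch short.
    """
    parts = [elem.tag]
    for key, value in elem.attrib.items():
        parts.append(f':{key} "{value}"')
    text = (elem.text or "").strip()
    if text:
        parts.append(f'"{text}"')
    parts.extend(to_sexpr(child) for child in elem)
    return "(" + " ".join(parts) + ")"

root = ET.fromstring('<cell id="c1"><segment length="10"/><note>soma</note></cell>')
print(to_sexpr(root))
# prints: (cell :id "c1" (segment :length "10") (note "soma"))
```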

XML is suitable for describing hierarchical data, and systems are nothing if not
hierarchical entities. XML can also encode data. Often data and
system definitions are kept separate, but they can also be combined. XML is generic
enough for all sorts of tricks. It does not perform well with binary data, and
it is poorly suited to representing large data sets. Still, it can be
used as a text-based database. Many databases dump their data in XML format. XML
may not have SQL-like querying support, but in many cases XPath is good enough.
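
As a rough illustration of XPath standing in for a simple SQL query, the sketch
below uses the limited XPath subset supported by Python's standard
`xml.etree.ElementTree`. The dump layout and the table and column names are
invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical database dump; table and column names are invented.
DUMP = """
<database>
  <table name="neurons">
    <row id="1"><name>pyramidal</name><region>cortex</region></row>
    <row id="2"><name>purkinje</name><region>cerebellum</region></row>
    <row id="3"><name>granule</name><region>cerebellum</region></row>
  </table>
</database>
"""

root = ET.fromstring(DUMP)

# Roughly: SELECT name FROM neurons WHERE region = 'cerebellum'
rows = root.findall(".//table[@name='neurons']/row[region='cerebellum']")
names = [row.findtext("name") for row in rows]
print(names)
# prints: ['purkinje', 'granule']
```

For anything beyond such simple filters (joins, aggregates), a full XPath
engine or a real database is probably the better tool.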

No matter how one describes one's model, it is very important to verify/validate
it. We are used to compilers doing the validation job for us by
emitting errors and warnings. Tools like `lint` help a great deal too. And in
the end, there is the debugger. With XML, one writes another XML-based document
called a schema (e.g. an XSD schema) as a validation script. If a schema is
available, a model file can be 'validated' against it. In a way, the
parser/validator of XML is another XML file.
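
The point that the validator is itself XML can be seen in a short sketch: the
same standard-library parser that reads a model file can read an XSD and list
what it declares. The schema below is a hypothetical one for a `model` with
`compartment` children. Note that this only inspects the schema; actually
validating a document against it needs a schema-aware library (for example
lxml's `etree.XMLSchema`), which is not shown here.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# A small hypothetical XSD describing a <model> with <compartment> children.
SCHEMA = f"""
<xs:schema xmlns:xs="{XS}">
  <xs:element name="model">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="compartment" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:string" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# The schema is ordinary XML, so the ordinary parser reads it.
schema_root = ET.fromstring(SCHEMA)

# Collect the names the schema declares, in document order.
declared_elements = [e.get("name") for e in schema_root.iter(f"{{{XS}}}element")]
declared_attributes = [a.get("name") for a in schema_root.iter(f"{{{XS}}}attribute")]
print(declared_elements, declared_attributes)
# prints: ['model', 'compartment'] ['id']
```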

When dealing with schemas, two Java-based tools, `trang` and `xjc`, are extremely
useful. `trang` can generate a schema from a given XML file, which one can tweak
later; `xjc` generates Java classes from a schema. Packages like JAXB and PyXB
are great for generating parsers from schemas. Python has a cute little program,
`generateDS.py`, which creates a parser from a schema. Schemas are not only
validators, they are also parser generators with verification added.

XML enjoys great support from all mainstream programming languages. C/C++ has
libxml2, libxslt, etc., though one misses XML in the Boost libraries. Python comes
with its own batteries (the `xml` package) and has lxml as a popular add-on. Java
seems to be a language made for XML. Many tools for manipulating, editing,
validating, and visualizing XML are written in Java. Even Haskell has caught up
with XML with HaXml.

[1] http://www.nytimes.com/2013/10/10/science/three-researchers-win-nobel-prize-in-chemistry.html
and http://www.thehindu.com/todays-paper/tp-features/tp-sci-tech-and-agri/when-computer-is-as-important-as-a-test-tube/article5219503.ece

Dilawar
NCBS Bangalore

Saket Choudhary

Jun 4, 2014, 3:33:48 PM
to wncc...@googlegroups.com
Have you looked at Beer-XML http://www.beerxml.com/ ? ;-)
Also :
1. RSS [This is more famous]
2. XSLT [An XML-based stylesheet language for transforming XML, used in document
rendering engines, probably Google Docs as well, though I am not sure]

On a side note, JSBML is participating in GSoC this year.
http://sbml.org/Software/JSBML