...sadly based on PDF viewing on the web:
So what is the status of alternatives to PDF for converting human-editable
text, say LaTeX or BibTeX source,
for human viewing on the web?
Especially, what is the system overhead for supporting these alternatives?
I may be misunderstanding what you're looking
for, but that sounds exactly like what LaTeXML is designed
for. And, it's New & Improved, easier to install!
Just released Version 0.8.0:
http://dlmf.nist.gov/LaTeXML/
-- What would it take for some agent (e.g. Planetary or you Bruce) to support a generic LaTeXML webservice? This could complement something like ShareLaTeX (pointer in previous msg), which in my opinion provides a great platform for collaborative LaTeX editing. So I could prepare my LaTeX/BibTeX in ShareLaTeX, then pipe it to the LaTeXML webservice to get HTML versions. If this could be done, it would mitigate my concern somewhat about big programs. Or rather, that concern would morph into concerns about the business model and longevity of the LaTeXML service provider.
I would be very interested in your take on these issues. Coming back to Sloan/DML: see … for a brief account of recent developments. If any on this list are interested in being involved in such efforts, please let me know. It appears there will be a conference to convene such people fairly soon. Where, when and who will be invited remain TBD. But I do have some input into this process.
with best wishes to all on the list
A standard demo web service is available now (http://latexml.mathweb.org), and PlanetMath is hosting an instance too. If there's a huge computational load then we'd have to discuss more but this should work for some quick demos/tests.
curl -d 'mode=math&tex=a%2Bb%3Dc' http://latexml.mathweb.org/convert
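(If you'd rather drive that from a script: a minimal Python sketch of the same request. The endpoint and parameters come from the curl line above; what exactly comes back in the response body is my assumption, not documented here.)

    import urllib.parse, urllib.request

    # Same request as the curl example: POST mode=math, tex=a+b=c
    data = urllib.parse.urlencode({"mode": "math", "tex": "a+b=c"}).encode()
    with urllib.request.urlopen("http://latexml.mathweb.org/convert",
                                data=data) as resp:
        print(resp.read().decode())  # the converted result (e.g. MathML)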
Maybe Bruce and Deyan can give tips on more advanced features for bibliography conversions.
Integrating with ShareLaTeX for conversions in the way you've described should be really easy. (I'm hoping for tighter integration with ShareLaTeX, since waiting for PDFLaTeX to recompile the document is a bit tedious; LaTeXML could be modified to render things on a line-by-line basis or similar, which would be much faster.)
This seems related to things we've been discussing at PlanetMath recently; for example, this (unsuccessful) application for a small grant ($21K) that we submitted to the Wikimedia Foundation.
http://meta.wikimedia.org/wiki/Grants:IEG/PlanetMath_Books_Project
We also recently uploaded a book to PlanetMath; the interface is not much to look at (http://planetmath.org/node/87534), but this is a proof of concept for using Planetary as a book hosting platform, as the interface can be improved. The original authors of this book wrote it on Github, and Git integration is also part of the latest set of features in Planetary (although I haven't been involved with that part). I think we have many of the relevant pieces for making *a* digital library for math, and I'd love to be involved in the discussions.
(My current job deals with AI for mathematics, so if there are aspects of this work with clear relevance to artificial intelligence, that would make my involvement easier to explain to my boss; viz. http://www.iiia.csic.es/coinvent/node/1)
I would disagree with that characterization of
the effort required to install LaTeXML. Particularly with
the new release (& hopefully upcoming Debian & Macports repos;
& with a renewed commitment to timely releases),
it can in fact be quite trivial to install --- depending
on your system, of course. But, as with most software,
it does typically require admin access to install.
OTOH, if you're content with what's already installed
on your system, and it does the job you need,
then I've probably just misunderstood your query.
-- Does LaTeXML remain a solo developer project? Or are there others
besides yourself who can continue to develop and maintain the code?
This seems an issue for organizational commitment to LaTeXML, such as
might occur as part of the DML initiative that the Sloan foundation is
showing some interest in funding (more about this below).
It's still mainly me, but with a huge amount of support
from Deyan Ginev. I think it's safe to say that Deyan would be capable
of maintaining it, but I certainly can't make that commitment on
his behalf ... At any rate, LaTeXML is open-source and available
on GitHub, so anyone can contribute, submit patches, fork or ...
There's always a bootstrapping issue with user community, developers,
etc. It's clear that the long interval leading to the current release
hasn't been ideal, but we're trying to get back on track in that respect.
Nevertheless, there is a small, but enthusiastic, user community.
Primarily LaTeXML focuses on complete documents or document sets,
and deals with BibTeX files as part of that process.
By processing a dummy document like:
\begin{document}\cite{*}\bibliography{foo}\end{document}
you can, with a single command, create an html version of the bibliography,
complete with styling, math as MathML, etc. It doesn't call
latex, but acts as a replacement for latex. How that fits
into your existing pipeline, I couldn't say; it could conceivably
replace it entirely.
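For concreteness, a minimal sketch of that, driven from Python; it assumes the latexmlc wrapper from the standard distribution is on your PATH, and the exact flags may vary by version:

    import pathlib, subprocess

    # Dummy document whose only content is the complete bibliography
    pathlib.Path("bib.tex").write_text(
        r"\begin{document}\cite{*}\bibliography{foo}\end{document}")

    # One command: TeX -> HTML, with styling, math as MathML, etc.
    subprocess.run(["latexmlc", "--destination=bib.html", "bib.tex"],
                   check=True)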
At the moment, LaTeXML uses a built-in bibliographic style that
you may or may not like, but supporting more bibstyle options
is on the todo list.
LaTeXML is a (set of) commandline programs, not gui,
certainly not Python, so yes, you'd make a system call.
It's been my experience that small programs, Python or otherwise,
have bugs just like large programs, and also that they can encounter
situations they don't expect, or invalid input data. It seems to
me that they should inform the user of those situations, although
they often do not.
So, I view error handling as a positive thing.
But perhaps I'm just misunderstanding your point.
The size is more of a question of how much it does,
not how often it fails. Again, if a small program adequately
handles the job you need, by all means, stick with it.
Deyan has developed code for web services that wrap
LaTeXML, so anyone can install both of those and offer
their own LaTeX to HTML (or epub or ....) service.
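(Just to illustrate the idea, and emphatically not Deyan's actual code: a toy Python wrapper that exposes a converter over HTTP. The latexmlc flags are assumptions from the standard distribution, and a real deployment would need queuing, timeouts, sandboxing, and so on.)

    import pathlib, subprocess, tempfile
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import parse_qs

    class Convert(BaseHTTPRequestHandler):
        def do_POST(self):
            # Read the posted form data and pull out the TeX fragment.
            length = int(self.headers.get("Content-Length", 0))
            tex = parse_qs(self.rfile.read(length).decode()).get("tex", [""])[0]
            with tempfile.TemporaryDirectory() as d:
                src = pathlib.Path(d, "in.tex")
                src.write_text(tex)
                dst = pathlib.Path(d, "out.html")
                # Hand the fragment to latexmlc (flags are assumptions).
                subprocess.run(["latexmlc", "--whatsin=fragment",
                                "--destination=%s" % dst, str(src)])
                body = dst.read_bytes() if dst.exists() else b"conversion failed"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(("", 8080), Convert).serve_forever()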
I guess you know that LaTeXML was developed exactly
to support http://dlmf.nist.gov/, a flavor of DML.
A core design goal was preserving and (as far as possible)
enhancing the presentation and structure of the documents,
but more importantly the semantic content, both by enriching
the markup with extra metadata on the input side and through post-processing.
So, to my mind, DMLs are exactly the kind of application that LaTeXML
is intended for, and many of the issues you discuss in that report are
quite dear to my heart.
-- curl -d 'mode=math&tex=a%2Bb%3Dc' http://latexml.mathweb.org/convert
Ah, this is some progress, to have a demo like this. But what would it take to provide a real service?
ShareLaTeX has quite an interesting and possibly viable business model I think.
There is some connection, but I have the sense that funders may not be keen on this sort of thing until there is some proof of concept of high quality textbooks emerging from such efforts.
I have colleagues who expressed this opinion quite strongly. I think such issues remain significant inhibitors to participation by the best authors in collaborative textbooks.
OK. If we think of Planetary as a participant in DML then can you give me a few sentences on
- nature of organization
- contact people and their roles in the organization
- possible contributions to a DML effort as outlined in the NRC report?
Also, how do we clarify the distinction, if any, between Planetary and Michael K's group, which is also a natural participant?
You mean you are working for COINVENT now? Is this project actually funded? The project description gives me a somewhat surreal impression.
I think one point that should be made clear from the beginning is that the Planetary project in itself is understaffed at the moment and gets traction from side efforts; two examples are Joe and Ray's textbook remixing project and the work of PhD students from Michael's group (Mihnea Iancu and Constantin Jucovschi being the main experts on that side at the moment) on a semantic glossary for mathematics and advanced semantic editing support.
So you shouldn't have the expectation, or come away with the impression, that there is a team ready and waiting to provide support and development on the "Planetary" project as such. But KWARC, DLMF and PlanetMath.org are still actively pursuing their individual interests (and intersect in Planetary, though not so actively at the moment).
-- A standard demo web service is available now (http://latexml.mathweb.org),
Let me jump right in, since this is my work being showcased here. This demo was first brought to life when we met at CICM in Bertinoro in 2011, and it has been around ever since, with just a few extra examples added over the years.
A real service is simply deploying the web service on "a real server". That has been done on numerous occasions:
- It is the current production state of PlanetMath.org
- It is used in an experimental NIST project called DRMF, that continues where the DLMF leaves off
- It is used by the KWARC instances of Planetary, e.g. for our semantic glossary project.
- I have also used the API behind the service to reconvert the entirety of arXiv.org, another long-lasting project at KWARC.
- Moritz Schubotz performed an evaluation of the service on Wikipedia's home servers last year, which you can read about at: http://arxiv.org/pdf/1404.6179 .
- Relatedly, the service has been used to convert all of Wikipedia's math on a number of occasions (by Moritz Schubotz and, in the past, Jozef Misutka)
In terms of adoption, note that there are now special modules that add LaTeXML web service support for Drupal, MediaWiki, and soon CPAN.
I spent over a year under Bruce's guidance at NIST, working on various production and research aspects of LaTeXML, so there has been a lot of sweat and effort invested in making the system better. The newly released 0.8 version is a testament to that as well.
So, I think by now I am comfortable claiming that the web service is not perfect, but it is far from being a proof-of-concept toy.
End of my marketing spiel :-)
The first, and more fundamental/scientific, argument is that this isn't about business first; it's about the web's progress and bringing the data to the user (and their machine).
Having scientific papers (and bibliographic data) join the world of HTML (and the linked datasets of the semantic web) is long overdue.
WriteLaTeX and ShareLaTeX can definitely advance their businesses quickly by offering the Emacs experience in a browser and adding a few social web bells and whistles on top, but the non-web nature of the documents they create will keep tripping them up. Problematic areas are accessibility, interactivity, standards compliance and interoperability, versioning (PDF is a binary format), and remaining alien to the web (and the web of data).
There are alternatives to that approach, which embrace the web as a foundation and are waving the banner of bringing scientific publishing to the 21st century, from the ground up. A great example of a business doing this is Authorea.com, you should look them up. PlanetMath and KWARC have been pioneering in that regard as well, but from the non-profit and applied research angles.
-- The broader problem is how do you get people to maintain
highly structured docs like these and provide them easy ways of
converting them to html. For this I'll settle for the simplest
thing that works.
Indeed, but which end of the candle ends up being the "simplest"?
Having simple conversion software that works on simple documents,
and then fighting with authors to make sure they only
write simple documents?
versus:
fighting with complex documents in the first place
and accepting the size & complexity involved?
Factor in the fragmentation of effort you mention below,
and divide to avoid software monoculture.
So here we hit the issue. I want to display lists of all kinds of things
with essentially the same code: articles, books, journals, people,
potentially theorems, problems, solutions, ...
So I can't work with a rigid biblio style. If I work within Python,
mapping text strings to html, I can easily customize code to achieve
whatever I want. It is much harder to provide a consistent templating
language to make it possible for others to safely do that.
Previous attempts at that, .bst files, the internals of biblatex, seem
to me like complete failures. So does CSL. There has to be something
easier than that.
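For concreteness, here is the kind of thing I mean; a toy sketch with made-up record fields, not my actual code:

    import html

    # One generic renderer covers articles, books, people, theorems, ...:
    # anything representable as a dict of strings (fields are hypothetical).
    def render(record, fields=("author", "title", "year")):
        parts = [html.escape(record[f]) for f in fields if f in record]
        return "<li>" + ". ".join(parts) + "</li>"

    entries = [
        {"author": "A. Author", "title": "On Things", "year": "2014"},
        {"title": "Collected Problems"},  # missing fields are simply skipped
    ]
    print("<ul>" + "".join(render(e) for e in entries) + "</ul>")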
I'm having to read between the lines, here. I'm guessing
that you're using BibTeX's syntax for bibliographies, but
have extended the set of artifact types and fields?
(a very tempting thing to do!)
And I'm guessing further that it isn't necessarily the case that BibTeX's
.bst language inherently can't handle those enhancements, but that
you can't quite convince it, while maintaining any vestige of sanity?
[_not_ meant as an insult to your programming abilities
--- I've _been_ there, myself!]
As for biblatex, there are some big fans on this list.
I, myself, haven't had the opportunity to try it out,
as it doesn't run on my Fedora 20 system.
So much for ease of installation! :>
At any rate, given the rather open-ended description
of your bibliographies, I obviously can't say whether
or not LaTeXML would handle them satisfactorily. Somebody'd
just have to try it. Maybe with the future improvements
to LaTeXML's bibstyle handling it could be better, maybe not.
-- Perhaps we get spoiled by Google.
Funny, how they can't manage MathML, though.
But yes, that's a programming philosophy I try to adhere to:
If something is reasonably recoverable, try to proceed,
and make the best of it, but warn the user of the problems
so they have the opportunity to fix it; try to distinguish
levels of severity (Info, Warning, Error).
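In rough code terms, the pattern is something like this (a Python sketch of the idea only; LaTeXML itself is not Python, as noted above):

    import logging

    log = logging.getLogger("converter")

    def handle(issue, recoverable):
        if recoverable:
            # Proceed and make the best of it, but warn the user so they
            # have the opportunity to fix the source.
            log.warning("recovered from %s; output may be imperfect", issue)
            return True
        # Unrecoverable: report it as a hard error and stop.
        log.error("cannot recover from %s", issue)
        return False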
-- So, to my mind, DMLs are exactly the kind of application that LaTeXML
is intended for, and many of the issues you discuss in that report are
quite dear to my heart.
Great. I'd be very interested to hear which specifics you might want
further involvement in.
Well, I do have a nice tool and ideas about how to use it that I think
could be very useful in such a project at some stage; the tool is admittedly
not perfect, but IMNSHO the best of its class. Of course, there are
competitors whose proponents likely feel the same way about theirs;
some may be better connected to DML. But from a quick scanning of
the NRC's report (http://www.nap.edu/openbook.php?record_id=18619,
I assume that's related to the report you posted?)
it's not clear that LaTeX=>(XML|HTML) conversion is even on their radar.
So, whether this type of contribution is seen as important at
this stage by the DML project, I can't say.
-- Sloan is clearly looking for participants capable of building working
systems, which you have done, as have others on this list.
I expect you would know if NIST/DLMF has received any invitation from
Sloan/Wolfram to participate in a DML conference, right? It would be
strange if not, which is what I infer from your comments.
Not that I'm aware of. We certainly have a working system.
But in spite of the "Library" in DLMF's name, perhaps it isn't
considered in the same category; it does correspond only to a
single book, after all. There is a whole spate of issues
that broader DML has to deal with that we didn't: a potentially huge
collection of inhomogeneous countries, institutions, journals, licenses,
artifacts, languages...
-- Ah, Wolfram is a major player?
We did have an interesting conversation
with Stephen Wolfram at an early stage in the DLMF project, but there are no
collaborations. They have their functions.wolfram.com project, of which
they are rightfully proud; undoubtedly there's an element of
competition. I would hope that we are at least collegial :>
Hi Michael, I am encouraged to learn that you have been included in the invitation from Sloan/Wolfram.
I definitely think they should include Bruce/NIST as a partner. This will put on the table the issue of interoperability between NIST and Wolfram efforts with functions. I think such techno-legal issues and agreeing on formats for open exchange of canonical representations of math info should be among the most important things that could be resolved, I hope without huge effort, by a consortium of open math info partners. I'd be interested in your impressions of that issue.
I do not see any of these problems in the DLMF model of starting
with informal content, transforming it to machine-readable formats,
and then enriching it with semantics in a community effort; that
seems the more suitable model. It is somewhat slower, but more
community-oriented and scalable. In particular I care very much
about point 5 above (I call that *bootstrappability of the
format*), and that seems supported in the DLMF model but broken in
the Wolfram/Mathematica model.
with best wishes to all --Jim