[FYI] Open-access peer-review on top of arXiv


Ginev, Deyan Ivelinov

May 12, 2014, 4:24:24 AM
to planet...@googlegroups.com
...sadly based on PDF viewing on the web:

http://theoj.org/

Greetings,
Deyan

Joe Corneli

May 12, 2014, 7:38:37 AM
to planet...@googlegroups.com
It does give something we can compare against any similar demos we can make!
> --
> You received this message because you are subscribed to the Google Groups
> "Planetary Developers Mailing List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to planetary-de...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

James W PITMAN

May 12, 2014, 10:11:16 AM
to planet...@googlegroups.com
On Mon, May 12, 2014 at 1:24 AM, Ginev, Deyan Ivelinov <d.g...@jacobs-university.de> wrote:
...sadly based on PDF viewing on the web:

So what is the status of alternatives to pdf for converting human editable text, say LaTeX or BibTeX source,
for human viewing on the web?
Especially, what is the system overhead for supporting these alternatives?

I have been experimenting with sharelatex  https://www.sharelatex.com?r=ce133896&rm=d&rs=b
which provides a very nice general interface for collaborative editing of LaTeX/BibTeX source.
This compiles only to pdf in their service. I want better ways of piping such source docs to HTML for web viewing.

I have Python code for doing this for extremely simple docs (some CVs and bibliographies), but I would
be glad to instead leverage available tools for this purpose.

So what is out there now that I could use?

This might be a good time to revisit previous efforts by Joe and me to clean up the PlanetMath biblio data.

I have much better tools on my side for handling biblios of this size (about 3 or 4K items as I recall).
I am extremely interested in the management issue of where/how such biblio data should be stored and managed
in order to
1) keep it machine readable
2) provide instant gratification to editors and authors via nice human-viewable text on the web with lots of useful links
Traditional methods of supporting 2) involve use of .bst or similar (e.g. biblatex) style files which are arcane and 
hard to support.  I am shortcutting this by using python to map directly to HTML. But I expect I could benefit from
what you guys have learned about how to do this. Any reliable LaTeX/BibTeX to HTML
service would be of great interest to me. 

--Jim

Bruce Miller

May 12, 2014, 12:00:15 PM
to planet...@googlegroups.com
On 05/12/2014 10:11 AM, James W PITMAN wrote:
> On Mon, May 12, 2014 at 1:24 AM, Ginev, Deyan Ivelinov
> <d.g...@jacobs-university.de <mailto:d.g...@jacobs-university.de>> wrote:
>
> ...sadly based on PDF viewing on the web:
>
>
> So what is the status of alternatives to pdf for converting human
> editable text, say LaTeX or BibTeX source,
> for human viewing on the web?
> Especially, what is the system overhead for supporting these alternatives?

I may be misunderstanding what you're looking
for, but that sounds like exactly what LaTeXML is
designed for. And, it's New & Improved, easier to install!
Just released Version 0.8.0:
http://dlmf.nist.gov/LaTeXML/

bruce

> I have been experimenting with sharelatex
> https://www.sharelatex.com?r=ce133896&rm=d&rs=b
> <https://www.sharelatex.com/?r=ce133896&rm=d&rs=b>

James W PITMAN

May 12, 2014, 5:12:54 PM
to planet...@googlegroups.com
Bruce, about your response to: 
So what is the status of alternatives to pdf for converting human
editable text, say LaTeX or BibTeX source,
for human viewing on the web?
Especially, what is the system overhead for supporting these alternatives?

I may be misunderstanding what you're looking
for, but that sounds like exactly what LaTeXML is
designed for.  And, it's New & Improved, easier to install!
Just released Version 0.8.0:
  http://dlmf.nist.gov/LaTeXML/
 
I'm very glad to see that the LaTeXML effort is still going strong and there is a new release.
Following are some general concerns/questions about LaTeXML.

-- installation appears to require a substantial amount of system knowledge. As an amateur programmer and professional mathematician I don't think I have enough system knowledge (and control over the systems I work on) to do it myself. So I become reliant on system staff to install and maintain something big for me. This is in contrast to e.g. LaTeX/BibTeX/Python/... which are available to nearly all mathematicians without special system support.

-- Does LaTeXML  remain a solo developer project? Or are there others besides  yourself who can continue to develop and maintain the code?  This seems an issue for organizational commitment to LaTeXML, such as might occur as part of the DML initiative that Sloan foundation is showing some interest in funding (more about this below).
 
-- It is unclear to me how easily I could use LaTeXML on the fly as part of a pipeline from BibTeX (or better,
a logical equivalent like BibJSON) to LaTeX to HTML.  Right now I manage BibTeX to HTML directly in Python
without calling LaTeX. But LaTeX might be pleasant to use as a templating language for styling of displays, as it would be easily understood with minimal documentation by any mathematician.
My question is how easy it would be to manage the invoking of LaTeX templates and subsequent LaTeXML processing from inside a Python script. Presumably I would have to call the shell to invoke Perl to run LaTeXML
... and now there are a lot of things in the pipe which might break and oblige me to do various levels of error
handling ... This is why I am fearful of big programs.

-- What would it take for some agent (e.g. Planetary, or you, Bruce) to support a generic LaTeXML web service?
This could complement something like ShareLaTeX (pointer in previous msg) which in my opinion provides a great
platform for collaborative LaTeX editing.  So I could prepare my LaTeX/BibTeX in ShareLaTeX, then pipe it to the LaTeXML web service to get HTML versions.  If this could be
done, it would mitigate my concern somewhat about big programs. Or rather, that concern would morph into
concerns about the business model and longevity of the LaTeXML service provider.

I would be very interested in your take on these issues.

Coming back to Sloan/DML. See
http://stat-www.berkeley.edu/users/pitman/publications/planning_wdml.pdf
for a brief account of recent developments. If any on this list are interested in being involved in such efforts,
please let me know. It appears there will be a conference to convene such people fairly soon. Where, when and who will be invited remain TBD. But I do have some input into this process.

with best wishes to all on the list

--Jim



Joe Corneli

May 12, 2014, 8:08:16 PM
to planet...@googlegroups.com
Hi Jim:

-- What would it take for some agent (e.g. Planetary or you Bruce) to support a generic  LaTeXML webservice?
This could complement something like sharelatex (pointer in previous msg) which in my opinion provides a great
platform for collaborative LaTeX editing.  So I could prepare my LaTeX/BibTeX in sharelatex, then pipe it to the LaTeXML webservice to get HTML versions.  If this could be
done, it would  mitigate my concern somewhat about big programs. Or rather, that concern would morph into 
 concerns about the business model and longevity of the LaTeXML service provider.

A standard demo web service is available now (http://latexml.mathweb.org), and PlanetMath is hosting an instance too.  If there's a huge computational load then we'd have to discuss more but this should work for some quick demos/tests.

 curl -d 'mode=math&tex=a%2Bb%3Dc' http://latexml.mathweb.org/convert
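
The same POST can be made from Python with just the standard library. A sketch: the endpoint and its `mode`/`tex` parameters are taken from the curl line above, and the demo service may of course be unavailable at any given time.

```python
# Sketch: call the LaTeXML demo web service from Python.
# Endpoint and parameters mirror the curl example above; the
# demo service itself may not always be reachable.
from urllib.parse import urlencode
from urllib.request import urlopen

def build_payload(tex, mode="math"):
    """URL-encode the form fields the /convert endpoint expects."""
    return urlencode({"mode": mode, "tex": tex})

def convert(tex, url="http://latexml.mathweb.org/convert"):
    """POST the TeX fragment and return the service's response."""
    data = build_payload(tex).encode("ascii")
    with urlopen(url, data=data) as resp:  # data= makes this a POST
        return resp.read().decode("utf-8")

if __name__ == "__main__":
    print(convert("a+b=c"))
```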

Maybe Bruce and Deyan can give tips on more advanced features for bibliography conversions.

Integrating with ShareLaTeX for conversions in the way you've described should be really easy.  (I'm hoping for tighter integration with ShareLaTeX, since waiting for PDFLaTeX to recompile the document is a bit tedious, and LaTeXML could be modified to render things on a line-by-line basis or similar, so, much faster.)

I would be very interested in your take on these issues.

Coming back to Sloan/DML. See 
for a brief account of recent developments. If any on this list are interested in being involved in such efforts, 
please let me know. It appears there will be a conference to convene such people fairly soon. Where, when and who will be invited remain tbd.  But I do have some input into this process.

with best wishes to all on the list

 Thanks Jim!

This seems related to things we've been discussing at PlanetMath recently, for example, this (unsuccessful) application for a small grant ($21K) that we submitted to the Wikimedia Foundation.

http://meta.wikimedia.org/wiki/Grants:IEG/PlanetMath_Books_Project

We also recently uploaded a book to PlanetMath; the interface is not much to look at, http://planetmath.org/node/87534 but this is a proof-of-concept for using Planetary as a book hosting platform, as the interface can be improved.  The original authors of this book wrote it on Github, and Git integration is also part of the latest set of features in Planetary (although I haven't been involved with that part).  I think we have many of the relevant pieces for making *a* digital library for math, and I'd love to be involved in the discussions.

(My current job deals with AI for mathematics, so if there are aspects of this work with clear artificial-intelligence relevance, that would make my involvement easier to explain to my boss - viz. http://www.iiia.csic.es/coinvent/node/1)

With best wishes to you as well -

Joe

Bruce Miller

May 13, 2014, 11:29:26 AM
to planet...@googlegroups.com
On 05/12/2014 05:12 PM, James W PITMAN wrote:
> Bruce, about your response to:
>
> So what is the status of alternatives to pdf for converting human
> editable text, say LaTeX or BibTeX source,
> for human viewing on the web?
> Especially, what is the system overhead for supporting these
> alternatives?
>
>
> I may be misunderstanding what you're looking
> for, but that sounds exactly what LaTeXML is designed
> for. And, it's New & Improved, easier to install!
> Just released Version 0.8.0:
> http://dlmf.nist.gov/LaTeXML/
>
> I'm very glad to see that the LaTeXML effort is still going strong and
> there is a new release.
> Following are some general concerns/questions about LaTeXML.
>
> -- installation appears to require some substantial amount of system
> knowledge. As an amateur programmer and professional mathematician I
> dont think I have enough system knowledge (and control over the systems
> I work on) to do it myself. So I become reliant on system staff to
> install and maintain something big for me. This is in contrast to e.g.
> LaTeX/BibTeX/Python/... which are available to nearly all mathematicians
> without special system support.

I would disagree with that characterization of
the effort required to install LaTeXML. Particularly with
the new release (& hopefully upcoming Debian & Macports repos;
& with a renewed commitment to timely releases),
it can in fact be quite trivial to install --- depending
on your system, of course. But, as with most software,
it does typically require admin access to install.

OTOH, if you're content with what's already installed
on your system, and it does the job you need,
then I've probably just misunderstood your query.

> -- Does LaTeXML remain a solo developer project? Or are there others
> besides yourself who can continue to develop and maintain the code?
> This seems an issue for organizational commitment to LaTeXML, such as
> might occur as part of the DML initiative that Sloan foundation is
> showing some interest in funding (more about this below).

It's still mainly me, but with a huge amount of support
from Deyan Ginev. I think it's safe to say that Deyan would be capable
of maintaining it, but I certainly can't make that commitment on
his behalf ... At any rate, LaTeXML is open-source and available
on GitHub, so anyone can contribute, submit patches, fork or ...

There's always a bootstrapping issue with user community, developers,
etc. It's clear that the long interval leading to the current release
hasn't been ideal, but we're trying to get back on track in that respect.
Nevertheless, there is a small, but enthusiastic, user community.

> -- It is unclear to me how easily I could use LaTeXML on the fly as part
> of a pipeline from BibTeX (or better,
> logical equivalent like BibJSON) to LaTeX to HTML. Right now I manage
> BibTeX to HTML directly in python
> without calling LaTeX. But LaTeX might pleasant to use as a templating
> language for styling of displays, as it would be easily understood with
> minimal documentation by any mathematician.

Primarily LaTeXML focuses on complete documents or document sets,
and deals with BibTeX files as part of that process.
By processing a dummy document like:
\begin{document}\cite{*}\bibliography{foo}\end{document}
you can, with a single command, create an html version of the bibliography,
complete with styling, math as MathML, etc. It doesn't call
latex, but acts as a replacement for latex. How that fits
into your existing pipeline, I couldn't say; it could conceivably
replace it entirely.
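
The dummy-document route can be scripted end to end. A sketch, assuming the one-step `latexmlc` driver from the 0.8 release is on PATH (the `foo` bibliography name is the illustrative one from the example above):

```python
# Sketch: wrap a BibTeX file in the dummy document above and run
# LaTeXML's one-step driver on it. Assumes `latexmlc` (LaTeXML
# 0.8) is on PATH; "foo" is an illustrative bibliography name.
import pathlib
import subprocess

# The dummy wrapper: cite everything, include the bibliography.
DUMMY = (r"\documentclass{article}"
         r"\begin{document}\cite{*}\bibliography{foo}\end{document}")

def bib_to_html(bib_stem, out_html):
    """Write the wrapper for bib_stem.bib and convert it to HTML."""
    tex = DUMMY.replace("foo", bib_stem)
    pathlib.Path("dummy.tex").write_text(tex)
    subprocess.run(["latexmlc", "dummy.tex",
                    f"--dest={out_html}"], check=True)

if __name__ == "__main__":
    bib_to_html("foo", "foo.html")
```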

At the moment, LaTeXML uses a built-in bibliographic style that
you may or may not like, but supporting more bibstyle options
is on the todo list.

> My question is how easy it would be to manage the invoking of LaTeX
> templates and subsequent LaTeXML processing from inside a python script.
> Presumably I would have to call the shell to invoke perl to run LaTeXML
> .... and now there are a lot of things in the pipe which might break and
> oblige me to do various levels of error
> handling .... This is why I am fearful of big programs.

LaTeXML is a (set of) commandline programs, not gui,
certainly not Python, so yes, you'd make a system call.

It's been my experience that small programs, Python or otherwise,
have bugs just like large programs, and also that they can encounter
situations that they don't expect or invalid input data. It seems to
me that they should inform the user of those situations, although they
often do not. So, I view error handling as a positive thing.
But perhaps I'm just misunderstanding your point.

The size is more of a question of how much it does,
not how often it fails. Again, if a small program adequately
handles the job you need, by all means, stick with it.

> -- What would it take for some agent (e.g. Planetary or you Bruce) to
> support a generic LaTeXML webservice?
> This could complement something like sharelatex (pointer in previous
> msg) which in my opinion provides a great
> platform for collaborative LaTeX editing. So I could prepare my
> LaTeX/BibTeX in sharelatex, then pipe it to the LaTeXML webservice to
> get HTML versions. If this could be
> done, it would mitigate my concern somewhat about big programs. Or
> rather, that concern would morph into
> concerns about the business model and longevity of the LaTeXML service
> provider.

As Joe has pointed out, such services already exist,
at least as demonstrators, or proof of concept.
Deyan has developed code for web services that wrap
LaTeXML, so anyone can install both of those and offer
their own LaTeX to HTML (or epub or ....) service.

> I would be very interested in your take on these issues.
>
> Coming back to Sloan/DML. See
> http://stat-www.berkeley.edu/users/pitman/publications/planning_wdml.pdf
> for a brief account of recent developments. If any on this list are
> interested in being involved in such efforts,
> please let me know. It appears there will be a conference to convene
> such people fairly soon. Where, when and who will be invited remain tbd.
> But I do have some input into this process.

I guess you know that LaTeXML was developed exactly
to support http://dlmf.nist.gov/, a flavor of DML.
A core design goal was preserving and (as far as possible)
enhancing the presentation and structure of the documents,
but more importantly the semantic content, both through enhancing
the markup through extra metadata on the input side, and after processing.
So, to my mind, DMLs are exactly the kind of application that LaTeXML
is intended for, and many of the issues you discuss in that report are
quite dear to my heart.

> with best wishes to all on the list

Thanks for your comments;
bruce

James W PITMAN

May 13, 2014, 4:58:37 PM
to planet...@googlegroups.com
Joe, many thanks for the detailed reply.

A standard demo web service is available now (http://latexml.mathweb.org), and PlanetMath is hosting an instance too.  If there's a huge computational load then we'd have to discuss more but this should work for some quick demos/tests.

 curl -d 'mode=math&tex=a%2Bb%3Dc' http://latexml.mathweb.org/convert

Ah, this is some progress, to have a demo like this. But what would it take to provide a real service? 

Maybe Bruce and Deyan can give tips on more advanced features for bibliography conversions.
I'd be interested in that. I see something from Bruce in the next message.

Integrating with ShareLaTeX for conversions in the way you've described should be really easy.  (I'm hoping for tighter integration with ShareLaTeX, since waiting for PDFLaTeX to recompile the document is a bit tedious, and LaTeXML could be modified to render things on a line-by-line basis or similar, so, much faster.)
Well, perhaps not so easy.  I talked to the ShareLaTeX developer James Allen who said he thought enhancing ShareLaTeX to
provide HTML export would be really hard and was not on their present roadmap. I think if data could be pushed from ShareLaTeX
to your service that could be great, but what would be your business model? ShareLaTeX has quite an interesting and
possibly viable business model I think.


This seems related to things we've been discussing at PlanetMath recently, for example, this (not successful) application for a small grant ($21K) that we submitted to the Wikimedia Foundation.

http://meta.wikimedia.org/wiki/Grants:IEG/PlanetMath_Books_Project

There is some connection, but I have the sense that funders may not be keen on this sort of thing until there is some
proof of concept of high-quality textbooks emerging from such efforts. There is quite a lot of interest in this in California;
see e.g. http://als.csuprojects.org/free-etextbooks which has a number of further links.
I remain unclear on why one should expect a quality product to emerge from collaborative textbook efforts. Textbook
writing requires a lot of coherent effort and organization. People who make that effort typically do not wish to see their product
dismantled and put back together in ways they don't like or think are wrong. I have colleagues who have expressed this opinion
quite strongly. I think such issues remain significant inhibitors to participation by the best authors in collaborative textbooks.


We also recently uploaded a book to PlanetMath; the interface is not much to look at, http://planetmath.org/node/87534 but this is a proof-of-concept for using Planetary as a book hosting platform, as the interface can be improved.  The original authors of this book wrote it on Github, and Git integration is also part of the latest set of features in Planetary (although I haven't been involved with that part).  I think we have many of the relevant pieces for making *a* digital library for math, and I'd love to be involved in the discussions.
 
OK. If we think of Planetary as a participant in DML then can you give me a few sentences on
-- nature of organization
-- contact people and their roles in the organization
-- possible contributions to a DML effort as outlined in the NRC report?
Also, how to clarify distinction if any between Planetary and Michael K's group which is also a natural participant?


(My current job deals with AI for mathematics, so if there's are aspects of this work with clear artificial intelligence-relevance that would make my involvement easier to explain to my boss - viz. http://www.iiia.csic.es/coinvent/node/1)

You mean you are working for COINVENT now? Is this project actually funded? The project description
 gives me a somewhat surreal impression.

many thanks for response!
--Jim

James W PITMAN

May 13, 2014, 5:39:54 PM
to planet...@googlegroups.com
OK, continuing to Bruce's response. Many thanks for that Bruce, your detailed
comments much appreciated.

I would disagree with that characterization of
the effort required to install LaTeXML.  Particularly with
the new release (& hopefully upcoming Debian & Macports repos;
& with a renewed commitment to timely releases),
it can in fact be quite trivial to install --- depending
on your system, of course.  But, as with most software,
it does typically require admin access to install.
 
Understood. Let me see what support I can find for that.

OTOH, if you're content with what's already installed
on your system, and it does the job you need,
then I've probably just misunderstood your query.

No, I am generally interested in the state of the art in LaTeX to XML conversion,
especially easy, low-overhead conversion for very simple LaTeX docs like CVs
and biblios, of which there are a huge number readily available, and a need to
provide flexible on-the-fly renderings of dynamically generated pages derived from such
LaTeX docs. The broader problem is how to get people to maintain highly structured
docs like these and provide them easy ways of converting them to HTML. For this
I'll settle for the simplest thing that works.

-- Does LaTeXML  remain a solo developer project? Or are there others
besides  yourself who can continue to develop and maintain the code?
  This seems an issue for organizational commitment to LaTeXML, such as
might occur as part of the DML initiative that Sloan foundation is
showing some interest in funding (more about this below).

It's still mainly me, but with a huge amount of support
from Deyan Ginev.  I think it's safe to say that Deyan would be capable
of maintaining it, but I certainly can't make that commitment on
his behalf ...  At any rate, LaTeXML is open-source and available
on GitHub, so anyone can contribute, submit patches, fork or ...
Great that Deyan is seriously involved. 

There's always a bootstrapping issue with user community, developers,
etc. It's clear that the long interval leading to the current release
hasn't been ideal, but we're trying to get back on track in that respect.
Nevertheless, there is a small, but enthusiastic, user community.
Good to hear. 


Primarily LaTeXML focuses on complete documents or document sets,
and deals with BibTeX files as part of that process.
By processing a dummy document like:
  \begin{document}\cite{*}\bibliography{foo}\end{document}
you can, with a single command, create an html version of the bibliography,
complete with styling, math as MathML, etc.  It doesn't call
latex, but acts as a replacement for latex.  How that fits
into your existing pipeline, I couldn't say; it could conceivably
replace it entirely.

At the moment, LaTeXML uses a built-in bibliographic style that
you may or may not like, but supporting more bibstyle options
is on the todo list.
So here we hit the issue. I want to display lists of all kinds of things with essentially
the same code: articles, books, journals, people, potentially theorems, problems, solutions, ...
So I can't work with a rigid biblio style. If I work within Python, mapping text strings to HTML,
I can easily customize code to achieve whatever I want. It is much harder to provide
a consistent templating language to make it possible for others to safely do that.
Previous attempts at that, .bst files, the internals of biblatex, seem to me like
complete failures. So does CSL. There has to be something easier than that.
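
As a toy illustration of this kind of direct Python-to-HTML mapping (the record fields and templates here are invented for the example, not taken from any actual code):

```python
# Toy sketch of mapping records of many kinds (articles, people,
# theorems, ...) to HTML with one generic renderer driven by
# per-type templates. Field names are invented for illustration.
TEMPLATES = {
    "article": "{author}. <i>{title}</i>. {journal}, {year}.",
    "person":  "<b>{name}</b> ({affiliation})",
}

def render(record):
    """Fill the template for this record's type with its fields."""
    return TEMPLATES[record["type"]].format(**record)

print(render({"type": "person", "name": "Ada Lovelace",
              "affiliation": "London"}))
# -> <b>Ada Lovelace</b> (London)
```

Adding a new kind of list is one dictionary entry, and the templates stay readable to anyone who knows basic HTML, which is roughly the flexibility being asked for above.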

LaTeXML is a (set of) commandline programs, not gui,
certainly not Python, so yes, you'd make a system call.

This is a big cost due to reliance on another system's error handling, which has to have some big benefit
to be worthwhile, I think.

It's been my experience that small programs, Python or otherwise,
have bugs just like large programs, and also that they can encounter
situations that they don't expect or invalid input data.
Yes. 
It seems to
me that they should inform the user of those situations, although they often do not.
In some contexts, yes. But in other contexts, like making a best effort to display dirty
biblio data, which Google Scholar does tremendously well, there is really no obvious place for error reporting
to the viewer.
So this is a facility that ideally needs to be able to be switched on and off, or otherwise controlled,
by the user. Unfortunately such issues greatly complicate user interfaces.
 
So, I view error handling as a positive thing.
But perhaps I'm just misunderstanding your point.
I agree that error handling is a positive thing provided errors can be handled gracefully and trivial format errors do
not block rendering of otherwise perfectly good data. Humans are very good at overlooking trivial flaws, machines
much less so, though Google does awfully well. Perhaps we get spoiled by Google.

The size is more of a question of how much it does,
not how often it fails. Again, if a small program adequately
handles the job you need, by all means, stick with it.
Well, then we get the problem of fragmentation of developer  communities, which is unfortunate.
Worse, I am more like a community of one, with only sporadic interest from others on specific data conversion
projects. 


Deyan has developed code for web services that wrap
LaTeXML, so anyone can install both of those and offer
their own LaTeX to HTML (or epub or ....) service.
Great to hear, I should try it out.

I guess you know that LaTeXML was developed exactly
to support http://dlmf.nist.gov/, a flavor of DML.
A core design goal was preserving and (as far as possible)
enhancing the presentation and structure of the documents,
but more importantly the semantic content, both through enhancing
the markup through extra metadata on the input side, and after processing.
So, to my mind, DML's are exactly the kind of application that LaTeXML
is intended for, and many of the issues you discuss in that report are
quite dear to my heart.
 
Great. I'd be very interested in which specifics you might be interested in being further
involved in. Sloan is clearly looking for participants capable of building working
systems, which you have done, as have others on this list.
I expect you would know if NIST/DLMF has received any invitation from Sloan/Wolfram to
participate in a DML conference, right? It would be strange if not, which I infer from your comments.
The process of selection of participants in the ongoing DML effort is unfortunately opaque.
But if I have names/organizations/potential contributions I can suggest them to the organizers.
One question: is there any history of cooperation, or lack thereof, between NIST/DLMF and Wolfram?
Your thoughts on that? I think for DML to move forward with Wolfram involved there has to be
some major change in Wolfram's attitude to supporting open data/code.
--Jim

Joe Corneli

May 13, 2014, 6:31:30 PM
to planet...@googlegroups.com
On Tue, May 13, 2014 at 9:58 PM, James W PITMAN <pit...@stat.berkeley.edu> wrote:
 curl -d 'mode=math&tex=a%2Bb%3Dc' http://latexml.mathweb.org/convert

Ah, this is some progress, to have a demo like this. But what would it take to provide a real service? 

Well, we're using the web service for all the conversions on PlanetMath - it seems pretty robust!

https://github.com/KWARC/planetary/blob/master/sites/all/modules/drutexml/drutexml.module#L451
 
I'd like to do some improvements on the downstream client side, but I'm really happy with the service.  I think Deyan uses the same service for his batch conversions of arXiv data - the most recent conversion ran on a cluster in record time.  If we needed to do conversions on that scale continuously, then we'd need some dedicated hardware.  But I think the software is ready to use now (with the caveat, again, that I'm not sure how to use LaTeXML's advanced bibliography features).

ShareLaTeX has quite an interesting and possibly viable business model I think.

I don't think that there's a business model for conversion alone, unless it's just a fee for "bulk data processing" - given that all of the software is open source, I'm more interested in using ShareLaTeX or some similar editor as part of PlanetMath.  For PM we've discussed various business models, including some "freemiums" not listed here (like blogs):

https://github.com/holtzermann17/planetmath-docs/wiki/Business-Models


There is some connection, but I have the sense that funders may not be keen on this sort of thing until there is some
proof of concept of high quality textbooks emerging from such efforts.

I think the HoTT book is quite good - I certainly hear about it from a lot of folks; I haven't read it in detail, but I know there is a discussion group in NYC that's working their way through it. http://homotopytypetheory.org/book/

I think the key to success for this book has been a relatively small and dedicated group of expert co-authors.  It's not *so* different from a traditional monograph, really.

There was a related experiment in Finland: http://creativecommons.org/weblog/entry/34643

The CC-By-SA license allows subsequent experiments to develop - the PM experiment for instance was of interest to the authors when they heard about it.

Apparently 20% of French schools are using CC-By-SA books,
http://meta.wikimedia.org/wiki/Grants_talk:IEG/PlanetMath_Books_Project#I_have_some_CC_by_SA_French_textbooks
  
I have colleagues who expressed this opinion quite strongly. I think such issues remain significant inhibitors to participation by the best authors in collaborative textbooks.

Well, it seems there are at least a few authors who do like writing CC-By-SA books - and it seems that in both California and France there's some pull from the consumer side.
http://bzg.fr/vers-un-dispositif-dincitation-a-la-creation-de-manuels-scolaires-libres.html

 
OK. If we think of Planetary as a participant in DML then can you give me a few sentences on
-- nature of organization
-- contact people and their roles in the organization
-- possible contributions to a DML effort as outlined in the NRC report?
Also, how to clarify distinction if any between Planetary and Michael K's group which is also a natural participant?

I'd say it's best to bring in PlanetMath as the org, and Planetary is a "joint venture" between PlanetMath and Michael Kohlhase's group (KWARC) at Jacobs.

For PlanetMath:

Joe Corneli PhD* (computing) - member of the board of directors, holtze...@gmail.com
Ray Puzio PhD (physics) - operations manager, r...@novres.org

*: should be awarded by August...

"PlanetMath is one of the first mathematics digital libraries - online since 2001, and best known for its collaboratively created mathematics encyclopedia.  PlanetMath is currently working to make it easy to produce mathematical textbooks from pre-existing, freely available content.  We are collaborators in the development of the Planetary platform, a mathematics-ready CMS based on Drupal that has been in use on PlanetMath.org since February 2013, and that has also been used as a course portal and a next-generation authoring system (on MathHub.info).  Project leads are involved with research on mathematical hypertext and knowledge-rich approaches in mathematical AI."

You mean you are working for COINVENT now? Is this project actually funded? The project description gives me a somewhat surreal impression.

Indeed... but yes, it is running now!  I'm in the last days of polishing up my PhD thesis before finally submitting so I'll be more focused on the COINVENT stuff after that.  I do think that the project is a bit confusing; my specific remit within the project is to study "social creativity in mathematics from a computational point of view" - which, luckily, has something to do with what I've been looking at in my research on PlanetMath.

Joe

James W PITMAN

May 14, 2014, 10:31:57 AM
to Deyan Ginev, planet...@googlegroups.com
Hi Deyan, many thanks for response.


I think one point that should be made clear from the beginning is that the Planetary project in itself is understaffed at the moment and gets traction from side efforts - two examples are Joe and Ray's textbook remixing project, and the work of PhD students from Michael's group (Mihnea Iancu and Constantin Jucosvschi being the main experts on that side at the moment) on a semantic glossary for mathematics and advanced semantic editing support.
Understood. I remain interested in these activities, and there is some possibility I think of funding from Sloan Foundation especially for the semantic glossary of math. I would like to learn more about that. 
Is the glossary data open? If so where can I see it?
We spoke about a "MathWordNet" and I believe there are efforts in this direction supported by zbMATH. Is that the
same as Michael's group's effort? Wolfram is also interested in this.  I think it would not take much to push Sloan to 
fund a large scale semantic glossary effort with distributed partners and editorial control.


So you shouldn't have the expectation, or come off with the impression, that there is a team ready and waiting to provide support and development on the "Planetary" project as such. But KWARC, DLMF and PlanetMath.org are still actively pursuing their individual interests (and intersect in Planetary, though not so actively at the moment).
Also understood.

A standard demo web service is available now (http://latexml.mathweb.org), 

Let me jump right in, since this is my work being showcased here. This demo was first brought to life when we met at CICM in Bertinoro in 2011 and has been around since, with just a few extra examples added over the years.

A real service is simply deploying the web service on "a real server". That has been done on numerous occasions:
 - It is the current production state of PlanetMath.org
 - It is used in an experimental NIST project called DRMF, that continues where the DLMF leaves off
 - It is used by the KWARC instances of Planetary, e.g. for our semantic glossary project.
 - I have also used the API behind the service to reconvert the entirety of arXiv.org, another long-lasting project at KWARC.
 - Moritz Schubotz performed an evaluation of the service on Wikipedia's home servers last year, which you can read about at: http://arxiv.org/pdf/1404.6179 .
 - Likewise, it has been used to convert all of Wikipedia's math on a number of occasions (by Moritz Schubotz and Jozef Misutka in the past)

In terms of recognition, note that there are now special modules that add LaTeXML web service support for Drupal, MediaWiki, and soon CPAN.

I spent over a year under Bruce's guidance at NIST, working on various production and research aspects of LaTeXML, so there has been a lot of sweat and effort invested in making the system better. The newly released 0.8 version is testament to that as well.

So, by now I am eager to claim that the web service, while not perfect, is far from being a proof-of-concept toy.

End of my marketing spiel :-)

This is all great, but I am not clear yet how, if at all, it assists my purpose: to make it easy for people with modest technical
capability to create and publish machine-readable mathematical documents, especially bibliographies, in ways which
make the data readily aggregatable and reusable.  Take for example my personal bibliography, which is posted in
BibTeX format with various calling .tex files at https://www.sharelatex.com/templates/5371208e47c639da170afa8c/v/0/zip
I used biblatex to make a simple display by year with abstracts. This is the file 0bibserver_years.tex, which
calls 0bibserver.bib.  I wrote the .tex by a python script from the .bib (like all the individual files). How can I get
something like this to display with LaTeXML?  Why would that be any better than if I did it by python scripting
with MathJax?
I tried uploading the .zip to your latexml service, but it did not recognize the call to biblatex. 
So I am left wondering what to do.  I see two options
1) I offer people my customizable python scripts for mapping BibTeX to HTML. (These work pretty well: see e.g.
2) I figure out how to get comparable functionality by leveraging your LaTeXML or other available converter.
Before committing further to 1) I thought I would at least take a look at 2).
This is why I proposed revisiting the PlanetMath biblio dataset. But as there seems to be no interest or resources for
focussing on that, after this exchange I am back to thinking that 1) may still be the best option for me to continue
pursuing.
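For concreteness, option 1) can be sketched in a few lines of Python. This is not Jim's actual script, just a minimal illustration of a flat-BibTeX-to-HTML pipeline; the field handling and the HTML layout are placeholder choices:

```python
import re

def parse_bibtex(text):
    """Very naive BibTeX parser: handles only flat `field = {value}` entries.
    (A sketch; real .bib files need a proper parser for nesting and @string.)"""
    entries = []
    for m in re.finditer(r'@(\w+)\s*\{\s*([^,]+),(.*?)\n\}', text, re.S):
        kind, key, body = m.group(1).lower(), m.group(2).strip(), m.group(3)
        fields = dict((f.lower(), v.strip())
                      for f, v in re.findall(r'(\w+)\s*=\s*\{([^{}]*)\}', body))
        entries.append({'type': kind, 'id': key, **fields})
    return entries

def to_html(entries):
    """Render entries as a simple HTML list, newest year first."""
    items = []
    for e in sorted(entries, key=lambda e: e.get('year', ''), reverse=True):
        items.append('<li>%s (%s). <em>%s</em>.</li>' %
                     (e.get('author', '?'), e.get('year', '?'), e.get('title', '?')))
    return '<ul>\n' + '\n'.join(items) + '\n</ul>'

bib = """
@article{pitman99,
  author = {Pitman, Jim},
  title = {Coalescents with multiple collisions},
  year = {1999}
}
"""
html = to_html(parse_bibtex(bib))
print(html)
```

A real .bib file (nested braces, @string macros, cross-references) would need a proper parser, but the "map records to HTML with ordinary code" idea has exactly this shape.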


The first, and more fundamental/scientific, argument is that this isn't about business first, it's about the web's progress and bringing the data to the user (and their machine).
I agree in principle, but in practice we have to find resources to commit to tasks of this kind. Which means people's time,
and they need to be supported somehow. So it comes down to a business issue, whether that business is a university
or a scholarly society or a private company.
 
Having scientific papers (and bibliographic data) join the world of HTML (and the linked datasets of the semantic web) is long overdue.
Yes!  

WriteLaTeX and ShareLaTeX definitely can advance their businesses quickly by offering the EMACS experience in a browser and adding a few social web bells and whistles on top, but the non-web nature of the documents they create will keep tripping them up. Problematic areas are accessibility, interactivity, standards compliance and interoperability, versioning (PDF is a binary format) and remaining alien to the web (and the web of data).
All agreed. 

There are alternatives to that approach, which embrace the web as a foundation and are waving the banner of bringing scientific publishing to the 21st century, from the ground up. A great example of a business doing this is Authorea.com, you should look them up. PlanetMath and KWARC have been pioneering in that regard as well, but from the non-profit and applied research angles.
Thanks for pointer to Authorea. I just hate registering with yet another service to see if they do anything useful for me.
I did find ShareLaTeX immediately rewarding in terms of UI and ease of collaboration.

 I agree about all the limitations you point out above.  But authors need something like the functionality of ShareLaTeX
to get them freely authoring well-structured documents, and then the capability to pipe those documents to other
services which expose them to linked datasets and the semantic web. Maybe that's provided already by Authorea?

many thanks again for responses!

--Jim

 

Bruce Miller

May 14, 2014, 4:21:15 PM
to planet...@googlegroups.com
On 05/13/2014 05:39 PM, James W PITMAN wrote:
> OTOH, if you're content with what's already installed
> on your system, and it does the job you need,
> then I've probably just misunderstood your query.
>
>
> No, I am generally interested in the state of the art in LaTeX to XML
> conversion,
> especially easy, low overhead conversion for very simple LaTeX docs like CVs
> and biblios, of which there are a huge number readily available, and a
> need to
> provide flexible on-the-fly renderings of dynamically generated pages
> derived from such
> LaTeX docs. The broader problem is how do you get people to maintain
> highly structured
> docs like these and provide them easy ways of converting them to html.
> For this
> I'll settle for the simplest thing that works.

Indeed, but which end of the candle ends up being the "simplest"?
Having simple conversion software that works on simple documents,
and then fighting with authors to make sure they only
write simple documents?
versus:
fighting with complex documents in the first place
and accepting the size & complexity involved?

Factor in the fragmentation of effort you mention below,
and divide to avoid software monoculture.

[...]

> Primarily LaTeXML focuses on complete documents or document sets,
> and deals with BibTeX files as part of that process.
> By processing a dummy document like:
> \begin{document}\cite{*}\bibliography{foo}\end{document}
> you can, with a single command, create an html version of the
> bibliography,
> complete with styling, math as MathML, etc. It doesn't call
> latex, but acts as a replacement for latex. How that fits
> into your existing pipeline, I couldn't say; it could conceivably
> replace it entirely.
>
> At the moment, LaTeXML uses a built-in bibliographic style that
> you may or may not like, but supporting more bibstyle options
> is on the todo list.
>
> So here we hit the issue. I want to display lists of all kinds of things
> with essentially
> the same code: articles, books, journals, people, potentially theorems,
> problems, solutions, ...
> So I can't work with a rigid biblio style. If I work within python
> mapping text strings to html,
> I can easily customize code to achieve whatever I want. It is much
> harder to provide
> a consistent templating language to make it possible for others to
> safely do that.
> Previous attempts at that, .bst files, the internals of biblatex, seem
> to me like
> complete failures. So does CSL. There has to be something easier than that.

I'm having to read between the lines, here. I'm guessing
that you're using BibTeX's syntax for bibliographies, but
have extended the set of artifact types and fields?
(a very tempting thing to do!)
And I'm guessing further that it isn't necessarily the case that BibTeX's
.bst language inherently can't handle those enhancements, but that
you can't quite convince it, while maintaining any vestige of sanity?
[_not_ meant as an insult to your programming abilities
--- I've _been_ there, myself!]

As for biblatex, there are some big fans on this list.
I, myself, haven't had the opportunity to try it out,
as it doesn't run on my Fedora 20 system.
So much for ease of installation! :>

At any rate, given the rather open-ended description
of your bibliographies, I obviously can't say whether
or not LaTeXML would handle them satisfactorily. Somebody'd
just have to try it. Maybe with the future improvements
to LaTeXML's bibstyle handling it could be better, maybe not.

> LaTeXML is a (set of) commandline programs, not gui,
> certainly not Python, so yes, you'd make a system call.
>
>
> This is a big cost due to reliance on other systems error handling,
> which has to have some big benefit
> to be worthwhile i think.

Indeed, integrating a new component into an existing
workflow is almost never drop-in & go, so it has
to provide enough improvement to be worth it.

> It's been my experience that small programs, Python or otherwise,
> have bugs just like large programs, and also that they can encounter
> situations that they don't expect or invalid input data.
>
> Yes.
>
> It seems to
> me that they should inform the user of those situations, although
> they often do not.
>
> In some contexts yes. But in other contexts, like making a best effort
> to display dirty
> biblio data, like Google Scholar does tremendously well, there is really
> no obvious place for error reporting
> to the viewer.
> So this is some facility that ideally needs to be able to be switched on
> and off or otherwise controlled
> by the user. Unfortunately such issues greatly complicate user interfaces.
>
> So, I view error handling as a positive thing.
> But perhaps I'm just misunderstanding your point.
>
> I agree that error handling is a positive thing provided errors can be
> handled gracefully and trivial format errors do
> not block rendering of otherwise perfectly good data. Humans are very
> good at overlooking trivial flaws, machines
> much less so, though Google does awfully well. Perhaps we get spoiled by
> Google.

Funny, how they can't manage MathML, though.

But yes, that's a programming philosophy I try to adhere to:
If something is reasonably recoverable, try to proceed,
and make the best of it, but warn the user of the problems
so they have the opportunity to fix it; Try to distinguish
levels of severity (Info, Warning, Error).
Of course, between different optima and the vagaries of TeX,
I don't always reach that ideal.

> I guess you know that LaTeXML was developed exactly
> to support http://dlmf.nist.gov/, a flavor of DML.
> A core design goal was preserving and (as far as possible)
> enhancing the presentation and structure of the documents,
> but more importantly the semantic content, both through enhancing
> the markup through extra metadata on the input side, and after
> processing.
> So, to my mind, DML's are exactly the kind of application that LaTeXML
> is intended for, and many of the issues you discuss in that report are
> quite dear to my heart.
>
> Great. I'd be very interested in which specifics you might be
> interested in further involvement in.

Well, I do have a nice tool and ideas about how to use it that I think
could be very useful in such a project at some stage; the tool is admittedly
not perfect, but IMNSHO the best of its class. Of course, there are
competitors whose proponents likely feel the same way about theirs;
some may be better connected to DML. But from a quick scanning of
the NRC's report (http://www.nap.edu/openbook.php?record_id=18619;
I assume that's related to the report you posted?)
it's not clear that LaTeX=>(XML|HTML) conversion is even on their radar.
So, whether this type of contribution is seen as important at
this stage by the DML project, I can't say.

> Sloan is clearly looking for participants capable of building working
> systems, which you have done, as have others on this list.
> I expect you would know if NIST/DLMF has received any invitation from
> Sloan/Wolfram to
> participate in a DML conference, right? It would be strange if not,
> which I infer from your comments.

Not that I'm aware of. We certainly have a working system.
But in spite of the "Library" in DLMF's name, perhaps it isn't
considered in the same category; it does correspond only to a
single book, after all. There is a whole spate of issues
that broader DML has to deal with that we didn't: a potentially huge
collection of inhomogeneous countries, institutions, journals, licenses,
artifacts, languages...

> The process of selection of participants in ongoing DML effort is
> unfortunately opaque.
> But if I have names/organizations/potential_contributions I can suggest
> them to the organizers.
> One question: is there any history of cooperation or lack thereof
> between NIST/DLMF and Wolfram?

Ah, Wolfram is a major player? We did have an interesting conversation
with Stephen Wolfram at an early stage in the DLMF project, but there are no
collaborations. They have their functions.wolfram.com project, of which
they are rightfully proud; undoubtedly there's an element of
competition. I would hope that we are at least collegial :>

James W PITMAN

May 14, 2014, 6:12:16 PM
to planet...@googlegroups.com
Hi Bruce, many thanks for detailed response. We may be getting off focus for planetary-dev
in which case admin please say so and we can continue more privately.
The broader  problem is how do you get people to maintain
highly structured
docs like these and provide them easy ways of converting them to html.
For this
I'll settle for the simplest thing that works.

Indeed, but which end of the candle ends up being the "simplest"?
  Having simple conversion software that works on simple documents,
  and then fighting with authors to make sure they only
  write simple documents?
versus:
  fighting with complex documents in the first place
  and accepting the size & complexity involved?

Factor in the fragmentation of effort you mention below,
and divide to avoid software monoculture.

Understood, this is a challenging problem, and likely one with more than one at least
locally optimal solution.
 

So here we hit the issue. I want to display lists of all kinds of things
with essentially
the same code: articles, books, journals, people, potentially theorems,
problems, solutions, ...
So I can't work with a rigid biblio style. If I work within python
mapping text strings to html,
I can easily customize code to achieve whatever I want. It is much
harder to provide
a consistent templating language to make it possible for others to
safely do that.
Previous attempts at that, .bst files, the internals of biblatex, seem
to me like
complete failures. So does CSL. There has to be something easier than that.

I'm having to read between the lines, here.  I'm guessing
that you're using BibTeX's syntax for bibliographies, but
have extended the set of artifact types and fields?
(a very tempting thing to do!)
Sort of. As preferred I/O for all internal machine processing, storage and serialization of datasets
(e.g. extracting a query response from a big data store and paging it for consumption
by a page renderer) I use BibJSON http://www.bibjson.org/ which is a JSON format modeled
largely on BibTeX.
Unfortunately there is no reliable and easily supportable JSON editor I am aware of. So for human
editing of datasets I prefer to offer either BibTeX or some simpler plain text equivalent for a user to 
edit with conventional tools. Advantage of BibTeX is that BibTeX the program provides some easy
validation of BibTeX the format. For JSON I would have to write my own validator against a JSON schema
or similar. Too much work.
When it comes to BibTeX-like entries for People, Journals etc, I prefer naive plain text markup, and
hope for the best regarding validation. Mostly it is obvious if something is wrong by scanning HTML renderings.
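As an aside, a serviceable validator need not wait for a JSON-Schema toolchain; a hand-rolled check over required fields is only a few lines. The per-type field lists below are invented for illustration (the BibJSON spec itself mandates very little):

```python
import json

# Hypothetical per-type required fields; a local convention, not part of
# the BibJSON spec.
REQUIRED = {
    'article': ['title', 'author', 'year'],
    'book':    ['title', 'author', 'publisher', 'year'],
}

def validate(record):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    kind = record.get('type')
    if kind not in REQUIRED:
        problems.append('unknown or missing type: %r' % kind)
        return problems
    for field in REQUIRED[kind]:
        if not record.get(field):
            problems.append('%s record %r missing %r' %
                            (kind, record.get('id', '?'), field))
    return problems

rec = json.loads('{"type": "article", "id": "p99", '
                 '"title": "Coalescents", "author": [{"name": "Jim Pitman"}]}')
print(validate(rec))   # the 'year' field is missing
```

This catches the common "missing field" class of errors; it obviously does not replace full schema validation, but it may be enough for a best-effort display pipeline.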

 
And I'm guessing further that it isn't necessarily the case that BibTeX's
.bst language inherently can't handle those enhancements, but that
you can't quite convince it, while maintaining any vestige of sanity?
[_not_ meant as an insult to your programming abilities
  --- I've _been_ there, myself!]
Right. I swore a solemn oath more than 10 years ago never to open a .bst file again.
It is so much easier for me to write python code to convert JSON entries to HTML or whatever.

As for biblatex, there are some big fans on this list.
I, myself, haven't had the opportunity to try it out,
as it doesn't run on my Fedora 20 system.
So much for ease of installation! :>
Interesting, but unfortunately typical of the challenges. 

At any rate, given the rather open-ended description
of your bibliographies, I obviously can't say whether
or not LaTeXML would handle them satisfactorily.  Somebody'd
just have to try it. Maybe with the future improvements
to LaTeXML's bibstyle handling it could be better, maybe not.

For me the main issue would be ease of bibstyle definition. I just don't see why
it should be made harder than it is in python, which in my experience is very easy.
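The "bibstyle as ordinary code" approach Jim describes can be sketched as one small function per record type plus a shared loop, which handles articles, people, and journals with essentially the same code (the field names here are illustrative, not a fixed schema):

```python
# One style = one small function per record type; the shared rendering
# loop stays the same whether the records are articles, people, or journals.
STYLES = {
    'article': lambda r: '%s (%s). %s.' % (r['author'], r['year'], r['title']),
    'person':  lambda r: '%s -- %s' % (r['name'], r.get('affiliation', 'unaffiliated')),
    'journal': lambda r: '%s (ISSN %s)' % (r['title'], r.get('issn', '?')),
}

def render(records):
    """Render heterogeneous records, dispatching on each record's type."""
    return [STYLES[r['type']](r) for r in records if r['type'] in STYLES]

records = [
    {'type': 'article', 'author': 'J. Pitman', 'year': '1999',
     'title': 'Coalescents with multiple collisions'},
    {'type': 'person', 'name': 'Jim Pitman', 'affiliation': 'UC Berkeley'},
]
for line in render(records):
    print(line)
```

Adding a new artifact type, or a new house style, means writing one more plain function rather than learning a .bst or CSL dialect; that is the ease-of-definition point in miniature.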


Perhaps we get spoiled by Google.

Funny, how they can't manage MathML, though.
I guess the MathML aware community is too small for them to care. 

But yes, that's a programming philosophy I try to adhere to:
If something is reasonably recoverable, try to proceed,
and make the best of it, but warn the user of the problems
so they have the opportunity to fix it;  Try to distinguish
levels of severity (Info, Warning, Error).
Sounds good.

 
    I guess you know that LaTeXML was developed exactly
    to support http://dlmf.nist.gov/, a flavor of DML.
    A core design goal was preserving and (as far as possible)
    enhancing the presentation and structure of the documents,
    but more importantly the semantic content, both through enhancing
    the markup through extra metadata on the input side, and after
    processing.
    So, to my mind, DML's are exactly the kind of application that LaTeXML
    is intended for, and many of the issues you discuss in that report are
    quite dear to my heart.

Great.  I'd be very interested in which specifics  you might be
interested in further involvement in.

Well, I do have a nice tool and ideas about how to use it that I think
could be very useful in such a project at some stage; the tool is admittedly
not perfect, but IMNSHO the best of its class. Of course, there are
competitors whose proponents likely feel the same way about theirs;
some may be better connected to DML.  But from a quick scanning of
the NRC's report (http://www.nap.edu/openbook.php?record_id=18619;
I assume that's related to the report you posted?)
Yes.
 
it's not clear that LaTeX=>(XML|HTML) conversion is even on their radar.
If it's not mentioned, this is an oversight. I personally think that conversion from
formats authors are willing to work in, like LaTeX, to more web-aware formats
is potentially an important component of a DML.
So, whether this type of contribution is seen as important at
this stage by the DML project, I can't say.
Neither can I, as the administration of the project is very fluid at present.


Sloan is clearly looking for participants capable of building working
systems, which you have done, as have others on this list.
I expect you would know if NIST/DLMF has received any invitation from
Sloan/Wolfram to
participate in a DML conference, right? It would be strange if not,
which I infer from your comments.

Not that I'm aware of.  We certainly have a working system.
But in spite of the "Library" in DLMF's name, perhaps it isn't
considered in the same category; it does correspond only to a
single book, after all.  There is a whole spate of issues
that broader DML has to deal with that we didn't: a potentially huge
collection of inhomogeneous countries, institutions, journals, licenses,
artifacts, languages...
Right, but I think that getting things from where they are now to more interoperable
web formats is an important component.


Ah, Wolfram is a major player?
Sloan funded a continued fractions data extraction project with Wolfram (Champaign, IL):

$123,453 over 12 months to prototype part of a
Mathematical Heritage Library by constructing and
demonstrating a computable database concerned
with continued fractions.
Project Director: Michael Trott, Content Manager
for Physics

It bothers me that the product of this project is not accessible in bulk with an open license.

We did have an interesting conversation
with Stephen Wolfram at an early stage in the DLMF project, but there are no
collaborations. They have their functions.wolfram.com project, of which
they are rightfully proud; undoubtedly there's an element of
competition. I would hope that we are at least collegial :>

We comment in the NRC report on the lack of interoperability and ability to aggregate mathematical
data from different sources such as DLMF and Wolfram. I hope a DML effort might break down some
barriers like this.

Michael Kohlhase

May 15, 2014, 12:02:25 AM
to planet...@googlegroups.com
Dear Jim, dear all,

sorry to get into this discussion so late.
On 13.5.14 22:58, James W PITMAN wrote:
>
> OK. If we think of Planetary as a participant in DML then can you give me a
> few sentences on
> -- nature of organization
> -- contact people and their roles in the organization
> -- possible contributions to a DML effort as outlined in the NRC report?
> Also, how to clarify distinction if any between Planetary and Michael K's
> group which is also a natural participant?

I have been asked by Michael Trott to be part of a (technical planning
group for) a consortium for the possible Sloan-backed world heritage
digital mathematical library (WHDML) effort. The invitation mentioned
a couple of topics that he sees as relevant (I am myself interested in
the first three, which are also the ones relevant to the discussion
on this thread).

> - an ontology for mathematics (or, for a start, a mathematical wordnet)
> - a semantically meaningful encoding of mathematical definitions,
theorems, formulas, lemmas, ...
> - semantic mathematical search
> - fingerprinting of mathematical theorems (e.g.,
http://www.ams.org/notices/201308/rnoti-p1034.pdf)
> - tools for assisting mathematical proof based on semantically encoded
mathematics
> - OCR of mathematics
> - manual and computerized (machine-learning based) tagging of
mathematical papers
> - annotation tools for PDF files and scanned papers

The first is almost exactly what we are trying to do with our SMGLoM
system (currently between KWARC and ZBMath, but we will invite outside
participation once we have a viable system), the second we have done for
years and Planetary provides the user side for, and the third is covered
from our side by the MWS system. But Jim was mentioning LaTeXML and
conversion as a prerequisite for the DML effort. I agree on this and had
already inserted another topic into the DML discussion (Michael accepted
that).

> - semantics extraction methods/tools for math documents (aka. math
linguistics)

I have also suggested Bruce/NIST as a partner to him, but I have not
heard back yet.

Michael

--
----------------------------------------------------------------------
Prof. Dr. Michael Kohlhase, Office: Research 1, Room 168
Professor of Computer Science Campus Ring 1,
Jacobs University Bremen D-28759 Bremen, Germany
tel/fax: +49 421 200-3140/-493140 skype: m.kohlhase
m.koh...@jacobs-university.de http://kwarc.info/kohlhase
----------------------------------------------------------------------


James W PITMAN

May 15, 2014, 12:59:57 PM
to planet...@googlegroups.com
Hi Michael, I am encouraged to learn that you have been included in the invitation from
Sloan/Wolfram. I definitely think they should include Bruce/NIST as a partner. This will 
put on the table the issue of interoperability between NIST and Wolfram efforts with 
functions. I think such techno-legal issues and agreeing on formats for open exchange of canonical
representations of math info should be among the most important things that could be resolved,
I hope without huge effort,  by a consortium of open math info partners.
I'd be interested in  your impressions of that issue. 
with best wishes to all
--Jim

Michael Kohlhase

May 16, 2014, 1:09:47 AM
to planet...@googlegroups.com
Dear Jim, dear all,
On 15.5.14 18:59, James W PITMAN wrote:
Hi Michael, I am encouraged to learn that you have been included in the
invitation from Sloan/Wolfram.
thanks.
I definitely think they should include Bruce/NIST as a
partner. This will
put on the table the issue of interoperability between NIST and Wolfram
efforts with
functions. I think such techno-legal issues and agreeing on formats for
open exchange of canonical
representations of math info should be among the most important things that
could be resolved,
I hope without huge effort,  by a consortium of open math info partners.
I'd be interested in  your impressions of that issue.

I am not sure that this will be so easy. To me there seem to be two issues:

- representations for the DLMF/functions material,
- representations for the WHDML over all. 

Let's take stock of the former and judge it in the light of the latter.

If the wonderful work that Wolfram Inc is doing on Wolfram Alpha is any indication of the direction they intend to take, then they are representing (computational aspects of) elementary functions in Mathematica. I see this as essentially a formalization effort, since the essential objects of the knowledge are machine-actionable (by the Mathematica Kernel). The informal parts of the mathematical knowledge (mostly natural language) are essentially an add-on that is delivered together with the search results, but not machine-actionable. MathWorld is less computationally biased, but still uses Mathematica representations.

DLMF, on the other hand, is largely a presentational effort based in (slightly stylized) LaTeX with formalization essentially as an afterthought. As a consequence, the DLMF is almost exclusively suited to human consumption (i.e. mathematicians reading web pages). Where you _can_ compute with or visualize the material, this is independently hand-coded.

So the two approaches come at the DML topic from different sides. It seems technically possible for them to "meet in the middle" - for instance in a framework that supplies parallel markup (in the style of MathML) combining and interrelating formal and informal representations.

I see the legal problems and business interest as the bigger problem for reaching an integration on the DLMF/functions front. Wolfram Inc. has the legitimate interest to protect their investments and promote their Mathematica system (though that may involve opening up some content as with MathWorld). DLMF has the interest to serve the mathematical community (though that may involve generating revenue streams for future improvements).

Let us now turn to the consequences for all of WHDML.
 
Although I am all for machine-actionable (semantic) representations, it seems that the "hand-formalization and centralized curation" will not scale to WHDML-size/diversity corpora. There are five aspects to this:
  1. Even though the Wolfram-Alpha content is very impressive, it only covers a relatively small thematic niche of Maths - that where Mathematica-style computation is helpful. That is a very reasonable business decision, but impressive as it is, we cannot see it as an indication that (formal) Mathematica as a representation format can cover all of mathematics. It might, but that is not proven; I am skeptical.
  2. I am also skeptical that any institution can pay and staff the human effort necessary for such a hand curation. After all, the effort invested in Wolfram-Alpha is estimated to exceed 1000 person years.
  3. Mathematica is a format whose meaning is largely specified "in the Mathematica code". That is independent of the form in which it is serialized. In particular, even representing Mathematica content in standards like Content MathML will not change this, since the meaning references still go into Mathematica code.
  4. Mathematica is a proprietary format, which is controlled by one institution that is not under the control of the mathematical community. In particular, to extend the scope of the representation format one must extend Mathematica (and essentially only Wolfram Inc. can do that effectively).
  5. In contrast to that, Mathematicians can extend the scope of mathematics by just publishing papers with definitions/theorems without having to switch the medium, and without having to (initially) worry about computational aspects. 

I do not see any of these problems in the DLMF model of starting with informal content, transforming it to machine-readable formats, and then enriching it with semantics in a community effort, which makes it the more suitable model. It is somewhat slower, but more community-oriented and scalable. In particular I care very much about point 5. above (I call that *bootstrappability of the format*), and that seems supported in the DLMF model but broken in the Wolfram/Mathematica model.

BTW, I am using the DLMF vs. Mathematica/Wolfram argumentation only as the most conspicuous example. The arguments apply to any similar situation/institutions in the same way.

Concretely: even though Wolfram/Mathematica should be involved and central in the WHDML for their ability to move things forward, I think that the formats should not be set by them.

OK, that turned out longer than intended, but that is what I think  (you asked for it).

Michael
with best wishes to all
--Jim



Bruce Miller

May 16, 2014, 8:19:10 AM
to planet...@googlegroups.com
On 05/14/2014 06:12 PM, James W PITMAN wrote:
> Hi Bruce, many thanks for detailed response. We may be getting off focus
> for plantary-dev

Indeed; I hadn't meant to turn this into a
LaTeXML advertisement! Apologies to the planetary
community!

Of course, I could spin it into a Lesson For Us All,
exploring the differences & conflicts between amazing,
bleeding edge research software that only works
during waning moon versus It Just Works production
code.....

(with LaTeXML being somewhere in the vague middle)

bruce

Joe Corneli

unread,
May 16, 2014, 10:46:31 AM5/16/14
to planet...@googlegroups.com
> I see the legal problems and business interest as the bigger problem for
> reaching an integration on the DLMF/functions front. Wolfram Inc. has the
> legitimate interest to protect their investments and promote their
> Mathematica system (though that may involve opening up some content as with
> MathWorld). DLMF has the interest to serve the mathematical community
> (though that may involve generating revenue streams for future
> improvements).

http://mathworld.wolfram.com/about/terms.html isn't very open in terms
of downstream use, and to be honest,
http://dlmf.nist.gov/about/notices#S1 isn't much better.

I think licensing and terms of use are going to be the first thing
that will have to be decided for there to be "a 21st Century Global
Library for Mathematics Research" rather than many distinct
non-interoperable libraries (which is the current state of affairs).
Formats for exchange aren't serviceable if we're legally forbidden
from actually exchanging content!

I personally think that the Creative Commons Attribution-ShareAlike
license or something more permissive like CC-By or CC-Zero would be
suitable for a truly global effort.

- These licenses permit sharing, adaptation, and commercial
exploitation: these are good things!
- They are at least somewhat in the spirit of the WIPO treaty (yes,
WIPO of all things!), which says that mathematical concepts are not
copyrightable - although only CC-Zero goes the whole way -
http://www.ic.gc.ca/eic/site/ippd-dppi.nsf/eng/ip00086.html (Article 2)

If DML is to be *fully* decentralized, and basically just curated
links to contents found elsewhere, then licensing is of course less of
an issue for DML itself - although exchange and licensing would
continue to be relevant for the community, and maybe the DML could
convene the right people to talk through this issue.

If DML does manage some content, then I think things that are free and
open should be given preference, for example, the rule might be that
only free/open things can be submitted as depository copies (at which
point they would be managed as a Copy of Record). Non-free things
could be linked to but wouldn't be able to avail themselves of the
library's Copy of Record service and any other enhancements (like
format shifting, interlinking with other content, etc.). Perhaps the
MathHub "escrow" policy could be adopted as a stepping stone:
http://mathhub.info/help/ip

Jim's paper mentions "best practices to facilitate knowledge
management in research mathematics." I think it would be good to have
a clear statement saying that free/open IS a best practice - if
others agree about that - and to talk about the ramifications for
those authors, and companies, who want more control. For example,
Jim, do you think having a Copy of Record would be enough to satisfy
authors who are concerned about the integrity of their works and who
would not wish to be associated with downstream versions?

I don't know how DLMF or Wolfram would think about this, but if we are
going to talk about a global library and not just a bunch of private
libraries, it should be talked about.

I think Jim's earlier question about business models is also an
important topic to talk about - it won't really do to talk about
"commercial exploitation" and have this only be a notional thing. If
Wolfram's main profit-making business is selling software, for
example, wouldn't it be good for them to be able to include lots of
free content *with* the software? CC-Zero content would raise the
least obstacles for this, on the downstream side. If we imagined that
free textbooks became standard, this wouldn't mean that teachers' or
authors' services were free, so they would presumably still have jobs.
Paper copies of the books could still be bought and sold. Consulting
services could still be offered. While I'm dreaming, even Wolfram
could move to a service-and-consulting based business model.

I think we need to think about where it's all headed!

Bruce Miller

unread,
May 16, 2014, 1:16:42 PM5/16/14
to planet...@googlegroups.com
On 05/16/2014 10:46 AM, Joe Corneli wrote:
>> I see the legal problems and business interest as the bigger problem for
>> reaching an integration on the DLMF/functions front. Wolfram Inc. has the
>> legitimate interest to protect their investments and promote their
>> Mathematica system (though that may involve opening up some content as with
>> MathWorld). DLMF has the interest to serve the mathematical community
>> (though that may involve generating revenue streams for future
>> improvements).
>
> http://mathworld.wolfram.com/about/terms.html isn't very open in terms
> of downstream use, and to be honest,
> http://dlmf.nist.gov/about/notices#S1 isn't much better.

acknowledged...

> I think licensing and terms of use are going to be the first thing
> that will have to be decided for there to be "a 21st Century Global
> Library for Mathematics Research" rather than many distinct
> non-interoperable libraries (which is the current state of affairs).
> Formats for exchange aren't serviceable if we're legally forbidden
> from actually exchanging content!

Some _naive_ comments, with perspective from DLMF
but _not_ speaking for the DLMF project,
and most certainly _not_ implying any licenses!

Copyright laws are complex enough, but there's something that
doesn't seem quite covered (to my limited knowledge) that we
would want to apply with MKM.

Creators of reference works _want_ their work to be used,
but don't want to be ripped off wholesale, so I guess
we at DLMF took the easy way out with a standard copyright and
rely on fair use doctrines.

After the labor invested, many won't appreciate having their
work taken and republished with a cheaper binding (for better
_and_ worse, it happened), or re-hosted with a faster server,
fancier colors & javascript or whatever. You might not feel
that way, but it's a valid point of view.

OTOH, not only does DLMF expect & want people to use
"little bits" of DLMF (fair use), but I suspect that we
would find certain kinds of en-masse usages a good thing
as well. Obviously search indexing! :> But probably also
even sucking in the entire content (cmml if we had it!)
into ATP (either to prove, or use in proofs), CAS (adding
identities to be used) and such.

The problem, of course, is how to draw such lines.
I get headaches from reading Derived Works definitions;
otherwise typical licenses, for all their words, end up
essentially all or nothing.

I guess some of this depends on which model of WDML
you're thinking about; there are at least two.
In the above, I'm partly talking about a massive
AI/MKM knowledge base where stuff is magically,
mysteriously used. That's cool, pie in the sky, perhaps.
Attribution (see below) is still important.

At the other end is a massive indexy sorta thing,
where you're assisted in finding article-like things.
While you may not like DLMF's license, it is free
& open for web access, so there's no issue at all
for such a library to point to it. And frankly,
I'd see little point in copying DLMF content directly
_into_ the WDML and serving it from there.

Yet, I am also often frustrated to find in such
libraries that the article I need and want is in
fact licensed in some way that I can't actually read it.

[...]

> I don't know how DLMF or Wolfram would think about this, but if we are
> going to talk about a global library and not just a bunch of private
> libraries, it should be talked about.
>
> I think Jim's earlier question about business models is also an
> important topic to talk about - it won't really do to talk about
> "commercial exploitation" and have this only be a notional thing. If
> Wolfram's main profit-making business is selling software, for
> example, wouldn't it be good for them to be able to include lots of
> free content *with* the software? CC-Zero content would raise the
> least obstacles for this, on the downstream side. If we imagined that
> free textbooks became standard, this wouldn't mean that teachers' or
> authors' services were free, so they would presumably still have jobs.

Well, the teachers I know essentially _do_ work for free, but...
Seriously, either you sell your software/books/whatever, or you
have some other source of income. Often, attribution of the "free"
work is critical to sustain the other income, though. Citations
are vital to university professors (at least, till tenure :> )
DLMF couldn't survive if NIST didn't get the sense that people
were using it and found it to be useful.

In the 2nd category of WDML, such attribution seems easy to
guarantee; in the 1st category it seems tricky.

bruce

Joe Corneli

unread,
May 16, 2014, 3:26:17 PM5/16/14
to planet...@googlegroups.com
> In the 2nd category of WDML, such attribution seems easy to
> guarantee; in the 1st category it seems tricky.

Remix is what makes the Copy of Record idea seem important. If Alice
submits a work to the DML and Bob remixes it, Alice should definitely
get credit for *her* work, but it's not clear that she should get
credit for Bob's (potentially just incremental) work. If Bob does do
significant work, even if it's all remix, then he may well want to
deposit it as a Copy of Record. Maybe Alice should get a few extra
"brownie points" when that happens.

I'm suggesting that, for purposes of academic credit, the registration
(and potential accompanying peer review) is even more important than
*copy*-right. For purposes of commercial sale or computational reuse,
the considerations are a bit different.

Computational use: Empirically, proprietary software is quite popular,
especially when:

- the service providers derive benefits from a monopoly or quasi-monopoly
- the consumers, for their part, prefer to subsidize a firm rather
than a commons

Thinking about a physical library: if consumers want to use
proprietary stuff (like Mathematica), then I think it would be fine
for the library to buy an appropriate license (if it has the money)
and make Mathematica available to the users. If the users are true
geeks and prefer to use Maxima or whatever, then the library should
probably be willing to ante up money here too, to invest in
Maxima maintenance or hack-fests or whatever.

Commercial sale (including printed copies, subscription services,
etc.): costs and benefits are very similar to the above.

Lots of stuff is already there for the reading, but the idea of going
"beyond aggregation alone to create a comprehensive digital
mathematics information resource which could be of much greater value
than the sum of its contributing publications" means that *just*
reading isn't so much what's at stake.

So, what is the "more"? Is it remix? Is it computation? Community?
-- all of the above? Is this library supposed to serve the typical
mathematician, the typical mathematics student, the (always atypical)
computer math geeks -- all of the above?

The top recommendation in Jim's note is "enriching the knowledge base
of mathematics" -- that's harder to do if remix isn't permitted --
except in the case of adding new content, which is how mathematicians
usually go about doing it :-)

The report mentions "the extent to which the mathematical literature
might be adequately tagged with identifiers for mathematical concepts
to facilitate linking and navigation" - here, Deyan's work on
NNexus is pretty much state-of-the-art (AFAIK) -- some investments
there might be the equivalent of *infrastructure* for the library.
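The concept-tagging idea above can be illustrated with a toy sketch. To be clear, this is not NNexus itself; the concept index, phrases, and URLs below are invented for illustration, and the real system does far more (morphological variants, disambiguation by subject area, respecting math mode):

```python
import re

# Toy concept index: phrase -> concept identifier.
# (Hypothetical URLs; a real linker would draw on an encyclopedia's index.)
CONCEPTS = {
    "abelian group": "https://example.org/concept/AbelianGroup",
    "group": "https://example.org/concept/Group",
    "ring": "https://example.org/concept/Ring",
}

def link_concepts(text, concepts=CONCEPTS):
    """Link the first mention of each known concept phrase.

    Longer phrases are tried first, so 'abelian group' wins over plain
    'group'; a single regex pass avoids inserting links inside links."""
    phrases = sorted(concepts, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    seen = set()

    def replace(match):
        key = match.group(0).lower()
        if key in seen:  # only the first mention of each concept gets linked
            return match.group(0)
        seen.add(key)
        return f'<a href="{concepts[key]}">{match.group(0)}</a>'

    return pattern.sub(replace, text)
```

Even this naive version shows why an index of concept phrases is infrastructure: the linking pass is trivial once the index exists, and the hard, valuable work is building and curating the index.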

It's an example of something we can do that won't violate TOS's and
copyright clauses, but I think there's much more that we could do (at
least on a demo basis) with remixable content. If the library focuses
a percentage of its time and energy on free/open content, is that
going to be 80%? Or 20%?