Thu 11 May: TeX Hour: Using LaTeXML to audit the accessibility of arXiv LaTeX source files

Jonathan Fine

May 10, 2023, 3:41:37 PM
to mem...@tug.org, uk-tex...@googlegroups.com
Hi

The arXiv has about 2.5 million articles, most of which have been processed with LaTeX to produce PDF. In addition, most of these LaTeX articles have been processed with LaTeXML to produce HTML. The arXiv recently announced that it will make this HTML available to improve accessibility. Tomorrow's TeX Hour is about using LaTeXML to audit the accessibility of the arXiv LaTeX source.


LaTeXML produces a log file containing warnings and errors. To some degree, this log provides an accessibility audit of the LaTeX source files on the arXiv. Tomorrow's TeX Hour is an informal preliminary report on my efforts to use these log files to audit arXiv source for accessibility. So far the problems outnumber the results, but it's early days.
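As a rough illustration of what using the log files might involve, here is a minimal Python sketch that tallies messages by severity and category. It assumes log lines begin with `Severity:category:object`, which is my understanding of the LaTeXML message format and should be checked against real logs. The function name `summarize_log` and the sample log text are made up for this example.

```python
import re
from collections import Counter

# Assumed line shape: "Severity:category:object rest of message".
# Verify this against an actual LaTeXML log before relying on it.
LOG_LINE = re.compile(r"^(Warning|Error|Fatal):([^:\s]+):(\S*)")


def summarize_log(text):
    """Count messages by (severity, category) in LaTeXML-style log text."""
    counts = Counter()
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            severity, category, _obj = match.groups()
            counts[(severity, category)] += 1
    return counts


# Made-up sample for illustration only, not taken from a real conversion.
SAMPLE_LOG = """\
Warning:missing_file:mystyle.sty Can't find binding for mystyle.sty
Error:undefined:\\mymacro The token is not defined.
Error:undefined:\\other The token is not defined.
Conversion complete
"""

if __name__ == "__main__":
    for (severity, category), n in sorted(summarize_log(SAMPLE_LOG).items()):
        print(f"{severity:8} {category:15} {n}")
```

Tallies like these, collected over many articles, would be one way to see which kinds of problem are most common.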

Going to https://ar5iv.labs.arxiv.org/feeling_lucky will send you to a random arXiv article in HTML. At the bottom of that page there is a link to the LaTeX-to-HTML conversion report (the log file), and also the arXiv PDF. Getting the LaTeX source is more work. Automating all this is one of the early problems.
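One possible starting point for the automation, sketched in Python: follow the redirect from feeling_lucky, take the article identifier from the resulting URL, and build the related links. Only the feeling_lucky URL comes from this message. The `/html/`, `/log/`, `/pdf/` and `/e-print/` URL patterns are assumptions that should be checked against the links on an actual page, and the function names are hypothetical.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

LUCKY_URL = "https://ar5iv.labs.arxiv.org/feeling_lucky"  # from the message above


def article_id_from_url(url):
    """Return the path after its first segment, e.g. '2305.01234'.

    Assumes the article URL has the shape https://host/<prefix>/<article id>.
    """
    path = urlparse(url).path.strip("/")
    _prefix, _, article_id = path.partition("/")
    if not article_id:
        raise ValueError(f"no article id in {url!r}")
    return article_id


def related_urls(article_id):
    """Build candidate URLs for one article (all patterns are assumptions)."""
    return {
        "html": f"https://ar5iv.labs.arxiv.org/html/{article_id}",
        "log": f"https://ar5iv.labs.arxiv.org/log/{article_id}",
        "pdf": f"https://arxiv.org/pdf/{article_id}",
        "source": f"https://arxiv.org/e-print/{article_id}",
    }


def random_article_id():
    """Follow the feeling_lucky redirect (needs network access)."""
    with urlopen(LUCKY_URL) as response:
        return article_id_from_url(response.geturl())


if __name__ == "__main__":
    for name, url in related_urls(random_article_id()).items():
        print(f"{name:7} {url}")
```

Anyone running this over many articles should pause between requests and check the arXiv's guidance on automated access first.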

wishing you safe and accessible TeXing

Jonathan

Jonathan Fine

May 31, 2023, 1:58:22 PM
to uk-tex...@googlegroups.com
Hi

The papers on the arXiv, mostly written in LaTeX, are an important sample of current practice. This TeX Hour will gather questions worth asking about the arXiv papers that can easily be answered by using software to process them. Your questions are welcome.


with best regards

Jonathan