Bug#1051748: RFP: pdf2htmlex -- convert PDF to HTML without losing text or format


Lev Lamberov

Sep 12, 2023, 2:10:05 AM
Package: wnpp
Severity: wishlist

* Package name    : pdf2htmlex
  Version         : 0.18.8rc1
  Upstream Author : Lu Wang <coolw...@gmail.com> and other contributors
* URL or Web page : https://github.com/pdf2htmlEX/pdf2htmlEX
* License         : GPL-3+
  Description     : convert PDF to HTML without losing text or format

pdf2htmlEX renders PDF files in HTML, utilizing modern Web technologies.
It aims to provide an accurate rendering, while being optimized for Web
display. Text, fonts and formats are natively preserved in HTML.
Mathematical formulas, figures and images are also supported.

pdf2htmlEX is also a publishing tool: almost 50 options make it flexible
for many different use cases: PDF preview, book/magazine publishing,
personal resume, etc.

pdf2htmlEX is optimized for modern web browsers such as Mozilla Firefox
& Google Chrome.

Johannes Schauer Marin Rodrigues

Sep 12, 2023, 7:10:04 AM
Hi,

On Tue, 12 Sep 2023 10:57:57 +0500 Lev Lamberov <dog...@debian.org> wrote:
> Package: wnpp
> Severity: wishlist
>
> * Package name    : pdf2htmlex
>   Version         : 0.18.8rc1
>   Upstream Author : Lu Wang <coolw...@gmail.com> and other contributors
> * URL or Web page : https://github.com/pdf2htmlEX/pdf2htmlEX
> * License         : GPL-3+
>   Description     : convert PDF to HTML without losing text or format
>

Are you aware that pdf2htmlex used to be part of Debian? It is still in
old-old-stable. It was removed with this bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=921471

Are you sure it is wise to include that package in Debian again? The issue
tracker is very low on activity:

https://github.com/pdf2htmlEX/pdf2htmlEX/issues

Thanks!

cheers, josch

Lev Lamberov

Sep 12, 2023, 11:30:05 AM
Hi,

On Tue, 12 Sep 2023 at 13:01, Johannes Schauer Marin Rodrigues <jo...@debian.org> wrote:
Well, upstream is indeed not very active (the latest commit was on 13
Mar this year). I admit that it looks more like abandonware, but
perhaps someone™ could step forward and care for it (I personally lack
the relevant competence). Recently I had to convert LaTeX source (XeTeX,
in fact) to HTML, and the best result I got was with
LaTeX->PDF->HTML, where the last conversion was done with
pdf2htmlex. It produced an HTML document that looks exactly like the PDF
produced by LaTeX. So I thought that this tool could be of use to someone
else.
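
For anyone who wants to try the same LaTeX->PDF->HTML route, here is a rough sketch of how the two steps might be chained. It is not the exact invocation used above: the file name paper.tex, the output directory out, and the helper names build_commands and convert are all made up for illustration, and it assumes xelatex and pdf2htmlEX are both on PATH.

```python
import subprocess
from pathlib import Path


def build_commands(tex_file: str, out_dir: str = "out") -> list[list[str]]:
    """Return the two commands for the LaTeX -> PDF -> HTML route.

    Hypothetical helper: the file layout and option choices are assumptions,
    not taken from the thread.
    """
    tex = Path(tex_file)
    # xelatex writes <stem>.pdf into the output directory.
    pdf = Path(out_dir) / tex.with_suffix(".pdf").name
    return [
        # Step 1: compile the XeTeX source to PDF.
        ["xelatex", "-interaction=nonstopmode",
         f"-output-directory={out_dir}", str(tex)],
        # Step 2: convert that PDF to HTML with pdf2htmlEX.
        ["pdf2htmlEX", "--dest-dir", out_dir, str(pdf)],
    ]


def convert(tex_file: str, out_dir: str = "out") -> None:
    """Run both steps in order, stopping at the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(tex_file, out_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the commands; call convert("paper.tex") to actually run them.
    for cmd in build_commands("paper.tex"):
        print(" ".join(cmd))
```

Documents with cross-references or a table of contents would probably need more than one xelatex pass; the sketch runs it only once.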

Cheers!
Lev