Under what circumstances is a PDF document converted to an image instead of HTML?


muche...@gmail.com

Nov 23, 2013, 6:53:34 AM
to coolwanglu, pdf2htmlex
I am trying to convert a PDF to HTML like this:
D:\\test\\pdf2htmlEX-v1.0-win32-static\\pdf2htmlEX.exe D:/test/extract/gcdzzsc.pdf gcdzzsc.html --dest /test/extract --embed-image 0 --embed-javascript 0 --embed-font 0 --split-pages 7 --embed-css 0 --css-filename mytest.css
But some PDF documents are converted to HTML that is only a background image.
Under what circumstances is a PDF document converted to an image instead of HTML text?
What are the rules?
 

--best for you!
 
Share.Mu

Lu Wang

Nov 23, 2013, 7:40:58 AM
to muche...@gmail.com, pdf2htmlex
If you mean text: currently, text in fonts that use a writing mode is converted into images.
Text in Type 3 fonts is supported experimentally, and that support must be turned on manually at compile time.
Please refer to the pdf2htmlEX wiki for instructions (see 'Building').


regards,
- Lu

Lu Wang

Nov 23, 2013, 10:04:42 AM
to muche...@gmail.com, pdf2htmlex
Most likely they are IMAGES in the PDF instead of text.


regards,
- Lu


On Sat, Nov 23, 2013 at 10:21 PM, muche...@gmail.com <muche...@gmail.com> wrote:
I have a PDF document. Can you help me try to convert it to an HTML document?
 
The resulting HTML content is in fact an image. I don't know the reason. Thanks.

--best for you!
 
Share.Mu
 

muche...@gmail.com

Nov 23, 2013, 10:16:54 AM
to coolwanglu, pdf2htmlex
I think so too.
Do you have any documentation on the structure of a PDF document?
I know too little about the theory behind PDF.

Lu Wang

Nov 23, 2013, 10:19:35 AM
to muche...@gmail.com, pdf2htmlex
please google for 'pdf specification'



regards,
- Lu

muche...@gmail.com

Nov 25, 2013, 5:52:29 AM
to coolwanglu, pdf2htmlex
Hi Lu:
 
I converted a PDF to HTML on another computer. It produced: "Internal Error: Attempt to output 2147483647 into a 16-bit field. it will be truncated and the file may not be useful."
The system then showed an error window: pdf2htmlEX.exe has stopped working.
What is the reason?
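
For readers wondering what the numbers in that message mean, here is a small arithmetic sketch. It only shows what keeping the low 16 bits of 2147483647 would give. Whether FontForge truncates in exactly this way is an assumption, and the helper name `truncate_to_16_bits` is made up for illustration.

```python
# Illustration only: the value from the error message is 2**31 - 1,
# the largest signed 32-bit integer. A 16-bit field cannot hold it.
VALUE = 2147483647


def truncate_to_16_bits(n):
    """Return the low 16 bits of n, read as unsigned and as signed."""
    low = n & 0xFFFF  # keep only the lowest 16 bits
    signed = low - 0x10000 if low >= 0x8000 else low  # two's complement view
    return low, signed


unsigned16, signed16 = truncate_to_16_bits(VALUE)
print(hex(VALUE))            # 0x7fffffff
print(unsigned16, signed16)  # 65535 -1
```

Either reading of the truncated value is far from the original number, which is consistent with the warning that the output file may not be useful.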

Lu Wang

Nov 29, 2013, 3:27:44 AM
to muche...@gmail.com, pdf2htmlex
I've seen the error before, but I'm not sure about the cause myself.
Please check the version of FontForge you are using (run `pdf2htmlEX -v`). If it is older than 201209xx, it is probably too old and you had better update it.


regards,
- Lu

muche...@gmail.com

Dec 3, 2013, 8:30:41 PM
to coolwanglu, pdf2htmlex
Oh, I only received your mail just now.
The FontForge version information is:
D:\test\pdf2htmlEX-v1.0-win32-static>pdf2htmlEX.exe -v
pdf2htmlEX version 0.10
Copyright 2012,2013 Lu Wang <coolw...@gmail.com> and other contributers
Libraries:
  poppler 0.24.1
  libfontforge 20130820
Default data-dir: .\data
Supported image format: png jpg
 
As you say, this version should not have the problem, but it still occurs.
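
The version check suggested earlier in the thread can be sketched in a few lines. This is a minimal sketch, assuming the `pdf2htmlEX -v` output has the shape pasted above and that "older than 201209xx" means a build date before September 2012. The function names are hypothetical and not part of pdf2htmlEX.

```python
import re

# Relevant part of the `pdf2htmlEX -v` output quoted in this thread.
VERSION_OUTPUT = """\
pdf2htmlEX version 0.10
Libraries:
  poppler 0.24.1
  libfontforge 20130820
"""


def fontforge_build_date(text):
    """Return the 8-digit libfontforge build date, or None if it is absent."""
    match = re.search(r"libfontforge\s+(\d{8})", text)
    return match.group(1) if match else None


def is_older_than_201209(build_date):
    """Compare the YYYYMM prefix against the 201209 threshold from the thread."""
    return build_date[:6] < "201209"


date = fontforge_build_date(VERSION_OUTPUT)
print(date, is_older_than_201209(date))  # 20130820 False
```

For the build shown above, the check reports that the library is not older than the threshold, which matches the conclusion that an outdated FontForge is not the cause here.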
 
Another question has troubled me for a long time. Can you help me, please?
Question: Can we extract structured data from a PDF file? This question is about PDF structure, not pdf2htmlEX.
Example: given a PDF article, can we extract the title, paragraphs, images, table of contents and so on from the PDF?

Lu Wang

Dec 5, 2013, 3:57:35 AM
to muche...@gmail.com, pdf2htmlex
So I cannot give a quick fix then. Would you please file an issue on GitHub, with a sample file?

What I can tell is that it must come from a coordinate system calculation during font conversion, but I'm not sure whether it's a bug in pdf2htmlEX or in FontForge.


About your second question: there's no such structure info stored in PDF as far as I know; after all, PDF is mainly designed for displaying and printing. Generally, recognizing the structure is a difficult task, and I'm afraid I don't know of any promising technique so far.


regards,
- Lu

muche...@gmail.com

Dec 5, 2013, 11:00:41 AM
to coolwanglu, pdf2htmlex
OK. Tomorrow I will collect the runtime environment information and write it up on GitHub. Thanks.

Wenshan He

Jun 22, 2017, 5:31:44 AM
to pdf2htmlEX, coolw...@gmail.com
Has this problem been solved? I have run into the same problem as you.

My PDF is one with an editable signature for printing, so it is not the same as other PDFs.

The PDF is in the attachment.


On Saturday, November 23, 2013 at 7:53:34 PM UTC+8, Share.Mu wrote:
OperaPrint (1).pdf