I want to convert these to a single PDF file (with all the links
converted, separate html files converted to pages in the order the
links are encountered in index.html, recursively).
Is this possible?
I was obviously looking for something more elegant and less work-intensive.
I have about 600 html files.
So leave them as html. Why do you want to convert 600 files to pdf?
At least part of the work can be done with htmldoc <www.htmldoc.org>.
I've used it to construct books from multiple web pages but have always
wanted to control the order, not pull recursively from an index.html
file. The book format is simple enough (build a fake one and look
at the resulting format) that you should be able to construct it from
a list of html files easily enough.
That reduces the problem to one of getting the (recursively generated)
list of html files starting at the index file. There's probably a tool
to do that, somewhere.
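
For what it's worth, a rough sketch of that list-building step in bash. It assumes the links are plain relative href="...html" attributes (which is what a wget mirror with converted links usually leaves behind), reads "recursively" as depth-first, and lists each page once, at its first appearance. The function name crawl is just something made up for this example, and realpath --relative-to is the GNU coreutils form.

```shell
#!/usr/bin/env bash
# List local .html pages in the order their links are first encountered,
# starting from a given page (depth-first; each page is printed once).
# Assumption: links look like href="some/page.html", optionally with #anchor.

declare -A seen   # pages already printed, keyed by path relative to the cwd

crawl() {
    local page dir link
    # Normalise to a path relative to the current directory (GNU realpath).
    page=$(realpath --relative-to=. -- "$1" 2>/dev/null) || return 0
    # Skip anything that is not a local file, or that was already listed.
    [[ -f $page && -z ${seen[$page]:-} ]] || return 0
    seen[$page]=1
    printf '%s\n' "$page"
    dir=$(dirname -- "$page")
    # Pull out every href target ending in .html, in document order,
    # and follow each one relative to the directory of the current page.
    while IFS= read -r link; do
        crawl "$dir/$link"
    done < <(grep -oE 'href="[^"#?]+\.html' "$page" | cut -d'"' -f2)
}
```

Run from the top of the mirrored tree, something like "crawl index.html > pages.txt" would give the ordered list (pages.txt being an arbitrary name). Remote URLs and missing files are silently skipped, and links written with single quotes or in uppercase would be missed by the grep pattern.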
I need to send it as a PDF to my boss. He is not interested in
receiving a bunch of html files. The original website is down, so I
cannot send him a link. And PDF is far more compact and convenient for
someone who is not computer savvy.
i _guess_ it is possible to make PDFs with live links (but i do not
know of such a program available for Linux)...and, if there is, i don't
know exactly how much time YOU wanna invest checking that each link
continues to function!!!
anyway, why not burn the entire lot to a CD (DVD if you need that much
room) using the exact directory structure of the (former) site and all
files in the correct place....
then using a browser, the experience will be the same as when the site
was live....all links will work and if he wants a print or .pdf, he
can...etc..
i don't know if the boss uses something from Redmond or Jobs...so i
don't know how to make it idiot proof (that is, if you knew for sure,
you could build the Redmond magic file to tell his system to launch
the default browser pointing to /[top directory]/index.html), but your
"not computer savvy" boss should be able to follow YOUR "IS computer
savvy" instructions telling him where/how to begin the browse..
--
DenverD (Linux Counter 282315) via Thunderbird 3.0.1-1.1, KDE 3.5.7,
openSUSE Linux 10.3, 2.6.22.19-0.3-default #1 SMP i686 athlon
First Google "html2pdf", install the one you like.
Then a one-line bash script:
for i in *.html; do html2pdf "$i" "${i%.html}.pdf"; done
Then use pdfsam (PDF split and merge) to aggregate all the PDFs.
pdfsam is here: <http://www.pdfsam.org/>
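
One catch with the glob above is that *.html expands alphabetically, not in link order. If an ordered list of page names is available, the merge step could be sketched as below with pdftk (mentioned elsewhere in this thread), whose "cat output" operation concatenates its inputs in the order given. This assumes each foo.html has already been converted to foo.pdf next to it. The function name merge_in_order and the PDFTK override variable are made up for this example; the override is only there so the command can be previewed with echo.

```shell
#!/usr/bin/env bash
# Merge per-page PDFs in a given order.
# Reads HTML page names from stdin (one per line), maps foo.html -> foo.pdf,
# and hands the PDFs to pdftk in that same order.
# Set PDFTK=echo to print the command instead of running it.
merge_in_order() {
    local out=$1 page
    local -a pdfs=()
    while IFS= read -r page; do
        pdfs+=("${page%.html}.pdf")
    done
    if (( ${#pdfs[@]} == 0 )); then
        echo "merge_in_order: no pages given" >&2
        return 1
    fi
    ${PDFTK:-pdftk} "${pdfs[@]}" cat output "$out"
}
```

For example, "merge_in_order site.pdf < pages.txt", where pages.txt is a hypothetical file holding the page names in the wanted order. With 600 inputs the command line gets long but should stay within normal argument limits.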
Set up your own website with the contents, and send him a link. To receive a pdf file with every
link of the original expanded out in place would be an unholy god-awful mess, and he would not
appreciate your sending him that. Links are not well handled by a translation to pdf.
And those 600 pages will translate to about 3000 pdf pages, if your web site is at all typical,
and to a pdf file which will take forever to mail and to open. He will NOT thank you.
Thanks for the suggestion. Giving him a CD is not an option. He is
about as savvy with this stuff as your friendly neighbourhood snail.
PDF it needs to be.
Thanks for the hint. Does html2pdf create links to the generated pdfs?
Just seems to me that the program would need to know the pdf file names
for the targets.
pdfsam sounds like a subset of pdftk.
All links have been converted to local files with wget.
I don't know precisely what the many versions and varieties of html2pdf
actually do; you'll need to examine the Google results pages to find one
that meets your needs.
What I usually do while looking at a web page is either print the page to
a PostScript file and convert with ps2pdf or copy'n'paste the portions of
interest to me from the web page into OpenOffice Writer and export as a PDF
file -- it always preserves the links as they were at the time of the
copy'n'paste.
Since you've already converted all links in the HTML files to local files,
it would seem that a simple sed pass replacing every ".html" with ".pdf" in
the HTML files, run before the HTML to PDF conversion, would suffice: the
links should then still point at the right files once everything is a PDF.
In other words, if a local bletch.html has an embedded link to "foobar.html",
edit/replace that embedded link to now be "foobar.pdf".
Now convert the bletch.html to PDF.
The converted bletch.pdf file will now have that embedded link to "foobar.pdf".
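
A minimal sketch of that replacement with sed. Rather than replacing every ".html" in the file, this version only touches text inside href="..." attributes, so a ".html" that appears in ordinary page text is left alone. The function name rewrite_links is made up for this example, and it assumes double-quoted href values. It only rewrites the link targets; whether a given converter then turns them into working PDF links is a separate matter.

```shell
#!/usr/bin/env bash
# Rewrite link targets from .html to .pdf, but only inside href="..." values.
# Reads the named file(s), or stdin if none are given, and writes to stdout.
# An optional #anchor after the file name is preserved.
rewrite_links() {
    sed -E 's/(href="[^"#]*)\.html(["#])/\1.pdf\2/g' "$@"
}
```

For example, "rewrite_links bletch.html > converted/bletch.html" (the converted/ directory being an arbitrary choice) leaves the original mirror untouched and gives a copy whose link to "foobar.html" now reads "foobar.pdf".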
I understand.
Would pdfsam or pdftk honour that link and convert it into a page
reference when concatenating the pdf files?
Excellent question. :-)
I didn't know about pdftk until you mentioned it earlier, so I checked:
<http://www.accesspdf.com/pdftk/>
It seems quite featureful (though last updated in 2006), but it's not clear if
it creates a Table of Contents (ToC) since searching that web page for both
"index" and "content" found no match.
pdfsam (<http://www.pdfsam.org/>) is a new program under active development.
The version I have is about a year old, and I just now noticed that the
author keeps adding new stuff. Might be worth emailing
her with your question (and your original application); she may already have
a solution. Can't hurt to ask. :-)
> Thanks for the suggestion. Giving him a CD is not an option. He is
> about as savvy with this stuff as your friendly neighbourhood snail.
Then there's the "set up a web server and copy the files" option.
Can he open a URL sent by e-mail?
i can't imagine a person so stupid that he can't insert a CD in the
caddy but he CAN look at monster pdf file..
i'm beginning to think "Geico Caveman" is as smart as his boss.