Generating fancy/pretty documents

Bernie Cosell

Jul 5, 2008, 9:52:24 AM
Suggestions on how to produce "nice" text documents from Perl? Plain text
is easy, of course, but I need to be able to do fancier/formatted stuff.
What I've done in the past is generate HTML [with <tables>, <fonts>, etc]
and then use an app like HTMLDOC to convert that to PDF (and, indeed, a
long time back I used to generate troff input and use ghostscript to
process *that*). That kind of approach sort of works, but the whole
procedure is a bit klunky and fragile. Is there some way to produce
"pretty" PDF or ODF document directly? I see there's a suite of modules
PDF::API2 -- it looks pretty complicated....anyone used that? PDF::Create
also seems complicated.

Am I just being naive about this? As I say, in the past I've used troff
and HTML as an intermediate "layout" format and things like fonts, titles,
indenting, tables, simple drawing (e.g., putting a box around some text)
etc were all relatively easy. Is that kind of thing easy/reasonable to do
with one of the "direct to PDF" packages? In looking at the organization
and methods for the PDF modules, using HTMLDOC (<http://www.htmldoc.org/>)
starts looking more and more attractive...:o) Thanks!

/Bernie\
--
Bernie Cosell Fantasy Farm Fibers
ber...@fantasyfarm.com Pearisburg, VA
--> Too many people, too few sheep <--

Joost Diepenmaat

Jul 5, 2008, 10:10:08 AM
Bernie Cosell <ber...@fantasyfarm.com> writes:

> procedure is a bit klunky and fragile. Is there some way to produce
> "pretty" PDF or ODF document directly? I see there's a suite of modules
> PDF::API2 -- it looks pretty complicated....anyone used that? PDF::Create
> also seems complicated.

PDF::API2 is complicated because PDF is complicated. It's not *that*
complicated, though. It depends on what your source and output data
are like.

If your data is mostly text and some illustrations & tables, you may
want to use LaTeX instead. LaTeX at least makes it almost trivial to
generate a nice looking pdf/postscript document from straightforward
source data (sort of like POD, only much, much more flexible and
extensible) and it has superb support for references, indexes,
table-of-contents etc, but if you need very complex layouts with lots
of illustrations PDF::API2 may be easier to use.

> Am I just being naive about this? As I say, in the past I've used troff
> and HTML as an intermediate "layout" format and things like fonts, titles,
> indenting, tables, simple drawing (e.g., putting a box around some text)
> etc were all relatively easy. Is that kind of thing easy/reasonable to do
> with one of the "direct to PDF" packages? In looking at the organization
> and methods for the PDF modules, using HTMLDOC (<http://www.htmldoc.org/>)
> starts looking more and more attractive...:o) Thanks!

My experience with HTML -> PDF translators is that it's too easy to
generate some HTML that won't get translated correctly, and you can
usually forget about generating references to sections with page
numbers and things like that.

--
Joost Diepenmaat | blog: http://joost.zeekat.nl/ | work: http://zeekat.nl/

Eric Pozharski

Jul 5, 2008, 5:04:07 PM
Bernie Cosell <ber...@fantasyfarm.com> wrote:
> Suggestions on how to produce "nice" text documents from Perl? Plain
> text is easy, of course, but I need to be able to do fancier/formatted
> stuff. What I've done in the past is generate HTML [with <tables>,
> <fonts>, etc] and then use an app like HTMLDOC to convert that to PDF
> (and, indeed, a long time back I used to generate troff input and use
> ghostscript to process *that*). That kind of approach sort of works,
> but the whole procedure is a bit klunky and fragile. Is there some
> way to produce "pretty" PDF or ODF document directly? I see there's a
> suite of modules PDF::API2 -- it looks pretty complicated....anyone
> used that? PDF::Create also seems complicated.

I would slightly disagree with Joost Diepenmaat -- no interface is more
complicated than its underlying structure. If you are familiar with the
structure, then no braindamage in the interface will stop you. If the
structure is unknown to you, you will fail sooner or later. I would even
stress that the simpler the interface is, the harder and sooner you will
fail.

> Am I just being naive about this? As I say, in the past I've used
> troff and HTML as an intermediate "layout" format and things like
> fonts, titles, indenting, tables, simple drawing (e.g., putting a box
> around some text) etc were all relatively easy. Is that kind of thing
> easy/reasonable to do with one of the "direct to PDF" packages? In
> looking at the organization and methods for the PDF modules, using
> HTMLDOC (<http://www.htmldoc.org/>) starts looking more and more
> attractive...

Wait a moment. You have to make clear to yourself whether you want to
convert HTML to PDF or you want to have a nice PDF. Those are slightly
different beasts.

Once I downloaded a PDF made from SGML (LDP, HOWTOs, you know)... It
was awful. The typesetting was even worse than what word processors
produce. I was sick of it. However it was no different from rendered
HTML. So what's the problem? That PDF can't be passed upstream. You can
read it (somewhat), you can annotate it, you can discard it. But you
can't give it away. Giving away such awful typesetting would make you
look incompetent.

If you want a good quality PDF (or PostScript, which is somewhat the
same), then you don't need a converter. You need a tool that specializes
in B<typesetting>. You B<must> not focus on doing the typesetting
yourself. And here you have options.

=item LaTeX

Hmm... Let's be honest, LaTeX isn't my favorite tool for typesetting.
It's my only tool for that. But here we have the first pitfall: you have
to be familiar with it. I would disagree with Joost Diepenmaat again --
LaTeX isn't easy, it's fscking braindamage, but as soon as you adapt to
it, then the whole world of typesetting becomes bright and shiny (I must
admit, it's still dark for me). First of all, forget about B<PerlTeX>;
you have to get it out of its sandbox first; if you fail (I failed),
then you get nothing more than a heavy calculator inside the LaTeX
source.

I'm quite successful at making LaTeX generators. Perl does all the
calculations, then passes some source to LaTeX and points it to a F<.sty>
file (style files are something like F<.pm> files, though not exactly).
Now I'm brave enough to pass LaTeX output directly to the printer
without proofreading.
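
A minimal sketch of that division of labour, assuming made-up order data
and hypothetical helpers named latex_escape and build_latex (none of this
is from the thread): Perl does the arithmetic and the escaping, then emits
plain LaTeX source. A real generator would more likely pull in its own
F<.sty> than rely on the stock article class.

```perl
use strict;
use warnings;

# Characters LaTeX treats specially, mapped to safe replacements.
my %LATEX = (
    '\\' => '\\textbackslash{}',
    '~'  => '\\textasciitilde{}',
    '^'  => '\\textasciicircum{}',
    map { ($_ => "\\$_") } ('&', '%', '$', '#', '_', '{', '}'),
);

# Hypothetical helper: escape text in a single pass so that the braces
# introduced by one replacement are not escaped again by another.
sub latex_escape {
    my ($text) = @_;
    $text =~ s/([\\&%\$#_{}~^])/$LATEX{$1}/g;
    return $text;
}

# Hypothetical generator: each item is [ name, quantity, unit price ].
sub build_latex {
    my (@items) = @_;
    my $total = 0;
    my @rows;
    for my $item (@items) {
        my ($name, $qty, $price) = @$item;
        my $amount = $qty * $price;          # Perl does the calculation
        $total += $amount;
        push @rows, sprintf '%s & %d & %.2f \\\\',
            latex_escape($name), $qty, $amount;
    }
    my $body = join "\n", @rows;
    my $sum  = sprintf '%.2f', $total;
    return <<"END_LATEX";
\\documentclass{article}
\\begin{document}
\\section*{Order summary}
\\begin{tabular}{lrr}
\\hline
Item & Qty & Amount \\\\
\\hline
$body
\\hline
Total & & $sum \\\\
\\hline
\\end{tabular}
\\end{document}
END_LATEX
}

my @items = (
    [ 'Merino wool (50% off)', 3, 12.50 ],
    [ 'Needles #4 & #5',       2,  4.25 ],
);

# Save the output as, say, order.tex and run it through pdflatex.
print build_latex(@items);
```

The escaping step is the part that is easy to forget: any data containing
&, % or # would otherwise break the table or silently comment out the rest
of a line.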

If you are still not afraid (are you?), then (looking at your headers)
consider MiKTeX -- if my guess is right, it's something like ActivePerl
but from the LaTeX world.

=item openoffice.org

(Not tested; any feedback is appreciated.) There are at least 2
distributions for OO.org on CPAN. B<OpenOffice::OOBuilder> seems not
ready -- it declares that it works only with spreadsheets.
B<OpenOffice::OODoc> seems to be much better.

I believe that if you go with OO.org (whatever interface you choose) it
would be only slightly different from the LaTeX way. There's something
named B<Styles> inside; maybe that would allow some kind of templating;
maybe not.

=item some-trademark

And what am I supposed to say here?

--
Torvalds' goal for Linux is very simple: World Domination

Bernie Cosell

Jul 6, 2008, 7:10:46 AM
Joost Diepenmaat <jo...@zeekat.nl> wrote:

} Bernie Cosell <ber...@fantasyfarm.com> writes:
}
} > procedure is a bit klunky and fragile. Is there some way to produce
} > "pretty" PDF or ODF document directly? I see there's a suite of modules
} > PDF::API2 -- it looks pretty complicated....anyone used that? PDF::Create
} > also seems complicated.
}
} PDF::API2 is complicated because PDF is complicated. It's not *that*
} complicated, though. It depends on what your source and output data
} are like.

I realize that the PDF modules aren't *typesetting* modules but *PDF*
modules and you need to understand PDF and must use the modules to make
your doc. I've [hand]coded PostScript back in a bygone age and the PDF
calls seem pretty similar -- I guess I vaguely remember [again, from a
bygone age..:o)] that PDF was derived from EPS.

} If your data is mostly text and some illustrations & tables, you may
} want to use LaTeX instead. LaTeX at least makes it almost trivial to
} generate a nice looking pdf/postscript document from straightforward
} source data (sort of like POD, only much, much more flexible and
} extensible) and it has superb support for references, indexes,
} table-of-contents etc,

LaTeX would be a good alternative. I've used LaTeX over the years (I wrote
a thesis using it back when you had to carve the LaTeX commands onto stone
tablets) and the problem [for me] was usually twofold: first was getting it
to run. I was always impressed and dismayed at the HUGE amount of crap
that came along with LaTeX [not to mention using its private font world
that I never got a good handle on]. Second was, if I'm remembering right,
that it was hard to get useable output from it [I think it generated PS and
then I needed to install GhostScript, another behemoth, to get that
converted to something more useful].

BUT: it is now a LOT of years later, all of that has matured I'm sure,
isn't as much of a mess as it used to be, and it isn't a "Perl" question at
all -- So thanks for the reminder. I'll go and check out LaTeX distros for
windows, etc [and see if I can find my LaTeX book -- it's on my shelf here
*somewhere* :o)].

} > Am I just being naive about this? As I say, in the past I've used troff
} > and HTML as an intermediate "layout" format and things like fonts, titles,
} > indenting, tables, simple drawing (e.g., putting a box around some text)
} > etc were all relatively easy. Is that kind of thing easy/reasonable to do
} > with one of the "direct to PDF" packages? ...
}
} My experience with HTML -> PDF translators is that it's too easy to
} generate some HTML that won't get translated correctly, and you can
} usually forget about generating references to sections with page
} numbers and things like that.

Just so. I can only say two things to this [on the mark] comment: first is
that for some of what I want to do that's likely to be OK. Note that since
I'll be generating the HTML *with*Perl*, I have a lot of control over not
generating things that HTMLDOC won't translate properly (NB: I *have* used
this approach in the past and it worked quite well.) Second, it is easy
to "proof" the output (indeed, in the past when I've gone this route, I
wrote my Perl stuff as CGI pgms, rather than standalone pgms, and it was
remarkably simple to get everything debugged with just a few mouse-clicks
in the browser). But you're right: even when everything was perfect, I
needed to [and did] play around with the HTML I generated to produce
something that HTMLDOC did the "right thing" with. And the best I can say
is that the PDFs were "OK" [basically, looking like unadorned text web
pages].

So thanks for the pointer: time to go re-learn LaTeX and see how it goes...

Bernie Cosell

Jul 6, 2008, 7:10:48 AM
Eric Pozharski <why...@pozharski.name> wrote:

} Bernie Cosell <ber...@fantasyfarm.com> wrote:

} > Suggestions on how to produce "nice" text documents from Perl? Plain
} > text is easy, of course, but I need to be able to do fancier/formatted
} > stuff.

} I would slightly disagree with Joost Diepenmaat -- no interface is more
} complicated than its underlying structure. If you are familiar with the
} structure, then no braindamage in the interface will stop you. If the
} structure is unknown to you, you will fail sooner or later. I would even
} stress that the simpler the interface is, the harder and sooner you will
} fail.

Perhaps, but consider LaTeX: you can go an awful long way with it *NEVER*
having to understand the awfulness of TeX.

} > Am I just being naive about this? As I say, in the past I've used
} > troff and HTML as an intermediate "layout" format and things like
} > fonts, titles, indenting, tables, simple drawing (e.g., putting a box
} > around some text) etc were all relatively easy. Is that kind of thing
} > easy/reasonable to do with one of the "direct to PDF" packages? In
} > looking at the organization and methods for the PDF modules, using
} > HTMLDOC (<http://www.htmldoc.org/>) starts looking more and more
} > attractive...
}
} Wait a moment. You have to make clear to yourself whether you want to
} convert HTML to PDF or you want to have a nice PDF. Those are slightly
} different beasts.

Yes, but the difference is subtle: what I want is "nice PDF", but I have
*control* over the HTML. *general* HTML->PDF conversion is very difficult
and surely does produce marginal PDF, but if you are generating the HTML
from Perl *specifically* to keep HTMLDOC happy [that is, using HTML as an
intermediate language, much like troff or LaTeX, rather than as a markup
language in its own right] then I think you have a better shot at helping
an app like HTMLDOC do a decent job.
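
A rough sketch of treating HTML as a restricted intermediate language: the
generator emits only a heading, a paragraph and a plain table, on the
theory that a converter is less likely to trip on such a small subset. The
helper names (html_escape, row, report_html) and the sample data are made
up for illustration, and which tags HTMLDOC actually handles well is
something to check against its own documentation.

```perl
use strict;
use warnings;

my %ENTITY = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');

# Escape the characters that are special in HTML text.
sub html_escape {
    my ($text) = @_;
    $text =~ s/([&<>"])/$ENTITY{$1}/g;
    return $text;
}

# Build one table row; $tag is 'th' for a header row or 'td' for data.
sub row {
    my ($tag, @cells) = @_;
    my $html = '';
    $html .= "<$tag>" . html_escape($_) . "</$tag>" for @cells;
    return "<tr>$html</tr>";
}

# Hypothetical generator: a title, an intro paragraph, a header row
# (array ref) and any number of data rows (array refs).
sub report_html {
    my ($title, $intro, $header, @rows) = @_;
    my @out = (
        '<html>',
        '<head><title>' . html_escape($title) . '</title></head>',
        '<body>',
        '<h1>' . html_escape($title) . '</h1>',
        '<p>' . html_escape($intro) . '</p>',
        '<table border="1" cellpadding="4">',
        row('th', @$header),
    );
    push @out, row('td', @$_) for @rows;
    push @out, '</table>', '</body>', '</html>';
    return join("\n", @out) . "\n";
}

print report_html(
    'Fleece inventory',
    'Weights are in pounds & rounded.',
    [ 'Breed', 'Weight' ],
    [ 'Romney', '8.5' ],
    [ 'Border Leicester', '<6' ],
);
# Save the output to a file and run the converter on it; with HTMLDOC that
# is typically something like: htmldoc --webpage -f report.pdf report.html
```

Keeping all markup in two or three small subs means that when the
converter mishandles a construct, there is exactly one place to change.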

} ... However it was no different from rendered HTML.

That's the thing, for sure. Even if you tune and tweak the HTML to
accommodate the converter, you're going to be left with a document that
looks like a printed web page. Alas.. :(

} =item LaTeX
}
} Hmm... Let's be honest, LaTeX isn't my favorite tool for typesetting.
} It's my only tool for that.

Ah. Interesting point.

} ...I would disagree with Joost
} Diepenmaat again -- LaTeX isn't easy, it's fscking braindamage,

If you think LaTeX is bad, then try getting a copy of the TeXbook and see
what the underlying machinery it's using is like! :o). Basically, LaTeX
and TeX are like the distinction between setting a document with -man or
-ms macros versus raw troff.

} .... First of all,
} forget about B<PerlTeX>; ...
}
} I'm quite successful at making LaTeX generators. Perl does all the
} calculations, then passes some source to LaTeX and points it to a F<.sty>
} file (style files are something like F<.pm> files, though not exactly).

Interesting -- as I mentioned, I've _written_ docs using LaTeX but it
hadn't occurred to me to try using LaTeX as an intermediate language for
producing docs from a program. Now that I think about it, I'd guess it
would be fairly easy -- I don't remember a lot of LaTeX [gotta do a bunch
of reviewing :o)] but for *simple* docs it is mostly running text with
occasional "dot commands" to indicate headers, footers, etc.

} If you are still not afraid (are you?), then (looking at your headers)
} consider MiKTeX -- if my guess is right, it's something like ActivePerl
} but from the LaTeX world.

Ah, cool. Not sure what my headers betray [I guess other than I'd be
looking for a Windows LaTeX distro] but I'll go and poke around. As I
mentioned in the reply to Joost, I have installed TeX/itsfunnyfonts/LaTeX
on Unix systems over the years and it was an awful mess [and a quite large
pile of it], and maybe the worst part was its idiosyncratic font system
that wouldn't play with anything else's fonts [and so required lots more
screwing around to properly render it... hammering on dvi2ps..UGH! :o)].
But it has *GOT* to be easier these days [in fact, I'd bet that some of
those LaTeX distros do a fairly good job of hiding the TeX underbelly :o)]

} =item openoffice.org
}
} (Not tested; any feedback is appreciated.) There are at least 2
} distributions for OO.org on CPAN. B<OpenOffice::OOBuilder> seems not
} ready -- it declares that it works only with spreadsheets.
} B<OpenOffice::OODoc> seems to be much better.

Wow.. An interesting approach. Just checking the OODoc and OODoc::Intro
PODs has it looking like an interesting alternative to LaTeX. Time for
more reading...tnx!

/B\

greymaus

Jul 7, 2008, 9:31:27 AM

My 2 cents worth.. Consider Perl's write function, if the data _can_ be
structured that way. Simpler than going through TeX/LaTeX. Then, if you
want it to look well, pump it through `enscript'.
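
For what it's worth, a small sketch of the kind of fixed-width report this
suggests. To keep the result in a string it uses formline and $^A, the
machinery that perlform documents as sitting behind write, rather than a
named format; the swrite helper is adapted from the example in perlform,
and the data is invented.

```perl
use strict;
use warnings;

# Format the arguments according to a picture, the way write would, but
# return the result as a string (adapted from the perlform swrite example).
sub swrite {
    my ($picture, @args) = @_;
    $^A = '';                      # clear the format accumulator
    formline($picture, @args);
    return $^A;
}

my @rows = (
    [ 'Romney',           12,  8.5  ],
    [ 'Border Leicester',  7, 10.25 ],
);

# Header line: one left-justified and two right-justified text fields.
my $report = swrite(<<'PICTURE', 'Breed', 'Count', 'Weight');
@<<<<<<<<<<<<<<<<< @>>>> @>>>>>
PICTURE

for my $row (@rows) {
    # @<<< is left-justified, @>>> right-justified, @##.## numeric with
    # two decimal places.
    $report .= swrite(<<'PICTURE', @$row);
@<<<<<<<<<<<<<<<<< @>>>> @##.##
PICTURE
}

print $report;
```

The plain-text result is what would then be handed to enscript for
printing.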

--
Greymaus
.
.
...

Peter J. Holzer

Jul 7, 2008, 11:47:22 AM
On 2008-07-07 13:31, greymaus <grey...@mail.com> wrote:
> On 2008-07-06, Bernie Cosell <ber...@fantasyfarm.com> wrote:
[...]

>> So thanks for the pointer: time to go re-learn LaTeX and see how it goes...
>
> My 2 cents worth.. Consider Perl's write function, if the data _can_ be
> structured that way. Simpler than going through TeX/LaTeX. Then, if you
> want it to look well, pump it through `enscript'.
>

I suspect your definition of "look well" differs from Bernie's.

hp

Dr.Ruud

Jul 9, 2008, 7:52:16 AM
Bernie Cosell schreef:

> Suggestions on how to produce "nice" text documents from Perl? Plain
> text is easy, of course, but I need to be able to do
> fancier/formatted stuff. What I've done in the past is generate HTML
> [with <tables>, <fonts>, etc] and then use an app like HTMLDOC to
> convert that to PDF (and, indeed, a long time back I used to generate
> troff input and use ghostscript to process *that*). That kind of
> approach sort of works, but the whole procedure is a bit klunky and
> fragile. Is there some way to produce "pretty" PDF or ODF document
> directly? I see there's a suite of modules PDF::API2 -- it looks
> pretty complicated....anyone used that? PDF::Create also seems
> complicated.

Maybe you are looking for XML::ApacheFOP.

--
Affijn, Ruud

"Gewoon is een tijger."

ber...@fantasyfarm.com

Jul 14, 2008, 2:26:11 PM
On Jul 9, 7:52 am, "Dr.Ruud" <rvtol+n...@isolution.nl> wrote:
> Bernie Cosell schreef:
>
> > Suggestions on how to produce "nice" text documents from Perl?  Plain
> > text is easy, of course, but I need to be able to do
> > fancier/formatted stuff. What I've done in the past is generate HTML
> > [with <tables>, <fonts>, etc] and then use an app like HTMLDOC to
> > convert that to PDF (and, indeed, a long time back I used to generate
> > troff input and use ghostscript to process *that*). ...

> Maybe you are looking for XML::ApacheFOP.

*Great* suggestion. I've had a chance to look at this now and it
looks like a very effective way to proceed. XML::ApacheFOP actually
automates it more than I think I would need to. For those of you who
aren't familiar with this, there is an XML spec for document
formatting, called "XSL-FO". All of the [easy :o)] schemes to do the
type of document creation I had been thinking of involve an
intermediate "markup" format for your document: HTML [for HTMLDOC],
LaTeX [for [La]TeX/dvi], even troff [for -ms and n/troff]. This is a
similar approach, but using an XML spec'ed format. It looks to be
fairly fancy [some of the online examples I've seen of what you can do
with XSL-FO are pretty slick]. Also, since it is XML you can
*GENERATE* it from other sources [cf XSLT].

The neat part is that compared to LaTeX the backend is comparatively
simple: Apache.org has written an XSL-FO =to= nearly-anything
processing program written in Java [the processor is called "FOP" and
the module referenced above basically automates spawning the JRE and
feeding FOP to it for you].

The XSL-FO spec is a bit complicated, but it doesn't seem all that bad
after you've tried wrestling with the LaTeX book [and don't even
*think* about trying to plow through the TeX book..:o)], so I'm going
to give it a try. THANKS!!
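
To give a feel for the intermediate format, here is a minimal sketch that
builds a one-page XSL-FO document from Perl as a plain string. The element
and attribute names are from the XSL-FO vocabulary as I understand it; the
helper names (xml_escape, fo_document) and the sample text are invented,
and nothing here uses XML::ApacheFOP itself.

```perl
use strict;
use warnings;

my %ENTITY = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/([&<>])/$ENTITY{$1}/g;
    return $text;
}

# Hypothetical generator: a boxed bold heading followed by paragraphs,
# all on a single A4-sized page master.
sub fo_document {
    my ($title, @paragraphs) = @_;
    my $blocks = '';
    for my $para (@paragraphs) {
        $blocks .= '      <fo:block space-before="6pt">'
                 . xml_escape($para) . "</fo:block>\n";
    }
    my $heading = xml_escape($title);
    return <<"END_FO";
<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="page"
        page-width="21cm" page-height="29.7cm"
        margin-top="2cm" margin-bottom="2cm"
        margin-left="2cm" margin-right="2cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-size="18pt" font-weight="bold"
          border-style="solid" border-width="1pt"
          padding="4pt">$heading</fo:block>
$blocks    </fo:flow>
  </fo:page-sequence>
</fo:root>
END_FO
}

print fo_document(
    'Shearing report',
    'Fleece weights are up on last year.',
    'Lanolin & vegetable matter were both < 5%.',
);
# Write this to, say, report.fo and hand it to an XSL-FO processor; with
# Apache FOP's command-line wrapper that is usually something like:
#   fop -fo report.fo -pdf report.pdf
```

The box around the heading, which took a table trick in HTML, is just a
pair of border attributes on the block here.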

/Bernie\

cartercc

Jul 15, 2008, 9:29:27 AM
On Jul 5, 9:52 am, Bernie Cosell <ber...@fantasyfarm.com> wrote:
> "pretty" PDF or ODF document directly? I see there's a suite of modules
> PDF::API2 -- it looks pretty complicated....anyone used that?

I use PDF::API2 to generate hundreds of PDF files from one input file.
PDF::API2 is simple to use for text, images, and simple geometric
shapes, which meets my needs. If you email me privately, I'll send you
my script. Use c c c 3 1 8 0 7 @ y a h o o . c o m

CC
