CLLv1.1 again: html -> pdf with page numbers?


Robin Lee Powell

Nov 17, 2011, 11:57:21 PM
to lojba...@lojban.org

So I would dearly love to be able to generate a PDF version of
CLLv1.1 from the HTML version. The problem I'm finding with all of
the converters I've tried is that they don't fix internal cross
references from names into page numbers, so that the index (for
example) is full of things like:

klama: Section 1.17, Section 2.4

rather than:

klama: 4, 8

Either an html->pdf converter that handles this, or something that
converts the cross references in the resulting pdf, would be
fantastic.

-Robin

--
http://singinst.org/ : Our last, best hope for a fantastic future.
Lojban (http://www.lojban.org/): The language in which "this parrot
is dead" is "ti poi spitaki cu morsi", but "this sentence is false"
is "na nei". My personal page: http://www.digitalkingdom.org/rlp/

M. Nael

Nov 18, 2011, 2:29:43 AM
to loj...@googlegroups.com, lojba...@lojban.org
If it wasn't a lot of work, we could do it by hand.
Anyway, have you tried Word? It should make it easy to define a special format for numbered headings and use that formatting to set up cross-links between identical phrases, so that [Section.2.4] would be reachable from [See §2.4].
co'o
fe'o

Robin Lee Powell

Nov 18, 2011, 2:36:40 AM
to loj...@googlegroups.com
On Fri, Nov 18, 2011 at 09:29:43AM +0200, M. Nael wrote:
> If it wasn't a lot of work, we could do it by hand.

No, we can't; the whole point here is to automatically generate a
number of different outputs from a single source format, which
happens to be docbook.

> Anyway, have you tried Word?

I don't consider that an acceptable source format for a wide variety
of reasons.

M.Nael

Nov 18, 2011, 2:40:26 AM
to loj...@googlegroups.com
I meant using Word to make the cross-references and then generating a PDF from there.

Jonathan Jones

Nov 18, 2011, 3:27:15 AM
to loj...@googlegroups.com, lojba...@lojban.org
Obviously this doesn't matter with printouts, but are the references linked? That is, in the generated pdf, if you click on the "Section 1.17" bit, will it take you to that section?

The only thing I can think of that might fix it is to first determine what page each of these sections is on, and do a find/replace on them in the references. The problem with that is, the initial work of creating the section x = pg(s) y database is quite likely a lot of effort, even if the actual find/replace can be fully automated, and the database would need to be recreated anytime there's a change to the page structure (I mostly refer to addition or deletion of content, but even font changes apply).

If there were a way to automatically determine which page(s) a certain section will be on in the generated document, it would be possible to automate the whole thing, including the f/r database creation, but I have no idea how to go about doing that.
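As a rough illustration of the find/replace half of this idea, here is a minimal Python sketch. It assumes the section-to-page table already exists; the name SECTION_PAGES and the function rewrite_index_line are made up for this example, and the two page numbers are simply the ones from Robin's "klama: 4, 8" example. Working on a plain-text line is also an assumption: doing the same replacement inside a real PDF would be harder.

```python
import re

# Hypothetical section -> page table. In practice it would have to be
# regenerated every time the page layout changes.
SECTION_PAGES = {"1.17": 4, "2.4": 8}

# Matches references such as "Section 1.17" and captures the number.
SECTION_REF = re.compile(r"Section (\d+(?:\.\d+)*)")


def rewrite_index_line(line, section_pages):
    """Replace each 'Section x.y' in line with its page number, if known."""
    def to_page(match):
        section = match.group(1)
        # Leave unknown sections untouched rather than guess a page.
        if section not in section_pages:
            return match.group(0)
        return str(section_pages[section])

    return SECTION_REF.sub(to_page, line)


print(rewrite_index_line("klama: Section 1.17, Section 2.4", SECTION_PAGES))
# prints: klama: 4, 8
```

References to sections missing from the table are left as they are, so a stale table shows up as leftover "Section x.y" text instead of a wrong page number.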


--
You received this message because you are subscribed to the Google Groups "lojban" group.
To post to this group, send email to loj...@googlegroups.com.
To unsubscribe from this group, send email to lojban+un...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/lojban?hl=en.




--
mu'o mi'e .aionys.

.i.e'ucai ko cmima lo pilno be denpa bu .i doi.luk. mi patfu do zo'o
(Come to the Dot Side! Luke, I am your father. :D )

Robin Lee Powell

Nov 18, 2011, 3:30:46 AM
to loj...@googlegroups.com
On Fri, Nov 18, 2011 at 01:27:15AM -0700, Jonathan Jones wrote:
> Obviously this doesn't matter with printouts, but are the
> references linked? That is, in the generated pdf, if you click on
> the "Section 1.17" bit, will it take you to that section?

I believe so, yes.

> The only thing I can think of that might fix it is to first
> determine what page each of these sections is on, and do a
> find/replace on them in the references. The problem with that is,
> the initial work of creating the section x = pg(s) y database is
> quite likely a lot of effort, even if the actual find/replace can
> be fully automated, and the database would need to be recreated
> anytime there's a change to the page structure (I mostly refer to
> addition or deletion of content, but even font changes apply).

If you know enough about PDF format to do the replace, wouldn't you
also know enough about it to count pages in it and figure out which
page something is on?
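To make the "figure out which page something is on" step concrete, here is a small Python sketch. It assumes the text of each PDF page has already been extracted into a list of strings by some other tool (not shown), and that section headings start a line with a dotted number such as "2.4 Some title". Both are assumptions for illustration; the heading pattern and the name find_section_pages are hypothetical and not taken from the CLL build.

```python
import re

# Assumed heading shape: a line starting with a dotted section number,
# e.g. "2.4 Some title". Real headings may look different.
HEADING = re.compile(r"^(\d+\.\d+)[ \t]+\S", re.MULTILINE)


def find_section_pages(pages):
    """Map each section number to the first page (1-based) with its heading."""
    section_pages = {}
    for page_number, text in enumerate(pages, start=1):
        for match in HEADING.finditer(text):
            # setdefault keeps the first page on which a heading was seen.
            section_pages.setdefault(match.group(1), page_number)
    return section_pages


# Tiny made-up two-page document, just to exercise the function.
pages = [
    "Chapter 1\n1.1 Intro\nsome text",
    "more text\n1.2 Next bit\ntext mentioning Section 1.1 again",
]
print(find_section_pages(pages))
# prints: {'1.1': 1, '1.2': 2}
```

Only headings at the start of a line count, so an in-text mention like "Section 1.1" on a later page does not overwrite the page where the section actually begins.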

Jonathan Jones

Nov 18, 2011, 5:01:29 AM
to loj...@googlegroups.com
On Fri, Nov 18, 2011 at 1:30 AM, Robin Lee Powell <rlpo...@digitalkingdom.org> wrote:
> If you know enough about PDF format to do the replace, wouldn't you
> also know enough about it to count pages in it and figure out which
> page something is on?

Maybe. I don't know enough about PDF format to answer that. You'd have to ask someone who does.
 

purpleposeidon

Nov 19, 2011, 5:02:41 AM
to loj...@googlegroups.com
http://www.alistapart.com/articles/boom

If I understand correctly, everything there is all vanilla CSS2/3, and
the company they mention just charges an exorbitant amount of money to
mash ^P for you.

mu'omi'e.djeims.

rden...@gmail.com

Nov 19, 2011, 5:24:15 AM
to loj...@googlegroups.com, lojba...@lojban.org
Depending on the HTML, it might be as simple as loading it into LibreOffice and saving it as a PDF. It should also be possible to automate the process with a script. One would probably have to set up a template first, to ensure the PDF looks nice.

That's how I did it for the CLL 1.0 (well, almost, as I tweaked the page breaks here and there).

If you would send me the HTML I can give it a try.

remod


purpleposeidon

Nov 19, 2011, 6:10:07 AM
to loj...@googlegroups.com
On Sat, Nov 19, 2011 at 2:02 AM, purpleposeidon
<purplep...@gmail.com> wrote:
> http://www.alistapart.com/articles/boom

I was unable to get this to work.

mu'omi'e.djeims.

Robin Lee Powell

Nov 19, 2011, 8:03:43 AM
to loj...@googlegroups.com
http://www.princexml.com/purchase/

Our use is commercial. I'd rather not spend $3,800 on this.

-Robin

M. Nael

Nov 19, 2011, 8:07:45 AM
to loj...@googlegroups.com
Alright, you need to understand that I'm sacrificing here...
If it's not urgent, I can do it by hand; it should take about two weeks... I only need to settle on one format so that I don't need to do updates after the links are done...
That said, I really don't want to work on it and then discover it could have been done in a minute by a program... So I'll wait for word.
Consider this a payback to the community ;)
co'o
fe'o

Robin Lee Powell

Nov 19, 2011, 9:26:12 AM
to loj...@googlegroups.com
Ummm.

Loading it in LibreOffice and saving as PDF would get you *page
numbered internal references*? *Really*?

If so, I'm thoroughly impressed.

The full (in progress) html file is at
http://vrici.lojban.org/~rlpowell/media/public/tmp/pdf-test-input.html
, but you can use
http://vrici.lojban.org/~rlpowell/media/public/tmp/pdf-test-input-short.html
as an easier test case.

-Robin


Robin Lee Powell

Nov 19, 2011, 9:27:32 AM
to loj...@googlegroups.com
On Sat, Nov 19, 2011 at 03:07:45PM +0200, M. Nael wrote:
> Alright, you need to understand that I'm sacrificing here...
>
> If it's not urgent, I can do it by hand; it should take about two
> weeks...

You don't seem to understand the goal here at all. Any solution
that involves by-hand work to convert the source format to all of
the destination formats is completely out of the question. The goal
is a completely automated system.

Wayne E. Seguin

Nov 18, 2011, 9:39:53 AM
to loj...@googlegroups.com
How / where do I help?

Robin Lee Powell

Nov 19, 2011, 9:48:36 AM
to loj...@googlegroups.com
So I'm sort of trying two different paths here to get the results I
want: docbook -> html -> pdf and docbook -> latex -> pdf.

If you want to help with the former, I think I outlined the problem
decently below; let me know if not and I'll try to explain further.

Where I think *you* could best help, though, is the problem
described at
http://groups.google.com/group/lojban/browse_frm/thread/2bc7fb6f3c8830fc
, which is a TeX problem (specifically, with rendering IPA
characters).

-Robin

Remo Dentato

Nov 19, 2011, 10:56:14 AM
to loj...@googlegroups.com
On Sat, Nov 19, 2011 at 3:26 PM, Robin Lee Powell
<rlpo...@digitalkingdom.org> wrote:
> Ummm.
>
> Loading it in LibreOffice and saving as PDF would get you *page
> numbered internal references*?  *Really*?

Just tried, and the HTML is indeed too complex. Will you share the
docbook files too? I never tried importing docbook files into
LibreOffice, so I'm not sure what comes out.

remod

Robin Lee Powell

Nov 19, 2011, 11:19:16 AM
to loj...@googlegroups.com
On Sat, Nov 19, 2011 at 04:56:14PM +0100, Remo Dentato wrote:
> On Sat, Nov 19, 2011 at 3:26 PM, Robin Lee Powell
> <rlpo...@digitalkingdom.org> wrote:
> > Ummm.
> >
> > Loading it in LibreOffice and saving as PDF would get you *page
> > numbered internal references*? *Really*?
>
> Just tried, and the HTML is indeed too complex.

Even the short one?

> Will you share the docbook files too? I never tried importing
> docbook files into Libreoffice, not sure what comes out.

http://vrici.lojban.org/~rlpowell/media/public/tmp/cll_pdf_test.xml
is a slightly pared down version, but should do for this test.

I would be very surprised if it did anything useful.

-Robin

Remo Dentato

Nov 19, 2011, 12:21:24 PM
to loj...@googlegroups.com
On Sat, Nov 19, 2011 at 5:19 PM, Robin Lee Powell
<rlpo...@digitalkingdom.org> wrote:
> On Sat, Nov 19, 2011 at 04:56:14PM +0100, Remo Dentato wrote:
>> On Sat, Nov 19, 2011 at 3:26 PM, Robin Lee Powell
>> <rlpo...@digitalkingdom.org> wrote:
>> > Ummm.
>> >
>> > Loading it in LibreOffice and saving as PDF would get you *page
>> > numbered internal references*?  *Really*?
>>
>> Just tried, and the HTML is indeed too complex.
>
> Even the short one?
>
>> Will you share the docbook files too? I never tried importing
>> docbook files into Libreoffice, not sure what comes out.
>
> http://vrici.lojban.org/~rlpowell/media/public/tmp/cll_pdf_test.xml
> is a slightly pared down version, but should do for this test.
>
> I would be very surprised if it did anything useful.
>

I was hoping to be able to load it into LibreOffice and make the changes
with a script or by defining the proper style in a template.

It turned out not to be so simple, as you had already guessed.

You want to fix only the indices, right? Not augment every
cross-reference with the page number.

If you have the PDF "almost right", what about generating a .ps file,
writing a script to replace each "Section x.y" with the proper page
number, and then transforming the .ps into a .pdf file?

Robin Lee Powell

Nov 19, 2011, 7:25:53 PM
to loj...@googlegroups.com

Do you have something that will generate .ps from HTML?

This is all somewhat moot anyways, as the docbook -> latex -> pdf
process is actually going reasonably well; I just figured that maybe
someone out there knew a really good way to do html -> pdf that I
wasn't aware of, so I didn't have to solve a bunch of presentation
problems twice (once in XSLT and CSS, and once in XSLT and LaTeX).

Robin Lee Powell

Nov 20, 2011, 5:43:34 AM
to loj...@googlegroups.com
On Sat, Nov 19, 2011 at 04:25:53PM -0800, Robin Lee Powell wrote:
>
> This is all somewhat moot anyways, as the docbook -> latex -> pdf
> process is actually going reasonably well; I just figured that
> maybe someone out there knew a really good way to do html -> pdf
> that I wasn't aware of, so I didn't have to solve a bunch of
> presentation problems twice (once in XSLT and CSS, and once in
> XSLT and LaTeX).

Confirmed; the LaTeX route is working well now.

If someone has a beautiful, magical way to make the HTML route work,
I'll take it, but it's not worth sinking real time into anymore.
