strange font behavior


Philippe

May 18, 2012, 11:35:43 AM
to dom...@googlegroups.com
Hi,

I have a problem with all fonts except the ones provided as .afm with domPDF.
I have to parse the uncompressed PDF output to change some values (such as page numbers, the table of contents, etc.). When using the provided .afm fonts, everything works. But when I use .ttf fonts, the resulting text output is strange: every character is separated by an uncommon symbol. In Notepad++, it's a black square with 2 numbers in it.
So I can't parse the PDF correctly. Note that it only affects the text of the PDF, not the PDF metadata.

As said previously, it's my workaround for per-page header/footer and table of contents.


Is this normal? Is there any way to solve it?
You can reproduce the effect with something like:

// Request uncompressed output so the text in the PDF can be parsed.
$pdfData = $this->pdf->output(array('compress' => 0));
/* Here is the parsing */
// Quote the filename so names containing spaces are sent intact.
header('Content-Disposition: attachment; filename="' . $filename . '.pdf"');
header('Content-Type: application/pdf');
header('Content-Length: ' . strlen($pdfData));
header('Cache-Control: private, max-age=0, must-revalidate');
header('Pragma: public');
echo $pdfData;

BrianS

May 20, 2012, 9:43:14 PM
to dom...@googlegroups.com
On Friday, May 18, 2012 11:35:43 AM UTC-4, Philippe wrote:
I have a problem with all fonts except the ones provided as .afm with domPDF.
I have to parse the uncompressed PDF output to change some values (such as page numbers, the table of contents, etc.). When using the provided .afm fonts, everything works. But when I use .ttf fonts, the resulting text output is strange: every character is separated by an uncommon symbol. In Notepad++, it's a black square with 2 numbers in it.
So I can't parse the PDF correctly. Note that it only affects the text of the PDF, not the PDF metadata.

This is normal. Text that uses fonts that have only .AFM files is encoded as Windows ANSI. Text that uses fonts that have a .UFM file is encoded as Unicode. You're seeing "an uncommon symbol" because the text characters are encoded with two bytes instead of one. If your editor doesn't understand this, it will show each byte separately. For single-byte characters (e.g. A-Z) you will see a low-bit marker character followed by the character itself.
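
To make the two-byte effect concrete, here is a minimal sketch. It assumes the two-byte form is big-endian (a zero byte followed by the character), which matches the symptom described but is not stated in this thread; `asciiToTwoByte` is a hypothetical helper, not part of dompdf.

```php
<?php
// Hypothetical helper (not a dompdf function): encode an ASCII-only string
// with two bytes per character, assuming a zero high byte comes first.
function asciiToTwoByte(string $ascii): string
{
    if ($ascii === '') {
        return '';
    }
    $out = '';
    foreach (str_split($ascii) as $char) {
        $out .= "\x00" . $char; // zero byte, then the character itself
    }
    return $out;
}

$oneByte = 'Page';
$twoByte = asciiToTwoByte($oneByte);

echo bin2hex($oneByte), "\n"; // 50616765
echo bin2hex($twoByte), "\n"; // 0050006100670065

// A plain byte search for the one-byte form does not match the two-byte text...
var_dump(strpos($twoByte, $oneByte)); // bool(false)

// ...but a search for the needle encoded the same way does.
var_dump(strpos($twoByte, asciiToTwoByte('Page'))); // int(0)
```

If that assumption holds for your output, a parser would have to encode its search strings the same way before looking for them in text drawn with a Unicode font.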

You're probably always going to run into this issue because a PDF is a mixed-encoding document. The control structures are (essentially) ANSI-encoded, while text content can be encoded in a few different ways. Unless your editor allows you to specify the encoding for specific regions of text (which is highly unlikely), it won't know how to display the differently encoded data.

Also, text-editing a document that contains a mix of text and binary data is risky: the binary data may become corrupted.
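
If the values are changed from PHP rather than in an editor, a cautious approach is to keep every replacement exactly as long as the text it replaces, since a PDF records byte offsets and stream lengths that a length change may invalidate. The sketch below is only an illustration: `replaceSameLength` and the `{{PAGE_001}}` marker are hypothetical names, and it assumes the marker sits in one-byte (ANSI) text.

```php
<?php
// Hypothetical helper: swap a fixed-width marker for a value without
// changing the total byte length of the data.
function replaceSameLength(string $pdfData, string $placeholder, string $value): string
{
    if (strlen($value) > strlen($placeholder)) {
        throw new InvalidArgumentException('Replacement is longer than the placeholder.');
    }
    // Left-pad with spaces so the replacement is exactly as long as the marker.
    $padded = str_pad($value, strlen($placeholder), ' ', STR_PAD_LEFT);

    // str_replace works on raw bytes, so binary data elsewhere is left untouched.
    return str_replace($placeholder, $padded, $pdfData);
}

$before = 'abc {{PAGE_001}} def';
$after  = replaceSameLength($before, '{{PAGE_001}}', '7');

var_dump(strlen($before) === strlen($after)); // bool(true)
```

The padding shows up as extra spaces in the rendered text, so the marker should be no wider than the longest value it may hold.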

 
As said previously, it's my workaround for per-page header/footer and table of contents.

I'm not sure why you need to work around the current limitations in this way. If you provide details on what you're producing maybe we can come up with another solution.
 

Philippe

May 22, 2012, 9:14:03 AM
to dom...@googlegroups.com
Thanks for clarifying things. :)

On Sunday, May 20, 2012 9:43:14 PM UTC-4, BrianS wrote:
I'm not sure why you need to work around the current limitations in this way. If you provide details on what you're producing maybe we can come up with another solution.

As input I have some articles (in HTML), and I need to add some elements based on them.

- Front page: title, subtitle, no page number, date in footer, optional. I can create it and add it to the front of the HTML, inserting a page break after it. The problems with this page are removing the page number and putting the date in the footer.
- Table of contents: a table with a <caption>, no page number, optional. The problems are the page number, as above, and filling in the page number of each entry in the table. Each entry is linked to its target with <a ...>. Page break after the table.
- A variable number of articles: the first page of the first article is numbered 1. Each page of an article has a short title in the header (the title of the article). Each article starts on its own page but can span many pages.

As I said in the first post, I currently use temporary strings to get the page numbers and set them (page titles included).

Do you have any solution? I have about a week and a half to do this.

Philippe

May 22, 2012, 10:12:17 AM
to dom...@googlegroups.com
I also need a 100% PHP solution. No external binaries.

In an older version, I made two PDFs and then merged them, but the internal links couldn't be made (and page titles were a problem).

Philippe

May 25, 2012, 2:58:41 PM
to dom...@googlegroups.com
Okay, so now I'm using the small piece of code that someone provided in Issue 225. I modified it to return the first and last page of the render (by calling $this->_pdf->get_page_number() twice).
It allows per-segment headers/footers using CSS with fixed positioning. I can now use CSS "counter(page)" and "counter-increment: page XX" to get the current page without modifying the PDF manually.
It also allows setting a per-segment title.
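
For readers trying the same approach, here is a rough sketch of what the per-segment HTML might look like. It only builds a string; the function name, class names, and margin values are made up for illustration, and the Issue 225 code that renders each segment is not shown.

```php
<?php
// Hypothetical helper: wrap one segment (e.g. one article) in HTML whose
// header and footer are fixed-position blocks, so they repeat on each page.
function buildSegmentHtml(string $title, string $bodyHtml): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return <<<HTML
<html>
<head>
<style>
  @page { margin: 80px 50px; }
  .header { position: fixed; top: -50px; left: 0; right: 0; }
  .footer { position: fixed; bottom: -50px; left: 0; right: 0; text-align: center; }
  /* counter(page) is the CSS page counter mentioned in the post above */
  .footer .page-number:after { content: counter(page); }
</style>
</head>
<body>
  <div class="header">{$safeTitle}</div>
  <div class="footer"><span class="page-number"></span></div>
  {$bodyHtml}
</body>
</html>
HTML;
}

echo buildSegmentHtml('First article', '<p>Article text goes here.</p>');
```

The fixed blocks are placed before the article body so they are already known when the first page of the segment is laid out.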

But now I need a way to stop using dummy strings for my table of contents. The table can be more than 1 page long.
I see 3 solutions:
1) My current one, which modifies the PDF after rendering.
2) I render the table once without page numbers, then re-render it after everything else has rendered (so I know the page numbers) and overwrite the old one. (I checked the cpdf class and Canvas, and I have no idea how to do that.)
3) I render the table last, then insert it on the first or second page (depending on the user's input).

Links need to be kept.
Do you think either of the last two solutions can work?

I'm using version 0.6 beta 3.

Philippe

May 25, 2012, 4:03:29 PM
to dom...@googlegroups.com
If I allocate the correct number of pages (by using page-break-before and leaving the pages blank), can I render the table of contents on those pages after doing everything else?

To find the number of pages required, I could just render the table of contents to a separate PDF and then discard it. The table has a caption and 2 columns, nothing long to render.
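
The reservation step could be sketched roughly as follows. `reservePagesHtml` is a hypothetical helper, and the page count passed to it is assumed to come from the separate throwaway render of the table; this variant uses page-break-after on a near-empty block, which is one possible way to leave a page blank.

```php
<?php
// Hypothetical helper: emit $pageCount near-empty pages to be filled in later.
function reservePagesHtml(int $pageCount): string
{
    if ($pageCount < 1) {
        return '';
    }
    // One block per reserved page; each block forces a break after itself.
    $blankPage = '<div style="page-break-after: always;">&nbsp;</div>';

    return str_repeat($blankPage, $pageCount);
}

// Assumption: the throwaway render reported that the table needs 2 pages.
$tocPageCount = 2;
$html = reservePagesHtml($tocPageCount) . '<h1>First article</h1>';

echo $html;
```

Whether the table can then be drawn onto those reserved pages is the open question in this post; the sketch covers only the reservation.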

BrianS

Jun 10, 2015, 6:45:04 PM
to dom...@googlegroups.com
So I don't know if you found a solution yet, but I was playing around with this a bit. I've come up with something of a solution, though it does have its limitations.

You can find the HTML and rendered output on the debugger:
http://eclecticgeek.com/dompdf/debug.php?identifier=toc-v0.6.x

The main limitation right now is that your TOC can't be more than one page (at least, not without a lot more work on your part).

Take a look and let me know if you have any questions or need me to expand on it in any way.
-b

Richard Telford

Jun 1, 2017, 2:35:26 PM
to dompdf
This worked really well for me - thanks. As you mentioned, it will only work so long as my TOC fits on a single page, which it does thankfully. Do you know if a more robust solution is planned for the future? Or is it a case of waiting for browser CSS specs to catch up or similar?

Cheers,

Rich

BrianS

Jul 7, 2017, 10:33:23 AM
to dompdf
The CSS specs do provide some features we could use to better support this kind of functionality. I'm not sure when we'll be able to get to it, but it's definitely something we're thinking about.