About the word "breaking within" issue

Jeff_there

Jun 30, 2009, 11:47:29 AM
to dompdf
Hi guys - Sorry to open a new thread on this, but we really need to
understand what's going on (and find a workaround), and we are pretty
sure all this is linked to a new Apache/PHP distribution. Our story so
far:
- As described in other posts, but not under the same circumstances,
our dompdf-generated PDFs have some words cut in two: one letter or
more at the end of a line, the rest on a new line.
- It occurs in table cells and also in simple DIVs (any block element,
for that matter).
- It occurs not only with French strings/files, but also in English
ones, so special characters don't seem to trigger this in particular.
- We implemented dompdf back in 2006-2007 in one site. No such issue.
- We implemented it again in late 2008, for another site. And again,
no such issue.
- But we just discovered it (yesterday), only some time after major
Apache/PHP changes on the hosting service provider's side. Neither
dompdf nor the dompdf-related files we had developed to create the
content had been changed. Nothing... (except for the content in the
related databases, but we tested that and there is no issue there). It
tends to confirm what some people here said: it works on some Apache
boxes, it breaks on others.

The only workaround we could try would be using wordwrap(), stripping
the HTML code first, then reinserting it afterwards. Now, this is sort
of tricky because of the structure of this content: the 'hidden
breaking rule' differs based on layout, etc., within one single file.

Unless I missed something, nobody has come up with a definitive
working fix yet. We'd really need one, and this is a pity because I
really think dompdf is a very fine tool, well worth developing
further. Btw, I was really glad to see it restarted again!

So... Any help would be *greatly* appreciated!
Cheers :)

BCage

Jul 7, 2009, 7:00:40 AM
to dompdf
Well, the only workaround I have found is to surround every single
word with a <span> element. This is not a fix, but it works for me.
The function below will wrap your HTML string with span tags and
return the new HTML string. You should only pass the body, it will not
work with doctype and the head tags.

function WrapSpans($htmlString) {
    // Split on all whitespace characters
    $htmlWords = preg_split('/[[:space:]]/', $htmlString);
    $newHtmlString = "";
    $insideHtmlTag = false;
    // Loop through all the words
    foreach ($htmlWords as $word) {
        if ($word != ' ' && $word != '') {
            // Both < and > are in the word
            if (strpos($word, "<") !== false && strpos($word, ">") !== false) {
                // Word looks like: <tag>word</tag><otherTag
                // so we are inside a tag
                if (strrpos($word, "<") > strrpos($word, ">")) {
                    $insideHtmlTag = true;
                }
                // Word is a normal tag: <tag>
                else {
                    $insideHtmlTag = false;
                }
                $newHtmlString .= $word.' ';
                continue;
            }
            // Word contains > character, but not <
            if (strpos($word, ">") !== false) {
                $insideHtmlTag = false;
                $newHtmlString .= $word.' ';
                continue;
            }
            // Word contains < character, but not >
            if (strpos($word, "<") !== false) {
                $insideHtmlTag = true;
                $newHtmlString .= $word.' ';
                continue;
            }
            // Word does not contain < or >
            if ($insideHtmlTag == false) {
                // Plain text word: wrap it (and its trailing space) in a span
                $newHtmlString .= "<span>$word "."</span>";
            } else {
                // Still inside a tag (e.g. an attribute): leave it alone
                $newHtmlString .= $word.' ';
            }
        } else {
            $newHtmlString .= $word.' ';
        }
    }

    return $newHtmlString;
}

pnomolos

Jul 7, 2009, 3:55:15 AM
to dompdf
Jeff,

Try changing line 153 of text_frame_reflower.cls.php to the following
(or the line where you see $offset being calculated):

$offset = mb_strlen(utf8_encode($str)) - ( substr_count( $str, ' - ' ) * 3 ) - substr_count( $str, '-');

That's what I used to get mine working correctly. I'd imagine if you
have strings with "<word> -<word>" or "<word>- <word>" you'd want to
add a couple more subtractions on to the end of that line. Hopefully
this will help you out!

Cheers,

Phil

Jeff

Jul 9, 2009, 11:00:41 AM
to dompdf
Phil,

I tried your change. It no longer breaks the word 'layout' (in the
file I uploaded a while ago for others to test), but it still does not
move it to the next line, as it should.

(This is a different Jeff :-)

Philip Schalm

Jul 9, 2009, 11:13:24 AM
to dom...@googlegroups.com
Jeff,

I didn't test it with your file, as this was a change I used to try and get my own files working.  When I get a chance I'll take a look at yours and see if I can come up with something.  Don't know if I'll have time before I leave on vay-cay, though :)

Cheers,

Phil

Jeff_there

Jul 10, 2009, 8:22:54 AM
to dompdf
Thanks guys for all your suggestions!
Pnomolos's fix didn't really work for me (maybe I didn't implement it
correctly).
On the other hand, I tried BCage's workaround and it actually does the
trick very well. BCage, many thanks for this and your function! So
far, I have only tested it by inserting the <span> tags manually in
one paragraph, not using your function because of project specifics,
but I'll certainly use it with minor adaptations. Well done... I'm
just having an issue with &nbsp; entities that get dropped when
enclosed within a span, but I'll work around this and, anyway, this
is not such a big problem.
Btw, I'm also using the fix described here:
http://luca.priorelli.com/lang/en-us/2009/05/19/dompdf-justification-extended-ascii-chars/.
It doesn't fix the breaking issue, but it allows special characters to
display correctly. So, used in combination with the span trick,
everything looks fine...
Thanks again!

BCage

Jul 16, 2009, 9:17:53 AM
to dompdf
No problem. I also use the fix you linked to for the special
characters. I'm sure my function could be rewritten as a regular
expression, which would probably make it a whole lot more efficient,
but I'm not proficient enough with those to do it that way. I replaced
&nbsp; entities with another span: <span style="width: 1em;"> </span>,
which seems to work.
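For what it's worth, a regex-based variant of the span-wrapping idea could look something like the sketch below. The name wrapWordsInSpans is made up for illustration and is not part of dompdf. The sketch assumes UTF-8 input and markup in which any literal < or > in the text is already escaped as an entity. Like WrapSpans, it keeps a trailing space inside each span.

```php
<?php
// Hypothetical helper (not part of dompdf): wraps every word that sits
// outside of a tag in its own <span>, leaving the tags untouched.
function wrapWordsInSpans(string $html): string
{
    // Split into tags and text runs. The capture group keeps the tags
    // in the result; empty pieces between adjacent tags are dropped.
    $parts = preg_split(
        '/(<[^>]*>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $out = '';
    foreach ($parts as $part) {
        if ($part[0] === '<') {
            // A tag (attributes included): copy it through as-is.
            $out .= $part;
        } else {
            // A text run: wrap each word, replacing the whitespace that
            // follows it with a single space inside the span.
            $out .= preg_replace('/(\S+)\s*/u', '<span>$1 </span>', $part);
        }
    }
    return $out;
}

echo wrapWordsInSpans('<p class="a b">Hello big <b>world</b></p>'), "\n";
```

Splitting on whole tags first means attribute values that contain spaces never get wrapped, which is the case the $insideHtmlTag flag handles in the loop version.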

Regards,
Bas

Marcelo Lopes

Jul 16, 2009, 10:43:37 AM
to dompdf
Hi, I'm new here and, after having tried some fixes by myself, I
implemented your suggestions and now everything is working fine.
I'm from Brazil, so, the special characters are simply essential to
us. Thanks again, guys.

Regards,
Marcelo.

AndreasS

Jul 24, 2009, 7:27:59 AM
to dompdf
Hi all,

I think I got it!

The problem is caused by calling mb_xxx functions such as mb_strlen
without specifying a character encoding, while the DOM works with a
fixed UTF-8 character set. If the encoding is not given, the mb_xxx
functions use the default internal encoding, which might differ from
one installation to another.
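To see why the internal encoding matters, here is a small sketch, assuming the mbstring extension is loaded. The sample string is only an illustration and does not come from dompdf. The same bytes yield a different character count depending on the encoding mb_strlen assumes, so any offset derived from that count shifts as well.

```php
<?php
// "déjà vu" written as explicit UTF-8 bytes, so the example does not
// depend on how this file itself is saved.
$str = "d\xC3\xA9j\xC3\xA0 vu";

$asUtf8  = mb_strlen($str, 'UTF-8');      // counts characters
$asLatin = mb_strlen($str, 'ISO-8859-1'); // treats every byte as a character

echo "UTF-8: $asUtf8, ISO-8859-1: $asLatin\n"; // prints "UTF-8: 7, ISO-8859-1: 9"
```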

So, the workaround for those who have access to the php.ini is setting
mbstring.internal_encoding to UTF-8 :

[mbstring]

; internal/script encoding.
; Some encoding cannot work as internal encoding.
; (e.g. SJIS, BIG5, ISO-2022-*)
mbstring.internal_encoding = UTF-8

The "nice" solution for DOMPdf would be to add a fixed UTF-8 encoding
to all mb_xxx function calls as it should be independent from external
settings.
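As a rough illustration of that idea (not dompdf code, and assuming mbstring is available), passing the encoding explicitly makes the result independent of whatever the server is configured with:

```php
<?php
$str = "d\xC3\xA9j\xC3\xA0 vu"; // "déjà vu" as UTF-8 bytes

// Simulate a server whose default internal encoding is not UTF-8.
mb_internal_encoding('ISO-8859-1');

$implicit  = mb_strlen($str);                 // follows the server setting
$explicit  = mb_strlen($str, 'UTF-8');        // always counts UTF-8 characters
$firstWord = mb_substr($str, 0, 4, 'UTF-8');  // first four characters

echo "implicit: $implicit, explicit: $explicit\n";
```

Only the calls that name the encoding stay stable here; the implicit call changes with the environment.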

I would be interested to know whether this helps on other
installations, as I haven't done extensive regression tests.

Regards,
Andreas Sassermann

BrianS

Jul 28, 2009, 10:58:11 PM
to dompdf
On Jul 24, 7:27 am, AndreasS <andreas.sasserm...@gmail.com> wrote:
> The problem is caused by having an undefined character encoding using
> mb_xxx functions like mb_strlen but a fixed UTF-8 working character set
> for the DOM. If the encoding is not given, the mb_xxx functions use the
> default internal encoding, which might differ from one installation to the
> other.

You are correct, Andreas. I have found in my own testing that setting
your internal encoding can help, but it's not a universal fix for
character encoding and character set problems. I've held off on making
any suggestions while I search for a fix that can be implemented in
the next release. Still, it is worth a shot and you are right to bring
it to everyone's attention.

> So, the workaround for those who have access to the php.ini is setting
> mbstring.internal_encoding to UTF-8 :
>
> [mbstring]
>
> ; internal/script encoding.
> ; Some encoding cannot work as internal encoding.
> ; (e.g. SJIS, BIG5, ISO-2022-*)
> mbstring.internal_encoding = UTF-8

You can also use mb_internal_encoding()
<http://www.php.net/manual/en/function.mb-internal-encoding.php> to set
the encoding used by PHP for the currently running script. You can call
this function anywhere in dompdf_config.inc.php.
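A minimal sketch of that runtime approach, assuming mbstring is loaded (where exactly the call goes in dompdf_config.inc.php is up to you; the sample string is just for illustration):

```php
<?php
// Set the encoding used by mb_* functions for the current script only,
// without touching php.ini.
mb_internal_encoding('UTF-8');

$str = "d\xC3\xA9j\xC3\xA0 vu"; // "déjà vu" as UTF-8 bytes
$len = mb_strlen($str);          // no encoding argument: uses the setting above

echo mb_internal_encoding(), " / ", $len, "\n"; // prints "UTF-8 / 7"
```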

> The "nice" solution for DOMPdf would be to add a fixed UTF-8 encoding
> to all mb_xxx function calls as it should be independent from external
> settings.

This idea is actually along the lines of the solution I've been
considering. Setting the internal encoding won't be enough to solve
all the problems in this realm. We also have to consider things such
as the encoding of the source document and the resulting PDF document.
We may be able to solve the internal handling of text fairly quickly,
but I suspect a full solution that includes the format of the PDF will
take a bit longer.
-b

Dixon MD

Oct 18, 2012, 6:49:55 AM
to dom...@googlegroups.com
I solved the issue by changing this line in text_frame_reflower.cls.php:
$offset = mb_strlen($str);
to:
$offset = mb_strlen($str, 'utf8');