UTF8 Characters


Symetric

Feb 29, 2012, 4:58:00 PM
to dompdf
I'm trying to get accented characters (like é) working correctly in a
PDF file generated by dompdf. At first I thought the problem was
caused by dompdf not properly handling the special characters, but I
save the HTML that I've created for dompdf into a file before
loading it into my dompdf instance and rendering it, and the characters
are already garbled in that file.

Has anyone run into this issue who can give me insight into how to
fix it? The characters seem to be working correctly everywhere else.
For example, some of the text that ends up in the PDF is stored in a MySQL
database and can be manipulated via HTML forms and PHP processing. In
all of these locations, the characters are working correctly. The
problem appears only when I collect everything together into an HTML
string, which I store a copy of and also send to dompdf to generate a
PDF.

You can see a copy of my HTML string at
http://devserver.symetrichosting.com/dompdf/Audit1330552210.html.
This is exactly the string that is sent to dompdf using load_html().

I am setting the mysql connection encoding to utf8 using
mysql_set_charset( 'utf8', $DB->connection_id ). I am also ensuring
that PHP is using utf-8 by calling mb_internal_encoding('utf-8').

Thanks,

Gabriel Harrison

Mar 1, 2012, 8:30:58 AM
to dom...@googlegroups.com
Hi,

I experienced the pain of getting UTF-8 output working on PHP a year or so ago - not related to dompdf. You may have already done everything I had to do but just in case... 

Are you setting the mysql charset to UTF-8 for all elements of the connection? I had to execute the query: 

SET character_set_results = 'UTF8', character_set_client = 'UTF8', character_set_connection = 'UTF8'

When you view the resulting HTML file, the browser tries to guess the encoding if none is specified, so IE could be unhelpful in your debugging efforts if it is simply guessing the encoding.

Are you setting the input type as UTF-8 in dompdf? You may need to use utf8_decode if dompdf is expecting the default charset.

Gabriel


BrianS

Mar 1, 2012, 12:51:40 PM
to dom...@googlegroups.com

Ensuring proper character encoding in and between environments (MySQL, PHP, HTML, PDF) can be tricky, especially if the characters were stored incorrectly at any point. So, for the various environments:
  • MySQL: the database proved to be a particularly thorny issue for me during a database migration where incorrect encodings were used, so this may be the first place to check. Ensure that your database/table collation is set to UTF8. Also ensure that the connection character set is configured correctly (as indicated by Gabriel, or by using SET NAMES 'UTF8'). See the MySQL manual regarding character sets for more info. You might want to use a database management app (phpMyAdmin or MySQL Workbench) to confirm that everything is stored correctly in the database.
  • PHP: ensure that each script is using UTF8 and that (for dompdf) MBString is enabled.
  • HTML: make sure that each page specifies UTF8 as the encoding. If your editing forms don't, the browser could default to latin1. You wouldn't necessarily notice this until you moved the text to a script that was using a different encoding.
  • PDF: you're running into problems before you even get here. However, you can avoid issues with dompdf by setting DOMPDF_ENABLE_UNICODE to true and using a font that supports your characters (such as one of the DejaVu fonts).
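
One way to narrow down where the characters break is to look at the raw bytes at each stage (after the database read, after PHP processing, in the saved HTML file). The following is a minimal sketch and not part of dompdf; the function name classify_encoding() and its labels are made up for illustration, and the double-encoding check is only a heuristic for UTF-8 text that was read as latin1 and then encoded again.

```php
<?php
// Hypothetical helper: roughly classify how a string is encoded.
function classify_encoding(string $s): string
{
    // Only 7-bit bytes: identical in ASCII, latin1 and UTF-8.
    if (!preg_match('/[\x80-\xFF]/', $s)) {
        return 'ascii';
    }
    // The /u modifier makes PCRE reject subjects that are not valid UTF-8.
    // (mb_check_encoding($s, 'UTF-8') is an alternative when MBString is loaded.)
    if (preg_match('//u', $s) !== 1) {
        return 'invalid-utf8';
    }
    // Heuristic: UTF-8 bytes that were read as latin1 and encoded again
    // show up as C3 82/83 followed by C2 xx (displayed as "Ã©" and similar).
    if (preg_match('/\xC3[\x82\x83]\xC2[\x80-\xBF]/', $s)) {
        return 'double-encoded?';
    }
    return 'utf8';
}

// "é" in several forms: plain ASCII text, UTF-8, latin1, double-encoded UTF-8.
foreach (["abc", "\xC3\xA9", "\xE9", "\xC3\x83\xC2\xA9"] as $sample) {
    echo bin2hex($sample), ' => ', classify_encoding($sample), "\n";
}
```

Running the same check on a value straight from the database and again on the final HTML string should show at which step the bytes change.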

Good luck tracking down the source of the issue.

Symetric

Mar 2, 2012, 5:23:24 PM
to dompdf
Thank you both for your replies.

I actually had another UTF-8 issue come up today with another project
I'm working on. The cause there was a call to htmlentities() which
doesn't support multibyte encodings (like UTF-8). I reviewed the code
that generates the HTML for dompdf and sure enough, there was a call
to htmlentities(). After removing it, everything worked great.

Now that I've removed it, is there any chance that I'll run into
issues if any specific characters are included that should've been
translated using htmlentities?

Thanks,

BrianS

Mar 2, 2012, 10:30:06 PM
to dom...@googlegroups.com
On Friday, March 2, 2012 5:23:24 PM UTC-5, Symetric wrote:
> I actually had another UTF-8 issue come up today with another project
> I'm working on. The cause there was a call to htmlentities() which
> doesn't support multibyte encodings (like UTF-8). I reviewed the code
> that generates the HTML for dompdf and sure enough, there was a call
> to htmlentities(). After removing it, everything worked great.

I believe htmlentities() should support UTF-8 just fine so long as you tell it you're passing in a string encoded with UTF-8. Try something like the following:

$newstring = htmlentities($oldstring, ENT_QUOTES, "UTF-8");

Review the docs on this function for more information.
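
To illustrate what the third argument changes, here is a small sketch with a made-up sample string. When the declared charset does not match the actual bytes, each byte of a multibyte character is converted on its own, which produces the kind of garbling described earlier in the thread. PHP versions before 5.4 assumed ISO-8859-1 when the argument was omitted, so passing it explicitly is the safer habit.

```php
<?php
// "café" written as explicit UTF-8 bytes so the example does not depend
// on the encoding of this source file.
$utf8 = "caf\xC3\xA9";

// Wrong charset: the two bytes of "é" are treated as two latin1 characters.
$wrong = htmlentities($utf8, ENT_QUOTES, 'ISO-8859-1');

// Matching charset: "é" is recognized as a single character.
$right = htmlentities($utf8, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n"; // caf&Atilde;&copy;
echo $right, "\n"; // caf&eacute;
```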
 
> Now that I've removed it, is there any chance that I'll run into
> issues if any specific characters are included that should've been
> translated using htmlentities?

So long as your document is correctly encoded and you have specified the encoding in the document meta tags, you should be fine.
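
If the goal is only to keep user-supplied text from breaking the markup, one option (not mentioned above, so treat it as a suggestion rather than a requirement) is htmlspecialchars(). It escapes just the characters that are significant in HTML and leaves accented characters untouched as UTF-8. A minimal sketch with a made-up sample string:

```php
<?php
// Sample text containing an accented character plus HTML-significant characters.
$text = 'Café <b> & "co"';

// Only <, >, & and quotes are escaped; "é" passes through unchanged.
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $escaped, "\n"; // Café &lt;b&gt; &amp; &quot;co&quot;
```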

Sem Kurtulus

Mar 4, 2012, 1:21:35 PM
to dom...@googlegroups.com
Hi All,

I am having the same problem, but the discussion in this thread did not
help me. My code looks like this:


require_once 'dompdf_config.inc.php'; // assumed path; adjust to your dompdf install

$html = ''; // initialize before appending
$html .= '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">';
$html .= '<html xmlns="http://www.w3.org/1999/xhtml">';
$html .= '<head>';
$html .= '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />';

$html .= '</head>';

$html .= '<body>';

$html .= ' 献给母亲的爱';

$html .= '</body> </html>';

$dompdf = new DOMPDF();
$dompdf->load_html($html, 'UTF-8');
$dompdf->set_paper("a4", "landscape");
$dompdf->render();
$dompdf->stream("Management_Report.pdf");

I'm not sure what I am doing wrong here. Any help is greatly appreciated!

Thank you.



BrianS

Mar 4, 2012, 8:47:35 PM
to dom...@googlegroups.com

You're doing everything right but for one oversight: you haven't specified a font that supports the characters in your document body. The core fonts do not support CJK, and I don't believe that the DejaVu fonts do either. For testing Chinese character text I've used the Firefly Sung font and had no problems.

Read the Unicode how-to and make sure you use a supporting font (e.g. Firefly Sung). You either need to load the font using load_font.php or reference it in the CSS using @font-face.
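
The @font-face route could look roughly like the sketch below. The helper name build_html_with_font() and the font path fonts/fireflysung.ttf are placeholders invented for this example, and whether dompdf can fetch the font from that location depends on your installation, so treat this as a starting point rather than a verified setup.

```php
<?php
// Hypothetical helper: wrap body text in an HTML document that declares
// a custom font via @font-face and applies it to the whole body.
function build_html_with_font(string $family, string $fontUrl, string $body): string
{
    $css = "@font-face { font-family: '$family'; src: url('$fontUrl') format('truetype'); }\n"
         . "body { font-family: '$family', sans-serif; }";

    return '<html><head>'
         . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
         . "<style>$css</style>"
         . '</head><body>'
         . htmlspecialchars($body, ENT_QUOTES, 'UTF-8')
         . '</body></html>';
}

// Placeholder font path; point it at wherever the TTF file actually lives.
$html = build_html_with_font('Firefly Sung', 'fonts/fireflysung.ttf', '献给母亲的爱');

// The string would then be rendered as in the code posted above:
// $dompdf = new DOMPDF();
// $dompdf->load_html($html, 'UTF-8');
// $dompdf->render();
```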

Sem Kurtulus

Mar 5, 2012, 7:43:22 PM
to dom...@googlegroups.com
Hi Brian,

Thank you for your response. I read the documentation you provided and I want to clear up a couple of things on my side.

Our website is hosted on HostMonster and I just got shell access to run the load_font.php command, but I have no idea which fonts to include.

By the way, this is going to be for Turkish. I am sorry I used Japanese as a trial; I didn't mean to confuse anyone.

So, given my limited understanding of how to load fonts, can you please tell me what I need to do from here on?

Thanks again.

Sem


Sem Kurtulus

Mar 6, 2012, 10:07:34 PM
to dom...@googlegroups.com
Hi Brian, and all,

I hope I can get an answer to my previous post... I am really
desperate here. :)

Thank you


BrianS

Mar 6, 2012, 10:31:07 PM
to dom...@googlegroups.com
If you're using Turkish then I suggest you try out the DejaVu fonts that are included with 0.6.0 beta 3. Though, actually, I'm not entirely certain you need to do anything special so long as you appropriately encode your document. Do you have some sample text? I can run a sample rendering for you.
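
For a quick trial, a test document along these lines might be enough. This is only a sketch: the Turkish sample words are made up, and it assumes the bundled fonts are registered under the family name "DejaVu Sans" in your dompdf install.

```php
<?php
// Made-up Turkish sample containing the characters that most often break.
$sample = 'Çığ, şoför, İstanbul, güneş, ödül';

$html = '<html><head>'
      . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />'
      . '<style>body { font-family: "DejaVu Sans", sans-serif; }</style>'
      . '</head><body><p>'
      . htmlspecialchars($sample, ENT_QUOTES, 'UTF-8')
      . '</p></body></html>';

// Sanity check before rendering: the /u modifier fails on invalid UTF-8.
if (preg_match('//u', $html) !== 1) {
    throw new RuntimeException('HTML is not valid UTF-8');
}

// Rendering would then follow the same calls used earlier in the thread:
// $dompdf = new DOMPDF();
// $dompdf->load_html($html, 'UTF-8');
// $dompdf->render();
```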

