Character sets

Christine DUPARD

Oct 27, 2010, 8:46:06 AM
to anthologize-users
On Aug 3, 11:58 pm, Sherman Dorn <sherman.d...@gmail.com> wrote:

> I suspect this is already reported: in at least one of my attempts, an
> apostrophe turned into a-grave, the euro symbol, and TM. Character-set
> confusion!
...

On 08/04/2010 07:51 PM, John Flatness wrote:
...
> The easy way to work around this is to simply append an XML processing
> instruction to the beginning of the text you're importing.

> In context, what I've used is:

> // Specifying the encoding on document construction is important
> $tmpHTML = new DomDocument('1.0', 'UTF-8');
> $tmpHTML->loadHTML('<?xml encoding="UTF-8">' . $content);
...

This works fine when WordPress is installed on a Windows server.
On a Linux server, I had to replace the line:
$tmpHTML->loadHTML('<?xml encoding="UTF-8"><body>' . $content . '</body>');
with:
$tmpHTML->loadHTML('<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body>' . $content . '</body>');
to make it work correctly.
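
For anyone who wants to try the meta-tag variant outside the plugin, here is a minimal standalone sketch. The function name load_utf8_fragment and the sample string are made up for illustration and are not part of Anthologize; it also assumes the PHP DOM extension is enabled and that the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not Anthologize code): parse a UTF-8 HTML fragment
// with DOMDocument, declaring the charset through a meta tag.
function load_utf8_fragment(string $content): DOMDocument
{
    // Specifying the encoding on document construction, as in the quoted code.
    $doc = new DOMDocument('1.0', 'UTF-8');

    // The meta tag declares the charset of the markup to the HTML parser.
    $html = '<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head>'
          . '<body>' . $content . '</body>';

    // Collect parser warnings (e.g. for sloppy markup) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Sample text with a curly apostrophe, a euro sign and a trademark sign.
$doc  = load_utf8_fragment('<p>It’s €5™</p>');
$text = $doc->getElementsByTagName('p')->item(0)->textContent;
echo $text . PHP_EOL;
```

If the characters come back unchanged, the charset declaration is being honoured on that server; if they come back garbled, the problem probably lies elsewhere (for example in how $content was read or stored).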

It would be good if this modification could be included in future
versions of Anthologize.

Thanks
C.D.