<?php
// Create SimpleXMLElement
$foo = new SimpleXMLElement('<foo/>');
// Add an element using addChild() with text
$foo->addChild('bar', 'this&that');
// Add an element using a property
$foo->lol = 'this&that';
// Add a child using addChild() then set text as a property
$foo->addChild('what');
$foo->what = 'this&that';
// Add a child using a property with invalid UTF-8
$foo->magic = "\x80";
// Serialize it
echo $foo->asXML();
?>
On PHP 5.2.6 that outputs:
$ php fo.php
PHP Warning: SimpleXMLElement::addChild(): unterminated entity reference that in /Users/gsnedders/Desktop/fo.php on line 7
Warning: SimpleXMLElement::addChild(): unterminated entity reference that in /Users/gsnedders/Desktop/fo.php on line 7
PHP Warning: SimpleXMLElement::asXML(): string is not in UTF-8 in /Users/gsnedders/Desktop/fo.php on line 20
Warning: SimpleXMLElement::asXML(): string is not in UTF-8 in /Users/gsnedders/Desktop/fo.php on line 20
<?xml version="1.0"?>
<foo><bar>this</bar><lol>this&amp;that</lol><what>this&amp;that</what><magic></magic></foo>
On PHP 5.2.4:
$ php fo.php
Warning: SimpleXMLElement::addChild(): unterminated entity reference that in /home/caius/- on line 7
Warning: main(): unterminated entity reference that in /home/caius/- on line 14
Warning: SimpleXMLElement::asXML(): string is not in UTF-8 in /home/caius/- on line 20
<?xml version="1.0"?>
<foo><bar>this</bar><lol>this&amp;that</lol><what>this</what><magic></magic></foo>
This makes SimpleXML entirely unusable as a serializer: all escaping
of text must happen within the serializer, and here it does not. On
PHP 5.2.6, the second parameter of SimpleXMLElement::addChild() must
have all ampersands escaped (but nothing else, so htmlspecialchars()
is wrong there); the behaviour of '<' is what would be expected: it is
escaped in the output as "&lt;". If a text node contains any invalid
UTF-8, its text is dropped from the output.
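For anyone stuck with addChild() in the meantime, here is a minimal
sketch of the ampersand-only escaping described above. It assumes the
5.2.6 behaviour; add_text_child() is a hypothetical helper name, not
part of SimpleXML, and other PHP versions may need different handling.

```php
<?php
// Hypothetical helper (not part of SimpleXML): escape only ampersands
// before calling addChild(), and leave '<' for addChild() to handle.
// htmlspecialchars() is deliberately not used, since it would also
// escape '<' and lead to double escaping in the output.
function add_text_child(SimpleXMLElement $parent, $name, $text)
{
    return $parent->addChild($name, str_replace('&', '&amp;', $text));
}

$foo = new SimpleXMLElement('<foo/>');
add_text_child($foo, 'bar', 'a<b & c');
$xml = $foo->asXML();
echo $xml;
```

This only papers over addChild(); property assignment escapes
ampersands itself, so the helper must not be used on that path.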
We currently use SimpleXML in places and DOM in others for
serialization. The behaviour is wacky enough, but the fact it
underwent a backwards incompatible change in 5.2.6 makes me absolutely
certain that we should not use it to build trees and serialize them to
XML. We should use DOM everywhere.
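As a rough sketch of what the DOM equivalent of the script above could
look like (the element names are just the ones from the test case, not
anything taken from our code): the text goes in through
createTextNode(), so no manual escaping is needed before serializing.

```php
<?php
// Build the same kind of tree with DOM instead of SimpleXML.
$doc = new DOMDocument('1.0', 'UTF-8');
$foo = $doc->appendChild($doc->createElement('foo'));

foreach (array('bar', 'lol', 'what') as $name) {
    $child = $foo->appendChild($doc->createElement($name));
    // createTextNode() takes literal text; saveXML() escapes it.
    $child->appendChild($doc->createTextNode('this&that'));
}

$xml = $doc->saveXML();
echo $xml;
```

Note that the text is deliberately not passed as the second argument
of createElement(): that argument is not escaped for ampersands either,
so createTextNode() is the safer route.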
--
Geoffrey Sneddon
<http://gsnedders.com/>
Here on Linux it does not filter the ampersand. So I'm guessing it
has something to do with SimpleXML/libxml versions?
> I have created a branch to tackle the switch to DOM
> (http://svn.habariproject.org/habari/branches/090508-dom/). If you'd like
> to contribute to it, it would be greatly appreciated. I am intending to
> start with converting the atomhandler.
cool. and good luck ;)
--
Matt Read
http://mattread.com
> On Fri, Sep 5, 2008 at 9:04 AM, Ali B. <dmon...@gmail.com> wrote:
>> We have also verified that SimpleXML on Windows behaves differently
>> than SimpleXML on Linux! On Linux, addChild filters the ampersands,
>> while on Windows it doesn't.
>
> Here on Linux it does not filter the ampersand. So I'm guessing it
> has something to do with SimpleXML/libxml versions?
libxml hasn't changed in a very long time. It has to do only with
PHP's inconsistency with itself.