Do you *really* need XML? If not, you could simply use PHP's built-in,
optimized serialize().
Otherwise, I think the simplest way is to write functions that convert
PHP-serialization strings from/to XML.
Below I propose an organization. The hard work is still to be done: you
have to analyze the format of a PHP-serialize string to be able to
convert it from/to XML :)
Why work from PHP-serialize? Because a serialization must include all
the attributes, even if protected or private. Building the serialized
string is not the problem, as the Reflection API or (even simpler)
get_object_vars($this), called from inside the class, will provide the
values of the attributes whatever their visibility (except private
attributes declared in a parent class). The problem comes when you
must rebuild the object: if you can't call the built-in unserialize(),
you will not be able to create a new object and set its private or
protected attributes. So you have no choice when unserializing: you
*must* call unserialize(), so you *must* be able to build a valid
string for this function.
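To make the point concrete, here is a minimal sketch. The class Account
is hypothetical and only serves the illustration; the string is built by
hand following the object format described further down, and
unserialize() fills in a private attribute that outside code could not
set directly.

```php
<?php
// Hypothetical class, used only for illustration.
class Account
{
    private $balance = 0;

    public function getBalance()
    {
        return $this->balance;
    }
}

// Private attribute name: chr(0) . className . chr(0) . attributeName
$name = chr(0) . 'Account' . chr(0) . 'balance';

// Hand-built serialization string for an Account with balance 42.
$string = 'O:7:"Account":1:{s:' . strlen($name) . ':"' . $name . '";i:42;}';

$account = unserialize($string);
echo $account->getBalance(), "\n"; // prints 42
```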
class Serialize
{
    /**
     * Serializes a variable
     * @param mixed $object
     * @return string strict string representation
     */
    public static function serialize($object)
    {
        return serialize($object);
    }

    /**
     * Unserializes a variable
     * @param string $string
     * @return mixed
     */
    public static function unserialize($string)
    {
        return unserialize($string);
    }
}
class XMLSerialize extends Serialize
{
    /**
     * Serializes a variable
     * @param mixed $object
     * @return string XML representation
     */
    public static function serialize($object)
    {
        $string = parent::serialize($object);
        $xml = self::stringToXML($string);
        return $xml;
    }

    /**
     * Unserializes a variable
     * @param string $xml
     * @return mixed
     */
    public static function unserialize($xml)
    {
        $string = self::XMLToString($xml);
        return parent::unserialize($string);
    }

    /**
     * Converts a PHP-serialize string to an XML representation
     * @param string $string
     * @return string $xml
     */
    protected static function stringToXML($string)
    {
        // ... have fun here ...
    }

    /**
     * Converts an XML representation to a PHP-serialize string
     * @param string $xml
     * @return string $string
     */
    protected static function XMLToString($xml)
    {
        // ... have fun here ...
    }
}
To give you some basic hints, here is the serialization format for
the main types:
## String ##
s:$length:"$value";
$length is the string's length in bytes (what strlen() returns)
$value is the raw (unescaped) value of the string
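For the XML-to-string direction you will have to build such tokens
yourself. A minimal sketch, where buildStringToken() is a hypothetical
helper name; it uses a two-byte UTF-8 character to show that the length
is counted in bytes, not in characters:

```php
<?php
// Hypothetical helper: builds the s:$length:"$value"; token for a raw
// (unescaped) string.
function buildStringToken($value)
{
    // strlen() counts bytes, which is what unserialize() expects.
    return 's:' . strlen($value) . ':"' . $value . '";';
}

$accented = "\xC3\xA9"; // "é" encoded as UTF-8: one character, two bytes
echo buildStringToken($accented), "\n";

// The hand-built token matches what serialize() produces.
var_dump(buildStringToken($accented) === serialize($accented));
```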
## Integer ##
i:$value;
## Double ##
d:$value;
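You will notice some oddities in the float representation, which is
another argument for going through serialize() & unserialize(), as they
handle these for you. E.g. serialize(0.33) can give
$s = "d:0.330000000000000015543122344752191565930843353271484375;"
(the exact digits depend on the PHP version and on the
serialize_precision setting), and when you echo unserialize($s) it
will display 0.33. This is not really a bug: 0.33 has no exact binary
representation, the long number is the value of the nearest double, and
echo rounds it for display. Floats are a pain in every system anyway.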
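A quick way to check this on your own installation (the text printed by
the first echo is deliberately not shown, since it varies with the PHP
version and the serialize_precision setting; the round trip is assumed
to run with the default settings):

```php
<?php
$s = serialize(0.33);
echo $s, "\n"; // digits vary with PHP version and serialize_precision

// Whatever the digits are, the round trip gives back the same float.
$back = unserialize($s);
var_dump($back === 0.33);

// The long form quoted above is parsed back as a float as well.
var_dump(unserialize('d:0.330000000000000015543122344752191565930843353271484375;'));
```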
## Booleans ##
b:$value;
$value is 1 or 0
## NULL ##
N;
## Array ##
a:$length:$hashSerialization
$length is the number of elements in the array
$hashSerialization is the serialization of the key/value pairs in the
array (see "Hash serialization" below)
## Object ##
O:$classNameLength:"$className":$nbAttributes:$attributesSerialization
$classNameLength is the length of the className
$className is the className
$nbAttributes is the number of attributes
$attributesSerialization is the serialization of the
attribute-name/attribute-value pairs (see "Hash serialization" below).
The attribute name is built following this rule:
- public attribute: use the name directly
- protected attribute: prefix the name with chr(0) . "*" . chr(0)
- private attribute: prefix the name with chr(0) . $className . chr(0)
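The exact bytes of these names can be checked without parsing anything:
casting an object to an array exposes the same mangled names that
serialize() writes. A small sketch, with the NUL bytes made visible as
\0 (MyClass is just a sample class):

```php
<?php
class MyClass
{
    private $privateAttr = "toto";
    protected $protAttr = "titi";
    public $var = "tata";
}

// Casting to array exposes the mangled attribute names.
foreach (array_keys((array) new MyClass) as $name) {
    // Replace each invisible NUL byte by the two visible characters \0.
    echo str_replace(chr(0), '\0', $name), " (", strlen($name), " bytes)\n";
}
// prints:
// \0MyClass\0privateAttr (20 bytes)
// \0*\0protAttr (11 bytes)
// var (3 bytes)
```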
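## Hash serialization ##
{$pair1$pair2...$pairN}
$pairX is the serialization of the Xth pair in the hash; its
representation is $key$value, where:
$key is the serialization of the key (an integer or a string)
$value is the serialization of the value
Each scalar serialization already ends with its own ";", while a
nested array or object ends with "}" and takes no extra ";".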
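A small nested array makes the pair layout easier to see: every scalar
carries its own trailing ";", and the inner array closes with "}" only.

```php
<?php
echo serialize(array('a' => 1, 'b' => array(true, null))), "\n";
// prints: a:2:{s:1:"a";i:1;s:1:"b";a:2:{i:0;b:1;i:1;N;}}
```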
## Example ##
class MyClass {
    private $privateAttr = "toto";
    protected $protAttr = "titi";
    public $var = "tata";
}
$v1 = new MyClass;
$v2 = 'a string";';
$v3 = 367;
echo serialize(array($v1, $v2, 'integer' => $v3));
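This will display the following string (a single line):
a:3:{i:0;O:7:"MyClass":3:{s:20:"MyClassprivateAttr";s:4:"toto";s:11:"*protAttr";s:4:"titi";s:3:"var";s:4:"tata";}i:1;s:10:"a string";";s:7:"integer";i:367;}
The chr(0) bytes in the private and protected attribute names are
invisible when displayed, which is why the announced lengths 20 and 11
look too long by 2.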
You will notice that no character is escaped when the string is
serialized, and you may think this could corrupt the data. There is no
such risk, because the length of the string is given in the
serialization: when the parser reads "s:10:" it just takes the next 10
bytes and builds the string from them, whatever those bytes are.
One last piece of advice: don't try to write a single regexp to parse a
serialized string. You'd better build a conventional parser (moving
forward a few characters at a time).
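As a starting point, here is a rough sketch of such a forward-only
parser for the string-to-XML direction. Everything in it is an
assumption made for the illustration: the class name SerializedToXml,
the element names (<null/>, <int>, <pair>, ...) and the choice to write
NUL bytes as the visible sequence \0. It only covers the types listed
above, does no error handling on malformed input, and assumes the
string values are valid in the XML document's encoding.

```php
<?php
class SerializedToXml
{
    private $data;
    private $pos = 0;

    public static function convert($string)
    {
        $parser = new self();
        $parser->data = $string;
        return $parser->readValue();
    }

    // Reads up to (not including) $delimiter, then skips the delimiter.
    private function readUntil($delimiter)
    {
        $end = strpos($this->data, $delimiter, $this->pos);
        $chunk = substr($this->data, $this->pos, $end - $this->pos);
        $this->pos = $end + 1;
        return $chunk;
    }

    // Reads exactly $length bytes enclosed in double quotes.
    private function readQuoted($length)
    {
        $chunk = substr($this->data, $this->pos + 1, $length);
        $this->pos += $length + 2;
        return $chunk;
    }

    private function readValue()
    {
        $type = $this->data[$this->pos];
        $this->pos += 2; // skip the type letter and ":" (or ";" for N)
        switch ($type) {
            case 'N':
                return '<null/>';
            case 'b':
                return '<bool>' . $this->readUntil(';') . '</bool>';
            case 'i':
                return '<int>' . $this->readUntil(';') . '</int>';
            case 'd':
                return '<double>' . $this->readUntil(';') . '</double>';
            case 's':
                $value = $this->readQuoted((int) $this->readUntil(':'));
                $this->pos++; // skip the final ";"
                return '<string>' . self::escape($value) . '</string>';
            case 'a':
                $count = (int) $this->readUntil(':');
                return '<array>' . $this->readPairs($count) . '</array>';
            case 'O':
                $class = $this->readQuoted((int) $this->readUntil(':'));
                $this->pos++; // skip the ":" after the class name
                $count = (int) $this->readUntil(':');
                return '<object class="' . self::escape($class) . '">'
                    . $this->readPairs($count) . '</object>';
        }
        throw new Exception("Unsupported type '$type'");
    }

    // Reads "{" followed by $count key/value pairs and "}".
    private function readPairs($count)
    {
        $this->pos++; // skip "{"
        $xml = '';
        for ($i = 0; $i < $count; $i++) {
            $xml .= '<pair><key>' . $this->readValue() . '</key>'
                . '<value>' . $this->readValue() . '</value></pair>';
        }
        $this->pos++; // skip "}"
        return $xml;
    }

    // Escapes XML special characters; NUL bytes are not allowed in XML,
    // so they are written as the two visible characters \0 (lossy).
    private static function escape($value)
    {
        return str_replace(
            array('&', '<', '>', '"', chr(0)),
            array('&amp;', '&lt;', '&gt;', '&quot;', '\0'),
            $value
        );
    }
}

echo SerializedToXml::convert(serialize(array('a' => 1, 2 => null))), "\n";
```

The reverse direction (XMLToString) would walk the XML tree and emit
the tokens in the same order, recomputing every length with strlen().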