Few confusing things about XmlSerializer and xml serialization

klem s

Sep 30, 2010, 4:24:22 PM

1) If, when an object is serialized using BinaryFormatter, the runtime is
able to create an object graph (which documents how a set of objects
refer to each other), then why don’t we let the runtime figure that out
when persisting object state to XML as well? Instead, XmlSerializer
requires us to manually specify dependencies between related objects
(by specifying type information that represents each subelement
nested within the root).

2) I assume an object can be reconstructed only if the environment
receiving and deserializing the object graph also contains the
assemblies containing the types specified in the object graph (thus, if
an object is serialized in a .NET environment, then it can only be
reconstructed in a .NET environment)?

3) Isn’t the very definition of the term serialization that we persist
object state in a manner that enables us to reconstruct the object
being serialized? Since XmlSerializer only persists data, but doesn’t
enable us to reconstruct an object, why then do we claim that it
serializes the state of an object?

4) One of the arguments for why XML serialization doesn’t also persist
each type’s fully qualified name and the name of its defining assembly
is that this way its state can be used by any OS, application framework
or programming language. But can’t object state persisted via
SoapFormatter also be used by any OS, app framework or programming
language?

I’m asking because SoapFormatter does persist the name of the assembly
through the use of namespaces; that way it can be used both by .NET
(which would deserialize the object) and by other environments. So
why doesn’t XmlSerializer also persist the assembly name through the
use of namespaces?!

5) I can understand how serialization may be useful when persisting
via BinaryFormatter, which enables us to reconstruct the object. But I
fail to see the importance of persisting an object to XML format and
transferring it to a system (say, Java) that knows nothing about .NET
data types and thus isn’t able to reconstruct the object.

Thus, why would persisting object state inside XML be any more useful
than persisting its state inside a SQL database file, which would then
be sent (instead of an XML file) to the remote computer?

Peter Duniho

Sep 30, 2010, 5:01:45 PM

klem s wrote:
> 1) If, when an object is serialized using BinaryFormatter, the runtime is
> able to create an object graph (which documents how a set of objects
> refer to each other), then why don’t we let the runtime figure that out
> when persisting object state to XML as well? Instead, XmlSerializer
> requires us to manually specify dependencies between related objects
> (by specifying type information that represents each subelement
> nested within the root).

Different API, different features and requirements.

SoapFormatter should provide functionality similar to BinaryFormatter,
but using XML as the basic storage format. The XmlSerializer format is
simpler in certain ways and has a different feature set, requiring more
programmer input for certain kinds of needs.
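A minimal sketch of the "more programmer input" mentioned above, assuming that the "type information" in question 1 refers to derived types. XmlSerializer only knows the types it can see from the declared member types, so a derived type stored in a base-typed member has to be declared up front, here with [XmlInclude]. The Shape, Circle, Drawing and XmlIncludeDemo names are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical types. Drawing.Shapes is declared as List<Shape>, so without
// extra information XmlSerializer only knows about Shape. [XmlInclude] tells
// it that a Circle may appear wherever a Shape is expected.
[XmlInclude(typeof(Circle))]
public class Shape
{
    public string Name { get; set; } = "";
}

public class Circle : Shape
{
    public double Radius { get; set; }
}

public class Drawing
{
    public List<Shape> Shapes { get; set; } = new List<Shape>();
}

public static class XmlIncludeDemo
{
    public static string Serialize(Drawing drawing)
    {
        // Alternative to [XmlInclude]: pass the extra types explicitly,
        // new XmlSerializer(typeof(Drawing), new[] { typeof(Circle) }).
        var serializer = new XmlSerializer(typeof(Drawing));
        using var writer = new StringWriter();
        serializer.Serialize(writer, drawing);
        return writer.ToString();
    }
}
```

With the attribute in place, a Circle in the list is written as a Shape element carrying xsi:type="Circle". Without it, Serialize fails with an InvalidOperationException complaining that Circle was not expected.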

> 2) I assume an object can be reconstructed only if the environment
> receiving and deserializing the object graph also contains the
> assemblies containing the types specified in the object graph (thus, if
> an object is serialized in a .NET environment, then it can only be
> reconstructed in a .NET environment)?

It depends on your requirements. If you are asking whether you can get
exactly the same object implementation when deserializing objects, then
yes…obviously you must be able to execute the code that went along with
that object in the first place.

But there's nothing preventing some non-.NET environment from
deserializing data that a .NET program serialized, and providing its own
implementation for the class supporting that object type.

Likewise, even in managed code in theory you could provide a different
assembly that has all the same library and type names as the one that
was used originally, to provide that implementation support for the
deserialized objects.

This is true regardless of the format in which the object was serialized.

> 3) Isn’t the very definition of the term serialization that we persist
> object state in a manner that enables us to reconstruct the object
> being serialized? Since XmlSerializer only persists data, but doesn’t
> enable us to reconstruct an object, why then do we claim that it
> serializes the state of an object?

None of the serialization APIs in .NET persist anything except data.
And as you already noted, it is in fact possible to reconstruct object
graphs (for example) by deserializing XML-serialized objects, with
additional effort.

You don't even need to be deserializing using .NET code to reconstruct
the basic data structures. You just need an _equivalent_
implementation. You can make such an equivalent implementation using
any computer system you like (practically all are Turing-complete).

> 4) One of the arguments for why XML serialization doesn’t also persist
> each type’s fully qualified name and the name of its defining assembly
> is that this way its state can be used by any OS, application framework
> or programming language. But can’t object state persisted via
> SoapFormatter also be used by any OS, app framework or programming
> language?

Yes. But take a look at the output generated for each. If you don't
need all the extra information saved with SoapFormatter, why carry it
around?

> I’m asking because SoapFormatter does persist the name of the assembly
> through the use of namespaces; that way it can be used both by .NET
> (which would deserialize the object) and by other environments. So
> why doesn’t XmlSerializer also persist the assembly name through the
> use of namespaces?!

Different API, different features and requirements.

> 5) I can understand how serialization may be useful when persisting
> via BinaryFormatter, which enables us to reconstruct the object. But I
> fail to see the importance of persisting an object to XML format and
> transferring it to a system (say, Java) that knows nothing about .NET
> data types and thus isn’t able to reconstruct the object.

If it were true that a Java system could not reconstruct an object
serialized in .NET (whatever the serialization method used) then perhaps
one could say that if interoperability with Java were important, one
should not use .NET serialization.

But a) that's not a true statement, and b) sometimes interoperability
with other systems is not important, so even if it were a true
statement, there still would be a place for formats that aren't
interoperable.

Just because data isn't being deserialized into a system built on the
same platform in which the data was originally serialized, that doesn't
mean you can't reconstruct the same data structure. It just means you
need an equivalent implementation of that data structure.

> Thus, why would persisting object state inside XML be any more useful
> than persisting its state inside a SQL database file, which would then
> be sent (instead of an XML file) to the remote computer?

XML is (can be) human readable. It also provides a more direct way to
represent non-tabular object graphs.

It has other features that in many cases make it more desirable than,
for example, using SQL for object persistence. But I'd say those two
are the most common reasons people use XML rather than something else.

Pete

klem s

Oct 2, 2010, 4:35:31 PM

In your post you quite often use the term "reconstructing object
graph" (in my answers I interpret it as if you actually meant
"reconstructing an object"; perhaps my interpretation is off).
Anyway, here is a related question:

From msdn:

“BinaryFormatter.Deserialize - Deserializes the specified stream into
an object graph.
Return value - The top (root) of the object graph.”

a) Isn’t the term object graph used incorrectly here, since to my
understanding an object graph is the information that describes the
dependencies between serialized objects and the values these objects
hold, and I’d argue this information isn’t the object itself?

b) Thus, saying it returns an object graph would suggest it returns to
the caller the information on how to build this object, when in fact
it returns the reconstructed object?

c) I assume “the top (root) of the object graph” refers to type Object?


On Sep 30, 11:01 pm, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com> wrote:

> klem s wrote:
> > 1) If, when an object is serialized using BinaryFormatter, the runtime is
> > able to create an object graph (which documents how a set of objects
> > refer to each other), then why don’t we let the runtime figure that out
> > when persisting object state to XML as well? Instead, XmlSerializer
> > requires us to manually specify dependencies between related objects
> > (by specifying type information that represents each subelement
> > nested within the root).
>
> Different API, different features and requirements.

Could you elaborate on this? I understand XmlSerializer meets
different requirements, but how would letting the runtime figure out
dependencies (instead of us manually specifying them) prevent
XmlSerializer from fulfilling those requirements?

> > 2) I assume an object can be reconstructed only if the environment
> > receiving and deserializing the object graph also contains the
> > assemblies containing the types specified in the object graph (thus, if
> > an object is serialized in a .NET environment, then it can only be
> > reconstructed in a .NET environment)?
>
> It depends on your requirements. If you are asking whether you can get
> exactly the same object implementation when deserializing objects, then
> yes…obviously you must be able to execute the code that went along with
> that object in the first place.
>
> But there's nothing preventing some non-.NET environment from
> deserializing data that a .NET program serialized, and providing its own
> implementation for the class supporting that object type.
>

I assume by providing its own implementation you’re talking about
defining some class C, which would have field members able to hold
deserialized field values, which we would manually assign to the
members of C?


> > 3) Isn’t the very definition of the term serialization that we persist
> > object state in a manner that enables us to reconstruct the object
> > being serialized? Since XmlSerializer only persists data, but doesn’t
> > enable us to reconstruct an object, why then do we claim that it
> > serializes the state of an object?
>
> None of the serialization APIs in .NET persist anything except data.
> And as you already noted, it is in fact possible to reconstruct object
> graphs (for example) by deserializing XML-serialized objects, with
> additional effort.
>

* Uhm, I don’t recall claiming that it’s possible to reconstruct
object graphs (if by reconstructing object graphs you really mean
reconstructing actual objects) by deserializing XML-serialized
objects. I understand we are able to extract the field data of an
XML-serialized object, but I don’t know how we would be able to
reconstruct the actual object (see further down in my post, where I
discuss the meaning of the term “reconstruct”)?!

> with additional effort.

* I know I’m being repetitive, since I ask the following quite a few
times in this post: by “additional effort”, do you mean defining class C
with field members able to hold deserialized field values and then
manually assigning the deserialized values to these field members?


> You don't even need to be deserializing using .NET code to reconstruct
> the basic data structures.

But we must know in advance the types of the fields the serialized
object holds, and thus define a class with members able to hold the
values of the deserialized fields?


>
> > I’m asking because SoapFormatter does persist the name of the assembly
> > through the use of namespaces; that way it can be used both by .NET
> > (which would deserialize the object) and by other environments. So
> > why doesn’t XmlSerializer also persist the assembly name through the
> > use of namespaces?!
>
> Different API, different features and requirements.
>

So when should we use SOAP and when XML?

> > 5) I can understand how serialization may be useful when persisting
> > via BinaryFormatter, which enables us to reconstruct the object. But I
> > fail to see the importance of persisting an object to XML format and
> > transferring it to a system (say, Java) that knows nothing about .NET
> > data types and thus isn’t able to reconstruct the object.
>
> If it were true that a Java system could not reconstruct an object
> serialized in .NET (whatever the serialization method used) then perhaps
> one could say that if interoperability with Java were important, one
> should not use .NET serialization.
>

I’m not sure if we’re using the same terminology here. Doesn’t the
term reconstruction mean re-creating the serialized object? As you've
said, an object serialized in .NET can’t be re-created in Java.

But I suspect when you talk about reconstructing an object graph
(serialized in .NET) in Java, you’re talking about defining some Java
class C, which would have field members able to hold deserialized
field values? Thus, in Java we would deserialize field values and
MANUALLY assign them to members of C?

cheers

Peter Duniho

Oct 2, 2010, 6:23:10 PM

klem s wrote:
> In your post you quite often use the term "reconstructing object
> graph" (in my answers I interpret it as if you actually meant
> "reconstructing an object"; perhaps my interpretation is off).

Your concern with respect to "reconstructing an object" seems to be
specifically related to the instantiation of other objects that original
object referred to. By definition, if you have some objects referring
to other objects, the entire collection of objects can be referred to as
a "graph" and it's obviously the _graph_ that's of interest. It's
generally not enough to just recreate each object; you need to fix up
all the references stored in each object that are what define the graph.
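To make the "fix up the references" point concrete, here is a small sketch using XmlSerializer, with hypothetical Person and Team classes. XmlSerializer writes a tree, so an object that is referenced twice gets written twice and comes back as two separate objects; the relationship "these two members are the same object" is lost. (A reference cycle would instead make Serialize fail with an InvalidOperationException.)

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical types for illustration only.
public class Person
{
    public string Name { get; set; } = "";
}

public class Team
{
    public Person Lead { get; set; }
    public Person Deputy { get; set; }
}

public static class SharedReferenceDemo
{
    // Round-trips a Team whose Lead and Deputy are the same Person object
    // and reports whether that object identity survives.
    public static bool IdentitySurvives()
    {
        var ann = new Person { Name = "Ann" };
        var team = new Team { Lead = ann, Deputy = ann };

        var serializer = new XmlSerializer(typeof(Team));
        using var buffer = new StringWriter();
        serializer.Serialize(buffer, team);

        using var reader = new StringReader(buffer.ToString());
        var copy = (Team)serializer.Deserialize(reader);

        // Ann's data was written twice, so two distinct Persons are read back.
        return ReferenceEquals(copy.Lead, copy.Deputy);
    }
}
```

Here IdentitySurvives() returns false: each object was recreated, but the graph (one Person shared by two references) was not.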

I simply added the word "graph" in my description to help make it clear
what the important aspect of the serialization/deserialization is as
it pertains to this discussion.

> Anyway, here is a related question:
>
> From msdn:
>
> “BinaryFormatter.Deserialize - Deserializes the specified stream into
> an object graph.
> Return value - The top (root) of the object graph.”
>
> a) Isn’t the term object graph used incorrectly here, since to my
> understanding an object graph is the information that describes the
> dependencies between serialized objects and the values these objects
> hold, and I’d argue this information isn’t the object itself?

The quote is stating the situation correctly. The _graph_ is the
collection of objects along with the relationships between the objects.
That's what BinaryFormatter.Deserialize() is reconstructing.

> b) Thus, saying it returns an object graph would suggest it returns to
> the caller the information on how to build this object, when in fact
> it returns the reconstructed object?

Your understanding of "graph" is far too limited. It's quite common to
describe the actual in-memory data structure as a graph.

> c) I assume “the top (root) of the object graph” refers to type Object?

The instance of the root object will have whatever type is correct for
the graph. Deserialize() has a return type of System.Object, but
the object itself will have whatever type was actually serialized. You
cast it to the correct type so that you can actually _use_ the
deserialized data.
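A sketch of that cast. It uses XmlSerializer.Deserialize, which is likewise declared to return System.Object, rather than BinaryFormatter, since BinaryFormatter is marked obsolete in recent .NET versions. The Order and CastDemo names are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type for illustration only.
public class Order
{
    public int Id { get; set; }
}

public static class CastDemo
{
    public static Order RoundTrip(Order original)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using var writer = new StringWriter();
        serializer.Serialize(writer, original);

        using var reader = new StringReader(writer.ToString());
        // Deserialize is declared to return System.Object...
        object root = serializer.Deserialize(reader);
        // ...but the instance's runtime type is Order, so the cast succeeds.
        Console.WriteLine(root.GetType().Name); // prints "Order"
        return (Order)root;
    }
}
```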

>> Different API, different features and requirements.
>
> Could you elaborate on this? I understand XmlSerializer meets
> different requirements, but how would letting the runtime figure out
> dependencies (instead of us manually specifying them) prevent
> XmlSerializer from fulfilling those requirements?

You would have to ask the .NET designers if you want to know for sure
the exact details of their design decisions. But, it's quite common in
API design to not provide "the kitchen sink", so to speak. Additional
features mean additional code, which means additional implementation and
maintenance costs, as well as additional overhead for the client code
when it uses the API.

Some clients may have a need for explicit management anyway; for those
clients, automatic support of dependencies may at best result in
processing whose output is simply ignored, and at worst may
result in output that simply cannot be modified to suit the client's needs.

>> But there's nothing preventing some non-.NET environment from
>> deserializing data that a .NET program serialized, and providing its own
>> implementation for the class supporting that object type.
>>
> I assume by providing its own implementation you’re talking about
> defining some class C, which would have field members able to hold
> deserialized field values, which we would manually assign to the
> members of C?

You could deserialize the data however you like. It might involve
reading the data, then passing the values to a constructor, it might
involve building the deserializing logic into the class, or you might
use some kind of reflection-based system similar to .NET's that does the
same sort of thing .NET would do.

Whatever. It doesn't really matter. The important point is that it can
be done.
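The first option above (read the data, then pass the values to a constructor) might look like this. PersonRecord and the Person/Name/Age element names are hypothetical. The reading type has never seen the writer's class and does not use XmlSerializer; the same logic could be written in Java or any other environment with an XML parser.

```csharp
using System.Xml.Linq;

// Hypothetical "equivalent implementation": an immutable type that is filled
// through its constructor rather than by a serializer setting properties.
public sealed class PersonRecord
{
    public string Name { get; }
    public int Age { get; }

    public PersonRecord(string name, int age)
    {
        Name = name;
        Age = age;
    }

    // The reader knows, by convention, that <Name> is text and <Age> an int.
    public static PersonRecord FromXml(string xml)
    {
        XElement root = XElement.Parse(xml);
        return new PersonRecord(
            (string)root.Element("Name"),
            (int)root.Element("Age"));
    }
}
```

For example, PersonRecord.FromXml("<Person><Name>Ann</Name><Age>42</Age></Person>") yields a record with Name "Ann" and Age 42.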

>> None of the serialization APIs in .NET persist anything except data.
>> And as you already noted, it is in fact possible to reconstruct object
>> graphs (for example) by deserializing XML-serialized objects, with
>> additional effort.
>>
> * Uhm, I don’t recall claiming that it’s possible to reconstruct
> object graphs (if by reconstructing object graphs you really mean
> reconstructing actual objects) by deserializing XML-serialized
> objects.

You wrote: "XmlSerializer requires us to manually specify dependencies
between related objects".

Which is another way of saying that XmlSerializer doesn't prevent us
from reconstructing the object graph, it simply requires us to do so
manually.

> I understand we are able to extract the field data of an
> XML-serialized object, but I don’t know how we would be able to
> reconstruct the actual object (see further down in my post, where I
> discuss the meaning of the term “reconstruct”)?!

Field data (or in many cases more properly, property data) is the only
thing you have in serialized data. Sometimes those fields are
references to other objects. As long as your serialization method can
allow for storing the data and relationships for all of the objects
involved, the entire graph can be reconstructed.

And all of the serialization methods discussed here allow that.
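A sketch of reconstructing a graph by hand, assuming a hypothetical Node type whose Next reference may point anywhere in the list, including back to an earlier node. Each object is written once with an id, references are written as ids, and loading takes two passes: create every object first, then fix up the references. It assumes every node reachable through Next is included in the list passed to Save.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical node type. Next may point to any node in the list, so the
// references can be shared or even form a cycle.
public class Node
{
    public string Name { get; set; } = "";
    public Node Next { get; set; }
}

public static class GraphXml
{
    // Writes each node exactly once with an id; references are written as ids.
    public static XElement Save(IList<Node> nodes)
    {
        // Node does not override Equals, so these keys compare by reference.
        var ids = new Dictionary<Node, int>();
        for (int i = 0; i < nodes.Count; i++)
            ids[nodes[i]] = i;

        return new XElement("Graph",
            nodes.Select(n => new XElement("Node",
                new XAttribute("id", ids[n]),
                new XAttribute("name", n.Name),
                // A null content item is simply skipped by XElement.
                n.Next == null ? null : new XAttribute("next", ids[n.Next]))));
    }

    public static List<Node> Load(XElement graph)
    {
        var elements = graph.Elements("Node").ToList();

        // Pass 1: create every object, remembering it by id.
        var byId = elements.ToDictionary(
            e => (int)e.Attribute("id"),
            e => new Node { Name = (string)e.Attribute("name") });

        // Pass 2: fix up the references now that all targets exist.
        foreach (var e in elements)
        {
            var next = e.Attribute("next");
            if (next != null)
                byId[(int)e.Attribute("id")].Next = byId[(int)next];
        }

        // Return the nodes in document order.
        return elements.Select(e => byId[(int)e.Attribute("id")]).ToList();
    }
}
```

With three nodes a, b and c where c.Next points back to a, the loaded copy has the same cycle: the third node's Next is the very same object as the first loaded node.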

>> with additional effort.
>
> * I know I’m being repetitive, since I ask the following quite a few
> times in this post: by “additional effort”, do you mean defining class C
> with field members able to hold deserialized field values and then
> manually assigning the deserialized values to these field members?

See above.

>> You don't even need to be deserializing using .NET code to reconstruct
>> the basic data structures.
>
> But we must know in advance the types of the fields the serialized
> object holds, and thus define a class with members able to hold the
> values of the deserialized fields?

That depends entirely on the serialization method. But yes, usually
it's expected that the data types are implicit and that the code
deserializing the data knows what types to expect.
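A small illustration of "implicit" types, using a hypothetical Setting/Value element pair: nothing in the XML says what the value is, so the same text can be read as an int, a string or a double, and it is the reading code that decides.

```csharp
using System.Xml.Linq;

public static class ImplicitTypesDemo
{
    // Nothing in this XML says what type <Value> holds; the reader decides.
    public const string Xml = "<Setting><Value>42</Value></Setting>";

    public static (int AsInt, string AsString, double AsDouble) ReadThreeWays()
    {
        XElement value = XElement.Parse(Xml).Element("Value");
        // Each explicit conversion parses the same text differently.
        return ((int)value, (string)value, (double)value);
    }
}
```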

That's not to rule out other serialization methods in which the type
information is embedded. In fact, it's my recollection that both
BinaryFormatter and SoapFormatter include enough type information in the
serialized data that you could in fact reconstruct a _container_ type in
which all of the data could be stored upon deserialization.
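For contrast, a sketch of embedding type information. This is not the format BinaryFormatter or SoapFormatter use; it simply records the type's assembly-qualified name next to the property values, so the reader can look the type up with Type.GetType and create an instance with Activator.CreateInstance. It assumes the type is available to the reader (the .NET-to-.NET case), has a public parameterless constructor, and has only simple, convertible public properties. TypedXml and Point are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Reflection;
using System.Xml.Linq;

// Hypothetical data type for illustration only.
public class Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

public static class TypedXml
{
    // Writes the type name plus one child element per public property.
    public static XElement Write(object value)
    {
        Type type = value.GetType();
        return new XElement("Object",
            new XAttribute("type", type.AssemblyQualifiedName),
            type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
                .Select(p => new XElement(p.Name, p.GetValue(value))));
    }

    // The embedded name tells the reader which type to instantiate.
    public static object Read(XElement element)
    {
        Type type = Type.GetType((string)element.Attribute("type"), throwOnError: true);
        object instance = Activator.CreateInstance(type);
        foreach (XElement field in element.Elements())
        {
            PropertyInfo property = type.GetProperty(field.Name.LocalName);
            // Simple types only: convert the element text to the property type.
            object converted = Convert.ChangeType(
                field.Value, property.PropertyType, CultureInfo.InvariantCulture);
            property.SetValue(instance, converted);
        }
        return instance;
    }
}
```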

Note, of course, that the data in an object does not fully define the
object. There's all the code that goes with an object as well. Some
simple types are just data containers, and those are easy enough to
recreate from scratch based on type information embedded in the
serialized data. But most interesting types have a wealth of
implementation detail, none of which is part of the serialized data.

>>> I’m asking because SoapFormatter does persist the name of the assembly
>>> through the use of namespaces; that way it can be used both by .NET
>>> (which would deserialize the object) and by other environments. So
>>> why doesn’t XmlSerializer also persist the assembly name through the
>>> use of namespaces?!
>> Different API, different features and requirements.
>>
> So when should we use SOAP and when XML?

You should use SOAP if you're serializing data to be consumed by some
third-party that requires the use of SOAP. I would personally not use
it for internal-only serialization, because of the overhead. Even
BinaryFormatter would be better, and custom serialization can be even
more efficient for that. But even for internal-only stuff, SOAP is
technically fine too.

I tend to use XML, and not even XmlSerializer. I just write simple
methods in my classes to convert an object instance to a
System.Xml.Linq.XElement. Objects that refer to other objects just use
the other object's XElement-generating method to create a child element
for the current object's XElement.

Then to use that code, you just convert the root object to XElement,
store that in an XDocument and save the XDocument.
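A minimal sketch of the approach just described, with hypothetical Book, Author and BookStore names: each class has its own ToXElement method, a referenced object contributes a child element by calling its own method, and the root element goes into an XDocument that is then saved.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical classes, each responsible for its own XML representation.
public class Author
{
    public string Name { get; set; } = "";

    public XElement ToXElement() =>
        new XElement("Author", new XAttribute("name", Name));
}

public class Book
{
    public string Title { get; set; } = "";
    public List<Author> Authors { get; } = new List<Author>();

    // Referenced objects become child elements via their own methods.
    public XElement ToXElement() =>
        new XElement("Book",
            new XAttribute("title", Title),
            Authors.Select(a => a.ToXElement()));
}

public static class BookStore
{
    // Convert the root object, wrap it in an XDocument, and save it.
    public static void Save(Book root, string path) =>
        new XDocument(root.ToXElement()).Save(path);
}
```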

I do this for two reasons: one, I can easily compress the XML stream
using GZipStream or similar if I want data size efficiency, and two, the
XML is easy to read for a human. And, I don't find much value in
messing around with the various serialization APIs, because IMHO they
don't add enough convenience to justify the constraints they involve
(and the …Formatter-based methods also are fairly inefficient in terms
of data size).
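The compression step could be sketched like this, writing an XDocument through System.IO.Compression.GZipStream into a byte array and reading it back. CompressedXml is a hypothetical helper name; writing to a file instead of a MemoryStream works the same way.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class CompressedXml
{
    public static byte[] Save(XDocument document)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            document.Save(gzip);
        }
        // Disposing the GZipStream flushes the compressed data; ToArray
        // still works on the (now closed) MemoryStream.
        return output.ToArray();
    }

    public static XDocument Load(byte[] data)
    {
        using var input = new MemoryStream(data);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        return XDocument.Load(gzip);
    }
}
```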

You can make your own decisions about which serialization method is
best, basing those decisions on what your specific goals and
requirements are. Only you know those pieces of information, so only
you can make the decision.

>>> 5) I can understand how serialization may be useful when persisting
>>> via BinaryFormatter, which enables us to reconstruct the object. But I
>>> fail to see the importance of persisting an object to XML format and
>>> transferring it to a system (say, Java) that knows nothing about .NET
>>> data types and thus isn’t able to reconstruct the object.
>> If it were true that a Java system could not reconstruct an object
>> serialized in .NET (whatever the serialization method used) then perhaps
>> one could say that if interoperability with Java were important, one
>> should not use .NET serialization.
>>
> I’m not sure if we’re using the same terminology here. Doesn’t the
> term reconstruction mean re-creating the serialized object? As you've
> said, an object serialized in .NET can’t be re-created in Java.

"Reconstruction" means recreating the basic data structure. It doesn't
have to be in the exact same memory format, or even the exact same
platform. For example, I can reconstruct a 32-bit integer from the text
"255", and I can do so using _any_ computer environment I want. If I do
so on Intel hardware, the in-memory representation will look one way,
while on PowerPC hardware it will look another (Intel is little-endian,
PowerPC is big-endian).
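The integer example, sketched in C#: the text "255" is parsed into an int, and System.Buffers.Binary.BinaryPrimitives lays the same value out in little-endian and big-endian byte order. The bytes differ, but both are faithful reconstructions of the same value. EndianDemo is a hypothetical name.

```csharp
using System.Buffers.Binary;

public static class EndianDemo
{
    // Parse the text, then lay the same 32-bit value out both ways.
    public static (byte[] Little, byte[] Big) Encode(string text)
    {
        int value = int.Parse(text);
        var little = new byte[4];
        var big = new byte[4];
        BinaryPrimitives.WriteInt32LittleEndian(little, value); // FF 00 00 00 for 255
        BinaryPrimitives.WriteInt32BigEndian(big, value);       // 00 00 00 FF for 255
        return (little, big);
    }
}
```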

If you use the word "reconstruction" in a stricter sense, you're free to
do so, but then you limit the ways you can "reconstruct" something.

> But I suspect when you talk about reconstructing an object graph
> (serialized in .NET) in Java, you’re talking about defining some Java
> class C, which would have field members able to hold deserialized
> field values? Thus, in Java we would deserialize field values and
> MANUALLY assign them to members of C?

That depends on your definition of "manually". Java has SOAP-aware
frameworks, for example. So if you're sending SOAP-formatted data, that
doesn't necessarily mean you'll have to write all the code to
deserialize that data yourself.

In fact, that would probably be the main reason to use SOAP, IMHO.
Interoperability with non-.NET platforms.

Pete

klem s

Oct 3, 2010, 2:21:19 PM

On Oct 3, 12:23 am, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com> wrote:
> klem s wrote:
> > In your post you quite often use the term "reconstructing object
> > graph" (in my answers I interpret it as if you actually meant
> > "reconstructing an object"; perhaps my interpretation is off).
>
> ... By definition, if you have some objects referring
> to other objects, the entire collection of objects can be referred to as
> a "graph" and it's obviously the _graph_ that's of interest. It's
> generally not enough to just recreate each object; you need to fix up
> all the references stored in each object that are what define the graph.
>
>
I didn’t mean to imply that reconstructing an object doesn’t also
include restoring all the references held by each object.


>
> > Anyway, here is a related question:
>
> > From msdn:
>
> > “BinaryFormatter.Deserialize - Deserializes the specified stream into
> > an object graph.
> > Return value - The top (root) of the object graph.”
>
> > a) Isn’t the term object graph used incorrectly here, since to my
> > understanding an object graph is the information that describes the
> > dependencies between serialized objects and the values these objects
> > hold, and I’d argue this information isn’t the object itself?
>
> The quote is stating the situation correctly. The _graph_ is the
> collection of objects along with the relationships between the objects.
> That's what BinaryFormatter.Deserialize() is reconstructing.
>

Then we could claim that the following method also returns an object
graph, since the returned object of type B is basically a collection of
objects with some type of relationships established between them:

object some_Method()
{
    return new B();
}

class A {}
class B : A
{
    public C c = new C();
}
class C {}

BTW, are there situations where the term “reconstructing an object
graph” also includes the reconstruction of the object’s behaviour (I
realize that this makes sense only if either the object’s methods are
also persisted or if we serialize object O via BinaryFormatter and then
also deserialize O via BinaryFormatter)?


>
> > I understand we are able to extract the field data of an
> > XML-serialized object, but I don’t know how we would be able to
> > reconstruct the actual object (see further down in my post, where I
> > discuss the meaning of the term “reconstruct”)?!
>
> Field data (or in many cases more properly, property data) is the only
> thing you have in serialized data.  

I don’t completely agree with your claim since (at least with
BinaryFormatter) there’s additional data stored in the object graph
that also describes the parent-child relationship of a persisted
object; I wouldn’t consider this information to be field/property data?!

> Sometimes those fields are
> references to other objects.  As long as your serialization method can
> allow for storing the data and relationships for all of the objects
> involved, the entire graph can be reconstructed.
>
> And all of the serialization methods discussed here allow that.
>

That’s not strictly true when it comes to XmlSerializer, since it
doesn’t describe the parent-child relationship?


>
> >> You don't even need to be deserializing using .NET code to reconstruct
> >> the basic data structures.
>
> > But we must know in advance the types of the fields the serialized
> > object holds, and thus define a class with members able to hold the
> > values of the deserialized fields?
>
> That depends entirely on the serialization method.  But yes, usually
> it's expected that the data types are implicit and that the code
> deserializing the data knows what types to expect.
>

What do you mean by “data types being implicit”?

> That's not to rule out other serialization methods in which the type
> information is embedded.  In fact, it's my recollection that both
> BinaryFormatter and SoapFormatter include enough type information in the
> serialized data that you could in fact reconstruct a _container_ type in
> which all of the data could be stored upon deserialization.
>

I’m aware that both BinaryFormatter and SoapFormatter also serialize
assembly and type names, which enables a .NET application to reconstruct
an instance of the same type as the one that was serialized. Is that
what you meant by reconstructing a _container_ type?

> …include enough type information in the serialized data…

Doesn't the above excerpt somewhat contradict your statement
that the only information persisted is field/property data:

"Field data (or in many cases more properly, property data) is the
only thing you have in serialized data."

> Note, of course, that the data in an object does not fully define the
> object.  There's all the code that goes with an object as well.  Some
> simple types are just data containers, and those are easy enough to
> recreate from scratch based on type information embedded in the
> serialized data.  But most interesting types have a wealth of
> implementation detail, none of which is part of the serialized data.
>

Is there a way to also serialize the behaviour of an object (i.e.
its methods), which could then be deserialized in some non-.NET
environment?

> >>> I’m asking because SoapFormatter does persist the name of the assembly
> >>> through the use of namespaces; that way it can be used both by .NET
> >>> (which would deserialize the object) and by other environments. So
> >>> why doesn’t XmlSerializer also persist the assembly name through the
> >>> use of namespaces?!
> >> Different API, different features and requirements.
>
> > So when should we use SOAP and when XML?
>
> You should use SOAP if you're serializing data to be consumed by some
> third-party that requires the use of SOAP.  I would personally not use
> it for internal-only serialization,

By internal-only serialization, are you referring to data that gets
serialized and deserialized only within the same application (thus,
data that won’t get deserialized by an external app)?


BTW, my book claims that, strictly speaking, XmlSerializer does not
persist state using an object graph. XmlSerializer persists field/
property data, and it also indirectly persists the relationships between
objects (via subelements), so why would the book make such a claim?

thank you

Peter Duniho

Oct 3, 2010, 8:49:10 PM

klem s wrote:
> [...]

>> The quote is stating the situation correctly. The _graph_ is the
>> collection of objects along with the relationships between the objects.
>> That's what BinaryFormatter.Deserialize() is reconstructing.
>>
> Then we could claim that the following method also returns an object
> graph, since the returned object of type B is basically a collection of
> objects with some type of relationships established between them:
>
> object some_Method()
> {
>     return new B();
> }

You could indeed claim that. It's not a very interesting object graph,
and it's not typically why one would introduce the idea of an object
graph into the discussion. But it wouldn't be wrong per se.

Personally, I think it's more useful to talk of the object graph when
you're dealing with a pre-existing graph data structure, and are
manipulating or otherwise addressing that entire data structure. But
it's true: the simplest graph requires only two objects, and you can say
that returning one object that refers to another is in fact the same as
returning an object graph, however degenerate that case may be.
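To see that even the some_Method example returns a (tiny) graph, here is a sketch that collects every object reachable from a root by following reference-type fields through reflection. It is deliberately simplified: strings are treated as leaves and the elements of arrays or collections are not walked. It uses ReferenceEqualityComparer, which needs .NET 5 or later. The ObjectGraph name is hypothetical; A, B and C repeat the classes from klem's example.

```csharp
using System.Collections.Generic;
using System.Reflection;

// The classes from the example above.
class A {}
class B : A
{
    public C c = new C();
}
class C {}

public static class ObjectGraph
{
    // Returns every object reachable from root via instance fields.
    public static List<object> Reachable(object root)
    {
        var seen = new HashSet<object>(ReferenceEqualityComparer.Instance);
        var found = new List<object>();
        var pending = new Stack<object>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            object current = pending.Pop();
            if (!seen.Add(current)) continue; // already visited (handles cycles)
            found.Add(current);

            // Walk the type and its base types so inherited fields are included.
            for (var t = current.GetType(); t != null; t = t.BaseType)
            {
                var flags = BindingFlags.Instance | BindingFlags.Public |
                            BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
                foreach (FieldInfo f in t.GetFields(flags))
                {
                    if (f.FieldType.IsValueType || f.FieldType == typeof(string))
                        continue;
                    var value = f.GetValue(current);
                    if (value != null) pending.Push(value);
                }
            }
        }
        return found;
    }
}
```

For new B(), the walk finds two objects, the B instance and the C instance its field refers to; the graph is exactly those two nodes and the one reference between them.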

> [...]


> BTW, are there situations where the term “reconstructing an object
> graph” also includes the reconstruction of the object’s behaviour (I
> realize that this makes sense only if either the object’s methods are
> also persisted or if we serialize object O via BinaryFormatter and then
> also deserialize O via BinaryFormatter)?

IMHO, an object graph is the data structure itself, defining the
relationships between objects. Even if you could persist the
implementation as well as the data (see below), I wouldn't say that
doing so _changes_ anything.

That is, if reconstructing the object graph includes the implementation
as well as the data, then doing the same thing without the
implementation could also be considered reconstructing the object graph.
The fact that implementation is being persisted doesn't seem very
relevant to me. The "graph" is a state of data, not of implementation
(though of course one needs implementation to make any sense of the data).

>> Field data (or in many cases more properly, property data) is the only
>> thing you have in serialized data.
>
> I don’t completely agree with your claim since (at least with
> BinaryFormatter) there’s additional data stored in the object graph that
> also describes the parent-child relationship of a persisted object – I
> wouldn’t consider this information to be field/property data?!

Why not? The graph has to be stored in fields _somewhere_. Most
commonly, this is in the form of object references within the objects
that make up the graph themselves. In that case, _obviously_ even the
graph information being persisted is coming from field or property data.

But even if the graph is being stored outside the objects that make up
the graph (perhaps you've got a big table or something describing the
relationships), then that data is being stored in _some_ object's fields
or properties, and persisting that data is saving that object's fields
or properties.

In OOP, you have two things: data, and implementation. Serialization
practically always persists only data. And the data is always being
stored in object fields _somewhere_.

>> Sometimes those fields are
>> references to other objects. As long as your serialization method can
>> allow for storing the data and relationships for all of the objects
>> involved, the entire graph can be reconstructed.
>>
>> And all of the serialization methods discussed here allow that.
>>
> That’s not strictly true when it comes to XmlSerializer, since it
> doesn’t describe parent-child relationships?

Of course it does. It requires more work, but nested XML elements are
commonly used with XmlSerializer to describe parent/child relationships.
For more complex graphs, one can instead use a referencing technique
(e.g. each object has a unique identifier, and then that identifier is
used in the serialized XML for any other object that refers to the object).
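A minimal sketch of that referencing technique with XmlSerializer. The `Node` and `Graph` types, the `id` attribute and the `ref` element names are hypothetical choices for illustration, not anything XmlSerializer prescribes: each node is written exactly once in a flat list, other nodes refer to it only by its id, and a second pass after deserialization turns the ids back into shared object references. Because the live references are marked `[XmlIgnore]`, the serializer never walks them, so even a cycle (which XmlSerializer would otherwise reject as a circular reference) can be stored this way.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Serialization;

// Hypothetical node type: carries its own unique Id and refers to other
// nodes only by Id, instead of nesting them.
public class Node
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    public string Name { get; set; }

    // Ids of referenced nodes, serialized as <ref>2</ref> elements.
    [XmlElement("ref")]
    public List<int> RefIds { get; set; } = new List<int>();

    // Live references; never serialized, rebuilt after deserialization.
    [XmlIgnore]
    public List<Node> Refs { get; } = new List<Node>();
}

// Hypothetical root: every node appears exactly once, in a flat list.
public class Graph
{
    public List<Node> Nodes { get; set; } = new List<Node>();

    public string ToXml()
    {
        // Copy the live references into Id lists before writing.
        foreach (Node n in Nodes)
            n.RefIds = n.Refs.Select(r => r.Id).ToList();

        var serializer = new XmlSerializer(typeof(Graph));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, this);
            return writer.ToString();
        }
    }

    public static Graph FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Graph));
        Graph graph;
        using (var reader = new StringReader(xml))
            graph = (Graph)serializer.Deserialize(reader);

        // Second pass: resolve Ids back into shared object references.
        Dictionary<int, Node> byId = graph.Nodes.ToDictionary(n => n.Id);
        foreach (Node n in graph.Nodes)
            foreach (int id in n.RefIds)
                n.Refs.Add(byId[id]);
        return graph;
    }
}
```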

>> That depends entirely on the serialization method. But yes, usually
>> it's expected that the data types are implicit and that the code
>> deserializing the data knows what types to expect.
>>
> What do you mean by “data types being implicit”?

I mean that you cannot tell by inspecting the serialized data what the
types are. For example, if I write the text "17", can you tell me
whether that is supposed to be deserialized as a byte, a short, an int,
a float, a double, a decimal, or a string? No. The data types are
implied by — that is, they are implicit — the serialization method that
was used.

But if I include some meta-data that goes along with the data itself,
and which describes the format of the data, then the data types are
stored _explicitly_ in the serialized data.
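A small illustration of the difference, using a hypothetical `TypeInfoDemo` helper: the implicit reader turns the same text "17" into a byte, short, int, float, double, decimal or string without complaint, while the explicit reader relies on a made-up `TypeName:value` tagging scheme so that the type travels with the data.

```csharp
using System;
using System.Globalization;

public static class TypeInfoDemo
{
    // Implicit: "17" carries no type, so each reading below is equally
    // valid; only the reader's expectations decide which one is "right".
    public static object[] ReadImplicit(string text)
    {
        CultureInfo inv = CultureInfo.InvariantCulture;
        return new object[]
        {
            byte.Parse(text, inv),
            short.Parse(text, inv),
            int.Parse(text, inv),
            float.Parse(text, inv),
            double.Parse(text, inv),
            decimal.Parse(text, inv),
            text // as a string, no parsing needed
        };
    }

    // Explicit: a hypothetical "TypeName:value" format stores the type
    // name next to the data, so the reader no longer has to guess.
    public static object ReadExplicit(string tagged)
    {
        int colon = tagged.IndexOf(':');
        Type type = Type.GetType(tagged.Substring(0, colon), throwOnError: true);
        return Convert.ChangeType(tagged.Substring(colon + 1), type,
            CultureInfo.InvariantCulture);
    }
}
```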

>> That's not to rule out other serialization methods in which the type
>> information is embedded. In fact, it's my recollection that both
>> BinaryFormatter and SoapFormatter include enough type information in the
>> serialized data that you could in fact reconstruct a _container_ type in
>> which all of the data could be stored upon deserialization.
>>
> I’m aware that both BinaryFormatter and SoapFormatter also serialize
> assembly and type names, which enables a .NET application to reconstruct
> an instance of the same type as the one that was serialized. Is that what you
> meant by reconstructing a _container_ type?

No. By "reconstructing a container type", I mean that if you've got
(for example) a class A that looks like this:

class A
{
    public int IntValue { get; set; }
    public byte ByteValue { get; set; }
    public float FloatValue { get; set; }
}

…then one could write code to deserialize that data that actually built
a whole new type at run-time to hold that data. It would be more
complicated, of course, and you'd have to execute an entirely custom
deserialization library rather than using BinaryFormatter (for example).
But it could be done.

>> …include enough type information in the serialized data…
>
> Doesn't the above excerpt somewhat contradict your statement
> that the only information persisted is field/property data:
>
> "Field data (or in many cases more properly, property data) is the
> only thing you have in serialized data."

Not if you understand the context of both statements, no. The point of
the quoted statement was to contrast the idea of reconstructing an
entire object (data and implementation) versus reconstructing just the
data in an object. The question of meta-data stored within the
serialized data is relevant in that context.

If you try to read the statement out of context then yes, you might wind
up confused. :)

>> Note, of course, that the data in an object does not fully define the
>> object. There's all the code that goes with an object as well. Some
>> simple types are just data containers, and those are easy enough to
>> recreate from scratch based on type information embedded in the
>> serialized data. But most interesting types have a wealth of
>> implementation detail, none of which is part of the serialized data.
>>
> Is there a way to also serialize the behaviour of an object (i.e.
> its methods), which could then be de-serialized in some non-.NET
> environment?

There's always a way. That's what makes computers so useful. They can
be programmed to do just about anything (not counting performance
constraints, of course).

For example, one could serialize not just the object data, but also the
implementation itself as MSIL (easy enough to get using reflection).
Then the receiver would "only" have to recompile that IL into something
that could execute on its end.
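For the curious, a sketch of the "easy enough to get using reflection" part, using the real `MethodBase.GetMethodBody()` and `MethodBody.GetILAsByteArray()` APIs; `IlReader` and its `Add` sample method are hypothetical. Note that the bytes alone are not a self-contained program: instructions such as `call` carry metadata tokens that only mean something together with the declaring assembly's metadata, which is part of why the receiving side would have real work to do.

```csharp
using System;
using System.Reflection;

public static class IlReader
{
    // Returns the raw MSIL bytes of a method's body, or null when the
    // method has no body (abstract, extern, and so on).
    public static byte[] GetIl(MethodInfo method)
    {
        MethodBody body = method.GetMethodBody();
        return body?.GetILAsByteArray();
    }

    // Hypothetical sample method whose implementation we might "ship".
    public static int Add(int a, int b)
    {
        return a + b;
    }
}
```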

Of course, that depends somewhat on how you define "non-.NET
environment". Some people might say that any environment that can
successfully compile and execute MSIL must be in fact a .NET
environment. But maybe we stipulate that the deserialization
environment handles only simple IL, without a full .NET library behind
it. Or maybe we use some other computer language to represent the
implementation. Or whatever.

The point is, code is data, and so you can always send code along with
your data. The trick is interpreting the data that is code as well as
the data that is data. Data that is data is usually much easier to deal
with than data that is code. But it's still always possible to deal
with data that is code, if you really want to.

Not that I think this is in any way relevant to the real world. It
would be highly impractical as a general rule to include implementation
as part of the serialization, and for sure there's nothing in .NET that
does that. A key feature of the distinction between code and data is
that one generally has a lot more of the latter than the former. Why
would you incur all the overhead of repeatedly serializing the code as
well as the data? In general, it would always make more sense to just
serialize the data and simply require the deserialization side to
provide its own pre-written implementation.

>> You should use SOAP if you're serializing data to be consumed by some
>> third-party that requires the use of SOAP. I would personally not use
>> it for internal-only serialization,
>
> By internal-only serialization, are you referring to data that gets
> serialized and de-serialized only within the same application (thus
> data that won’t get de-serialized by an external app)?

Yes.

> BTW - my book claims that, strictly speaking, XmlSerializer does not
> persist state using an object graph. XmlSerializer persists field/
> property data and it also indirectly persists relationships between
> objects (via subelements), so why would the book make such a claim?

It's all about context and the exact wording. It's true that
XmlSerializer by default isn't very good at serializing arbitrary object
graphs. Perhaps that's what the book is talking about.

In any case, I have no idea what book you're talking about, nor do I
know the exact text to which you're referring, nor do I have any
particular insight as to why the author of the book might have written
one thing versus another. So no matter what, there's not really any good
way for me to answer a question like that.

Pete

klem s

Oct 5, 2010, 3:41:51 PM
On Oct 4, 2:49 am, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com> wrote:
> klem s wrote:
> >> Field data (or in many cases more properly, property data) is the only
> >> thing you have in serialized data.
>
> > I don’t completely agree with your claim since (at least with
> > BinaryFormatter) there’s additional data stored in the object graph that
> > also describes the parent-child relationship of a persisted object – I
> > wouldn’t consider this information to be field/property data?!
>
> Why not? The graph has to be stored in fields _somewhere_. Most
> commonly, this is in the form of object references within the objects
> that make up the graph themselves. In that case, _obviously_ even the
> graph information being persisted is coming from field or property data.
>
I assume you mean that, for each object in the object graph, one or more
additional fields are serialized that describe the object’s dependencies/
references? Thus, if each object in the object graph is assigned a unique
numerical value, then the information in these additional fields somehow
describes how an object is related to other objects in the graph?

And thus, if the app de-serializing this object graph knows how to
interpret these additional fields, it can correctly reconstruct the
object graph?


>
> >> Sometimes those fields are
> >> references to other objects. As long as your serialization method can
> >> allow for storing the data and relationships for all of the objects
> >> involved, the entire graph can be reconstructed.
>
> >> And all of the serialization methods discussed here allow that.
>
> > That’s not strictly true when it comes to XmlSerializer, since it
> > doesn’t describe parent-child relationships?
>
> Of course it does. It requires more work, but nested XML elements are
> commonly used with XmlSerializer to describe parent/child relationships.

I understand your point that with some effort we can also serialize
relationships between objects into XML, but I assume that in the next example
XmlSerializer doesn’t store the parent-child relationship, since no
sub-element is created for type A:

XmlSerializer xmlFormat = new XmlSerializer(typeof(B),
    new Type[] { typeof(A), typeof(D) });

class A {}
class B : A { D d = new D(); }
class D {}


> >> That depends entirely on the serialization method. But yes, usually
> >> it's expected that the data types are implicit and that the code
> >> deserializing the data knows what types to expect.
>
> > What do you mean by “data types being implicit”?
>
> I mean that you cannot tell by inspecting the serialized data what the
> types are. For example, if I write the text "17", can you tell me
> whether that is supposed to be deserialized as a byte, a short, an int,
> a float, a double, a decimal, or a string? No. The data types are
> implied by — that is, they are implicit — the serialization method that
> was used.
>
> But if I include some meta-data that goes along with the data itself,
> and which describes the format of the data, then the data types are
> stored _explicitly_ in the serialized data.
>
> >> That's not to rule out other serialization methods in which the type
> >> information is embedded. In fact, it's my recollection that both
> >> BinaryFormatter and SoapFormatter include enough type information in the
> >> serialized data that you could in fact reconstruct a _container_ type in
> >> which all of the data could be stored upon deserialization.
>

So BinaryFormatter and SoapFormatter also somehow serialize
information which describes the types of serialized fields. But do the
two include type information only for primitive types?

> > I’m aware that both BinaryFormatter and SoapFormatter also serialize
> > assembly and type names, which enables a .NET application to reconstruct
> > an instance of the same type as the one that was serialized. Is that what you
> > meant by reconstructing a _container_ type?
>
> No. By "reconstructing a container type", I mean that if you've got
> (for example) a class A that looks like this:
>
> class A
> {
>     public int IntValue { get; set; }
>     public byte ByteValue { get; set; }
>     public float FloatValue { get; set; }
> }
>
> …then one could write code to deserialize that data that actually built
> a whole new type at run-time to hold that data.

Slightly off topic, but are you saying that .NET (and perhaps some
other frameworks too) enables us to define a new class at runtime?

thank you

Peter Duniho

Oct 5, 2010, 11:24:22 PM
klem s wrote:
>> Why not? The graph has to be stored in fields _somewhere_. Most
>> commonly, this is in the form of object references within the objects
>> that make up the graph themselves. In that case, _obviously_ even the
>> graph information being persisted is coming from field or property data.
>>
> I assume you mean that, for each object in the object graph, one or more
> additional fields are serialized that describe the object’s dependencies/
> references? Thus, if each object in the object graph is assigned a unique
> numerical value, then the information in these additional fields somehow
> describes how an object is related to other objects in the graph?
>
> And thus, if the app de-serializing this object graph knows how to
> interpret these additional fields, it can correctly reconstruct the
> object graph?

I don't think you are quite getting my precise point. Consider this
example:

class A
{
    public string Name { get; set; }
    public B B { get; set; }
}

class B
{
    public string Name { get; set; }
}

Also suppose that the properties are initialized as A.Name = "A.Name", and
A.B as an instance of B with B.Name = "B.Name".

Then this may be serialized like this:

<A>
  <Name>A.Name</Name>
  <B>
    <Name>B.Name</Name>
  </B>
</A>

Nothing fancy. Just using the normal object relationships.
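A sketch of code that produces exactly that XML with XmlSerializer, assuming the two classes are made `public` (XmlSerializer only handles public types). The empty `XmlSerializerNamespaces` entry suppresses the default xsi/xsd namespace declarations, and `OmitXmlDeclaration` drops the `<?xml ...?>` line; `NestedXml.Write` is a hypothetical helper name.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class A
{
    public string Name { get; set; }
    public B B { get; set; }
}

public class B
{
    public string Name { get; set; }
}

public static class NestedXml
{
    // Serializes an A (and the B it refers to, as a nested element)
    // without the XML declaration and without xsi/xsd namespace attributes.
    public static string Write(A a)
    {
        var serializer = new XmlSerializer(typeof(A));
        var ns = new XmlSerializerNamespaces();
        ns.Add("", "");
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            Indent = true
        };
        using (var sw = new StringWriter())
        {
            using (XmlWriter xw = XmlWriter.Create(sw, settings))
            {
                serializer.Serialize(xw, a, ns);
            }
            return sw.ToString();
        }
    }
}
```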

Of course, if you have a more complicated object graph, it may not be
sufficient to just represent a parent/child relationship. Then each
object may wind up with some kind of unique ID, if not inherent in the
data then generated by the serialization process. With that unique ID,
one can then include object references in the serialized data, rather
than actually enclosing one object in its parent.

But either way, the relationship is maintained, and that information
represents fields in the objects that store the object references
themselves.
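One way a serializer can generate such IDs is to track objects by identity as it walks the graph. The sketch below is hypothetical (its line format is made up, and it makes no claim to mirror BinaryFormatter's actual wire format): the first time an object is seen it gets a new number and its data is written; every later occurrence is written only as a reference to that number. Identity comparison matters here, since two distinct objects with equal contents must still get different IDs.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Text;

// Compares objects by identity (reference), ignoring any Equals override.
public sealed class IdentityComparer : IEqualityComparer<object>
{
    bool IEqualityComparer<object>.Equals(object x, object y) => ReferenceEquals(x, y);
    int IEqualityComparer<object>.GetHashCode(object obj) => RuntimeHelpers.GetHashCode(obj);
}

public static class ToyGraphWriter
{
    // Hypothetical line format: first sighting of an object is written as
    // "obj <id> <payload>", every later sighting as "ref <id>".
    public static string Write(IEnumerable<object> items)
    {
        var ids = new Dictionary<object, int>(new IdentityComparer());
        var sb = new StringBuilder();
        foreach (object item in items)
        {
            if (ids.TryGetValue(item, out int id))
            {
                sb.AppendLine("ref " + id);
            }
            else
            {
                id = ids.Count + 1;
                ids.Add(item, id);
                sb.AppendLine("obj " + id + " " + item);
            }
        }
        return sb.ToString();
    }
}
```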

>> Of course it does. It requires more work, but nested XML elements are
>> commonly used with XmlSerializer to describe parent/child relationships.
>
> I understand your point that with some effort we can also serialize
> relationships between objects into XML, but I assume that in the next example
> XmlSerializer doesn’t store the parent-child relationship, since no
> sub-element is created for type A:
>
> XmlSerializer xmlFormat = new XmlSerializer(typeof(B),
>     new Type[] { typeof(A), typeof(D) });
>
> class A {}
> class B : A { D d = new D(); }
> class D {}

You assume incorrectly.

First, you need to understand that the class A is not a child of the
class B. Thus, there is no parent/child relationship to emit.

Second, if in your example the class B _did_ have a public member
referencing a child object, XmlSerializer would in fact emit that child
object as part of the serialized data, by representing it as a nested
element within the XML.

For example:

using System;
using System.IO;
using System.Xml.Serialization;

namespace TestXmlSerializer
{
    public class Base
    {
        public int baseInt = 31;
    }

    public class Derived : Base
    {
        public Other other = new Other();
    }

    public class Other
    {
        public int i = 17;
    }

    class Program
    {
        static void Main(string[] args)
        {
            XmlSerializer serializer = new XmlSerializer(typeof(Derived));

            using (StringWriter writer = new StringWriter())
            {
                serializer.Serialize(writer, new Derived());
                Console.WriteLine(writer.ToString());
            }

            Console.ReadLine();
        }
    }
}

That code emits the following XML:

<?xml version="1.0" encoding="utf-16"?>
<Derived xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <baseInt>31</baseInt>
  <other>
    <i>17</i>
  </other>
</Derived>

Note that the instance of the Other class, in the field "other", is
included as a nested element of the Derived object. Note also that the
public member in the base class Base is emitted, not inside some nested
"Base" element, but simply as a normal member of the "Derived"
element, just as it is in the language object model.

> So BinaryFormatter and SoapFormatter also somehow serialize
> information which describes the types of serialized fields. But do the
> two include type information only for primitive types?

Assuming my recollection is correct, they would include type information
for member fields exactly as they do for the objects being serialized.
It would support all types, not just "primitive" ones (noting that in
any case, in .NET it's very difficult to pin down what a "primitive"
type really would be…there are built-in types, but that doesn't make
them any more "primitive" than user-defined types).

> Slightly off topic, but are you saying that .NET (and perhaps some
> other frameworks too) enables us to define a new class at runtime?

Yes. Dynamic code generation is not actually an unusual feature, and
.NET includes multiple mechanisms for accomplishing that. It's used
internally in XmlSerializer and the System.Text.RegularExpressions.Regex
class, for example, and it's available to client code as well.
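A minimal sketch of one such mechanism, System.Reflection.Emit, building a container type along the lines of the earlier class A at run time. It assumes a .NET version that has the static `AssemblyBuilder.DefineDynamicAssembly` method (.NET Framework 4.5 and later, .NET Core/.NET 5+); `ContainerBuilder` is a hypothetical name, and it emits public fields rather than properties, since property getters and setters would require emitting IL method bodies as well.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class ContainerBuilder
{
    // Emits a new public class with a public parameterless constructor and
    // one public field per (name, type) pair, e.g. to hold deserialized data.
    public static Type Build(string typeName,
        params (string Name, Type FieldType)[] fields)
    {
        // A run-only dynamic assembly: it lives in memory and is never saved.
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DynamicContainers"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("DynamicContainers");
        TypeBuilder builder = module.DefineType(typeName,
            TypeAttributes.Public | TypeAttributes.Class);

        foreach (var field in fields)
        {
            builder.DefineField(field.Name, field.FieldType, FieldAttributes.Public);
        }

        builder.DefineDefaultConstructor(MethodAttributes.Public);
        return builder.CreateType();
    }
}
```

Usage, roughly: pass the field names and types read from the serialized data, then fill an instance through reflection, e.g. `Activator.CreateInstance(type)` followed by `type.GetField("IntValue").SetValue(obj, 17)`.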

Pete

klem s

Oct 7, 2010, 8:14:52 PM
Ok, I've already posted two replies, but for some reason Google Groups
doesn't display them. I will try yet again and hopefully this time it
will work.

On Oct 6, 5:24 am, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com> wrote:
> klem s wrote:
> >> Why not?  The graph has to be stored in fields _somewhere_.  Most
> >> commonly, this is in the form of object references within the objects
> >> that make up the graph themselves.  In that case, _obviously_ even the
> >> graph information being persisted is coming from field or property data.
>
> > I assume you mean that, for each object in the object graph, one or more
> > additional fields are serialized that describe the object’s dependencies/
> > references? Thus, if each object in the object graph is assigned a unique
> > numerical value, then the information in these additional fields somehow
> > describes how an object is related to other objects in the graph?
>
> > And thus, if the app de-serializing this object graph knows how to
> > interpret these additional fields, it can correctly reconstruct the
> > object graph?
>
> I don't think you are quite getting my precise point.  

I should have been more specific in my post; I was actually asking more
about how dependencies are stored when serializing via
BinaryFormatter. Knowing that, is my assumption about how dependencies
are serialized via BinaryFormatter more or less correct?

I already knew all of that :). What I was pointing out is that Derived
class is a child of Base class, but this parent child relationship
isn’t recorded in the XML. Thus, just by looking at the resulting XML one
would never figure out that Derived is a child of Base.

BTW - I realize I was wrong when suggesting we should create
subelement A to indicate the parent-child relationship, since subelements
indicate members of Derived and not parents/children – I had one of my
brain fart episodes and as such wasn’t thinking clearly.

thank you

Peter Duniho

Oct 8, 2010, 11:10:07 PM
klem s wrote:
> I should have been more specific in my post; I was actually asking more
> about how dependencies are stored when serializing via
> BinaryFormatter. Knowing that, is my assumption about how dependencies
> are serialized via BinaryFormatter more or less correct?

I haven't looked at the specific data format for BinaryFormatter.
Object dependencies could be handled in a variety of ways.

However, a simple test reveals that BinaryFormatter almost certainly
deals with object references in a way like what you're asking about.

Specifically, look at this code:


using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

namespace TestBinaryFormatter
{
    [Serializable]
    public class Child
    {
        public byte[] Data { get; set; }

        public Child(int dataSize)
        {
            Data = new byte[dataSize];
        }
    }

    [Serializable]
    public class Container
    {
        public List<Child> Children { get; set; }

        public Container(int count, int dataSize)
        {
            Child child = new Child(dataSize);

            Children = new List<Child>(count);

            while (count-- > 0)
            {
                Children.Add(child);
            }
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            int[] sizes = { 10, 100, 1000 };
            int[] counts = { 1, 100, 10000 };
            MemoryStream stream = new MemoryStream();
            BinaryFormatter formatter = new BinaryFormatter();

            foreach (int size in sizes)
            {
                foreach (int count in counts)
                {
                    Container container = new Container(count, size);

                    stream.SetLength(0);
                    formatter.Serialize(stream, container);

                    Console.WriteLine("Serialized data length: {0}",
                        stream.Length);
                }
            }

            Console.ReadLine();
        }
    }
}

If you run the code, you'll find that the amount of data directly
correlates to the number of entries in the container's collection, but
changes very little according to the size of the child object being
contained.

If the child object were being copied, as it would be in XmlSerializer,
the data sizes for the output data would grow dramatically not just as
the number of child objects goes up, but also as the size of the child
object itself goes up.

So, the child object must just be serialized once, and then some kind of
reference to it stored within the containing object's serialized data.
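The XmlSerializer half of that comparison is easy to check directly. In this sketch, with hypothetical `Item` and `Holder` types, one `Item` instance is added to a list three times: the XML contains three separate `<Item>` elements, and after deserialization the list holds three distinct objects rather than one shared one.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public class Item
{
    public int Value { get; set; }
}

public class Holder
{
    public List<Item> Items { get; set; } = new List<Item>();
}

public static class SharedReferenceDemo
{
    // Serializes the holder to XML, then reads that XML back.
    public static Holder RoundTrip(Holder holder, out string xml)
    {
        var serializer = new XmlSerializer(typeof(Holder));
        using (var sw = new StringWriter())
        {
            serializer.Serialize(sw, holder);
            xml = sw.ToString();
        }
        using (var sr = new StringReader(xml))
        {
            return (Holder)serializer.Deserialize(sr);
        }
    }
}
```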

> I already knew all of that :).

Define "that". Because this statement:

> What I was pointing out is that Derived
> class is a child of Base class,

…is definitely not true. The Derived class is a _sub-class_ of the Base
class. It has no parent/child relationship. Any instance of Derived is
simply also an instance of Base, by virtue of Derived having inherited that class.

> but this parent child relationship
> isn’t recorded in the XML. Thus, just by looking at the resulting XML one
> would never figure out that Derived is a child of Base.

There's no need for the XML, or any serialization format, to preserve
that information.

> BTW - I realize I was wrong when suggesting we should create
> subelement A to indicate the parent-child relationship, since subelements
> indicate members of Derived and not parents/children – I had one of my
> brain fart episodes and as such wasn’t thinking clearly.

You seem to be changing your definition of "parent/child relationship".
In general, we use "parent" and "child" to refer to completely
different objects that have a specific referencing relationship.
Specifically, the parent is an object that references the child in some
type of hierarchy (typically a tree data structure, which is a form of
graph).

Pete

klem s

Oct 9, 2010, 3:27:52 PM
On Oct 9, 5:10 am, Peter Duniho <NpOeStPe...@NnOwSlPiAnMk.com> wrote:

Uh, I somehow “missed” (or forgot about) the following
statement from your previous post; otherwise I’d have already addressed
the issue in my previous post:

> > First, you need to understand that the class A is not a child of the
> > class B. Thus, there is no parent/child relationship to emit.

> [...]


> > I already knew all of that :).
>
> Define "that".  Because this statement:
>
> > What I was pointing out is that Derived
> > class is a child of Base class,
>
> …is definitely not true.  The Derived class is a _sub-class_ of the Base
> class.  It has no parent/child relationship.  Any instance of Derived is
> simply also an instance of Base, by virtue of Derived having inherited that class.
>

It was only in your last post that I noticed that you use the term
parent/child to describe a different type of relationship (i.e. the
relationship between a class and its members), but if I’m not
mistaken, the parent/child term also applies to the base/derived class
relationship?!

Namely, the book I’m reading often uses this term to describe the relationship
between the base class and the derived class. I’ve also seen quite a
few articles on the net using the term parent/child when talking about
base/derived classes.

> > but this parent child relationship
> > isn’t recorded in the XML. Thus, just by looking at the resulting XML one
> > would never figure out that Derived is a child of Base.
>
> There's no need for the XML, or any serialization format, to preserve
> that information.

Why wouldn't it be important to also persist base/derived class
relationship?


>
> > BTW - I realize I was wrong when suggesting we should create
> > subelement A to indicate the parent-child relationship, since subelements
> > indicate members of Derived and not parents/children – I had one of my
> > brain fart episodes and as such wasn’t thinking clearly.
>
> You seem to be changing your definition of "parent/child relationship".

Not really. I always associated the term with the base/derived class
relationship. In this thread, the only time I associated the term
with a different kind of relationship (and even then only implicitly) was
with the statement “I already knew all of that”, which was a response to
your explanation that all public members get persisted by
XmlSerializer (where you used the term to describe the class/member
relationship).


>   In general, we use "parent" and "child" to refer to completely
> different objects that have a specific referencing relationship.
> Specifically, the parent is an object that references the child in some
> type of hierarchy (typically a tree data structure, which is a form of
> graph).
>

So parent/child should be used to describe the class/member relationship?

So throughout the entire thread when we talked about parent/child
relationships, I was thinking of base/derived class relationship and
you of class/members relationships?!

thank you

Peter Duniho

Oct 9, 2010, 3:43:39 PM
klem s wrote:
> [...]

> It was only in your last post that I noticed that you use the term
> parent/child to describe a different type of relationship (i.e. the
> relationship between a class and its members), but if I’m not
> mistaken, the parent/child term also applies to the base/derived class
> relationship?!

It can. But it's not something I'd advise, due to the potential for
confusion with "parent/child" in the context of complex data structures.
The concept of "parent/child" is well-known and applicable even
outside of OOP. To reuse the term for a completely different concept
within the context of OOP, especially when we already have a perfectly
workable concept of "base/sub-class" or even "super/sub-class", is to
simply invite confusion.

> Namely, the book I’m reading often uses this term to describe the relationship
> between the base class and the derived class. I’ve also seen quite a
> few articles on the net using the term parent/child when talking about
> base/derived classes.

I've never seen it used regularly in the context of C#, Java, or C++. I
have seen it used in the context of PHP, and in fact as near as I can
tell that usage is "baked in" to the language (i.e. the language itself
actually uses the word "parent" to describe the base class).

I think it's particularly dangerous to use "parent/child" to refer to
_both_ the class hierarchy and the object hierarchy in the same
discussion. And since we have definitely been talking about
parent/child objects in an object hierarchy, we definitely should not
also use the phrase "parent/child" to describe a class hierarchy.

>>> but this parent child relationship
>>> isn’t recorded in the XML. Thus, just by looking at the resulting XML one
>>> would never figure out that Derived is a child of Base.
>> There's no need for the XML, or any serialization format, to preserve
>> that information.
>
> Why wouldn't it be important to also persist base/derived class
> relationship?

Because that's an implementation detail and serialization does not
generally concern itself with implementation (as noted previously).

All you really care about are the data that are included in an instance
of the object. That's what the serialization is doing for you. There's
only one instance of the object, and it has some data. There's no
reason to care whether the data is part of that object because of
members declared in a base class or a derived class. It's all just data.

> [...]


> So throughout the entire thread when we talked about parent/child
> relationships, I was thinking of base/derived class relationship and
> you of class/members relationships?!

It would seem so.

Pete

klem s

Oct 9, 2010, 4:50:24 PM
much appreciated