
How to convert object to XML string and back again


ap...@student.open.ac.uk


Dec 18, 2007, 11:16:31 AM

As a thought experiment I am considering how I would recode a Java
program I have been working on in C++. This is an interesting
experiment for me because the program runs in an environment where a
number of enterprise technologies are in use, so it shows how Java
and C++ differ in their support for enterprise technologies. I stress
it is only a thought experiment. I will NOT do the coding; in fact the
project is trying to migrate away from C++, which is why I have
rewritten the servers in Java. The experience is making me think that
enterprise programming is easier in Java because of all the packages
that have been developed over the years, some standard, some
non-standard but widely used open source.

The program converts objects to XML strings and back again using the
reflection-based package XStream. I am wondering how one can do a
similar job in C++. I realise there is no reflection but I don't mind
a bit of manual work. In C++ I expect that most solutions (except
those that use macros) will involve writing save and restore methods
for each private member that is to be serialized.

I have not done much XML work in C++ but when I have I remember it
being very painful. I have used Xerces and never again! Gnome's libxml
is much better IMO but I have not used it to convert objects to XML
strings and back again, only to traverse the DOM. libxml is not too
bad for that, but can it also be used for stringyfying objects? If not
then I wonder what people use.

Regards,

Andrew Marlow

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Le Chaud Lapin


Dec 19, 2007, 3:10:27 PM

I use serialization. It works.

As far as serializing strings, that's possible: It is not
inconceivable to make a serialization framework where the target of
serialization converts types and values to a string encoding of those
types and values [int:9107], storing them in an associative polyarchy
(my terminology) in an XML-like format, but not using XML, as XML is
grossly overrated. [XML is one of those things that some engineers
find insanely appealing without really understanding what about it
exactly is so appealing, which leads to inflated expectations of what
it can do for them, which is most often much less than they think.]

In any case, Java programmers, and programmers in some other
"interpreted" languages, should know by now that it is farcical to
make comparisons between C++ and Java. C++ is a language whose target
code is interpreted by a CPU. Java is a language whose target code is
interpreted by target code that is interpreted by a CPU. The
intermediate layer, the JVM, or Java diaper as some of us like to call
it, creates an environment that would be strange to C++, but is a
necessity for Java, and this inseparability leads one to ask whether
Java is a language or a language+environment.

On the matter of XML, I have been admonishing other engineers for
years that there will never be a system that magically converts from
XML to C++, or vice-versa, because the very proposition is senseless.
Most of them don't listen....there are 1000's of programmers around
the world today banging their heads trying to find a "breakthrough"
method for doing this. They think there is something magical about
XML, not realizing that the magic comes from a special computer, the
one sitting at the top of their neck, and that the other computer, the
one with 500 million transistors, not being artificially-intelligent,
is horribly inept at magic, and only does things by-the-book, and will
never be able to have a conversation of any kind with anything,
especially strings on disks, without gross changes to the C++
compiler, ultimately under the control of a human who has deliberate
and specified intent.

I get asked about once every 3-4 months by a Java/C#/C++-Convert
programmer if I know of a good package to do XML to C++ conversion. I
respond by asking them an equally senseless, equally irritating
question.

-Le Chaud Lapin-

softwa...@gmail.com


Dec 19, 2007, 7:56:28 PM

On Dec 19, 3:10 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:

> I get asked about once every 3-4 months by a Java/C#/C++-Convert
> programmer if I know of a good package to do XML to C++ conversion. I
> respond by asking them an equally senseless, equally irritating
> question.
>
> -Le Chaud Lapin-

It may not be up to the poster, they could be communicating with some
other app/system/library that needs xml for something. There are many
things about our jobs that we have no control over.

I believe that, with somewhat more work than you would need in a
language with reflection, you can use Boost Serialization:
http://www.boost.org/libs/serialization/doc/index.html
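To give a feel for the "more work" involved: Boost Serialization has each class list its members once, in a `template <class Archive> void serialize(Archive& ar, const unsigned int version)` member. XML archives wrap each member in `BOOST_SERIALIZATION_NVP`, and that one function drives both saving and loading. The sketch below copies only the shape of that pattern and uses nothing but the standard library. `XmlOut`, `XmlIn`, `field` and `Employee` are hypothetical names, not Boost's API. There is no version parameter, no escaping, no nesting and no real XML parsing.

```cpp
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical output archive: each field becomes <name>value</name>.
class XmlOut {
public:
    template <class T>
    XmlOut& field(const char* name, const T& value) {
        out_ << '<' << name << '>' << value << "</" << name << '>';
        return *this;
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// Hypothetical input archive: finds <name>...</name> and parses the text.
// A missing field leaves the member unchanged. Not a real XML parser.
class XmlIn {
public:
    explicit XmlIn(std::string text) : text_(std::move(text)) {}
    template <class T>
    XmlIn& field(const char* name, T& value) {
        const std::string open = std::string("<") + name + ">";
        const std::string close = std::string("</") + name + ">";
        const auto b = text_.find(open);
        if (b == std::string::npos) return *this;
        const auto start = b + open.size();
        const auto e = text_.find(close, start);
        if (e == std::string::npos) return *this;
        const std::string raw = text_.substr(start, e - start);
        if constexpr (std::is_same_v<T, std::string>) {
            value = raw;  // keep embedded spaces
        } else {
            std::istringstream in(raw);
            in >> value;
        }
        return *this;
    }
private:
    std::string text_;
};

struct Employee {
    std::string name;
    int age = 0;
    double salary = 0.0;

    // The hand-written part: one function listing every member once,
    // used in both directions because both archives offer field().
    template <class Archive>
    void serialize(Archive& ar) {
        ar.field("name", name).field("age", age).field("salary", salary);
    }
};
```

Adding a member to `Employee` still means touching `serialize`. That is the per-class manual work the original post expected.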

Nominal Pro


Dec 20, 2007, 11:49:53 AM

On Dec 18, 10:16 am, ap...@student.open.ac.uk wrote:

> The program converts objects to XML strings and back again using the
> reflection-based package XStream. I am wondering how one can do a
> similar job in C++. I realise there is no reflection but I don't mind
> a bit of manual work. In C++ I expect that most solutions (except
> those that use macros) will involve writing save and restore methods
> for each private member that is to be serialized.

Right. From the C++ language point-of-view, serializing and
deserializing to and from XML is conceptually no different from
serializing to and from any stream format, except you would have to
implement the code to format and parse the XML. You can obviously do
that with the aid of a C++ XML library. I have never used Xerces, but
MSXML has SAX and DOM support. I can't say that it is pretty or easy
to use, but once you figure it out, it works well.

> I have not done much XML work in C++ but when I have I remember it
> being very painful. I have used Xerces and never again! Gnome's libxml
> is much better IMO but I have not used it to converts objects to XML
> strings and back again, only to traverse the DOM. libxml is not too
> bad for that, but can it also be used for stringyfying objects? If not
> then I wonder what people use.

How would you "stringyfy" objects in a C++ program at runtime using code
in some library without full reflection and introspection capabilities
in the language? I don't think it can be done (and if anyone says
libxml can do it on a C++ program, I'd love to know how). However,
there are some compile-time solutions that could be created:

1. Roll your own metaobjects. These would be C++ classes that describe
other C++ classes (the ones you want to "stringyfy"). A
"stringyfication" function could use the metaobject to generate the
serialize and deserialize functions.

2. Preprocess the header files. Another way to get at the type
information for the purposes of generating "stringyfication" code is
to preprocess the header files that contain the class definitions that
you want to "stringyfy" and extract all the metadata from that.
However, this is no trivial task; if you could do it, you could almost
implement your own C++ compiler, in which case, see #3. I've sometimes
speculated as to whether an old "cfront" preprocessor from the early
days of C++ could be updated to generate the necessary
"stringyfication" functions.

3. Implement your own C++ compiler. You can generate serialize/
deserialize functions for every C++ type.

4. Preprocess a type definition language (TDL) file--which could be
expressed in XML--and generate the headers containing the C++ type
declaration as well as the "stringyfication" functions generated from
the same TDL file.
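A cheap, in-language approximation of #4 keeps the "type definition" as an X-macro list and expands it several times: once into the struct declaration and once each into the writer and reader. This swaps the external TDL file for the preprocessor, so it is a different and more limited technique than a real TDL generator. `POINT3_FIELDS`, `Point3`, `to_xml`, `read_field` and `from_xml` are illustrative names.

```cpp
#include <sstream>
#include <string>

// The single "type definition": a list of (type, name) pairs.
#define POINT3_FIELDS(X) \
    X(double, x)         \
    X(double, y)         \
    X(double, z)

// Expansion 1: the struct declaration.
struct Point3 {
#define DECLARE_FIELD(type, name) type name{};
    POINT3_FIELDS(DECLARE_FIELD)
#undef DECLARE_FIELD
};

// Expansion 2: the writer. #name turns the member name into its tag text.
inline std::string to_xml(const Point3& p) {
    std::ostringstream out;
    out << "<Point3>";
#define WRITE_FIELD(type, name) out << "<" #name ">" << p.name << "</" #name ">";
    POINT3_FIELDS(WRITE_FIELD)
#undef WRITE_FIELD
    out << "</Point3>";
    return out.str();
}

// Reader helper: parse the text between <tag> and </tag>, if present.
template <class T>
void read_field(const std::string& xml, const std::string& tag, T& value) {
    const std::string open = "<" + tag + ">", close = "</" + tag + ">";
    const auto b = xml.find(open);
    if (b == std::string::npos) return;  // missing: keep the default
    const auto start = b + open.size();
    const auto e = xml.find(close, start);
    if (e == std::string::npos) return;
    std::istringstream in(xml.substr(start, e - start));
    in >> value;
}

// Expansion 3: the reader, generated from the same list.
inline Point3 from_xml(const std::string& xml) {
    Point3 p;
#define READ_FIELD(type, name) read_field(xml, #name, p.name);
    POINT3_FIELDS(READ_FIELD)
#undef READ_FIELD
    return p;
}
```

Adding a field to `POINT3_FIELDS` updates the declaration, writer and reader together. Nested types, containers and schema changes still need hand-written code.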

These are all generative approaches that require that the
"stringyfication" code be generated at compile time, except for #1,
which could be made to operate at runtime using the metaobjects.
Expert C++ programmers might still be able to fashion a runtime
reflection mechanism that would work on any arbitrary C++ compiled
program by parsing debugging symbols either embedded in the executable
file, or in a separate symbol file.
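Approach #1 is the one that can work at run time. The sketch below assumes a hand-written metaobject per class that records each member's name and a pointer-to-member. One generic pair of functions then handles every class that has such a description. `FieldInfo`, `MetaObject`, `field`, `Account` and the tag format are all hypothetical. Values are parsed with `operator>>`, so text values containing spaces are not handled.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Runtime description of one member of class C: its name plus
// type-erased functions to render it as text and parse it back.
template <class C>
struct FieldInfo {
    std::string name;
    std::function<std::string(const C&)> get;
    std::function<void(C&, const std::string&)> set;
};

// Builds a FieldInfo from a pointer-to-member; T is deduced.
template <class C, class T>
FieldInfo<C> field(std::string name, T C::*member) {
    return {std::move(name),
            [member](const C& obj) {
                std::ostringstream out;
                out << obj.*member;
                return out.str();
            },
            [member](C& obj, const std::string& text) {
                std::istringstream in(text);
                in >> obj.*member;  // whitespace-delimited values only
            }};
}

// The metaobject: a class name plus its described members.
template <class C>
struct MetaObject {
    std::string class_name;
    std::vector<FieldInfo<C>> fields;
};

// Generic code, driven entirely by the metaobject at run time.
template <class C>
std::string to_xml(const C& obj, const MetaObject<C>& meta) {
    std::string xml = "<" + meta.class_name + ">";
    for (const auto& f : meta.fields)
        xml += "<" + f.name + ">" + f.get(obj) + "</" + f.name + ">";
    return xml + "</" + meta.class_name + ">";
}

template <class C>
void from_xml(C& obj, const std::string& xml, const MetaObject<C>& meta) {
    for (const auto& f : meta.fields) {
        const std::string open = "<" + f.name + ">", close = "</" + f.name + ">";
        const auto b = xml.find(open);
        if (b == std::string::npos) continue;  // missing: leave member as-is
        const auto start = b + open.size();
        const auto e = xml.find(close, start);
        if (e != std::string::npos) f.set(obj, xml.substr(start, e - start));
    }
}

// Example class and its hand-written metaobject.
struct Account {
    std::string owner;
    int balance = 0;
};

inline const MetaObject<Account>& account_meta() {
    static const MetaObject<Account> meta{
        "Account", {field("owner", &Account::owner), field("balance", &Account::balance)}};
    return meta;
}
```

The manual work moves from per-class save/restore code into one short metaobject per class. The generic functions never change.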

Le Chaud Lapin


Dec 20, 2007, 11:49:35 AM

{ it seems the discussion veers away from C++ toward XML. please
consider bringing it back or wrapping it up. thanks. -mod }

On Dec 19, 6:56 pm, "softwared...@gmail.com" <softwared...@gmail.com>
wrote:

> On Dec 19, 3:10 pm, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
>
> > I get asked about once every 3-4 months by a Java/C#/C++-Convert
> > programmer if I know of a good package to do XML to C++ conversion. I
> > respond by asking them an equally senseless, equally irritating
> > question.
>
> > -Le Chaud Lapin-
>
> It may not be up to the poster, they could be communicating with some
> other app/system/library that needs xml for something. There are many
> things about our jobs that we have no control over.

True.

That's why it is so important that, as engineers, we remain critical of
new technology. It's ok for non-engineers (those who manage engineers
notwithstanding) to feed the hype monster, but it's not ok for us.

If we were having a discussion about cold-fusion or protein folding, I
would be more forgiving, as those concepts are difficult and
inaccessible to the average engineer, but XML is very low-tech. There
is no heinous math, no requirement of mental manipulation in 10
dimensions, no deep knowledge of physics required, or need for insight
into the theory of computation.

No, it's just a vague prescription of how to organize state encoded as
an associative polyarchy of strings, a data format, and most XML fans
blindly follow this format because it implicitly acknowledges a
virtue of systems engineering, the hierarchy, while simultaneously
being interpretable by human beings.

A small amount of critical thinking, lasting for perhaps 30 minutes,
should be sufficient for the average C++ programmer to arrive at the
conclusion that there is no magic. And even without such critical
thinking, the first time a programmer tries to get their C++ objects
to magically put themselves to and from disk without human
intervention using arbitrary encodings of state (string or not), the
severe intellectual pinching that results should be an indicator that
they are embarking upon the path of futility.

Instead, we have the very situation that you describe: 100,000's of
programmers, staring at their XML thinking, "Hmm....I'm looking at
this XML, and it is patently obvious to me what it means and what its
structure is....so why is it so hard for me to develop a framework so
that the code can know what it means?"

The answer is simple:

People can read. Computers cannot.

-Le Chaud Lapin-

Rune Allnor


Dec 20, 2007, 4:03:57 PM

{ this discussion has turned away from C++ and into one about the merits
of, or illusions associated with, XML. please bring it back on topic
or take it somewhere else. thanks. -mod }

On 19 Dec, 21:10, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> On Dec 18, 10:16 am, ap...@student.open.ac.uk wrote:

> > The program converts objects to XML strings and back again using the
> > reflection-based package XStream. I am wondering how one can do a
> > similar job in C++. I realise there is no reflection but I don't mind
> > a bit of manual work. In C++ I expect that most solutions (except
> > those that use macros) will involve writing save and restore methods
> > for each private member that is to be serialized.

> [XML is one of those things that some engineers
> find insanely appealing without really understanding what about it
> exactly is so appealing, which leads to inflated expectations of what
> it can do for them, which is most often much less than they think.]

Maybe I am naive, but the below is completely typical
for data files in what used to be my field of work
(cut 'n' paste from page 21,
http://acoustics.mit.edu/faculty/henrik/oases.pdf):

SAFARI-FIP case 3. Poroelastic.
N C A D J
30 30 1 0
5
0 0 0 0 0 0 0
0 1500 -999.9990 0 1 0# SVPcontinuous at z = 30 m
30 1480 -1490 0 0 1 0
100 -1 -1 0 0 0 0 0 # Cp<0 Cs<0 flag poro-elastic layer
1 2.E9 .001 2.65 9.E9 .4 2.E-9 1.E-5 3.13E8 5.14E9 .8 1.55 1.25
120 1800 600 0.1 0.2 2.0 0
50
0.1 120 41 40
1350 1E8
-1 1 950
0 5 20 1
20 80 12 10
0 120 12 20
40 70 6

XML would be a huge improvement on this state of affairs.
I even find XML encoding of PDF, Word or Excel files
appealing, as it allows me to parse such documents
for information far more efficiently than without XML.

Did I misunderstand something about XML? Or are there,
in your opinion, better ways of achieving the same
functionality? Preferably with C++, but that's not
essential.

Rune

Alex


Dec 20, 2007, 8:59:52 PM

On Dec 20, 11:49 am, Nominal Pro <majorsc...@gmail.com> wrote:

> 2. Preprocess the header files. Another way to get at the type
> information for the purposes of generating "stringyfication" code is
> to preprocess the header files that contain the class definitions that
> you want to "stringyfy" and extract all the metadata from that.
> However, this is no trivial task; if you could do it, you could almost
> implement your own C++ compiler, in which case, see #3. I've sometimes
> speculated as to whether an old "cfront" preprocessor from the early
> days of C++ could be updated to generate the necessary
> "stringyfication" functions.

POCO has its own parser for header files that we use to generate docs:

http://poco.svn.sourceforge.net/viewvc/poco/sandbox/CppParser/

It is in the sandbox, which means not release grade, but it works.
POCO documentation is generated with it:

http://www.appinf.com/poco/docs/

You can use that in conjunction with Poco::XML (or some other XML
parser, there are many around) to do the job.

Alex

Le Chaud Lapin


Dec 20, 2007, 8:59:41 PM

On Dec 20, 3:03 pm, Rune Allnor <all...@tele.ntnu.no> wrote:
> Maybe I am naive, but the below is completely typical
> for data files in what used to be my field of work

> (cut'n paset from page 21,http://acoustics.mit.edu/faculty/henrik/oases.pdf):

You're right that XML is easier on your eyes than the dataset above.

But it is not easier on the eyes of the computer. The computer does
not care. The labels and the hierarchical structure are what make it
better, but only for human consumption. This is a subtlety that the
XML-with-C++ proponents do not seem to understand.

Let me illustrate more clearly by starting with a C++ object, and show
three methods of storing its state to disk, then explain why the first
two are actually identical, one being easier on the eyes than the
other, but the third, based on the *mindset* of the XML proponents, is
entirely different from the first two, even though it looks like the
second of the first two.

Our goal is to take a C++ object that represents a system that is
hugely hierarchical, containing a massive amount of state, and put it
to and from disk:

struct Airbus_A380 : public Airbus
{
    Fuselage fuselage;
    Empanage empanage;
    Wing wing_left;
    Wing wing_right;
    Engine E1, E2, E3, E4;
    // etc.
};

Method 1 [Straight Binary]:

Airbus_A380 my_airbus_A380;
Target t;

// sends out the fuselage, empennage, wings, engines, etc. in raw binary
t << my_airbus_A380;

This is essentially what Boost Serialization does with its binary
archives (it also provides text and XML archives). The binary state
of the object is simply serialized to disk.
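The sketch below illustrates Method 1 using hypothetical `Engine` and `Aircraft` stand-ins for the Airbus parts. Each scalar member is written as its raw bytes, member by member, so the file contains no names and no types. The bytes depend on the machine's type sizes, endianness and floating-point format, which is the portability cost of this method.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <type_traits>

// Hypothetical stand-ins for parts of the aircraft.
struct Engine {
    std::uint32_t serial = 0;
    double thrust_kn = 0;
};
struct Aircraft {
    Engine e1, e2;
    std::uint32_t fuel_level = 0;
};

// Raw bytes of one trivially copyable scalar: no label, no type name.
template <class T>
void write_raw(std::ostream& out, const T& v) {
    static_assert(std::is_trivially_copyable_v<T>, "raw bytes only");
    out.write(reinterpret_cast<const char*>(&v), sizeof v);
}
template <class T>
void read_raw(std::istream& in, T& v) {
    static_assert(std::is_trivially_copyable_v<T>, "raw bytes only");
    in.read(reinterpret_cast<char*>(&v), sizeof v);
}

// Hand-written per class; member order is the only "schema".
void save(std::ostream& out, const Engine& e) { write_raw(out, e.serial); write_raw(out, e.thrust_kn); }
void load(std::istream& in, Engine& e) { read_raw(in, e.serial); read_raw(in, e.thrust_kn); }

void save(std::ostream& out, const Aircraft& a) {
    save(out, a.e1);
    save(out, a.e2);
    write_raw(out, a.fuel_level);
}
void load(std::istream& in, Aircraft& a) {
    load(in, a.e1);
    load(in, a.e2);
    read_raw(in, a.fuel_level);
}
```

Writing member by member, rather than the whole struct at once, keeps padding bytes out of the file.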

Method 2 [String Encoding of C++ Data Types]

This method is very much like the straight binary method, except,
instead of sending, say, 4 bytes for an int, you write out a string
describing the fact that what is being written has type int, then
write its value as a string:

[unsigned int : 15625]

Normally, only the type is written, not the actual name of the variable
(of course). If a structure like the Airbus above is written, these
elements become hierarchical. Unfortunately, if the only type name
information written to the file is for the 13 scalar types of C++, a
human would see the hierarchical structure, but still would not be
able to interpret the state as that of an Airbus_A380. The types
(scalar/vector/aggregate) must be labeled, by hand, by the programmer,
and written to disk:

[Fuel_Tank_Left : Fuel_Tank]
[fuel_level : unsigned int : 15625]

Method 1 and Method 2 are almost equivalent. Even without labels, a
human can interpret the format of Method 2, while the data format of
Method 1 is opaque. It is also possible to reuse the Method 1 "import"
serialization code to import data written with Method 2, simply by
discarding the labels.

Exporting, with inclusion of labels, however, is a different matter.
The programmer must explicitly write code to export the labels of all
variables. He could call the Fuel_Tank "Coca_Cola_Bottle" during
exportation, and the importation would still work. The problem is
that the computer doesn't care when it imports.
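To make that point concrete, here is a hypothetical importer that works like Method 1 reading code: it uses only the value and throws away the label and the type. Under that assumption it accepts `Coca_Cola_Bottle` as readily as `fuel_level`.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads one "[label : type : value]" line and keeps only the value.
// Label and type are discarded, so a renamed label is never noticed.
template <class T>
bool read_value_ignoring_labels(std::istream& in, T& value) {
    std::string line;
    if (!std::getline(in, line)) return false;
    const auto colon = line.rfind(':');
    const auto close = line.rfind(']');
    if (colon == std::string::npos || close == std::string::npos || close < colon)
        return false;
    std::istringstream field(line.substr(colon + 1, close - colon - 1));
    return static_cast<bool>(field >> value);
}
```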

So with every change of the Airbus_A380, the programmer must touch
the serialization code, and this is what bothers the
XML-to-object-to-XML people so much. They want to eliminate this
touching, and many try in vain to find a breakthrough to do this,
perplexed that it is trivial in Java but seemingly impossible in C++,
not realizing that Java and C++ are not really comparable as
programming languages, in the sense that one cannot call Java a
generic programming language any more than one could call 8080
assembly a generic programming language, because both presume a
specific execution environment. Java presumes the JVM, 8080 assembly
presumes the 8080 CPU, and C++ presumes pretty much any computer that
has a von Neumann architecture and sufficient RAM, and even a few
other architectures. So when comparing programming "languages", one
must be specific and define the target environment before making the
comparisons. Pascal, and similar languages that do not allow
significant amounts of superfluous run-time state, could be
meaningfully compared to C++.

So if you are a C++ programmer, and you want the benefit of XML, it is
best to recognize specifically what makes XML so appealing: the
human interpretability. Then STOP! LOL. :). Assume nothing more. Most
importantly, accept that it is only interpretable by humans, and not
by computers, on any level. Then you can use one of the following
options:

Option A:

Use Method 2, explicitly writing out the labels of the structures in
Boost-like serialization code, using a target archive that knows that a
long double should be written out with the string 'long double', for
example.

Option B:

Define a data structure in C++ which is an associative tree of
strings, each node of the tree a map from string-to-string or
string-to-list-of-string. The C++ code surrounding the object in RAM
will do lookups of string values, then interpret them at run time.
The object containing the state can be trivially serialized to and from
disk, with hierarchical formatting and even comments, but rigidity of
form inside the program will quickly disintegrate with this method.
The code will be very messy, as the "raw" internal C++ code will be
doing run-time interpretation of strings, and it will be easy to add an
unknown string ....(then what?)...
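The sketch below illustrates Option B using a hypothetical `Node` type, which holds a name, a string-to-string map and a list of child nodes. It also uses a hypothetical accessor for the left fuel tank. The tree is easy to write out. The cost shows up in client code: a misspelled key is only discovered at run time, and the code must decide what a missing value means.

```cpp
#include <map>
#include <string>
#include <vector>

// Option B: program state held as a tree of strings.
struct Node {
    std::string name;                          // e.g. "Fuel_Tank_Left"
    std::map<std::string, std::string> values; // string -> string
    std::vector<Node> children;                // nested nodes
};

// First child with a given name, or nullptr if there is none.
inline const Node* child(const Node& n, const std::string& name) {
    for (const auto& c : n.children)
        if (c.name == name) return &c;
    return nullptr;
}

// Client code interprets strings at run time. A misspelled node name or
// key is not a compile error; it silently takes the "missing" path.
inline unsigned long left_fuel_level(const Node& aircraft) {
    const Node* tank = child(aircraft, "Fuel_Tank_Left");
    if (!tank) return 0;  // and then what? the caller must decide
    const auto it = tank->values.find("fuel_level");
    return it == tank->values.end() ? 0 : std::stoul(it->second);
}
```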

The XML proponents are not looking for either of these methods. They
want code to magically write itself, in a language that was designed
with the premise that the target architecture is mostly devoid of
superfluous run-time state.

-Le Chaud Lapin-

LR


Dec 21, 2007, 4:45:24 PM

Le Chaud Lapin wrote:

> But it [XML] is not easier on the eyes of the computer. The computer does
> not care. The labels and the hiearchical structure is what makes it
> better, but only for human consupmption. This is a subtlety that the
> XML-with-C++ pronenents do not seem to understand.

If you handle the XML as being straight serialization, yes, that's true.

>
> Let me illustrate more clearly by starting with a C++ object, and show
> three methods of storing its state to disk, then explain why the first
> two are actually identical, one being easier on the eyes than the
> other, but the third, based on the *mindset* of the XML proponents, is
> entirely different from the first two, even though it looks like the
> second of the first two.
>
> Our goal is to take a C++ object that represents a system that is
> hugely hierarchical, containing a massive amount of state, and put it
> to and from disk:

I'm not sure that's our goal. Not everything is hierarchical.

>
> struct Airbus_A380 : public Airbus
> {
> Fuselage fuselage;
> Empanage empanage;
> Wing wing_left;
> Wing wing_right;
> Engine E1, E2, E3, E4;
> etc;
> } ;
>
> Method 1 [Straight Binary]:
>
> Airbus_A380 my_airbus_A380;
> Target t;
>
> t << my_airbus_A380; // sends out the fuselage, empannage, wings,
> engines, etc. in raw binary
>
> This is the method that Boost Serialization uses of course. The
> binary state of the object is simply serialized to disk.
>
> Method 2 [String Encoding of C++ Data Types]
>
> This method is very much like the straight binary method, except,
> instead of sending say, 4 bytes for an int, you write out a string
> describing the fact that what is being written has type int, then
> write its value as a string:
>
>
> [unsigned int : 15625]

But it's not identical. Just by moving to ASCII or some more likely to
be human readable form (I say more likely, since I've met one or two
people who have no problems with, say, binary representations of
floating point values), you've gained some human readability and some
portability.

>
> Normally, only the type is writen, not the actual name of the variable
> (of course). If a structure like the Airbus above is written, these
> elements become hierarchical. Unfortunately, if the only type name
> information written to the file is for the 13 scalar types of C++, a
> human would see the hierarchical structure, but still would not be
> able interpret the state to be that of an Airbus_A380. The types
> (scalar/vector/aggregate) must be labeled, by hand, by the programer,
> and written to disk:
>
> [Fuel_Tank_Left : Fuel_Tank]
> [fuel_level : unsigned int : 15625]
>
> Method 1 and Method 2 are almost equivalent. Even without labels, it
> is easier to interpret the format of Method 2 while being unable to
> interpret the data format of Method 1. It is also possible for the
> Method 1 "import" serialization code written to be reused to import
> data written with Method 2, by simply discarding the labels.
>
> Exporting, with inclusion of labels, however, is a different matter.
> The programmer must explicitly write code to export the labels of all
> variables. He could call the Fuel_Tank "Coca_Cola_Bottle" during
> exportation, and the importation would still work. The problem is
> that the computer doesn't care when it imports.

That's a good reason to have your code check the names of the labels and
use these to populate the data in your class.

This will require some extra work, and also perhaps some less than
general classes, but then, we probably don't want our persisted data to
have too close coupling with our application of that data.

>
> So with every change of the Airbus_A380 , the programmer must touch
> the serialization code, and this is what bothers the XML-to-object-to-
> XML people so much.

Bothers me not at all in the case of XML and the case of other labeled
formats.

Write code to be robust enough to allow for change, either the addition
or elimination of members of your class and this problem, well, it
doesn't exactly go away, but it does get easier.

Plus you've avoided a maintenance headache, as most people will lack
either the discipline or knowledge to do versioning of serialized
classes.

That doesn't eliminate every problem. And it does make for other
problems. And of course, YMWV, as your application will have some
special thing that won't allow for this.

I have found that searches for perfection don't yield results.

[snip argument about Java not being generic. I suspect I disagree, but
it's not really relevant to the point I'm struggling to make.]

> So if you are C++ programmer, and you want the benefit of XML, it is
> best to recognize specifically that which makese XML so appealing: the
> human interpretability. Then STOP! LOL. :). Assume nothing more.

I disagree. Don't stop, continue. Make use of the format to accomplish
the task you want to accomplish. There are enough real constraints,
don't be constrained by some imagined limitation.

> Most
> importantly, accept that it is only interpretable by humans, and not
> by computers, on any level.

Will the same hold true for any attempt to write code that is primarily
for human interpretation? C++ for example?

[snip option A, using XML to write types and values]

[snip option B, using XML to write out labels and values of which it is
claimed that the:]

> rigitity of
> form inside the program will quickly disintegrate with this method.
> The code will be very messy, as the "raw" internal C++ code will be
> doing run-time intepretation of strings, and it will be easy to add an
> unknown string ....(then what?)...

I doubt that. Should you choose a labeled format, you should probably
add a level or two of abstraction between the layer that does the actual
I/O to the file and your classes. I think there are several reasons for
this.

1) It'll be easier to read your application code.

2) Your code will be less brittle.

3) If you implement the abstraction reasonably well, your code will be
easier to maintain. And since most code spends most of its time and
your customer's money being maintained, anything that can be done to
make that less expensive is a Good Thing.

4) You will have some nice code that will be somewhat reusable, if you
put it in a library or whatever the equivalent is on your platform.

But, and there's always a but, be careful. Keep it as simple as it needs
to be but no simpler, and test it.

>
> The XML proponents are not looking for either of these methods. They
> want code to magically write itself, in a language that was designed
> with the premise that the target architecture is mostly devoid of
> superfluous run-time state.

Ah, I see, well then, I am not a proponent of XML. I am a proponent of
mostly, but not always, writing files that are labeled and not binary.

I suppose because I've seen not having that cost time and money too often.

And of course, code cannot magically or otherwise write itself. But
there are decisions we can make that will make it easier to write code.

This problem is really not very different from asking which is easier to
maintain:

Animal *a = new Dog;
...
a->talk();
or
Animal dog;
dog.name = "dog";
if(dog.name == "snake") {
    std::cout << "hiss" << std::endl;
}
else if(dog.name == "giraffe") {
    // do nothing
}
....

And I suppose we ought to revisit our airplane.

> struct Airbus_A380 : public Airbus
> {
> Fuselage fuselage;
> Empanage empanage;
> Wing wing_left;
> Wing wing_right;
> Engine E1, E2, E3, E4;
> etc;
> } ;

> [Fuel_Tank_Left : Fuel_Tank]
> [fuel_level : unsigned int : 15625]

That's starting to look a little lispish. ;)

But my new model has a tank in the fuselage and not in the wing.
No matter.

I can use some default value to make the fuel tank in the fuselage and
ignore the tanks in the wing. If that's what I want to do.

Or, it's not a difficult matter to run some quick and dirty and
discardable code to convert old files to the new format. Particularly if
our code is nicely modular and allows us to take just the parts of our
app that deal with I/O and reuse them in this way. Of course, even using
binary serialization you can do that. But with some labeled ASCII (or
EBCDIC or whatever) if I only have a few to do I might even be able to
hand edit them. Most people cannot edit serialized files. (I have no
data to support this supposition.) And looking at the results will
certainly be easier.

Data has no meaning without code. And data is not a constraint on code,
nor is it a constraint on interpretation.

Dave Harris


Dec 21, 2007, 4:43:15 PM

jaibu...@gmail.com (Le Chaud Lapin) wrote (abridged):

> On Dec 18, 10:16 am, ap...@student.open.ac.uk wrote:
> > The program converts objects to XML strings and back again using
> > the reflection-based package XStream. I am wondering how one can
> > do a similar job in C++. I realise there is no reflection but
> > I don't mind a bit of manual work.

I happen to have been doing this recently. With no built-in language
reflection we just wrote all the functions by hand.

Personally I am highly suspicious of automated methods because in my
experience there are too many special cases needing too much programmer
knowledge. For example, we wanted to minimise file size by not storing
members that had their default values. The default value might not be
zero, and in some cases might be quite complex. When loading, if the
attribute wasn't present in the file, we needed to construct that complex
default. Sometimes it would depend on values loaded earlier.
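The sketch below illustrates the idea of not storing defaults. It assumes a hypothetical `Window` whose `height` default is derived from the already-loaded `width`. A `std::map` of attribute strings stands in for one XML element. Values equal to their defaults are not saved, and missing attributes are reconstructed on load.

```cpp
#include <map>
#include <string>

// Stand-in for the attributes of one XML element.
using Attributes = std::map<std::string, std::string>;

constexpr int kDefaultWidth = 800;

// A non-trivial default that depends on a value loaded earlier.
inline int default_height(int width) { return width * 3 / 4; }

struct Window {
    int width = kDefaultWidth;
    int height = default_height(kDefaultWidth);
};

// Only values that differ from their defaults are written.
inline Attributes save(const Window& w) {
    Attributes a;
    if (w.width != kDefaultWidth) a["width"] = std::to_string(w.width);
    if (w.height != default_height(w.width)) a["height"] = std::to_string(w.height);
    return a;
}

// Missing attributes are rebuilt; height's default uses the loaded width.
inline Window load(const Attributes& a) {
    Window w;
    if (const auto wi = a.find("width"); wi != a.end()) w.width = std::stoi(wi->second);
    const auto hi = a.find("height");
    w.height = (hi != a.end()) ? std::stoi(hi->second) : default_height(w.width);
    return w;
}
```

The ordering in `load` matters: `width` has to be read before `height`'s default can be computed.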

Another problem area is schema evolution. As the data structures change
we need code to convert old XML into the new format during loads.
Automation only gets you so far. At some point you need to understand
what the data means.

I have had bad experiences in the past with systems that appear easy in
the easy cases but which don't scale well to the hard cases, and
especially don't handle easy cases /evolving into/ hard ones.

-- Dave Harris, Nottingham, UK.

Dave Harris


Dec 21, 2007, 4:49:13 PM

jaibu...@gmail.com (Le Chaud Lapin) wrote (abridged):

> You're right that looking at the dataset above versus XML is easier
> on your eyes.

I usually find XML easier to follow because it usually has more names in
it.

> So with every change of the Airbus_A380 , the programmer must touch
> the serialization code, and this is what bothers the
> XML-to-object-to-XML people so much.

I guess by "XML-to-object-to-XML people" you mean the ones who want to
generate the XML automatically. If so, that is a rather misleading way to
refer to them. You can attempt to generate data automatically without
using XML, and you can also use XML without automated generation.

> They want to eliminate this touching and many try in vain to find a
> breakthrough to do this, perplexed that it is trivial in Java, but
> seemingly impossible in C++, not realizing that Java and C++ are not
> really comparable as programming languages, in the sense that one
> cannot call Java a generic programming language any more than one
> could call 8080 assembly a generic programming language, because
> both presume a specific execution environment. Java presumes
> the JVM, 8080 assembly presumes the 8080 CPU, and C++ presumes
> pretty much any computer that has a Von-Neumann architecture
> and sufficient RAM, and even a few other architectures.

Java can be compiled directly to machine code - it doesn't need a JVM.
Certainly automatically generating XML does not require a full JVM-like
runtime. It just needs the compiler to include a description of the
program's types in the executable. Doing so wouldn't make a language less
general purpose.

> Use Method 2, explicitly writing out the labels of the structures in
> Boost-like serialization code, using a target arhive that knows
> that a long double should be writtten out with the string 'long
> double', for example.

I found I generally didn't want too much type information in the file.
It's more important to know the names of the fields. You need types if
the objects are polymorphic, of course, but for something like:

> Engine E1, E2, E3, E4;

the program already knows the types and just needs the names "E1" etc.
Having names enables the program to cope with data which is missing,
present unexpectedly, or in a different order. Having types just enables
type-checking of the data. For me that's not much benefit, and is
sometimes a positive hindrance as I may want to change the type between
versions.
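A minimal sketch of name-keyed loading, assuming a plain "name=value" line format rather than XML. parseFields, fieldOr, Aircraft and loadAircraft are hypothetical names. Because lookups go by name, fields may arrive in any order, missing ones fall back to a default, and unknown ones are simply never read.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parse "name=value" lines into a lookup table; input order does not matter.
std::map<std::string, std::string> parseFields(const std::string& text) {
    std::map<std::string, std::string> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;   // skip lines without '='
        fields[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return fields;
}

// Fetch a field by name, falling back to a default when it is absent.
std::string fieldOr(const std::map<std::string, std::string>& fields,
                    const std::string& name, const std::string& fallback) {
    auto it = fields.find(name);
    return it == fields.end() ? fallback : it->second;
}

// Hypothetical record; each engine is reduced to a model string for brevity.
// The types live in the program, only the names are stored in the data.
struct Aircraft {
    std::string E1, E2, E3, E4;
};

Aircraft loadAircraft(const std::string& text) {
    const auto fields = parseFields(text);   // unknown names are never looked up
    Aircraft a;
    a.E1 = fieldOr(fields, "E1", "none");
    a.E2 = fieldOr(fields, "E2", "none");
    a.E3 = fieldOr(fields, "E3", "none");
    a.E4 = fieldOr(fields, "E4", "none");
    return a;
}
```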

> Define a data structure in C++ which is an associative tree of
> strings, each node of the tree a map from string-to-string or
> string-to-list-of-string. The C++ code surrounding the object in
> RAM will do lookups of string values, then interpret them at run-time.
> The object containing the state can be trivially serialized to and
> from disk, with hierarchical formatting and even comments, but
> rigidity of form inside the program will quickly disintegrate
> with this method. The code will be very messy, as the "raw"
> internal C++ code will be doing run-time interpretation of
> strings, and it will be easy to add an unknown string ....
> (then what?)...

I'm not sure what you have in mind here. My approach led to code like:

enum Colour { Black, Red, Green, Blue, White };

....
Colour faciaColour;
LoadEnum( pXml, L"facia", faciaColour, White );

which seems tidy enough. There is code that does run-time interpretation
of string values, but it's in a library somewhere and doesn't add to the
mess in client code. Unexpected strings get ignored, and missing strings
get their default values (White in this case).
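A minimal sketch of what a LoadEnum-style helper might look like. The actual library is not shown, so XmlNode (a std::map standing in for the parsed element), kColourNames and this LoadEnum signature are assumptions. Falling back to the default for an unrecognised value string is also an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

enum Colour { Black, Red, Green, Blue, White };

// Hypothetical stand-in for a parsed XML element: attribute name -> text.
using XmlNode = std::map<std::wstring, std::wstring>;

// Hypothetical name table; its order must match the enum.
const wchar_t* const kColourNames[] = { L"Black", L"Red", L"Green", L"Blue", L"White" };

// A missing attribute, or a value the table does not recognise, leaves 'out'
// at the default. Attributes nobody asks for are never looked at, so they are
// ignored. Returns true only when a known value was read.
bool LoadEnum(const XmlNode* pXml, const std::wstring& name, Colour& out, Colour fallback) {
    out = fallback;
    XmlNode::const_iterator it = pXml->find(name);
    if (it == pXml->end()) return false;
    for (std::size_t i = 0; i < sizeof kColourNames / sizeof kColourNames[0]; ++i) {
        if (it->second == kColourNames[i]) {
            out = static_cast<Colour>(i);
            return true;
        }
    }
    return false;
}
```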

It becomes less clean as the data evolves, leading to things like:

ColourObject *pFaciaColour;
if (!LoadObject( pXml, L"facia2", pFaciaColour, White )) {
Colour faciaColour;
LoadEnum( pXml, L"facia", faciaColour, White );
pFaciaColour = new EnumColour( faciaColour );
}

where the original enum type has been replaced by a hierarchy of classes
that provide different colour models - RGB, CMYK etc. The serialisation
code first looks for the new format, and if it can't find it then looks
for the old format and converts it.

Do you think this is too rigid and will "quickly disintegrate"?
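A minimal sketch of the "new format first, then old format" fallback, using std::unique_ptr rather than a raw new. Only the class names ColourObject and EnumColour come from the example above. XmlNode, RgbColour, loadFacia, the rgb() accessor and the hex format assumed for "facia2" are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <memory>
#include <string>

enum Colour { Black, Red, Green, Blue, White };

// Hypothetical stand-in for a parsed XML element: attribute name -> text.
using XmlNode = std::map<std::string, std::string>;

struct ColourObject {
    virtual ~ColourObject() {}
    virtual unsigned long rgb() const = 0;   // 0xRRGGBB
};

struct EnumColour : ColourObject {
    explicit EnumColour(Colour c) : c_(c) {}
    unsigned long rgb() const override {
        static const unsigned long table[] = { 0x000000, 0xFF0000, 0x00FF00, 0x0000FF, 0xFFFFFF };
        return table[c_];
    }
    Colour c_;
};

struct RgbColour : ColourObject {
    explicit RgbColour(unsigned long v) : v_(v) {}
    unsigned long rgb() const override { return v_; }
    unsigned long v_;
};

// Look for the new "facia2" attribute (assumed to hold hex RRGGBB).
// If it is absent, read the old "facia" enum name and convert it.
std::unique_ptr<ColourObject> loadFacia(const XmlNode& xml) {
    XmlNode::const_iterator it = xml.find("facia2");
    if (it != xml.end())
        return std::unique_ptr<ColourObject>(
            new RgbColour(std::strtoul(it->second.c_str(), nullptr, 16)));

    static const char* const names[] = { "Black", "Red", "Green", "Blue", "White" };
    Colour c = White;                        // default when neither format is present
    it = xml.find("facia");
    if (it != xml.end())
        for (int i = 0; i < 5; ++i)
            if (it->second == names[i]) c = static_cast<Colour>(i);
    return std::unique_ptr<ColourObject>(new EnumColour(c));
}
```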

-- Dave Harris, Nottingham, UK.

Le Chaud Lapin

Dec 22, 2007, 3:53:59 PM12/22/07

On Dec 21, 3:43 pm, brang...@ntlworld.com (Dave Harris) wrote:
> jaibudu...@gmail.com (Le Chaud Lapin) wrote (abridged):

>
> > On Dec 18, 10:16 am, ap...@student.open.ac.uk wrote:
> > > The program converts objects to XML strings and back again using
> > > the reflection-based package XStream. I am wondering how one can
> > > do a similar job in C++. I realise there is no reflection but
> > > I don't mind a bit of manual work.
>
> I happen to have been doing this recently. With no built-in language
> reflection we just wrote all the functions by hand.
>
> Personally I am highly suspicious of automated methods because in my
> experience there are too many special cases needing too much programmer
> knowledge. For example, we wanted to minimise file size by not storing
> members that had their default values. The default value might not be
> zero, and in some cases might be quite complex. When loading, if the
> attribute wasn't present in the file, we needed to construct that complex
> default. Sometimes it would depend on values loaded earlier.
>
> Another problem area is schema evolution. As the data structures change
> we need code to convert old XML into the new format during loads.
> Automation only gets you so far. At some point you need to understand
> what the data means.
>
> I have had bad experiences in the past with systems that appear easy in
> the easy cases but which don't scale well to the hard cases, and
> especially don't handle easy cases /evolving into/ hard ones.

I wish I had read this post before I responded to your
other post 3 minutes ago.

This is basically what I was trying to say, especially the part about
evolution of the schema, and the default values of elements: there is
no free lunch.

The engineer must always exercise specificity.

-Le Chaud Lapin-

Le Chaud Lapin

Dec 22, 2007, 3:54:00 PM12/22/07

On Dec 21, 3:49 pm, brang...@ntlworld.com (Dave Harris) wrote:
> jaibudu...@gmail.com (Le Chaud Lapin) wrote (abridged):

> I guess by "XML-to-object-to-XML people" you mean the ones who want to
> generate the XML automatically. If so, that is a rather misleading way to
> refer to them. You can attempt to generate data automatically without
> using XML, and you can also use XML without automated generation.

True.

> Java can be compiled directly to machine code - it doesn't need a JVM.
> Certainly automatically generating XML does not require a full JVM-like
> runtime. It just needs the compiler to include a description of the
> program's types in the executable. Doing so wouldn't make a language less
> general purpose.
>

> I'm not sure what you have in mind here. My approach led to code like:

>
> enum Colour { Black, Red, Green, Blue, White };
>
> ....
> Colour faciaColour;
> LoadEnum( pXml, L"facia", faciaColour, White );
>
> which seems tidy enough. There is code that does run-time interpretation
> of string values, but it's in a library somewhere and doesn't add to the
> mess in client code. Unexpected strings get ignored, and missing strings
> get their default values (White in this case).

Ok, this last sentence, "Unexpected strings get ignored, and missing
strings get their default values (White in this case)." is very
important, and touches on the thesis of pretty much all my
philosophical posts in this group:

>>> The engineer must exercise specificity at some point. <<<

Here you've exercised it. You decided beforehand that missing strings
get their default values. You've also implicitly decided that extra
strings get ignored.

Note that, in your case, because you have said, in advance, that it's
ok that strings can be missing, there is no issue. But there are many
situations where XML will model a C++ object on disk, and if someone
were to "version" the XML and add an extra field, the C++ importation
code would not be able to cope, or rather, "coping" would be a very
bad idea. There must then be either:

1. A policy stipulating that extra fields are simply ignored.
2. Exception thrown.
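The two policies can be made explicit in code. A minimal sketch, assuming the input has already been parsed into a flat map of field names to values; UnknownFieldPolicy and checkUnknownFields are hypothetical names. The point is only that the choice is written down by a programmer; the code cannot infer it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

enum class UnknownFieldPolicy { Ignore, Throw };

// Compare the fields present in the input with the fields this version of
// the program knows about, and apply the policy chosen in advance.
void checkUnknownFields(const std::map<std::string, std::string>& input,
                        const std::set<std::string>& known,
                        UnknownFieldPolicy policy) {
    for (const auto& kv : input) {
        if (known.count(kv.first)) continue;
        if (policy == UnknownFieldPolicy::Throw)
            throw std::runtime_error("unexpected field: " + kv.first);
        // Ignore: the programmer has decided up front that extra fields are harmless.
    }
}
```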

There are some people (not you) who would actually say just ignore any
extra fields as a rule. C++ objects representing missile launchers
might not appreciate this rule.

This same problem happens with versioning in binary serialization.
People say that the objects are versioned, but that's not really
what's happening. What's happening is that a new type is being
formed. If a target T of serialization receives an object from source
S, and T is not expecting the new "version" of the object that S sent,
then what? Exception? Take what you can and discard the rest? What if
the new object is 3 times the size of the original object? When is the
disparity too great to ignore? The computer cannot know - it cannot
think. The programmer will have to tell it: "Throw an exception for
even the minutest disparity" or "Just ignore and move on." There is no
real gray area. Whatever the computer does, a programmer will have
told it to do that, whether purposely, or not.
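The same choice appears with a binary format. A minimal sketch under an assumed wire layout (one version byte, a two-byte big-endian payload length, then the payload); RecordV1, NewerVersionPolicy and readRecord are hypothetical. The length prefix is what makes "take what you can and discard the rest" possible at all; whether doing so is safe remains a decision the programmer makes through the policy argument.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Version 1 payload: one 4-byte big-endian unsigned integer.
struct RecordV1 { std::uint32_t value; };

enum class NewerVersionPolicy { Reject, ReadKnownPrefix };

// Layout: [version:1 byte][payload length:2 bytes, big-endian][payload]
RecordV1 readRecord(const std::vector<std::uint8_t>& buf, NewerVersionPolicy policy) {
    if (buf.size() < 3) throw std::runtime_error("truncated header");
    std::uint8_t version = buf[0];
    std::size_t length = (std::size_t(buf[1]) << 8) | buf[2];
    if (length < 4 || buf.size() < 3 + length) throw std::runtime_error("truncated payload");
    if (version > 1 && policy == NewerVersionPolicy::Reject)
        throw std::runtime_error("newer version than this reader understands");
    // Either version 1, or a newer version whose first 4 payload bytes are
    // assumed to keep their version 1 meaning; any further bytes are skipped.
    RecordV1 r;
    r.value = (std::uint32_t(buf[3]) << 24) | (std::uint32_t(buf[4]) << 16) |
              (std::uint32_t(buf[5]) << 8) | std::uint32_t(buf[6]);
    return r;
}
```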

> It becomes less clean as the data evolves, leading to things like:
>
> ColourObject *pFaciaColour;
> if (!LoadObject( pXml, L"facia2", pFaciaColour, White )) {
> Colour faciaColour;
> LoadEnum( pXml, L"facia", faciaColour, White );
> pFaciaColour = new EnumColour( faciaColour );
> }
>
> where the original enum type has been replaced by a hierarchy of classes
> that provide different colour models - RGB, CMYK etc. The serialisation
> code first looks for the new format, and if it can't find it then looks
> for the old format and converts it.
>
> Do you think this is too rigid and will "quickly disintegrate"?

No, the problem is often that it is too flexible.

I wrote a post in this group, long ago, about rigidity being the
foundation of flexibility. Lack of rigidity (concrete classes) means
broken assignment, broken default construction, etc (or weird and
tedious at best).

You're using a simple colour object here. But more complex objects
will make more complex hierarchical XML. And one of the things that
the XML fanatics (not you) espouse is the interchangeability of the
XML data file.

I am not referring to the encoding being ASCII strings to be
interpreted. I am going to demonstrate in a moment how I would do the
ASCII human-readability part in C++ only. I am talking about the
people who use the phrase..."it consumes XML objects"

These individuals speak as if the XML file can float around in
cyberspace from company to company, changing as it wishes, and each
company will be able to have C++ objects that yield themselves from it
automagically.

This is not true.

If one company adds a single field, and you have not specified the
"missing values get ignored" stipulation _up front_, all bets are
off. It's a recipe for a nightmare.

If you _do_ specify that wrongfully absent or present values get
ignored, then you do not need XML, you only need C++:

struct ChaudLapinPseudoXML : Associative_Polyarchy<String,
Associative_Set<String, List<String> > >
{
} ;

Notes:

1. An Associative_Polyarchy is nothing more than the tree equivalent
of std::map<>.
2. An Associative_Set is the equivalent of std::map<>.

Note here, that everything in this data structure is a string or list
of strings. You can do a lookup in it, given the path to a node, and
once the value is found (List<String>), do your conversions.
Conversions can be done either by hand each time or using a library.
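The structure can be approximated with standard containers. This is a minimal sketch, not the poster's Associative_Polyarchy/Associative_Set library: PseudoXmlNode, at, find and lookupInt are hypothetical names, std::vector stands in for List<String>, and children are held through std::unique_ptr so the recursive type is well-defined.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Each node maps a key to a list of strings and owns named child nodes.
struct PseudoXmlNode {
    std::map<std::string, std::vector<std::string>> values;
    std::map<std::string, std::unique_ptr<PseudoXmlNode>> children;

    // Create (or reuse) the node at the end of 'path'.
    PseudoXmlNode& at(const std::vector<std::string>& path) {
        PseudoXmlNode* node = this;
        for (const auto& name : path) {
            auto& child = node->children[name];
            if (!child) child.reset(new PseudoXmlNode);
            node = child.get();
        }
        return *node;
    }

    // Look up a value list by node path and key; nullptr if absent.
    const std::vector<std::string>* find(const std::vector<std::string>& path,
                                         const std::string& key) const {
        const PseudoXmlNode* node = this;
        for (const auto& name : path) {
            auto it = node->children.find(name);
            if (it == node->children.end()) return nullptr;
            node = it->second.get();
        }
        auto v = node->values.find(key);
        return v == node->values.end() ? nullptr : &v->second;
    }
};

// Conversion happens at the point of use; a missing value yields the default.
// std::stoi throws if the stored text does not start with a number.
int lookupInt(const PseudoXmlNode& root, const std::vector<std::string>& path,
              const std::string& key, int fallback) {
    const std::vector<std::string>* v = root.find(path, key);
    return (v && !v->empty()) ? std::stoi(v->front()) : fallback;
}
```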

The key here is that there is no magic. You've bitten the bullet and
said, "I'm going to be doing a lot of string conversions
(interpretations)." This is not the same as what the OP might like -
to load arbitrary objects off disk as Java would, and I do not mean
the code of the objects, but the arbitrary state structure.

In any case, my structure too, would be portable, and with the "ignore
unexpected rule"..., companies could do whatever they wanted and it
would be unbreakable..because the form, the *type*, is pre-specified,
and unchangeable: "an associative polyarchy mapping string to
associative set mapping string to list of string". New "fields" could
come and go because of the "ignore unexpected" rule.

You cannot do this with any arbitrary C++ object and XML because
changing a field changes the type, and it is not prudent to prescribe
ignorance up front because motivation for the new field (new type)
cannot be known in advance.

So in summary, if you are using XML as a hierarchical data store of
interpretable strings that will be interpreted as state separate from
the state of the object itself, that's fine. If you are using it to model
the actual structure of C++ objects, the XML must remain rigid in
type, or there will be a mess during "interoperability".

-Le Chaud Lapin-

Dave Harris

Dec 23, 2007, 3:42:11 PM12/23/07

jaibu...@gmail.com (Le Chaud Lapin) wrote (abridged):

> You cannot do this with any arbitrary C++ object and XML because
> changing a field changes the type, and it is not prudent to
> prescribe ignorance up front because motivation for the new
> field (new type) cannot be known in advance.

The way I envisage it, the program that outputs the data has most of the
responsibility for saying whether a new field can reasonably be ignored.
That policy can be stored in the file itself.

The generating program also has the option of outputting an alternative
representation. The file could be marked with something that said,
"Either of these two fields can be ignored, but not both". So I don't
think we have to be completely rigid or completely flexible. But I agree
we do need to think about it in advance.
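One way to record such a policy in the file is a per-field marker written by the producer. A minimal sketch, assuming each field carries an "ignorable" flag and an optional alternative-group name meaning "at least one of these must be understood"; Field and acceptFields are hypothetical, and this is only one possible encoding of the idea.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Field record as written by the producing program.
struct Field {
    std::string name;
    std::string value;
    bool ignorable;          // producer says a reader may skip this if unknown
    std::string altGroup;    // non-empty: at least one field of this group must be understood
};

// Returns the fields the reader understands; throws when the producer's
// policy says something unknown cannot be skipped.
std::vector<Field> acceptFields(const std::vector<Field>& fields,
                                const std::set<std::string>& known) {
    std::vector<Field> accepted;
    std::set<std::string> groups, satisfied;
    for (const auto& f : fields) {
        bool understood = known.count(f.name) != 0;
        if (!f.altGroup.empty()) {
            // Group members are judged by the group rule, not the per-field flag.
            groups.insert(f.altGroup);
            if (understood) satisfied.insert(f.altGroup);
        } else if (!understood && !f.ignorable) {
            throw std::runtime_error("cannot ignore field: " + f.name);
        }
        if (understood) accepted.push_back(f);
    }
    for (const auto& g : groups)
        if (!satisfied.count(g))
            throw std::runtime_error("no understood alternative in group: " + g);
    return accepted;
}
```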

-- Dave Harris, Nottingham, UK.

ap...@student.open.ac.uk

Dec 23, 2007, 3:52:17 PM12/23/07

{ let us try to keep this focused on C++, please. thanks. -mod }

On 19 Dec, 20:10, Le Chaud Lapin <jaibudu...@gmail.com> wrote:
> > The program converts objects to XML strings and back again using the
> > reflection-based package XStream. I am wondering how one can do a
> > similar job in C++.

> I use serialization. It works.

>
> As far as serializing strings, that's possible: It is not
> inconceivable to make a serialization framework where the target of
> serialization converts types and values to a string encoding of those
> types and values [int:9107], storing them in an associative polyarchy
> (my terminology) in XML like format, but not using XML, as XML is
> grossly overrated.
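A minimal sketch of such a string encoding, assuming a token form like the [int:9107] example above; Tag, encode and decode are hypothetical names. Decoding checks the type tag before converting the text.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical mapping from C++ types to the tag written in the token.
template <typename T> struct Tag;
template <> struct Tag<int>    { static std::string name() { return "int"; } };
template <> struct Tag<double> { static std::string name() { return "double"; } };

template <typename T>
std::string encode(const T& value) {
    std::ostringstream out;
    out << '[' << Tag<T>::name() << ':' << value << ']';
    return out.str();
}

// Rejects tokens whose tag does not match T, or whose text cannot be read as T.
template <typename T>
T decode(const std::string& token) {
    const std::string prefix = '[' + Tag<T>::name() + ':';
    if (token.size() < prefix.size() + 1 ||
        token.compare(0, prefix.size(), prefix) != 0 || token.back() != ']')
        throw std::runtime_error("unexpected token: " + token);
    std::istringstream in(token.substr(prefix.size(), token.size() - prefix.size() - 1));
    T value;
    if (!(in >> value)) throw std::runtime_error("bad value in token: " + token);
    return value;
}
```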

XML is not my choice, especially when the data is to go over the wire,
as it is in this case. My preference is actually ASN.1 using BER but
not many people seem to have heard of it and there is not much tool
support.
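For comparison, BER (ITU-T X.690) encodes an INTEGER as the identifier octet 0x02, a length, and the value in the fewest big-endian two's-complement octets. A minimal sketch of that one case only, using the short length form, which suffices because a 64-bit integer never needs more than 8 content octets. berEncodeInteger is a hypothetical name; an ASN.1 toolkit would typically generate such code from a schema.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

std::vector<std::uint8_t> berEncodeInteger(std::int64_t v) {
    std::vector<std::uint8_t> content;   // collected least-significant byte first
    for (;;) {
        std::uint8_t byte = static_cast<std::uint8_t>(v & 0xFF);
        content.push_back(byte);
        v >>= 8;   // arithmetic shift on mainstream compilers; guaranteed since C++20
        // Stop once the remaining bits are just sign extension of the byte written.
        if ((v == 0 && !(byte & 0x80)) || (v == -1 && (byte & 0x80))) break;
    }
    std::vector<std::uint8_t> out;
    out.push_back(0x02);                                       // UNIVERSAL 2: INTEGER
    out.push_back(static_cast<std::uint8_t>(content.size()));  // short-form length
    out.insert(out.end(), content.rbegin(), content.rend());   // big-endian content
    return out;
}
```

For example, 9107 (0x2393) encodes as 02 02 23 93, and 128 needs a leading zero octet (02 02 00 80) so it is not read as negative.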

> [XML is one of those things that some engineers
> find insanely appealing without really understanding what about it
> exactly is so appealing, which leads to inflated expectations of what
> it can do for them,

Most s/w engineers I have come across have XML in the right
perspective, but unfortunately mgmt do not and it is they that dictate
it be used for things it is not appropriate for.

> which is most often much less than they think.]
>
> In any case, Java programmers, and programmers in some other
> "interpreted" languages, should know by now that it is farcical to
> make comparisons between C++ and Java.

I think the comparison is inevitable, especially in the City (London)
where most of my work is. There java has largely replaced C++ which
suggests to me that C++ was being used when java can now be used
instead with no loss and several gains. The main gain is the skill is
cheaper. C++ is no longer taught. Java is.

> C++ is a language whose target
> code is interpreted by a CPU. Java is a language whose target code is
> interpreted by target code that is interpreted by a CPU. The
> intermediate layer, the JVM, or Java diaper as some of us like to call
> it, creates an environment that would be strange to C++, but is a
> necessity for Java, and this inseparability leads one to ask whether
> Java is a language or a language+environment.

Java is a language+environment. People that move over to java just
have to accept this and they do, even though this can cause issues of
its own. I find that with the JVM in the game one has to be aware of
the time it takes for an app to start up (in C++ it is practically
instant) and from what I have seen one has to assume that more memory
will be consumed than the equivalent C++ program. It seems like
although the theory is memory will be collected via GC, in practice
one does not want GC to be invoked unless it is critical to be, since
GC occurs at non-deterministic points and is quite expensive.

>
> On the matter of XML, I have been admonishing other engineers for
> years that there will never be a system that magically converts from
> XML to C++, or vice-versa, because the very proposition is senseless.

This seems like a very strong reaction to me. I know I would not have
picked XML but given that this is what is dictated, I have the need to
convert. I find that in java I can do this easily, in C++ it seems
very painful.

> Most of them don't listen....

[snip]

>
> I get asked about once every 3-4 months by a Java/C#/C++-Convert
> programmer if I know of a good package to do XML to C++ conversion. I
> respond by asking them an equally senseless, equally irritating
> question.

I can only conclude that you must be in an environment where the
choices about which technological mixes are in use are made purely by
technical criteria. Lucky you. I don't think that's the case for most
of us.

mbha...@gmail.com

Dec 28, 2007, 1:35:09 PM12/28/07

Andrew,
Here's my personal favorite for the task you described in C++. I
implemented the pattern while developing a webservice client and it
was fun.
Please read through -> http://www.coolapps.net/composite.htm and let
me know your thoughts.
