On 12/03/12 03:45, Joe Kesselman wrote:
> Haven't checked the logic, but I'd note that by using literal result
> elements you can simplify that slightly. Also, I'd leave out the
> encoding and let it stay in UTF8 unless you have some specific reason
> for doing otherwise. Finally, you might want to explicitly process only
> the <faxes> elements, and make sure anything else doesn't contribute, by
> adding a root template which selects only those for processing.
>
> But, yeah, whoever created that XML in the first place should be ashamed
> of themselves for not structuring it better.
Amen. Unfortunately many people don't think before creating XML.
[OP]
>> I am clueless and a newbie, so don't be too rough on me!
You are not the clueless one: that is your client, unfortunately.
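When I get a large quantity (or many repeat instances) of files like
this, where the markup itself is regular enough to be SGML but the
design is defective, I resort to SGML tag omissibility: declare the
missing container element with omissible start- and end-tags and let
the parser infer them. This SGML document: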
<!doctype content [
<!element content - - (status,faxes)>
<!element status - - (#pcdata)>
<!element faxes - - (fax)+>
<!element fax o o (faxid,date,source,destination,status)>
<!element faxid - - (#pcdata)>
<!element date - - (#pcdata)>
<!element source - - (#pcdata)>
<!element destination - - (#pcdata)>
]>
<content>
<status>ok</status>
<faxes>
<faxid>21404974</faxid>
<date>2012-01-05 07:34:10</date>
<source>
5194852368</source>
<destination>
8885216725</destination>
<status>read</status>
<faxid>23223059</faxid>
<date>2012-03-01 07:27:52</date>
<source>
5194211862</source>
<destination>
8885216725</destination>
<status>new</status>
<faxid>23223164</faxid>
<date>2012-03-01 07:29:45</date>
<source>
5194211862</source>
<destination>
8885210692</destination>
<status>new</status>
<faxid>23224287</faxid>
<date>2012-03-01 07:51:07</date>
<source>
8885216725</source>
<destination>
8885210692</destination>
<status>new</status>
</faxes></content>
can be processed with sgmlnorm, which normalizes it by inserting the
missing <fax> start-tags and </fax> end-tags. This is usually only worth
setting up for a recurring workflow; the output is fully-normalized SGML,
which an XML processor will accept.
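
If no SGML toolchain is to hand, the same regrouping can be sketched in a few lines of script. What follows is only an illustration in Python (standard library only), not what sgmlnorm does internally. It assumes every record starts with a <faxid> element; the wrap_faxes name and the trimmed two-record sample are made up for the example, and stripping the stray newlines inside <source> and <destination> is an extra step of my own.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the delivered document (two fax records only).
FLAT = """<content>
<status>ok</status><faxes><faxid>21404974</faxid><date>2012-01-05 07:34:10</date><source>
5194852368</source><destination>
8885216725</destination><status>read</status><faxid>23223059</faxid><date>2012-03-01 07:27:52</date><source>
5194211862</source><destination>
8885216725</destination><status>new</status></faxes></content>"""


def wrap_faxes(root):
    """Group the flat children of <faxes> into <fax> containers, in place.

    Assumption: every record begins with a <faxid> element.
    """
    faxes = root.find("faxes")
    children = list(faxes)
    # Detach all children first, then re-attach them under new <fax> parents.
    for child in children:
        faxes.remove(child)
    faxes.text = None
    current = None
    for child in children:
        if child.tag == "faxid" or current is None:
            current = ET.SubElement(faxes, "fax")
        # Trim the stray newlines inside <source> and <destination>.
        child.text = (child.text or "").strip()
        child.tail = None
        current.append(child)
    return root


root = wrap_faxes(ET.fromstring(FLAT))
print(ET.tostring(root, encoding="unicode"))
```

The top-level <status> is left alone because only the children of <faxes> are regrouped. If a record can ever arrive without a leading <faxid>, the grouping rule would need to change.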
>> When I get the XML document delivered, this is EXACTLY what it looks like.
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <content>
>> <status>ok</status><faxes><faxid>21404974</faxid><date>2012-01-05
>> 07:34:10</date><source>
>> 5194852368</source><destination>
>> 8885216725</destination><status>read</status><faxid>23223059</faxid><date>2012-03-01
>> 07:27:52</date><source>
>> 5194211862</source><destination>
>> 8885216725</destination><status>new</status><faxid>23223164</faxid><date>2012-03-01
>> 07:29:45</date><source>
>> 5194211862</source><destination>8885210692</destination><status>new</status><faxid>23224287</faxid><date>2012-03-01
>> 07:51:07</date><source>
>> 8885216725</source><destination>8885210692</destination><status>new</status></faxes></content>
That's fine. XML doesn't need to be pretty-printed unless you want to
show it to a human. As Martin indicated, there are ways to pretty-print
it if you need to (and I see from a later post that you discovered Tidy). But it's
usually more effective to concentrate on making the markup processable
rather than on making it look attractive.
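
For the occasions when a human does need to read it, here is a minimal sketch of one way to indent the document with Python's standard library instead of Tidy. It assumes Python 3.9 or later (for ET.indent), and the one-record sample string is a cut-down stand-in for the real file.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the delivered file, all on one line.
doc = ET.fromstring(
    "<content><status>ok</status><faxes>"
    "<faxid>21404974</faxid><status>read</status>"
    "</faxes></content>"
)

# Rewrites whitespace-only text between elements; element content is untouched.
ET.indent(doc, space="  ")
pretty = ET.tostring(doc, encoding="unicode")
print(pretty)
```

This only changes whitespace between elements, so it makes the file easier to read but does nothing about the missing <fax> containers.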
///Peter