hbmxml - testmxml: messed output


Viktor Szakáts

Jan 15, 2011, 6:13:40 PM
to harbou...@googlegroups.com
Hi All,

I was testing hbmxml with this line on win/mingw:
    testmxml setup.ui

where setup.ui comes from /contrib/hbide/setup.ui, 
and the output (in out.xml) has its indenting and EOLs 
messed up.

For example, newlines are sometimes CRLF, sometimes LF only (which 
doesn't work on the win platform), and sometimes a space char seems to 
be used instead of a newline (LF).

Is this expected behavior?

Viktor

--- (easier to see in plain text viewer, but here you go)
0000000000: 3C 3F 78 6D 6C 20 76 65 72 73 69 6F 6E 3D 22 31  <?xml version="1
0000000010: 2E 30 22 20 65 6E 63 6F 64 69 6E 67 3D 22 55 54  .0" encoding="UT
0000000020: 46 2D 38 22 3F 3E 0D 0A 3C 75 69 20 76 65 72 73  F-8"?>♪◙<ui vers
0000000030: 69 6F 6E 3D 22 34 2E 30 22 3E 20 09 3C 63 6C 61  ion="4.0"> ○<cla
0000000040: 73 73 3E 44 69 61 6C 6F 67 53 65 74 75 70 3C 2F  ss>DialogSetup</
0000000050: 63 6C 61 73 73 3E 0D 0A 09 3C 77 69 64 67 65 74  class>♪◙○<widget
0000000060: 20 63 6C 61 73 73 3D 22 51 44 69 61 6C 6F 67 22   class="QDialog"
0000000070: 20 6E 61 6D 65 3D 22 44 69 61 6C 6F 67 53 65 74   name="DialogSet
0000000080: 75 70 22 3E 20 09 09 3C 70 72 6F 70 65 72 74 79  up"> ○○<property
0000000090: 0A 6E 61 6D 65 3D 22 67 65 6F 6D 65 74 72 79 22  ◙name="geometry"
00000000A0: 3E 20 09 09 09 3C 72 65 63 74 3E 20 09 09 09 09  > ○○○<rect> ○○○○
00000000B0: 3C 78 3E 30 3C 2F 78 3E 0D 0A 09 09 09 09 3C 79  <x>0</x>♪◙○○○○<y
00000000C0: 3E 30 3C 2F 79 3E 0D 0A 09 09 09 09 3C 77 69 64  >0</y>♪◙○○○○<wid
00000000D0: 74 68 3E 34 37 36 3C 2F 77 69 64 74 68 3E 0D 0A  th>476</width>♪◙
00000000E0: 09 09 09 09 3C 68 65 69 67 68 74 3E 34 31 33 3C  ○○○○<height>413<
00000000F0: 2F 68 65 69 67 68 74 3E 0D 0A 3C 2F 72 65 63 74  /height>♪◙</rect
0000000100: 3E 0D 0A 3C 2F 70 72 6F 70 65 72 74 79 3E 0D 0A  >♪◙</property>♪◙
0000000110: 09 09 3C 70 72 6F 70 65 72 74 79 20 6E 61 6D 65  ○○<property name
0000000120: 3D 22 77 69 6E 64 6F 77 54 69 74 6C 65 22 3E 20  ="windowTitle">
---
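
One way to confirm the mix seen in the dump is to count each line-ending style directly. What follows is only a minimal sketch in Python, not part of hbmxml or mxml; the helper name count_eols is hypothetical, and the sample bytes were copied by hand from the first lines of the dump.

```python
import re

def count_eols(data: bytes) -> dict:
    """Count CRLF pairs, LFs not preceded by CR, and CRs not followed by LF."""
    return {
        "CRLF": len(re.findall(rb"\r\n", data)),
        "LF": len(re.findall(rb"(?<!\r)\n", data)),
        "CR": len(re.findall(rb"\r(?!\n)", data)),
    }

# Hand-copied from the start of the hex dump (illustrative excerpt only)
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\r\n'
    b'<ui version="4.0"> \t<class>DialogSetup</class>\r\n'
    b'\t<widget class="QDialog" name="DialogSetup"> \t\t<property\n'
    b'name="geometry">'
)

print(count_eols(sample))
```

Any result with more than one non-zero counter means the file mixes EOL conventions.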

Tamas TEVESZ

Jan 15, 2011, 6:50:29 PM
to harbou...@googlegroups.com
On Sun, 16 Jan 2011, Viktor Szakáts wrote:

hi,

> I was testing hbmxml with this line on win/mingw:
> testmxml setup.ui
>
> where setup.ui comes from /contrib/hbide/setup.ui,
> and the output (in out.xml) has its indenting and EOLs
> messed up.
>
> F.e. newlines are sometimes CRLF, sometimes LF only (which
> doesn't work on win platform), sometimes space char seems to
> be used instead of newline (LF).
>
> Is this expected behavior?

hard to say. if you run the upstream testmxml on this file, you'll see
that its indenting too is messed up (iow testmxml isn't really an
xmllint --format replacement, and by no means a generic tool), but it
should at least be comparable to our version's result (as long as you
look at it with tab=8sp, as i've just spotted a bit of a problem).

new lines should probably be consistent, if nothing else.

there's a bit of work left to be done on testmxml. i made a mental
note of the issues raised, and will look at them.


--
[-]

mkdir /nonexistent

Viktor Szakáts

Jan 16, 2011, 5:38:01 AM
to harbou...@googlegroups.com
Hi,

I repeated the same test using the original testmxml.c, and 
the results are similar, except that the .c version consistently 
uses 0x0A (LF) EOLs, whilst the Harbour version has a 
mixup of CRLF and LF. (BTW, I used the Harbour-hosted 
mxml lib with the .c test suite, too.)

This is caused by the HB_EOL() calls in the .prg version. Replacing 
these with Chr( 10 ) (LF) gives the exact same output with both tests.

[ I ran both tests on the win platform. ]

The more worrying aspect is that this implies that the minixml 
code uses hard-coded LF EOLs. This might be either a 
minixml bug or a configuration problem in our local build (or 
some runtime minixml setting?). Anyway, it's definitely something 
to check out, since this is a showstopper for production use.

Viktor
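
The mechanism described above can be illustrated with a toy model in Python. This is not minixml code: toy_save, HARD_CODED_EOL and the wrap width are hypothetical names and values. The sketch only assumes what the message reports, namely that the library writes some newlines itself as LF while the caller supplies the others.

```python
HARD_CODED_EOL = "\n"  # stands in for the newline the library writes itself

def toy_save(elements, callback_eol, wrap=20):
    """Toy writer: the caller's EOL goes between elements, while long
    tags are wrapped with the library's own hard-coded newline."""
    out = []
    for name, attr in elements:
        tag = "<" + name
        if attr:
            # Wrap before the attribute when the tag would get too long
            too_long = len(tag) + 1 + len(attr) > wrap
            tag += (HARD_CODED_EOL if too_long else " ") + attr
        out.append(tag + "/>")
    return callback_eol.join(out) + callback_eol

doc = [("x", ""), ("property", 'name="geometry"'), ("y", "")]
mixed = toy_save(doc, "\r\n")   # caller passes the Windows EOL: CRLF and LF mix
uniform = toy_save(doc, "\n")   # caller passes LF: consistent, but not native on win

print(repr(mixed))
print(repr(uniform))
```

In this model the only way to get uniform output is for the caller to match whatever the library hard-codes, which corresponds to replacing HB_EOL() with Chr( 10 ).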

Tamas TEVESZ

Jan 16, 2011, 5:52:11 AM
to harbou...@googlegroups.com
On Sun, 16 Jan 2011, Viktor Szakáts wrote:

hi,

> I repeated the same test using original testmxml.c, and
> the result are similar except that the .c version consistently
> uses 0x0A (LF) EOLs, whilst the Harbour version has a
> mixup of CRLF and LF. (BTW I used the Harbour hosted
> mxml lib with the .c test suite, too)
>
> This is caused by HB_EOL() calls in the .prg version. Replacing
> these to Chr( 10 ) (LF) gives exact same output with both tests.

that's good news.

> [ I've made both tests on win platform. ]

that's even better news ;)

> The more worrying aspect is that this conveys that minixml
> code uses hard-coded LF EOLs. This latter might either be
> minixml bug or configuration problem on our local build (or
> some runtime minixml setting?). Anyways, definitely something
> to check out, since this is showstopper for production use.

how is that? there's no significance of eols in xml whatsoever, i'd
qualify this as a nuisance rather than a showstopper problem.

mxml does use hard-coded \ns, but i don't see how that in any way is a
showstopper on any platform.


Viktor Szakáts

Jan 16, 2011, 6:02:31 AM
to harbou...@googlegroups.com
> > I repeated the same test using original testmxml.c, and
> > the result are similar except that the .c version consistently
> > uses 0x0A (LF) EOLs, whilst the Harbour version has a
> > mixup of CRLF and LF. (BTW I used the Harbour hosted
> > mxml lib with the .c test suite, too)
> >
> > This is caused by HB_EOL() calls in the .prg version. Replacing
> > these to Chr( 10 ) (LF) gives exact same output with both tests.
>
> that's good news.
>
> > [ I've made both tests on win platform. ]
>
> that's even better news ;)
>
> > The more worrying aspect is that this conveys that minixml
> > code uses hard-coded LF EOLs. This latter might either be
> > minixml bug or configuration problem on our local build (or
> > some runtime minixml setting?). Anyways, definitely something
> > to check out, since this is showstopper for production use.
>
> how is that? there's no significance of eols in xml whatsoever, i'd
> qualify this as a nuisance rather than a showstopper problem.

I'd never create an output file with inconsistent EOLs.
There is hardly any official file specification which would 
explicitly allow that, so it's only a chance to create problems 
and put the receiving side under an unnecessary stress test.
At best it's a nuisance for anyone wanting to look into 
a generated .xml file (human readability is one of the points 
of .xml after all), but it's certainly a clear sign of bad quality 
software. Also note that e.g. SVN doesn't even allow you 
to upload such mixed-EOL files due to the ambiguity they 
create.

These are factors which make me drop this library.

Of course it's not a problem for *nix users, but the world 
still hasn't switched fully to *nix, especially on the end-user 
side.

> mxml does use hard-coded \ns, but i don't see how that in any way is a
> showstopper on any platform.

It very much is. If it bothers you, let's not call it a 
showstopper, but a simple bug and/or portability problem, 
sloppy coding, whatever.

Please note that all parts of Harbour use consistent 
EOLs in sync with the platform, and great care is 
taken that it's done like this.

Viktor

Tamas TEVESZ

Jan 16, 2011, 6:37:30 AM
to harbou...@googlegroups.com
On Sun, 16 Jan 2011, Viktor Szakáts wrote:

> I'd never create an output file with inconsistent EOLs.

so there's a bug in my interpretation of testmxml.

> There is hardly any official file specification which would
> explicitly allow that, so it's only a chance to create problems
> and put the receiver side under unnecessary stress test.
> Best case it's a nuisance for anyone wanting to look into
> a generated .xml file (human readability is one of the points
> of .xml after all), but certainly a clear sign of bad quality
> software. Also note that f.e. SVN doesn't even allow you
> to upload such mixed EOL files due to the ambiguity it
> creates.

so there's a bug in my interpretation of testmxml. it's quite a
stretch to call mxml names based on that.

> > mxml does use hard-coded \ns, but i don't see how that in any way is a
> > showstopper on any platform.
>
> It very much is. If it bothers you, let's not call it
> showstopper, but a simple bug and/or portability problem,
> sloppy coding, whatever.

it is exactly neither of the above. white spaces (which mxml writes)
still bear no significance whatsoever in xml, and for parsed entities
(which mxml does not write, nor are they affected by the whitespace
callback) the ambiguity is taken out by the xml specification.

as for human readability, *wordpad* can render unix (and mac, for that
matter) eols. wordpad. not vim or emacs or whatever their
grandma-and-the-kitchensink equivalent on windows is, wordpad.

> Please note that all parts of Harbour uses consistent
> EOLs in sync with the platform, and there is great care
> taken that it's done like this.

so i'll just fix testmxml up.


Viktor Szakáts

Jan 16, 2011, 7:09:14 AM
to harbou...@googlegroups.com
> > I'd never create an output file with inconsistent EOLs.
>
> so there's a bug in my interpretation of testmxml.
>
> > There is hardly any official file specification which would
> > explicitly allow that, so it's only a chance to create problems
> > and put the receiver side under unnecessary stress test.
> > Best case it's a nuisance for anyone wanting to look into
> > a generated .xml file (human readability is one of the points
> > of .xml after all), but certainly a clear sign of bad quality
> > software. Also note that f.e. SVN doesn't even allow you
> > to upload such mixed EOL files due to the ambiguity it
> > creates.
>
> so there's a bug in my interpretation of testmxml. it's quite a
> stretch to call mxml names based on that.

No, it's not your bug; you used hb_eol(), which is 
perfect there. minixml itself should use platform-
dependent EOLs instead of forcing LF EOLs on 
the win platform (I don't know the exact scale of 
the bug).
 
> > It very much is. If it bothers you, let's not call it
> > showstopper, but a simple bug and/or portability problem,
> > sloppy coding, whatever.
>
> it is exactly neither of the above. white spaces (which mxml writes)
> still bear no significance whatsoever in xml, and for parsed entities
> (which mxml does not write, nor are they affected by the whitespace
> callback) the ambiguity is taken out by the xml specification.

It bears significance because it's text output and 
the lib adds EOLs to this text output. If EOLs don't bear any significance, 
maybe minixml should not add any in the first place. Granted, 
in that case the output would still be valid XML, though not 
very readable for humans. Probably to avoid this, the minixml 
developers chose to add EOLs, and they added them wrongly, so now 
there is a bug, and there is no point in arguing about the necessity of 
the EOLs themselves.

In short: although EOLs are not necessary, they are generated, 
but generated wrongly.
 
> as for human readability, *wordpad* can render unix (and mac, for that
> matter) eols. wordpad. not vim or emacs or whatever their
> grandma-and-the-kitchensink equivalent on windows is, wordpad.
>
> > Please note that all parts of Harbour uses consistent
> > EOLs in sync with the platform, and there is great care
> > taken that it's done like this.
>
> so i'll just fix testmxml up.

Not necessary, since it doesn't solve the problem.

The output on win/dos/os2 platforms will still be wrong, 
although in this case more consistently wrong, with all-LF EOLs.

LF EOLs are not recognized or accepted as EOLs on 
win/dos/os2 platforms.

Viktor

Viktor Szakáts

Jan 16, 2011, 10:53:47 AM
to harbou...@googlegroups.com
Here is what the test output looks like when viewed in Notepad (Windows).
The outputs were created using the original testmxml.c
(still using the Harbour build of the minixml 3rd party lib).

Viktor

Screen shot 2011-01-16 at 16.49.28.png
Screen shot 2011-01-16 at 16.47.37.png

myo...@mail.ru

Jan 16, 2011, 4:48:12 PM
to harbou...@googlegroups.com

> Here is how test output looks like when viewed in Notepad (Windows).
> The outputs were created using original testmxml.c.
> (still using the Harbour build of minixml 3rd party lib)
>

The output looks much better in SciTE (on Windows, of course).


Regards,
Petr

scite2.jpg

Viktor Szakáts

Jan 16, 2011, 5:27:01 PM
to harbou...@googlegroups.com
Are we really arguing about LF EOLs being normal on 
the Windows platform?

Your screenshot shows LFs, so if the file was generated 
using the win build of hbmxml, it's wrong. Citing win/dos/os2 
editors which can cope with LF EOLs is rather pointless. 
The point is that LF EOLs are _not the standard_ on 
win/dos/os2 platforms. I'm sure none of my end-users 
use the SciTE editor. They use Notepad. Do your users 
use programmer's editors?

It seemed to me that we value supporting the "quirks" of 
all platforms, of which the EOL difference is one of the top 
ones. In the life of Harbour this is the first time I've seen 
this basic issue questioned, and it really seems 
strange. Did our portability goals change? Or is this 
technical problem (bug) perceived as a personal offense 
by some? Is this a special case, where we should 
disregard proper EOLs?

Viktor

Bacco

Jan 16, 2011, 5:44:39 PM
to harbou...@googlegroups.com
Hi all.

Due to the interoperable nature of XML and the W3C recommendation
that XML parsers cope with any combination of CR, LF or both, I
believe that this should be something we can choose in some way, not
only something inherent to the OS generating the XML.

Of course the current discussion still applies to the default EOL.


Regards,
Bacco
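
For reference, the rule alluded to here is the end-of-line handling of XML 1.0 (section 2.11): before parsing, a processor translates every CRLF pair and every CR not followed by LF into a single LF. A minimal sketch of that normalization in Python; the function name xml_normalize_eols is hypothetical and only illustrates the rule.

```python
def xml_normalize_eols(text: str) -> str:
    """Translate CRLF and any remaining lone CR into a single LF,
    as an XML 1.0 processor does before parsing."""
    # Order matters: handle the two-character sequence first
    return text.replace("\r\n", "\n").replace("\r", "\n")

print(repr(xml_normalize_eols("<a>\r\n<b/>\r<c/>\n</a>")))
```

This covers what a conforming parser sees. It says nothing about how the same file looks in a plain text editor such as Notepad, which is the other half of the discussion.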

April White

Jan 16, 2011, 6:05:43 PM
to harbou...@googlegroups.com
SciTE always has/had good EOL handling, even mixed files.

April

> <scite2.jpg>

Viktor Szakáts

Jan 16, 2011, 6:08:33 PM
to harbou...@googlegroups.com
> SciTE always has/had good EOL handling, even mixed files.

Maybe we should change HB_EOL() to return LF on 
win/dos/os2 platform.

May I commit it?

Viktor

ToninhoFWi

Jan 16, 2011, 7:14:04 PM
to harbou...@googlegroups.com
> Due to the interoperable nature of XMLs and the recomendation of w3c
> that XML parsers complain with any combination of CR, LF or both, I
> believe that this should be something that we can choose some way, not
> only inherent to the OS generating the XML.

+1.

I access some web services that need a continuous string: no CRLF, CR or LF
in the XML file. Maybe these are non-professional WS, but here in Brazil I have
come across this scenario.

IMHO, the OS default LF is the best, but we need a way to change it, if
necessary.

TIA and best regards,

Toninho.


Viktor Szakáts

Jan 17, 2011, 1:14:26 AM
to harbou...@googlegroups.com
> I access some web services that need a continuos string. No CRLF or CR or LF at XML file. Maybe this is non professioanl WS but here in Brazil I found this scenario.
>
> IMHO, the OS default LF is the best, but we need a way to change it, if necessary.

The OS default should be the OS-native EOL, not LF. LF is *nix 
specific. This is how we do it all over Harbour, so it's nothing new.

Viktor

ToninhoFWi

Jan 17, 2011, 6:30:47 AM
to harbou...@googlegroups.com
>The OS default should be the OS native EOL, not LF. LF is *nix 
>specific. This is how we do it all over Harbour, so its nothing new.
 
Thank you, but users need a way to create XML without the default OS EOL. As I said, I have situations where I need XML without any EOL.

Best regards,

Toninho.

Viktor Szakáts

Jan 17, 2011, 6:40:27 AM
to harbou...@googlegroups.com
> > The OS default should be the OS native EOL, not LF. LF is *nix
> > specific. This is how we do it all over Harbour, so its nothing new.
>
> Thank you, but users need a way to create XML with no default OS EOL. As I saw, I have situations, where I need a XML without EOL.

It's definitely a plus if it's user-configurable.

Viktor
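
What a user-configurable EOL could look like is sketched below in Python. This is not the hbmxml or mxml API: to_xml and its eol and indent parameters are hypothetical. The default follows the position taken in the thread (the OS-native EOL), while an empty string gives the continuous output some web services require.

```python
import os
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def to_xml(elem, eol=os.linesep, indent="\t", _level=0):
    """Serialize an element with a caller-chosen line ending.
    Simplification: text of elements that have children, and tail
    text, are ignored."""
    pad = indent * _level
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    children = list(elem)
    if not children:
        text = escape(elem.text or "")
        return f"{pad}<{elem.tag}{attrs}>{text}</{elem.tag}>{eol}"
    body = "".join(to_xml(c, eol, indent, _level + 1) for c in children)
    return f"{pad}<{elem.tag}{attrs}>{eol}{body}{pad}</{elem.tag}>{eol}"

rect = ET.fromstring("<rect><x>0</x><y>0</y></rect>")
print(repr(to_xml(rect, eol="\r\n")))           # explicit CRLF
print(repr(to_xml(rect, eol="", indent="")))    # one continuous string
```

When writing the result to disk from Python, the file should be opened in binary mode or with newline="" so that the chosen EOL is not translated a second time.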

ToninhoFWi

Jan 17, 2011, 6:49:02 AM
to harbou...@googlegroups.com
>Definitely it's a plus if user configurable.
 
Nice !
 
Regards,
 
Toninho.
 