TreeCompare not ignoring whitespace


John L. Clark

Jun 11, 2008, 9:19:25 AM
to amar...@googlegroups.com
I am interested in using TreeCompare (from 4Suite XML) for comparing
two XML trees, modulo whitespace differences. However, when I compare
the following two documents ('foo.xml' and 'foo.xml.2', respectively):

<foo>
<bar/></foo>

and

<foo>
<bar/>
</foo>

using `TreeCompare.TreeCompare(foo, foo2, ignoreWhitespace=True)`, I
get the following report of differences:

--- expected ---
#document
foo
node: <Element at 0xb79248ac: name u'foo', 0 attributes, 2 children>
node.childNodes:
[<Text at 0xb78773ec: u'\n '>,
<Element at 0xb792492c: name u'bar', 0 attributes, 0 children>]
--- compared ---
#document
foo
node: <Element at 0xb79249ac: name u'foo', 0 attributes, 3 children>
node.childNodes:
[<Text at 0xb787720c: u'\n '>,
<Element at 0xb79249ec: name u'bar', 0 attributes, 0 children>,
<Text at 0xb78771e4: u'\n'>]

Am I misinterpreting what the `ignoreWhitespace` parameter should do,
or is this a bug?

Take care,

John L. Clark

Jeremy Kloth

Jun 13, 2008, 11:11:37 AM
to amar...@googlegroups.com, John L. Clark
On Wednesday, June 11, 2008 7:19:25 am John L. Clark wrote:
> I am interested in using TreeCompare (from 4Suite XML) for comparing
> two XML trees, modulo whitespace differences.

Nice. Actual usage outside of the test suites!

You've encountered an under-implemented feature. ;) As implemented, it handles
differences in the *amount* of whitespace between two documents (that is,
whitespace of length > 0 on both sides), but not the case where one document
has whitespace and the other has none at all. Attached is a patch that
implements the latter behavior as well.
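
To make the "amount" versus "absence" distinction concrete, here is a minimal
sketch using the standard library's xml.dom.minidom rather than 4Suite (an
assumption made purely for illustration; the two document strings are my own
stand-ins for foo.xml and foo.xml.2). The second document has an extra
whitespace-only text node that has no counterpart in the first, so the child
lists differ in length before any text is compared:

```python
from xml.dom import minidom

# Illustrative stand-ins for the two documents; the exact indentation is assumed.
doc1 = minidom.parseString("<foo>\n  <bar/></foo>")
doc2 = minidom.parseString("<foo>\n  <bar/>\n</foo>")

kids1 = doc1.documentElement.childNodes
kids2 = doc2.documentElement.childNodes

# The first document has [text, bar]; the second has [text, bar, text].
print(len(kids1), len(kids2))  # 2 3

# The unmatched node is a whitespace-only text node.
print(repr(kids2[-1].data))  # '\n'
```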

Note, the fix is also available in the CCF branch hosted at hg.4suite.org.

Thanks,
Jeremy Kloth

whitespace.patch

John L. Clark

Aug 6, 2008, 12:20:50 PM
to Jeremy Kloth, amar...@googlegroups.com
The fix is wrong. Where it is supposed to filter out whitespace-only text
nodes, it instead filters out every node except non-whitespace text nodes,
so element children are dropped as well:

+ children1 = [ c for c in children1
+ if c.nodeType == Node.TEXT_NODE
+ and not IsXmlSpace(c.data) ]

I have attached a Mercurial bundle with the fix from my repository.
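
For readers without the bundle, the intended filter can be sketched as follows.
This is not the code from the bundle: it uses xml.dom.minidom instead of
4Suite, `is_xml_space` is a hypothetical stand-in for `IsXmlSpace`, and
`strip_whitespace_nodes` is a name made up for this example. The point is only
that the condition has to be negated as a whole, so that a child is kept
unless it is a whitespace-only text node:

```python
from xml.dom import Node, minidom


def is_xml_space(text):
    # Hypothetical stand-in for IsXmlSpace: true when the text contains
    # nothing but XML whitespace characters (space, tab, CR, LF).
    return text.strip(" \t\r\n") == ""


def strip_whitespace_nodes(children):
    # Keep every child except whitespace-only text nodes.
    return [c for c in children
            if not (c.nodeType == Node.TEXT_NODE and is_xml_space(c.data))]


# Illustrative stand-ins for the two documents; the exact indentation is assumed.
doc1 = minidom.parseString("<foo>\n  <bar/></foo>")
doc2 = minidom.parseString("<foo>\n  <bar/>\n</foo>")

names1 = [c.nodeName for c in strip_whitespace_nodes(doc1.documentElement.childNodes)]
names2 = [c.nodeName for c in strip_whitespace_nodes(doc2.documentElement.childNodes)]

# Both child lists reduce to the single <bar/> element.
print(names1, names2)  # ['bar'] ['bar']
```

With the condition as written in the patch excerpt above, both lists would come
out empty instead, because the `bar` element is not a text node.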

Take care,

John L. Clark

4Suite-TreeCompare-whitespace_2008-08-06.bundle