Issue 25 in daisydiff: The HtmlDiffer comparison does not work well with &nbsp;

codesite...@google.com

Aug 2, 2010, 12:33:12 PM
to dais...@googlegroups.com
Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 25 by dominic.net: The HtmlDiffer comparison does not work well
with &nbsp;
http://code.google.com/p/daisydiff/issues/detail?id=25

What steps will reproduce the problem?
1. Use programmatic comparison using the HtmlDiffer API.
2. The Strings in comparison have spacing characters, i.e. &nbsp;
3. The output result replaces &nbsp; with a special character -> á

What is the expected output? What do you see instead?

The output result replaces &nbsp; with a special character -> á
It should treat &nbsp; as just a space and display it properly ( )
I basically used the unit test code, HtmlTestFixture.java, for the comparison.

What version of the product are you using? On what operating system?


Please provide any additional information below.


codesite...@google.com

Aug 10, 2010, 11:21:51 AM
to dais...@googlegroups.com

Comment #1 on issue 25 by kkapelon: The HtmlDiffer comparison does not work

I tried to reproduce this with no success. Can you post the exact strings
you are trying to diff with HtmlTestFixture.java?

Also, you could always do some preprocessing before passing the input
strings to DaisyDiff. I am actually using: input =
input.replaceAll("&nbsp;"," "); in production code. Maybe this might solve
your problem as well.
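
A minimal sketch of the preprocessing step suggested above. The class and method names are hypothetical and not part of DaisyDiff; the only assumption is that the input contains the literal entity text "&nbsp;".

```java
// Hypothetical helper for illustration; not part of the DaisyDiff API.
final class NbspPreprocessor {
    private NbspPreprocessor() {}

    /** Replaces every literal "&nbsp;" entity with one ordinary space. */
    static String replaceNbspWithSpace(String html) {
        // String.replace matches its argument literally, so no regex
        // escaping is needed (replaceAll would also work for this pattern).
        return html.replace("&nbsp;", " ");
    }

    public static void main(String[] args) {
        String input = "<p>breakthrough for&nbsp;&nbsp;&nbsp; Web page designers</p>";
        // The cleaned string is what would then be handed to the differ.
        System.out.println(replaceNbspWithSpace(input));
    }
}
```

Each entity becomes exactly one space, so the number of spacing characters in the string is unchanged, although a browser will later collapse runs of ordinary spaces when rendering.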

codesite...@google.com

Aug 11, 2010, 1:50:44 PM
to dais...@googlegroups.com

Comment #2 on issue 25 by dominic.net: The HtmlDiffer comparison does not

The code that I am using is:

HtmlTestFixture d = new HtmlTestFixture();
// Diff two HTML fragments that both contain &nbsp; entities.
String one = "<p>Style sheets represent a major breakthrough for "
        + "&nbsp;&nbsp;&nbsp;\n Web page designers,expanding their ability to improve "
        + "the appearance of their pages. </p>";
String two = "<p>Style sheets represent a major breakthrough for "
        + "&nbsp;&nbsp;&nbsp; Web page designers,expanding their ability to improve "
        + "the appedfarance oops i am new of their . </p>";
String result = d.diff(one, two);
System.out.println(result);

And the output I get is:

<?xml version="1.0" encoding="UTF-8"?><p>Style sheets represent a major
breakthrough forááá Web page designers,expanding their ability to improve
the <span class="diff-html-removed" id="removed-diff-0"
previous="first-diff" changeId="removed-diff-0"
next="added-diff-0">appearance </span><span class="diff-html-added"
id="added-diff-0" previous="removed-diff-0" changeId="added-diff-0"
next="removed-diff-1">appedfarance oops i am new </span>of their <span
class="diff-html-removed" id="removed-diff-1" previous="added-diff-0"
changeId="removed-diff-1" next="last-diff">pages</span> . </p>

which is almost perfect except for the á characters instead of &nbsp;

input = input.replaceAll("&nbsp;"," "); will not solve the problem as you
will lose the data about how much space is present between two words or
sections unless the text is between quotes.
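
One possible way to keep the spacing information without depending on how the output is decoded is to post-process the diff result rather than the input. The sketch below is an assumption, not something DaisyDiff does itself: it takes a result string in which the entity has already been decoded to the non-breaking space character (U+00A0) and rewrites each occurrence as the ASCII-only numeric reference `&#160;`. The class and method names are hypothetical.

```java
// Hypothetical post-processing helper; not part of the DaisyDiff API.
final class NbspEscaper {
    private NbspEscaper() {}

    /**
     * Rewrites each non-breaking space character (U+00A0) as the numeric
     * character reference "&#160;", which is plain ASCII and therefore
     * reads the same under UTF-8, ISO-8859-1, and similar encodings.
     */
    static String escapeNbsp(String diffOutput) {
        return diffOutput.replace("\u00A0", "&#160;");
    }

    public static void main(String[] args) {
        String result = "breakthrough for\u00A0\u00A0\u00A0 Web page designers";
        System.out.println(escapeNbsp(result));
    }
}
```

Because every U+00A0 maps to exactly one reference, the count of non-breaking spaces between words is preserved, and a browser will render them without collapsing.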



codesite...@google.com

Aug 16, 2010, 9:07:56 AM
to dais...@googlegroups.com

Comment #3 on issue 25 by kkapelon: The HtmlDiffer comparison does not work

Three points:

1. I tried your example with HtmlTestFixture and got normal spaces (not
nbsp, but not strange characters either).

2. The HtmlTestFixture is very simple (just for unit tests). For production
quality code I would advise you to look at the main method, which performs
several other cleanups. Normal DaisyDiff does exactly what you want (see
attached screenshot).

3. Can you clarify what data is lost by the "replaceAll" method? In your
example if I run this method then I still have the information that 3
spaces exist before newline. What data is lost? What is the difference
if the text is in quotes or not?

Attachments:
nbsp.png 77.2 KB

codesite...@google.com

Aug 16, 2010, 11:57:26 AM
to dais...@googlegroups.com

Comment #4 on issue 25 by dominic.net: The HtmlDiffer comparison does not

I really don't understand how this is working at your end. Could it be a JVM
issue?

Maybe I could try some other code, as you suggested.

What I meant by saying you can't use input.replaceAll("&nbsp;"," ") can be
explained by viewing the code below in a browser.

<p>hello how are you</p>
<p>hello how are you</p>

The output will be the same.
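
A rough sketch of the point being made here, assuming the two paragraphs differ only in how many ordinary spaces separate the words. Under default CSS a browser collapses each run of ordinary whitespace into a single space, while the non-breaking space (U+00A0) is not collapsed. The class below is hypothetical and only approximates that rule with a regular expression; it is not how a browser is implemented.

```java
// Hypothetical illustration of HTML whitespace collapsing.
final class WhitespaceCollapse {
    private WhitespaceCollapse() {}

    /** Collapses runs of space, tab, LF, FF, and CR into one space. */
    static String collapse(String text) {
        // U+00A0 is deliberately absent from the character class,
        // so non-breaking spaces survive unchanged.
        return text.replaceAll("[ \\t\\n\\f\\r]+", " ");
    }

    public static void main(String[] args) {
        // Several ordinary spaces render like a single one.
        System.out.println(collapse("hello    how are you"));
        // Non-breaking spaces keep their count.
        System.out.println(collapse("hello\u00A0\u00A0\u00A0how are you"));
    }
}
```

This is why replacing the entity with an ordinary space changes what the reader sees: three spaces and one space display identically, whereas three non-breaking spaces do not.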

codesite...@google.com

Nov 19, 2010, 2:53:09 AM
to dais...@googlegroups.com

Comment #5 on issue 25 by mcdoctore: The HtmlDiffer comparison does not

I had the same issue with the &nbsp;.
In my case, htmldiff was correctly replacing the &nbsp; with the
non-breaking space character, encoded in UTF-8.
My browser, however, was configured for a character encoding other than UTF-8.
Solution: set your browser's character encoding to UTF-8.
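
The mismatch described above can be reproduced in a few lines. This is a sketch under the assumption that the output contains U+00A0 written as UTF-8 and that the viewer decodes it as ISO-8859-1; the class and method names are hypothetical. Which wrong character appears depends on the encoding the viewer assumes. The "á" reported earlier in this thread would be consistent with a single 0xA0 byte shown in an OEM code page such as CP437, where that byte is "á", but that is a guess.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration of an encoding mismatch; not DaisyDiff code.
final class NbspMojibake {
    private NbspMojibake() {}

    /** Encodes text as UTF-8, then decodes the same bytes as ISO-8859-1. */
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00A0 becomes the two bytes 0xC2 0xA0 in UTF-8, which an
        // ISO-8859-1 reader sees as two characters: U+00C2 and U+00A0.
        String misread = decodeUtf8BytesAsLatin1("for\u00A0Web");
        misread.chars().forEach(c -> System.out.printf("U+%04X ", c));
        System.out.println();
    }
}
```

Reading the bytes back with the encoding they were written in, which is what setting the browser to UTF-8 does, avoids the extra character.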

codesite...@google.com

Nov 19, 2010, 10:20:36 AM
to dais...@googlegroups.com

Comment #6 on issue 25 by kkapelon: The HtmlDiffer comparison does not work

dominic, can you check your browser settings?

Maybe what mcdoctore is suggesting is a solution?

codesite...@google.com

Nov 19, 2010, 11:31:37 AM
to dais...@googlegroups.com

Comment #7 on issue 25 by dominic.net: The HtmlDiffer comparison does not

It is working now. Thanks.

codesite...@google.com

Nov 20, 2010, 5:51:17 AM
to dais...@googlegroups.com
Updates:
Status: Done

Comment #8 on issue 25 by kkapelon: The HtmlDiffer comparison does not work

Closed since it was apparently a browser issue.
