Suggestion: improvement of HTML reports

zpcspm

Sep 8, 2008, 5:26:36 AM

to Clone Digger general

Looking at the HTML source of CD reports, I can see that CD takes the
part of the file that contains the clone and inserts some HTML markup
for code highlighting. I think it would be handy if it also replaced
leading spaces with '&nbsp;' in code lines. This would keep proper code
indentation in HTML reports (which is important especially for Python).
Of course, this requires the analyzed code to be indented properly with
spaces (not tabs or a mix of spaces and tabs). So if a coder sees a
piece of badly indented code in the report, that could mean that the
original code is not indented properly.

Of course, the main idea is not to warn the coder about bad code
indentation (there are other tools for that), but to improve the
readability of HTML reports.
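
A minimal sketch of why tabs matter here, written in Python 3 and not taken from Clone Digger itself. The helper name `indent_to_nbsp` and the `tabsize` default of 8 are assumptions for illustration. It expands tabs only in the leading whitespace of a line, then turns each resulting space into `&nbsp;`, so tab-indented code would also keep its indentation in a browser.

```python
def indent_to_nbsp(line, tabsize=8):
    """Return `line` with its leading spaces/tabs rendered as '&nbsp;'.

    Hypothetical helper: `tabsize` is an assumed tab stop width.
    """
    body = line.lstrip(' \t')                 # text after the indentation
    lead = line[:len(line) - len(body)]       # the indentation itself
    width = len(lead.expandtabs(tabsize))     # indentation width in columns
    return '&nbsp;' * width + body


print(indent_to_nbsp('    return x'))   # space-indented line
print(indent_to_nbsp('\treturn x'))     # tab-indented line
```

Tabs inside the body of the line (for example inside a string literal) are left untouched, since only the leading run of whitespace is expanded.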

Peter Bulychev

Sep 8, 2008, 6:05:03 AM

to clonedigg...@googlegroups.com

Hi.

There was indentation support at the beginning of the project, but then it became broken. I noticed that but didn't fix it because I thought it was not so important. Hopefully I'll fix it in the future.

2008/9/8 zpcspm <zpc...@gmail.com>

--
Best regards,
Peter Bulychev.

zpcspm

Sep 9, 2008, 2:10:58 PM

to Clone Digger general

Here is a quick hack that keeps indentation in reports. I have tested
it only on one example, so I suggest further tests against snippets
that are known to have clones containing code with different levels of
indentation.

Basically the algorithm does the following: it splits a multiline
string (one that contains newline chars) into lines, replaces leading
spaces with &nbsp; on every line and joins the lines back into a
multiline string.

--- cut here ---
Index: html_report.py
===================================================================
--- html_report.py (revision 187)
+++ html_report.py (working copy)
@@ -176,8 +176,25 @@
         rec_correct_as_string(s1, s2, u.getSubstitutions()[0].getMap().values(), u.getSubstitutions()[1].getMap().values() )
         d = [None, None]
         for j in (0,1):
-            d[j] = statements[j].ast_node.as_string().replace('\n', '<BR>\n')
+            d[j] = statements[j].ast_node.as_string()
+            lines = d[j].split('\n')
+            for ii in range(len(lines)):
+                temp_line = ''
+                jj = 0
+                try:
+                    while lines[ii][jj] == ' ':
+                        temp_line += '&nbsp;'
+                        jj += 1
+                except IndexError:
+                    # suppress errors if line has no leading spaces
+                    pass
+                temp_line += lines[ii][jj:]
+                lines[ii] = temp_line
+            d[j] = '\n'.join(lines)
+
+            d[j] = d[j].replace('\n', '<BR>\n')
+
     except:
         print 'The following error occured during highlighting of differences on the AST level:'
         traceback.print_exc()
--- cut here ---
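
The split/replace/join steps described above can also be sketched as a standalone function. This is an illustration in Python 3, not the code that was committed; the names `preserve_indentation` and `to_report_html` are hypothetical. It should give the same result as the inline loop, including for empty lines and lines made up only of spaces, which is the case where the `while` loop in the patch runs off the end of the line and raises `IndexError`.

```python
def preserve_indentation(text):
    """Replace the leading spaces of every line with '&nbsp;' entities."""
    converted = []
    for line in text.split('\n'):
        stripped = line.lstrip(' ')           # line without leading spaces
        indent = len(line) - len(stripped)    # number of leading spaces
        converted.append('&nbsp;' * indent + stripped)
    return '\n'.join(converted)


def to_report_html(text):
    """Keep indentation, then mark line breaks the way the report does."""
    return preserve_indentation(text).replace('\n', '<BR>\n')


print(to_report_html('if x:\n    y = 1'))
```

Only spaces are converted; a tab at the start of a line is left as it is, which matches the behaviour of the patch.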

Peter Bulychev

Sep 12, 2008, 7:14:06 AM

to clonedigg...@googlegroups.com

Thank you for your patch. I've committed it to SVN.

P.S. I think it's better to attach a patch to the message rather than put it in the message body, because gmail performs some modifications (wrapping long lines, etc.).

2008/9/9 zpcspm <zpc...@gmail.com>

zpcspm

Sep 12, 2008, 8:23:13 AM

to Clone Digger general

On Sep 12, 2:14 pm, "Peter Bulychev" <peter.bulyc...@gmail.com> wrote:
> ... gmail performs some modifications (wrapping long lines, etc.)

This is also a sign that the code has too many indentation levels,
which makes me think about refactoring.
And I've added even more indentation levels by patching inline
instead of writing a class method or a function :(

However, your point is taken. I must try to break my habit and start
using http://groups.google.com/group/clonedigger_general/files to
upload patches as files.

Peter Bulychev

Sep 13, 2008, 11:44:10 AM

to clonedigg...@googlegroups.com

Yes, html_report.py was not written in a very good style :)

2008/9/12 zpcspm <zpc...@gmail.com>
