clonedigger (r184) is failing to detect a trivial clone


zpcspm

Sep 8, 2008, 3:53:30 AM
to Clone Digger general
I've tried to run clonedigger on this simple Python example:

--- cut here ---
#!/usr/bin/env python

class A:

    def aaa(self):

        self.a1 = 1
        self.a2 = 2
        self.a3 = 3
        self.a4 = 4
        self.a5 = 5
        self.a6 = 6
        self.a7 = 7
        self.a8 = 8
        self.a9 = 9

class B:

    def bbb(self):

        self.b1 = 1
        self.b2 = 2
        self.b3 = 3
        self.b4 = 4
        self.b5 = 5
        self.b6 = 6
        self.b7 = 7
        self.b8 = 8
        self.b9 = 9
--- cut here ---

Here is the log:

$ python clonedigger.py /tmp/example1.py
Parsing /tmp/example1.py ... done
5 sequences
average sequence length: 4.400000
maximum sequence length: 9
Number of statements: 22
Calculating size for each statement... done
Building statement hash... done
Number of different hash values: 3
Building patterns... 4 patterns were discovered
Choosing pattern for each statement... done
Finding similar sequences of statements... 18 sequences were found
Refining candidates... 0 clones were found
Removing dominated clones... 0 clones were removed

I have also tried a variation of command-line options:

$ python clonedigger.py --fast /tmp/example1.py
Parsing /tmp/example1.py ... done
5 sequences
average sequence length: 4.400000
maximum sequence length: 9
Number of statements: 22
Calculating size for each statement... done
Building statement hash... done
Number of different hash values: 3
Marking each statement with its hash value
Finding similar sequences of statements... 19 sequences were found
Refining candidates... 0 clones were found
Removing dominated clones... 0 clones were removed

$ python clonedigger.py --clusterize-using-dcup --hashing-depth=0 /tmp/example1.py
Parsing /tmp/example1.py ... done
5 sequences
average sequence length: 4.400000
maximum sequence length: 9
Number of statements: 22
Calculating size for each statement... done
Building statement hash... done
Number of different hash values: 3
Marking each statement with its hash value
Finding similar sequences of statements... 19 sequences were found
Refining candidates... 0 clones were found
Removing dominated clones... 0 clones were removed

I have also tried to run clonedigger against a variation of the
original example:

--- cut here ---
#!/usr/bin/env python

class A:

    def aaa(self):

        self.a1 = '1'
        self.a2 = '2'
        self.a3 = '3'
        self.a4 = '4'
        self.a5 = '5'
        self.a6 = '6'
        self.a7 = '7'
        self.a8 = '8'
        self.a9 = '9'

class B:

    def bbb(self):

        self.b1 = '1'
        self.b2 = '2'
        self.b3 = '3'
        self.b4 = '4'
        self.b5 = '5'
        self.b6 = '6'
        self.b7 = '7'
        self.b8 = '8'
        self.b9 = '9'
--- cut here ---

The logs for running "python clonedigger.py",
"python clonedigger.py --fast" and
"python clonedigger.py --clusterize-using-dcup --hashing-depth=0"
against this code snippet are identical to the ones above:
0 clones are found.

I wonder why clonedigger is failing to detect a clone here, since I'm
pretty sure that the definition "two sequences of statements form a
clone if one of them can be obtained from the other by replacing some
subtrees" applies to these code snippets.

Peter Bulychev

Sep 8, 2008, 3:58:32 AM
to clonedigg...@googlegroups.com
Hi.
> I wonder why clonedigger is failing to detect a clone here, since I'm
> pretty sure that the definition "two sequences of statements form a
> clone if one of them can be obtained from the other by replacing some
> subtrees" applies to these code snippets.

There are also thresholds. CD only reports clone pairs that are large enough and similar enough. Here you have a lot of differences in names and constants.

If you run it with the --distance-threshold=30 argument (or something like that), you will probably detect this clone.

--
Best regards,
Peter Bulychev.
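
[Editor's illustration] The point about thresholds can be made concrete with a toy measurement. The sketch below is not Clone Digger's actual distance metric; it is a simplified stand-in that uses Python's standard `ast` module to count the positions at which the two method bodies from the example differ. The helper names (`make_class`, `count_differences`, `method_body`) are hypothetical and exist only for this illustration.

```python
import ast

def make_class(class_name, method_name, prefix):
    """Build source text shaped like the example posted in this thread."""
    lines = ["class %s:" % class_name, "    def %s(self):" % method_name]
    lines += ["        self.%s%d = %d" % (prefix, i, i) for i in range(1, 10)]
    return "\n".join(lines) + "\n"

def count_differences(left, right):
    """Count positions where two AST subtrees differ (toy metric only)."""
    if type(left) is not type(right):
        return 1
    if isinstance(left, ast.AST):
        # Compare the syntactic fields; line numbers are not in _fields.
        return sum(
            count_differences(getattr(left, f, None), getattr(right, f, None))
            for f in left._fields
        )
    if isinstance(left, list):
        if len(left) != len(right):
            return 1
        return sum(count_differences(a, b) for a, b in zip(left, right))
    return 0 if left == right else 1

def method_body(source):
    """Return the statements of the first method of the first class."""
    class_def = ast.parse(source).body[0]
    return class_def.body[0].body

body_a = method_body(make_class("A", "aaa", "a"))
body_b = method_body(make_class("B", "bbb", "b"))
differences = count_differences(body_a, body_b)
print(differences)  # one differing attribute name per statement
```

Under this toy count every one of the nine statements carries a difference, so the differences grow in step with the size of the sequence. That is one plausible reading of why a stricter threshold would reject the pair while a looser one such as 30 accepts it; how Clone Digger really weighs differences may be quite different.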

zpcspm

Sep 8, 2008, 4:23:17 AM
to Clone Digger general
On Sep 8, 10:58 am, "Peter Bulychev" <peter.bulyc...@gmail.com> wrote:
> If you run it with --distance-threshold=30 (or smth like this) argument,
> probably you'll detect this clone.

This works, thank you. CD is reporting false positives (like the help
promises), but fortunately the first clone is the biggest one. So
after a code refactoring all those false positives would vanish.

Peter Bulychev

Sep 8, 2008, 4:25:31 AM
to clonedigg...@googlegroups.com
You are welcome :)


zpcspm

Sep 8, 2008, 4:35:28 AM
to Clone Digger general
I've just realized that many future threads of this group could be
summarized into typical Frequently Asked Questions. Peter, what do you
think about the idea of CD having a FAQ file?

Summary for this thread:

Q: I'm running CloneDigger against a code snippet and it detects 0
clones, even though I can see that there are clones in the code.
A: To make CloneDigger detect more clones, try variations of the
command-line options. Try
"clonedigger.py --clusterize-using-dcup --hashing-depth=0".
If this doesn't help, also add the --distance-threshold option with an
explicit value bigger than the default one. Increase the value of
--distance-threshold until you are satisfied with the result.
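
[Editor's illustration] The "increase until satisfied" advice can be scripted. The following is a minimal sketch, assuming that clonedigger.py sits in the current directory, that `python` starts an interpreter able to run it, and that the log is written to stdout or stderr in the format shown earlier in this thread. Only the `--distance-threshold` flag and the "clones were found" log line come from the thread; the function names are hypothetical.

```python
import re
import subprocess

# Matches the log line quoted in this thread, e.g.
# "Refining candidates... 0 clones were found"
CLONES_RE = re.compile(r"Refining candidates\.\.\. (\d+) clones were found")

def clones_found(log):
    """Extract the clone count from a log, or None if the line is missing."""
    match = CLONES_RE.search(log)
    return int(match.group(1)) if match else None

def run_clonedigger(path, threshold):
    """Run clonedigger once and return its combined output (assumed layout)."""
    cmd = ["python", "clonedigger.py",
           "--distance-threshold=%d" % threshold, path]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr

def first_detecting_threshold(path, thresholds, run=run_clonedigger):
    """Return the first threshold at which at least one clone is reported."""
    for threshold in thresholds:
        if clones_found(run(path, threshold)):
            return threshold
    return None
```

A call such as `first_detecting_threshold("/tmp/example1.py", range(5, 45, 5))` would try the values 5, 10, and so on up to 40 in turn. Since a larger threshold also admits more false positives, stopping at the smallest value that reports anything seems a reasonable default; the `run` parameter is there so the loop can be exercised with a stub instead of the real tool.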

Peter Bulychev

Sep 8, 2008, 4:38:50 AM
to clonedigg...@googlegroups.com
Good idea.

Hopefully I'll do that later, and your Q&A will surely be there :)
