On Sep 22, 11:03 pm, "Peter Bulychev" <peter.bulyc...@gmail.com> wrote:
> At what point did Clone Digger stop?
Like I said, it happened while CloneDigger was still parsing the source
files. Once the Python process had eaten all available virtual memory,
I saw no progress (CloneDigger moving on to the next file) for about an
hour, so I just pressed Ctrl-C.
> Depending on the phase in which the problem arose, I can suggest the
> following workarounds:
> using the --fast option
> increasing the --hashing-depth option
> increasing the --size-threshold option
I'd be happy to see CloneDigger succeed at this task with the default
set of options, because I agree that the difference between 12 MB and
1 GB is very big. Since Python frees an object's memory as soon as its
reference count drops to zero, my blind guess is that some objects
simply "live" longer than needed. I'm a bit skeptical about the use of
inner functions and recursion in CloneDigger, but I can't prove these
are the bottlenecks.
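Just to illustrate the kind of thing I suspect (a made-up sketch, not
anything from CloneDigger's actual code; ParsedFile, parse and _seen
are hypothetical names): one lingering reference in a long-lived
container is enough to keep a whole object graph alive in CPython.

    class ParsedFile(object):
        """Stand-in for a parsed source file with a large AST attached."""
        def __init__(self, name, payload_size):
            self.name = name
            self.payload = bytearray(payload_size)  # pretend this is the AST

    _seen = []  # long-lived container, e.g. a cache or a debug list

    def parse(name):
        tree = ParsedFile(name, 10 * 1024 * 1024)
        _seen.append(tree)  # this extra reference outlives the call
        return tree.name

    for i in range(5):
        parse("file_%d.py" % i)

    # Reference counting never gets a chance to free the trees: all five
    # payloads are still reachable through _seen after parse() returned,
    # so roughly 50 MB stays allocated.
    print("%d trees still alive" % len(_seen))

If something like this happens for every parsed file, memory usage
would grow with the whole project instead of one file at a time.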
> Also I suggest removing automatically generated sources and tests from the
> source tree of your project.
This advice makes practical sense performance-wise, but sometimes I run
CloneDigger against tests on purpose. Test code is still code, so
redundancy in test code is bad as well, and CloneDigger helps find it.
I'm not very enthusiastic about excluding tests just for the sake of
masking a possible performance issue that can eventually be fixed.