diff performance

Eric Huss

Dec 6, 2007, 7:36:18 PM
to revie...@googlegroups.com
We have a few large files (the largest is about 1.5 megabytes) that cause
Review Board to take a long time to display. I've narrowed it down to the
differ code; everything else takes less than a second. Has anyone
looked at the differ's performance, or have any hints on what to look at? It
takes about 15 seconds to process the largest file, which seems a little on
the slow side, especially compared to GNU diff, which takes less than a
second to process the same file.

On a related note, when looking at the comments, it appears to run diff
on the same file multiple times if multiple users have commented on that
file. I haven't looked at this part of the code very closely yet, but I'm
wondering if there would be some way to cache the results.
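
For illustration only, here is a minimal sketch of the kind of caching being asked about. It is not Review Board's code: difflib.SequenceMatcher stands in for the real differ, and the names cached_opcodes, _diff_cache, and stats are hypothetical. It also assumes a simple in-process dictionary, whereas a real deployment would more likely want a shared cache.

```python
import difflib
import hashlib

# Hypothetical in-process cache, keyed by digests of both file versions.
_diff_cache = {}
# Counts how often the diff actually had to be computed.
stats = {"misses": 0}


def _digest(text):
    """Return a short, stable key for one version of a file."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def cached_opcodes(old_text, new_text):
    """Return diff opcodes for the two texts, computing them at most once."""
    key = (_digest(old_text), _digest(new_text))
    if key not in _diff_cache:
        stats["misses"] += 1
        # Stand-in differ: the standard library's line-based matcher.
        matcher = difflib.SequenceMatcher(
            None, old_text.splitlines(), new_text.splitlines(), autojunk=False
        )
        _diff_cache[key] = matcher.get_opcodes()
    return _diff_cache[key]
```

With something like this, a second comment on the same pair of file versions would reuse the stored opcodes instead of running the diff again. Keying on content digests rather than file names means a new revision of the file gets its own cache entry.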

Thanks,
-Eric

David Trowbridge

Dec 9, 2007, 8:28:49 PM
to revie...@googlegroups.com
I don't think anyone's really looked at the performance of the differ
code. It's mostly a port of the algorithm in GNU diff, but Python has
some strange performance characteristics. Since you've got some good
test cases, care to do some profiling? :)
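
A rough sketch of what that profiling could look like, using the standard library's cProfile and pstats modules. The diff_lines function here is a hypothetical stand-in built on difflib, not the Review Board differ; the idea would be to swap in a call to the real differ with the slow file as input.

```python
import cProfile
import difflib
import io
import pstats


def diff_lines(old_lines, new_lines):
    """Stand-in for the differ under test (assumption: replace with the real call)."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    return matcher.get_opcodes()


def profile_diff(old_lines, new_lines, top=10):
    """Run the diff under cProfile and return (opcodes, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    opcodes = diff_lines(old_lines, new_lines)
    profiler.disable()

    # Sort by cumulative time so the most expensive call paths come first.
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return opcodes, out.getvalue()
```

Printing the returned report shows which functions account for most of the time, which should help tell whether the cost is in the matching logic itself or in the surrounding processing.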

-David

jsyjr

Dec 10, 2007, 8:45:45 AM
to reviewboard
On Dec 9, 8:28 pm, "David Trowbridge" <trowb...@gmail.com> wrote:
> I don't think anyone's really looked at the performance of the differ
> code. It's mostly a port of the algorithm in GNU diff, but python
> has some strange performance characteristics. Since you've got some
> good test cases, care to do some profiling? :)

Given that Review Board incorporates its own diff implementation, and
that it is written in Python, you might want to look at the bzr
project. They too have a diff implementation written in Python, and they
have invested significant time and energy in its performance. They have
also replaced the classic matching logic with an algorithm known as
patience diff. The claim is that it is available as a standalone program:

http://bramcohen.livejournal.com/37690.html
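
To give a feel for the idea, here is a small sketch of the anchor-selection step of patience diff as it is commonly described: keep only lines that occur exactly once in both files, then take the longest run of those that appears in the same order on both sides. This is not bzr's implementation, the function name unique_common_anchors is made up for the example, and a full patience diff would also recurse into the gaps between anchors.

```python
from bisect import bisect_left
from collections import Counter


def unique_common_anchors(a, b):
    """Return (i, j) pairs, increasing in both i and j, where a[i] == b[j]
    and that line occurs exactly once in each sequence."""
    count_a = Counter(a)
    count_b = Counter(b)
    index_in_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}

    # Candidate matches, ordered by position in a.
    pairs = [(i, index_in_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in index_in_b]

    # Longest increasing subsequence over j, found by patience sorting.
    pile_tops = []        # smallest j on top of each pile
    pile_pair_idx = []    # index into pairs for each pile top
    prev = [None] * len(pairs)
    for idx, (_, j) in enumerate(pairs):
        k = bisect_left(pile_tops, j)
        if k == len(pile_tops):
            pile_tops.append(j)
            pile_pair_idx.append(idx)
        else:
            pile_tops[k] = j
            pile_pair_idx[k] = idx
        # Remember which pair sat on the pile to the left when this one landed.
        prev[idx] = pile_pair_idx[k - 1] if k > 0 else None

    # Walk the back-pointers from the last pile to recover the subsequence.
    anchors = []
    idx = pile_pair_idx[-1] if pile_pair_idx else None
    while idx is not None:
        anchors.append(pairs[idx])
        idx = prev[idx]
    anchors.reverse()
    return anchors
```

Because repeated lines (blank lines, lone braces and so on) are never used as anchors, the matching tends to line up on distinctive lines first, which is the main readability argument made for the approach.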

/john