Gerrit currently has a hard-coded limit that prevents displaying the
whole-file diff for files larger than 9000 lines (or so I assume;
looking at the code, it compares the value against the result of the
.size() method on an org.eclipse.jgit.diff.RawText object, which makes
me think it may be checking character count rather than lines, but I
could not find online documentation for the JGit API to verify).
We have many users doing development on kernel/sched.c in the Linux
kernel source tree. This file is not displayed in full by the Gerrit
diff page because of the limit. I increased the limit in the code (for
testing) and ran some tests with Chrome 6 and Firefox 3.6; the numbers
(on my machine) are:
- Chrome 6: with or without syntax highlighting it takes about 55
seconds to display the file, and while doing so it prompts the user
twice with a warning that the script may have hung
- Firefox 3.6: with syntax highlighting it takes about 30 seconds (and
prompts the user once); without it, about 7 seconds (almost
acceptable)
Our internal code review tool (which our users are used to) displays
the same file in 3-4 seconds, so this was never an issue for us before
Gerrit.
I need Gerrit to display the full contents of that file in acceptable
time. Unfortunately it seems to be a more complex issue, probably
because Gerrit is doing (too) much on the client side. If so, would it
be possible to disable some of this processing for larger files
(losing some features like syntax highlighting)?
Thanks!
--
Mihai Rusu
It's lines. RawText breaks the byte-array input into lines by looking
for ASCII LFs to delimit the line records. size() returns the number
of records that were identified. (And it can be one more than the
number of LFs in the file, because the last "line" might be missing
its trailing LF.)
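As a sketch of that counting rule (an illustration based on the explanation above, not JGit's actual RawText code; the function name is my own):

```javascript
// Count "lines" the way RawText.size() is described above: one record
// per ASCII LF (0x0a), plus one extra record if the content does not
// end with a trailing LF. Input is an array of byte values.
function countLines(bytes) {
  if (bytes.length === 0) return 0;          // empty content has no lines
  let lines = 0;
  for (const b of bytes) {
    if (b === 0x0a) lines++;                 // one record per LF
  }
  if (bytes[bytes.length - 1] !== 0x0a) {
    lines++;                                 // last "line" missing its LF
  }
  return lines;
}
```

So the 9000 limit is compared against line records, not bytes or characters.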
> We have many users doing development on kernel/sched.c in the Linux
> kernel source tree. This file is not displayed in full by the Gerrit
> diff page because of the limit. I increased the limit in the code (for
> testing) and ran some tests with Chrome 6 and Firefox 3.6; the numbers
> (on my machine) are:
> - Chrome 6: with or without syntax highlighting it takes about 55
> seconds to display the file, and while doing so it prompts the user
> twice with a warning that the script may have hung
> - Firefox 3.6: with syntax highlighting it takes about 30 seconds (and
> prompts the user once); without it, about 7 seconds (almost
> acceptable)
We should tell the Chrome folks that Firefox 3.6 is running this
particular script faster. Maybe they can fix something busted in
WebKit or V8 that is causing Chrome to run this badly. But either
way, yes, even 7 seconds in Firefox just isn't acceptable.
The issue is that we build up the entire table as a single string
inside JavaScript. This isn't an efficient operation in JavaScript,
and it's really not nice to the browser, because we do it on the event
loop without yielding. That is the cause of the long-running-script
prompts: we really aren't yielding back to the UI. I have noticed
that in most browsers, if you don't yield back to the UI every so
often, JavaScript execution slows to a crawl. It might be that they
don't do GC until you yield back... and we are building a lot of
garbage as we go. By not yielding, we force the JavaScript runtime to
grow its heap to a massive size and then do gigantic garbage
collection sweeps rather than smaller incremental ones.
We already try to handle "large" tables in the patch set display by
using a progress meter and an incremental command that processes about
50 rows at a time, then yields to the browser and runs the next group
a few milliseconds later. This has resulted in faster display for
patch sets containing thousands of files, like gigantic merges or
imports from upstream repositories. But I didn't bother to do it in
the patch display code itself, because it's rather ugly to code, and
I'm just plain lazy.
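The incremental idea looks roughly like this (a sketch with hypothetical names, not Gerrit's actual code):

```javascript
// Render a small batch of rows, then yield to the browser with
// setTimeout so the UI can repaint (and the runtime can collect
// garbage) before the next batch runs. The first batch runs
// synchronously; the rest run from the event loop.
function renderIncrementally(rows, renderRow, batchSize, done) {
  let i = 0;
  function step() {
    const end = Math.min(i + batchSize, rows.length);
    for (; i < end; i++) {
      renderRow(rows[i]);            // e.g. append one <tr> to the table
    }
    if (i < rows.length) {
      setTimeout(step, 0);           // yield back to the event loop
    } else if (done) {
      done();                        // all batches finished
    }
  }
  step();
}
```

The batch size trades progress-meter smoothness against total overhead from the repeated yields.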
One workaround for now is to implement the callback approach for file
display too. That way we can display a larger file and make the user
wait, with a cute^Wannoying little progress meter to let them know the
browser is still chugging. We wouldn't need to lose syntax
highlighting right away either; we could still permit it to occur...
and we might see display times in Chrome for sched.c drop from 55
seconds to something more reasonable to wait for.
> Our internal code review tool (which our users are used to) displays
> the same file in 3-4 seconds, so this was never an issue for us before
> Gerrit.
Another solution is to render huge files on the server and ship back a
giant HTML block rather than a big JSON structure. That is what our
internal code review system does. It's at least twice the amount of
data to send back, because we have to send the common context twice
(rather than once, as we do now in the JSON structure), not to mention
the HTML markup tags. But the server can produce that giant string
much more efficiently.
But doing this on the server means we lose syntax highlighting. The
current highlighting package we use runs only in JavaScript, and
running it on the server under Mozilla Rhino is simply too slow (I
know, I've tried). There may be another option, though. I need to
talk to another Googler, but we might have a library that he started
and hasn't yet open sourced that can do syntax highlighting in Java.
It's lying in the internal repository and might just be abandoned at
this point. The biggest downside to open sourcing it is that I'll
probably have to do the work, and then will need to maintain it
myself. :-\ But it claims to be a pretty efficient Java
implementation of syntax highlighting, which would run well on the
server-side JVM.
Anyway, sorry you're having trouble with sched.c in the kernel. But
there isn't a whole lot we can do right now without redoing at least
some of this rendering logic. Someone needs to dig into that code and
do something to it. :-)
There exists will to finish that library...
That may be true. But I really don't need yet another package to
maintain. I had to give up EGit, git-gui, and git fast-import over
the past year because I just can't do everything. I'm trying hard
just to keep up with JGit and Gerrit Code Review. :-\
Anyway, I'll look into it. There's no reason we can't open source the
library, other than the fact that we have to put it into a new version
control system, make sure the copyright notices are accurate, and get
over our internal dislike for throwing code over the wall. (We
typically don't open source something unless we are actively using it
ourselves and will maintain it. I don't think this project was ever
finished, let alone is in active use within Google.)
Thanks for the explanations, this is very useful.
> Anyway, sorry you're having trouble with sched.c in the kernel. But
> there isn't a whole lot we can do right now without redoing at least
> some of this rendering logic. Someone needs to dig into that code and
> do something to it. :-)
I have a patch ready that adds two configurable server options: one to
set a limit for syntax highlighting and another to set a limit on
whole-file transfers. I'm setting the first to 5000 (the default
Gerrit uses right now) and the second to a huge value. We are
currently using this on our live server as an acceptable workaround.
Let me know if you want this in Gerrit too (though, according to your
explanation above, that seems unlikely).
Apparently the issue of >50 seconds in Chrome happens only with my
Chrome 6 (latest stable) on Linux; in Chrome 6 on an OS X laptop it
works a little faster than Firefox (and likewise for other people
using Chrome 6/7 on their Linux systems). So I think it's safe to say
that Chrome is generally faster than Firefox (with my system being the
exception). Also, you may be right about it being so slow because of
not yielding to the browser and/or not collecting memory: while
loading the page, memory usage explodes (in both Chrome and Firefox)
to over 170 MB, and this is for kernel/sched.c, a 224 KB file.
--
Mihai Rusu
I'd be willing to include this in Gerrit. I would like to release
2.1.6 in early November after I get back, without making too many more
changes to the code. Fixing large-file display for real will probably
take a lot more work than this configuration option, so maybe we
should take this as a short-term resolution while we work on server
side highlighting.
And we have now open-sourced that library:
http://code.google.com/p/jgments/
It's a Java port of the Python Pygments package, which is a rather
popular syntax-highlighting library for Python. The port works by
using Python to evaluate many of the language parsers and spit out
Java code that supposedly does the same thing. Consequently it isn't
so much a port as a tool that helps you do the port at build time.
I think Jacob said there is still some work to be done on this
library, but it's at least available.
Very cool! Thanks for the update. I'll try to get it to do something before the Git Together.