TortoiseHg and the statdaemon extension

Martin Geisler

Aug 28, 2012, 11:58:37 AM
to thg...@googlegroups.com
Hi everybody,

I'm working on an extension that will speed up 'hg status' in large
working copies on Windows:

https://bitbucket.org/aragost/statdaemon

The extension works similarly to the inotify extension: a daemon sits in
the background and listens for filesystem events. The daemon keeps an
up-to-date view of the filesystem and 'hg status' will ask the daemon
for stat information as needed -- see the README above.
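
A minimal sketch of that idea, not the actual statdaemon code: an in-memory
table of stat data that is filled once, refreshed per filesystem event, and
queried without touching the disk. The names StatCache, refresh and lookup
are hypothetical, and the event source (whatever notifies the daemon of a
change) is assumed to call refresh().

```python
import os
import tempfile


class StatCache:
    """Hypothetical in-memory view of a working copy's stat data."""

    def __init__(self, root):
        self.root = root
        self.entries = {}
        # One full walk at startup; afterwards only events update the view.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                self.refresh(os.path.join(dirpath, name))

    def refresh(self, path):
        """Re-stat a single path; meant to be called once per event."""
        rel = os.path.relpath(path, self.root)
        try:
            st = os.stat(path)
            self.entries[rel] = (st.st_mode, st.st_size, int(st.st_mtime))
        except OSError:
            # The file is gone, so drop it from the view.
            self.entries.pop(rel, None)

    def lookup(self, rel):
        """Answer a client query from memory, or None if unknown."""
        return self.entries.get(rel)
```

A client such as 'hg status' would then ask lookup() for each file it cares
about instead of calling os.stat() itself.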

The hope was that the extension would also transparently speed up
TortoiseHg. My client sees very slow status refreshes -- clicking the
"Working Directory" line in the workbench takes maybe 10 seconds. They
have around 65k files in the working copy.

However, TortoiseHg and statdaemon don't like each other. When I run the
Workbench in a repository where the extension is enabled, I first see
that dirstate.status is called twice. But when I click the "Working
Directory" line I see "child process failed to start" in the status bar.

That message comes from Mercurial: cmdutil.service will raise Abort with
that message if it cannot start the service. However, thg should not
start any service -- I was running 'hg stdaemon' manually in another
Command Prompt and I can see that thg is talking to the daemon fine at
the beginning.

The extension wraps dirstate.status and osutil.listdir. Does anybody
have any ideas for how I can debug this or know of any caveats in this
area?
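
For readers unfamiliar with the wrapping pattern, here is a self-contained
sketch. Mercurial extensions normally do this with extensions.wrapfunction;
the stand-in below only mimics that calling convention so it runs without
Mercurial. FakeDirstate and logstatus are hypothetical names, and logging
every call like this is just one possible way to see who triggers what.

```python
import functools


def wrapfunction(container, name, wrapper):
    """Replace container.name with a function that calls
    wrapper(orig, *args, **kwargs). Stand-in for the real helper."""
    orig = getattr(container, name)

    @functools.wraps(orig)
    def wrapped(*args, **kwargs):
        return wrapper(orig, *args, **kwargs)

    setattr(container, name, wrapped)
    return orig


class FakeDirstate:
    """Hypothetical placeholder for the object being wrapped."""

    def status(self, match):
        return ["modified.txt"]


calls = []


def logstatus(orig, self, match):
    # Record each call before delegating to the original method.
    calls.append(match)
    return orig(self, match)


wrapfunction(FakeDirstate, "status", logstatus)
```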

--
Martin Geisler

Angel Ezquerra

Aug 28, 2012, 2:58:46 PM
to thg...@googlegroups.com
Hi Martin,

I did try your extension with TortoiseHg and I did not see any such
problems. However, I only used it with a medium-sized repo (around 1000
files or so). Also, I did not manually start the daemon, because I
thought that the extension would do it automatically for me (was I
wrong?). I did not notice a big change in performance though, but
perhaps that is because the repository was not very big?

Cheers,

Angel

Steve Borho

Aug 28, 2012, 5:50:06 PM
to thg...@googlegroups.com
Setting THG_DEBUG=1 in your environment may give you backtrace info on
stdout. Setting traceback=1 in the [ui] section of your ini file may help
as well.

--
Steve Borho

Martin Geisler

Aug 29, 2012, 10:54:02 AM
to thg...@googlegroups.com


On Tuesday, 28 August 2012 20:58:47 UTC+2, Angel Ezquerra wrote:
> On Tue, Aug 28, 2012 at 5:58 PM, Martin Geisler <mar...@geisler.net> wrote:
>> Hi everybody,
>>
>> I'm working on an extension that will speedup 'hg status' in large
>> working copies on Windows:
>>
>>   https://bitbucket.org/aragost/statdaemon
>
> Hi Martin,
>
> I did try your extension with TortoiseHg and I did not see any such
> problems.

Excellent! When I tested with TortoiseHg 2.4.3 I could not get it working.
 
> However I only used it with a medium sized repo (around 1000
> files or so). Also I did not manually start the daemon, because I
> thought that the extension would do it automatically for me (was I
> wrong?).

You were correct, the extension will start the daemon on demand. That mechanism is still a bit crude in that it doesn't wait a bit to let the daemon start -- it just launches it and tries to connect right away. If that fails you get a second daemon started. There's also no automatic shutdown like the inotify extension had.
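
One way to avoid the premature second launch is to retry the connection a
few times with a growing delay before giving up. This is only a sketch under
assumptions: it uses a TCP address for illustration (the transport
statdaemon actually uses is not stated here), and connect_with_retry,
attempts and delay are hypothetical names.

```python
import socket
import time


def connect_with_retry(address, attempts=5, delay=0.1):
    """Try to connect several times, doubling the pause after each
    failure, so a freshly launched daemon has time to start listening."""
    last_error = None
    for attempt in range(attempts):
        try:
            return socket.create_connection(address, timeout=1.0)
        except OSError as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(delay * (2 ** attempt))
    # Every attempt failed; let the caller decide what to do next.
    raise last_error
```

Only when connect_with_retry() finally raises would the caller consider
launching another daemon.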
 
> I did not notice a big change in the performance though, but
> perhaps that is because the repository was not very big?

I guess so. I see "hg status" go from 3 sec to 2 sec on a repository with 70,000 files in the working copy. My client saw a drop from 3.5 sec to 3 sec on the same repository, but they use SSDs and so I expect that the savings would be smaller.

Martin Geisler

Aug 30, 2012, 7:22:16 AM
to thg...@googlegroups.com
Martin Geisler <mar...@geisler.net> writes:

> On Tuesday, 28 August 2012 20:58:47 UTC+2, Angel Ezquerra wrote:
>>
>> On Tue, Aug 28, 2012 at 5:58 PM, Martin Geisler <mar...@geisler.net>
>> wrote:
>> > Hi everybody,
>> >
>> > I'm working on an extension that will speedup 'hg status' in large
>> > working copies on Windows:
>> >
>> > https://bitbucket.org/aragost/statdaemon
>> >
>>
>> Hi Martin,
>>
>> I did try your extension with TortoiseHg and I did not see any such
>> problems.
>
>
> Excellent! When I tested with TortoiseHg 2.4.3 I could not get it
> working.

I found the problem! The server could only talk with a single client at
a time and TortoiseHg (at least here) liked to issue multiple status
requests in parallel. The fixes are on Bitbucket.
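
To illustrate the difference, here is a small sketch of a server that gives
each client its own thread, so a second status request is answered while the
first connection is still open. It is not the fix that went into statdaemon;
the line-based protocol, StatHandler, StatServer and start_server are all
hypothetical, built on Python's standard socketserver module.

```python
import socket
import socketserver
import threading


class StatHandler(socketserver.StreamRequestHandler):
    """Answer one line per request until the client disconnects."""

    def handle(self):
        for line in self.rfile:
            path = line.decode("utf-8").strip()
            reply = "stat %s\n" % path
            self.wfile.write(reply.encode("utf-8"))


class StatServer(socketserver.ThreadingTCPServer):
    # One thread per connection, so clients do not block each other.
    daemon_threads = True
    allow_reuse_address = True


def start_server():
    """Start the server on a free local port in a background thread."""
    server = StatServer(("127.0.0.1", 0), StatHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

With a plain single-threaded TCPServer the second client would have to wait
until the first one disconnected.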

--
Martin Geisler

aragost Trifork
Commercial Mercurial support
http://aragost.com/mercurial/

Angel Ezquerra

Aug 30, 2012, 7:39:43 AM
to thg...@googlegroups.com
Martin,

given that your client seems to use TortoiseHg and not just bare
mercurial, and since he is having trouble using TortoiseHg with repos
that have a huge number of files, did you check whether a sizeable
amount of time is spent on the "GUI side", perhaps displaying the
manifest or the revision details? Maybe those areas could be
optimized? *hint* *hint* ;-)

Cheers,

Angel

Martin Geisler

Aug 30, 2012, 2:14:14 PM
to thg...@googlegroups.com
Angel Ezquerra <angel.e...@gmail.com> writes:

> On Thu, Aug 30, 2012 at 1:22 PM, Martin Geisler <m...@aragost.com> wrote:
>> Martin Geisler <mar...@geisler.net> writes:
>>
>>> On Tuesday, 28 August 2012 20:58:47 UTC+2, Angel Ezquerra wrote:
>>>>
>>>> On Tue, Aug 28, 2012 at 5:58 PM, Martin Geisler <mar...@geisler.net>
>>>> wrote:
>>>> > Hi everybody,
>>>> >
>>>> > I'm working on an extension that will speedup 'hg status' in
>>>> > large working copies on Windows:
>>>> >
>>>> > https://bitbucket.org/aragost/statdaemon
>>>> >
>>>>
>>>> Hi Martin,
>>>>
>>>> I did try your extension with TortoiseHg and I did not see any such
>>>> problems.
>>>
>>>
>>> Excellent! When I tested with TortoiseHg 2.4.3 I could not get it
>>> working.
>>
>> I found the problem! The server could only talk with a single client
>> at a time and TortoiseHg (at least here) liked to issue multiple
>> status requests in parallel. The fixes are on Bitbucket.
>
> Martin,
>
> given that your client seems to use TortoiseHg and not just bare
> mercurial, and since he is having trouble using TortoiseHg with repos
> that have a huge number of files,

Btw, is 60-70k files really a huge amount these days? The OpenOffice
project had 69k files and Mozilla has 65k files.

The company I'm talking about has some Java code, some libraries, some
PDFs, etc and I don't think they're very unusual in their repository
structure. Sure, the repository is bloated and I've asked them to clean
it up, but it worked fine with Subversion :-/

> did you check whether a sizeable amount of time is spent on the "GUI
> side", perhaps displaying the manifest or the revision details?

I only looked a little at this since I don't know the TortoiseHg
codebase that well. I also found it difficult to do profiling since the
profile tells me how much time was spent in the various functions
from when 'thg' was started until I quit it.
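
One possible way around a whole-session profile is to enable the profiler
only around the call of interest. The sketch below uses Python's standard
cProfile and pstats modules; profile_call is a hypothetical helper and
refresh_working_directory is a made-up stand-in for whatever a GUI action
ends up calling, not a TortoiseHg function.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Profile a single call and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Show the ten entries with the highest cumulative time.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()


def refresh_working_directory():
    """Made-up workload standing in for a slow GUI refresh."""
    return sum(i * i for i in range(1000))
```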

So I started with a simpler case: optimizing 'hg status'.

> Maybe those areas could be optimized? *hint* *hint* ;-)

I'm sure they can! I've noticed that fetchall is called twice in the
statdaemon (and thus status is called twice) when I click the Working
Directory line.

My client also complains that it takes a very long time for them to switch
between different tabs when they have several different clones open.
I've asked them to come here and explain the problems directly.

--
Martin Geisler

Angel Ezquerra

Aug 31, 2012, 3:35:57 AM
to thg...@googlegroups.com
Perhaps I should not have said huge, but I think it is definitely
"big". I don't think the average software product is as big as Mozilla
or OpenOffice. Those are two complex pieces of software, don't you
think?

> The company I'm talking about has some Java code, some libraries, some
> PDFs, etc and I don't think they're very unusual in their repository
> structure. Sure, the repository is bloated and I've asked them to clean
> it up, but it worked fine with Subversion :-/

By some definition of "worked fine" I guess :-P

Was their SVN client much faster? It's been a while since I looked at
TortoiseSVN but I seem to remember that TortoiseSVN lacks a
"workbench" and relies much more heavily on the Windows Explorer
context menus, doesn't it? If that is the case, and since SVN does not
give you access to other revisions, I can imagine that they would see
better performance while working with the current revision.

>> did you check whether a sizeable amount of time is spent on the "GUI
>> side", perhaps displaying the manifest or the revision details?
>
> I only looked a little at this since I don't know the TortoiseHg
> codebase that well. I also found it difficult to do profiling since the
> profile tells me how much time was spent in the various functions
> from when 'thg' was started until I quit it.
>
> So I started with a simpler case: optimizing 'hg status'.
>
>> Maybe those areas could be optimized? *hint* *hint* ;-)
>
> I'm sure they can! I've noticed that fetchall is called twice in the
> statdaemon (and thus status is called twice) when I click the Working
> Directory line.

I think it would be awesome if you got a deeper understanding of our
codebase. Having another mercurial core developer on board would be
super-amazing. Our code base is not that complex, I think,
particularly if you understand the mercurial API as well as you do.
Since you now have a client that cares about it you have a great
excuse to look at our code and help us improve it further :-D

> My client also complains that it takes a very long time for them to switch
> between different tabs when they have several different clones open.
> I've asked them to come here and explain the problems directly.

That'd be nice. I have not seen many problems with that myself, but as
I said I don't usually work with such big repositories.

Cheers,

Angel

Martin Geisler

Aug 31, 2012, 5:33:51 AM
to thg...@googlegroups.com
Angel Ezquerra <angel.e...@gmail.com> writes:

> On Thu, Aug 30, 2012 at 8:14 PM, Martin Geisler <mar...@geisler.net> wrote:
>>> Martin,
>>>
>>> given that your client seems to use TortoiseHg and not just bare
>>> mercurial, and since he is having trouble using TortoiseHg with
>>> repos that have a huge number of files,
>>
>> Btw, is 60-70k files really a huge amount these days? The OpenOffice
>> project had 69k files and Mozilla has 65k files.
>
> Perhaps I should not have said huge, but I think it is definitely
> "big". I don't think the average software product is as big as Mozilla
> or OpenOffice. Those are two complex pieces of software, don't you
> think?

What concerns me is that there are only 10-15 guys writing this
software. They've probably been working on it for a decade or longer and
now they have 70k files in the working directory.

This must happen all the time -- a company works on a system for 5-10
years and builds up a lot of cruft. If Mercurial and TortoiseHg cannot
scale to handle such a normal case, well then we have a problem.

>> The company I'm talking about has some Java code, some libraries,
>> some PDFs, etc and I don't think they're very unusual in their
>> repository structure. Sure, the repository is bloated and I've asked
>> them to clean it up, but it worked fine with Subversion :-/
>
> By some definition of "worked fine" I guess :-P

All I know is that they think Mercurial is slower than what they had
before. Sure merges are much, much better and there are other benefits
that make them use Mercurial, but the day-to-day experience is worse.

> I think it would be awesome if you got a deeper understanding of our
> codebase. Having another mercurial core developer on board would be
> super-amazing. Our code base is not that complex, I think,
> particularly if you understand the mercurial API as well as you do.
> Since you now have a client that cares about it you have a great
> excuse to look at our code and help us improve it further :-D

Yes, that would be great for everybody if they would pay for that...

But the problem is that I'm leaving aragost and today is my last
day. Mercurial consulting was great and I've met some great people
around the world -- but there were also long periods with little to do
and this was very boring. I'll now be doing Python web development for a
small startup called Dealini here in Zurich. I'm really looking forward
to getting some good colleagues and learning new technologies.

--
Martin Geisler

Angel Ezquerra

Aug 31, 2012, 7:26:45 AM
to thg...@googlegroups.com
That is awesome Martin, congratulations. I'm sure Aragost is sad to
see you go, but it is nice to have a big professional change from time
to time.

I hope, however, that you do not disappear from the mercurial "map"!
You are certainly one of the most friendly and knowledgeable faces in
the community...

Perhaps now that you will not be hacking on mercurial for work you'll
be able to start doing it for fun again? And if you want to learn new
technologies, I'm sure everybody here would love you to learn some Qt
;-)

Cheers,

Angel

Steve Borho

Aug 31, 2012, 1:03:15 PM
to thg...@googlegroups.com
good luck with the new work, Martin

--
Steve Borho

Martin Geisler

Sep 2, 2012, 10:30:16 AM
to thg...@googlegroups.com
Thanks a lot! I don't plan on disappearing from the map, though I must
say that it's been disappointing to contribute to Mercurial the last
couple of years because of the endless and quite ugly discussions I've
had with Matt.

> Perhaps now that you will not be hacking on mercurial for work you'll
> be able to start doing it for fun again? And if you want to learn new
> technologies, I'm sure everybody here would love you to learn some Qt
> ;-)

Yeah, I can imagine :) However, I actually only use TortoiseHg for
viewing the log and for annotate and there the GTK version (hgtk)
maintained by Henrik Stuart works fine for me.

--
Martin Geisler