[I'll just note in passing the irony of the author of patchbomb using a
pastebin (on github of all places).]
Well performance is good (though I had to tweak perf.py to bypass our
new filecache scheme):
$ hgs perfindex
! wall 0.168411 comb 0.160000 user 0.160000 sys 0.000000 (best of 54)
$ hg perfindex
! wall 0.006424 comb 0.010000 user 0.000000 sys 0.010000 (best of 441)
But it doesn't actually work for me: something is screwy with the
offset handling that's causing a seek to get an invalid argument. With
the debugger I'm seeing:
(Pdb) self.index[rev]
(-669319168L, 331, 431, 16259, 16260, 16259, -1, '\x02\xb5p\x13?\xfe\xa8B\xbfZ\xcfr\xb6\xf1;\xdd\x9e\\}+')
and with contrib/debugshell.py against stable, I'm seeing:
>>> repo.changelog.index[16260]
(205489111040L, 331, 431, 16259, 16260, 16259, -1, '\x02\xb5p\x13?\xfe\xa8B\xbfZ\xcfr\xb6\xf1;\xdd\x9e\\}+')
>>> hex(-669319168L)
'-0x27e50000L'
>>> hex(205489111040L)
'0x2fd81b0000L'
>>> hex(-669319168L & 0xffffffff)
'0xd81b0000L'
I see it: ntohl is getting implicitly declared.
Ok, now tip is still not much faster: it goes from 0.401s to 0.320s on
my test repo. Turns out the bulk of the time here is spent checking that
270 tags point to valid nodes. And doing node->rev mapping is the other
thing that could really stand to be optimized. Disable the valid node
check and it now takes .069s (compared to perfstartup of .025s).
...and that's enough tinkering for this evening.
--
Mathematics is the supreme nostalgia of our time.
_______________________________________________
Mercurial-devel mailing list
Mercuri...@selenic.com
http://selenic.com/mailman/listinfo/mercurial-devel
> [I'll just note in passing the irony of the author of patchbomb using a
> pastebin (on github of all places).]
> But it doesn't quite actually work for me, something is screwy with the
> offset handling that's causing a seek to get an invalid arg.
> Ok, now tip is still not much faster: it goes from 0.401s to 0.320s on
> my test repo. Turns out the bulk of the time here is spent checking that
> 270 tags point to valid nodes. And doing node->rev mapping is the other
> thing that could really stand to be optimized. Disable the valid node
> check and it now takes .069s (compared to perfstartup of .025s).
Yeah, that's probably not enough to be portable. I copied the bits from
the top of parsers.c. But now I have a slightly different thought...
It turns out to be a bit of a bother to bump our extension APIs due to
the need for everyone running from source to do rebuilds/repackaging.
It's especially annoying when bisecting across the change. The last time
we did it was for parse_index2 and I think we should aim to avoid it
when it's easy to do so. Ok, so how would we do that? Simple: roll the
index.c code into parsers.c and have parse_index2 return our new index
class rather than a Python list object.
> > Ok, now tip is still not much faster: it goes from 0.401s to 0.320s on
> > my test repo. Turns out the bulk of the time here is spent checking that
> > 270 tags point to valid nodes. And doing node->rev mapping is the other
> > thing that could really stand to be optimized. Disable the valid node
> > check and it now takes .069s (compared to perfstartup of .025s).
> >
>
> I could synthesize a repo and look into that, or perhaps you could send me
> a pointer to your carefully tweaked one?
I'm testing on my linux-kernel mirror:
http://selenic.com/repos/linux-2.6
But really all that matters is that you have a tag that points to a node
early in history. Discovering the revision of an early node will cause
the nodemap to be nearly fully populated:
http://www.selenic.com/hg/file/7b9bf72430ba/mercurial/revlog.py#l291
And we look up each tag node to sanity-filter the tag list, so any time
we do anything with tags (including tip!), we'll look for that old node:
http://www.selenic.com/hg/file/7b9bf72430ba/mercurial/localrepo.py#l432
Comment out that line and it's fast.
> As far as approach is concerned, the obvious thing to do is build a dict
> and look it up, but that has poor performance if there are lots of revs.
That's mostly because there are tons of Python objects that need to be
dynamically allocated and the table ends up getting resized a few times
too. If we created our own pre-sized hash table that was just an array
of ints (check key == index[hash[f(node)]]), we could build it in one
allocation.
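For illustration, here's a rough sketch of that one-allocation table: an open-addressed array of ints, with candidate keys checked back against the index rather than stored in the table. Everything here (`index_node`, `hashnode`, the toy string hash, the table size) is invented for the example, not code from parsers.c:

```c
#include <string.h>

/* Sketch of a pre-sized node->rev table: a flat array of ints.  Slots
 * hold a rev or -1 for empty; keys are never stored, since a
 * candidate's node can be checked back against the index. */
static const char *index_node[] = { "7b9bf724", "1a2b3c4d" };
#define NREVS     2
#define TABLESIZE 8   /* power of two, comfortably > NREVS */

static unsigned hashnode(const char *node)
{
    unsigned h = 0;   /* toy hash for the sketch; the real thing would
                         just take leading bytes of the binary hash */
    while (*node)
        h = h * 31 + (unsigned char)*node++;
    return h;
}

static void build(int *table)
{
    memset(table, -1, TABLESIZE * sizeof(*table)); /* one shot, no resizing */
    for (int rev = 0; rev < NREVS; rev++) {
        unsigned i = hashnode(index_node[rev]) & (TABLESIZE - 1);
        while (table[i] != -1)                     /* linear probing */
            i = (i + 1) & (TABLESIZE - 1);
        table[i] = rev;
    }
}

static int lookup(const int *table, const char *node)
{
    unsigned i = hashnode(node) & (TABLESIZE - 1);
    while (table[i] != -1) {
        if (!strcmp(index_node[table[i]], node))   /* key check via index */
            return table[i];
        i = (i + 1) & (TABLESIZE - 1);
    }
    return -1;   /* hit an empty slot: not present */
}
```

Since the table is sized up front from the index length, there are no Python objects to allocate and no rehashing, which is the whole point of the comparison with a dict.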
But it turns out a dict isn't really the data structure we want at all,
because it can't do prefix matching. That means looking up "7b9bf72"
degrades to linear search!
> The next obvious approach is to dump all the (node ID,rev) pairs into a
> single big malloced buffer, sort, and binary search for lookups. That's
> almost certain to be faster, but it's still a lot of work up front, so the
> startup cost might be unpleasant. If it was found to matter, cost of
> sorting could be amortized over multiple lookups (as the current revlog.rev
> implementation does) using a gapped insertion sort, maybe.
I think the ideal solution is a base-16 radix tree:
struct nodetree {
	struct nodetree *children[16];
};

#define leafnode(ptr) (((uintptr_t)(ptr)) & 1)
#define torev(ptr) ((int)((uintptr_t)(ptr) >> 1))
#define toptr(rev) ((struct nodetree *)(((uintptr_t)(rev) << 1) + 1))
With 1M nodes, it'll have a depth of ~6 and a search has a cache
footprint of about 6 cachelines. And it's obviously a perfect match for
prefix search. Downside is allocating/freeing ~N/16 nodetree structures
of up to 128 bytes.
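To make the idea concrete, here's a self-contained sketch of such a tree with tagged leaf pointers and prefix lookup. The `nodes` backing array, the collision push-down, and the return conventions (-1 absent, -2 ambiguous) are all inventions for this example, and freeing is omitted:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the proposed base-16 radix tree (not Mercurial's actual
 * code).  Leaf "pointers" are tagged: low bit set, revision number in
 * the upper bits, so leaves cost no allocation of their own. */
struct nodetree {
    struct nodetree *children[16];
};

#define leafnode(p) (((uintptr_t)(p)) & 1)
#define torev(p)    ((int)((uintptr_t)(p) >> 1))
#define toptr(rev)  ((struct nodetree *)(((uintptr_t)(rev) << 1) | 1))

/* hypothetical backing store: the full hex hash for each rev (in the
 * real thing this would come from the revlog index) */
static const char *nodes[] = { "7b9bf724", "7b9af100", "deadbeef" };

static int nibble(const char *hex, int i)
{
    char c = hex[i];
    return c <= '9' ? c - '0' : c - 'a' + 10;
}

static void insert(struct nodetree *root, int rev)
{
    struct nodetree *n = root;
    for (int i = 0; ; i++) {
        struct nodetree **slot = &n->children[nibble(nodes[rev], i)];
        if (!*slot) {                /* empty slot: drop a tagged leaf */
            *slot = toptr(rev);
            return;
        }
        if (leafnode(*slot)) {       /* collision: push old leaf down */
            int old = torev(*slot);
            *slot = calloc(1, sizeof(**slot));
            (*slot)->children[nibble(nodes[old], i + 1)] = toptr(old);
        }
        n = *slot;
    }
}

/* Return the rev for a (possibly partial) hex prefix, -1 if absent,
 * -2 if ambiguous. */
static int lookup(struct nodetree *root, const char *prefix)
{
    struct nodetree *n = root;
    size_t len = strlen(prefix);
    for (size_t i = 0; i < len; i++) {
        struct nodetree *p = n->children[nibble(prefix, i)];
        if (!p)
            return -1;
        if (leafnode(p)) {           /* verify the rest against the index */
            int rev = torev(p);
            return strncmp(nodes[rev] + i, prefix + i, len - i) ? -1 : rev;
        }
        n = p;
    }
    return -2;                       /* prefix ends inside the tree */
}
```

A lookup only descends as many levels as there are shared prefix nibbles, so the unique-prefix, absent, and ambiguous cases all fall out naturally.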
Alternately, we could use a heap in a pre-allocated index-length array
of revs, using node(rev) as the heap key. Rather than actually doing
a full heap-sort, we can leave it heapified and do prefix search. Search
depth is more like 20 for 1M nodes and a search has a cache footprint of
more like 40 cachelines (ouch!).
But either way, we can build it incrementally as we search back from tip
for a node (much like the current revlog.rev() code).
> Simple: roll the
> index.c code into parsers.c and have parse_index2 return our new index
> class rather than a Python list object.

> I think the ideal solution is a base-16 radix tree:
Argh, you beat me to it. I started hacking around on a readonly pure C
revlog library as a vacation project a couple of summers ago. It
stalled when I came back to the real world *sigh*. It's here, for the
record:
The good stuff's in
http://hg.gerg.ca/xrevlog/file/default/lib/revlog.c
And now I have to go read up on radix trees... geez it's hard to keep
up around here. ;-)
Greg
--
Greg Ward http://www.gerg.ca/
Quick!! Act as if nothing has happened!
For the record, Wikipedia's idea of what a radix tree is is completely
different from mine. Mine is more like this:
https://lwn.net/Articles/175432/
In other words, take a binary tree and make it 16-way. Then given a hash
prefix like "abcde", we do at most 5 steps before we find a tree leaf
node, discover it's not present, or discover it's ambiguous.
--
Mathematics is the supreme nostalgia of our time.
Also for the record. If I understood correctly, those are called tries:
http://en.wikipedia.org/wiki/Trie
Radix trees (or PATRICIA tries) are a space optimization of tries. The
structure mentioned in the LWN article is just a trie with a 64-symbol
alphabet.
Cheers.
--
Isaac Jurado
"The noblest pleasure is the joy of understanding"
Leonardo da Vinci