bro....@gmail.com
Aug 7, 2018, 10:18:29 AM
to librsync
Any idea how the delta stuff performs on really large files?
With earlier versions of librsync it was really bad, clearly worse than linear. A 1 GB file could be done in maybe an hour, but a 20 GB file would take a couple of days, and a 40 GB file ... I killed it after 10 days, so I don't know how long it would have taken.
I'm looking at the hashtable code, and it doesn't look to me like it handles collisions well. But what do I know. I can't figure out this part:
/* Loop macro for probing table t for key k, setting hk to the hash for k
   reserving zero for empty buckets, and iterating with index i and entry hash
   h, terminating at an empty bucket. */
#define _for_probe(t, k, hk, i, h) \
    const unsigned mask = t->size - 1;\
    unsigned hk = KEY_HASH((KEY_T *)k), i, s, h;\
    hk = hk ? hk : -1;\
    for (i = mix32(hk) & mask, s = 0; (h = t->ktable[i]); i = (i + ++s) & mask)
I think when there are collisions it just puts entries into the table in the order they arrive, duplicates included. And if the hashtable is completely full, it seems to me like the add would loop forever, since it just probes for an empty entry. Lookups appear to scan the same sequence.
But I don't understand the consequences of the incrementer in the for loop: i = (i + ++s) & mask. If i starts at zero we get 0, 1, 3, 6, 10, 15, 21, etc.; other starting values give the same sequence, shifted.
Is it possible this method of skipping through the hashtable is a way of dealing with collisions?