Merge functionality

7 views
Skip to first unread message

Colin Fleming

unread,
Jun 18, 2010, 6:47:55 AM6/18/10
to ning-tr...@googlegroups.com
Hi all,

Slightly edited post from my comment at
http://www.cowtowncoder.com/blog/archives/2010/06/entry_401.html:

This is curiously exactly what I need right now. This is for a large
migration from a series of huge xml files, they contain a series of
objects with relations between them so being able to check which IDs
are present and their position in the file will be really useful (trie
from byte[] key to long position in mmap'ed file).

My use case is slightly different in that I have to build the trie at
the start of the process, which in a naive implementation would mean
that we'd need the whole mapping in memory. The merge would be
extremely useful for this. This would mean that I could scan once over
the file, create a trie of mappings in blocks of 10k elements and then
merge them into the final trie. This would avoid the need to maintain
the whole map sorted in memory at once, at the cost of having to
create and throw away a series of persistent data structures.

Cheers,
Colin

Tatu Saloranta

unread,
Jun 18, 2010, 12:00:32 PM6/18/10
to ning-tr...@googlegroups.com

Yet another funny thing is this: I have also half-cleaned-up code for
doing standard on-disk merge sort
(http://github.com/sunnygleason/g414-sort).
I have used base code for building large data sets where ordering is
needed or helps significantly improve performance (turns out it's
faster to sort on client side than let oracle do it -- server is
shared, client has all cpu and i/o it can use). I plan to use this
myself when building tries.

But this may not help you of course. So merging functionality is the
next thing to work on, once I find little bit more time to work on
open source things, maybe next week.
Iteration over data sort of exist in 'DumpTrie' tool, but it's not
general purpose, as I wrote it during development to test/verify
resulting structure.

-+ Tatu +-

Reply all
Reply to author
Forward
0 new messages