I have most of the work done for Wikipedia rank.
I'm really happy with how this is working out, because it's a good, sizable data set with real-world applications.
I wonder if Spinn3r should release another dataset similar to this but for blogs.
Anyway, while testing it I found another memory leak, and this one I can easily reproduce.
I'm not sure what's causing it yet. The JVM's memory keeps growing, so maybe I'm doing something native and not releasing a resource.
Since it reproduces reliably, I'm going to track down exactly what's happening and hopefully have a patch soon.
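One rough way to narrow down a leak like this is to log what the JVM itself reports while the job runs. The sketch below is only an illustration under assumptions, not the actual debugging code or the eventual patch: the `MemoryProbe` class and its method names are made up here, and it relies only on the standard `java.lang.management` beans. If heap, non-heap, and direct-buffer numbers all stay flat while the process size reported by the OS keeps climbing, that would point toward native memory that these beans cannot see.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper (name invented for this sketch): samples JVM-reported memory.
class MemoryProbe {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Bytes of Java heap currently in use. */
    static long heapUsed() {
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** Bytes of JVM-managed non-heap memory in use (class metadata, code cache, etc.). */
    static long nonHeapUsed() {
        return MEMORY.getNonHeapMemoryUsage().getUsed();
    }

    /** Bytes held by direct NIO buffers, which live outside the Java heap. */
    static long directBufferBytes() {
        long total = 0;
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                total += pool.getMemoryUsed();
            }
        }
        return total;
    }

    /** One log line with all three counters, in bytes. */
    static String snapshot() {
        return String.format("heap=%d nonHeap=%d direct=%d",
                heapUsed(), nonHeapUsed(), directBufferBytes());
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a few samples one second apart; in a real run this would
        // be called periodically from the job being investigated.
        for (int i = 0; i < 5; i++) {
            System.gc(); // only a hint to the JVM, but it makes heap numbers less noisy
            System.out.println(snapshot());
            Thread.sleep(1000);
        }
    }
}
```

Memory allocated by native code (for example through JNI) does not show up in any of these counters, so this can only rule the JVM-managed regions in or out; it does not identify the leaking call by itself.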
This Wikipedia snapshot is a good test case because it's 35GB uncompressed, which is a good chunk of data to play with.