OK. Looks like the version of Peregrine in burton-structreader-release is executing for larger graphs.
I fixed a bunch of bugs in the last three weeks. This is about a 40G job ... but I think it should support larger jobs now.
I was thinking that calling this 0.5.5 doesn't really convey how much work went into this release, so I might call it something like 0.9.0 or just 1.0 ...
The next major step is to support globally sorting files by fields OTHER than the key.
This way the final step of PageRank would be to sort by the 'rank' field, and then you can run fscat to get, say, the top 1000 highest-ranking nodes.
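
To make the idea concrete, here is a minimal single-machine sketch in Python. It is not Peregrine's API: the (key, rank) record layout and the names sort_by_rank and top_n are made up for illustration, and a real global sort would be spread across partitions rather than done in memory.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (node key, rank value).
Record = Tuple[str, float]


def sort_by_rank(records: Iterable[Record]) -> List[Record]:
    """Sort by the 'rank' field (descending), not by the key.

    Ties are broken by key so the output order is deterministic.
    """
    return sorted(records, key=lambda r: (-r[1], r[0]))


def top_n(sorted_records: List[Record], n: int) -> List[Record]:
    """Read the first n records of an already rank-sorted list."""
    return sorted_records[:n]


records = [("a", 0.10), ("b", 0.45), ("c", 0.30), ("d", 0.15)]
ranked = sort_by_rank(records)
print(top_n(ranked, 2))
```

Once the output is sorted by rank, pulling the top N is just reading the head of the file, which is what makes the fscat step cheap.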
I also bumped up the HTTP chunk size to 256k and it's a configuration directive now.
Right now if you emit() a value larger than 256k, it gets fragmented across chunks and Peregrine breaks.
I need to figure out a way around this.
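
One possible way around it, sketched in Python: length-prefix every value so the reader can buffer across chunk boundaries and only hand back a value once all of its bytes have arrived. This is an assumption about how it could be done, not Peregrine's actual wire format; frame, split_chunks, reassemble and the 4-byte big-endian prefix are all hypothetical.

```python
import struct
from typing import Iterable, Iterator, List

CHUNK_SIZE = 256 * 1024  # matches the 256k chunk size mentioned above


def frame(value: bytes) -> bytes:
    """Prefix a value with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(value)) + value


def split_chunks(stream: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut a byte stream into fixed-size chunks, ignoring value boundaries."""
    return [stream[i:i + chunk_size] for i in range(0, len(stream), chunk_size)]


def reassemble(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield whole values, buffering any value that spans several chunks."""
    buf = bytearray()
    for chunk in chunks:
        buf.extend(chunk)
        # Drain every complete value currently sitting in the buffer.
        while len(buf) >= 4:
            (length,) = struct.unpack(">I", buf[:4])
            if len(buf) < 4 + length:
                break  # value is incomplete, wait for the next chunk
            yield bytes(buf[4:4 + length])
            del buf[:4 + length]
    if buf:
        raise ValueError("stream ended in the middle of a value")


values = [b"x" * 10, b"y" * 700_000, b"z" * 3]
chunks = split_chunks(b"".join(frame(v) for v in values))
assert list(reassemble(chunks)) == values
```

The 700,000-byte value here spans three 256k chunks and still comes back in one piece, at the cost of holding the partial value in memory until it completes.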