Do I understand you correctly that PB2 does not support random access?
That seems odd.
-Matt
>> One drawback that I am seeing is that PB2 expects a complete
>> structure when decoding (e.g. you need the entire encoded file read
>> into memory to decode it back to arrays of doubles).
Oof - that probably doesn't scale for mass spec data.
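A common workaround (just a sketch, not anything PB2 gives you out of the box) is
to frame each message with its own byte length, so individual spectra can be
decoded one at a time instead of parsing the whole file. The spectrum_pb2 module
and Spectrum message below are hypothetical stand-ins for whatever .proto you'd
actually compile:

    # Sketch: length-delimited framing so each spectrum decodes on its own,
    # without holding the whole file in memory. spectrum_pb2 is a hypothetical
    # module generated by protoc; only the framing idea matters here.
    import struct
    import spectrum_pb2  # hypothetical generated module

    def write_spectra(path, spectra):
        """Write each Spectrum message prefixed by its byte length."""
        with open(path, "wb") as out:
            for spec in spectra:
                blob = spec.SerializeToString()
                out.write(struct.pack("<I", len(blob)))  # 4-byte length prefix
                out.write(blob)

    def read_spectra(path):
        """Yield Spectrum messages one at a time (streaming, not whole-file)."""
        with open(path, "rb") as inp:
            while True:
                header = inp.read(4)
                if not header:
                    break
                (size,) = struct.unpack("<I", header)
                spec = spectrum_pb2.Spectrum()
                spec.ParseFromString(inp.read(size))
                yield spec

Keeping a side table of (scan number, file offset) on top of that framing would
also buy you cheap random access.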
>> But all that zipping and encoding take CPU.
I've generally found that data compression is better than free in terms of
overall performance. CPUs just continue to get faster, but disk and network
speeds are more or less stalled out, so reducing bandwidth usage at the
expense of a bit more CPU usage is usually a win.
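Here's the back-of-envelope version of that argument. All the numbers are
illustrative assumptions (not measurements) - plug in your own:

    # Back-of-envelope sketch of the compression trade-off. Every number here
    # is an illustrative assumption, not a measurement.
    file_mb         = 1000.0  # size of the uncompressed file, in MB
    ratio           = 0.25    # assumed compressed size / uncompressed size
    io_mb_per_s     = 50.0    # assumed disk or network throughput
    gunzip_mb_per_s = 200.0   # assumed decompression rate (of output bytes)

    plain_time = file_mb / io_mb_per_s
    gz_time    = (file_mb * ratio) / io_mb_per_s + file_mb / gunzip_mb_per_s

    print("plain read: %.0f s, gzipped read + decompress: %.0f s"
          % (plain_time, gz_time))
    # With these assumptions: ~20 s plain vs ~10 s gzipped. The extra CPU pays
    # for itself as long as decompression outruns the I/O you avoid.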
I've been steadily working my way through the TPP tools (and pwiz, and
LabKey's CPAS) to make .mzXML.gz, .mzdata.gz, and .mzML.gz behave as
additional native input/output formats (also .pep.xml.gz, .prot.xml.gz,
.fasta.gz, etc.). This is of interest since I'm working on making TPP and
CPAS amenable to use in the Amazon compute cloud, where network bandwidth
and disk storage are metered, so less is better (but that's true everywhere,
really).
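At the file-open level, "native .gz support" can be as simple as picking the
opener by suffix and handing the parser an ordinary file-like object either way.
This is only a sketch (the actual TPP/pwiz work is C++ around zlib), and
open_maybe_gz is just a name I made up:

    # Minimal sketch of treating .gz as just another flavor of the same format:
    # choose the opener by suffix, stream the decompression, no temp files.
    import gzip

    def open_maybe_gz(path, mode="rt"):
        """Open path transparently whether or not it is gzipped."""
        if path.endswith(".gz"):
            return gzip.open(path, mode)
        return open(path, mode)

    # Usage: the parsing code is identical for foo.mzXML and foo.mzXML.gz.
    # with open_maybe_gz("foo.mzXML.gz") as handle:
    #     for line in handle:
    #         ...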
Brian