There are a few new-ish features that I thought folks might be interested in, or at least want to be warned about.
1. using heavy/light pairing information (e.g. from 10x data). Mostly to improve clustering: if you know which igh goes with your light chain sequences, you can avoid the gross overmerging that's inherent to single-chain light repertoires. You also get a significant, although smaller, improvement in heavy chain clustering. It's also useful/necessary for choosing abs. Paper in preparation.
2. subcluster annotation. Annotations for individual clusters in partis have implicitly assumed a "star-tree" phylogeny, in the sense that the hmm calculates emission probabilities for each position in each sequence independently. This is obviously a bad assumption for some trees, and its most salient effect was occasional insertions/deletions that were clearly much too long (although inferred naive sequences were accurate). The most accurate/obvious way to fix it, doing full phylogenetics (i.e. linearham), works fantastically, but is slow enough that it can really only be run on individual lineages, not full repertoires. Partis's new[ish] "subcluster annotation" is basically a middle ground -- it breaks each family up into smaller sub-families, calculates annotations for each sub-family, then iteratively merges the resulting sub-family inferred naive sequences into new sub-families, until you've effectively built a heuristic (non-star) tree to use for annotation. Another way of describing it is that instead of taking the dumb/plain consensus at each position, it weights the consensus so as to down-weight mutations that are shared by many sequences (since they likely stem from a single mutation event).
3. LB ratio tau value. The performance metric that we used to evaluate affinity-change metrics like lbr turns out to have some issues. The fix is described here. Basically, the fix brings lb ratios in line with lb indices: tau should be 1/seq_len for both, and the amino acid versions of both (aa-lbi and aa-lbr) substantially outperform the nucleotide versions. This changes the default behavior of selection metric calculation.
4. LB metrics multiplicity. As described at the end of the previous note, we switched to a more sensible implementation of multiplicity in the lb metric calculations.
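
To make the "down-weight shared mutations" idea in item 2 a bit more concrete, here's a toy sketch. It is not how partis does it: a plain majority-vote consensus stands in for the hmm's naive sequence inference, the sub-family grouping is supplied by hand as nested lists (a hypothetical input format), and the function names are made up for illustration. The point is only that taking a consensus of sub-family consensuses lets a big expanded clade count once, instead of once per sequence.

```python
from collections import Counter


def consensus(seqs):
    """Plain per-position majority vote over equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one sequence, all the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def flatten(node):
    """Collect every sequence in a (possibly nested) list of sub-families."""
    if isinstance(node, str):
        return [node]
    return [seq for child in node for seq in flatten(child)]


def subcluster_consensus(node):
    """Consensus of sub-family consensuses, applied recursively.

    Each nested list is treated as one sub-family (hand-made grouping,
    an assumption of this toy), so it contributes a single vote to the
    level above it no matter how many sequences it contains.
    """
    if isinstance(node, str):
        return node
    return consensus([subcluster_consensus(child) for child in node])


# Toy family whose true naive sequence is AAAA. The first sub-family is an
# expanded clade that shares one A->C mutation at position 0.
family = [
    ["CAAG", "CAAA", "CATA", "CAAA", "CGAA", "CAAT", "CAAA"],
    ["AAAT", "AAAA", "AGAA"],
    ["AATA", "AAAA", "ATAA"],
]

plain = consensus(flatten(family))     # star-tree style: every sequence votes
nested = subcluster_consensus(family)  # each sub-family votes once

print(plain)   # CAAA
print(nested)  # AAAA
```

The plain consensus picks up the C at position 0 because 7 of the 13 sequences carry it, even though it presumably came from a single mutation event. The nested version sees it in only one of three sub-families and recovers AAAA.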