Perfect, thanks very much! I'm trying all-in-one partitioning now. I think this sounds promising based on the paper, particularly the bits about unbalanced trees and the subcluster annotation. (That figure S2 captures a lot of what's so hard about all this, and why I keep pushing back against ideas like "group based on >=80% CDR3 similarity" and the like when talking with colleagues!) I won't worry about seeded partitioning for the moment since it's easy enough to just let it run on everything and come back when it's ready.
From my one-timepoint case earlier, I got the default set of ten partitions, each labeled with a logprob of -inf. I'd first assumed that was just an artifact of writing a bunch of really tiny (pre-log) probabilities to YAML, but is seeing "-inf" there a red flag? (And if it's not a problem, are the partitions ordered by probability, or can I not assume any particular order?)
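To be concrete about the underflow I was guessing at, here's a minimal sketch in plain Python. It isn't based on how partis actually computes or writes logprobs (I haven't checked), and the probabilities are made up purely for illustration:

```python
import math

# Hypothetical per-sequence probabilities, invented for this example only.
probs = [1e-30] * 20

# Multiplying the raw probabilities underflows to exactly 0.0 in double precision.
product = 1.0
for p in probs:
    product *= p

# math.log(0.0) raises ValueError, so map the underflowed product to -inf by hand.
logprob_from_product = math.log(product) if product > 0.0 else float("-inf")

# Summing the logs instead never leaves the representable range.
logprob_from_sum = sum(math.log(p) for p in probs)

print(product, logprob_from_product, logprob_from_sum)
```

If the values were being accumulated in log space all along, I wouldn't expect this kind of underflow to be the explanation, which is partly why I'm asking.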
I'll check out those overclustering options too. (You're right, my approach here is to grab hold of any sequences plausibly related and then rule out the ones that aren't, so those options definitely look relevant.) With those options, is the idea that the output list of partitions gets as long as needed until at least one of those criteria is met in one partition in the list?
Jesse