Kind of crazy to think this was almost a year ago! I just wanted to give everyone an update and possibly recruit some "fresh blood".
We've fully built out Serratus, and it's capable of aligning in excess of 1 million NGS datasets per day at a cost of under 1 cent per dataset. With this we've aligned 5.7 million libraries in the Sequence Read Archive (10.2 petabases) and uncovered in excess of 100,000 novel species of RNA viruses (a novel species being defined as an RdRP sequence >10% diverged, i.e. under 90% identity, from known sequences). This is about an order-of-magnitude increase in the number of known RNA virus species available in GenBank and other public databases.
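For a rough sense of scale, the figures above can be combined in a quick back-of-envelope sketch. It uses only the numbers quoted in this post; the variable names are purely illustrative, and the cost and time values are bounds rather than measured totals, since they assume the whole run went at the quoted price and rate.

```python
# Figures quoted in the post above; nothing here is a measured total.
libraries = 5_700_000            # SRA libraries aligned
total_bases = 10.2e15            # 10.2 petabases
max_cost_per_library = 0.01      # "under 1 cent", so an upper bound in USD
min_throughput_per_day = 1e6     # "in excess of 1 million", so a lower bound

# Upper bound on spend, assuming every library cost the quoted maximum.
max_total_cost = libraries * max_cost_per_library

# Average library size in gigabases.
mean_gigabases = total_bases / libraries / 1e9

# Upper bound on run time, assuming the quoted rate was sustained throughout.
max_days = libraries / min_throughput_per_day

print(f"total cost: under ${max_total_cost:,.0f}")
print(f"mean library size: about {mean_gigabases:.2f} gigabases")
print(f"run time at the quoted rate: at most about {max_days:.1f} days")
```

In other words, the quoted per-dataset price puts the whole search at under roughly $57,000 of compute.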
We now have a massive dataset of viruses to characterize, and an additional 11,200 assemblies of coronaviruses (half of which are not SARS-CoV-2). I've been doing some work on the evolutionary conservation of splice variants in CoV, but if anyone is interested in helping out with this, or in doing a systematic recombination analysis, we're always looking for more collaborators.
If you have any interest in _any_ RNA virus, we also have literally thousands of uncharacterized viruses that need some TLC. Please do reach out!
Cheers,
Artem