Hi Pjotr,
I have a question/comment about a possible additional sorting criterion.
In our RNA-seq pipeline, we use RSEM for expression level quantification. The program takes as input a BAM file with alignments to reference transcripts and has specific requirements for that file. From the RSEM readme:
RSEM requires the alignments of a read to be adjacent. For paired-end reads, RSEM also requires the two mates of any alignment be adjacent.
This means the HI tag has to be taken into account when sorting, and mates with the same HI value have to be placed next to each other.
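To make the ordering I have in mind concrete, here is a minimal sketch in Python. It is not sambamba code, and the names Aln, rsem_sort_key and sort_for_rsem are made up for illustration. It assumes every record carries an HI tag, compares read names as plain strings (which may differ from the order sambamba uses), and puts the first mate before the second using the 0x80 FLAG bit:

```python
from typing import List, NamedTuple, Tuple


class Aln(NamedTuple):
    """Hypothetical, stripped-down alignment record (illustration only)."""
    qname: str  # read name
    flag: int   # SAM FLAG field
    hi: int     # value of the HI:i tag (hit index); assumed to be present


def rsem_sort_key(a: Aln) -> Tuple[str, int, int]:
    # 0x80 in FLAG marks the second mate of a pair; sort it after the first.
    is_second_mate = 1 if a.flag & 0x80 else 0
    # Read name first, then hit index, then mate order.
    return (a.qname, a.hi, is_second_mate)


def sort_for_rsem(records: List[Aln]) -> List[Aln]:
    # All alignments of a read end up adjacent, and within a read
    # the two mates sharing an HI value end up next to each other.
    return sorted(records, key=rsem_sort_key)


# Paired-end flags: 0x1 (paired) | 0x40 (first mate) = 65,
#                   0x1 (paired) | 0x80 (second mate) = 129.
records = [
    Aln("read2", 65, 1),
    Aln("read1", 129, 2),
    Aln("read1", 65, 1),
    Aln("read1", 129, 1),
    Aln("read1", 65, 2),
    Aln("read2", 129, 1),
]

for rec in sort_for_rsem(records):
    print(rec.qname, rec.hi, rec.flag)
```

Sorting by read name alone would keep the four read1 records together, but it would not guarantee that the two mates with HI 1 come before the two mates with HI 2, which is the part RSEM needs.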
As far as I can see, sambamba's sorting by read name does not currently perform this check. I think it would be nice to include it, either as a new sort option or as an additional flag to the -{n,N} option. What do you think? Would you consider a contribution for it?
This feature would be quite important for us to be able to use sambamba as the main SAM/BAM tool in our RNA-seq pipeline.
Sorry for the long mail.
Best,
Emilio