Sorting method for RSEM

Emilio

Dec 17, 2018, 5:23:42 AM
to sambamba-discussion

Hi Pjotr,
I have a question/comment about a possible additional sorting criterion.

In our RNA-seq pipeline, we are using RSEM for expression level quantification. The program takes as input a BAM file with alignments to reference transcripts and has specific requirements for the input files. From the RSEM README:

RSEM requires the alignments of a read to be adjacent. For paired-end reads, RSEM also requires the two mates of any alignment be adjacent.

This means the HI tag has to be taken into account when sorting, and mates with the same HI have to be put together.
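As a rough illustration of the ordering described above, here is a minimal Python sketch. It is not sambamba or RSEM code: the `Alignment` class and `rsem_sort_key` function are hypothetical names, every record is assumed to carry an HI tag, and placing the first mate before the second within a pair is a choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical stand-in for a BAM record (not a sambamba or RSEM type)."""
    name: str       # read name (QNAME)
    hit_index: int  # value of the HI tag, assumed present on every record
    is_read2: bool  # True for the second mate of a pair


def rsem_sort_key(aln):
    # Group by read name first, then by hit index so that the two mates of
    # one alignment end up adjacent, then put the first mate before the second.
    return (aln.name, aln.hit_index, aln.is_read2)


records = [
    Alignment("read2", 1, True),
    Alignment("read1", 2, False),
    Alignment("read1", 1, True),
    Alignment("read2", 1, False),
    Alignment("read1", 2, True),
    Alignment("read1", 1, False),
]

ordered = sorted(records, key=rsem_sort_key)
for aln in ordered:
    print(aln.name, aln.hit_index, "mate2" if aln.is_read2 else "mate1")
```

Sorting by name alone would keep all records of a read together but could interleave mates of different hits; adding the hit index to the key is what keeps each pair adjacent.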

As far as I could see, sambamba sorting by read name currently does not perform this check. I think it would be nice to include it, either as a new sort option or an additional flag to the -{n,N} one. What do you think? Would you consider a contribution for it?

This feature would be quite important for us to be able to use sambamba as the main SAM/BAM tool in our RNA-seq pipeline.

Sorry for the long mail.

Best,
Emilio

Pjotr Prins

Dec 17, 2018, 5:54:37 AM
to Emilio, sambamba-discussion
On Mon, Dec 17, 2018 at 11:23:29AM +0100, Emilio wrote:
> Hi Pjotr,
> I have a question/comment about a possible additional sorting
> criterion.
>
> In our RNA-seq pipeline, we are using RSEM for expression level
> quantification. The program uses as input a BAM file with alignments to
> reference transcripts and has specific requirements for the input
> files. From RSEM readme:
>
> RSEM requires the alignments of a read to be adjacent. For
> paired-end reads, RSEM also requires the two mates of any alignment
> be adjacent.
>
> This means the HI tag has to be taken into account when sorting, and
> mates with the same HI have to be put together.
>
> As far as I could see, sambamba sorting by read name currently does not
> perform this check. I think it would be nice to include it, either as a
> new sort option or an additional flag to the -{n,N} one. What do you
> think? Would you consider a contribution for it?
>
> This feature would be quite important for us to be able to use sambamba
> as the main SAM/BAM tool in our RNA-seq pipeline.

Yes, that would be a good addition for sambamba. It should be
reasonably easy to add because we have name sorting already.

Pj.

Emilio

Dec 17, 2018, 6:04:53 AM
to Pjotr Prins, sambamba-discussion
Ok, thanks!

Should it be a new comparator? I would need it with natural sorting, so I might have to implement it for both standard name and natural sorting?
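
One possible way to avoid writing the comparator twice is to treat the name comparison as a parameter and layer the HI and mate criteria on top of it. The Python sketch below only illustrates that idea and does not reflect sambamba's actual implementation: `natural_key`, `make_key`, and the `(name, hit_index, is_read2)` tuple layout are hypothetical, and the natural ordering shown (digit runs compared numerically, text chunks sorting before digit chunks) is a simplified assumption.

```python
import re


def natural_key(name):
    # Split the name into alternating text and digit runs. With a capturing
    # group, re.split puts the digit runs at the odd indices.
    key = []
    for i, chunk in enumerate(re.split(r"([0-9]+)", name)):
        if chunk == "":
            continue
        if i % 2 == 1:
            key.append((1, int(chunk), ""))  # digit run, compared numerically
        else:
            key.append((0, 0, chunk))        # text run, compared as a string
    return key


def make_key(name_key):
    # Build a full sort key from any name key: name first, then HI,
    # then first mate before second mate.
    def key(record):
        name, hit_index, is_read2 = record
        return (name_key(name), hit_index, is_read2)
    return key


lexicographic = make_key(lambda name: name)
natural = make_key(natural_key)

records = [
    ("r10", 1, False),
    ("r9", 2, True),
    ("r9", 1, True),
    ("r10", 1, True),
    ("r9", 2, False),
    ("r9", 1, False),
]

by_natural = sorted(records, key=natural)
by_lexicographic = sorted(records, key=lexicographic)
```

With this layout the two variants differ only in how names are ordered ("r9" before "r10" under natural sorting, the reverse under plain string comparison), while the HI and mate handling is shared.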

Best,
Emilio