Calculation of TLEN Field

Dario Strbenac

Jul 5, 2017, 3:00:09 AM
to rna-star
If there is a pair of reads such as

@NS500605:5:HGLVHBGXX:4:13506:19146:1669 1:N:0:TAGGCATG+CTCCTTAC
GCGCCTGCCCGCCCGTGCGGCCCTCACTCCCCGAGGCTATCCAGGTCTGTGGGAAACATTCAAAGTCATAAAGTTT
+
AAAAAEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEEEEAEEEEEEEEEE/EEEEEEEAEEEEEEEEEEEEEE

@NS500605:5:HGLVHBGXX:4:13506:19146:1669 2:N:0:TAGGCATG+CTCCTTAC
ATCCACAGCACCTCGAATGTCCATCTTGGTCATCTAAACTTTATGACTTTGAATGTTTCCCACAGACCTGGATAG
+
AAAA6EEEE66EEEEEEEEEEE/AEEEEEEEAEEEEEEEEEEEEE6EEEEEEEEEAEE<EEEEEE/EEEEEAEEE

and they are mapped to the mm10 genome, why is the TLEN value 1249? Shouldn't the length of the intron that both reads span be subtracted when calculating TLEN? I suppose this is impossible for read pairs that don't overlap, though. Perhaps TLEN is only meaningful for data types based on genomic DNA, such as ChIP-seq, and not for RNA-seq. The reason I ask is that STAR-generated BAM files cause problems when used with the featureCounts tool in Subread, because it treats TLEN as the RNA fragment length and filters on minimum and maximum length thresholds. Read pairs like the one shown above are discarded under the default thresholds for being unrealistically long (the default -D is 600 nucleotides).

Alexander Dobin

Jul 5, 2017, 4:28:07 PM
to rna-star
Hi Dario,

I think TLEN in the SAM format has a very specific definition: it is the genomic distance between the outermost ends of the read pair.
Granted, the SAM format was designed for DNA reads and is often inconvenient for RNA-seq.
I would not want to redefine it for RNA-seq, as that would be confusing and would break standard conventions; e.g., Picard will certainly complain about it.
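
To make the definition concrete, here is a minimal Python sketch of how an unsigned TLEN follows from the mates' start positions and CIGAR strings. It is not STAR's code. The function names and the coordinates in the example are hypothetical, and the only thing taken from the SAM specification is which CIGAR operations consume reference bases.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def reference_span(cigar):
    """Reference bases covered by one alignment; skipped introns (N) count."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)


def template_length(pos1, cigar1, pos2, cigar2):
    """Unsigned TLEN: leftmost mapped base to rightmost mapped base, inclusive.

    pos1 and pos2 are 1-based leftmost positions, as in the SAM POS field.
    """
    start = min(pos1, pos2)
    end = max(pos1 + reference_span(cigar1), pos2 + reference_span(cigar2)) - 1
    return end - start + 1


# Hypothetical overlapping mates that both cross the same 1000-base intron.
tlen = template_length(1000, "50M1000N26M", 1020, "30M1000N45M")
print(tlen)          # genomic span, intron included
print(tlen - 1000)   # length on the transcript once the shared intron is removed
```

Because the N operation consumes reference bases, the intron is part of the genomic span, which is why the reported TLEN can be far longer than the RNA fragment itself.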

The solution for featureCounts would be to set the -D parameter to the maximum intron size. This will take care of reads spliced both explicitly (i.e. with N in the CIGAR) and implicitly (i.e. with the junction between the mates).
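
As a rough illustration of that suggestion, the Python sketch below only assembles a featureCounts argument list with a raised -D value. The helper name, the file names, and the intron limit of 1000000 are placeholders, not recommendations. It assumes that -p marks the input as paired-end and that -P switches on the fragment-length check that -D controls. Newer Subread releases may need additional paired-end options.

```python
def featurecounts_command(bam, annotation, output, max_intron):
    """Build a featureCounts argument list with -D raised to max_intron."""
    return [
        "featureCounts",
        "-p",                    # paired-end input
        "-P",                    # apply the fragment-length check
        "-D", str(max_intron),   # maximum fragment length, set to the max intron size
        "-a", annotation,        # gene annotation file
        "-o", output,            # output counts table
        bam,
    ]


# Placeholder file names and intron limit; substitute your own values.
cmd = featurecounts_command("Aligned.out.bam", "genes.gtf", "counts.txt", 1000000)
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once the paths point at real files.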

Cheers
Alex