Getting all SAM attributes printed to transcriptome BAM

358 views
Skip to first unread message
Assigned to ado...@gmail.com by me

rubi

unread,
Dec 20, 2016, 3:03:15 PM12/20/16
to rna-star
Hi Alex,


I'm using version 2.5.2a and am running STAR requesting:
--outSAMattributes All --outSAMstrandField intronMotif



as well as
--quantMode TranscriptomeSAM --quantTranscriptomeBan Singled



Although I see the SAM attributes:


NH, HI, AS, nM, NM, MD, jM, and jI in the genomic BAM I don't see all of them in the transcriptomic bam which only has:



NH and HI.


Am I missing some parameter to get all the attributes in the transcriptomic BAM?

Thanks a lot,

rubi

unread,
Dec 20, 2016, 5:16:20 PM12/20/16
to rna-star
Just to give all the information, these data are geared by running STAR as a second pass.

Here's the full command I'm running STAR with:

STAR --runMode alignReads --readFilesCommand zcat --runThreadN 8 --genomeDir <dir> --readFilesIn <fastq.gz.1> <fastq.gz.2> --outSAMprimaryFlag AllBestScore --outFilterMismatchNoverLmax 0.05 --limitSjdbInsertNsj 4339995 --outFilterIntronMotifs RemoveNoncanonical --quantMode TranscriptomeSAM --quantTranscriptomeBan Singleend --outSAMattributes All --outSAMstrandField intronMotif --sjdbFileChrStartEnd <space_delimited_list_of SJ_files_from_first_STAR_run> --outSAMtype BAM Unsorted --outFileNamePrefix <output_prefix>

Alexander Dobin

unread,
Dec 21, 2016, 12:31:02 PM12/21/16
to rna-star
Hi Rubi,

only HI and IH output for transcriptomic alignments is implemented at the moment.
Note some attributes do not make sense for transcriptomic alignments, e.g. jM, jI (no junctions) or AS (scoring system is designed for genomic alignnments).
NM and MD could be useful - for now, you can use samtools calmd option to add them.

Cheers
Alex

rubi

unread,
Dec 23, 2016, 12:51:14 PM12/23/16
to rna-star
Thanks a lot Alex.

So I guess the best thing to do will be to run STAR like this:


STAR --runMode alignReads --genomeDir <genome_dir> --readFilesIn <fastq_1> <fastq_2> --outSAMattributes All --outSAMstrandField intronMotif --outFileNamePrefix <prefix.> --quantMode TranscriptomeSAM --quantTranscriptomeBan Singleend --outSAMtype BAM --outStd BAM_Quant | samtools calmd -b - <reference_fasta> > <out_bam>

Since I'm running STAR in a second pass mode, reading the SJ files from the first iteration, and am also interested in obtaining gene counts as well as both sorted and unsorted genomic bam I'm running STAR this way: 

STAR --runMode alignReads --readFilesCommand zcat --runThreadN 8 --genomeDir <genome_dir> --readFilesIn <fastq_1> <fastq_2> --outSAMprimaryFlag AllBestScore --outFilterMismatchNoverLmax 0.05 --outSAMattributes All --outSAMstrandField intronMotif --outFilterIntronMotifs RemoveNoncanonical --limitSjdbInsertNsj 4339995 --sjdbFileChrStartEnd <SJ_files> --outFileNamePrefix <prefix.> --quantMode TranscriptomeSAM GeneCounts --quantTranscriptomeBan Singleend --outSAMtype BAM Unsorted SortedByCoordinate --outStd BAM_Quant | samtools calmd -b - <reference_fasta> > <out_bam>

But this crashes with a segmentation fault shortly after starting to map. Actually, this happens even without piping to samtools.

Any idea?

Alexander Dobin

unread,
Jan 5, 2017, 5:08:56 PM1/5/17
to rna-star
Hi @Ruby,

I can confirm the seg-fault which is caused by --outStd BAM_Quant option, I will fix this bug in the next few days.
In the meantime, you can try using fifo files instead of stdout piping:

$ rm Aligned.toTranscriptome.out.bam
$ mkfifo Aligned.toTranscriptome.out.bam
$ STAR {--quantMode TranscriptomeSAM + all other options except --outStd BAM_Quant ...}  &  samtools calmd -b Aligned.toTranscriptome.out.bam /path/to/transcripts.fa > A.transcriptome.md.bam

Note that you need to feed samtools calmd the transcripts sequences, not the reference genomic fasta.
You can generate this, for instance, with RSEM tool:
$ rsem-prepare-reference --gtf /path/to/annot.gtf    reference.genome.fa    RSEMref
RSEMref.idx.fa can then be used for samtools calmd.

Cheers
Alex




/sonas-hs/gingeras/nlsas_norepl/user/dobin/STAR/Releases/FromGitHub/STAR-2.5.2a/bin/Linux_x86_64/STAR --genomeDir /home/dobin/STARruns/STARtests/TestSuite/Insert_chr22_FI_1M_PE/new/GTF --readFilesCommand zcat --outSAMattributes All --outSAMstrandField intronMotif --quantMode TranscriptomeSAM --quantTranscriptomeBan Singleend --outSAMtype BAM Unsorted --runThreadN 12 --readFilesIn /sonas-hs/gingeras/nlsas_norepl/user/dobin/STAR/TestSuite//chr22_1.fq.gz /sonas-hs/gingeras/nlsas_norepl/user/dobin/STAR/TestSuite//chr22_2.fq.gz & samtools calmd -b Aligned.toTranscriptome.out.bam /sonas-hs/gingeras/nlsas_norepl/user/dobin/STAR/TestSuite/RSEM/RSEMref.idx.fa

rubi

unread,
Jan 9, 2017, 12:55:51 PM1/9/17
to rna-star
Thanks a lot Alex.

Sorry about being unclear with <reference_fasta>, I meant <transcriptome_reference_fasta> 

F Geng

unread,
Aug 13, 2019, 11:54:24 AM8/13/19
to rna-star
Hi Alex,

I understand AS is alignment score to the genome, but I am finding it inconvenient to filter bam file now without a alignment score flag. Would it be possible to propagate the AS score into the TranscriptomeSAM? To avoid confusion, maybe it can be renamed to "GAS" or "GS" instead of "AS"?

Let me know.

Kind regards
Feng Geng

Alexander Dobin

unread,
Aug 22, 2019, 3:10:15 PM8/22/19
to rna-star
Hi Feng,

I will try to implement this output in the next release (1-2 weeks).

Cheers
Alex
Reply all
Reply to author
Forward
0 new messages