StringTie, not cufflinks: --outSAMattributes XS without --outSAMstrandField intronMotif

matt....@gladstone.ucsf.edu

unread,

Nov 2, 2016, 9:23:06 PM11/2/16

to rna-star

Hi Alex,

I have PE75 stranded reads and am trying to generate the XS attribute for StringTie without the --outSAMstrandField intronMotif option. I'm hoping this will force RNA strandedness by R1 orientation for all reads (spliced and unspliced). Will this work, or, if not, is there another way to accomplish this? I don't want to lose the benefit of creating a stranded library (also trying to use StringTie instead of Cufflinks).

Thank you!

Matt

matt....@gladstone.ucsf.edu

unread,

Nov 2, 2016, 9:24:07 PM11/2/16

to rna-star

To clarify I am using --outSAMattributes NH HI AS nM NM XS.

Thanks again!

Alexander Dobin

unread,

Nov 4, 2016, 4:00:05 PM11/4/16

to rna-star

Hi Matt,

the XS strand is only added to spliced alignments based on their intron motifs.

It does not take into account the strandedness information.

For stranded RNA-seq, Cufflinks can be run with the --library-type parmameter, while StringTie for some reason lacks this option, and requires XS to be supplied for each alignment. I have a simple script that can add XS flag to all alignments for stranded data:

http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.sandbox/extras/scripts/tagXSstrandedData.awk

It works on a SAM file:

$ awk -v strType=2 -f tagXSstrandedData.awk Aligned.out.sam > Aligned.XS.sam

For a BAM file:

samtools view -h Aligned.out.bam | awk -v strType=2 -f tagXSstrandedData.awk | samtools view -bS - > Aligned.XS.bam

strType defines the strandedness of the RNA-seq library prep, and specifies the read (1 or 2) which strand agrees with the RNA strand.

For the Illumina Tru-Seq dUTP protocol strType=2.

If the reads are unpaired, they are assumed to be “read 1”.

I have not tested this code with StringTie - please let me know whether it works for you, and I will add it to the official STAR release.

Cheers

Alex

Neelanjan Mukherjee

unread,

Nov 10, 2016, 3:02:57 PM11/10/16

to rna-star

Dear Matt and Alex,

This really solved by related problem and I think it would be great if it could be included in the official release. I have paired-end strand-specific data and wanted to use a 2 pass STAR strategy (rather than HISAT/2) as an input for stringtie. After running stringtie I had many single intron transcripts without any strand because stringtie was expecting the XS tag. I did use the "intronMotif" option, but the unspliced reads were not receiving XS tags leading to the problem in stringtie. I tried to resolve it by counting the reads on either strand of the "unstranded" gtf entries, but it was not always clear what to change the strand to. Anyway, I really appreciate the script and the ability to add the XS tags was quite helpful.

Thanks very much.

Neel

Alexander Dobin

unread,

Nov 17, 2016, 12:51:35 PM11/17/16

to rna-star

Hi Neel,

thanks a lot for testing the script. I have included it in the STAR distribution package.

Please let me know if you see any issues.

Cheers

Alex

matt....@gladstone.ucsf.edu

unread,

Nov 17, 2016, 8:13:47 PM11/17/16

to rna-star

Thanks so much, Alex!

I'll report back once I get through the pipeline.

Best,

Matt

Dorit Hockman

unread,

Nov 22, 2016, 10:06:39 AM11/22/16

to rna-star

Dear Alex,

Will this script work on bam files that have already been mapped with STAR using the --outSAMstrandField intronMotif option?

Thank you for help!

be well

Dorit

Alexander Dobin

unread,

Nov 22, 2016, 11:27:10 AM11/22/16

to rna-star

Hi Dorit,

it will not work, since it does not check for the XS tag, and will add 2nd XS tag for those alignments that already have.

I think the best way to strip the old XS tag, e.g.

$ samtools view -h Aligned.sortedByCoord.out.bam | sed 's/\tXS:A:[-+]//' | awk -f tagXSstrandedData.awk | samtools view -bS - > A.bam

Cheers

Alex

Charlotte Capitanchik

unread,

Jul 4, 2017, 11:11:32 PM7/4/17

to rna-star

Hi Alex,

I'm trying to use the awk script to add the XS attribute to stranded data for use with a downstream splicing analysis software, but I get this error:

9: (FILENAME=- FNR=2821) fatal: not enough arguments to satisfy format string

`SRR594434.48536661 419 1 1210619 3 80M = 1218933 8394 ACTAACTCCATCACAGTGACTTCTAATCTAGACCATAGACATCATACTGGAGGGTTAAGTGAGACCAGCCAAAGTCACCA HHHHHHHHHHHHHHHHFHBHHHDHHHHHHHHFEHHHHHHHHHHHFHFHHHGIIG%GDEEDFFFADHFF;GBCEEAEHHHH NH:i:2 HI:i:2 AS:i:157 nM:i:0'

^ ran out for this one

The bam file I'm using was generated in the STAR mapping step with the --outSAMtype BAM SortedByCoordinate argument.

Any help is much appreciated, thank you!

Best,

Charlotte

Alexander Dobin

unread,

Jul 5, 2017, 4:15:04 PM7/5/17

to rna-star

Hi Charlotte,

I have fixed a non-canonical form for the awk printf statement - the problem should go away.

Please try the new version of the script from github master: https://github.com/alexdobin/STAR/blob/master/extras/scripts/tagXSstrandedData.awk

Cheers

Alex

jxf...@case.edu

unread,

Mar 14, 2018, 10:32:09 AM3/14/18

to rna-star

Hi there! I just attempted to use this script with the following shell command:

samtools view -h A1Aligned.sortedByCoord.out.bam | awk -v strType=2 -f tagXSstrandedData.awk | samtools view -bS - > A1Aligned.XS.bam

I received back the following errors from my Ubuntu VM:

awk: tagXStrandedData.awk: line 35: function and never defined
awk: tagXStrandedData.awk: line 35: function and never defined
awk: tagXStrandedData.awk: line 35: function and never defined
awk: tagXStrandedData.awk: line 35: function and never defined
[samopen] no @SQ lines in the header.
[sam_read1] missing header? Abort!

Did I align my reads incorrectly? My mapping input settings were:

./STAR --runMode alignReads --runThreadN 8 --limitGenomeGenerateRAM 50000000000 --outSAMtype BAM SortedByCoordinate --twopassMode Basic --sjdbOverhang 149 --outFileNamePrefix ~/Documents/Data/CLP1MN/TotalRNASeq/Aligned/A1 --genomeDir ~/Documents/Reference/Index/STAR/hg38_noref --sjdbGTFfile ~/Documents/Reference/GTF/UCSC_Gencodev27_Comprehensive.gtf --readFilesIn ~/Documents/Data/CLP1MN/TotalRNASeq/Raw/A1_1.fq ~/Documents/Data/CLP1MN/TotalRNASeq/Raw/A1_2.fq

I received back ~93% uniquely mapped reads in my final log file for the run. Maybe there's just some extra formatting that needs to be done before this command can take the BAM file?

Thanks,

Jordan

Alexander Dobin

unread,

Mar 14, 2018, 5:11:04 PM3/14/18

to rna-star

Hi Jordan,

I think the problem is with the awk installation on your server which prevents the script from running properly. Can you try to run:

$ awk 'BEGIN {print and(1,1)}'

You may also try to use gawk instead of awk.

If it does not work, you would need to ask your sys-admin to install awk with the "and" function compiled.

Cheers

Alex

Jordan Farr

unread,

Mar 15, 2018, 10:09:34 AM3/15/18

to rna-star

Changing awk to gawk worked like a charm!

Thanks a bunch Alex.

Jordan

Reply all

Reply to author

Forward