ReadsPerGene.out.tab does not contain reads per gene

867 views
Skip to first unread message

Matthew Cooper

unread,
Mar 10, 2016, 9:14:29 AM3/10/16
to rna-...@googlegroups.com
Hi all,

I've used STAR to successfully align A. thaliana RNA-seq reads to their genome, but I'm having trouble doing the same with M. domestica. The STAR process will run and apparently successfully align against the genome file, but the ReadsPerGene.out.tab will only contain the following 4 lines instead of complete gene IDs and counts as I got with A. thaliana:

N_unmapped         5807474    5807474    5807474
N_multimapping    10985296    10985296    10985296
N_noFeature         7772941    24925340    23975008
N_ambiguous            0                0                 0

Weirdly, the splice junction part of the process has seemed to work. The first couple of lines of the large SJ.out.tab read as:

MDC000002.223    3182    4248    1    1    0    16    0    13
MDC000002.325    252        528    1    1    1    180    0    50  

Which I think is what you should expect, any idea what I could be doing wrong? The code I'm using is as follows:

sudo [STAR process path] --genomeDir [genome path] --genomeLoad LoadAndKeep --readFilesIn [path to reads1]  [path to reads2] --readFilesCommand bzcat --runThreadN 30 --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 10000000000 --quantMode TranscriptomeSAM GeneCounts

I'm thinking it could be something to do with the formatting of the genome annotation gff3 file, which is as follows (first 6 lines):

##gff-version 3











##annot-version










MDC000002.223 phytozomev10
 gene 816 1094 0 + 0 ID=MDP0000856242.GDRv1.0 Name=MDP0000856242


MDC000002.223 phytozomev10  mRNA 816 1094 0 + 0 ID=MDP0000856242.GDRv1.0.196 Name=MDP0000856242 pacid=22625642 longest=1 Parent=MDP0000856242.GDRv1.0
MDC000002.223 phytozomev10  exon 816 1094 0 + 0 ID=MDP0000856242.GDRv1.0.196.exon.1 Parent=MDP0000856242.GDRv1.0.196 pacid=22625642

MDC000002.223 phytozomev10  CDS 816 1094 0 + 0 ID=MDP0000856242.GDRv1.0.196.CDS.1 Parent=MDP0000856242.GDRv1.0.196 pacid=22625642


And the code I used to build the genome is:

sudo [path to STAR process] --runMode genomeGenerate --genomeDir [genome directory path] --genomeFastaFiles [genome FASTA files path] --sjdbGTFfile [path to gff file] --runThreadN 16 --genomeChrBinNbits 12 --sjdbGTFtagExonParentTranscript Parent 

Thank you!
 

Alexander Dobin

unread,
Mar 14, 2016, 6:35:24 AM3/14/16
to rna-star
Hi Matthew,

for the ReadsPerGene counting to work, each "exon" in the GFF file needs to contain gene Name it belongs to.
Your file does not have this formatting - it only lists exon Parent (which is mRNA), and mRNA in turn has a Parent gene.
You can either trace this parentage yourself, or use a gff to gtf converter. For instance, you can use gffread tool from Cufflinks tool:
gffread -T In.gff3 -o Out.gtf

Cheers
Alex

Matthew Cooper

unread,
Mar 14, 2016, 12:18:16 PM3/14/16
to rna-star
Dear Alex,

Thank you very much for the personal reply, that's exactly what the problem is, working great now after following your advice. Great job with the program as well!

Matt C
Reply all
Reply to author
Forward
0 new messages