Hi all,
I've used STAR to successfully align
A. thaliana RNA-seq reads to their genome, but I'm having trouble doing the same with
M. domestica. The STAR process will run and apparently successfully align against the genome file, but the ReadsPerGene.out.tab will only contain the following 4 lines instead of complete gene IDs and counts as I got with
A. thaliana:
N_unmapped 5807474 5807474 5807474
N_multimapping 10985296 10985296 10985296
N_noFeature 7772941 24925340 23975008
N_ambiguous 0 0 0
Weirdly, the splice junction part of the process has seemed to work. The first couple of lines of the large SJ.out.tab read as:
MDC000002.223 3182 4248 1 1 0 16 0 13
MDC000002.325 252 528 1 1 1 180 0 50
Which I think is what you should expect, any idea what I could be doing wrong? The code I'm using is as follows:
sudo [STAR process path] --genomeDir [genome path] --genomeLoad LoadAndKeep --readFilesIn [path to reads1] [path to reads2] --readFilesCommand bzcat --runThreadN 30 --outSAMtype BAM SortedByCoordinate --limitBAMsortRAM 10000000000 --quantMode TranscriptomeSAM GeneCounts
I'm thinking it could be something to do with the formatting of the genome annotation gff3 file, which is as follows (first 6 lines):
| ##gff-version 3 |
|
|
|
|
|
|
|
|
|
|
|
|
| ##annot-version | |
|
|
|
|
|
|
|
|
|
|
|
| MDC000002.223 |
phytozomev10
|
gene |
816 |
1094 |
0 |
+ |
0 |
ID=MDP0000856242.GDRv1.0 |
Name=MDP0000856242 |
|
|
|
| MDC000002.223 |
phytozomev10 |
mRNA |
816 |
1094 |
0 |
+ |
0 |
ID=MDP0000856242.GDRv1.0.196 |
Name=MDP0000856242 |
pacid=22625642 |
longest=1 |
Parent=MDP0000856242.GDRv1.0 |
| MDC000002.223 |
phytozomev10 |
exon |
816 |
1094 |
0 |
+ |
0 |
ID=MDP0000856242.GDRv1.0.196.exon.1 |
Parent=MDP0000856242.GDRv1.0.196 |
pacid=22625642 |
|
|
| MDC000002.223 |
phytozomev10 |
CDS |
816 |
1094 |
0 |
+ |
0 |
ID=MDP0000856242.GDRv1.0.196.CDS.1 |
Parent=MDP0000856242.GDRv1.0.196 |
pacid=22625642 |
|
|
And the code I used to build the genome is:
sudo [path to STAR process] --runMode genomeGenerate --genomeDir [genome directory path] --genomeFastaFiles [genome FASTA files path] --sjdbGTFfile [path to gff file] --runThreadN 16 --genomeChrBinNbits 12 --sjdbGTFtagExonParentTranscript Parent
Thank you!