No gene counts using GFF with --quantMode GeneCounts option

83 views
Skip to first unread message

Tim Knutsen

unread,
Nov 9, 2016, 12:19:16 PM11/9/16
to rna-star
Dear developer.
I need to use a .gff3 file for obtaining gene counts in a cattle RNAseq-experiment. 

I created the STAR (version 2.5.2b) index using:
 
STAR \
        --runThreadN 32 --runMode genomeGenerate --genomeDir STAR_index_bovinegenome_Nameparent --genomeFastaFiles UMD3.1/bovine_genome/UMD3.1_chromosomes.fa \
        --sjdbGTFfile gff/RefSeq_UMD3.1.1_protein_coding.gff3 --sjdbOverhang 149 --sjdbGTFtagExonParentTranscript Parent \
        --sjdbGTFtagExonParentGene Name

The first lines of the gff looks like this:
##gff-version 3
GK000001.2      RefSeq  gene    122904  180145  .       -       .       ID=gene3;Name=507243;Dbxref=NCBI_Gene:507243,BGD:BT17749;symbol_ncbi=CLIC6;description=chloride intracellular channel 6;feature_type=Protein Coding
GK000001.2      RefSeq  mRNA    122904  180145  .       -       .       ID=rna1;Parent=gene3;Name=XM_002684585.4;Dbxref=NCBI_Gene:507243,BGD:BT17749,RefSeq_NA:XM_002684585.4,RefSeq_Prot:XP_002684631.1;ncbi_desc=chloride intracellular channel 6;Note=chloride intracellular channel 6 prot_id:XP_002684631.1 Symbol:CLIC6;symbol_ncbi=CLIC6;feature_type=Protein Coding
GK000001.2      RefSeq  exon    178433  180145  .       -       .       ID=exon5;Parent=rna1
GK000001.2      RefSeq  exon    138875  138984  .       -       .       ID=exon6;Parent=rna1
GK000001.2      RefSeq  exon    137834  137959  .       -       .       ID=exon7;Parent=rna1
GK000001.2      RefSeq  exon    137158  137264  .       -       .       ID=exon8;Parent=rna1
GK000001.2      RefSeq  exon    135820  136001  .       -       .       ID=exon9;Parent=rna1
GK000001.2      RefSeq  exon    122904  125014  .       -       .       ID=exon10;Parent=rna1
GK000001.2      RefSeq  CDS     178433  179713  .       -       0       ID=cds2;Parent=rna1
GK000001.2      RefSeq  CDS     138875  138984  .       -       0       ID=cds3;Parent=rna1
GK000001.2      RefSeq  CDS     137834  137959  .       -       1       ID=cds4;Parent=rna1
GK000001.2      RefSeq  CDS     137158  137264  .       -       1       ID=cds5;Parent=rna1
GK000001.2      RefSeq  CDS     135820  136001  .       -       2       ID=cds6;Parent=rna1
GK000001.2      RefSeq  CDS     124853  125014  .       -       0       ID=cds7;Parent=rna1
GK000001.2      RefSeq  gene    242246  362910  .       +       .       ID=gene4;Name=539640;Dbxref=NCBI_Gene:539640,BGD:BT26136;symbol_ncbi=RCAN1;description=regulator of calcineurin 1;gene_synonym=DSCR1;feature_type=Protein Coding
GK000001.2      RefSeq  mRNA    242246  362910  .       +       .       ID=rna2;Parent=gene4;Name=XM_005201110.2;Dbxref=NCBI_Gene:539640,BGD:BT26136,RefSeq_NA:XM_005201110.2,RefSeq_Prot:XP_005201167.1;ncbi_desc=regulator of calcineurin 1%2C transcript variant X1;Note=regulator of calcineurin 1%2C transcript variant X1 prot_id:XP_005201167.1 Symbol:RCAN1;symbol_ncbi=RCAN1;feature_type=Protein Coding
GK000001.2      RefSeq  exon    242246  242646  .       +       .       ID=exon11;Parent=rna2
GK000001.2      RefSeq  exon    355064  355237  .       +       .       ID=exon12;Parent=rna2
GK000001.2      RefSeq  exon    357793  357952  .       +       .       ID=exon13;Parent=rna2



My STAR mapping command was:
STAR --limitGenomeGenerateRAM 62000000000  --quantMode GeneCounts --genomeDir STAR_index_bovinegenome_Nameparent --readFilesCommand zcat --readFilesIn $R1 $R2 --outFileNamePrefix $OUT --outSAMmode Full --outSAMtype BAM SortedByCoordinate --outSAMstrandField intronMotif --runThreadN 5 --readMatesLengthsIn NotEqual --outSAMattrRGline ID:$FILEBASE PL:illumina LB:$SAMPLE SM:$SAMPLE

This resulted in the ReadsPerGene.out.tab file looking like this:
N_unmapped      656786  656786  656786
N_multimapping  374657  374657  374657
N_noFeature     7421298 20913001        7536365
N_ambiguous     0       0       0


I am guessing that the problems is related to the index creation step, and that I am not setting the "--sjdbGTFtagExonParentGene" and "--sjdbGTFtagExonParentTranscript" -options correctly.

Could you please offer some advice.

Best regards,
Tim Knutsen.

Alexander Dobin

unread,
Nov 10, 2016, 2:46:11 PM11/10/16
to rna-star
Hi Tim,

your GFF file does not seem to contain gene IDs for exons - actually GFF files seldom have that.
I think the best option for you is to convert the GFF into GTF.
For instance, you can use gffread tool from Cufflinks tool:
$ gffread -T small.gff3 -o small.gtf

If it does not work, please send me a few exonic lines of the converted GTF.

Cheers
Alex
Reply all
Reply to author
Forward
0 new messages