Help with gff file and --quantMode GeneCounts

Julin Maloof

Dec 7, 2015, 12:42:05 PM
to rna-star
Hello,

I am having trouble getting my settings right so that --quantMode GeneCounts outputs per-gene counts. I get an SJ.out.tab file but nothing with per-gene counts.
I have attached a log file and the first 100 lines of the gff file.

For building the genome and specifying the GFF tags I have tried both

STAR --runThreadN 6 --runMode genomeGenerate --genomeDir ~/Sequences/ref_genomes/tomato/ITAG2.4_Chromo2.5/STAR_genome --genomeFastaFiles ~/Sequences/ref_genomes/tomato/ITAG2.4_Chromo2.5/S_lycopersicum_chromosomes.2.50.fa --sjdbGTFfile ~/Sequences/ref_genomes/tomato/ITAG2.4_Chromo2.5/ITAG2.4_gene_models.gff3 --sjdbGTFtagExonParentTranscript Parent

and

STAR --runThreadN 6 --runMode genomeGenerate --genomeDir ~/Sequences/ref_genomes/tomato/ITAG2.4_Chromo2.5/STAR_genome --genomeFastaFiles ~/Sequences/ref_genomes/tomato/ITAG2.4_Chromo2.5/S_lycopersicum_chromosomes.2.50.fa --sjdbGTFfile ~/Sequences/ref_genomes/tomato/ITAG2.4_Chromo2.5/ITAG2.4_gene_models.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbGTFtagExonParentGene Parent

Thoughts? Advice?

Thanks,

Julin
STAR_1Log.out
small.gff3

Alexander Dobin

Dec 7, 2015, 6:07:34 PM
to rna-star
Hi Julin,

I will check whether your gff file can be processed with these options. 

However, using --sjdbGTFtagExonParentGene Parent on gff3 files will count reads per transcript, not per gene.
I would recommend converting the gff3 file into a gtf file. For instance, you can use the gffread tool from the Cufflinks package:
gffread -T small.gff3 -o small.gtf

It creates the gtf file with proper transcript_id and gene_id tags, which you can supply as --sjdbGTFfile without any Parent options.
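
The workflow described above (convert, re-generate the genome, then map with gene counting) could be sketched roughly as follows. This is only an illustration: the function names `run` and `star_gene_counts`, the `DRY_RUN` switch, and all file arguments are hypothetical placeholders, and the thread count of 6 is simply copied from the commands earlier in the thread.

```shell
#!/bin/sh
# Sketch only: gff3 -> gtf, rebuild the STAR index, then map with per-gene counting.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Usage (placeholder names): star_gene_counts annot.gff3 genome.fa genome_dir reads.fq out_prefix
star_gene_counts() {
    gff3=$1; fasta=$2; genome_dir=$3; reads=$4; prefix=$5
    gtf="${gff3%.gff3}.gtf"

    # 1. Convert gff3 to gtf so that exon lines carry transcript_id and gene_id
    run gffread -T "$gff3" -o "$gtf" || return 1

    # 2. Re-generate the genome with the gtf, without any Parent tag options
    run mkdir -p "$genome_dir" || return 1
    run STAR --runThreadN 6 --runMode genomeGenerate \
        --genomeDir "$genome_dir" \
        --genomeFastaFiles "$fasta" \
        --sjdbGTFfile "$gtf" || return 1

    # 3. Map with gene counting; counts are expected in ${prefix}ReadsPerGene.out.tab
    run STAR --runThreadN 6 --genomeDir "$genome_dir" \
        --readFilesIn "$reads" \
        --quantMode GeneCounts \
        --outFileNamePrefix "$prefix"
}
```

Running it once with `DRY_RUN=1` shows the exact commands before any index is rebuilt, which helps confirm that the gtf (not the gff3) is the file passed to --sjdbGTFfile.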

Cheers
Alex

Julin Maloof

Dec 7, 2015, 6:13:43 PM
to rna-star
Hi Alex.  

I'll try the cufflinks option and report back. In this case I am pretty sure that there is only one transcript per gene in the gff, so that particular issue may not be a big concern.

Best,

Julin

Julin Maloof

Dec 7, 2015, 11:33:22 PM
to rna-star
Hi Alex,

I created the .gtf file and rebuilt the genome index files. Alas, I still do not get GeneCounts output.

First two genes in the gtf:

SL2.50ch00      ITAG_eugene     exon    16437   17275   .       +       .       transcript_id "mRNA:Solyc00g005000.2.1"; gene_id "gene:Solyc00g005000.2"; gene_name "Solyc00g005000.2";
SL2.50ch00      ITAG_eugene     exon    17336   18189   .       +       .       transcript_id "mRNA:Solyc00g005000.2.1"; gene_id "gene:Solyc00g005000.2"; gene_name "Solyc00g005000.2";
SL2.50ch00      ITAG_eugene     CDS     16480   17275   .       +       0       transcript_id "mRNA:Solyc00g005000.2.1"; gene_id "gene:Solyc00g005000.2"; gene_name "Solyc00g005000.2";
SL2.50ch00      ITAG_eugene     CDS     17336   17940   .       +       2       transcript_id "mRNA:Solyc00g005000.2.1"; gene_id "gene:Solyc00g005000.2"; gene_name "Solyc00g005000.2";
SL2.50ch00      ITAG_eugene     exon    68062   68211   .       +       .       transcript_id "mRNA:Solyc00g005020.1.1"; gene_id "gene:Solyc00g005020.1"; gene_name "Solyc00g005020.1";
SL2.50ch00      ITAG_eugene     exon    68344   68568   .       +       .       transcript_id "mRNA:Solyc00g005020.1.1"; gene_id "gene:Solyc00g005020.1"; gene_name "Solyc00g005020.1";
SL2.50ch00      ITAG_eugene     exon    68654   68764   .       +       .       transcript_id "mRNA:Solyc00g005020.1.1"; gene_id "gene:Solyc00g005020.1"; gene_name "Solyc00g005020.1";
SL2.50ch00      ITAG_eugene     CDS     68062   68211   .       +       0       transcript_id "mRNA:Solyc00g005020.1.1"; gene_id "gene:Solyc00g005020.1"; gene_name "Solyc00g005020.1";
SL2.50ch00      ITAG_eugene     CDS     68344   68568   .       +       0       transcript_id "mRNA:Solyc00g005020.1.1"; gene_id "gene:Solyc00g005020.1"; gene_name "Solyc00g005020.1";
SL2.50ch00      ITAG_eugene     CDS     68654   68764   .       +       0       transcript_id "mRNA:Solyc00g005020.1.1"; gene_id "gene:Solyc00g005020.1"; gene_name "Solyc00g005020.1";
STAR_1Log.out
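
One quick sanity check on a converted gtf like the one above is to confirm that every exon line carries both a transcript_id and a gene_id. A minimal sketch, assuming a tab-delimited gtf with the attributes in column 9 (the function name `check_gtf_exon_tags` is made up for illustration):

```shell
#!/bin/sh
# Sketch: count exon lines and report how many lack gene_id or transcript_id.
check_gtf_exon_tags() {
    awk -F '\t' '
        $3 == "exon" {
            total++
            # An exon line is flagged if either attribute is absent or empty
            if ($9 !~ /gene_id "[^"]+"/ || $9 !~ /transcript_id "[^"]+"/) missing++
        }
        END { printf "exons=%d missing_tags=%d\n", total, missing }
    ' "$1"
}
```

Called as `check_gtf_exon_tags ITAG2.4_gene_models.gtf`, it prints a single summary line; a non-zero `missing_tags` would point at the annotation rather than at STAR.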

Alexander Dobin

Dec 8, 2015, 12:24:52 PM
to rna-star
Hi Julin,

Have you re-generated the genome with the GTF file?
From the Log.out, it seems you are still using the gff3 file at the genome generation step.

Could you please use the latest release of STAR:

After you re-generate the genome with the new version, could you send me the Log.out file?

Cheers
Alex

Julin Maloof

Dec 8, 2015, 2:44:15 PM
to rna-star
Hi Alex,

I am confused.  I thought I had double-checked to make sure that I was using the correct (re-generated) genome file.  I just re-downloaded the Log file that was attached to my last message and it says 

(line 310) sjdbGTFfile                   /Network/Servers/avalanche.plb.ucdavis.edu/Volumes/Mammoth/Users/jmaloof/Sequences/ref_genomes/tomato/ITAG2.4_Chromo2.5/ITAG2.4_gene_models.gtf     ~RE-DEFINED

Which I thought meant that I *was* using the new genome generated with the GTF file.  Am I missing something?

In any case I'll try the 2.5 version and report back.
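
To check which annotation file a given run actually recorded, the sjdbGTFfile entries can be pulled out of Log.out directly. A small sketch (the helper name `show_gtf_in_log` is hypothetical):

```shell
#!/bin/sh
# Sketch: list every sjdbGTFfile entry in a STAR Log.out, with line numbers.
show_gtf_in_log() {
    log=$1
    grep -n 'sjdbGTFfile' "$log" || echo "no sjdbGTFfile entry in $log"
}
```

For example, `show_gtf_in_log STAR_1Log.out` makes it easy to see whether the path ends in .gtf or .gff3.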

Alexander Dobin

Dec 8, 2015, 2:48:56 PM
to rna-star
Hi Julin,

You are right, you used the GTF in the last Log.out; I guess I was still looking at your old Log.out.
Let us see how the new version behaves; it should also print more diagnostic messages into Log.out.

Cheers
Alex

Julin Maloof

Dec 8, 2015, 2:57:23 PM
to rna-star
Thanks. Segmentation fault with the compiled version. I am now compiling from source.

Julin Maloof

Dec 8, 2015, 4:10:56 PM
to rna-star
Compilation error:

/usr/local/bin/gcc-5 -c  -O3 -pipe -Wall -Wextra  bam_cat.c
bam_cat.c:57:19: fatal error: cstring: No such file or directory
compilation terminated.

We are running a rather old version of OS X on that server. I will try compiling elsewhere...

Julin Maloof

Dec 8, 2015, 4:32:11 PM
to rna-star
Hi,

Version 2.5 on our Linux machine works.

Thanks for working through this with me.

Julin