Alignment stuck at "....started mapping". Issue with .gff?


Joseph Mudd

Nov 10, 2016, 12:49:43 PM
to rna-star
Hello,

I am trying to align reads to a rhesus macaque .fna reference and .gff annotation. The genome builds properly with the following command:

/hpcdata/lmm/lmm_data/muddjc/STAR-2.5.2b/bin/Linux_x86_64/STAR 
--runMode genomeGenerate 
--runThreadN 24 
--genomeDir <Mmul_8.0.1_genomic.fna> 
--sjdbGTFfile <GCF_000772875.2_Mmul_8.0.1_genomic.gff> 
--sjdbGTFtagExonParentTranscript Parent 
--sjdbOverhang 100 
--genomeChrBinNbits min 
--genomeSAindexNbases 13

I am attaching the log.out for this.

I'm encountering a problem when I proceed to map:

Nov 09 15:29:40 ..... started STAR run
Nov 09 15:29:41 ..... loading genome
Nov 09 15:30:07 ..... processing annotations GTF
Nov 09 15:30:23 ..... inserting junctions into the genome indices
Nov 09 15:31:43 ..... started mapping

For these runs STAR is getting stuck at the mapping step. It creates the header in Log.progress.out, but does not begin mapping to each chromosome. I am wondering if it's due to the particular .gff format? I had previously run these reads against a different rhesus .fna and a .gtf file, and they aligned successfully. Below are a few lines of the .gff file:

NC_027893.1 RefSeq region 1 225584828 . + . ID=id0;Dbxref=taxon:9544;Name=1;chromosome=1;country=USA: Southwest National Primate Research Center at the Southwest Foundation for Biomedical Research%2C San Antonio%2C TX;gbkey=Src;genome=chromosome;isolate=17573;mol_type=genomic DNA;note=derived from Indian origin rhesus;sex=female
NC_027893.1 Gnomon gene 15791 22125 . - . ID=gene0;Dbxref=GeneID:106999150;Name=LOC106999150;gbkey=Gene;gene=LOC106999150;gene_biotype=lncRNA
NC_027893.1 Gnomon ncRNA 15791 22125 . - . ID=rna0;Parent=gene0;Dbxref=GeneID:106999150,Genbank:XR_001445959.1;Name=XR_001445959.1;gbkey=ncRNA;gene=LOC106999150;model_evidence=Supporting evidence includes similarity to: 1 mRNA%2C 27 ESTs%2C and 99%25 coverage of the annotated genomic feature by RNAseq alignments%2C including 7 samples with support for all annotated introns;ncrna_class=lncRNA;product=uncharacterized LOC106999150%2C transcript variant X2;transcript_id=XR_001445959.1
NC_027893.1 Gnomon exon 22088 22125 . - . ID=id1;Parent=rna0;Dbxref=GeneID:106999150,Genbank:XR_001445959.1;gbkey=ncRNA;gene=LOC106999150;ncrna_class=lncRNA;product=uncharacterized LOC106999150%2C transcript variant X2;transcript_id=XR_001445959.1
NC_027893.1 Gnomon exon 17186 21651 . - . ID=id2;Parent=rna0;Dbxref=GeneID:106999150,Genbank:XR_001445959.1;gbkey=ncRNA;gene=LOC106999150;ncrna_class=lncRNA;product=uncharacterized LOC106999150%2C transcript variant X2;transcript_id=XR_001445959.1
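
One way to check whether the exon lines in a GFF3 file carry the Parent tag that --sjdbGTFtagExonParentTranscript Parent relies on is to count them directly. The following is a minimal Python sketch, not part of STAR. The function names are made up for illustration, the example lines are shortened versions of the excerpt above, and it assumes the real file is tab-separated (the tabs were lost when pasting).

```python
def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column (key=value pairs joined by ';') into a dict."""
    attributes = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key] = value
    return attributes


def count_exons_with_tag(lines, feature="exon", tag="Parent"):
    """Return (number of exon lines, number of exon lines that carry `tag`)."""
    total = tagged = 0
    for line in lines:
        # Skip comment/directive lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A valid GFF3 record has 9 columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != feature:
            continue
        total += 1
        if tag in parse_gff3_attributes(fields[8]):
            tagged += 1
    return total, tagged


# Shortened, illustrative lines modelled on the excerpt above (assumption:
# tab-separated columns, attributes trimmed to ID and Parent).
example = [
    "NC_027893.1\tGnomon\tgene\t15791\t22125\t.\t-\t.\tID=gene0;Name=LOC106999150",
    "NC_027893.1\tGnomon\texon\t22088\t22125\t.\t-\t.\tID=id1;Parent=rna0",
    "NC_027893.1\tGnomon\texon\t17186\t21651\t.\t-\t.\tID=id2;Parent=rna0",
]
print(count_exons_with_tag(example))
```

To run it on the full annotation, pass an open file handle instead of the example list, for instance `count_exons_with_tag(open("annotation.gff"))`, where the file name is a placeholder.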


Alexander Dobin

Nov 10, 2016, 2:47:56 PM
to rna-star
Hi Joseph,

please send me the Log.out file from this run.

Cheers
Alex

Joseph Mudd

Nov 10, 2016, 3:14:03 PM
to rna-star
It is attached.

I actually just looked at the tail of log.out.

Fatal INPUT FILE error, no valid exon lines in the GTF file: /hpcdata/lmm/lmm_data/muddjc/STARindex_MacaM/macam_annot/MacaM_Rhesus_Genome_Annotation_v7.8.2.gtf
Solution: check the formatting of the GTF file. Most likely cause is the difference in chromosome naming between GTF and FASTA file.

Nov 10 10:42:45 ...... FATAL ERROR, exiting

I don't see any "Chr" annotations in the .gff. In my limited experience I have only worked with .gtf files, so would this indicate a formatting issue? What steps might I need to take to make this compatible?



STARalign.Log.out.txt

Joseph Mudd

Nov 10, 2016, 3:17:05 PM
to rna-star
The Log.out for the --runMode genomeGenerate step is attached as well:
--genomeGenerate Log.out.txt

Alexander Dobin

Nov 11, 2016, 12:19:24 PM
to rna-star
Hi Joseph,

it seems you are using a .gtf file (MacaM_Rhesus_Genome_Annotation_v7.8.2.gtf) at the mapping stage,
while you used the .gff file at the genome generation step.
My guess is that the .gtf file contains different chromosome names, which is why STAR complains.
Please try mapping without the --sjdbGTFfile option.
Typically you do not need to use the --sjdbGTFfile option at the mapping stage if you already used it at the genome generation stage.
It's only useful if you need to add (more) annotations on the fly.
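
A chromosome-name mismatch like this can be checked before mapping by comparing the sequence names in the FASTA headers with the first column of the annotation. Below is a minimal Python sketch under a few assumptions: the function names are made up for illustration, the FASTA name is taken as the first whitespace-delimited token after ">", and the "chr01" name in the example .gtf line is purely hypothetical (the actual MacaM naming is not shown in this thread).

```python
def fasta_names(lines):
    """Collect sequence names: the first token after '>' on each header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def annotation_names(lines):
    """Collect the first column (sequence name) of a tab-separated GTF/GFF file."""
    names = set()
    for line in lines:
        # Skip comment/directive lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def shared_names(fasta_lines, annotation_lines):
    """Names present in both files; an empty set means no annotation line can match."""
    return fasta_names(fasta_lines) & annotation_names(annotation_lines)


# Illustrative inputs: the header description and the 'chr01' name are placeholders.
fasta = [">NC_027893.1 placeholder description", "ACGTACGT"]
gff = ["NC_027893.1\tGnomon\texon\t22088\t22125\t.\t-\t.\tID=id1;Parent=rna0"]
gtf = ['chr01\tsource\texon\t100\t200\t.\t+\t.\ttranscript_id "t1"; gene_id "g1";']

print(shared_names(fasta, gff))
print(shared_names(fasta, gtf))
```

If the second call returns an empty set for the real files, the annotation and the genome do not share any sequence names, which would match the "no valid exon lines" message.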

In general, I think the best option is to convert the .gff into a .gtf file before the genome generation.
For instance, you can use the gffread tool from the Cufflinks package:
$ gffread -T small.gff3 -o small.gtf

Cheers
Alex

Joseph Mudd

Nov 14, 2016, 2:31:36 PM
to rna-star
Thank you for the response Alex, I really appreciate it!

I have re-run the command omitting --sjdbGTFfile in the mapping stage. It's running without any error, although it seems to be stuck at the mapping stage. From tail of Log.out:

Processing splice junctions database sjdbN=235800, sjdbOverhang=100
alignIntronMax=alignMatesGapMax=0, the max intron size will be approximately determined by (2^winBinNbits)*winAnchorDistNbins=589824
winBinNbits=16 > genomeChrBinNbits=0 redefining:
winBinNbits=0
Created thread # 1
Created thread # 2
Created thread # 3
Starting to map file # 0
mate 1: /hpcdata/bcbb/leerkesm/AGM/WGET_FTP_48/FTP_627_VGTI_download/JB37_ATTACTCG-GGCTCTGA_L001_R1_001.fastq
mate 2: /hpcdata/bcbb/leerkesm/AGM/WGET_FTP_48/FTP_627_VGTI_download/JB37_ATTACTCG-GGCTCTGA_L001_R2_001.fastq

Attaching the complete Log.out. My guess would be that this is still a problem with building the genome from a .gff, as the MacaM .gtf gives no problems with the mapping. As you suggested, I will try to convert it with gffread and then start from scratch. If you think this may be an unrelated problem, please let me know. Otherwise I will update you when this step is completed.

Thanks again!

jc
JB37_log.out.txt

Alexander Dobin

Nov 17, 2016, 3:13:36 PM
to rna-star
Hi Joseph,

I think there is a problem with the --genomeChrBinNbits parameter at the genome generation step. The command line looks like
--genomeChrBinNbits=min , which results in a value of 0 for this parameter. Please try re-generating the genome indexes without this parameter - the default value should work fine.
If you look at the Log.out file from the mapping step, you can see the parameters that were used for the genome generation (see below).
--genomeChrBinNbits 0 is the only one that looks wrong.

Cheers
Alex

Reading genome generation parameters:
versionGenome                 20201        ~RE-DEFINED
genomeFastaFiles              /hpcdata/lmm/lmm_data/muddjc/Mmul_8.0.1/GCF_000772875.2_Mmul_8.0.1_genomic.fna        ~RE-DEFINED
genomeSAindexNbases           13     ~RE-DEFINED
genomeChrBinNbits             0     ~RE-DEFINED
genomeSAsparseD               1     ~RE-DEFINED
sjdbOverhang                  100     ~RE-DEFINED
sjdbFileChrStartEnd           -        ~RE-DEFINED
sjdbGTFfile                   /hpcdata/lmm/lmm_data/muddjc/Mmul_8.0.1/GCF_000772875.2_Mmul_8.0.1_genomic.gff     ~RE-DEFINED
sjdbGTFchrPrefix              -     ~RE-DEFINED
sjdbGTFfeatureExon            exon     ~RE-DEFINED
sjdbGTFtagExonParentTranscript Parent     ~RE-DEFINED
sjdbGTFtagExonParentGene      gene_id     ~RE-DEFINED
sjdbInsertSave                Basic     ~RE-DEFINED
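
The numbers in the mapping Log.out can be reproduced with a little arithmetic, which also shows what the "min" in the genome generation command was probably meant to stand for. This is a rough Python sketch, not STAR code. winAnchorDistNbins=9 is inferred from the log (589824 / 2^16 = 9), the formula min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))) is the rule of thumb the STAR manual gives for genomes with many references, and rounding to the nearest integer is an assumption chosen to match the manual's worked example. The function names are made up for illustration.

```python
import math


def max_intron_estimate(win_bin_nbits=16, win_anchor_dist_nbins=9):
    """(2^winBinNbits) * winAnchorDistNbins, as printed in Log.out."""
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


def suggested_chr_bin_nbits(genome_length, n_references, read_length):
    """Rule of thumb: min(18, log2(max(GenomeLength/NumberOfReferences, ReadLength))).

    Rounding to the nearest integer is an assumption, not something the
    formula itself specifies.
    """
    per_reference = max(genome_length / n_references, read_length)
    return min(18, round(math.log2(per_reference)))


# With the default winBinNbits=16 the log reports 589824.
print(max_intron_estimate())

# genomeChrBinNbits=0 forces winBinNbits down to 0, so the same product collapses.
print(max_intron_estimate(win_bin_nbits=0))

# Worked example with round numbers: a 3 Gb genome split into 100,000 references.
print(suggested_chr_bin_nbits(3_000_000_000, 100_000, 100))
```

The second value shows how small the alignment windows become once winBinNbits is redefined to 0, which is consistent with leaving --genomeChrBinNbits at its default or passing an explicit number rather than the literal word "min".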