Cufflinks Error GFaSeqGet end coordinate cannot be larger than sequence length

kdfe

Aug 29, 2013, 12:25:34 PM
to tuxedo-to...@googlegroups.com
Hi,
I am new to RNA-seq and would like some advice on my bioinformatics processing. I used RNA-STAR to align deer RNA (Illumina HiSeq, 2x100 bp paired-end) against the bovine genome (bosTau7) as the reference assembly, since the deer genome has not been sequenced yet. RNA-STAR ran fine, but when I tried to use Cufflinks to assemble transcripts, I got an error. As far as I understand, some of my aligned reads seem to extend a little past the end of the reference sequence. How should I proceed? Thank you.

RNA STAR

-------------

I ran

/datastore/STAR_2.3.0e.Linux_x86_64/STAR --genomeDir /datastore/GenomeRef/BovineIndex2/ --readFilesIn Tube01-00PercentSerum_R1.fastq.gz Tube01-00PercentSerum_R2.fastq.gz --readFilesCommand zcat --outSAMstrandField intronMotif --runThreadN 8


Cufflinks Error Message

------------------------------

/usr/bin/time -v /datastore/cufflinks-2.1.1.Linux_x86_64/cufflinks -p 8 -N -u -q -b /datastore/GenomeRef/BovineSource/bosTau7.fa -g /datastore/GenomeRef/BovineSource/bosTau7refFlat -o /datastore/Tube01-00PercentSerum /datastore/Tube01-00PercentSerum/Aligned.out.bam &

[1] 28607

You are using Cufflinks v2.1.1, which is the most recent release.

[16:34:33] Loading reference annotation.

No fasta index found for /datastore/GenomeRef/BovineSource/bosTau7.fa. Rebuilding, please wait..

Fasta index rebuilt.

[16:35:34] Inspecting reads and determining fragment length distribution.

Processed 66191 loci.                      

> Map Properties:

>             Normalized Map Mass: 24776138.31

>             Raw Map Mass: 24776138.31

>             Number of Multi-Reads: 2082075 (with 5367674 total hits)

>             Fragment Length Distribution: Empirical (learned)

>                           Estimated Mean: 198.67

>                        Estimated Std Dev: 56.21

[16:52:25] Assembling transcripts and initializing abundances for multi-read correction.

 

chr1:44838216-49203158               Warning: Skipping large bundle.

 

chr1:64696209-68589586               Warning: Skipping large bundle.

 

chr1:126602955-130570477          Warning: Skipping large bundle.

 

chr1:136966186-142369470          Warning: Skipping large bundle.

 

chr1:154424001-160891092          Warning: Skipping large bundle.

 

chr10:119502-4994410    Warning: Skipping large bundle.

 

chr10:7266876-13444838               Warning: Skipping large bundle.

 

chr10:34487674-38637481             Warning: Skipping large bundle.

 

chr10:48616455-52159492             Warning: Skipping large bundle.

 

chr10:60042853-64221887             Warning: Skipping large bundle.

 

chr10:67918511-72610015             Warning: Skipping large bundle.

 

chr11:19612706-23684045             Warning: Skipping large bundle.

 

chr11:31989618-35735329             Warning: Skipping large bundle.

 

chr11:49174370-53260005             Warning: Skipping large bundle.

 

chr11:96478219-100171565          Warning: Skipping large bundle.

 

chr12:13903012-17766811             Warning: Skipping large bundle.

 

chr13:26872487-32302465             Warning: Skipping large bundle.

 

chr15:13200751-19250623             Warning: Skipping large bundle.

 

chr15:38155622-43172797             Warning: Skipping large bundle.

 

chr18:57122769-61670624             Warning: Skipping large bundle.

 

chr19:19332536-23662871             Warning: Skipping large bundle.

 

chr19:47123458-51241403             Warning: Skipping large bundle.

 

chr2:16952011-20739791               Warning: Skipping large bundle.

 

chr2:81240055-84803004               Warning: Skipping large bundle.

 

chr2:106659883-110339217          Warning: Skipping large bundle.

 

chr20:1405070-6089546 Warning: Skipping large bundle.

 

chr20:39984546-43940478             Warning: Skipping large bundle.

 

chr22:37426066-42390339             Warning: Skipping large bundle.

 

chr22:46226140-49833232             Warning: Skipping large bundle.

 

chr23:8345917-11996655               Warning: Skipping large bundle.

 

chr24:20229225-24144529             Warning: Skipping large bundle.

 

chr25:27189793-31843132             Warning: Skipping large bundle.

 

chr26:5705986-10711103               Warning: Skipping large bundle.

 

chr26:13191795-18210967             Warning: Skipping large bundle.

 

chr26:18225778-22104370             Warning: Skipping large bundle.

 

chr3:14476820-18232943               Warning: Skipping large bundle.

 

chr3:35337830-40025731               Warning: Skipping large bundle.

 

chr3:50665618-54525689               Warning: Skipping large bundle.

 

chr4:7085045-11000062 Warning: Skipping large bundle.

 

chr4:92425105-96729453               Warning: Skipping large bundle.

 

chr5:36693825-40968303               Warning: Skipping large bundle.

 

chr5:50734352-55462904               Warning: Skipping large bundle.

 

chr5:86337607-90802235               Warning: Skipping large bundle.

 

chr5:93188483-97522218               Warning: Skipping large bundle.

 

chr7:11765942-15994659               Warning: Skipping large bundle.

 

chr8:6269200-10085608 Warning: Skipping large bundle.

 

chr8:104090250-108060069          Warning: Skipping large bundle.

 

chr9:40264070-44730818               Warning: Skipping large bundle.

 

chr9:92199925-96337830               Warning: Skipping large bundle.

 

chrX:44163710-50402553               Warning: Skipping large bundle.

Processed 66141 loci.                      

[20:46:49] Loading reference annotation and sequence.

Error (GFaSeqGet): end coordinate (42748) cannot be larger than sequence length 42715

Error (GFaSeqGet): end coordinate (18544) cannot be larger than sequence length 18532

Error (GFaSeqGet): end coordinate (12842) cannot be larger than sequence length 12841

Error (GFaSeqGet): end coordinate (8427) cannot be larger than sequence length 8418

Error (GFaSeqGet): end coordinate (742) cannot be larger than sequence length 732

Error (GFaSeqGet): end coordinate (1480) cannot be larger than sequence length 1450

Error (GFaSeqGet): subsequence cannot be larger than 10788

Error getting subseq for CUFF.42840.3 (1..10799)!

Command exited with non-zero status 1


John Wu

Nov 9, 2013, 6:04:04 PM
to tuxedo-to...@googlegroups.com
Hi, I've encountered the same problem.
I found that there is a compatibility issue when using RNA-STAR alignment output as input for Cufflinks: the coordinates of some transcripts are assigned incorrectly by Cufflinks, which produces the errors you mentioned:


"Error (GFaSeqGet): end coordinate (42748) cannot be larger than sequence length 42715"

The reason is that some portion of a transcript lies outside the range of a chromosome (or contig).

RNA-STAR writes soft-clipped bases in its SAM output, e.g. a CIGAR string such as 70M20I15M15S. Cufflinks then incorrectly assembles the trailing "15S" into a transcript, so a transcript near the end of a chromosome/contig ends up extending past the end of that sequence. In the GTF file produced by Cufflinks, you can see something like the following:

contig_123   protein_coding    exon    500    1015    .    +    .    gene_id "XXXXXXXXXXXXXXX"; transcript_id "XXXXXXXXXXXXXXXXX"; gene_name "XYZ";

However, the contig is only 1000 bases long, so the coordinates of this transcript are obviously wrong.
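
To see which features are affected, one option is to compare each end coordinate in the Cufflinks GTF against the sequence lengths in the FASTA index. The sketch below is only an illustration, not part of Cufflinks: the function names are made up for this example, and it assumes a standard tab-separated .fai index (name in column 1, length in column 2) and a standard 9-column GTF with 1-based inclusive end coordinates.

```python
def load_lengths(fai_lines):
    """Map sequence name -> length from the lines of a .fai FASTA index."""
    lengths = {}
    for line in fai_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            lengths[parts[0]] = int(parts[1])
    return lengths


def features_past_end(gtf_lines, lengths):
    """Return (seqname, start, end, seq_length) for features ending past their sequence."""
    bad = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        seqname, start, end = cols[0], int(cols[3]), int(cols[4])
        if seqname in lengths and end > lengths[seqname]:
            bad.append((seqname, start, end, lengths[seqname]))
    return bad
```

With real files this could be called as features_past_end(open("transcripts.gtf"), load_lengths(open("bosTau7.fa.fai"))), where both file names are assumptions based on the usual Cufflinks output name and the FASTA path in the original post.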

Right now, a working solution seems to be to parse the SAM file output by RNA-STAR and remove all the soft-clip information; Cufflinks then outputs a correct GTF.
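
A minimal sketch of that workaround, assuming plain-text SAM (for example from samtools view -h) is piped through it. It is not an official tool and the function names are hypothetical. It drops the S operations from the CIGAR string and trims the matching bases from SEQ and QUAL; POS is left alone because in SAM it already refers to the first aligned (non-clipped) base.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def strip_soft_clips(record):
    """Return one SAM line (no trailing newline) with soft clips removed."""
    if record.startswith("@"):
        return record  # header line, nothing to change
    fields = record.split("\t")
    if len(fields) < 11:
        return record  # not a complete alignment record
    cigar, seq, qual = fields[5], fields[9], fields[10]
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not any(op == "S" for _, op in ops):
        return record
    # Soft clips sit at the ends of the read, possibly inside hard clips (H).
    inner = [(n, op) for n, op in ops if op != "H"]
    lead = inner[0][0] if inner[0][1] == "S" else 0
    trail = inner[-1][0] if len(inner) > 1 and inner[-1][1] == "S" else 0
    fields[5] = "".join(f"{n}{op}" for n, op in ops if op != "S") or "*"
    if seq != "*":
        fields[9] = seq[lead:len(seq) - trail]
    if qual != "*":
        fields[10] = qual[lead:len(qual) - trail]
    return "\t".join(fields)


def strip_stream(in_handle, out_handle):
    """Apply strip_soft_clips to every line read from in_handle."""
    for line in in_handle:
        out_handle.write(strip_soft_clips(line.rstrip("\n")) + "\n")
```

Calling strip_stream(sys.stdin, sys.stdout) from a small script (after import sys) would let it sit in a pipe between samtools view -h and samtools view -b. Whether this resolves the Cufflinks error in a given data set is something to check on your own files.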

John Wu

mbourgey

Feb 27, 2015, 3:22:36 PM
to tuxedo-to...@googlegroups.com
Hi John,

I have run into this issue and see that you have been aware of it since 2013.

As I understand it, soft clips are part of the official SAM format. Why is Cufflinks not able to handle them? And why do you ask people to reformat their BAM files to remove an official SAM feature?
It doesn't seem to me such a big deal to exclude the soft-clipped bases from your length and position computations.
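
For illustration, here is how the reference span of an alignment can be computed from POS and the CIGAR string under the SAM convention that only M, D, N, = and X consume reference bases, so soft clips (S) and insertions (I) add nothing. This is a sketch with a made-up function name, not code taken from Cufflinks.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_end(pos, cigar):
    """1-based inclusive reference coordinate of the last aligned base."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1
```

For the CIGAR quoted earlier in this thread, 70M20I15M15S, a read at POS 500 covers 70 + 15 = 85 reference bases, so reference_end(500, "70M20I15M15S") gives 584; the trailing 15S does not extend it.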


Mathieu
