Cufflinks with Gencode Annotation v17

Dsoronellas

Jul 30, 2013, 11:43:41 AM

to tuxedo-to...@googlegroups.com

Dear TopHat/Cufflinks developers,

I am contacting you because I have run into a problem with differential expression that I haven't been able to resolve.

Situation: I have directional paired-end RNAseq data at 2 different timepoints without replicates.

I first mapped the reads with TopHat 2 using the latest GENCODE reference annotation (v17, 989Mb).

After mapping the reads, I proceeded directly to differential expression with Cuffdiff 2, using the same GENCODE v17 annotation.

When the "Testing for differential expression and regulation in locus" step started, I noticed that the program slowed down considerably, and after 4 or 5 hours it crashed with a "Segmentation fault" error.

What is also somewhat strange is that after 4 or 5 hours the program was still processing loci on chromosome 1.

Both TopHat and Cuffdiff were run on an SGE cluster (nodes with 8 cores and 40 GB of RAM each). Below is the error log I found:

[09:36:20] Loading reference annotation and sequence.

[09:37:43] Inspecting maps and determining fragment length distributions.

[10:17:00] Modeling fragment count overdispersion.

> Map Properties:

> Normalized Map Mass: 48063229.50

> Raw Map Mass: 43993718.00

> Number of Multi-Reads: 0 (with 0 total hits)

> Fragment Length Distribution: Empirical (learned)

> Estimated Mean: 112.41

> Estimated Std Dev: 28.38

> Map Properties:

> Normalized Map Mass: 48063229.50

> Raw Map Mass: 52132741.00

> Number of Multi-Reads: 0 (with 0 total hits)

> Fragment Length Distribution: Empirical (learned)

> Estimated Mean: 110.50

> Estimated Std Dev: 28.44

[10:18:08] Calculating preliminary abundance estimates

> Processed 33875 loci. [*************************] 100%

[11:19:48] Learning bias parameters.

[12:01:39] Testing for differential expression and regulation in locus.

> Processing Locus chr1:48962934-48963408 [ ] 2%

/var/spool/gridengine/node-hp0407/job_scripts/394883: line 26: 22141 Segmentation fault (core dumped) $cuffdiff --no-update-check -o CuffDiff/$norm/${name1}-vs-${name2} --library-type fr-secondstrand --library-norm-method $norm --dispersion-method pooled --FDR 0.1 -p 8 -L ${name1},${name2} -u -b $genomeFA $genes $f1 $f2

Could you please help me figure out what is causing this error?

I am also paying particular attention to the size of the GENCODE v17 annotation (989Mb) compared with the iGenomes hg19 UCSC annotation you provide (112Mb), since it might be the source of the problem. Would you recommend using the GTF annotation you provide instead of any other?

Thanks for your time and help!

Best

Dani.

A. Taylor Bright

Nov 9, 2013, 1:56:29 PM

to tuxedo-to...@googlegroups.com

I have had the same issue. After much troubleshooting, the cause seems to be the rows with the "gene" feature type in column 3 of the GTF downloaded from GENCODE. If you remove these rows, Cufflinks will complete.
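To see which feature types a GENCODE GTF contains, and how many "gene" rows are involved, a quick tally of column 3 helps. Below is a minimal sketch as a shell function; the name tally_gtf_features is made up for illustration, and it assumes a tab-separated GTF whose header lines start with '#'.

```shell
# Hypothetical helper: count how many rows carry each feature type
# (GTF column 3). Header lines starting with '#' and rows with fewer
# than 3 tab-separated fields are skipped. Output: "<count>\t<type>",
# most frequent type first, ties broken alphabetically.
# Usage: tally_gtf_features <annotation.gtf>
tally_gtf_features() {
  awk -F '\t' '!/^#/ && NF >= 3 { n[$3]++ }
               END { for (t in n) print n[t] "\t" t }' "$1" |
    sort -k1,1nr -k2,2
}
```

For example, tally_gtf_features gencode.v19.annotation.gtf prints one line per feature type; the count on the "gene" line is the number of rows the workaround removes.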

I would be interested to hear from the developers why this is an issue, but either way things seem to work as long as you remove these lines from the GTF.

I would recommend using GENCODE, since its annotations are much richer and better curated than anything else out there. In GENCODE 18 there are 57,445 lines that need to be removed.

Pamela Wu

Aug 14, 2014, 6:59:05 PM

to tuxedo-to...@googlegroups.com

I also had this problem, and for the moment it seems to have been resolved by A. Taylor Bright's trick of removing all of the "gene" annotations in the third column. The Bash command I used was:

awk -F '\t' '{if ($3 != "gene") print $0}' gencode.v19.annotation.gtf > gencode.v19.annotation.mod.gtf
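A slightly more defensive version of the same awk filter, sketched as a shell function (strip_gene_rows is a hypothetical name, not part of any tool), also checks that the output is shorter than the input by exactly the number of "gene" rows, so nothing else was dropped by accident:

```shell
# Hypothetical helper: drop rows whose feature type (column 3) is "gene",
# using the same awk filter as the one-liner above, then sanity-check
# the line counts. Header lines ("##...") have no column 3 and are kept.
# Usage: strip_gene_rows <input.gtf> <output.gtf>
strip_gene_rows() {
  in_gtf=$1
  out_gtf=$2
  awk -F '\t' '$3 != "gene"' "$in_gtf" > "$out_gtf" || return 1

  # Number of "gene" rows in the input, and total line counts before/after.
  n_gene=$(awk -F '\t' '$3 == "gene" { n++ } END { print n + 0 }' "$in_gtf")
  n_in=$(awk 'END { print NR }' "$in_gtf")
  n_out=$(awk 'END { print NR }' "$out_gtf")

  # The output must be shorter by exactly the number of "gene" rows.
  if [ $((n_in - n_out)) -ne "$n_gene" ]; then
    echo "unexpected line count: $n_in -> $n_out (gene rows: $n_gene)" >&2
    return 1
  fi
  echo "removed $n_gene gene rows; $n_out lines written to $out_gtf"
}
```

Usage: strip_gene_rows gencode.v19.annotation.gtf gencode.v19.annotation.mod.gtf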

Mike

Jan 14, 2015, 4:18:43 PM

to tuxedo-to...@googlegroups.com

This solved my problem as well. I'm using the mouse genome mm10 and gencode.v2.annotation.gtf. The Bash command described above worked for me (I haven't looked at the results in detail yet, but it's the first time I've had cuffdiff complete successfully with the -b option; without -b it already completed successfully). Are there any negative consequences of removing the "gene" annotations?
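One way to check that nothing essential is lost is to confirm that gene membership is still recorded on the exon rows. The sketch below assumes that Cufflinks groups transcripts into genes through the gene_id attribute carried on each exon row, so the separate "gene" rows are redundant for it; that is an assumption, not something confirmed in this thread, and check_gene_ids is a hypothetical helper name. It reports exon rows without a gene_id and compares the number of distinct gene_id values on exon rows with the number of "gene" rows in the original file, which should normally match when every gene has at least one exon.

```shell
# Hypothetical helper. Assumption: gene membership is read from the
# gene_id attribute (column 9) of exon rows, not from "gene" rows.
# Returns 0 if every exon row has a gene_id and the number of distinct
# gene_id values on exon rows equals the number of original "gene" rows.
# Usage: check_gene_ids <original.gtf> <filtered.gtf>
check_gene_ids() {
  missing=$(awk -F '\t' '$3 == "exon" && $9 !~ /gene_id "/ { n++ }
                         END { print n + 0 }' "$2")
  genes_in_exons=$(awk -F '\t' '$3 == "exon" && match($9, /gene_id "[^"]*"/) {
                       seen[substr($9, RSTART, RLENGTH)] = 1
                     }
                     END { n = 0; for (g in seen) n++; print n }' "$2")
  gene_rows=$(awk -F '\t' '$3 == "gene" { n++ } END { print n + 0 }' "$1")

  echo "exon rows without gene_id: $missing"
  echo "distinct gene_id on exon rows: $genes_in_exons (gene rows removed: $gene_rows)"
  [ "$missing" -eq 0 ] && [ "$genes_in_exons" -eq "$gene_rows" ]
}
```

For example, check_gene_ids gencode.v19.annotation.gtf gencode.v19.annotation.mod.gtf prints both counts and returns a non-zero status if either check fails.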
