Dear TopHat/Cufflinks developers,
I am contacting you because I have run into a problem running differential expression analysis that I have not been able to resolve.
Situation: I have directional paired-end RNA-seq data at 2 different time points, without replicates.
I first mapped the reads with TopHat 2 using the latest GENCODE reference annotation (v17, 989 MB).
After mapping the reads, I proceeded directly to differential expression with Cuffdiff 2 and the same GENCODE v17 annotation.
When the "Testing for differential expression and regulation in locus" step started, I noticed that the program became very slow, and after 4 or 5 hours it crashed with a "Segmentation fault" message.
It also seems strange that after 4 or 5 hours the program was still analyzing loci on chromosome 1.
Both TopHat and Cuffdiff were run on an SGE cluster (nodes with 8 cores and 40 GB of RAM each). Below is the error log I found:
[09:36:20] Loading reference annotation and sequence.
[09:37:43] Inspecting maps and determining fragment length distributions.
[10:17:00] Modeling fragment count overdispersion.
> Map Properties:
> Normalized Map Mass: 48063229.50
> Raw Map Mass: 43993718.00
> Number of Multi-Reads: 0 (with 0 total hits)
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 112.41
> Estimated Std Dev: 28.38
> Map Properties:
> Normalized Map Mass: 48063229.50
> Raw Map Mass: 52132741.00
> Number of Multi-Reads: 0 (with 0 total hits)
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 110.50
> Estimated Std Dev: 28.44
[10:18:08] Calculating preliminary abundance estimates
> Processed 33875 loci. [*************************] 100%
[11:19:48] Learning bias parameters.
[12:01:39] Testing for differential expression and regulation in locus.
> Processing Locus chr1:48962934-48963408 [ ] 2%
/var/spool/gridengine/node-hp0407/job_scripts/394883: line 26: 22141 Segmentation fault (core dumped) $cuffdiff --no-update-check -o CuffDiff/$norm/${name1}-vs-${name2} --library-type fr-secondstrand --library-norm-method $norm --dispersion-method pooled --FDR 0.1 -p 8 -L ${name1},${name2} -u -b $genomeFA $genes $f1 $f2
Could you please help me understand what is causing this error?
I would also like to draw attention to the size of the GENCODE v17 annotation (989 MB) compared with the iGenomes hg19 UCSC annotation (112 MB) that you provide. Could this be the source of the problem? Would you recommend using the GTF annotation that you provide instead of any other?
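In case it helps with the diagnosis, here is a minimal Python sketch of how the two annotation files could be compared by counting records per feature type (gene, transcript, exon, and so on). This is my own helper and not part of TopHat or Cufflinks; the function name `summarize_gtf` is hypothetical, and the only assumption is the standard tab-separated, 9-column GTF layout with `#` comment lines.

```python
from collections import Counter


def summarize_gtf(lines):
    """Count GTF records per feature type (column 3).

    `lines` is any iterable of text lines, e.g. an open file handle.
    Comment lines, blank lines, and lines with fewer than 9
    tab-separated fields are skipped.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (file names are placeholders): python summarize_gtf.py a.gtf b.gtf
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, dict(summarize_gtf(handle)))
```

Running it on both GTF files would show whether the size difference comes mainly from a larger number of transcripts and exons or from extra feature types and longer attribute fields.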
Thanks for your time and help!
Best
Dani.