Hi Alexandra,
Thanks for the detailed report. It looks to me like you are doing everything right. One possible explanation (we observe this very often in poly(A)-minus (A-) or total RNA samples) is that there is a very highly expressed locus downstream of the place where Cufflinks is stuck. For some reason Cufflinks has a problem with loci that have more than a few hundred thousand reads mapping to them. Surprisingly, this happens at the "determining fragment length distribution" stage, which is supposed to be quick and easy.
You can check this by making a .wig file and looking in a genome browser for a high-signal locus downstream of chr1:16532938-16533067.
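If loading a track into a browser is inconvenient, a rough command-line alternative is to count alignments per fixed-size bin and look at the top of the list. The sketch below is only an illustration: the function name count_reads_per_bin and the 100 kb bin size are my own choices, and it bins reads by their start position only, so it is much cruder than a real coverage track.

```shell
# Count alignments per fixed-size bin from SAM records read on stdin.
# Output columns: chromosome, bin start (0-based), number of alignments,
# sorted so that the busiest bins come first.
#
# Hypothetical usage with the sorted, indexed BAM file from this thread:
#   samtools view "$STAR_out.sorted.bam" chr1 | count_reads_per_bin 100000 | head
count_reads_per_bin() {
    bin_size="${1:-100000}"   # default bin size is an arbitrary choice
    awk -v bin="$bin_size" '
        BEGIN { FS = "\t"; OFS = "\t" }
        /^@/      { next }    # skip SAM header lines
        $3 == "*" { next }    # skip unmapped reads
        {
            # SAM positions are 1-based; bins are reported 0-based
            start = int(($4 - 1) / bin) * bin
            count[$3 OFS start]++
        }
        END { for (key in count) print key, count[key] }
    ' | sort -t "$(printf '\t')" -k3,3nr
}
```

A bin whose count is orders of magnitude above its neighbours, located just downstream of chr1:16532938-16533067, would be the candidate locus described above.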
If this is the case, you can mask this locus out for the Cufflinks run. Unfortunately, you may have to do this for several loci in each dataset.
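One way to do the masking is the Cufflinks -M/--mask-file option, which takes a GTF file and ignores reads that could have come from the transcripts in it. The helper below is a minimal sketch for writing such a file: the name make_mask_gtf, the "mask" source label, and the example coordinates are all hypothetical, and emitting one single-exon record per strand is my assumption about what a sufficient mask looks like.

```shell
# Print a two-line GTF (one single-exon transcript per strand) that covers
# the region to be masked.
#   $1 = chromosome, $2 = start (1-based), $3 = end (inclusive)
make_mask_gtf() {
    chrom="$1"; start="$2"; end="$3"
    id="mask_${chrom}_${start}_${end}"
    printf '%s\tmask\texon\t%s\t%s\t.\t+\t.\tgene_id "%s"; transcript_id "%s_fwd";\n' \
        "$chrom" "$start" "$end" "$id" "$id"
    printf '%s\tmask\texon\t%s\t%s\t.\t-\t.\tgene_id "%s"; transcript_id "%s_rev";\n' \
        "$chrom" "$start" "$end" "$id" "$id"
}

# Hypothetical usage (the coordinates are placeholders, not a diagnosis):
#   make_mask_gtf chr1 16540000 16560000 > mask.gtf
#   cufflinks --mask-file mask.gtf --multi-read-correct --output-dir "$CFL_out" \
#       -p 7 --library-type fr-firststrand "$STAR_out.sorted.bam"
```

For several loci, the function can be called once per region and the output appended to the same mask file.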
If this does not work, and you are willing to share a portion of your .bam file (say chr1:16000000-25000000), I could have a look at it to try to diagnose the problem.
Cheers
Alex
On Tuesday, February 19, 2013 10:33:29 AM UTC-5, Sasa Kornienko wrote:
Hi all!
I have problems feeding the output of STAR into Cufflinks. Basically, Cufflinks starts but doesn't make any progress.
Sorry if the question is silly; I'm new to bioinformatics.
This is what I do (I only give the names of variables to save space):
## align with STAR combining 2 zipped fastq files per read
STAR_2.3 --genomeDir $hg19 --sjdbGTFfile $GTF_file \
--readFilesIn $read1_lane1,$read1_lane2 $read2_lane1,$read2_lane2 --readFilesCommand zcat \
--runThreadN 8 --genomeLoad NoSharedMemory --outFileNamePrefix $OUTPREFIX \
--outStd SAM --outSAMmode Full | samtools view -bS - > $STAR_out.bam
## then I sort the bam file:
samtools sort $STAR_out.bam $STAR_out.sorted   # samtools 0.1.x takes an output prefix and appends .bam
## then I create indexed bam file:
samtools index $STAR_out.sorted.bam
##THEN I RUN CUFFLINKS using this sorted bam file like this:
cufflinks --multi-read-correct --output-dir $CFL_out \
-p 7 --library-type fr-firststrand \
$STAR_out.sorted.bam
## And then this is what cufflinks says:
You are using Cufflinks v2.0.2, which is the most recent release.
[13:39:07] Inspecting reads and determining fragment length distribution.
> Processing Locus chr1:16532938-16533067 [ ] 0%
## At this point it stops and stays like this forever, but it doesn't abort the job.
This is what one line of the STAR output looks like:
HWI-ST181:333:D1R5NACXX:5:1101:9697:3951 163 chr6 147698857 255 100M = 147698961 204 CTCCCAAAGGCCCCCACTTCTGACATTACATTAGGGAAGGATTAGATTTCAACATATGACTTGAGGGGAGAGGGTGGGGGGCATAACTGTTGAGTGTATA @@?DDBDAHFDHHJJIEGHGIG@CCGHIGAFF4DFH?BAH@>B?38DBA??=FHIEGGDEGII<CHHH5='55>;?B=BB&099@@CCDCACCD4>443> NH:i:1 HI:i:1 AS:i:196 nM:i:1
Could you maybe help me? I am completely lost and don't know how to fix the problem.
Thank you!!!
Alexandra
PS STAR is super quick and great!!!