Re: STAR alignment as Cufflinks input - Cufflinks doesn't work


Alexander Dobin

Feb 20, 2013, 1:22:32 AM
to rna-...@googlegroups.com
Hi Alexandra,

Thanks for the detailed report; it looks to me like you are doing everything right. One possible explanation (we observe this very often in A- or total RNA samples) is that there is a very highly expressed locus downstream of the place where Cufflinks is stuck. For some reason, Cufflinks has a problem with loci that have more than a few hundred thousand reads mapping to them. Surprisingly, this happens at the "determining fragment length distribution" stage, which is supposed to be quick and easy.

You can check this by making a .wig file and looking in a genome browser for a high-signal locus downstream of chr1:16532938-16533067.
If this is the case, you can mask this locus out for the Cufflinks run. Unfortunately, you may have to do this for several loci in each dataset.
If this does not work, and you are willing to share a portion of your .bam file (say chr1:16000000-25000000), I could have a look at it to try to diagnose the problem.
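
If you want a quick look before building a .wig file, a rough sketch like the one below can list the busiest windows directly from the alignments. It only counts read start positions per fixed-size window, so it approximates coverage rather than computing it. The function name `count_reads_per_window` and the default window size are made up for illustration; they are not part of STAR, samtools, or Cufflinks.

```shell
# Count alignments per fixed-size window from SAM text on stdin.
# Output columns: chromosome, window start (1-based), number of reads
# starting in that window; the busiest windows come first.
# NOTE: the function name and the 10000-base default are illustrative assumptions.
count_reads_per_window() {
    awk -v w="${1:-10000}" '
        BEGIN { FS = "\t"; OFS = "\t" }
        /^@/      { next }    # skip SAM header lines
        $3 == "*" { next }    # skip unmapped reads
        {
            start = int(($4 - 1) / w) * w + 1
            n[$3 OFS start]++
        }
        END { for (k in n) print k, n[k] }
    ' | sort -t "$(printf '\t')" -k3,3nr -k1,1 -k2,2n
}
```

With an indexed, sorted BAM this could be used as `samtools view $STAR_out.sorted.bam chr1 | count_reads_per_window 10000 | head`. A window with hundreds of thousands of reads downstream of the stuck locus would be a candidate for masking. To cut out a region for sharing, `samtools view -b $STAR_out.sorted.bam chr1:16000000-25000000 > subset.bam` should work (the output name `subset.bam` is arbitrary).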

Cheers
Alex




On Tuesday, February 19, 2013 10:33:29 AM UTC-5, Sasa Kornienko wrote:
Hi all!

I have problems feeding the output of STAR into Cufflinks. Basically, Cufflinks starts but never gets past the first locus.
Sorry if the question is silly; I'm new to bioinformatics.

This is what I do (I only give the names of variables to save space):

## align with STAR combining 2 zipped fastq files per read


STAR_2.3 --genomeDir $hg19 --sjdbGTFfile $GTF_file \
--readFilesIn $read1_lane1,$read1_lane2 $read2_lane1,$read2_lane2 --readFilesCommand zcat \
--runThreadN 8 --genomeLoad NoSharedMemory --outFileNamePrefix $OUTPREFIX \
--outStd SAM --outSAMmode Full | samtools view -bS - > $STAR_out.bam

## then I sort the bam file (samtools 0.1.x syntax: the second argument is an output prefix, and ".bam" is appended to it):

samtools sort $STAR_out.bam $STAR_out.sorted

## then I create indexed bam file:


samtools index $STAR_out.sorted.bam

##THEN I RUN CUFFLINKS using this sorted bam file like this:


cufflinks --multi-read-correct --output-dir $CFL_out \
-p 7 --library-type fr-firststrand \
$STAR_out.sorted.bam

## And then this is what cufflinks says:
You are using Cufflinks v2.0.2, which is the most recent release.
[13:39:07] Inspecting reads and determining fragment length distribution.
> Processing Locus chr1:16532938-16533067      [                         ]   0%

At this point it stops and stays like this forever, but it doesn't abort the job.

This is what one line of the STAR output looks like:
HWI-ST181:333:D1R5NACXX:5:1101:9697:3951     163 chr6   147698857       255  100M       =  147698961 204  CTCCCAAAGGCCCCCACTTCTGACATTACATTAGGGAAGGATTAGATTTCAACATATGACTTGAGGGGAGAGGGTGGGGGGCATAACTGTTGAGTGTATA  @@?DDBDAHFDHHJJIEGHGIG@CCGHIGAFF4DFH?BAH@>B?38DBA??=FHIEGGDEGII<CHHH5='55>;?B=BB&099@@CCDCACCD4>443>     NH:i:1  HI:i:1     AS:i:196        nM:i:1

Could you maybe help me? I am completely lost and don't know how to fix the problem.

Thank you!!!
Alexandra

PS STAR is super quick and great!!!



Sasa Kornienko

Feb 21, 2013, 6:58:23 AM
to rna-...@googlegroups.com
Hi all!

Thanks a lot to Alex for the advice!

I indeed have total RNA samples.

What got Cufflinks running in the end was providing it with a mask file:

From Cufflinks manual: -M/--mask-file <mask.(gtf/gff)> Tells Cufflinks to ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.

## So, I ran CUFFLINKS like this:

cufflinks --multi-read-correct --output-dir $CFL_out \
-p 7 --library-type fr-firststrand --mask-file hg19_mask.gtf \
$STAR_out.sorted.bam

I included rRNAs, snoRNAs, snRNAs, Mt_rRNAs, miRNAs and pseudogenes from the Ensembl annotation in this hg19_mask.gtf file.
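
For anyone who wants to build a similar file, here is a rough sketch of one way to do it, not necessarily how the original hg19_mask.gtf was made. It assumes an Ensembl-style GTF in which the biotype is either in column 2 (older releases) or in a `gene_biotype "..."` attribute, and that the biotype names are spelled as in the list passed in. The function name `make_mask_gtf`, the optional chromosome prefix, and the MT-to-M renaming are illustrative assumptions for matching UCSC-style names such as chr1 and chrM.

```shell
# Keep only GTF records (read from stdin) whose biotype is in a comma-separated list.
#   $1: biotypes to keep, e.g. "rRNA,Mt_rRNA"
#   $2: optional chromosome prefix (e.g. "chr") to add to Ensembl-style names
# NOTE: the function name and the renaming rules are illustrative assumptions.
make_mask_gtf() {
    awk -v types="$1" -v prefix="${2:-}" '
        BEGIN {
            FS = "\t"; OFS = "\t"
            n = split(types, t, ",")
            for (i = 1; i <= n; i++) want[t[i]] = 1
        }
        /^#/ { next }                                  # skip comment lines
        {
            biotype = $2                               # older GTFs: biotype in column 2
            if (match($9, /gene_biotype "[^"]+"/))     # otherwise: gene_biotype attribute
                biotype = substr($9, RSTART + 14, RLENGTH - 15)
            if (biotype in want) {
                chrom = $1
                if (prefix != "" && chrom == "MT") chrom = "M"   # MT becomes chrM
                $1 = prefix chrom
                print
            }
        }
    '
}
```

A call might look like `make_mask_gtf "rRNA,snoRNA,snRNA,Mt_rRNA,miRNA,pseudogene" chr < annotation.gtf > hg19_mask.gtf`, where `annotation.gtf` stands for whichever Ensembl GTF matches your genome build. It is worth checking that the chromosome names in the result match those in the BAM header, since a mask with mismatched names will not overlap any reads.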


Alexandra

Alexander Dobin

Feb 21, 2013, 5:25:09 PM
to rna-...@googlegroups.com
Hi Alexandra,

thanks a lot for sharing your nice idea!
It's great that you did not have to manually mask all the weird loci.
I will be recommending your approach from now on.

Cheers
Alex

Jamie Kwok

Apr 30, 2015, 11:07:09 AM
to rna-...@googlegroups.com
Hi Sasa,

I would like to have the mask file too. Would you please share it with us, or tell us how you generated it?

Thanks a lot.
Jamie

On Thursday, March 21, 2013 at 10:54:06 PM UTC+8, Hannes Bretschneider wrote:
Hi Sasa,

I have a similar problem with using STAR-mapped reads in Cufflinks. Would you be able to share your mask file?

Thanks,
Hannes

Dina H.

Dec 17, 2015, 11:01:23 AM
to rna-star, sasa.br...@gmail.com
Hey Sasa,
I just ran into your comment, and it's the solution to a problem that I am facing now.
I was wondering whether you still have this hg19_mask.gtf file.
If so, could you please send it to me?

Thank you so much

Dina