How to go from --quantMode GeneCounts to DESeq2


Nikelle Petrillo

Aug 8, 2017, 11:20:33 AM
to rna-star
Hi all, 

I aligned my PE reads with STAR and used the GeneCounts option to get gene counts. I would like to use these counts with DESeq2, but I'm having trouble formatting STAR's gene counts into a count matrix that DESeq2 recognizes. Does anyone have tips on how I should import or format this matrix for DESeq2?

Thanks, 
Nikelle 

Zeran Li

Aug 9, 2017, 10:21:08 AM
to rna-star
dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group )
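DESeqDataSetFromMatrix expects countData to be a matrix of integer counts with one row per gene and one column per sample, with the columns in the same order as the rows of coldata. Below is a minimal POSIX shell sketch for building that matrix from several ReadsPerGene.out.tab files. The function name star_counts_matrix and the <sample>.ReadsPerGene.out.tab naming are hypothetical, and it assumes every sample was quantified against the same GTF, so all files list genes in the same order. The column argument follows the strandedness rule Alex gives later in this thread.

```shell
#!/bin/sh
# Sketch: merge per-sample STAR ReadsPerGene.out.tab files into one
# tab-separated genes x samples count matrix for DESeqDataSetFromMatrix.
# Assumption: all files come from the same GTF, so gene order is identical.
# Hypothetical naming: sample name = file name minus ".ReadsPerGene.out.tab".
#
# Usage: star_counts_matrix COLUMN FILE... > counts_matrix.tsv
#   COLUMN: 2 = unstranded, 3 = 1st read on RNA strand, 4 = 2nd read on RNA strand
star_counts_matrix() {
  col=$1
  shift

  # Header line: gene_id followed by one sample name per input file
  printf 'gene_id'
  for f in "$@"; do
    s=$(basename "$f")
    printf '\t%s' "${s%.ReadsPerGene.out.tab}"
  done
  printf '\n'

  awk -v col="$col" '
    FNR <= 4 { next }                     # skip N_unmapped, N_multimapping, N_noFeature, N_ambiguous
    NR == FNR { id[FNR] = $1; n = FNR }   # gene IDs and row count come from the first file
    $1 != id[FNR] {                       # refuse to merge files whose gene order differs
      print "gene order differs in " FILENAME > "/dev/stderr"
      bad = 1
      exit 1
    }
    { row[FNR] = row[FNR] "\t" $col }     # append this sample'"'"'s count to the gene row
    END {
      if (bad) exit 1
      for (i = 5; i <= n; i++) print id[i] row[i]
    }
  ' "$@"
}
```

For example, star_counts_matrix 4 *.ReadsPerGene.out.tab > counts_matrix.tsv, after which something like as.matrix(read.delim("counts_matrix.tsv", row.names = 1)) in R should give a countMatrix usable in the call above (pass check.names = FALSE if the sample names are not valid R names).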

Zeran

Alexander Dobin

Aug 16, 2017, 4:13:16 PM
to rna-star
Hi Nikelle,

the DESeq2 input is, as far as I understand, a two-column file with the first column being the gene ID and the second column the number of reads per gene.
To make such a file from STAR's ReadsPerGene.out.tab file, you would need to:
1. Use the first column of ReadsPerGene.out.tab as the first column of the input to DESeq2.
2. Use the 2nd, 3rd or 4th column of ReadsPerGene.out.tab as the second column, depending on the strandedness of your library:
        2nd column - for unstranded data
        3rd column - for the 1st read agreeing with the RNA strand
        4th column - for the 2nd read agreeing with the RNA strand (typical for Illumina stranded TruSeq)
3. Cut out the first 4 lines of ReadsPerGene.out.tab, which contain counts of non-genic reads (unmapped/multimapping/noFeature/ambiguous).
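If the library strandedness is not known, a common heuristic (not a STAR feature) is to compare the totals of columns 3 and 4: a stranded library puts most gene-assigned reads into one of them, while an unstranded library splits them roughly evenly. The sketch below assumes that heuristic; the function names and the 4x cutoff are arbitrary choices for illustration, so it is worth looking at the printed totals rather than trusting the guess blindly.

```shell
#!/bin/sh
# Heuristic sketch: inspect a ReadsPerGene.out.tab file to pick a count column.

# Print the gene-assigned totals of columns 2, 3 and 4 (summary lines skipped).
# Usage: star_strand_totals ReadsPerGene.out.tab
star_strand_totals() {
  awk 'NR > 4 { u += $2; f += $3; r += $4 }
       END {
         printf "col2_unstranded\t%.0f\n", u
         printf "col3_first_read\t%.0f\n", f
         printf "col4_second_read\t%.0f\n", r
       }' "$1"
}

# Print the suggested column: 4 or 3 if one stranded column holds more than
# 4x the other (assumed cutoff), otherwise 2 (treat as unstranded).
# Usage: star_guess_column ReadsPerGene.out.tab
star_guess_column() {
  awk 'NR > 4 { f += $3; r += $4 }
       END {
         if (r > 4 * f) print 4
         else if (f > 4 * r) print 3
         else print 2
       }' "$1"
}
```

For example, col=$(star_guess_column ReadsPerGene.out.tab) could feed the column number into the awk command that follows.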

For instance, for Illumina stranded TruSeq you would use:
$ awk 'NR>4 {print $1 "\t" $4}' ReadsPerGene.out.tab > DEseq.input
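With many samples, the same awk one-liner can be run in a loop to write one two-column (gene ID, count) file per sample. Files in this layout can be read with DESeq2's DESeqDataSetFromHTSeqCount, which takes a sampleTable listing the file names, whereas DESeqDataSetFromMatrix needs the counts already combined into a single matrix. The function name and the .counts.txt output naming below are hypothetical.

```shell
#!/bin/sh
# Sketch: write one gene ID / count file per sample from STAR output.
# Hypothetical naming: sampleA.ReadsPerGene.out.tab -> sampleA.counts.txt
# Usage: star_per_sample_counts COLUMN FILE...
#   COLUMN: 2 = unstranded, 3 = 1st read on RNA strand, 4 = 2nd read on RNA strand
star_per_sample_counts() {
  col=$1
  shift
  for f in "$@"; do
    out="${f%.ReadsPerGene.out.tab}.counts.txt"
    # NR > 4 drops the N_unmapped/N_multimapping/N_noFeature/N_ambiguous lines
    awk -v col="$col" 'NR > 4 { print $1 "\t" $col }' "$f" > "$out" || return 1
  done
}
```

For a TruSeq-style library this would be star_per_sample_counts 4 *.ReadsPerGene.out.tab.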

Cheers
Alex