Nikelle Petrillo

Aug 8, 2017, 11:20:33 AM
to rna-star
Hi all, 

I aligned my PE reads using STAR and used the GeneCounts option to get gene counts. I would like to use these counts with DESeq2, but I'm having trouble formatting STAR's gene counts into a count matrix that DESeq2 recognizes and works with. Does anyone have any tips/ideas on how I should import/format this matrix for DESeq2?


Zeran Li

Aug 9, 2017, 10:21:08 AM
to rna-star
dds = DESeqDataSetFromMatrix(countData = countMatrix, colData = coldata, design = ~ group )


Alexander Dobin

Aug 16, 2017, 4:13:16 PM
to rna-star
Hi Nikelle,

As far as I understand, the DESeq2 input is a two-column file, with the first column being the gene ID and the second column the number of reads per gene.
To make such a file from STAR's gene-count file, you would need to:
1. Use its first column as the first column of the input to DESeq2.
2. Use its 2nd, 3rd or 4th column as the 2nd column, depending on the strandedness of your library:
        2nd column - for unstranded data
        3rd column - for 1st read agreeing with RNA strand
        4th column - for 2nd read agreeing with RNA strand (typical for Illumina stranded TruSeq)
3. Cut out the first 4 lines, which contain counts for non-genic reads (unmapped/multimappers/ambiguous/noFeature).
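
If you are not sure which column matches your library, one rough check is to compare the column totals over the gene rows: for a stranded library one of the 3rd/4th columns should be much larger than the other, while for unstranded data they should be similar. This is only a sketch; the function name column_totals and the demo numbers are made up for illustration and are not part of STAR.

```shell
#!/bin/sh
# Sum columns 2-4 over the gene rows of a STAR gene-count file.
# The first 4 lines (non-genic summary counts) are skipped.
column_totals() {
    awk -F '\t' 'NR > 4 { u += $2; f += $3; r += $4 }
         END { printf "unstranded=%d\tread1=%d\tread2=%d\n", u, f, r }' "$1"
}

# Tiny made-up file, for demonstration only.
demo=$(mktemp)
printf 'N_unmapped\t10\t10\t10\n'  >  "$demo"
printf 'N_multimapping\t5\t5\t5\n' >> "$demo"
printf 'N_noFeature\t3\t40\t4\n'   >> "$demo"
printf 'N_ambiguous\t2\t0\t1\n'    >> "$demo"
printf 'geneA\t100\t3\t97\n'       >> "$demo"
printf 'geneB\t50\t1\t49\n'        >> "$demo"

totals=$(column_totals "$demo")
echo "$totals"
rm -f "$demo"
```

In this made-up example the 4th-column total is far larger than the 3rd-column total, which would point to the 4th column.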

For instance, for Illumina stranded TruSeq you would use
$ awk 'NR>4 {print $1 "\t" $4}' ReadsPerGene.out.tab > DEseq.input
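
With several samples, the same idea can be extended to paste the chosen column from every sample into one tab-separated matrix (genes in rows, samples in columns). The sketch below assumes every file lists the same genes, which should hold when all samples were counted against the same annotation. The function name build_matrix and the file names s1.tab and s2.tab are hypothetical, and the counts are invented.

```shell
#!/bin/sh
# build_matrix COL FILE... : print a header line, then one row per gene
# holding the gene ID and column COL from each input file.
build_matrix() {
    col=$1; shift
    awk -F '\t' -v OFS='\t' -v col="$col" '
        FNR == 1 { nfile++; name[nfile] = FILENAME }  # a new input file starts
        FNR <= 4 { next }                             # skip the 4 summary lines
        nfile == 1 { order[++n] = $1 }                # gene order from first file
        { count[$1, nfile] = $col }
        END {
            line = "gene_id"
            for (j = 1; j <= nfile; j++) line = line OFS name[j]
            print line
            for (i = 1; i <= n; i++) {
                line = order[i]
                for (j = 1; j <= nfile; j++) line = line OFS count[order[i], j]
                print line
            }
        }' "$@"
}

# Two tiny made-up samples, for demonstration only.
tmp=$(mktemp -d)
for s in s1 s2; do
    printf 'N_unmapped\t1\t1\t1\nN_multimapping\t1\t1\t1\n' >  "$tmp/$s.tab"
    printf 'N_noFeature\t1\t1\t1\nN_ambiguous\t1\t1\t1\n'   >> "$tmp/$s.tab"
done
printf 'geneA\t10\t1\t9\ngeneB\t20\t2\t18\n' >> "$tmp/s1.tab"
printf 'geneA\t30\t3\t27\ngeneB\t40\t4\t36\n' >> "$tmp/s2.tab"

matrix=$(cd "$tmp" && build_matrix 4 s1.tab s2.tab)
echo "$matrix"
rm -rf "$tmp"
```

A table like this could then be read into R, for example with read.delim(file, row.names = 1), and passed as countData to DESeqDataSetFromMatrix.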

