generateGenome fasta output

Matthew Hill

Aug 22, 2016, 12:02:24 PM
to rna-star
Hello,

I generated a custom mm10 genome with several additional FASTA files (to add marker sequences to the genome, e.g. EGFP). Mapping with STAR works fine.

However, the output directory contains a file named 'Genome', which I assume is the genome sequence in a binary (octet-stream) format. Is that correct? Also, is there a way to convert this Genome file into a readable plain-text FASTA file, like those used as input?

I am trying to use this custom genome with Picard to generate a sequence dictionary, but Picard needs the FASTA reference in a readable format. Any help with the Genome file would be very much appreciated.

Take care
Matt

Alexander Dobin

Aug 22, 2016, 4:31:05 PM
to rna-star
Matt,

I think it would be easier to concatenate the FASTA files you used as input for the genome generation.
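
A minimal sketch of that concatenation in Python, in case it helps. The function name `concatenate_fasta` and the file names in the usage note are hypothetical, not anything STAR provides. The only subtlety it handles is an input file that does not end with a newline, which would otherwise glue the next header onto the last sequence line.

```python
def concatenate_fasta(inputs, out_path, chunk_size=1 << 20):
    """Concatenate FASTA files into one file, keeping the given order.

    Hypothetical helper: copies each input in chunks and adds a newline
    after any file whose last byte is not one, so that the next '>' header
    always starts on its own line.
    """
    with open(out_path, "wb") as out:
        for path in inputs:
            last = b"\n"  # an empty input file needs no extra newline
            with open(path, "rb") as src:
                for chunk in iter(lambda: src.read(chunk_size), b""):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")
```

For example, `concatenate_fasta(["mm10.fa", "EGFP.fa"], "mm10_custom.fa")`, where those names stand in for whatever files were passed to genome generation. Listing them in the same order as in the genome generation command is probably the safest choice.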

To convert Genome into FASTA, you need the following:
Each byte in the Genome file corresponds to one base: byte values 0, 1, 2, 3, 4 -> A, C, G, T, N. The space between chromosomes is padded with bytes of value 5.
chrStart.txt contains the starts (byte offsets) of the chromosomes in the Genome file.
chrLength.txt contains the lengths of the chromosomes.
chrName.txt contains the names of the chromosomes.
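
The description above could be turned into a small Python script along these lines. This is only a sketch under assumptions: it assumes each of the three text files holds one plain value per line, in the same chromosome order, and the names `genome_to_fasta` and `read_column` are made up for illustration. It reads one chromosome at a time into memory and iterates with `zip`, so any extra trailing entry in one of the files is simply ignored.

```python
from pathlib import Path

# Byte value -> base, following the mapping above (0, 1, 2, 3, 4 -> A, C, G, T, N).
CODE_TO_BASE = bytes.maketrans(bytes(range(5)), b"ACGTN")


def read_column(path):
    """Return the non-empty lines of a text file as stripped strings."""
    return [ln.strip() for ln in Path(path).read_text().splitlines() if ln.strip()]


def genome_to_fasta(index_dir, out_path, width=60):
    """Write every chromosome stored in a STAR 'Genome' file as a FASTA record.

    Assumes chrName.txt, chrStart.txt and chrLength.txt each list one value
    per line, in matching order.
    """
    index_dir = Path(index_dir)
    names = read_column(index_dir / "chrName.txt")
    starts = [int(x) for x in read_column(index_dir / "chrStart.txt")]
    lengths = [int(x) for x in read_column(index_dir / "chrLength.txt")]

    with open(index_dir / "Genome", "rb") as genome, open(out_path, "w") as out:
        for name, start, length in zip(names, starts, lengths):
            genome.seek(start)  # byte offset of this chromosome
            raw = genome.read(length)  # padding (value 5) lies outside this range
            seq = raw.translate(CODE_TO_BASE).decode("ascii")
            out.write(f">{name}\n")
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
```

It would be called as `genome_to_fasta("path/to/genomeDir", "custom_genome.fa")`, with both paths being placeholders. Comparing the result against the original input FASTA files for one small sequence (such as the added marker) is a reasonable sanity check before using it downstream.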

Cheers
Alex