Hello everyone,
I apologize in advance for what is probably a rather silly question, but better safe than sorry. Brief background: I am starting to use STAR for DE analysis of differently treated mouse cell lines, and I am not sure which mouse genome sequence and annotation files to use in order to follow the STAR manual's recommendations.
If I understood correctly, STAR Manual v2.4.0.1 suggests that the sequence file should contain chromosomal DNA, mitochondrial DNA and scaffolds, and that the GENCODE annotations of those sequences are recommended. Alex also suggests that patches and alternative haplotypes should not be included in the genome (Manual, and http://seqanswers.com/forums/showthread.php?t=27470&page=5).
Since GENCODE provides both the annotation and the sequence files for mouse, the most obvious thing to do was to use those. The latest mouse genome sequence at GENCODE seems to be ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M4/GRCm38.p3.genome.fa.gz, and the corresponding GTF file is ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_mouse/release_M4/gencode.vM4.chr_patch_hapl_scaff.annotation.gtf.gz.
However, in addition to chromosomal, mitochondrial and scaffold DNA, these files apparently also contain patches and haplotypes, and I cannot tell them apart by looking at either the sequence file or the annotation file. I am sure I am missing something obvious, but how can I tell which sequences are which, so that I can remove the patches and haplotypes from both files and keep only what is recommended for building the genome index?
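To make the goal concrete, here is a rough Python sketch of the filtering I have in mind. It assumes I already have a set of sequence names to keep, which is exactly the part I do not know how to obtain. The function names `filter_fasta` and `filter_gtf` and the sequence names in the toy data (including `PLACEHOLDER_PATCH`) are made up for illustration and are not taken from the real GENCODE files.

```python
from typing import Iterable, Iterator, Set


def filter_fasta(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield only the FASTA records whose sequence name is in `keep`."""
    writing = False
    for line in lines:
        if line.startswith(">"):
            # Sequence name = first whitespace-delimited token after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            writing = name in keep
        if writing:
            yield line


def filter_gtf(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield GTF comment lines plus features whose first column is in `keep`."""
    for line in lines:
        if line.startswith("#") or line.split("\t", 1)[0] in keep:
            yield line


# Toy input with made-up names; the real headers may look different.
toy_fasta = [
    ">chr1 1\n", "ACGT\n",
    ">PLACEHOLDER_PATCH\n", "GGGG\n",
    ">chrM MT\n", "TTTT\n",
]
toy_gtf = [
    "##description: toy annotation\n",
    "chr1\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id \"g1\";\n",
    "PLACEHOLDER_PATCH\tsrc\tgene\t1\t4\t.\t+\t.\tgene_id \"g2\";\n",
]
names_to_keep = {"chr1", "chrM"}  # assumption: the list I still need to work out

kept_fasta = list(filter_fasta(toy_fasta, names_to_keep))
kept_gtf = list(filter_gtf(toy_gtf, names_to_keep))
print("".join(kept_fasta), end="")
print("".join(kept_gtf), end="")
```

On the real files I would presumably read the inputs with `gzip.open(path, "rt")` and write the kept lines to new files, using the same set of names for both the FASTA and the GTF so that the two stay consistent.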
Thanks to everyone in advance for reading and helping out.