Hi,
I've been using STAR for a few weeks because I needed a good splice-aware mapping tool. Thanks for making it available. I have a question about genome loading and how to keep the genome in memory between runs (if possible). Apologies if this is a totally trivial question.
I have multiple samples that I want to map against the same genome. It is actually multiple genomes, because the reads that do not hit the first genome are screened against a second one, and so on. But let's keep things simple and say that I have two fastq files: S1.fastq and S2.fastq. I have the human genome indexed in directory HS (Homo sapiens) and the bovine genome in directory BT (Bos taurus). What I currently do is (without giving all the options in detail):
STAR --genomeDir HS --readFilesIn S1.fastq
and I save the SAM file in S1.sam. Then I run
STAR --genomeDir HS --readFilesIn S2.fastq
and save the SAM file in S2.sam.
This takes time because the genome is loaded twice, and I would like to keep it in memory for both S1 and S2. Then I would like to remove it, load the Bos taurus genome, map S1 and S2 against it (actually, only the reads that do not align to HS), remove the Bos taurus genome from memory, and so on.
I tried playing with the genomeLoad option by running
STAR --genomeDir HS --readFilesIn S1.fastq --genomeLoad LoadAndKeep &
and then running another job in parallel. But the second job started loading the genome into memory as well, so I stopped it.
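For concreteness, the sequence I am hoping for would look roughly like the sketch below. It is only a sketch: it assumes, from my reading of the manual, that --genomeLoad LoadAndExit loads the genome into shared memory without mapping, that later LoadAndKeep runs attach to that copy, and that --genomeLoad Remove unloads it. The run and map_samples helpers and the output prefixes are placeholders of mine, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the intended sequence; prints the commands by default.
# Set DRY_RUN=0 to actually execute them (needs STAR and the HS/BT indexes).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: map all given samples against one genome directory.
map_samples() {
    genome="$1"
    shift
    # Assumption: this loads the genome into shared memory and exits.
    run STAR --genomeDir "$genome" --genomeLoad LoadAndExit
    for sample in "$@"; do
        # Assumption: each run reuses the genome that is already loaded.
        run STAR --genomeDir "$genome" --genomeLoad LoadAndKeep \
            --readFilesIn "$sample.fastq" \
            --outFileNamePrefix "${genome}_${sample}_"
    done
    # Assumption: this removes the genome from shared memory.
    run STAR --genomeDir "$genome" --genomeLoad Remove
}

map_samples HS S1 S2
# In the real pipeline the BT step would take only the reads left
# unmapped by the HS step, not the full fastq files.
map_samples BT S1 S2
```

Here the samples are mapped one after the other rather than in parallel, which is fine for my purposes as long as the genome is loaded only once per species.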
Thanks again.
P.S. In case you are wondering, it is for a viral metagenomics project.