Hello,
I have been trying to use STAR to map human samples to the hg38 genome.
To generate the genome index, I used:
/Users/chahat/Documents/DRG/STAR-master/bin/MacOSX_x86_64/STAR --runThreadN 3 --runMode genomeGenerate --genomeDir ./ --genomeFastaFiles ./Homo_sapiens.GRCh38.dna.primary_assembly.fa
Then, for mapping, I used:
for i in $(ls /Volumes/bam/DRG/fastq_75/PhenoInfoAvailable); do /Users/chahat/Documents/DRG/STAR-master/bin/MacOSX_x86_64/STAR --runThreadN 8 --genomeDir /Users/chahat/Documents/DRG/STAR-master/genome --readFilesIn /Volumes/bam/DRG/fastq_75/PhenoInfoAvailable/$i --outFileNamePrefix /Volumes/bam/DRG/STAR_outputs_redo/$i --limitBAMsortRAM 10000000000 --outSAMtype BAM SortedByCoordinate; done
for i in $(ls /Volumes/bam/DRG/fastq_50/PhenoInfoAvailable); do /Users/chahat/Documents/DRG/STAR-master/bin/MacOSX_x86_64/STAR --runThreadN 8 --genomeDir /Users/chahat/Documents/DRG/STAR-master/genome --readFilesIn /Volumes/bam/DRG/fastq_50/PhenoInfoAvailable/$i --outFileNamePrefix /Volumes/bam/DRG/STAR_outputs_redo/$i --limitBAMsortRAM 10000000000 --outSAMtype BAM SortedByCoordinate; done
Both folders have about a dozen samples (average file size ~10 GB). I am running this on a 48 GB system.
After running these loops for 5 days, I got 'SampleX.fastqAligned.sortedByCoord.out.bam' files for most of the samples in the second folder, but the .bam files for all of the samples in the first folder were empty. For each sample with an empty .bam file, the corresponding _STARtmp folder was present and very large, so I suspect the run aborted during BAM sorting.
My question: is there a reason why my STAR runs are taking so long, and is there anything I can do to make them faster or more efficient?
Adding the `--genomeLoad LoadAndKeep` option to the run immediately gives this error:
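In case it helps anyone reproduce the check, here is a minimal sketch of how the empty outputs can be listed. The helper name `list_empty_bams` is hypothetical, and the file suffix is assumed from the output names mentioned above; it only looks at the top level of the given directory.

```shell
#!/bin/sh
# Hypothetical helper: print every coordinate-sorted BAM in a directory
# that is zero bytes in size. The suffix is assumed from the STAR output
# names described above.
list_empty_bams() {
    out_dir=$1
    find "$out_dir" -maxdepth 1 -type f \
        -name '*Aligned.sortedByCoord.out.bam' -empty | sort
}

# Example call (path taken from the mapping commands above):
# list_empty_bams /Volumes/bam/DRG/STAR_outputs_redo
```

Each path printed corresponds to a sample whose BAM was created but never written to.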
Nov 17 18:48:50 ..... started STAR run
Nov 17 18:48:50 ..... loading genome
./runSTARonAllSamples.sh: line 1: 36287 Abort trap: 6 /Users/chahat/Documents/DRG/STAR-master/bin/MacOSX_x86_64/STAR --runThreadN 8 --genomeDir /Users/chahat/Documents/DRG/STAR-master/genome --readFilesIn /Volumes/bam/DRG/fastq_75/PhenoInfoAvailable/$i --genomeLoad LoadAndKeep --outFileNamePrefix /Volumes/bam/DRG/STAR_outputs_redo/redo/$i --limitBAMsortRAM 10000000000 --outSAMtype BAM SortedByCoordinate
Any ideas?