Dear Alex,
I am currently using STAR (020201 version) to align ribosome profiling data (single read, stranded).
To also assess rRNA contamination, I allow up to 50 alignments per read, but I would like to report only one alignment per read (randomly chosen from the top-scoring alignments), so I use the following command:
/Users/claudiascheckel/src/STAR/bin/MacOSX_x86_64/STAR --alignSJoverhangMin 8 --runThreadN 4 --outFilterMultimapNmax 50 --quantMode GeneCounts TranscriptomeSAM --outMultimapperOrder Random --outSAMmultNmax 1 --outSAMtype BAM SortedByCoordinate --genomeDir /Users/claudiascheckel/src/Index_STAR/mm10_ENSEMBL --readFilesIn CAD5wt_collapsed_rm3l5l.fasta --outFileNamePrefix CAD5wt_STAR_2/
Adding the --outMultimapperOrder Random --outSAMmultNmax 1 flags reduced the number of reported alignments in the genome BAM file, as it should, but not in the transcriptome BAM file: the transcriptome BAM files produced with and without --outMultimapperOrder Random --outSAMmultNmax 1 look identical. I was under the impression that STAR first aligns against the genome and then uses those alignments to produce the transcriptome BAM. Is there another flag I can add so that only one alignment per read is reported in the transcriptome BAM? Alternatively, I guess I could convert the transcriptome BAM to SAM and filter it with samtools?
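To make the filtering idea concrete, this is roughly what I have in mind. It is only a sketch under a few assumptions: the transcriptome alignments are available as SAM text, all records of one read sit on adjacent lines (I believe this holds for the unsorted transcriptome output, but I have not verified it), and the choice is uniform over all records of a read, without looking at alignment scores. The function name `pick_one_per_read` is just something I made up for illustration.

```python
import random


def pick_one_per_read(sam_lines, seed=None):
    """Yield SAM header lines unchanged and one random record per read name.

    Assumption: records belonging to the same read are on adjacent lines.
    The choice is uniform over all records of a read; scores are ignored.
    """
    rng = random.Random(seed)  # seed only matters for reproducibility
    group = []                 # records collected for the current read
    current = None             # name of the read being collected
    for line in sam_lines:
        if line.startswith("@"):   # header line, pass through as-is
            yield line
            continue
        name = line.split("\t", 1)[0]  # QNAME is the first SAM column
        if name != current and group:
            yield rng.choice(group)    # previous read is complete
            group = []
        current = name
        group.append(line)
    if group:                          # flush the last read
        yield rng.choice(group)
```

In practice I would stream the output of `samtools view -h` on the transcriptome BAM through this and convert the result back to BAM with `samtools view -b`. One thing I am unsure about is whether dropping records this way would also discard the isoform-level information that downstream quantification tools expect.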
I also have an unrelated question: I'm currently running the alignments on an iMac with 32 GB of memory (which I think is pushing it?). It works for files with ~30 million reads but not for files with ~70 million reads. I have tried aligning without sorting the BAM file and just outputting a SAM file, but the job still gets killed. It works if I split the file into 2 files and align those separately. Are there any flags I could add that would allow me to align larger files?
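For completeness, the splitting workaround amounts to something like the sketch below. It assumes a plain (uncompressed) FASTA file in which every record starts with a ">" header line, and it distributes records round-robin so the parts end up about the same size. The function names and the `.partN` output suffix are placeholders I chose for illustration, not anything STAR requires.

```python
def assign_records(lines, n_parts):
    """Yield (part_index, line) pairs, sending whole FASTA records round-robin.

    A new record starts at each line beginning with ">"; any text before the
    first header is skipped.
    """
    idx = -1  # index of the current record; -1 means no header seen yet
    for line in lines:
        if line.startswith(">"):
            idx += 1
        if idx >= 0:
            yield idx % n_parts, line


def split_fasta(path, n_parts):
    """Write the records of `path` into files named path.part0, path.part1, ..."""
    outs = [open(f"{path}.part{i}", "w") for i in range(n_parts)]
    try:
        with open(path) as fh:
            # Stream line by line so the whole file is never held in memory.
            for part, line in assign_records(fh, n_parts):
                outs[part].write(line)
    finally:
        for out in outs:
            out.close()
```

Each part is then aligned in a separate STAR run with the same parameters as in the command above.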
Sorry in case any of these questions don't make sense - I'm still sort of new to the field and couldn't find the answers online.
Thank you so much for your help!
Best,
Claudia