How to handle RNA-seq alignment for multi-species samples

Sophie Poon

7 Sep 2021, 9:53:15
to rna-star
Hi there,

I have single-cell RNA-seq data from samples containing a mixture of four different species.

What I did was generate a large reference genome by merging the four references, and run the STAR alignment for read 1 and read 2 separately. However, it turned out that not all reads were mapped correctly, since many of the aligned reads cannot be associated with a gene (based on exon sequence similarity).
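A common way to keep the species of every alignment recoverable from a merged reference is to prefix each chromosome name with a species tag before concatenating the FASTA and GTF files. The sketch below assumes uncompressed inputs; the helper names (`prefix_fasta`, `prefix_gtf`, `build_mixed_index`) and tags such as `hs_` are hypothetical, and the post does not say how the four references were actually merged. `--runMode genomeGenerate`, `--genomeFastaFiles`, `--sjdbGTFfile` and `--sjdbOverhang` are standard STAR options; `--sjdbOverhang` is usually set to read length minus 1.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: tag every sequence name with its species so that
# alignments to the merged genome can be traced back to a species later.

# Prefix FASTA headers: ">chr1 desc" becomes ">hs_chr1 desc".
prefix_fasta() {
    local tag=$1 fasta=$2
    sed -e "s/^>/>${tag}/" "$fasta"
}

# Prefix the sequence-name column (column 1) of a GTF. Comment and empty lines
# are passed through. FS is a tab so the space-containing attribute column
# (column 9) is kept intact.
prefix_gtf() {
    local tag=$1 gtf=$2
    awk -v tag="$tag" 'BEGIN { FS = OFS = "\t" }
        /^#/ || NF == 0 { print; next }
        { $1 = tag $1; print }' "$gtf"
}

# Concatenate the prefixed references and build one STAR index.
# Arguments: output directory, then one "tag:fasta:gtf" triple per species.
build_mixed_index() {
    local out=$1; shift
    mkdir -p "$out"
    : > "$out/mix.fa"; : > "$out/mix.gtf"
    local spec tag fa gtf
    for spec in "$@"; do
        IFS=: read -r tag fa gtf <<< "$spec"
        prefix_fasta "$tag" "$fa"  >> "$out/mix.fa"
        prefix_gtf   "$tag" "$gtf" >> "$out/mix.gtf"
    done
    STAR --runMode genomeGenerate --runThreadN 30 \
         --genomeDir "$out" \
         --genomeFastaFiles "$out/mix.fa" \
         --sjdbGTFfile "$out/mix.gtf" --sjdbOverhang 100
}
```

For example, `build_mixed_index star.mix.index hs_:hs.fa:hs.gtf mm_:mm.fa:mm.gtf` (file names are placeholders). Only the sequence names are changed; gene IDs in the GTF are left as they are.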

Here is my alignment command:

STAR --runThreadN 30 --outSAMtype BAM SortedByCoordinate \
     --genomeDir star.mix.index/ \
     --outFilterMultimapNmax 1 --outFilterIntronMotifs RemoveNoncanonical --outFilterMismatchNmax 5 \
     --alignSJDBoverhangMin 6 --alignSJoverhangMin 6 --outFilterType BySJout --alignIntronMin 25 \
     --alignIntronMax 1000000 --outSAMstrandField intronMotif --outSAMunmapped Within --alignMatesGapMax 1000000 \
     --readFilesIn R1_001.fastq.gz \
     --readFilesCommand zcat --outFileNamePrefix R1/

So I am thinking of either
1) mapping the reads against each reference genome one at a time, recording four 'exact match rates' for each read, and selecting the highest; or
2) increasing --outFilterMultimapNmax and filtering afterwards.
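Option 1) can be sketched by aligning to each genome separately and comparing STAR's per-alignment score. STAR writes the `AS` (alignment score) attribute by default (`--outSAMattributes Standard` is `NH HI AS nM`), so `AS` is one reasonable proxy for an 'exact match rate'; `nM` (number of mismatches) is another. The hypothetical function `best_species_by_score` below reads SAM text (for example from `samtools view`), keeps each read's highest `AS` per species, and reports the winning species, or `ambiguous` when species tie. Reads unmapped in every genome are not reported.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Arguments: one "species=file.sam" pair per genome.
# Output (unsorted): read name, best species or "ambiguous", best AS.
best_species_by_score() {
    local spec
    local -a args=()
    for spec in "$@"; do
        # awk applies the assignment "sp=NAME" before reading the file after it
        args+=("sp=${spec%%=*}" "${spec#*=}")
    done
    awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { next }                          # SAM header lines
    int($2 / 4) % 2 == 1 { next }          # unmapped record (FLAG bit 0x4)
    {
        found = 0
        for (i = 12; i <= NF; i++)         # optional fields start at column 12
            if (substr($i, 1, 5) == "AS:i:") { as = substr($i, 6) + 0; found = 1; break }
        if (!found) next
        key = $1 SUBSEP sp
        if (!(key in score) || as > score[key]) score[key] = as   # best AS per read and species
        reads[$1] = 1
        species[sp] = 1
    }
    END {
        for (r in reads) {
            best = ""; ties = 0
            for (s in species) {
                key = r SUBSEP s
                if (!(key in score)) continue
                if (best == "" || score[key] > top) { best = s; top = score[key]; ties = 1 }
                else if (score[key] == top) ties++
            }
            print r, (ties > 1 ? "ambiguous" : best), top
        }
    }' "${args[@]}"
}
```

For example, `best_species_by_score hs=hs.sam mm=mm.sam | sort` (species tags and file names are placeholders).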

My questions are:
a) Is there any measure that can be used as the 'exact match rate' in 1)?
b) How can I select the best alignment if I allow reads to multimap in 2)?
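For b), one sketch is to keep, for each read, only the alignments with the highest `AS` among those STAR reports, and to call the read ambiguous if those top alignments fall in more than one species. Not all reported multimappers are tied: STAR reports alignments within `--outFilterMultimapScoreRange` (default 1) of the best score. This assumes the chromosome names in the merged reference carry a species prefix such as `hs_chr1` (a hypothetical convention, not stated in the post) and that the input is SAM text from a run with a raised `--outFilterMultimapNmax`. The function name `assign_species_multimap` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Arguments: SAM file(s) aligned to the merged genome.
# Species is taken as the part of the chromosome name before the first "_".
# Output (unsorted): read name, species or "ambiguous", best AS.
assign_species_multimap() {
    awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { next }                          # SAM header lines
    int($2 / 4) % 2 == 1 { next }          # unmapped record (FLAG bit 0x4)
    {
        found = 0
        for (i = 12; i <= NF; i++)
            if (substr($i, 1, 5) == "AS:i:") { as = substr($i, 6) + 0; found = 1; break }
        if (!found) next
        split($3, parts, "_"); sp = parts[1]
        if (!($1 in best) || as > best[$1]) {
            best[$1] = as; spset[$1] = sp                 # new top score: reset species set
        } else if (as == best[$1] && index(" " spset[$1] " ", " " sp " ") == 0) {
            spset[$1] = spset[$1] " " sp                  # tie with another species
        }
    }
    END { for (r in best) print r, (spset[r] ~ / / ? "ambiguous" : spset[r]), best[r] }' "$@"
}
```

For example, `samtools view Aligned.sortedByCoord.out.bam > mix.sam` followed by `assign_species_multimap mix.sam | sort`.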

Thanks,
Sophie

Alexander Dobin

8 Sep 2021, 17:30:58
to rna-star
Hi Sophie,

Generally, mapping to the combined genome of all species is the most straightforward approach.
One issue could be multimappers: are you getting a lot of reads mapping to "too many loci"? Please post your Log.final.out file.
This will be alleviated by increasing --outFilterMultimapNmax.
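A quick way to check how many reads STAR discarded as mapping to "too many loci", before and after raising `--outFilterMultimapNmax` (STAR's default is 10; the original command uses 1), is to pull the relevant lines from Log.final.out. A sketch: the helper name `multimap_summary` is hypothetical, and the labels matched are the ones STAR prints in its summary log, which is worth verifying against your STAR version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the unique / multimapping percentages from
# STAR's Log.final.out. Lines there look like "   label |<TAB>value".
multimap_summary() {
    local log=$1
    awk -F'|' '
        { gsub(/^[ \t]+|[ \t]+$/, "", $1); gsub(/^[ \t]+|[ \t]+$/, "", $2) }
        $1 == "Uniquely mapped reads %" ||
        $1 == "% of reads mapped to multiple loci" ||
        $1 == "% of reads mapped to too many loci" { print $1 ": " $2 }' "$log"
}
```

For example, `multimap_summary R1/Log.final.out`. A large "too many loci" percentage with `--outFilterMultimapNmax 1` would point to reads that map equally well to several places, possibly in more than one species.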

If you see reads mapping to introns or intergenic regions, this is unlikely to change if you map to separate genomes.
Most likely it is a property of the protocol that was used to generate these libraries.
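To put a number on "aligned reads that cannot be associated with a gene", one option is STAR's `--quantMode GeneCounts`, which requires a GTF at index generation and writes ReadsPerGene.out.tab. Its first four rows are `N_unmapped`, `N_multimapping`, `N_noFeature` and `N_ambiguous`, and columns 2, 3 and 4 hold unstranded, first-strand and second-strand counts. The hypothetical helper below computes the `N_noFeature` fraction among the uniquely mapped reads counted in the chosen column; which column fits depends on the library's strandedness, an assumption to check for your protocol.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fraction of counted (uniquely mapped) reads that
# overlap no annotated gene, from STAR's ReadsPerGene.out.tab.
# Column 2 = unstranded, 3 = first read strand, 4 = second read strand.
nofeature_fraction() {
    local tab=$1 col=${2:-2}
    awk -v c="$col" 'BEGIN { FS = "\t" }
        $1 == "N_unmapped" || $1 == "N_multimapping" { next }   # not gene-assignable
        $1 == "N_noFeature" { nofeat = $c }
        { total += $c }                                          # noFeature + ambiguous + genes
        END { if (total > 0) printf "%.4f\n", nofeat / total; else print "NA" }' "$tab"
}
```

For example, `nofeature_fraction R1/ReadsPerGene.out.tab 4`. A high fraction that persists across the combined and per-species alignments would be consistent with the protocol explanation above.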

Cheers
Alex