mapping problem - many mismatches

Piotr Gawroński

Sep 25, 2015, 9:36:07 AM
to rna-star

Dear all,
I am new to STAR. I have mapped 100 bp paired-end Illumina transcriptome reads to the Arabidopsis thaliana genome using the Ensembl Genomes release 28 annotation (ftp://ftp.ensemblgenomes.org/pub/plants/release-28/gtf/arabidopsis_thaliana/Arabidopsis_thaliana.TAIR10.28.gtf.gz). When visualized in IGV, the mapping of most genes looks fairly good (example picture_1). However, some genes contain reads with many mismatches, mainly in the proximity of intron-exon junctions (example picture_2). Has anyone seen similar results, and how can I discard such reads?
Best,
Piotr



Code used for mapping:
STAR --runThreadN 12 \
--genomeDir genome.28 \
--readFilesIn reads/Transcriptome/W0_3_1.fq reads/Transcriptome/W0_3_2.fq \
--outFilterMismatchNoverLmax 0.02 \
--alignIntronMax 11000 \
--alignSJoverhangMin 8 \
--outSAMtype BAM SortedByCoordinate






Kirill Tsyganov

Sep 27, 2015, 6:35:04 PM
to Piotr Gawroński, rna-star
Hi Piotr, 

This question seems to be more biological than specific to the STAR aligner; other forums, such as https://www.biostars.org/, might be more appropriate for your question.

Firstly, I don't think you should rely on IGV alone to draw conclusions about your alignment or about any variation in your genes. You should perhaps call variants to see what is going on. Here is a BioStars post that might help with variant calling: https://www.biostars.org/p/147672/

Secondly, you might want to try another aligner, maybe TopHat, to see whether your results look different.

Last but not least, I think you might have stumbled onto something interesting here. There are two possibilities: either the TAIR10 (release 28) reference isn't accurate (at least for the particular gene you are showing), or that gene (and maybe many others) has some unusual mechanism of alternative intron splicing. Either way, I think this has little to do with the aligner itself.

Cheers, 

--
You received this message because you are subscribed to the Google Groups "rna-star" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rna-star+u...@googlegroups.com.
Visit this group at http://groups.google.com/group/rna-star.

Alexander Dobin

Oct 1, 2015, 4:19:45 PM
to rna-star, peter.g...@gmail.com
Hi Piotr,

in addition to Kirill's suggestions, please check whether the reads that appear to have many mismatches are actually soft-clipped by STAR.
It seems to me (the picture is too small to be sure) that the mismatches mostly occur at the ends of the reads.
With --outFilterMismatchNoverLmax 0.02, only a few mismatches should be allowed (e.g. 0.02 x 200 = 4 for 2x100 PE reads).
My recollection is that IGV "expands" soft-clipped alignments, so they appear to have many mismatches.
Why so many reads have to be clipped to map is an interesting question. I would check the clipped portions for adapter sequences.
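
A minimal sketch of the soft-clip check, not part of the original message: it counts alignments whose CIGAR string (SAM column 6) contains an S operation. The function name count_softclipped is hypothetical, and the usage line assumes samtools is installed and that STAR wrote its default coordinate-sorted BAM file name.

```shell
#!/usr/bin/env bash
# Count alignments whose CIGAR (SAM column 6) contains a soft clip (S).
# Reads SAM records on stdin; header lines starting with "@" are skipped.
# count_softclipped is a hypothetical helper name, not a STAR or samtools tool.
count_softclipped() {
  awk -F'\t' '
    /^@/ { next }                 # skip SAM header lines
    $6 == "*" { next }            # skip records without a CIGAR
    { total++ }
    $6 ~ /[0-9]+S/ { clipped++ }  # CIGAR has a soft-clip operation
    END { printf "%d of %d alignments are soft-clipped\n", clipped, total }
  '
}

# Assumed usage (requires samtools; file name is the STAR default):
#   samtools view Aligned.sortedByCoord.out.bam | count_softclipped
```

If a large share of the alignments over the affected genes turns out to be soft-clipped, the "mismatches" seen in IGV are likely clipped bases being displayed rather than mismatches counted by the filter.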

Cheers
Alex
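
The adapter check suggested above could be sketched as follows. This is an illustration under assumptions, not something from the thread: it prints the soft-clipped part of each alignment and counts how many contain the common Illumina adapter prefix AGATCGGAAGAGC or its reverse complement. Whether that adapter applies depends on the library kit, which the thread does not state. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the soft-clipped part(s) of each alignment, one per line.
# Reads SAM records on stdin (CIGAR in column 6, sequence in column 10).
clipped_segments() {
  awk -F'\t' '
    /^@/ || $6 == "*" || $10 == "*" { next }
    {
      cigar = $6
      gsub(/[0-9]+H/, "", cigar)                 # ignore hard clips
      if (match(cigar, /^[0-9]+S/)) {            # clip at the left end
        n = substr(cigar, 1, RLENGTH - 1) + 0
        print substr($10, 1, n)
      }
      if (match(cigar, /[0-9]+S$/)) {            # clip at the right end
        n = substr(cigar, RSTART, RLENGTH - 1) + 0
        print substr($10, length($10) - n + 1)
      }
    }
  '
}

# Count clipped segments containing the ASSUMED adapter prefix or its
# reverse complement (SAM stores reverse-strand reads reverse-complemented).
count_adapter_hits() {
  grep -c -E 'AGATCGGAAGAGC|GCTCTTCCGATCT' || true
}

# Assumed usage (requires samtools; file name is the STAR default):
#   samtools view Aligned.sortedByCoord.out.bam | clipped_segments | count_adapter_hits
```

Short clipped segments can be too short to contain the full 13-base prefix, so a low count does not rule out adapter contamination; running an adapter trimmer on the raw reads is the more thorough test.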