influence of parameterization on translated transcript alignments


Olivia

Apr 25, 2024, 1:53:14 PM
to rna-star
Hi Alex,

I'm testing how parameters affect STAR alignment and have a question about how genome alignments are translated into transcript alignments when using the --quantMode TranscriptomeSAM option. As an example, the following read has the same genome alignment in the Aligned.sortedByCoord.out.bam files generated from three STAR runs with different parameterizations. Unfortunately, I did not output the alignment score (AS) in test B; the score appears to be the only field that differs between A and C.

Genome alignment Aligned.sortedByCoord.out.bam
Parameter test set A (basically ENCODE)
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        163     SIRV2   4695    255     7S106M988N37M   =       4765    1196    CCTACTGGTGTCTGTCGCAATGTAAATGGGGGTGACAAGTTTTACCATTTGGTATGTTTTAGTTTACACATCACACACTATTTCAACTAAACTCGCTACAACGTAGTGAACTCTCGGCATGATATGCTACCTTCTACAATTATTGCTGTT        FFFFFFFFFFFFFF:FFFFF,F:FF:FFFFFFFFFFF:FF:FF,FFF:FFFFFFFFFFFFFFFF::FFFF,,:FF,,FFFFFFFFF:FF::FFFFFFFFF:FFFFFFFFF,F:F,FFFFFFF::,,FFF:F,FFFF:FFFFFFFFFFF:F        NH:i:1  HI:i:1  AS:i:276        NM:i:2  MD:Z:68A31A42
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        83      SIRV2   4765    255     36M988N102M     =       4695    -1196   CTATTTCAACTAAACTCGCTACAACGTAGTGAACTCTCGGCATGATATGCTACCTTCTACAATTATTGCTGTTTCGGTAGGGTTGGATATCATTGCGTATATTTCGAAATGTCGTTCGCATCCATGTTCATCCACCAC    :FFFF,FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFF:FFFFFF   NH:i:1   HI:i:1  AS:i:276        NM:i:1  MD:Z:30A107

Parameter test set B (high multimap, high mismatch)
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        163     SIRV2   4695    255     7S106M988N37M   =       4765    1196    CCTACTGGTGTCTGTCGCAATGTAAATGGGGGTGACAAGTTTTACCATTTGGTATGTTTTAGTTTACACATCACACACTATTTCAACTAAACTCGCTACAACGTAGTGAACTCTCGGCATGATATGCTACCTTCTACAATTATTGCTGTT        FFFFFFFFFFFFFF:FFFFF,F:FF:FFFFFFFFFFF:FF:FF,FFF:FFFFFFFFFFFFFFFF::FFFF,,:FF,,FFFFFFFFF:FF::FFFFFFFFF:FFFFFFFFF,F:F,FFFFFFF::,,FFF:F,FFFF:FFFFFFFFFFF:F        NH:i:1  HI:i:1  NM:i:2  MD:Z:68A31A42
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        83      SIRV2   4765    255     36M988N102M     =       4695    -1196   CTATTTCAACTAAACTCGCTACAACGTAGTGAACTCTCGGCATGATATGCTACCTTCTACAATTATTGCTGTTTCGGTAGGGTTGGATATCATTGCGTATATTTCGAAATGTCGTTCGCATCCATGTTCATCCACCAC    :FFFF,FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFF:FFFFFF   NH:i:1   HI:i:1  NM:i:1  MD:Z:30A107

Parameter test set C (high multimap, med. mismatch, short window)
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        163     SIRV2   4695    255     7S106M988N37M   =       4765    1196    CCTACTGGTGTCTGTCGCAATGTAAATGGGGGTGACAAGTTTTACCATTTGGTATGTTTTAGTTTACACATCACACACTATTTCAACTAAACTCGCTACAACGTAGTGAACTCTCGGCATGATATGCTACCTTCTACAATTATTGCTGTT        FFFFFFFFFFFFFF:FFFFF,F:FF:FFFFFFFFFFF:FF:FF,FFF:FFFFFFFFFFFFFFFF::FFFF,,:FF,,FFFFFFFFF:FF::FFFFFFFFF:FFFFFFFFF,F:F,FFFFFFF::,,FFF:F,FFFF:FFFFFFFFFFF:F        NH:i:1  HI:i:1  AS:i:274        NM:i:2  MD:Z:68A31A42
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        83      SIRV2   4765    255     36M988N102M     =       4695    -1196   CTATTTCAACTAAACTCGCTACAACGTAGTGAACTCTCGGCATGATATGCTACCTTCTACAATTATTGCTGTTTCGGTAGGGTTGGATATCATTGCGTATATTTCGAAATGTCGTTCGCATCCATGTTCATCCACCAC    :FFFF,FFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF:F:FFFFFFF:FFFFFF   NH:i:1   HI:i:1  AS:i:274        NM:i:1  MD:Z:30A107

This read appears to derive from the internal standard SIRV gene 2 in a region identical between transcripts SIRV201 and SIRV202. The transcript alignments differ between Aligned.toTranscriptome.out.bam files generated from the three parameter tests. 

Transcriptome alignments Aligned.toTranscriptome.out.bam
Parameter test set A
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        355     SIRV202 22      3       138M    =       87      215     GTGGTGGATGAACATGGATGCGAACGACATTTCGAAATATACGCAATGATATCCAACCCTACCGAAACAGCAATAATTGTAGAAGGTAGCATATCATGCCGAGAGTTCACTACGTTGTAGCGAGTTTAGTTGAAATAG    FFFFFF:FFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFF:      NH:i:2HI:i:1
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        403     SIRV202 87      3       150M    =       22      -215    AACAGCAATAATTGTAGAAGGTAGCATATCATGCCGAGAGTTCACTACGTTGTAGCGAGTTTAGTTGAAATAGTGTGTGATGTGTAAACTAAAACATACCAAATGGTAAAACTTGTCACCCCCATTTACATTGCGACAGACACCAGTAGG        F:FFFFFFFFFFF:FFFF,F:FFF,,::FFFFFFF,F:F,FFFFFFFFF:FFFFFFFFF::FF:FFFFFFFFF,,FF:,,FFFF::FFFFFFFFFFFFFFFF:FFF,FF:FF:FFFFFFFFFFF:FF:F,FFFFF:FFFFFFFFFFFFFF        NH:i:2  HI:i:1

Parameter test set B
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        355     SIRV201 18      3       138M    =       83      215     GTGGTGGATGAACATGGATGCGAACGACATTTCGAAATATACGCAATGATATCCAACCCTACCGAAACAGCAATAATTGTAGAAGGTAGCATATCATGCCGAGAGTTCACTACGTTGTAGCGAGTTTAGTTGAAATAG    FFFFFF:FFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFF:      NH:i:2HI:i:2
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        403     SIRV201 83      3       150M    =       18      -215    AACAGCAATAATTGTAGAAGGTAGCATATCATGCCGAGAGTTCACTACGTTGTAGCGAGTTTAGTTGAAATAGTGTGTGATGTGTAAACTAAAACATACCAAATGGTAAAACTTGTCACCCCCATTTACATTGCGACAGACACCAGTAGG        F:FFFFFFFFFFF:FFFF,F:FFF,,::FFFFFFF,F:F,FFFFFFFFF:FFFFFFFFF::FF:FFFFFFFFF,,FF:,,FFFF::FFFFFFFFFFFFFFFF:FFF,FF:FF:FFFFFFFFFFF:FF:F,FFFFF:FFFFFFFFFFFFFF        NH:i:2  HI:i:2

Parameter test set C
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        99      SIRV202 22      3       138M    =       87      215     GTGGTGGATGAACATGGATGCGAACGACATTTCGAAATATACGCAATGATATCCAACCCTACCGAAACAGCAATAATTGTAGAAGGTAGCATATCATGCCGAGAGTTCACTACGTTGTAGCGAGTTTAGTTGAAATAG    FFFFFF:FFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFF:      NH:i:2HI:i:1
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        147     SIRV202 87      3       150M    =       22      -215    AACAGCAATAATTGTAGAAGGTAGCATATCATGCCGAGAGTTCACTACGTTGTAGCGAGTTTAGTTGAAATAGTGTGTGATGTGTAAACTAAAACATACCAAATGGTAAAACTTGTCACCCCCATTTACATTGCGACAGACACCAGTAGG        F:FFFFFFFFFFF:FFFF,F:FFF,,::FFFFFFF,F:F,FFFFFFFFF:FFFFFFFFF::FF:FFFFFFFFF,,FF:,,FFFF::FFFFFFFFFFFFFFFF:FFF,FF:FF:FFFFFFFFFFF:FF:F,FFFFF:FFFFFFFFFFFFFF        NH:i:2  HI:i:1
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        355     SIRV201 18      3       138M    =       83      215     GTGGTGGATGAACATGGATGCGAACGACATTTCGAAATATACGCAATGATATCCAACCCTACCGAAACAGCAATAATTGTAGAAGGTAGCATATCATGCCGAGAGTTCACTACGTTGTAGCGAGTTTAGTTGAAATAG    FFFFFF:FFFFFFF:F:FFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFF:      NH:i:2HI:i:2
A01488:219:HCC2LDSX7:1:2216:5602:2268:AGACGTTTAT        403     SIRV201 83      3       150M    =       18      -215    AACAGCAATAATTGTAGAAGGTAGCATATCATGCCGAGAGTTCACTACGTTGTAGCGAGTTTAGTTGAAATAGTGTGTGATGTGTAAACTAAAACATACCAAATGGTAAAACTTGTCACCCCCATTTACATTGCGACAGACACCAGTAGG        F:FFFFFFFFFFF:FFFF,F:FFF,,::FFFFFFF,F:F,FFFFFFFFF:FFFFFFFFF::FF:FFFFFFFFF,,FF:,,FFFF::FFFFFFFFFFFFFFFF:FFF,FF:FF:FFFFFFFFFFF:FF:F,FFFFF:FFFFFFFFFFFFFF        NH:i:2  HI:i:2
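
For reference, the FLAG column of the transcriptome records above can be decoded with a small sketch like the one below. It uses only the standard SAM flag bits; the helper name decode_flag and the transcriptome_flags list are illustrative names of my own, not part of STAR or RSEM, and the (test set, transcript, FLAG) triples are copied by hand from the records above.

```python
# Standard SAM FLAG bits, as defined in the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value (hypothetical helper)."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# (test set, transcript, FLAG) copied from the Aligned.toTranscriptome.out.bam records above.
transcriptome_flags = [
    ("A", "SIRV202", 355), ("A", "SIRV202", 403),
    ("B", "SIRV201", 355), ("B", "SIRV201", 403),
    ("C", "SIRV202", 99), ("C", "SIRV202", 147),
    ("C", "SIRV201", 355), ("C", "SIRV201", 403),
]

for test_set, transcript, flag in transcriptome_flags:
    kind = "secondary" if flag & 0x100 else "primary"
    print(test_set, transcript, flag, kind, ",".join(decode_flag(flag)))
```

Decoded this way, sets A and B each report a single transcript alignment pair (SIRV202 in A, SIRV201 in B), and in both cases the records carry the secondary bit (0x100) even though NH:i:2. Only set C reports both pairs, with the SIRV202 pair as primary (flags 99/147) and the SIRV201 pair as secondary (flags 355/403).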

What information is being used to determine the transcript alignment of this read? Is it dependent on the parameterization of the STAR call? I came across this question while trying to better understand how RSEM interprets STAR alignment and quality information to generate counts. Thank you for any advice you can offer.

Olivia