Merging two SAM entries?

31 views
Skip to first unread message

Aleksandra Galitsyna

unread,
Feb 15, 2024, 3:08:18 PM2/15/24
to pysam-us...@googlegroups.com
Hi pysam developers and the team! 

I am a developer of Hi-C library pairtools for parsing SAM entries of Hi-C mapped data into pairs of 3D genome interactions. 
Some time ago we switched to the pysam backend, and found it very fast and simple - it's amazing! 

While it serves our needs perfectly, I have recently faced a need in merging two SAM entries into one, which is one of the logistic bottlenecks in the pairtools upgrade that I'm working on.
In other words, there are alignments in the genome originating from the same chromosome, and I want to convert that into a single gapped alignment; or as a single entry with additional information about the "mate".

Do you think this is something that can be done within the pysam framework? 
I understand there can be complications due to mapper-specific formatting choices (we typically parse the output of bwa mem), but any case of alignment merging will be a good starting point for me. 

I would appreciate any thoughts or suggestions!

Thanks!

Kind regards,
Aleksandra

----------------
Dr. Aleksandra Galitsyna
IMES MIT
77 Mass Ave, Cambridge, MA

Andreas Heger

unread,
Feb 16, 2024, 2:48:17 PM2/16/24
to pysam-us...@googlegroups.com
Hi Aleksandra,

unfortunately there is no direct functionality in pysam. However,
SAM are easily constructed from scratch, see for example here:

https://github.com/pysam-developers/pysam/blob/cdc0ed12fbe2d7633b8fa47534ab2c2547f66b84/tests/AlignedSegment_test.py#L35

It would be a matter of taking the "cigartuples" of the two alignments,
adding a linking gap tuple in the middle and then creating the alignment.

A more principled approach is to convert both SAM encoded alignments
into "aligned residue pairs".

https://github.com/pysam-developers/pysam/blob/cdc0ed12fbe2d7633b8fa47534ab2c2547f66b84/pysam/libcalignedsegment.pyx#L1976

You could then simply merge the lists, sort the positions and
make sure that both sequences are increasing (i.e. there are
no conflicts). You would then need to convert these pairs back
to SAM format. Unfortunately I don't think we have a function in
pysam for this, but there should be plenty of implementations out
there as this is basically the final step any mapper needs to do
when writing their alignments to SAM. Maybe there is even a function
in htslib.

Best wishes,
Andreas
> --
> You received this message because you are subscribed to the Google
> Groups "Pysam User group" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to pysam-user-gro...@googlegroups.com
> <mailto:pysam-user-gro...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/pysam-user-group/CAFpgnODBLSB6zbD6Q_9Sr_a_TT5Vy3daid_aRcYDts1%2B5%2BnNNQ%40mail.gmail.com <https://groups.google.com/d/msgid/pysam-user-group/CAFpgnODBLSB6zbD6Q_9Sr_a_TT5Vy3daid_aRcYDts1%2B5%2BnNNQ%40mail.gmail.com?utm_medium=email&utm_source=footer>.
Reply all
Reply to author
Forward
0 new messages