Parallel manipulation of bam file ends with Error


Dario Romagnoli

Jan 19, 2018, 6:58:41 AM
to Pysam User group
Sorry, I thought the previous title could have been confusing.
I'm trying to manipulate the query sequence in order to generate a set of tags, but I want to speed up the process using parallel computation.


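import multiprocessing as mp
from itertools import islice

import pysam
from Bio.Seq import Seq  # Biopython

def make_tags(pysam_segment):
    read = Seq(pysam_segment.query_sequence)
    # do stuff with read to build tags
    return tags

# in_file is the path to the input BAM
bam_file = pysam.AlignmentFile(in_file, "rb", check_sq=False)
# every second record of the file
sam_iter = islice(bam_file.fetch(until_eof=True), None, None, 2)
with mp.Pool(4) as pool:
    # each AlignedSegment in sam_iter has to be pickled to reach a worker
    results = pool.map(make_tags, sam_iter)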


The program stops with this error:

TypeError: self._delegate cannot be converted to a Python object for pickling 

Note that:
- it works without parallelization;
- using imap doesn't throw an error, but when I try to iterate over the results the program hangs indefinitely.
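A likely explanation, going by the traceback: Pool.map and Pool.imap pickle every item of the iterable before handing it to a worker process, and AlignedSegment is a Cython extension type wrapping a C `bam1_t` pointer (the `_delegate` in the message), so its generated `__reduce_cython__` refuses to pickle it. The stand-alone sketch below needs neither pysam nor a BAM file; `FakeAlignedSegment` is a hypothetical stand-in that raises TypeError from `__reduce__` the way the traceback shows, and `can_send_to_worker` is a hypothetical helper that simply tries `pickle.dumps`.

```python
import pickle


class FakeAlignedSegment:
    """Hypothetical stand-in for pysam's AlignedSegment.

    Like the Cython-generated __reduce_cython__ in the traceback, its
    __reduce__ raises TypeError, so pickle refuses the object.
    """

    def __init__(self, name, seq):
        self.query_name = name
        self.query_sequence = seq

    def __reduce__(self):
        raise TypeError("self._delegate cannot be converted to a Python object for pickling")


def can_send_to_worker(obj):
    # Pool.map/imap pickle every item before sending it to a worker.
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True


seg = FakeAlignedSegment("read1", "ACGT")
print(can_send_to_worker(seg))                                   # False
print(can_send_to_worker((seg.query_name, seg.query_sequence)))  # True
```

Copying the needed fields into plain strings or tuples before they reach the pool, as later posts in this thread end up doing, sidesteps the error.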

Dario Romagnoli

Jan 19, 2018, 7:13:40 AM
to Pysam User group
The hang with imap goes away with

pool = mp.Pool(1)
results = pool.imap(make_tags, sam_iter)

But then I got the same error.

Dennis Simpson

Jan 19, 2018, 7:46:24 AM
to Pysam User group
You cannot pickle a function with Python's built-in pickle. Try using the Pathos package for the multiprocessing instead.
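For what it's worth, the stdlib pickle does handle module-level functions such as make_tags: it stores them by reference (module plus qualified name) and the unpickling side re-imports them. What it rejects are lambdas and nested functions. Both tracebacks in this thread end in `AlignedSegment.__reduce_cython__`, which suggests it is the reads, not the function, that fail to pickle. A minimal check, using `os.path.join` as an arbitrary module-level function and a hypothetical `is_picklable` helper:

```python
import os.path
import pickle


def is_picklable(obj):
    # Round-trip through the stdlib pickler; module-level functions are
    # stored by reference (module + qualified name), not by their code.
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


square = lambda x: x * x  # no importable name, so stdlib pickle rejects it

print(is_picklable(os.path.join))  # True
print(is_picklable(square))        # False
```

Swapping in dill (which Pathos uses) lets you send lambdas, but as the dill traceback later in the thread shows, it still ends up calling AlignedSegment's own reduce method.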

Dario Romagnoli

Jan 19, 2018, 8:53:27 AM
to Pysam User group
I somehow circumvented the problem by using a generator:

reads = (x.query_sequence for x in in_bam_iter)
with Pool(cores) as pool:
    tags = pool.map(make_tags, reads)

Now the problem lies in adding the tags and writing a new BAM file.
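One way to handle the writing step, sketched under assumptions: since Pool.map returns results in input order, the tags can be computed from plain (query_name, query_sequence) tuples, and the BAM can then be read a second time in the same order, attaching each result with `AlignedSegment.set_tag` and writing through an `AlignmentFile` opened with `template=`. The tag logic in this `make_tags` (GC fraction under a made-up `XG` tag) and the helper names are hypothetical; the sketch also processes every record rather than every second one and keeps all fields in memory, which may not suit very large files.

```python
from multiprocessing import Pool


def extract_fields(segment):
    # Copy plain Python values (str or None) out of the AlignedSegment;
    # unlike the segment itself, these pickle without trouble.
    return segment.query_name, segment.query_sequence


def make_tags(fields):
    # Hypothetical tag logic: GC fraction of the read under a made-up
    # "XG" tag. Replace with the real computation.
    name, seq = fields
    if not seq:
        return {}
    gc = sum(base in "GCgc" for base in seq) / len(seq)
    return {"XG": round(gc, 3)}


def apply_tags(segments, tag_dicts, writer):
    # Pool.map keeps input order, so a second pass over the same records
    # pairs each tag dict with the read it was computed from.
    for segment, tags in zip(segments, tag_dicts):
        for tag, value in tags.items():
            segment.set_tag(tag, value)
        writer.write(segment)


def tag_bam(in_path, out_path, processes=4):
    import pysam  # imported lazily so the helpers above work without pysam

    # Pass 1: extract plain fields in the parent process.
    with pysam.AlignmentFile(in_path, "rb", check_sq=False) as bam_in:
        fields = [extract_fields(s) for s in bam_in.fetch(until_eof=True)]

    # Only tuples of strings cross the process boundary.
    with Pool(processes) as pool:
        tag_dicts = pool.map(make_tags, fields, chunksize=1000)

    # Pass 2: re-read in the same order, attach tags, write a new BAM.
    with pysam.AlignmentFile(in_path, "rb", check_sq=False) as bam_in, \
            pysam.AlignmentFile(out_path, "wb", template=bam_in) as bam_out:
        apply_tags(bam_in.fetch(until_eof=True), tag_dicts, bam_out)
```

Calling something like `tag_bam("in.bam", "tagged.bam")` (hypothetical file names) from under an `if __name__ == "__main__":` guard is advisable, since multiprocessing may re-import the main module in worker processes.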

Peter LoVerso

May 23, 2018, 1:42:55 PM
to Pysam User group
I'm running into this same problem. I've switched to pathos and dill, but I still get the same error.

I'm using a multiprocess Queue, and the issue happens when I put an object into the Queue for processing.

  File "/opt/venv/specter/lib/python3.6/site-packages/dill/dill.py", line 871, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 496, in save
    rv = reduce(self.proto)
  File "stringsource", line 2, in pysam.libcalignedsegment.AlignedSegment.__reduce_cython__
TypeError: self._delegate cannot be converted to a Python object for pickling

Peter LoVerso

May 23, 2018, 1:56:56 PM
to Pysam User group
To give a minimal example using the code mentioned above:

from pathos.multiprocessing import Pool
from Bio.Seq import Seq  # Biopython; needed by make_tags
import pysam
from itertools import islice

def make_tags(pysam_segment):
    read = Seq(pysam_segment.query_sequence)
    # do stuff
    return(read)

in_file = "test.bam"

bam_file = pysam.AlignmentFile(in_file, "rb", check_sq=False)
sam_iter = islice(bam_file.fetch(until_eof=True), None, None, 2)
with Pool(4) as pool:                  
    results = pool.map(make_tags, sam_iter)


Produces:

Traceback (most recent call last):
  File "./test_mp.py", line 16, in <module>
    results = pool.map(make_tags, sam_iter)
  File "/opt/venv/specter/lib/python3.6/site-packages/multiprocess/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/opt/venv/specter/lib/python3.6/site-packages/multiprocess/pool.py", line 608, in get
    raise self._value
  File "/opt/venv/specter/lib/python3.6/site-packages/multiprocess/pool.py", line 385, in _handle_tasks
    put(task)
  File "/opt/venv/specter/lib/python3.6/site-packages/multiprocess/connection.py", line 209, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "/opt/venv/specter/lib/python3.6/site-packages/multiprocess/reduction.py", line 53, in dumps
    cls(buf, protocol).dump(obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 751, in save_tuple
    save(element)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 736, in save_tuple
    save(element)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/pickle.py", line 496, in save
    rv = reduce(self.proto)
  File "stringsource", line 2, in pysam.libcalignedsegment.AlignedSegment.__reduce_cython__
TypeError: self._delegate cannot be converted to a Python object for pickling

Dennis Simpson

May 23, 2018, 2:08:27 PM
to Pysam User group
I have tried this approach as well, using Pathos (which serializes with dill) in place of the built-in pickle and multiprocessing libraries. There is a discussion on the Pathos blog about this type of issue. Basically, the class from pysam is too complex to pickle. A possible fix was posted, but it would require you to edit pysam. I suggest you do some reading over at the Pathos site.


Peter LoVerso

May 23, 2018, 2:09:46 PM
to pysam-us...@googlegroups.com
Cool, do you have a link to the discussion please?

--
You received this message because you are subscribed to the Google Groups "Pysam User group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pysam-user-group+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Peter LoVerso

May 23, 2018, 2:19:45 PM
to pysam-us...@googlegroups.com
A quick Google search for "pysam" "pathos" turned up nothing except this discussion, plus code results on GitHub that don't do what we're trying to do. I couldn't find any blog posts.

Dennis Simpson

May 23, 2018, 2:20:16 PM
to Pysam User group
This is their general discussion page. I am digging through my emails looking for the specific thread right now.


Dennis Simpson

May 23, 2018, 2:21:23 PM
to Pysam User group
The Pathos group uses GitHub for discussions. The one I am looking for was not specifically about pysam, but it gave the same error.

Peter LoVerso

May 23, 2018, 2:25:46 PM
to pysam-us...@googlegroups.com
Ah, are you referring to this one?
I guess my only option is to load the pysam values I need into another class prior to parallelization, similar to what the poster above did with query_sequence. Alas.
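A minimal sketch of that idea, assuming make_tags only needs a handful of fields: copy them into a small NamedTuple in the parent process and send those records to the pool instead of the segments. `ReadRecord`, `to_record` and the chosen fields are hypothetical; each attribute read is a standard AlignedSegment property returning a str, int or None.

```python
from typing import NamedTuple, Optional


class ReadRecord(NamedTuple):
    # Hypothetical selection of fields; add whatever make_tags actually reads.
    query_name: str
    query_sequence: Optional[str]
    reference_name: Optional[str]
    reference_start: int
    mapping_quality: int


def to_record(segment):
    # Each attribute is a plain str, int or None, so the resulting tuple
    # can be pickled and sent to a worker in place of the AlignedSegment.
    return ReadRecord(
        query_name=segment.query_name,
        query_sequence=segment.query_sequence,
        reference_name=segment.reference_name,
        reference_start=segment.reference_start,
        mapping_quality=segment.mapping_quality,
    )
```

In the original loop this would become something like `records = (to_record(s) for s in bam_file.fetch(until_eof=True))` followed by `pool.map(make_tags, records)`, with make_tags rewritten to read from a ReadRecord. Keeping ReadRecord at module level matters: pickle stores the class by reference, so worker processes must be able to import it.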


g2.che...@gmail.com

May 23, 2018, 2:30:36 PM
to pysam-us...@googlegroups.com

Yes, that is the one. I was specifically thinking of the last comment, though I did not try it. Instead, I changed my code so that each job gets the region coordinates, then opens the BAM file and fetches the data. This does slow everything down when done for hundreds of regions.

Peter LoVerso

May 23, 2018, 2:39:22 PM
to pysam-us...@googlegroups.com
Yeah, that's what my code currently does; I'm switching to this approach because the performance is indeed terrible with many regions.


Sergei Iakhnin

Jan 10, 2019, 11:24:46 AM
to Pysam User group
Came here because of an issue pickling a list of AlignedSegment objects for processing with Spark. My issue was solved by using dill.

Michele Tinti

Apr 18, 2020, 7:29:56 PM
to Pysam User group
Hi, I have the same issue. Would you mind explaining how you used dill?