bagit-python: error when attempting to use the multiple processes option to validate a bag


mirthblaster36

Nov 15, 2016, 3:49:50 PM
to Digital Curation
Hi again everyone,

Thanks to Michael Shallcross, I got my initial problem sorted, but now I have run into another. Everything seems to work OK as long as I don't get fancy and attempt to use the option to calculate checksums in parallel. I'm on a quad-core machine, so this should work, right? I'll paste the error below; any insights greatly appreciated!

Thanks,
Mary Willoughby

Digital Library of Georgia

_____________

Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\mwilloug>cd c:\bagit_python_test

c:\bagit_python_test>python bagit.py --processes 2 --validate bucket
2016-11-15 14:33:00,171 - ERROR - unable to calculate file hashes for c:\bagit_python_test\bucket
Traceback (most recent call last):
  File "bagit.py", line 518, in _validate_entries
    pool = multiprocessing.Pool(processes if processes else None, _init_worker)
  File "C:\Python27\lib\multiprocessing\__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "C:\Python27\lib\multiprocessing\pool.py", line 159, in __init__
    self._repopulate_pool()
  File "C:\Python27\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
    w.start()
  File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Python27\lib\multiprocessing\forking.py", line 277, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Python27\lib\multiprocessing\forking.py", line 199, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 754, in save_global
    (obj, module, name))
PicklingError: Can't pickle <function _init_worker at 0x0000000002E20EB8>: it's not found as __main__._init_worker
Traceback (most recent call last):
  File "bagit.py", line 945, in <module>
    valid = bag.validate(processes=opts.processes, fast=opts.fast)
  File "bagit.py", line 363, in validate
    self._validate_contents(processes=processes, fast=fast)
  File "bagit.py", line 443, in _validate_contents
    self._validate_entries(processes)  # *SLOW*
  File "bagit.py", line 518, in _validate_entries
    pool = multiprocessing.Pool(processes if processes else None, _init_worker)
  File "C:\Python27\lib\multiprocessing\__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "C:\Python27\lib\multiprocessing\pool.py", line 159, in __init__
    self._repopulate_pool()
  File "C:\Python27\lib\multiprocessing\pool.py", line 223, in _repopulate_pool
    w.start()
  File "C:\Python27\lib\multiprocessing\process.py", line 130, in start
    self._popen = Popen(self)
  File "C:\Python27\lib\multiprocessing\forking.py", line 277, in __init__
    dump(process_obj, to_child, HIGHEST_PROTOCOL)
  File "C:\Python27\lib\multiprocessing\forking.py", line 199, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 425, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 655, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 687, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 568, in save_tuple
    save(element)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 754, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <function _init_worker at 0x0000000002E20EB8>: it's not found as __main__._init_worker

c:\bagit_python_test>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python27\lib\multiprocessing\forking.py", line 381, in main
    self = load(from_parent)
  File "C:\Python27\lib\pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "C:\Python27\lib\pickle.py", line 864, in load
    dispatch[key](self)
  File "C:\Python27\lib\pickle.py", line 886, in load_eof
    raise EOFError
EOFError

Ed Summers

Nov 15, 2016, 4:21:01 PM
to digital-...@googlegroups.com
Hi Mary,

I don't have an immediate solution, but I think you've found a bug when using --processes on Windows. I opened an issue ticket here:

https://github.com/LibraryOfCongress/bagit-python/issues/79

//Ed

Chris Adams

Nov 15, 2016, 4:21:47 PM
to digital-...@googlegroups.com
I believe this was fixed in https://github.com/LibraryOfCongress/bagit-python/pull/56, which has not yet been released. Work has been progressing internally on another release.

Chris

mirthblaster36

Nov 16, 2016, 11:14:37 AM
to Digital Curation
Thanks for the information; I'll move ahead with this version minus that feature and keep an eye out for the next release.

Best,

Mary
