Errors in the Spectral Unmixing guide


Luc Lajaunie

Nov 27, 2019, 8:07:10 AM
to pycroscopy
Hello,

I am a new user and for now I am trying to reproduce the spectral unmixing guide described at https://pycroscopy.github.io/pycroscopy/auto_examples/plot_spectral_unmixing.html#spectral-unmixing

I have successfully installed pycroscopy and pyUSID, and I can load the data using wget, but I get the following error when doing the SVD:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-35-b2bac0686780> in <module>
----> 1 decomposer = px.processing.svd_utils.SVD(h5_main, num_components=100)
      2 h5_svd_group = decomposer.compute()
      3 
      4 h5_u = h5_svd_group['U']
      5 h5_v = h5_svd_group['V']

C:\ProgramData\Anaconda3\lib\site-packages\pycroscopy\processing\svd_utils.py in __init__(self, h5_main, num_components, **kwargs)
     59 
     60         # Check that we can actually compute the SVD with the selected number of components
---> 61         self._check_available_mem()
     62 
     63         self.parms_dict = {'num_components': num_components}

C:\ProgramData\Anaconda3\lib\site-packages\pycroscopy\processing\svd_utils.py in _check_available_mem(self)
    233 
    234         mem_per_comp = s_mem_per_comp + u_mem_per_comp + v_mem_per_comp
--> 235         avail_mem = 0.75 * self._max_mem_mb * 1024 ** 2
    236         free_mem = avail_mem - self.h5_main.__sizeof__()
    237 

AttributeError: 'SVD' object has no attribute '_max_mem_mb'

And when doing the k-means I have the following problem:

Consider calling test() to check results before calling compute() which computes on the entire dataset and writes back to the HDF5 file
Performing clustering on /Measurement_000/Channel_000/Raw_Data.
Took 7.21 sec to compute KMeans
Calculated the Mean Response of each cluster.
---------------------------------------------------------------------------
_RemoteTraceback                          Traceback (most recent call last)
_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\lib\site-packages\joblib\externals\loky\backend\queues.py", line 150, in _feed
    obj_ = dumps(obj, reducers=reducers)
  File "C:\ProgramData\Anaconda3\lib\site-packages\joblib\externals\loky\backend\reduction.py", line 243, in dumps
    dump(obj, buf, reducers=reducers, protocol=protocol)
  File "C:\ProgramData\Anaconda3\lib\site-packages\joblib\externals\loky\backend\reduction.py", line 236, in dump
    _LokyPickler(file, reducers=reducers, protocol=protocol).dump(obj)
  File "C:\ProgramData\Anaconda3\lib\site-packages\joblib\externals\cloudpickle\cloudpickle.py", line 267, in dump
    return Pickler.dump(self, obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 437, in dump
    self.save(obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 662, in save_reduce
    save(state)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 856, in save_dict
    self._batch_setitems(obj.items())
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 882, in _batch_setitems
    save(v)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 662, in save_reduce
    save(state)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 856, in save_dict
    self._batch_setitems(obj.items())
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 887, in _batch_setitems
    save(v)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 662, in save_reduce
    save(state)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 856, in save_dict
    self._batch_setitems(obj.items())
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 882, in _batch_setitems
    save(v)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 816, in save_list
    self._batch_appends(obj)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 843, in _batch_appends
    save(tmp[0])
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 771, in save_tuple
    save(element)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 786, in save_tuple
    save(element)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 549, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 662, in save_reduce
    save(state)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 504, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 856, in save_dict
    self._batch_setitems(obj.items())
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 882, in _batch_setitems
    save(v)
  File "C:\ProgramData\Anaconda3\lib\pickle.py", line 524, in save
    rv = reduce(self.proto)
TypeError: can't pickle _thread._local objects
"""

The above exception was the direct cause of the following exception:

PicklingError                             Traceback (most recent call last)
<ipython-input-4-ac50ab686b4a> in <module>
      2 
      3 estimator = px.processing.Cluster(h5_main, KMeans(n_clusters=num_clusters))
----> 4 h5_kmeans_grp = estimator.compute(h5_main)
      5 h5_kmeans_labels = h5_kmeans_grp['Labels']
      6 h5_kmeans_mean_resp = h5_kmeans_grp['Mean_Response']

C:\ProgramData\Anaconda3\lib\site-packages\pycroscopy\processing\cluster.py in compute(self, rearrange_clusters, override)
    206         """
    207         if self.__labels is None and self.__mean_resp is None:
--> 208             _ = self.test(rearrange_clusters=rearrange_clusters, override=override)
    209 
    210         if self.h5_results_grp is None:

C:\ProgramData\Anaconda3\lib\site-packages\pycroscopy\processing\cluster.py in test(self, rearrange_clusters, override)
    153 
    154         t1 = time.time()
--> 155         self.__mean_resp = self._get_mean_response(results.labels_)
    156         print('Took {} to calculate mean response per cluster'.format(format_time(time.time() - t1)))
    157 

C:\ProgramData\Anaconda3\lib\site-packages\pycroscopy\processing\cluster.py in _get_mean_response(self, labels)
    247                                               func_args=[self.h5_main, labels, self.data_slice,
    248                                                          self.data_transform_func], lengthy_computation=True,
--> 249                                               verbose=self.verbose))
    250 
    251         return mean_resp

C:\ProgramData\Anaconda3\lib\site-packages\pyUSID\processing\comp_utils.py in parallel_compute(data, func, cores, lengthy_computation, func_args, func_kwargs, verbose)
    153     if cores > 1:
    154         values = [joblib.delayed(func)(x, *func_args, **func_kwargs) for x in data]
--> 155         results = joblib.Parallel(n_jobs=cores)(values)
    156 
    157         # Finished reading the entire data set

C:\ProgramData\Anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
    932 
    933             with self._backend.retrieval_context():
--> 934                 self.retrieve()
    935             # Make sure that we get a last message telling us we are done
    936             elapsed_time = time.time() - self._start_time

C:\ProgramData\Anaconda3\lib\site-packages\joblib\parallel.py in retrieve(self)
    831             try:
    832                 if getattr(self._backend, 'supports_timeout', False):
--> 833                     self._output.extend(job.get(timeout=self.timeout))
    834                 else:
    835                     self._output.extend(job.get())

C:\ProgramData\Anaconda3\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
    519         AsyncResults.get from multiprocessing."""
    520         try:
--> 521             return future.result(timeout=timeout)
    522         except LokyTimeoutError:
    523             raise TimeoutError()

C:\ProgramData\Anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

C:\ProgramData\Anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

PicklingError: Could not pickle the task to send it to the workers

Any input would be appreciated.

Thanks
Luc

Raj Giridharagopal

Nov 27, 2019, 12:10:12 PM
to pycroscopy
Hi Luc,
Thanks for bringing that up! For the first error, I actually brought up this issue in August and thought somehow it was fixed:

For the SVD call in the notebook, I guess we need to edit it to pass an explicit max_mem argument to the SVD call.
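
Something along these lines might work as a workaround (untested; I'm assuming the SVD class forwards a max_mem_mb keyword to the underlying pyUSID Process class, so the exact keyword may differ between versions):

import pycroscopy as px

# Hypothetical workaround: set the memory cap explicitly so that
# _check_available_mem() does not depend on an attribute that was never set
decomposer = px.processing.svd_utils.SVD(h5_main, num_components=100,
                                         max_mem_mb=1024)  # assumed keyword
h5_svd_group = decomposer.compute()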

I think Suhas or Rama can comment on the second error in the k-means part.

Raj

Suhas Somnath

Nov 27, 2019, 3:37:27 PM
to pycro...@googlegroups.com
Luc - It appears that you may not be using the latest version of pycroscopy (the error line number for the memory check does not match what is in the latest version). Could you please update to the latest version and try again?

Raj / Rama - Dask has an SVD function that may work better than numpy's, since it inherently takes care of memory management and parallel computing, both of which are concerns in the current implementation of the SVD class. I would suggest swapping out the sklearn methods for, or adding compatibility with, Dask equivalents in the Cluster and Decomposition classes for the same reasons. I have provided links to these alternatives. This might be a relatively easy project for a hackathon.

On a related note, I have been thinking of using Dask.array instead of numpy as (one of) the backbones of pyUSID to work around memory and parallel computation issues. I began the transition last year but need help to complete it.
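
For concreteness, a rough sketch of what a Dask-based SVD could look like (illustrative only; the chunk size and number of components are placeholders, and I have not tested this against the USID layout):

import dask
import dask.array as da

# Wrap the main HDF5 dataset as a chunked, lazy Dask array (chunk size is a placeholder)
dask_main = da.from_array(h5_main, chunks=(1024, h5_main.shape[1]))

# Approximate truncated SVD keeping only the leading components;
# Dask handles the out-of-core and parallel execution internally
u, s, v = da.linalg.svd_compressed(dask_main, k=100)
u, s, v = dask.compute(u, s, v)

# Similarly, dask_ml.cluster.KMeans (extra dependency: dask-ml) could stand in
# for sklearn.cluster.KMeans in the Cluster class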

Luc Lajaunie

Nov 27, 2019, 3:53:06 PM
to pycroscopy
Suhas, you're right: for some reason pip installed 0.60.3 and not 0.60.4. I will try to update.

Luc Lajaunie

Nov 28, 2019, 6:52:10 AM
to pycroscopy
The latest version is 0.60.4, right? For now, I can only update to 0.60.3 with conda.

For some reason, "pip install -U git+https://github.com/pycroscopy/pycroscopy@dev" gives me:

Successfully built pycroscopy
Installing collected packages: pycroscopy
  Found existing installation: pycroscopy 0.60.2
    Uninstalling pycroscopy-0.60.2:
      Successfully uninstalled pycroscopy-0.60.2
Successfully installed pycroscopy-0.60.2

Could you please provide some input on how to update to 0.60.4?

Luc Lajaunie

Nov 30, 2019, 9:11:52 AM
to pycroscopy
Hi all,
I managed to get the latest version by using "pip install -U git+https://github.com/pycroscopy/pycroscopy". It seems that the dev branch is outdated and the version on the conda repo is only 0.60.3.

And now SVD and k-means are working!

Thanks

Rama Vasudevan

Dec 9, 2019, 11:27:21 AM
to pycroscopy
Thanks Luc. I have updated the package now, so a standard pip install should work for the latest version.
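In other words, upgrading with plain pip, i.e. "pip install -U pycroscopy", should now give 0.60.4 (or later).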