EDS translator -- integration question


pari...@ornl.gov

Nov 9, 2018, 9:59:26 AM
to pycroscopy

These are more philosophical questions than coding questions.

 

I've got a SEM / STEM X-ray spectrum imaging EDS package up and running *very* well in Python, incorporating both the state-of-the-art published algorithms and one or two of my own.

 

My first question is: how best to integrate into Pycroscopy? At the moment I have a single file to import, with a base class that is inherited by vendor-specific classes. You instantiate the vendor-specific class to handle the import, and its inheritance from the base class provides the actual spectrum-image analysis.
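A minimal sketch of that pattern (all class and method names here are hypothetical stand-ins, not the actual package, and the Poisson-style scaling is a simplified illustration rather than the specific published algorithm):

```python
import numpy as np


class EDSBase:
    """Hypothetical base class: vendor-agnostic spectrum-image analysis."""

    def __init__(self):
        self.counts = None  # (rows, cols, channels) cube filled in by subclasses

    def scaled_svd(self):
        # Weight the unfolded data cube before SVD so that noisy,
        # low-count channels do not dominate the factorization.
        rows, cols, chans = self.counts.shape
        d = self.counts.reshape(rows * cols, chans).astype(float)
        g = np.sqrt(d.mean(axis=0)) + 1e-12   # per-channel weights
        h = np.sqrt(d.mean(axis=1)) + 1e-12   # per-pixel weights
        u, s, vt = np.linalg.svd(d / np.outer(h, g), full_matrices=False)
        return u, s, vt


class BrukerEDS(EDSBase):
    """Hypothetical vendor subclass: only the import logic lives here."""

    def __init__(self, raw):
        super().__init__()
        self.counts = np.asarray(raw)  # stand-in for parsing a vendor file
```

The subclass handles I/O, while the inherited method does the analysis, e.g. `U, S, V = BrukerEDS(raw_cube).scaled_svd()`.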

 

And there's the rub -- because EDS data are so noisy, out-of-the-box methods like K-means or SVD fail and require physics-informed variants.

 

At present I've loaded those in as methods on the base class, so that you could invoke, for instance, `U, S, V = my_eds_object.scaled_svd()`.


Because these are physics-based, I'm tempted to leave them hanging off the class that also handles imports, but should they move to `pycroscopy.processing`? I would hate to scatter the EDS-specific code across multiple pre-existing modules, since that would, I expect, make maintenance and enhancement difficult, but I can also see that lumping processing and I/O together, away from `processing` and `io`, could cause its own problems.


#########



My second question is: how do I handle multiple streams of analysis in the USID file? I've got this layout, which is easy enough using `pyUSID.NumpyTranslator`:

```
/
├ Measurement_000
  ---------------
  ├ Channel_000
    -----------
    ├ Position_Indices
    ├ Position_Values
    ├ Raw_Data
    ├ Spectroscopic_Indices
    ├ Spectroscopic_Values
```

However, I'd like to be able to stash multiple sets of analyses. Say I bin pixels 8x8 and the spectral channels x2 and perform SVD (analysis 000), then try again with 4x4 pixel and x4 spectral binning (analysis 001). How would I handle this? The most logical approach I can think of is `/Measurement_000/Channel_000/Analysis_000`, with `/Binned_Data_SparseCSC`, `/S`, `/U`, `/V`, etc. inside the group `/Analysis_000`, but I don't want to break the pyUSID or Pycroscopy API by getting clever. Advice?
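For illustration, assuming simple zero-padded counters like in the layout above, the next free analysis-group name could be found with a small helper like this (hypothetical, not part of pyUSID):

```python
import re


def next_analysis_group(existing, prefix="Analysis_"):
    """Return the next free 'Analysis_00x' name given existing group names."""
    pat = re.compile(re.escape(prefix) + r"(\d+)$")
    indices = [int(m.group(1)) for name in existing
               if (m := pat.match(name))]
    nxt = max(indices) + 1 if indices else 0
    return f"{prefix}{nxt:03d}"
```

So an empty channel would get `Analysis_000`, and a channel already holding `Analysis_000` and `Analysis_001` would get `Analysis_002`.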

 

Thanks

Chad 

suhas....@gmail.com

Nov 9, 2018, 1:12:32 PM
to pycroscopy
Chad,

Thanks for choosing to integrate your code into pycroscopy!

Please see our guidelines for contribution. This document was written for pyUSID, but the guidelines generally hold true for pycroscopy as well.

We envisioned people following such pipelines when dealing with similar data from different instruments:
  1. Translate - from .proprietary to .hdf5 
  2. Preprocess - perhaps one class would standardize data such that the end result looks the same regardless of the instrument for a given measurement modality. If the standardization were always performed and relatively fast, this could be integrated into the translation step.
  3. Any analysis - physical or statistical
This pipeline would map to separate sub-packages: `io` (translators), `processing`, and `analysis`. The whole idea of modularizing / separating things out was to encourage reuse of code.
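As a rough illustration of that separation (every function here is a hypothetical stand-in, not the pycroscopy API), the three stages might look like:

```python
import numpy as np


def translate(proprietary_blob):
    """Stage 1 stand-in: parse vendor bytes into a standardized data cube."""
    return np.asarray(proprietary_blob, dtype=float)


def preprocess(cube, bin_px=2):
    """Stage 2 stand-in: bin pixels so every instrument looks the same."""
    r, c, ch = cube.shape
    r2, c2 = r // bin_px * bin_px, c // bin_px * bin_px
    cube = cube[:r2, :c2]  # crop so the shape divides evenly
    return cube.reshape(r2 // bin_px, bin_px,
                        c2 // bin_px, bin_px, ch).sum(axis=(1, 3))


def analyze(cube):
    """Stage 3 stand-in: any physical or statistical analysis."""
    return cube.reshape(-1, cube.shape[-1]).mean(axis=0)  # mean spectrum
```

Each stage only depends on the output of the previous one, which is what lets the translators, preprocessing, and analyses live in separate sub-packages and still be chained freely.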

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The generalized answer to your second question is captured in this document (look for the subheading "Tool (analysis / processing)"). Briefly, the philosophy is to have your analysis / processing class write its results out to a new group each time you apply a process to the dataset. You can see what we mean by looking at our pycroscopy.processing.SVD, Decomposition, and Cluster classes.
Essentially, each time you run the computation, the class should write out a new group named `Source_Dataset-Process_Name_00x`, where 'x' is the counter for the Nth time that computation was performed. For example, you would write out the U, S, V datasets (and any necessary ancillary datasets) from SVD into this newly created "results" group.
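A toy version of that convention, using a plain dict in place of an HDF5 file so the naming logic stands alone (the helper itself is hypothetical, not pycroscopy's implementation):

```python
import numpy as np


def write_svd_results(h5_like, source_name, data):
    """Create a 'Raw_Data-SVD_00x' style group and stash U, S, V in it."""
    prefix = f"{source_name}-SVD_"
    n = sum(1 for key in h5_like if key.startswith(prefix))
    group = f"{prefix}{n:03d}"  # counter for the Nth run of this process
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    h5_like[group] = {"U": u, "S": s, "V": vt}
    return group
```

Running the same computation twice on the same source dataset then yields `Raw_Data-SVD_000` and `Raw_Data-SVD_001`, each holding its own complete set of results.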

Hope this helps. Let us know if you have any further questions. We would be happy to discuss integration strategies for your specific case once we have a better idea about your code - we could meet / do BlueJeans calls.

pari...@ornl.gov

Nov 9, 2018, 1:36:01 PM
to pycroscopy
Thanks! That will help. I'll work on refactoring my code and we'll think about meeting in a few weeks, perhaps.

MoniFram

Sep 10, 2019, 10:40:55 AM
to pycroscopy
Hi,
I am looking at getting started with pycroscopy for SEM / EDS data. Chad, are you planning to port your code into the project?
Thanks,

Chad Parish

Sep 10, 2019, 11:56:43 AM
to pycroscopy
Hi,

I have a Python library written that will read Bruker and Oxford EDS data, and could write an EDAX converter in an afternoon if I needed it. I can also perform the needed analyses: optimally scaled PCA/SVD, binning, factor rotation, independent component analysis, NNMF, K-means, etc.

However, this is not yet ported into Pycroscopy. I need to find a good undergraduate or graduate student who has the time to sit and code the Pycroscopy API plug-in. The API integration is all that remains; the analysis codes are all fully functional.

As soon as I can get a good student to sit and do the work, we'll be ready to go.

Chad