Dear all,
My name is Marco, and I am doing a PhD in Geology at Trinity College Dublin (acev...@tcd.ie). It has been an intensive year and a half of progress working with a large database and with MatLab data-management and image-analysis tools that I have newly developed for researchers working with Big Data from mapping experiments on rock samples. I know it is unusual to hear a geologist asking about deconvolution issues, but I have chosen this mailing list (referred by Joshua Taillon) and I am seeking to collaborate with Computational Microscopy efforts, since my work is open-source as well.
I have been working with the proprietary software AZtec (Oxford Instruments), doing 'auto PhaseMaps' and 'QuantMaps' (element wt.%). Acquiring large areas of rock sections (cm²) is very computationally expensive: using a binning factor (b.f.) of 1 for the calculations would take 2 weeks of offline processing for a single EDX map, which is impractical in a research laboratory. So I have been using b.f. = 4 (sharply decreasing the spatial resolution). This approach has shortfalls, but it is the only option at the moment. Thus, if I want to scale up my work and do 4 or 5 thin sections (samples), I need to increase computing power (through either hardware or software solutions). A few details, to follow up:
- To obtain a QuantMap acquisition at a binning factor of 1, which means deconvolving one pixel's composition from one pixel's spectrum representing a single interaction volume, I need high-performance computing. So far, I have only achieved 24-hour processing at a binning factor of 4 (averaging 4² = 16 neighboring pixels), and going down to b.f. = 1 (i.e. 4 μm/px) is estimated to take around 2 weeks to complete (not feasible). A minimal sketch of this binning trade-off is included after this list.
- Further justifying wt.% accuracy, confidence levels and detection limits would require insight into the AZtec (Oxford Instruments) data structures and patented algorithms, which work as a 'black box' for front-end users.
- Additionally, proprietary software (in general) is designed to run on a single PC's central processing unit (CPU). This implies that if you want quicker results, you need to upgrade the whole PC rather than individual parts (RAM, GPU), which is very expensive nowadays considering the decline of Moore's law over the last 7 years (e.g. watch any of NVIDIA CEO Jensen Huang's talks).
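Coming back to the binning factor: as an aside, here is a minimal sketch of what that trade-off looks like in HyperSpy. The filename is hypothetical, and I am assuming a spectrum image with two navigation axes (x, y) and one signal (energy) axis:

    import hyperspy.api as hs

    # Load the raw spectrum image exported from AZtec (hypothetical filename).
    s = hs.load("smartmap_export.rpl")

    # Binning factor 4: combine 4 x 4 = 16 neighboring pixels in the two
    # navigation (x, y) axes, leaving the energy axis untouched. HyperSpy's
    # rebin sums the 16 spectra, which is what cuts the per-pixel workload
    # by ~16x at the cost of spatial resolution.
    s_b4 = s.rebin(scale=(4, 4, 1))

    print(s.axes_manager)     # original navigation dimensions
    print(s_b4.axes_manager)  # navigation dimensions reduced by 4 in x and y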
In this context, I asked Joshua Taillon the following questions (Nov 30, 2019):
- Should I recommend acquiring a high-performance PC to run AZtec (Oxford Instruments) in my laboratory? We are talking about roughly 10 thousand EUR, arriving at the beginning of 2021.
- Or should I spend research time figuring out how to use HyperSpy to do the same iterative calculations of pixel wt.%? Would this give me the option of parallel computing and a speed-up?
In his reply (Dec 12, 2019), he mentioned that the HyperSpy API can do quantification from a spectral image (http://hyperspy.org/hyperspy-doc/current/user_guide/eds.html), such as AZtec SmartMap raw data.
Despite not being as user-friendly (Python), HyperSpy is open-source and portable and can be run on any hardware, whether that is an expensive PC, a rented cluster on AWS, etc. In fact, some operations are parallelizable, and matrix operations in general can be fast, yet HyperSpy is not specifically optimized to be a super-fast analysis tool. In addition, it enables analyses that would be very difficult (or impossible) in most vendors' software. Finally, if you are doing research (i.e. varying samples, processes, etc. rather than doing the same process every day), he is of the strong opinion that "doing it yourself" with respect to the analysis will give you more confidence in your results.
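On the parallelization point, my understanding is that the speed-up would mainly come from HyperSpy's lazy (dask-backed) processing, where chunked operations can be spread across CPU cores (or a cluster). A minimal sketch, with a hypothetical filename:

    import hyperspy.api as hs

    # Open the spectrum image lazily: the data stay on disk as a dask array
    # and operations are evaluated chunk by chunk, in parallel, on demand.
    s = hs.load("smartmap_export.rpl", lazy=True)

    # Example of a chunked, parallel-friendly operation: the summed spectrum
    # of the whole map (useful later for per-phase summed spectra).
    total_spectrum = s.sum()
    total_spectrum.compute()  # triggers the actual (parallel) computation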
Next, considering the expensive option of continuing to work with AZtec and acquiring a high-performance PC, I needed PC benchmarking. Unfortunately, AZtec performance with QuantMaps has rarely been compared, and you can only trust the vendor's PC recommendations and semi-quantitative image array outputs. The best approximation of which hardware would perform best came from benchmarking with the TIMA 3 automated mineralogy system from TESCAN, in a personal communication with a member of their research group; here is his reply:
- Tomáš Hrstka (January 16, 2020): "I did a lot of internal testing on speed and performance and there are a number of factors affecting that. TIMA definitely benefits from big RAM (128GB or more). PCI/M.2 SSD disks with 3000MB/s or more make a big difference compared to classical SSDs at 500MB/s. Running the system on one such PCI/M.2 SSD and the hot data on another such PCI/M.2 SSD is ideal. TIMA, unfortunately, does not take advantage of GPU computing. (I assume the GPU will be important for your other applications.) The faster the CPU you can get, the better. Many processes in TIMA are optimized for multithreading but not all, so you want multiple cores, but still decent speed on a single core (e.g. Intel Core i9-9900KS @ 4.00GHz). I can only assume that other EDS software will potentially behave similarly. If you have AZtec in your lab you can easily run some Task Manager performance tests to see if multithreading actually works or if the GPU is utilized during QuantMap..."
Hence, considering that pixel processing on a GPU would be cheaper and orders of magnitude faster than on a CPU, I have lately learned more about the HyperSpy methods and done a comprehensive literature review. In brief, I am proposing the following steps:
- 0.- Create the phase map in HyperSpy (e.g.: https://pages.nist.gov/2019-06_CCEM_presentation/#/27).
- 1.- Export the summed spectrum for one mineral mask to DTSA-II.
- 2.- Perform a theoretical calculation and analytical simulation (same conditions as the real-life measurements in the MIRA3).
- 3.- Export the k-factors used (is that possible?).
- 4.- Parse the files for the k-factor matrices for the X-ray lines I need.
- 5.- Create a list of them ordered alphabetically.
- 6.- Use the rather simple HyperSpy 'Cliff-Lorimer' quantification on background-subtracted intensities (a minimal sketch is given after this list).
- 7.- Repeat for all the pixels of the mask.
- 8.- Join all the masks' pixels into one fully quantitative map (boundary pixels would be NaN values from step 0.-).
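For step 6.-, a minimal sketch of the HyperSpy calls I have in mind. The filename, element list and k-factor values are placeholders, and I am assuming the data can be loaded/cast as an EDS_TEM signal, since that is where Cliff-Lorimer quantification is exposed:

    import hyperspy.api as hs

    # Load the spectrum image as an EDS signal; 'CL' (Cliff-Lorimer)
    # quantification is implemented on the EDS_TEM signal type.
    s = hs.load("smartmap_export.rpl", signal_type="EDS_TEM")
    # (In practice the energy-axis calibration and microscope parameters
    # would also need to be set, e.g. via s.set_microscope_parameters.)

    # Placeholder element/line set -- in practice taken from the phase map (step 0.-).
    s.set_elements(["Al", "Ca", "Fe", "Mg", "Si"])
    s.set_lines(["Al_Ka", "Ca_Ka", "Fe_Ka", "Mg_Ka", "Si_Ka"])

    # Background-subtracted intensities, one intensity map per X-ray line.
    bw = s.estimate_background_windows(line_width=[5.0, 2.0])
    intensities = s.get_lines_intensity(background_windows=bw)

    # k-factors in the same (alphabetical) order as the X-ray lines above;
    # these numbers are placeholders -- in the proposed workflow they would be
    # parsed from the DTSA-II output (steps 3.- to 5.-).
    kfactors = [1.0, 1.1, 1.4, 1.2, 1.3]

    # Cliff-Lorimer quantification -> one wt.% map per element (step 6.-).
    wt_maps = s.quantification(intensities, method="CL",
                               factors=kfactors, composition_units="weight")

Steps 7.- and 8.- would then repeat this per mineral mask (boolean numpy arrays over the navigation axes) and stitch the resulting wt.% maps together, keeping NaN at the phase boundaries.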
To sum up, geological applications using SEM-EDX maps require scaling up and speeding up offline processing to a quantitative level with pixel-wise information (in contrast with the grain-wise QEMSCAN, TIMA and MLA systems). Achieving this would have an enormous impact on the study of minerals, whether for their economic extraction (Mineral Processing) or for genetic interpretation, in the context of more reproducible and representative studies (whole slides rather than ROI areas). This must happen in a context where EDX-SDD detectors are becoming 10x faster and gaining sensitivity within a few years.
As I said at the beginning, I have already automated the image analysis in MatLab; those results were presented at a recent conference at LTU, Luleå, Sweden. Yet I need to carry on with my PhD and start developing a Virtual Microscope software platform (http://geologyslider.com/) that will host the maps, and time will not be enough. Thus, I would be looking forward to knowing whether you or your group could find potential in my approach or could test it. These developments are mutually independent but could have synergy later on, so don't hesitate to contact me.
Thank you :).