Geology: SEM-EDX quantification at reasonable processing times. GPU approach?


Marco Andres Acevedo Zamora

Mar 1, 2020, 11:29:11 AM
to hyperspy-users

Dear all,


My name is Marco, and I am doing a PhD in Geology at Trinity College Dublin (acev...@tcd.ie). It has been an intensive year and a half of progress, working with a large database and with MatLab data-management and image-analysis tools that I have newly developed for researchers working with big data from mapping experiments on rock samples. I know it is unusual to hear a geologist asking about deconvolution issues, but I have chosen this mailing list (referred by Joshua Taillon) because I am seeking to collaborate with computational microscopy efforts, and my work is open-source as well.


I was working with the proprietary software AZtec (Oxford Instruments), producing 'auto PhaseMaps' and 'QuantMaps' (element wt.%). Acquiring large areas of rock sections (cm²) is very computationally expensive: using a binning factor (b.f.) of 1 for the calculations would take about two weeks of offline processing per EDX map, which is impractical in a research laboratory. So I have been using b.f. = 4, which sharply decreases the spatial resolution. This approach has shortfalls, but it is the only option at present. Thus, if I want to scale up my work to 4 or 5 thin sections (samples), I need to increase computing power (through either hardware or software solutions). A few details, to follow up:

  • Obtaining a QuantMap at a binning factor of 1, which means deconvolving one pixel's composition from one pixel's spectrum representing a single interaction volume, requires high-performance computing. So far I have only achieved 24-hour processing at a binning factor of 4 (averaging 4² = 16 neighbouring pixels), and going down to b.f. = 1 (i.e. 4 μm/px) is estimated to take around two weeks to complete, which is not feasible (see the rough scaling estimate after this list).
  • Further justifying wt.% accuracy, confidence levels and detection limits would require insight into the AZtec (Oxford Instruments) data structures and patented algorithms, which work as a 'black box' for front-end users.
  • Additionally, proprietary software (in general) is designed to run on a single PC's central processing unit (CPU). This implies that if you want quicker results you need to upgrade the whole PC rather than individual parts (RAM, GPU), which is very expensive nowadays given the slowdown of Moore's law over the last 7 years (e.g. see any of NVIDIA CEO Jensen Huang's talks).
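To make the timing figures concrete, here is a rough back-of-the-envelope sketch (my own illustration: only the 24-hour figure and the binning factors come from the list above, and the proportionality between spectrum count and run time is an assumption):

# Rough scaling of QuantMap processing time with binning factor (b.f.),
# assuming run time is proportional to the number of spectra to be fitted.
hours_at_bf4 = 24                      # measured offline run at b.f. = 4
bf_measured, bf_target = 4, 1

# Binning by b averages b x b pixels into one spectrum, so the spectrum count
# (and, under the assumption above, the run time) scales with b**2.
scale = (bf_measured / bf_target) ** 2          # 16x more spectra at b.f. = 1
hours_at_bf1 = hours_at_bf4 * scale
print(f"~{hours_at_bf1:.0f} h, i.e. ~{hours_at_bf1 / 24:.0f} days at b.f. = 1")
# -> ~384 h, roughly two weeks, which matches the estimate above.

(For the reverse direction, HyperSpy's rebin method can apply this kind of navigation-axis binning to a loaded spectrum image, trading spatial resolution for speed in the same way the AZtec binning factor does.)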
In this context, I put the following questions to Joshua Taillon (Nov 30, 2019):
  • Should I recommend acquiring a high-performance PC to run AZtec (Oxford Instruments) in my laboratory? We are talking about roughly 10,000 EUR, arriving at the beginning of 2021.
  • Or should I spend research time figuring out how to use HyperSpy to do the same iterative calculations of pixel wt.%? Would this give me the option to do parallel computing and speed things up?

In his reply (Dec 12, 2019), he mentioned that the HyperSpy API can do quantification from a spectral image (http://hyperspy.org/hyperspy-doc/current/user_guide/eds.html), such as AZtec SmartMap raw data. Despite not being user-friendly (Python), HyperSpy is open-source and portable and can be run on any hardware, whether that is an expensive PC, a rented cluster on AWS, etc. In fact, some operations are parallelizable, and matrix operations in general can be fast, although HyperSpy is not specifically optimized to be a super-fast analysis tool. In addition, it enables analyses that would be very difficult (or impossible) in most vendors' software. Finally, if you are doing research (i.e. varying samples, processes, etc. rather than doing the same process every day), he was of the strong opinion that "doing it yourself" with respect to the analysis will give you more confidence in your results.
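For readers who have not used the HyperSpy EDS tools mentioned above, a minimal sketch of Cliff-Lorimer quantification of a spectrum image follows, based on the linked user guide. The filename, the element lines and the k-factor values are placeholders of mine, not taken from any actual SmartMap export:

import hyperspy.api as hs

# Load an exported spectrum image (placeholder filename). lazy=True keeps the
# data chunked on disk so that Dask can parallelize subsequent operations.
s = hs.load("smartmap_export.rpl", signal_type="EDS_SEM", lazy=True)

# X-ray lines to quantify (placeholder choice of elements).
s.add_lines(["Mg_Ka", "Si_Ka", "Ca_Ka", "Fe_Ka"])

# Background-subtracted line intensities, one map per line.
bw = s.estimate_background_windows(line_width=[5.0, 2.0])
intensities = s.get_lines_intensity(background_windows=bw)

# Cliff-Lorimer quantification. The k-factors (one per line, same order) would
# come from the vendor, DTSA-II or standards; these values are placeholders.
kfactors = [1.1, 1.0, 1.2, 1.5]
composition = s.quantification(intensities, method="CL", factors=kfactors,
                               composition_units="weight")
# 'composition' is a list of images, one wt.% map per X-ray line.

Whether the intensities come from window integration (as above) or from fitting a full EDS model pixel by pixel is a separate accuracy-versus-speed decision; the quantification call is the same in both cases.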


Next, thinking about the expensive option of continuing to work with AZtec and acquiring a high-performance PC, I needed PC benchmarking. Unfortunately, AZtec performance on QuantMaps has rarely been compared, and you can only trust the vendor's PC recommendations and semi-quantitative image-array outputs. The best approximation of which hardware would perform best comes from benchmarking with the automated mineralogy system TIMA 3 from TESCAN; in a personal communication, a member of their research group replied:
  • Tomáš Hrstka (January 16, 2020): "I did a lot of internal testing on speed and performance and there are a number of factors affecting that. TIMA definitely benefits from big RAM (128 GB or more). PCIe/M.2 SSDs with 3000 MB/s or more make a big difference compared to classical SSDs at 500 MB/s. Running the system on one such PCIe/M.2 SSD and the hot data on another is ideal. TIMA, unfortunately, does not take advantage of GPU computing. (I assume the GPU will be important for your other applications.) The faster the CPU you can get, the better. Many processes in TIMA are optimized for multithreading, but not all, so you want multiple cores but still decent single-core speed (e.g. Intel Core i9-9900KS @ 4.00 GHz). I can only assume that other EDS software will potentially behave similarly. If you have AZtec in your lab you can easily run some Task Manager performance tests to see if multithreading actually works or if the GPU is utilized during QuantMap..."
Hence, considering that pixel processing on a GPU would be cheaper and orders of magnitude faster than on a CPU, I have lately been learning more about HyperSpy methods and have done a comprehensive literature review. In brief, I am proposing the following steps (a code sketch of steps 6.- to 8.- follows the list):
  0.- Create the phase map in HyperSpy (e.g. https://pages.nist.gov/2019-06_CCEM_presentation/#/27).
  1.- Export the summed spectrum for one mineral mask to DTSA-II.
  2.- Perform a theoretical calculation and analytical simulation (under conditions equal to the real-life measurements on the MIRA3).
  3.- Export the k-factors used (is that possible?).
  4.- Parse the files for the k-factor matrices of the X-ray lines I need.
  5.- Build a list of those lines, ordered alphabetically.
  6.- Use a rather simple HyperSpy 'Cliff-Lorimer' quantification on the background-subtracted intensities.
  7.- Repeat for all pixels of the mask.
  8.- Join all the masks' pixels into one fully quantitative map (boundary pixels would keep NaN values from step 0.-).
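As a concrete illustration of steps 6.- to 8.-, here is a minimal sketch of per-mask Cliff-Lorimer quantification followed by NaN-based joining. It assumes the spectrum image and an integer phase-label map (step 0.-) already exist; the filenames, element lines and per-phase k-factors are placeholders, and the loop is written for clarity rather than speed:

import numpy as np
import hyperspy.api as hs

# Spectrum image and integer phase-label map from step 0.- (placeholder filenames).
s = hs.load("smartmap_export.rpl", signal_type="EDS_SEM")
phases = np.load("phase_map.npy")   # same shape as the navigation axes; 0 = boundary/unassigned

s.add_lines(["Mg_Ka", "Si_Ka", "Ca_Ka", "Fe_Ka"])
bw = s.estimate_background_windows(line_width=[5.0, 2.0])
intensities = s.get_lines_intensity(background_windows=bw)

# Per-phase k-factors parsed from the DTSA-II output (steps 3.- to 5.-); placeholder values.
kfactors_by_phase = {1: [1.1, 1.0, 1.2, 1.5],    # e.g. mask for mineral A
                     2: [1.1, 1.0, 1.3, 1.6]}    # e.g. mask for mineral B

n_lines = len(s.metadata.Sample.xray_lines)
wt_maps = np.full((n_lines,) + phases.shape, np.nan)   # NaN wherever no phase is assigned

for phase_id, kfactors in kfactors_by_phase.items():
    # Steps 6.- and 7.-: quantify every pixel with this phase's k-factors...
    comp = s.quantification(intensities, method="CL", factors=kfactors,
                            composition_units="weight")
    # ...and step 8.-: keep only the pixels that belong to this mask.
    mask = phases == phase_id
    for i, line_map in enumerate(comp):
        wt_maps[i][mask] = line_map.data[mask]

# wt_maps now holds one wt.% plane per X-ray line, with NaN at boundary pixels.

Quantifying the whole map once per phase and masking afterwards is wasteful; a production version would restrict the intensities to the mask first, but the joining logic of step 8.- stays the same.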
To sum up, geological applications using SEM-EDX maps require scaling up and speeding up offline processing to a quantitative level with pixel-wise information (in contrast with the grain-wise QEMSCAN, TIMA and MLA systems). Achieving this will have an enormous impact on the study of minerals, whether for their economic extraction (mineral processing) or for genetic interpretation, in the context of more reproducible and representative studies (whole slides rather than ROI areas). This must happen while EDX-SDD detectors are becoming 10x faster and gaining sensitivity within a few years.

As I said at the beginning, I have already automated the image analysis in MatLab; my results were presented at a recent conference at LTU, Luleå, Sweden. Yet I need to carry on with my PhD and start developing a virtual-microscope software platform (http://geologyslider.com/) that will host the maps, and there will not be enough time for everything. Thus, I would like to know whether you or your group see potential in my approach or could test it. These developments are mutually independent but could have synergy later on, so don't hesitate to contact me.

Thank you :).






Josh Taillon

Mar 1, 2020, 1:55:14 PM
to hypersp...@googlegroups.com
Hi Marco,

Thanks for your message to the HyperSpy User's group. I never received a reply from you to my original response a few months ago, so I'm glad to hear you're still considering using HyperSpy in your work. 

It's unclear to me from your message what you are asking of this user group. This list is primarily designed for software-specific discussion, rather than as a forum for general research and collaboration questions. You've presented a fairly thorough plan, but I'm afraid this list (in my opinion) is not the right venue to discuss it, and you'll find that most users are otherwise preoccupied with their own research projects. Most of us either use HyperSpy in our own work or develop features for and maintain it on a volunteer basis. Now, if you are interested in developing additional functionality for HyperSpy (such as GPU-accelerated model fitting or quantification), we would be more than happy to help you with how to suggest that these features be added to HyperSpy, and I would point you to the developer's guide for more detail on how to get started.

Finally, I would kindly ask that if you are going to quote someone in a public forum, you not mix your personal opinions with quoted text from an email conversation, as you have done in your recap of our email exchange. In my message to you on Dec. 3 (not Dec. 12, as you stated), I never wrote or meant to imply that HyperSpy was not "user-friendly," as you said; I stated that it would likely take you additional time to learn how to use HyperSpy because it is written in Python rather than in the Matlab environment you said you were most familiar with. In fact, I believe the exact opposite and consider it very user-friendly, since Python is one of the easier languages to learn, and HyperSpy provides a very comfortable and easy-to-use programming interface to complex high-dimensional datasets. It also allows you to obtain results you have strong confidence in, which is more difficult in other "black box" vendor software.

- Josh



--
Joshua Taillon

Marco Andres Acevedo Zamora

Mar 1, 2020, 3:36:26 PM
to hypersp...@googlegroups.com
Dear Joshua T.,

I have been busy writing reports and manuscripts, and I have now switched from working on HyperSpy to starting this pilot virtual microscope. After the literature review, it is clear that any HyperSpy implementation would be a PhD project in itself, not just one part of a PhD. So I submitted this post to get feedback.

I see your point. I will edit my post right now for clarification; I am sorry for that misinterpretation. From a geological standpoint, even Python is extremely complex: I am dealing with 99% 'black box' issues right now on behalf of all my postgrad colleagues and collaborators. So, in that sense, our population is not comparable to one with a Materials Science or Engineering background. That only makes your community more interesting for building bridges.

Once I am done with the virtual microscope, I will tackle the tile-image distortion problem in the SEM (time drifts; non-rigid transformation models are required). The proposed HyperSpy workaround fits at the end of this, when we want to do correlative microscopy or automatic mineral characterization from the images. So it is a third-level priority, but the data limitations will become progressively clearer as I go on and demonstrate these problems in Geology.

By the way, NIST is fantastic :).

Cordially,
Marco
  



--

Marco A. Acevedo Zamora

Tlf.: +353 85 744 5696

Email: acev...@tcd.ie

Department of Geology - Museum Building
Trinity College Dublin (TCD)
College Green
Dublin 2, Ireland



