Questions about interpretation of ICA

Joshua Taillon

Dec 30, 2014, 6:24:15 PM
to hypersp...@googlegroups.com

I'm hoping that another knowledgeable HyperSpy user can help me understand some PCA and ICA results that I have obtained. I am trying to more or less follow the procedure in this paper by the HyperSpy authors. My goal is to localize and quantify the segregation of some cations (La and Mn) at a grain boundary and to detect any differences in their fine structure.

I believe the results I have obtained are valid (there doesn't seem to be much residual between the model and the data, other than noise), but I am having some trouble interpreting them. Please forgive the length as I walk through what I've done.

Here is the survey image. It shows the boundary of three grains, and I have seen evidence of La and Mn at the boundary between them.


Also shown is an example of what an EELS core-loss spectrum looks like from the area between grains (this is the PCA-denoised spectrum). As you can see, there are O-K, Mn-L and La-M signals that are all fairly strong. I have done the PCA and BSS using the standard settings with the following commands:

si_EELS.decomposition(True)
si_EELS.plot_explained_variance_ratio()
si_EELS.blind_source_separation(6)
si_EELS_PCA = si_EELS.get_decomposition_model(6)

The output from these commands looks as follows:

The scree plot seems to show 6 important components, and those are shown here along with their loadings. I understand that the components can be negative, since their sign is arbitrary, so that much doesn't concern me. The signals also all appear to be mixed, which makes sense because no BSS has been done yet.


What confuses me more are the results of the ICA procedure. From my understanding, the components that are found should be physically relevant, and should be more or less orthogonal in the energy dimension. My confusion stems from signals like those seen in components 0 and 2 below. In these, the components are all positive, but they have "negative" edges. ICA component 2 appears to be a "non-interface" signal, since its loadings are strongest within the bulk of the grains. My question is about these negative edges: do they make physical sense, and could they be expected from an ICA analysis? There are no signals like this in the paper that I referenced above, and I am not sure what physical meaning they might have.

Likewise, none of the 6 components appears to represent a "pure background" signal. I get the sense that this is worrisome, because one of the components should be just the power-law background. ICA component 0 seems close, but has a significant signal at the O-K edge and a negative peak at Mn-L. I'm hoping that someone knowledgeable in this field may be able to help explain the results that I'm seeing.


Finally, I also have a general question about the ICA procedure. When I run the `si_EELS.blind_source_separation(6)` command, I often see different results. There are messages about different components being reversed, sometimes it complains that the analysis didn't converge (I think this only happens on the first run), and the loadings often look different from one run to the next. Is this expected, and is there any way to improve the repeatability?
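Part of the run-to-run difference (reversed signs, shuffled component order) can be removed after the fact by matching each run against a reference run. The following is only a minimal sketch using plain NumPy arrays of shape (n_components, n_channels); `align_components` is a hypothetical helper, not a HyperSpy function, and it does nothing about genuinely different solutions or non-convergence.

```python
import numpy as np


def align_components(reference, candidate):
    """Reorder and sign-flip the rows of `candidate` to best match `reference`.

    Both arrays must have shape (n_components, n_channels).
    Returns the aligned array and the row of `candidate` used for each
    reference component.
    """
    n = reference.shape[0]
    # Cross-correlation block: reference rows against candidate rows.
    corr = np.corrcoef(reference, candidate)[:n, n:]
    aligned = np.empty_like(candidate, dtype=float)
    order = []
    available = list(range(n))
    for i in range(n):
        # Greedy match: unused candidate row with the largest |correlation|.
        j = max(available, key=lambda k: abs(corr[i, k]))
        available.remove(j)
        # Undo an arbitrary sign reversal.
        sign = 1.0 if corr[i, j] >= 0 else -1.0
        aligned[i] = sign * candidate[j]
        order.append(j)
    return aligned, order
```

The matching is greedy, so with strongly correlated components it may pick a poor pairing; it is meant for comparing runs by eye, not as a fix for the instability itself.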

Thanks very much in advance!
- Josh 

Josh Taillon

Jan 5, 2015, 11:52:45 AM
to Michael Sarahan, hypersp...@googlegroups.com
Michael,

Thank you very much for your response. I see what you mean about different algorithms. I wrote a test script that tries all the different algorithms and makes them easier to compare (see https://github.com/hyperspy/hyperspy/pull/415). In this case, it seems that doing background subtraction first and then using the 'TDSEP' algorithm gets closest to "physical" signals (ones that look like real spectra), and allows the easiest interpretation. Might I add that it would be very useful to be able to save multiple PCA/BSS results in one data file, rather than one per file, especially when trying to compare. As it is, I'm just generating one .hdf5 file per ICA.

I'm going to try my script with different PCA techniques as well, to see what kind of differences they give.
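For anyone wanting to do a similar comparison, the loop can be sketched generically as below. This is not the script from the pull request; `compare_bss_algorithms` and `run_bss` are hypothetical names, and the caller supplies the function that actually runs one algorithm and returns whatever should be kept.

```python
def compare_bss_algorithms(run_bss, algorithms):
    """Call run_bss(name) for each algorithm name.

    Returns two dicts: results keyed by algorithm name, and error
    messages for the algorithms that raised an exception.
    """
    results = {}
    failures = {}
    for name in algorithms:
        try:
            results[name] = run_bss(name)
        except Exception as exc:  # e.g. algorithm unavailable or no convergence
            failures[name] = f"{type(exc).__name__}: {exc}"
    return results, failures
```

With HyperSpy, `run_bss` would wrap something like `si_EELS.blind_source_separation(6, algorithm=name)`, assuming the installed version accepts an `algorithm` keyword, and then copy out the factors and loadings before the next run overwrites them.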

Thanks very much,
Josh

On Tue Dec 30 2014 at 9:19:55 PM Michael Sarahan <msar...@gmail.com> wrote:
Hello Josh,

I have only experience, not so much knowledge to speak from.  Take this with a grain of salt.

1) Both PCA and ICA are linear techniques. With non-linear signals, such as the background, both PCA and ICA will get confused. I've seen both try to compensate for the background (among other things) by making negative peaks like you see here. This will especially be a problem for samples that are less uniform in thickness, or that vary widely in how much background they have for other physical reasons (not much mass in the boundary?). I have also seen negative peaks account for fine structure features and chemical shift. These are more apparent when the components are scaled and overlaid on top of one another.
2) There are many different BSS algorithms, and some work better than others on any given data set. Check out the HyperSpy documentation to see which ones it supports.
3) Though the number of components indicated by the scree plot generally works well as the number of components for BSS, you should not feel bound by it. Choose fewer components and see what happens. Unlike PCA, BSS isn't discarding components, but rather unmixing the total data into that number of sources.
4) PCA and ICA are completely qualitative and exploratory. Do not treat them as "physically meaningful" unto themselves. Rather, use them as guides that highlight signal features that you may want to study. Next, proceed with model-based approaches that are more grounded in physical theory as to what feature represents what phenomenon.
5) Repeatability, or the lack thereof, is an unfortunate property of BSS. I never discovered a solid way to force stable solutions. This is yet more reason to treat PCA/ICA as exploratory. Also, make sure you're saving the outputs of any such analysis, since you may not get back to exactly what you had.
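Point 1 can be illustrated with synthetic data. The sketch below is an assumption-laden toy, not HyperSpy's decomposition: it builds strictly positive power-law "backgrounds" whose exponent varies from pixel to pixel, and runs a plain NumPy SVD on them as a stand-in for PCA. The energy range and exponents are made up.

```python
import numpy as np

energy = np.linspace(400.0, 900.0, 256)   # eV, arbitrary synthetic axis
exponents = np.linspace(2.0, 4.0, 40)     # background slope varies per pixel

# Each row is a strictly positive power-law spectrum (E / E0) ** (-r).
spectra = np.array([(energy / energy[0]) ** (-r) for r in exponents])

# Plain SVD without mean-centring, standing in for PCA on raw counts.
u, s, vt = np.linalg.svd(spectra, full_matrices=False)
first, second = vt[0], vt[1]

# The first component is single-signed (up to the arbitrary overall sign).
first_single_signed = bool(np.all(first > 0) or np.all(first < 0))
# The second is orthogonal to the first, so it must have lobes of both signs.
second_mixed_sign = bool(second.min() < 0 < second.max())
# Several components are needed although there is only one physical "source".
significant = int(np.sum(s / s[0] > 1e-6))
```

Here a single non-linear effect (a changing slope) ends up spread over several linear components, and every component after the first contains negative regions even though no spectrum is ever negative. BSS then only re-mixes those components, so negative features can survive into the ICA results.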

Hope this helps.
Michael

Francisco

Jan 6, 2015, 5:39:25 AM
to hypersp...@googlegroups.com
Josh,

The HyperSpy mailing list is for discussing software issues, not scientific ones. Of course you couldn't have known this, so I have just added a welcome message to the mailing list to clarify this point. For scientific discussions we encourage you to contact the authors of the article directly.

Best wishes,

Francisco