Is KSG estimator deterministic?


Hiroshi Ashikaga

Oct 21, 2015, 3:04:14 PM
to jidt-d...@googlegroups.com
Hi, Joe

Thank you again for providing JIDT. It’s a wonderful tool and I love it! I have a quick question that I hope you can help me with.

Is your Kraskov-Stögbauer-Grassberger (KSG) estimator algorithm deterministic? I noticed that I got slightly different answers every time I ran it on the same data set. So I repeated the same calculation 10,000 times on a synthetic time series (5,000 time points) and created histogram plots of the KSG mutual information and KSG transfer entropy (attached). For the transfer entropy, the embedding parameters are fixed at k = l = 2.

Based on the histograms, the calculation is clearly not deterministic. The variability is mainly on the order of 10^-3, which has little impact on the mutual information, since its absolute value is on the order of 10^0 (6 in this case). In contrast, variability of this magnitude is a critical issue for the transfer entropy, since its absolute value is of the same order (10^-3).

I thought the KSG estimator was deterministic; am I wrong? Where does this variability come from? Is there anything I can do to make the estimate more precise, or would you recommend repeating the same calculation many times (as in the attached samples) and taking the median as a representative "true" estimate?

Thank you!

Hiroshi Ashikaga, MD, PhD
Johns Hopkins University School of Medicine
Attachments: MI.png, TE.png

Joseph Lizier

Oct 21, 2015, 6:32:54 PM
to JIDT Discussion list
Dear Hiroshi,

Thanks very much for the detailed report (how good would it be if all bug reports came with graphs!), and your nice compliments about the toolkit.

The answer is that the implementation of the estimator is not deterministic.
The main part of the algorithm itself is deterministic. However, Kraskov et al. recommend adding a small amount of noise to the data (on the order of 1e-8) to ensure that numerical rounding has not led to two data points having precisely the same value on any given dimension. This is important because the underlying assumptions of the algorithm include that the data is well spread throughout the space.

You can control the amount of noise we add (or turn it off) via the properties MutualInfoCalculatorMultiVariateKraskov.PROP_ADD_NOISE and ConditionalMutualInfoCalculatorMultiVariateKraskov.PROP_ADD_NOISE (the latter for transfer entropy) -- the text value of both properties is "NOISE_LEVEL_TO_ADD". The description of this property from the setProperty() methods is:

"a standard deviation for an amount of random Gaussian noise to add to each variable, to avoid having neighbourhoods with artificially large counts. (We also accept "false" to indicate "0".) The amount is added in after any normalisation, so can be considered as a number of standard deviations of the data. (Recommended by Kraskov. MILCA uses 1e-8; but adds in a random amount of noise in [0,noiseLevel) ). Default 1e-8 to match the noise order in MILCA toolkit."

So as you can see, you can set it to "false" or "0" to turn the noise addition off. Be careful to only do this if you are sure that all your data values are distinct (to machine precision).

In my (subjective) opinion, the amount of variability introduced by the noise addition is generally below the variability of the estimation itself, and so is not worth worrying about. (Recall that your MI/TE estimate is made from a finite amount of data anyway, so it will include some variability -- e.g. if you regenerate your 5000-point time series and compute the MI/TE again, even with the noise addition turned off, you will get a spread of answers.) As you say, when you have a significant MI or TE, the variability is well below the order of magnitude of the calculation.

I would still say not to worry when your TE is of a similar order -- this is simply because your TE then seems to be within the noise floor anyway, and the noise addition in the KSG estimator is only introducing fluctuations within this. Recall that two time series that are unrelated will still produce a non-zero TE, due to the finite sample size; there is a distribution of values you obtain here that is close to zero, and it is this null distribution that I'm calling the "noise floor". You can investigate it by calling the computeSignificance() method.

If this worries you, you can of course call the estimator multiple times and average, but as I say, having multiple realisations of your time series would be more important than this anyway.

I hope that helps -- let me know if you need any further clarification.

--joe
+61 408 186 901 (Au mobile)



Martínez Mediano, Pedro A

Oct 21, 2015, 7:42:21 PM
to jidt-d...@googlegroups.com
Hi Hiroshi,

That's a very good question. As you say, the "vanilla" KSG algorithm is deterministic, since both the neighbour counting and the digamma function are deterministic. However, the plain KSG algorithm is fairly unstable: under certain conditions you can get into serious numerical trouble, particularly if you have repeated data points.

For this reason, since version 1.3 JIDT by default adds some random jitter to the data points to avoid these numerical issues. By default JIDT normalises the data to zero mean and unit variance and then adds white Gaussian noise with standard deviation 10^-8, which is the value recommended by Kraskov et al. in their original paper. At the expense of a small variability in the results, this makes the algorithm much more numerically robust.

You can explicitly turn off or adjust the noise via setProperty(), using the property "NOISE_LEVEL_TO_ADD", which represents the standard deviation of the white noise added to the normalised data. If you set it to 0, you should always get the same result on the same dataset.

And as you say, all of this is OK for your MI calculation, but there's an issue with your TE values. Since you're calculating TE from synthetic data, can you calculate the true TE? It might be that the true TE is too small to be accurately estimated from a time series of that size. Typically, noise of 10^-8 really shouldn't affect the estimation much (unless you have repeated values).

I've put up a small IPython notebook with an example of the MI calculation between two weakly correlated Gaussians; please find it attached. The results aren't clear-cut, but removing the noise from the KSG estimator doesn't seem to bring the estimate significantly closer to the true values, probably because at that level of interaction between the variables the noise involved in generating the data is larger than the noise introduced by the KSG estimator.

Cheers,
Pedro


Attachment: WeakCorrelationMIExample.ipynb

Joseph Lizier

Oct 22, 2015, 6:44:58 AM
to JIDT Discussion list
Thanks Pedro, glad to see we're on the same page :)

--joe
+61 408 186 901 (Au mobile)


Hiroshi Ashikaga

Oct 22, 2015, 12:10:34 PM
to jidt-d...@googlegroups.com
Hi, Joe and Pedro

Thanks so much for your input. I am glad I was not seeing something I was not supposed to see.

I followed your instructions, removed the added noise, and repeated the same 10,000 calculations for both mutual information and transfer entropy. The results are attached. The calculation is clearly reproducible, giving 10,000 identical values for both mutual information and transfer entropy. I also ran another round of 10,000 calculations after running "clear all" on the workspace variables (I am using MATLAB) and still got the same result as the previous round.

Now that the variability of the results is gone, the question remains whether these values represent the "true" values or not. For example, the mutual information and transfer entropy calculated above with no added noise are neither the mean nor the median of the histograms calculated with the default added noise (the plots in my previous email). Or is this question meaningless, since it is impossible to calculate the "true" values of mutual information and transfer entropy from a relatively short time series of 5,000 sample points?

Thanks!

Hiroshi

Hiroshi Ashikaga, MD, PhD
Johns Hopkins University School of Medicine

Attachments: MI_no_added_noise.png, TE_no_added_noise.png

Martínez Mediano, Pedro A

Oct 22, 2015, 3:03:05 PM
to jidt-d...@googlegroups.com
Hi Hiroshi,

Good to see that it worked as expected!

You're right: you can never "know" that you've hit the "true" value of the MI/TE you're estimating. But this is not specific to KSG; it happens with all statistical estimators. You can never know, for example, that the sample mean of a set of Gaussian random numbers has hit the true mean.

One thing you could try is to generate many time series, calculate the MI/TE for each of them, and plot a similar histogram. I suspect you will still see noticeable variability in the results (as long as the process you're using to generate the data is stochastic). If you want more insight into how sample size affects this variability, you could repeat the same procedure for longer time series; if you plot these histograms, I would expect the variance of the estimates to decrease as the sample size gets larger.

Note that this problem is not unique to your case. When it comes to time series, how long is long enough? Knowing how many samples you need for a reliable estimation is a serious issue, particularly in experimental contexts. Some methods have been proposed, but it remains a non-trivial problem in general.

Cheers,
Pedro



Hiroshi Ashikaga

Oct 22, 2015, 8:53:37 PM
to jidt-d...@googlegroups.com
Hi, Pedro

Great. Thank you for the thoughtful answer. Very helpful.

Hiroshi

Hiroshi Ashikaga, MD, PhD
Johns Hopkins University School of Medicine
