Meaning of "inf" for selected_sources_te and omnibus_te


Valentin RINEAU

Feb 17, 2021, 5:19:12 AM
to IDTxl
Hi everyone,
First of all, thank you for this great software!

Sometimes, I have "inf" instead of a float for selected_sources_te and for omnibus_te.
What does it mean? And why is there never an "inf" in "te" (example below)?

sources_tested : [0, 2, 3, 4, 5]
current_value : (1, 5)
selected_vars_target : [(1, 1)]
selected_vars_sources : [(0, 4), (0, 5), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (5, 0), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5)]
selected_sources_pval : [0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002
 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002
 0.002 0.002]
selected_sources_te : [inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf
 inf inf inf inf inf inf inf inf]
omnibus_te : inf
omnibus_pval : 0.002
omnibus_sign : True
te : [0.00995587 0.12725192 0.05193862 0.15181107 0.06241195]

p.wol...@gmail.com

Feb 22, 2021, 3:56:34 AM
to IDTxl
Hi Valentin,

could you provide some background on the analyses you are running and the data you are using? Without more information it's hard to tell what is going on.

Thanks,
Patricia

Valentin RINEAU

Feb 23, 2021, 7:24:51 AM
to IDTxl
I am trying to analyse time series of the diversity of several fossil groups during the Phanerozoic, along with extinction rates, origination rates, and time series of several environmental variables (temperature, carbon isotope, etc.), in order to detect causal relationships. All these time series have been detrended by differencing, standardised and normalised, so that they are stationary.
I wonder if it could be because of the very short time series (20 points for the shortest ones)?
I'm using the Gaussian estimator for multivariate TE.
Please tell me if you need any other information.

p.wol...@gmail.com

Mar 2, 2021, 4:57:02 AM
to IDTxl
One thing that comes to mind is that, by default, IDTxl standardizes the data when you create a Data instance from it. If your data is already normalized/standardized, try the following:

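import numpy as np
from idtxl.data import Data
# Toy data: 1000 samples x 10 processes x 5 replications ('spr')
d = np.arange(50000).reshape((1000, 10, 5))
# normalise=False skips the default standardization
dat = Data(d, dim_order='spr', normalise=False)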
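
Before switching normalise off, it may be worth checking that the array really is standardized already. The following is a minimal sketch using only NumPy. The helper name is_standardised is hypothetical (it is not part of IDTxl), and checking each process and replication separately along the sample axis is an assumption of this sketch, not a description of what IDTxl does internally.

```python
import numpy as np

def is_standardised(data, sample_axis=0, tol=1e-6):
    """Return True if every series has mean ~0 and std ~1 along sample_axis."""
    mean = data.mean(axis=sample_axis)
    std = data.std(axis=sample_axis)
    return bool(np.allclose(mean, 0.0, atol=tol) and np.allclose(std, 1.0, atol=tol))

rng = np.random.default_rng(0)
# Synthetic data laid out as samples x processes x replications ('spr')
raw = rng.normal(loc=5.0, scale=3.0, size=(1000, 10, 5))
# z-score each series along the sample axis
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)

print(is_standardised(raw), is_standardised(z))
```

If the check passes for your own array, passing normalise=False as above should leave the values untouched.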

And just as a comment, 20 samples is very little data and probably too short for TE estimation. I am not sure you will get away with such a small number of samples with any of the estimators in the toolbox. Could you maybe exclude these very short time series from the analysis?
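
To illustrate why so few samples are a problem for a covariance-based estimate, here is a small NumPy sketch. It relies on a general fact: Gaussian (conditional) mutual information is computed from log-determinants of covariance matrices, and a sample covariance of more variables than samples is rank-deficient, so those determinants collapse towards zero. Whether this is what produces the "inf" values in IDTxl is an assumption and has not been confirmed in this thread. The function name sample_cov_rank is hypothetical, and the count of 28 variables is taken from the example output in the first message.

```python
import numpy as np

def sample_cov_rank(n_samples, n_vars, seed=0):
    """Rank of the sample covariance of n_vars independent Gaussian series."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n_samples, n_vars))  # rows = samples, columns = variables
    cov = np.cov(x, rowvar=False)             # (n_vars, n_vars) covariance matrix
    return int(np.linalg.matrix_rank(cov))

# 26 selected source variables + 1 target variable + the current value
n_vars = 28
rank_short = sample_cov_rank(20, n_vars)    # fewer samples than variables
rank_long = sample_cov_rank(1000, n_vars)   # many more samples than variables
print(rank_short, rank_long)
```

With 20 samples the covariance of 28 variables can have rank at most 19 in exact arithmetic, so it is singular, whereas with 1000 samples it is full rank. A singular covariance makes the log-determinant terms ill-defined, which is one plausible route to infinite estimates.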

Valentin RINEAU

Mar 3, 2021, 3:05:10 AM
to IDTxl
Indeed, my datasets are already normalized. I ran the analyses with normalise=False, but the inf values are still there.
And yes, my smallest datasets don't show anything significant when adding FDR correction, so they will subsequently be removed. Do you have a general idea of the minimal number of samples needed for TE estimation?

p.wol...@gmail.com

Mar 4, 2021, 10:10:10 AM
to IDTxl
It is difficult to give a general recommendation on how many samples one needs. This largely depends on the strength of the signal and of the dependencies, and also on the quality of the recording. I have mainly worked with neuroscience data, where we usually try to have several hundred, but preferably thousands, of samples per time series. What is maybe important to know is that, in general, we are rather confident that the algorithm does not return false positives if there is not enough data, and that it will just not return any results if data are too sparse (see the evaluation paper by Leo Novelli). So, I would expect that you don't find significant relationships for time series with too few samples.