Why does JidtDiscreteCMI take so much longer than JidtDiscreteTE?

Thomas Varley


Sep 11, 2021, 3:31:45 PM

to IDTxl

Why does the JidtDiscreteCMI estimator take so much longer than the JidtDiscreteTE estimator? I can get a value quickly from the TE estimator, but if I build the same bivariate TE out of CMI, it takes roughly 50x longer.

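from idtxl.data import Data
from idtxl.estimators_jidt import JidtDiscreteCMI, JidtDiscreteTE

# raster: (processes x samples) array of binary spike data, loaded earlier;
# i and j are the source and target process indices.
te_settings = {"history_target": 1,
               "history_source": 1,
               "source_target_delay": 1}

raster = raster[:, :3600000]         # keep the first 3.6 million samples
data = Data(raster, dim_order="ps")  # not used by the estimator calls below

te = JidtDiscreteTE(settings=te_settings).estimate(raster[i], raster[j])

# The same TE written as I(target future; source past | target past), delay 1
target_future = raster[j, 1:]
source_past = raster[i, :-1]
target_past = raster[j, :-1]

estimator = JidtDiscreteCMI()

cmi = estimator.estimate(var1=target_future,
                         var2=source_past,
                         conditional=target_past)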

te and cmi come out identical here, which checks out.
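For reference, the standard conditional-mutual-information form of transfer entropy is sketched below for history lengths k = l = 1 and a source-target delay u (the notation is assumed here, not taken from the IDTxl documentation):

```latex
T_{X \to Y}(u) = I\left(Y_t ; X_{t-u} \mid Y_{t-1}\right)
  = \sum_{y_t,\, x_{t-u},\, y_{t-1}} p(y_t, x_{t-u}, y_{t-1})
    \log_2 \frac{p(y_t \mid x_{t-u}, y_{t-1})}{p(y_t \mid y_{t-1})}
```

With u = 1 this is exactly the CMI built from target_future = raster[j,1:], source_past = raster[i,:-1] and target_past = raster[j,:-1], which is why te and cmi agree.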

I had been hoping to use the CMI estimator to build a DIY mTE estimator with analytic nulls, to save time on a very large spiking network, since it doesn't look like JidtDiscreteTE.estimate() takes conditionals.
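A DIY conditional TE of this kind can be sketched in plain NumPy by folding every conditioning process into a single conditional variable. This is a minimal sketch, assuming small-alphabet integer data (binary by default), history lengths of 1 and a source-target delay of 1; plugin_cmi, encode_joint and conditional_te are hypothetical helpers written for illustration, not IDTxl or JIDT functions, and no significance testing is included.

```python
import numpy as np


def plugin_cmi(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for 1-D discrete integer arrays.

    Uses I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), with each entropy
    computed from the empirical joint frequencies of the samples.
    """
    def entropy(*cols):
        # Count every distinct joint symbol (one row per sample).
        _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def encode_joint(columns, base=2):
    """Fold several discrete columns into one integer symbol per sample."""
    code = np.zeros(len(columns[0]), dtype=np.int64)
    for col in columns:
        code = code * base + np.asarray(col, dtype=np.int64)
    return code


def conditional_te(source, target, others, base=2):
    """Hypothetical helper: TE source -> target (k = l = 1, delay 1),
    conditioned on the pasts of the processes in `others`.

    Computes I(Y_t ; X_{t-1} | Y_{t-1}, Z_{t-1}) with the plug-in CMI above.
    """
    target_future = target[1:]
    source_past = source[:-1]
    # Conditioning set: the target's own past plus the past of every other process.
    cond = encode_joint([target[:-1]] + [other[:-1] for other in others], base=base)
    return plugin_cmi(target_future, source_past, cond)


# Usage: y copies x with a one-step lag, z is unrelated noise.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 100_000)
z = rng.integers(0, 2, 100_000)
y = np.concatenate([[0], x[:-1]])
te_xy = conditional_te(x, y, [z])
print(te_xy)  # close to 1 bit
```

Folding the conditioning set into one symbol grows its alphabet as base raised to the number of conditioning processes, so plug-in estimates need many samples once that set gets large.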

Thomas Varley


Sep 11, 2021, 3:39:39 PM

to IDTxl

Also, perhaps I am misreading the documentation, but for larger source_target_delay values the JidtDiscreteCMI and JidtDiscreteTE estimates diverge. For example, this code does not return the same value for cmi and te (although it did above, when the offsets were 1).

te_settings = {"history_target": 1,
               "history_source": 1,
               "source_target_delay": 3}

raster = raster[:, :3600000]
data = Data(raster, dim_order="ps")

te = JidtDiscreteTE(settings=te_settings).estimate(raster[i], raster[j])

target_future = raster[j, 3:]
source_past = raster[i, :-3]
target_past = raster[j, :-3]

estimator = JidtDiscreteCMI()

cmi = estimator.estimate(var1=target_future,
                         var2=source_past,
                         conditional=target_past)

Perhaps I am just misunderstanding the parameters.

Joseph Lizier


Sep 12, 2021, 7:54:11 PM

to IDTxl

Hi Thomas,

I did put some optimisations into the underlying JIDT TE estimator (e.g. updating the embedding rather than re-reading the whole lot), but they should only make a real difference for long history lengths, and certainly not a 50x difference for k=1.
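To make "updating the embedding" concrete, here is a small sketch of the general idea; it is not JIDT's actual code. It assumes the k-step history is encoded as a single base-b integer: each new sample is shifted in and the oldest digit dropped with a modulo, so each update costs O(1) instead of O(k). history_states_incremental and history_states_direct are hypothetical names.

```python
import numpy as np


def history_states_incremental(series, k, base=2):
    """Encode each past window series[t-k:t] as one integer, for t = k..N-1.

    The state is updated in O(1) per step: shift in the newest value and drop
    the oldest digit with a modulo, instead of re-reading all k values.
    """
    series = np.asarray(series, dtype=np.int64)
    if len(series) <= k:
        return np.empty(0, dtype=np.int64)
    modulus = base ** k
    state = 0
    # Prime the state with the first k values (most recent value = lowest digit).
    for v in series[:k]:
        state = state * base + int(v)
    states = [state]
    for t in range(k, len(series) - 1):
        state = (state * base + int(series[t])) % modulus
        states.append(state)
    return np.array(states, dtype=np.int64)


def history_states_direct(series, k, base=2):
    """Reference version: re-read all k values of every window."""
    series = np.asarray(series, dtype=np.int64)
    weights = base ** np.arange(k - 1, -1, -1)  # most recent value = lowest digit
    return np.array([int(series[t - k:t] @ weights) for t in range(k, len(series))],
                    dtype=np.int64)


rng = np.random.default_rng(1)
spikes = rng.integers(0, 2, 1_000)
print(np.array_equal(history_states_incremental(spikes, 5),
                     history_states_direct(spikes, 5)))  # True
```

For k=1 both versions do the same constant amount of work per sample, which is consistent with the optimisation not explaining a large gap at that history length.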

I've run something similar with 10^7 samples using JIDT directly, and there's no discernible performance difference.

Possibly the overhead is in the pre-parsing in IDTxl; I'll have to come back to that.

In any case, regarding conditionals, you have a couple of options depending on what you're trying to do. The underlying JIDT library does provide a ConditionalTransferEntropyCalculatorDiscrete estimator, but we haven't put a wrapper around it for IDTxl (you could use it directly, though). You can also use the JidtDiscreteTE estimator in IDTxl with a multivariate source, though I guess that doesn't help if you only want to compute a conditional value.

If you're trying to build the parent set, you can also turn on analytic nulls for the discrete estimators in IDTxl; you don't need to add other code for that. Indeed, I think they're on by default.
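A common basis for analytic nulls on discrete estimators is the asymptotic chi-squared distribution of the plug-in CMI. The sketch below assumes that standard result (under conditional independence, 2N times the CMI in nats is approximately chi-squared with (alph1 - 1)(alph2 - 1) * alphc degrees of freedom); the function names are hypothetical and this is not IDTxl's or JIDT's code. It uses only the standard library so it runs without SciPy.

```python
import math


def chi2_sf_integer_df(x, df):
    """P(chi^2_df > x) for a positive integer df, using only the standard library.

    Uses closed forms of the regularised upper incomplete gamma function
    Q(df/2, x/2); adequate for moderate df, not tuned for extreme values.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: Q(m, h) = exp(-h) * sum_{i=0}^{m-1} h^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1: start from Q(1/2, h) = erfc(sqrt(h)); each step
    # s -> s + 1 adds h^s * exp(-h) / Gamma(s + 1).
    total = math.erfc(math.sqrt(half))
    term = math.exp(-half) * math.sqrt(half) / math.gamma(1.5)
    for i in range(df // 2):
        total += term
        term *= half / (i + 1.5)
    return total


def analytic_cmi_pvalue(cmi_bits, n_samples, alph1, alph2, alphc):
    """p-value for a plug-in discrete CMI under the null of conditional independence.

    Assumes 2 * N * I (in nats) is approximately chi^2 with
    (alph1 - 1) * (alph2 - 1) * alphc degrees of freedom, where alphc is the
    number of joint states of the conditioning variable(s).
    """
    statistic = 2.0 * n_samples * cmi_bits * math.log(2)  # bits -> nats
    df = (alph1 - 1) * (alph2 - 1) * alphc
    return chi2_sf_integer_df(statistic, df)


# Usage: binary target, binary source, binary target past, 3.6 million samples.
print(analytic_cmi_pvalue(1e-6, 3_600_000, 2, 2, 2))
```

Because the test statistic scales with N, very small CMI values can come out significant at the 3.6-million-sample lengths used in this thread.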

Does that help? If not, maybe you can describe more directly what you're trying to do.

Re the use of larger delays: the target past should always be only one lag behind the target next, regardless of the source-target lag you use (we discuss the theory behind this here). So you'll need to adjust target_past to something like raster[j,2:-1] (though my Python's not so great, so think that through).
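The alignment rule can be captured in a small helper. This is a sketch with a hypothetical function name, assuming history lengths k = l = 1 and a delay of at least 1; for delay 3 it reproduces raster[j,2:-1] as the target past, and for delay 1 it reproduces the slicing from the first post.

```python
import numpy as np


def te_cmi_variables(source, target, delay):
    """Align variables so that TE(source -> target) with k = l = 1 and the given
    source-target delay equals I(target_future; source_past | target_past).

    For each time t in [delay, N-1]:
      target_future = target[t], source_past = source[t - delay],
      target_past = target[t - 1]  (always one step back, whatever the delay).
    """
    if delay < 1:
        raise ValueError("delay must be at least 1")
    source = np.asarray(source)
    target = np.asarray(target)
    n = len(target)
    target_future = target[delay:]
    source_past = source[:n - delay]
    target_past = target[delay - 1:n - 1]
    return target_future, source_past, target_past


# Usage with time indices in place of data, to show which samples line up.
steps = np.arange(10)
fut, src, past = te_cmi_variables(steps, steps, 3)
print(fut[0], src[0], past[0])  # 3 0 2
```

The three returned arrays can then be passed as var1, var2 and conditional to JidtDiscreteCMI().estimate(), as in the snippets above.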

Hope that helps,

--joe


Thomas Varley


Sep 13, 2021, 10:46:09 AM

to IDTxl

Hi Joe, thanks for the speedy reply.

I'll check out working with JIDT directly; I'm thinking it might be more efficient since there isn't the intermediate IDTxl layer to go through.

As for inferring the parent set, I would love to be able to (knowingly) toggle on the analytic null models for mTE, although I'm not sure they're on by default. There's nothing about them in the documentation for the multivariate_te module (https://pwollstadt.github.io/IDTxl/html/_modules/idtxl/multivariate_te.html; I did a Ctrl-F for "analytic" and "null"), and the MultivariateTE class seems to default to a brute-force null based on swapping different realizations around. I can switch it to a permute-in-time null, but there's nothing about an analytic null.
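For comparison with the analytic option, a permute-in-time surrogate test can be sketched in plain NumPy. This is a generic permutation test, not IDTxl's implementation: it assumes 1-D discrete integer arrays and history length 1, and shuffles only the source past; plugin_cmi and permutation_pvalue are hypothetical helpers.

```python
import numpy as np


def plugin_cmi(x, y, z):
    """Plug-in I(X; Y | Z) in bits for 1-D discrete integer arrays."""
    def entropy(*cols):
        _, counts = np.unique(np.stack(cols, axis=1), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def permutation_pvalue(target_future, source_past, target_past, n_perm=200, seed=0):
    """Time-shuffle surrogate test for I(target_future; source_past | target_past).

    Shuffling source_past in time keeps its marginal distribution but breaks its
    alignment with the target, approximating the null of no information transfer.
    """
    rng = np.random.default_rng(seed)
    observed = plugin_cmi(target_future, source_past, target_past)
    surrogates = np.array([
        plugin_cmi(target_future, rng.permutation(source_past), target_past)
        for _ in range(n_perm)
    ])
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (1 + np.sum(surrogates >= observed)) / (1 + n_perm)
    return observed, p_value


# Usage: y copies x with a one-step lag, so the transfer should be significant.
rng = np.random.default_rng(2)
x = rng.integers(0, 2, 5_000)
y = np.concatenate([[0], x[:-1]])
obs, p = permutation_pvalue(y[1:], x[:-1], y[:-1], n_perm=100)
print(obs, p)
```

Each surrogate requires a full re-estimate, which is why this kind of null is much more expensive than an analytic one on long recordings.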

All the best
~Thomas
