Transfer Entropy and Decision Procedures

Martin Monperrus

Mar 15, 2021, 7:35:57 AM
to Java Information Dynamics Toolkit (JIDT) discussion
Hi Joe, all,

Thanks a lot for JIDT, it's an impressive piece of software.

We already get meaningful results and good inferred networks.

The next step is to synthesize decision procedures: given that JIDT identifies a causal effect between two processes X and Y, what action must be taken on X to influence Y?

For example, naive decision procedures could be (see the sketch below):
- a threshold on X, as in "if X_{n} > 10: do something"
- a condition on the derivative, as in "if X_{n} - X_{n-1} > 10: do something"
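
Purely to illustrate what I mean, here is how those two rules might look in code; all names here (x, doSomething, the threshold 10) are hypothetical placeholders, nothing JIDT-specific:

    // Illustrative only: the two naive decision procedures above,
    // applied at time step n of a series x. All names are placeholders.
    class NaiveDecisionProcedures {
        static void doSomething() { /* intervene on the system */ }

        static void decide(double[] x, int n) {
            if (x[n] > 10) {                      // rule 1: threshold on X_n
                doSomething();
            }
            if (n > 0 && x[n] - x[n - 1] > 10) {  // rule 2: condition on the derivative
                doSomething();
            }
        }
    }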

AFAIU, this topic is not discussed in your "An Introduction to Transfer Entropy", and I can't find anything on the Internet, perhaps because I'm searching with the wrong keywords.

What approaches would you recommend to synthesize decision procedures once you have identified transfer between two processes?

Thanks!

Best regards,

--Martin


Joseph Lizier

Mar 15, 2021, 7:43:27 AM
to JIDT Discussion list
Hi Martin,

Glad to hear that JIDT is running well for you!

What you're getting at is the big question of causal effect. This is quite distinct from questions of information attribution and prediction, which transfer entropy and other traditional information-theoretic measures address. Think of it like correlation vs. causation: I would be careful to avoid saying that the measures in JIDT identify a causal effect; rather, they identify an effective model to explain the dynamics as observed.

I would recommend that you dive into the literature on interventional probabilities, i.e. p(y | \hat{x}), where \hat{x} is a value I impose on x rather than observe (observing would give the ordinary p(y | x)). This area was pioneered by Judea Pearl, and it sounds like where you are headed. We compared transfer entropy to related measures of causal effect in our 2010 paper "Differentiating information transfer and causal effect" - http://dx.doi.org/10.1140/epjb/e2010-00034-5 - which has references into some of that literature.
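
To make the distinction concrete, the standard back-door adjustment from Pearl's framework (assuming a set of variables Z blocking all back-door paths from X to Y; this is textbook material, not something JIDT computes) shows exactly where observing and imposing differ:

    % Observational conditioning vs Pearl's interventional probability:
    % under observation, the confounders Z remain distributed as they
    % co-occur with x; under intervention, that dependence is severed.
    \begin{align}
      p(y \mid x)       &= \sum_{z} p(y \mid x, z)\, p(z \mid x) \\
      p(y \mid \hat{x}) &= \sum_{z} p(y \mid x, z)\, p(z)
    \end{align}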

I hope that helps and good luck!
--joe
+61 408 186 901 (Au mobile)




Martin Monperrus

Mar 15, 2021, 8:20:40 AM
to Java Information Dynamics Toolkit (JIDT) discussion
> We compared transfer entropy to related measures of causal effect in our 2010 paper "Differentiating information transfer and causal effect"

Thanks a lot for the pointer, I'll start from there!

--Martin

Martin Monperrus

Mar 17, 2021, 12:50:56 PM
to Java Information Dynamics Toolkit (JIDT) discussion
Hi Joe,

My reading on causality brings me to convergent cross mapping (CCM) [1].

CCM is not implemented in JIDT and does not seem to be discussed in this group.

How does CCM relate to JIDT, in your opinion?

Thanks!

--Martin

Joseph Lizier

Mar 17, 2021, 10:34:01 PM
to JIDT Discussion list
Hi Martin,

To be honest, since CCM is not information theoretic, it's basically out of scope for JIDT.

In terms of how it compares to info-theoretic tools: I haven't looked particularly closely at CCM, but my understanding is that it seeks to build a model of how the next value of the target time-series variable depends not only on its own (embedded) past but also on an embedding of the source.
This is similar to how Granger causality, or transfer entropy more generally, calculates the predictive effect of the source on the target. CCM looks to be only linear (?), and therefore more comparable to GC alone, but more importantly it doesn't seem to characterise the relationship with a single number; it outputs the model only. (I could be wrong on that.) Also, CCM looks to consider no time lag between the source and target (save for their embedded histories), and so may be more comparable to "directed information" than to GC/TE.
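
For reference, that "single number" from TE in JIDT looks like the following. This is a sketch along the lines of the Example 4 demo bundled with JIDT (KSG estimator); the toy coupled data and parameter choices are illustrative only:

    import infodynamics.measures.continuous.kraskov.TransferEntropyCalculatorKraskov;
    import java.util.Random;

    public class TeSingleNumberSketch {
        public static void main(String[] args) throws Exception {
            // Toy coupled pair: dest follows source with a one-step lag plus noise.
            Random rng = new Random(42);
            int n = 1000;
            double[] source = new double[n];
            double[] dest = new double[n];
            for (int t = 0; t < n; t++) {
                source[t] = rng.nextGaussian();
                dest[t] = (t > 0 ? 0.8 * source[t - 1] : 0.0) + 0.2 * rng.nextGaussian();
            }
            TransferEntropyCalculatorKraskov teCalc = new TransferEntropyCalculatorKraskov();
            teCalc.setProperty("k", "4"); // KSG estimator: 4 nearest neighbours
            teCalc.initialise(1);         // destination history length of 1
            teCalc.setObservations(source, dest);
            double teNats = teCalc.computeAverageLocalOfObservations();
            System.out.printf("TE(source -> dest) = %.3f nats%n", teNats);
        }
    }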

If anyone out there knows more about CCM, feel free to add and/or correct me.

Hope that helps,
--joe
+61 408 186 901 (Au mobile)


Martin Monperrus

Mar 18, 2021, 4:07:39 AM
to Java Information Dynamics Toolkit (JIDT) discussion
Thanks a lot Joe for your expert opinion. --Martin

Joshua Garland

Mar 18, 2021, 7:12:13 PM
to jidt-d...@googlegroups.com
Fundamentally, CCM is based on the idea that if you observe the same (deterministic dynamical) system with two continuous (generic) observation functions, and then embed each using delay-coordinate embedding, there is a lot of mathematical theory telling you that the two reconstructed embeddings, or models, will be diffeomorphic, i.e., there will be a differentiable bijective correspondence between them (plus a little more). Said more simply, the two models will be topologically identical and will have the same dynamical invariants as the original dynamical system. If the two embeddings are in fact from the same system (you didn't actually measure two different uncoupled ecosystems, for example), and you can predict one's future from the other's past, then that tells you something (maybe) about the causal relationship between the two. It at least tells you they are related and you didn't measure two uncoupled systems. FYI, I don't agree with this take on the diffeomorphism, but that is the idea. There are obviously more details here that I am brushing under the rug, but hopefully this helps some.
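
To make the reconstruction step concrete, here is a minimal sketch of delay-coordinate embedding only, not Sugihara et al.'s full CCM pipeline; the embedding dimension m and delay tau are whatever you choose:

    // Illustrative only: builds the delay vectors
    // (x[t+(m-1)*tau], x[t+(m-2)*tau], ..., x[t]) from a scalar series.
    public class DelayEmbeddingSketch {
        static double[][] embed(double[] x, int m, int tau) {
            int numVectors = x.length - (m - 1) * tau;
            double[][] vectors = new double[numVectors][m];
            for (int t = 0; t < numVectors; t++) {
                for (int j = 0; j < m; j++) {
                    vectors[t][j] = x[t + (m - 1 - j) * tau]; // most recent coordinate first
                }
            }
            return vectors;
        }
    }

CCM then cross-maps: it uses the nearest neighbours of one series' delay vectors (via their time indices) to estimate the other series, and takes prediction skill that converges as the library of vectors grows as evidence of coupling.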

Also, we talk a bit about the connection between CCM and info theory, in terms of predictability, in this paper: https://esajournals.onlinelibrary.wiley.com/doi/full/10.1002/ecm.1359. And if you end up going down the CCM route for some reason and want a method to select the DCE (delay-coordinate embedding) parameters optimally for forecasting, I suggest looking into active information storage; here is a paper that does exactly this: https://journals.aps.org/pre/abstract/10.1103/PhysRevE.93.022221. Additionally, Joe has a lot of great papers on active information storage. Furthermore, the JIDT documentation has an intro to info theory that talks quite a bit about AIS; it is really well written and easy to digest, honestly probably the best intro to many info-theory topics I have ever read, and I refer to it all the time. Best of luck.

Oh, and by the way, I believe CCM would be nonlinear, as the embeddings and diffeomorphisms are all nonlinear, as is the forecast function. (I could also be wrong, though.)

Joshua



--
Joshua Garland, Ph.D.
Applied Complexity Fellow, Santa Fe Institute
CEO and Founder, Complexity Analytics LLC

Martin Monperrus

Mar 19, 2021, 2:32:47 AM
to Java Information Dynamics Toolkit (JIDT) discussion
Thanks Joshua for the additional pointers. --Martin

Joseph Lizier

Mar 21, 2021, 8:21:43 PM
to JIDT Discussion list
Thanks for the detailed info Josh and the nice comments on the documentation! I didn't know you were on the list, great to have you :)

To add to that -- Josh's method of selecting embedding parameters using AIS (which could then be applied to CCM), referred to above, is implemented as our "MAX_CORR_AIS" option for the "AUTO_EMBED_METHOD" property of the AIS estimators (KSG and linear-Gaussian) in JIDT. See more details and a demo using this in slides 7-14 of the information storage lecture slides (and corresponding YouTube video) in the associated course materials. (The course is still in beta mode - available at http://lizier.me/joseph/software/jidt/course/ - but I haven't really publicly launched it here yet ....)
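
For anyone who wants to try it, a minimal usage sketch follows. The property names are as I recall them from the Javadocs, and the toy AR(1) data and search bounds are illustrative only, so do check the Javadocs against this:

    import infodynamics.measures.continuous.ActiveInfoStorageCalculator;
    import infodynamics.measures.continuous.kraskov.ActiveInfoStorageCalculatorKraskov;
    import java.util.Random;

    public class AutoEmbedSketch {
        public static void main(String[] args) throws Exception {
            // Toy AR(1) series, standing in for your data.
            Random rng = new Random(0);
            double[] series = new double[2000];
            for (int t = 1; t < series.length; t++) {
                series[t] = 0.7 * series[t - 1] + rng.nextGaussian();
            }
            ActiveInfoStorageCalculatorKraskov aisCalc = new ActiveInfoStorageCalculatorKraskov();
            aisCalc.setProperty("AUTO_EMBED_METHOD", "MAX_CORR_AIS"); // Josh's AIS criterion
            aisCalc.setProperty("AUTO_EMBED_K_SEARCH_MAX", "10");     // max history length to try
            aisCalc.setProperty("AUTO_EMBED_TAU_SEARCH_MAX", "5");    // max embedding delay to try
            aisCalc.initialise();
            aisCalc.setObservations(series);
            double ais = aisCalc.computeAverageLocalOfObservations();
            // Read back the embedding parameters the search selected:
            System.out.println("AIS = " + ais
                + ", k = " + aisCalc.getProperty(ActiveInfoStorageCalculator.K_PROP_NAME)
                + ", tau = " + aisCalc.getProperty(ActiveInfoStorageCalculator.TAU_PROP_NAME));
        }
    }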

--joe
+61 408 186 901 (Au mobile)

