First: since you're computing TE with the Gaussian estimator (setting aside for the moment whether you want it to recover the ground truth), you should expect it to give you essentially the same thing as the Granger causality in the page you initially point to, since TE with a linear Gaussian model is a scaled version of GC. If you print out your TE results for each pair, you will find they match what is shown in the left-hand part of Fig. 4.6 in the text you point to (I've verified this).
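For reference, the scaling is exact for jointly Gaussian variables (Barnett, Barrett and Seth, 2009): Granger causality is simply twice the transfer entropy, F_{X -> Y} = 2 T_{X -> Y}, so a Gaussian-estimator TE analysis ranks source-target pairs exactly as GC does.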
So, the estimators are doing what they've been asked to do here.
Next, you've always got to be careful about expecting TE to give you a precise match to known causality, because the two are simply not measuring the same thing (TE reports on the observed dynamics via a model, not on interventions). There's a lot written about that, e.g. see Leo's
latest. Here, though, as you say, we've got no hidden nodes etc., so our results should converge to the true structure as we get enough data points.
So, to answer your question about improving the results, I would suggest the following:
1. Switch to the nonlinear estimator ('cmi_estimator': 'JidtKraskovCMI'). You're currently running the linear Gaussian estimator on what is known to be a nonlinear interaction, so it will miss interactions in precisely the same way as that article shows GC does in comparison to nonlinear TE. The KSG estimator will take longer to run, but if you want the accurate result here you have no choice. You could try the discretised estimator as well; it will run faster and give you results akin to RTransferEntropy, but the KSG estimator is best of breed for nonlinear estimation. (All three settings from this list are pulled together in the sketch below.)
2. Allow more data points from the past of the target to be embedded, rather than just 1 (e.g. set 'max_lag_target': 5). The false positives that we see here (e.g. 2 -> 0) look to me like classic spurious reverse inferences due to inadequate embedding of the target, which Michael originally pointed out in
this 2011 paper.
This will be required whether you switch to the nonlinear estimator or not.
3. Set the parameter for a Theiler window, which avoids false positives arising from autocorrelation in the samples (non-independent samples otherwise bias the estimation). To do this, set e.g. 'theiler_t': 20; that should be enough. (I'm fairly sure this is what removed one of the false positives.)
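Here's a minimal sketch of how the three settings above slot into an IDTxl bivariate TE run. The random array, the source-lag values and the fdr flag are stand-ins/assumptions; substitute your own data and the source lags from your original run:

```python
import numpy as np
from idtxl.bivariate_te import BivariateTE
from idtxl.data import Data

# Stand-in for your coupled-map time series: 5 processes x 10000 samples.
my_array = np.random.randn(5, 10000)
data = Data(my_array, dim_order='ps')  # 'ps' = (processes, samples)

settings = {
    'cmi_estimator': 'JidtKraskovCMI',  # 1. nonlinear KSG estimator
    'max_lag_sources': 5,               # assumed; keep your original value
    'min_lag_sources': 1,               # assumed; keep your original value
    'max_lag_target': 5,                # 2. embed more of the target's past
    'theiler_t': 20,                    # 3. Theiler window vs. autocorrelation
}

network_analysis = BivariateTE()
results = network_analysis.analyse_network(settings=settings, data=data)
results.print_edge_list(weights='max_te_lag', fdr=True)
```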
When I run it with all of the above set, the results look much better: all of the causal links are picked up, and the only false positive is 1 -> 4 (or 2 -> 5 in the article's numbering). You can see that one coming through for RTransferEntropy with their discrete TE in Fig. 4.6b of the article as well. It comes through specifically because nodes 1 and 3 are driven in a very similar nonlinear fashion by node 0, so node 1 will contain information about node 3 (the true driver of node 4), hence the false positive.
If you want to remove redundancies like that, you would need to move to the full multivariate algorithm: a false positive that only comes through due to correlation with a stronger source (as with 1 -> 4 above, which appears because node 1 is correlated with node 3) will be conditioned out by that stronger source.
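In IDTxl that's just a change of analysis class; a sketch, reusing the settings and data from above:

```python
from idtxl.multivariate_te import MultivariateTE

# The multivariate algorithm conditions each candidate source on the
# sources already selected into the parent set, so a redundant link like
# 1 -> 4 is conditioned out by the stronger true source 3 -> 4.
network_analysis = MultivariateTE()
results = network_analysis.analyse_network(settings=settings, data=data)
results.print_edge_list(weights='max_te_lag', fdr=True)
```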
I've run this at my end with the multivariate algorithm plus the same additional settings as detailed above (KSG estimator, max_lag_target = 5, theiler_t = 20), and can confirm that we then get precisely the correct causal structure. Network image is attached.