Hi all,
I have one specific and one general question regarding an analysis I am planning to do in JIDT and potentially IDTxl:
I would like to analyse tracking data of a locust swarm. We have a video of ~8 min at a frame rate of 25 Hz. Throughout the video there are a few thousand locusts, but each individual locust appears in the video for only ~40 s, i.e. ~1,000 frames.
The overlap of two locusts in the video is usually a few hundred frames, i.e. the joint time series have a few hundred valid samples.
Sometimes there are missing values, so I load the data via .setObservations(double[][] source, double[][] destination, boolean[] sourceValid, boolean[] destValid).
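To make the data layout concrete, here is a minimal sketch of how the jointly valid stretches of two aligned tracks could be found before handing them to an estimator. It assumes missing values are stored as NaN and that both tracks have already been cut to the same frame range; the helper name joint_valid_segments and the min_len parameter are hypothetical, not part of JIDT or IDTxl.

```python
import math


def joint_valid_segments(src, dst, min_len=1):
    """Return (start, end) pairs (end exclusive) of contiguous runs
    in which both series hold a non-NaN value."""
    if len(src) != len(dst):
        raise ValueError("series must be aligned to the same frame range")
    segments = []
    start = None  # start index of the run currently being tracked
    for i, (s, d) in enumerate(zip(src, dst)):
        valid = not (math.isnan(s) or math.isnan(d))
        if valid and start is None:
            start = i
        elif not valid and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(src) - start >= min_len:
        segments.append((start, len(src)))
    return segments


nan = float("nan")
src = [0.1, 0.2, nan, 0.4, 0.5, 0.6, 0.7]
dst = [1.0, 1.1, 1.2, 1.3, nan, 1.5, 1.6]
print(joint_valid_segments(src, dst))
print(joint_valid_segments(src, dst, min_len=2))
```

Each returned segment could then be passed on separately (for example through JIDT's startAddObservations() / addObservations() / finaliseAddObservations() sequence) instead of relying on the validity arrays.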
The first and specific question is the following:
I run into different kinds of trouble when trying to use AutoEmbedding. I saw in a previous post that data with missing values do not work with AutoEmbedding. Is this still the case? If yes, what would you suggest for handling such data?
The problems below only occur when I set K_SEARCH_MAX above a certain value (the threshold depends on the estimator and the type of auto embedding):
For a MultiVariateGaussian estimator and MAX_CORR_AIS_DEST_ONLY I get an error in MatrixUtils.CholeskyDecomposition(): java.lang.Exception: CholeskyDecomposition is only performed on symmetric matrices.
For Gaussian and Kraskov estimators and both types of auto embedding I sometimes get a NegativeArraySizeException: -1 in MatrixUtils.makeDelayEmbeddingVector() while setting the observations and sometimes an ArrayIndexOutOfBoundsException during computation of the TE.
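A possible, unconfirmed explanation for the size-related exceptions is that some contiguous valid segments become shorter than the embedding requires once K_SEARCH_MAX grows. The sketch below is only a rough sanity check under that assumption: it counts how many transfer entropy samples a segment of n frames yields for a destination embedding of dimension k with delay tau, assuming a source embedding of length 1 and a source-destination delay of 1. The function name usable_samples is hypothetical and this is not how JIDT computes it internally.

```python
def usable_samples(n, k, tau=1):
    """Number of (past embedding, next value) samples in a segment.

    The embedding at time t uses x[t], x[t - tau], ..., x[t - (k - 1) * tau],
    and the predicted value is x[t + 1]. So t runs from (k - 1) * tau
    to n - 2, which gives n - 1 - (k - 1) * tau samples.
    A result <= 0 means the segment is too short for this embedding.
    """
    return n - 1 - (k - 1) * tau


def too_short(segment_lengths, k, tau=1):
    """Return the segment lengths that yield no usable sample."""
    return [n for n in segment_lengths if usable_samples(n, k, tau) <= 0]


lengths = [300, 12, 3]  # example lengths of contiguous valid segments
for k in (2, 5, 15):
    print(k, [usable_samples(n, k) for n in lengths], too_short(lengths, k))
```

If short segments turn out to be the cause, dropping segments below the minimum length implied by the largest k in the search range would be one way to test it.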
Second, more general question:
Is it a problem that the overlap differs for every pair of time series and that each individual overlap contains only a few hundred samples? I would assume that if we can justify averaging over different individuals (as Crosato et al., 2018 did), we would have enough samples, but I am not entirely sure about that.
I would also like to infer the effective network with IDTxl and I'm currently waiting to get accepted to the IDTxl Google Group. Maybe you can already comment briefly on whether you think network inference would be feasible at all with IDTxl. Some things I could not find out:
- Can IDTxl handle data with missing values?
- What's the proper way to deal with these fragmented time series, where the whole video has 9,000 frames but each animal only appears in ~1,000 frames and has different overlaps with all its neighbours?
- Is it computationally feasible to run IDTxl on a network of >1,000 nodes? All papers I have seen only performed effective network inference on very small networks.
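To give a sense of scale for the last two points, here is a small sketch that counts which pairs of animals share enough frames to be candidate links at all. It assumes each animal is described by an inclusive (first_frame, last_frame) interval; the helper names and the min_overlap threshold are hypothetical and the intervals are made-up examples, not our data.

```python
from itertools import combinations


def overlap(a, b):
    """Number of frames shared by two inclusive frame intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def candidate_pairs(intervals, min_overlap):
    """Return (i, j, n_frames) for unordered pairs sharing >= min_overlap frames."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(intervals), 2):
        n = overlap(a, b)
        if n >= min_overlap:
            pairs.append((i, j, n))
    return pairs


# Four made-up animals, each visible for 1,000 frames.
intervals = [(0, 999), (700, 1699), (1500, 2499), (5000, 5999)]
pairs = candidate_pairs(intervals, min_overlap=200)

n_animals = len(intervals)
all_directed = n_animals * (n_animals - 1)  # every ordered source-target pair
candidate_directed = 2 * len(pairs)         # both directions of each overlapping pair
print(pairs, all_directed, candidate_directed)
```

With 1,000 animals the same count gives 1,000 * 999 = 999,000 ordered pairs before any overlap filter, which is why the overlap structure might matter a lot for feasibility.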
Thanks a lot in advance - any help will be much appreciated!
Lukas