Can MFA be used for time series data?

Kohkichi Hosoda

Aug 18, 2013, 6:53:09 AM

to factomin...@googlegroups.com

Hi FactoMineR users,

I am analyzing time series data consisting of a control group (5 subjects) and a treatment group (5 subjects). Each subject was observed at several time points (0h, 0.5h, 1h, 2h, 24h), and values for 80 metabolites were obtained at each one, so this is a repeated measures study. My data look like this:

        Condition  time  metabolite A  metabolite B  metabolite C
C1_30   control    0.5h  -0.04142747   -0.37265363   -0.11276739
C2_30   control    0.5h  -0.36124165   -1.17384736   -0.84574746
C3_30   control    0.5h   0.04387140    0.73808805    0.35003297
C4_30   control    0.5h   0.05652616   -0.80994402    0.03163576
C5_30   control    0.5h  -0.30130528   -0.06219186   -0.07025853
C2_60   control    1h    -0.08966251    0.21990856   -0.94293543
C3_60   control    1h     0.12005663    0.45336003   -0.56922725
C4_60   control    1h    -0.21664809   -0.68921863    1.01670941
C5_60   control    1h     0.44345835   -0.23815902   -0.33989998
C1_120  control    2h    -0.54637261   -0.82786901    0.37363236
C2_120  control    2h    -0.01997253    0.25259060   -0.74666215

For example, C2_30, C2_60, and C2_120 are the same subject, C2, measured at different time points.

My questions are as follows:

1. Can MFA be used for this kind of repeated measures analysis?

2. If so, how should my data table be reorganized for MFA, and how should I run the MFA?

My guess is:

                time0.5h                                       time1h                                         …..
    Condition   metabolite A  metabolite B  metabolite C  …   metabolite A  metabolite B  metabolite C  …
C1  control     -0.04142747   -0.37265363   -0.11276739   …   -0.08966251    0.21990856   -0.94293543   ...
C2  control     -0.36124165   -1.17384736   -0.84574746   ...
…….

and

MFA(mydata, group=c(80, 80, 80, 80), type=(rep("c", 4)), name.group=c("0.5h", "1h", "2h", "24h"), num.group.sup=1)

Each time point has the same 80 metabolites as variables. num.group.sup is meant to indicate the control/treatment grouping in the Condition column.

Is this right?

3. I have already applied Pareto scaling to my data and I do not want FactoMineR to do any scaling. I guess the data are not scaled by FactoMineR if type is "c" in the MFA call. Is this right?

4. I would like to focus on the change in metabolic profile and am not interested in between-subject variability. Therefore, each value of each metabolite was transformed into the change from the 0h time point, for example value at 0.5h / value at 0h. I would like to focus on within-subject change. Is this idea right, or should I use the raw values including the 0h time point?
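The wide layout guessed in question 2 (one row per subject, one block of metabolite columns per time point) can be built from the long table with base R. The following is only a minimal sketch on made-up values: the column names subject, condition, time, metA and metB are assumptions for illustration, not the real data.

```r
# Toy long-format data (made-up values): 3 subjects x 2 time points x 2 metabolites.
# Column names are hypothetical stand-ins for the real table.
long <- data.frame(
  subject   = rep(c("C1", "C2", "T1"), times = 2),
  condition = rep(c("control", "control", "treatment"), times = 2),
  time      = rep(c("0.5h", "1h"), each = 3),
  metA      = c(-0.04, -0.36, 0.12, -0.09, 0.22, 0.31),
  metB      = c(-0.37, -1.17, 0.45, 0.21, -0.69, 0.05)
)

# One row per subject, one set of metabolite columns per time point.
wide <- reshape(long,
                idvar     = c("subject", "condition"),
                timevar   = "time",
                direction = "wide",
                sep       = "_")

# Use subject IDs as row names and keep condition as a factor.
rownames(wide) <- as.character(wide$subject)
wide$subject   <- NULL
wide$condition <- factor(wide$condition)

# Make the column order explicit: condition first, then all metabolites
# of the first time point, then all metabolites of the second, and so on.
times <- c("0.5h", "1h")
mets  <- c("metA", "metB")
wide  <- wide[, c("condition", as.vector(outer(mets, times, paste, sep = "_")))]

print(wide)
```

Grouping the columns by time point matters because MFA's group argument counts consecutive columns.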

Kohkichi

François Husson

Aug 19, 2013, 3:37:26 AM

to factomin...@googlegroups.com

Hi,

1) You are right, MFA can be used for time series data (when the same set of individuals is described several times).

2) Yes, you have to organise your data set as you did. But the line of code should rather be:

MFA(mydata, group=c(1, 80, 80, 80, 80), type=c("n", rep("c", 4)), name.group=c("condition", "0.5h", "1h", "2h", "24h"), num.group.sup=1)

Indeed, you have to declare the 1st variable as a group of its own, containing the categorical variable named "condition", and this group is a supplementary one.

3) Yes, if you use "c", MFA does not scale the variables in each group.

4) If you had only one time point and wanted to perform a PCA, what would you choose? You have to make the same choice in MFA as in PCA. Perhaps you can try both choices and see which works best.
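The two choices in point 4 (change from 0h versus raw values) can be laid side by side as below. This is a hedged sketch on made-up numbers for a single metabolite; the matrix raw and its values are assumptions. Note that a ratio to baseline only makes sense on positive, unscaled values.

```r
# Made-up raw values for one metabolite: rows are subjects, columns are time points.
raw <- matrix(c(10, 12, 15,
                20, 18, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("C1", "C2"), c("0h", "0.5h", "1h")))

# Choice A: change relative to each subject's own 0h value (within-subject change).
# sweep() divides every row by that row's baseline.
fold <- sweep(raw[, -1, drop = FALSE], 1, raw[, "0h"], "/")

# Choice B: keep the raw values, with 0h as one more time point.
raw_all <- raw

print(fold)
print(raw_all)
```

Running the analysis on both versions and comparing the maps is one way to follow the advice to try both choices.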

When you have time series, you can use the argument chrono=TRUE in the plot.MFA function when you draw the partial points (the partial points will then be connected from one time to the next rather than to the mean point).
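Putting the corrected call and the chrono option together might look like the sketch below. It runs on simulated data with 5 metabolites per time point instead of 80, and the names mydata, group_sizes, group_types and group_names are illustrative assumptions. The FactoMineR calls are only run if the package is installed.

```r
set.seed(1)

# Simulated stand-in for the real table: 10 subjects, 4 time points,
# 5 metabolites per time point (the thread has 80).
n_subj <- 10
n_met  <- 5
times  <- c("0.5h", "1h", "2h", "24h")

X <- matrix(rnorm(n_subj * n_met * length(times)), nrow = n_subj)
# Columns are grouped by time point: met1_0.5h ... met5_0.5h, met1_1h, ...
colnames(X) <- as.vector(outer(paste0("met", 1:n_met), times, paste, sep = "_"))

mydata <- data.frame(
  condition = factor(rep(c("control", "treatment"), each = 5)),
  X,
  check.names = FALSE
)
rownames(mydata) <- c(paste0("C", 1:5), paste0("T", 1:5))

# First group: the single categorical column; then one group per time point.
group_sizes <- c(1, rep(n_met, length(times)))
group_types <- c("n", rep("c", length(times)))  # "c": continuous, not scaled
group_names <- c("condition", times)

if (requireNamespace("FactoMineR", quietly = TRUE)) {
  res <- FactoMineR::MFA(mydata,
                         group         = group_sizes,
                         type          = group_types,
                         name.group    = group_names,
                         num.group.sup = 1,      # condition is supplementary
                         graph         = FALSE)
  # Partial points of each subject joined in time order.
  plot(res, choix = "ind", partial = "all", chrono = TRUE)
}
```

The group sizes must add up to the number of columns in the data frame, which is an easy check to run before calling MFA.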

FH

Kohkichi Hosoda

Aug 19, 2013, 4:11:02 AM

to factomin...@googlegroups.com

Hi François,

Thank you very much for your quick response. Now I can be much more confident.

I will try to analyze my data according to your advice.

Kohkichi.

On Monday, August 19, 2013 at 16:37:26 UTC+9, François Husson wrote:

Kohkichi Hosoda

Aug 20, 2013, 9:53:56 AM

to factomin...@googlegroups.com

Hi,

I have two more basic questions.

1. Data for MFA should be scaled column-wise within each table. That is, the variables should not be scaled across all time points together but separately within the table for each time point. Is this right?

2. What about missing time points? For example, suppose C1, C2, C3, and C4 have the 0h, 1h, 2h, and 3h time points, C5 has 0h, 1h, and 3h (2h missing), C6 has 0h, 2h, and 3h (1h missing), and each time point has the same 80 variables. How should the data be handled? Can I run MFA on them as they are, or do I need some special manipulation?
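To make these two questions concrete, here is a small sketch on made-up values. It shows what the wide table looks like when one subject lacks a time point, and what column-wise Pareto scaling of that table does. The helper pareto() and all column names are assumptions for illustration. It does not settle whether MFA should be run on such a table.

```r
# Hypothetical helper: Pareto scaling centres each column and divides it
# by the square root of its standard deviation. NAs are ignored.
pareto <- function(x) {
  x <- as.matrix(x)
  centred <- sweep(x, 2, colMeans(x, na.rm = TRUE), "-")
  sweep(centred, 2, sqrt(apply(x, 2, sd, na.rm = TRUE)), "/")
}

# Toy long data (made-up values): subject C5 has no 2h measurement.
long <- data.frame(
  subject = c("C1", "C2", "C5", "C1", "C2"),
  time    = c("1h", "1h", "1h", "2h", "2h"),
  metA    = c(1, 2, 4, 3, 5),
  metB    = c(2, 4, 6, 1, 9)
)

wide <- reshape(long, idvar = "subject", timevar = "time",
                direction = "wide", sep = "_")
rownames(wide) <- as.character(wide$subject)
wide$subject   <- NULL

# The missing time point shows up as a block of NA in that subject's row.
print(wide)

# In the wide table each column belongs to exactly one time point, so
# column-wise scaling is scaling within each time point's table.
scaled <- pareto(wide)
print(round(scaled, 3))
```

For the missing block itself, the missMDA package has an imputeMFA function that may be worth looking into. Whether it suits this design is not established anywhere in this thread.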