Re: [SPM] Longitudinal image analysis using Sandwich estimation with small numbers


Thomas Nichols

May 22, 2018, 2:47:22 PM
to Fiford, Cassy, S...@jiscmail.ac.uk, swe-t...@googlegroups.com
Dear Cassy,

Sorry I missed your earlier email.  For SwE help also check out the Google Group for the SwE Toolbox (there isn't so much there, but I'll see the message quicker) https://groups.google.com/forum/#!forum/swe-toolbox.

This is a question for those of you who use the Sandwich Estimator (SwE), to model longitudinally registered images. Or any general imaging stats experts who are familiar with SwE, and marginal models.

 

I originally posted this last week but I have not received any responses yet.

 

My model is looking at change in voxel volume predicted by diagnostic group (of which there are 5), presence of a cerebral microbleed (MB) (binary covariate interacted with diagnostic group) and other covariates (see design matrix fig 1).

 

The issue I am having is that one of my groups (SMC) has a very low number of individuals with a microbleed (only 6 out of 70). Therefore, the contrast investigating MB associations in this group looks very odd, with huge z scores (see fig 2 attached).


This is strange.  Can you send me the SwE.mat file, e.g. via this upload service?  I'll try to see what is the cause.  It could be an interaction of the peculiarities of the groups and the covariates and the method used to compute the StdError and/or the small sample size (eDF) correction.
 

I have found results for the effect of MB on atrophy rates in other groups in the same model. My question is whether these results are invalidated by the strange effects in SMC. As there is an interaction, the issue is only in the SMC group; would the fact that the model has not coped well with small numbers in this group interfere with the statistical relationships of covariates on atrophy rates in any other group?


*If* you have a completely separable model, i.e. every effect is essentially split by the 5 groups, then, no, strangeness in one group should not propagate.  But I see that you have at least "agetime" that is common... what is that?  How is it different from the *time variables?
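
The separability point can be checked numerically on a toy design. The following is only a sketch with made-up group sizes and a made-up shared covariate (none of these names or values come from the SwE.mat discussed here): when every column is group-specific, the cross-group entries of inv(X'*X) vanish, and adding one column common to both groups makes them non-zero.

```matlab
% Toy check: do group-specific columns decouple the groups? (illustrative sizes)
n1 = 10; n2 = 6;                                  % hypothetical group sizes
g1 = [ones(n1,1); zeros(n2,1)];                   % indicator for group 1
g2 = 1 - g1;                                      % indicator for group 2
t   = [linspace(-1,1,n1)'; linspace(-1,1,n2)'];   % centred time
age = [linspace(-2,3,n1)'; linspace(-1,4,n2)'];   % hypothetical shared covariate

Xsep = [g1, g2, g1.*t, g2.*t];    % every effect split by group
Xshr = [Xsep, age.*t];            % plus one effect common to both groups

Csep = inv(Xsep'*Xsep);           % OLS covariance of the estimates, up to sigma^2
Cshr = inv(Xshr'*Xshr);

% Columns 1 and 3 belong to group 1; columns 2 and 4 belong to group 2
crossSep = max(max(abs(Csep([1 3],[2 4]))));
crossShr = max(max(abs(Cshr([1 3],[2 4]))));
fprintf('max cross-group term, separable design: %g\n', crossSep);
fprintf('max cross-group term, shared covariate: %g\n', crossShr);
```

This uses the OLS covariance rather than the sandwich estimator itself, so it only illustrates how a shared column couples the group-specific estimates.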

Thanks for your patience with this.

-Tom

One sanity check I could perform is to run the model without the SMC group, to see if the results in other groups change.

 

Let me know if you have any contributions to this.

 

Many thanks in advance

 

Cassy

 

Cassy Fiford

----------------------------

Dementia Research Centre 
Box 16, National Hospital for Neurology and Neurosurgery 
Queen Square 
London 
WC1N 3BG 

 

+44 (0) 203 1086167

 


 




--
__________________________________________________________
Thomas Nichols, PhD
Professor of Neuroimaging Statistics
Nuffield Department of Population Health | University of Oxford
Big Data Institute | Li Ka Shing Centre for Health Information and Discovery
Old Road Campus | Headington | Oxford | OX3 7LF | United Kingdom
T: +44 1865 743590 | E: thomas....@bdi.ox.ac.uk
W: http://nisox.org | http://www.bdi.ox.ac.uk

Thomas Nichols

May 31, 2018, 3:29:55 AM
to Fiford, Cassy, swe-t...@googlegroups.com
Hi Cassy,

Sorry for the delay.

Firstly, I’m hoping it is OK that I CC this all to the SwE list… it will be very helpful to others using the tool.

You indeed have a complex design.  There are some basic problems with your model, but I think the problem of ‘exploding’ z-scores is a generic one that you might face even after you fix your model. First let’s sort out your model, then discuss the exploding z-scores.

I would describe your model as 

   Intercept +
   TimeIntraSubj * Group +
   TimeIntraSubj * Group * MB +
   TimeIntraSubj * ICV +
   TimeIntraSubj * Group * Lacune +
   TimeIntraSubj * BaselineAge

So, first of all, there is a golden rule in modelling that you *never* include a higher order interaction unless all lower order terms are included.  Here, you’re interacting Group with various things, but you don’t have the main effect of group!  In effect, you have lots of group-specific slopes without allowing group-specific intercepts!

This also applies to MB, Lacune and, most crucially, BaselineAge… e.g. you’ve included the *3-way* interaction of TimeIntraSubj * Group * MB without including either MB or Group.  Are you really sure you want the interaction with time, or did you just want Group*MB and Group*Lacune?   Anyway, you need the lower order terms to interpret the interactions correctly.
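
To see why the missing lower order terms matter, here is a small noise-free sketch (entirely made-up numbers, two groups only) where both groups share the same true slope but have different intercepts. Fitting group-specific slopes with only a common intercept distorts the slopes; adding group-specific intercepts recovers them.

```matlab
% Toy example: group-specific slopes without group-specific intercepts
t  = (0:4)';                           % hypothetical visit times, same in both groups
g1 = [ones(5,1); zeros(5,1)];          % group 1 indicator
g2 = 1 - g1;                           % group 2 indicator
tt = [t; t];
y  = 0*g1 + 5*g2 + 1*tt;               % noise-free: intercepts 0 and 5, common slope 1

Xred  = [ones(10,1), g1.*tt, g2.*tt];  % common intercept, group slopes only
Xfull = [g1, g2, g1.*tt, g2.*tt];      % group intercepts and group slopes

bRed  = Xred  \ y;                     % [intercept; slope1; slope2]
bFull = Xfull \ y;                     % [int1; int2; slope1; slope2]

fprintf('slopes without group intercepts: %.3f %.3f\n', bRed(2), bRed(3));
fprintf('slopes with group intercepts:    %.3f %.3f\n', bFull(3), bFull(4));
```

Without the group intercepts the two slopes come out near 0.17 and 1.83, i.e. an apparent group difference in slope that is really a difference in intercept. How strong this effect is in a real design depends on how time is centred within each group.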

So… if I were to pose your model, not knowing exactly what’s important, I would have had:

   Group + 
   Group * MB +
   Group * Lacune +
   Group * TimeIntraSubj +
   Group * TimeIntraSubj * MB +
   Group * TimeIntraSubj * Lacune +
   ICV +
   BaselineAge +
   TimeIntraSubj * BaselineAge

Where, crucially, the Group is modelled with 5 predictors that sum to a constant column, so the intercept is implicitly modelled.  Likewise, Group*MB will have 5 columns and so implicitly models the main effect of MB; same for Group*Lacune and Group * TimeIntraSubj.
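
As a rough illustration of the "sum to a constant column" point, the sketch below builds the Group and Group*MB columns for a tiny made-up data set (two scans per group; the labels and MB values are hypothetical) and checks that an explicit constant would be redundant.

```matlab
% Toy design: 5 group indicators and their interaction with MB (hypothetical data)
grp = [1 1 2 2 3 3 4 4 5 5]';          % hypothetical group label per scan
mb  = [0 1 0 1 0 1 0 1 0 1]';          % hypothetical binary MB covariate
nG  = 5;
G   = zeros(numel(grp), nG);
for k = 1:nG
  G(:,k) = (grp == k);                 % one indicator column per group
end
GxMB = G .* repmat(mb, 1, nG);         % Group*MB: 5 columns

sumG    = sum(G, 2);                   % all ones: the intercept is implicit
sumGxMB = sum(GxMB, 2);                % equals mb: the MB main effect is implicit

% Adding an explicit constant adds a column but no rank
rankNoConst   = rank([G, GxMB]);
rankWithConst = rank([ones(numel(grp),1), G, GxMB]);
fprintf('rank without constant: %d, with constant: %d\n', rankNoConst, rankWithConst);
```

Both ranks are the same, which is why the intercept and the MB main effect do not need their own columns in this parameterisation.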

Now, as you’ll see, you’re getting *quite* a complex model!  You might want to consider if you really need Group interacted with all of these things.  Especially considering you don’t have much power for some groups.


OK!  Now to the basic question of the exploding Z stats.  This one contrast depends not only on a small group, but on a predictor that is mostly zeros within that group (many subjects have no MBs).  Detecting this with complete generality is really hard.  I’ve hacked up some code that will tell you about the OLS efficiency of each contrast:

load SwE                 % loads the SwE structure from SwE.mat
X=SwE.xX.X;              % design matrix
nP=size(X,2);
c0=zeros(nP,1);
c0(1)=1; % this must be the contrast selecting the mean
XtXi=inv(X'*X);          % computed once, reused for every contrast
MeanSD=sqrt(c0'*XtXi*c0);
nC=length(SwE.xCon);
RelSD=zeros(nC,1);
for i=1:nC
  c=SwE.xCon(i).c;       % contrast vector for contrast i
  RelSD(i)=sqrt(c'*XtXi*c)/MeanSD;
end
fprintf('Approximate contrast SD relative to mean SD\n\n');
fprintf('%3d: %g\n',[(1:nC)' RelSD]');

This shows that your contrasts 7 and 8, and possibly 9 and 10, are going to be troublesome, as their approximate SD is 10x or more that of the mean.  There’s no magic threshold, and a good design estimating something particularly hard/subtle can still have low efficiency (high relative SD), but it is a warning flag to check things.
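
To give a feel for how a mostly-zero predictor drives the relative SD up, here is a stripped-down sketch for a single group of 70 with a mean-centred binary covariate and no time variable. Only the 6-of-70 count comes from this thread; the other carrier counts are hypothetical, and this is OLS efficiency only, not the sandwich estimate.

```matlab
% Toy illustration: relative SD of a binary-covariate effect vs. number of carriers
n = 70;                                % group size quoted in the thread
nCarriers = [35 20 6 2];               % 6 is the SMC count; the rest are hypothetical
RelSD = zeros(size(nCarriers));
for j = 1:numel(nCarriers)
  mb = [ones(nCarriers(j),1); zeros(n-nCarriers(j),1)];
  X  = [ones(n,1), mb - mean(mb)];     % group mean + centred binary covariate
  C  = inv(X'*X);                      % OLS covariance up to sigma^2
  RelSD(j) = sqrt(C(2,2)) / sqrt(C(1,1));   % SD of covariate effect / SD of mean
end
fprintf('%3d carriers: relative SD %.2f\n', [nCarriers; RelSD]);
```

In this simplified setting the ratio equals n/sqrt(n1*n0), so it is 2 for a balanced split and about 3.6 with 6 carriers; multiplying the covariate by time and sharing other covariates across groups, as in the real design, can push it higher.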

So!  Let me know if all of this makes sense.

-Tom


   
On 23 May 2018, at 15:25, Fiford, Cassy <cassidy....@UCL.AC.UK> wrote:

Dear Tom,
 
Thank you for your response and for showing me the SwE Google Group- I’ll definitely post on there first next time.
 
Very good point about the age*time variable; that is a shared effect of age across the groups (it is each subject’s baseline mean centred age multiplied by median centred time). Like agetime, Lacune and Tivtime are also shared effects. I’ve uploaded the SwE.mat (SwE_cassy.mat), thank you for offering to have a look. 
 
The model is quite complex, so here’s a breakdown of each column of the design matrix:
1. constant (column of 1s)
2. ctime- median centred time for control subjects (main effect of control on change in voxel volume)
3. emcitime- median centred time for EMCI subjects (main effect of early MCI status on change in voxel volume)
4. lmcitime- median centred time for LMCI subjects (main effect of late MCI status on change in voxel volume)
5. smctime- median centred time for SMC subjects (main effect of SMC (subjective memory concern) status on change in voxel volume)
6. adtime- median centred time for AD subjects (main effect of AD (Alzheimer's disease) status on change in voxel volume)
7. somec- covariate indicating whether a control subject had a cerebral microbleed (originally a binary covariate, which was then multiplied by median centred time)
8. someemci- As before for emci
9. somelmci- As before for lmci
10. somesmc- As before for smc
11. somead- As before for ad
12. Total intracranial volume for each subject (multiplied by median centred time)
13. Lacune- binary covariate indicating whether a subject had a lacune (multiplied by median centred time)
14. wmhc- log transformed white matter hyperintensity volume for controls (multiplied by median centred time)
15. wmhemci- as before for emci
16. wmhlmci- as before for lmci
17. wmhsmc- as before for smc
18. wmhad-as before for ad
19. Agetime – baseline mean centred age multiplied by median centred time
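
For readers following along, columns of this kind could be built roughly as below. This is only a sketch on made-up data: the variable names mirror the list above, but the labels, times and MB flags are hypothetical and the actual construction used for this model may differ.

```matlab
% Sketch: building group-by-time and group-by-MB-by-time columns (hypothetical data)
grp  = {'c';'c';'smc';'smc';'ad';'ad'};   % hypothetical group label per scan
time = [0; 1.2; 0; 2.0; 0; 0.9];          % hypothetical years from baseline
mb   = [0; 0; 1; 1; 0; 0];                % hypothetical binary microbleed flag

ctimeAll = time - median(time);           % median centred time
isC   = strcmp(grp, 'c');                 % scans from control subjects
isSMC = strcmp(grp, 'smc');               % scans from SMC subjects

ctime   = isC   .* ctimeAll;              % like column 2: time for controls, else 0
smctime = isSMC .* ctimeAll;              % like column 5: time for SMC, else 0
somesmc = isSMC .* mb .* ctimeAll;        % like column 10: MB x time within SMC

disp([ctime, smctime, somesmc]);
```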
 
These are all ADNI2 subjects (so it's de-identified data).
 
Please let me know if anything needs clarification. I really appreciate your help with this.
 
Best wishes,
 
Cassy
 
 
 
 