Generalized Linear Mixed Model error

hma...@yorku.ca

Jul 5, 2011, 8:15:47 AM

I am trying to run a generalized linear mixed model analysis. The
target variable is continuous (scale), and the data are counts. I can
successfully run a log-linear analysis using the counts as the target;
however, I would rather use the number of trials as a denominator (and
thus use the binary logistic regression model in the GLMM). However,
as soon as I select the option to use number of trials as denominator
and enter the fixed value, running the analysis produces the error
message: "The reference value for the target cannot be found in the
dataset". As far as I can tell, you shouldn't even need to enter a
reference value for a continuous target, so I'm not sure how to
correct this error. I also tried entering a "custom reference
category" or value, and this did not solve the problem.
Any help would be greatly appreciated...

Rich Ulrich

Jul 6, 2011, 2:42:20 PM

On Tue, 5 Jul 2011 05:15:47 -0700 (PDT), hma...@yorku.ca wrote:

>I am trying to run a generalized linear mixed model analysis. The
>target variable is continuous (scale), and the data are counts. I am
>able to successfully run a log-linear analysis, using the counts as
>the target, however I would rather use the number of trials as a
>denominator (and, thus, using the binary logistic regression model in
>the GLMM).

Logistic regression has a 0/1 criterion: it is dichotomous. You can
use a criterion that is already coded 0/1; for convenience, when the
variable is not coded 0/1, the program lets you tell it which numeric
value to treat as one half of the dichotomy, versus everything else.
You surely don't want that option. Does it allow you to set a
cut-off for creating a dichotomy? That would waste continuous
information, but it would be a way to run Binary Logistic with
a scale.

... "target variable is continuous (scale)" says that "binary logistic
regression" is most definitely the wrong analysis, unless you set
it up to ignore the continuous information and use a cutoff score,
either by creating a dummy variable or by using a program option if
one is available.

I think this explains the error messages.

>However, as soon as I select the option to use number of trials as
>denominator, and enter the fixed value, when I run the analysis, I
>keep getting the error message: "The reference value for the target
>cannot be found in the dataset". As far as I can tell, you shouldn't
>even need to enter a reference value for a continuous target, so I'm
>not sure how to correct this error. I also tried entering a "custom
>reference category" or value, and this did not solve the problem.
>Any help would be greatly appreciated...

--
Rich Ulrich

hma...@yorku.ca

Jul 7, 2011, 12:00:25 AM

Thank you for your help. I still haven't solved the problem. The
SPSS help module gives the following example:

In the Generalized Linear Mixed Model:

"• Use number of trials as denominator. When the target response is a
number of events occurring in a set of trials, the target field
contains the number of events and you can select an additional field
containing the number of trials. For example, when testing a new
pesticide you might expose samples of ants to different concentrations
of the pesticide and then record the number of ants killed and the
number of ants in each sample. In this case, the field recording the
number of ants killed should be specified as the target (events)
field, and the field recording the number of ants in each sample
should be specified as the trials field. If the number of ants is the
same for each sample, then the number of trials may be specified using
a fixed value."

In my case, I have as my target variable "number of looks", which is
continuous and is obviously frequency data. I'd like to use the
number of looks out of a given number of trials. In SPSS, if you
select the option to use number of trials as a denominator, the only
options for target distributions you have are:
-binary logistic regression
-binary probit
-interval censored survival

Does this make any sense to you? At any rate, if I then enter the
number of trials (which was the same across all measures) and try to
run the analysis, I come up with the error message I described
earlier.

Even if you can't help, I wanted to thank you for trying -- you are
the first person who has been able to offer me any advice, and that
includes the 'good folks' at SPSS and IBM, who would only answer
questions about how to install the software...!

Bruce Weaver

Jul 7, 2011, 8:44:33 AM

I don't yet have v19, so am not familiar with all the ins and outs of the new GENLIN-MIXED procedure (or whatever it is called). But, it sounds to me like it must be doing something akin to WEIGHT by COUNT, given the structure of your data file. Have you tried (as a work-around) expanding the data file to have one row per trial, with a 1-0 binary variable coding the looks? You could do something like the following to expand your file:

* Create some sample data to illustrate.
new file.
dataset close all.
data list list / cluster looks trials (3f5.0).
begin data
1 35 100
2 44 110
3 22 88
end data.

* Expand to one record per trial .

loop case = 1 to trials.
- xsave outfile = "C:\Temp\expanded data.sav" /
keep = cluster case looks trials .
end loop.
execute.

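get file = "C:\Temp\expanded data.sav" .
* Flag the first LOOKS rows in each cluster as events (1); the rest are 0.
* The name "killed" follows the ants example in the Help text.
compute killed = case LE looks.
formats killed (f1.0).
crosstabs cluster by killed.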

Do you need/want a multilevel logistic regression model? Or were you using GENLIN-MIXED simply to take advantage of the ability to use a data file with the structure you described (i.e., variables LOOKS and TRIALS, with LOOKS as target and TRIALS as denominator)? If the latter, then you can switch back to the LOGISTIC REGRESSION command when using the expanded data file with one row per trial.
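
As a cross-check outside SPSS, the one-row-per-trial expansion can be sketched in a few lines of Python. This is only an illustration of the idea, not SPSS syntax and not what GENLINMIXED does internally; the names `summary`, `expand`, and `totals` are made up for the example, and the three data rows are the sample values from the DATA LIST block.

```python
# Hypothetical Python mirror of the SPSS expansion; not SPSS syntax.
# Sample (cluster, looks, trials) rows copied from the DATA LIST block.
summary = [(1, 35, 100), (2, 44, 110), (3, 22, 88)]


def expand(rows):
    """Return one (cluster, case, event) record per trial.

    event is 1 for the first `looks` cases in a cluster and 0 for the
    rest, which matches COMPUTE killed = case LE looks.
    """
    expanded = []
    for cluster, looks, trials in rows:
        for case in range(1, trials + 1):
            expanded.append((cluster, case, 1 if case <= looks else 0))
    return expanded


expanded = expand(summary)

# Collapse back to events and trials per cluster to confirm that the
# expansion preserved the original counts.
totals = {}
for cluster, _case, event in expanded:
    events, trials = totals.get(cluster, (0, 0))
    totals[cluster] = (events + event, trials + 1)

for cluster, (events, trials) in sorted(totals.items()):
    print(cluster, events, trials)
```

Collapsing the expanded records should give back the original events and trials for each cluster, which is the same check the CROSSTABS command performs.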

HTH.

--
Bruce Weaver
bwe...@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/Home
"When all else fails, RTFM."

David

Jul 7, 2011, 9:44:54 AM

I don't have access to GLMM on my ancient SPSS 11.5; however, it sounds
like a simple matter of restructuring the data and WEIGHT!
It looks like your data are arranged as follows.
ID NumberOfLooks <Other_Variables>
1 10 .....
2 5 .....
3 4 .....
....
Assuming you have, say, 16 trials, you would want:
ID E_NonE Count <Other_Variables>
1 0 6 .....
1 1 10 .....
2 0 11 .....
2 1 5 .....
3 0 12 .....
3 1 4 .....
.....
Achieved as follows:
COMPUTE E0=16-NumberOfLooks.
COMPUTE E1= NumberOfLooks.
VECTOR E=E0 TO E1.
LOOP E_NonE= 0 TO 1.
COMPUTE COUNT=E(E_NonE+1).
XSAVE OUTFILE "Temp.sav"
/ KEEP ID E_NonE COUNT <Other_Variables>.
END LOOP.
EXECUTE.
GET FILE "Temp.sav".
WEIGHT BY COUNT.
Analyze away to your heart's content.
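
For readers who want to check the restructuring by hand, here is a rough Python sketch of the same two-rows-per-case layout. It is not SPSS syntax; `N_TRIALS`, `wide`, and `to_weighted_rows` are hypothetical names, the value 16 is the assumed number of trials from the post, and the three input rows are the ones in the example layout.

```python
# Hypothetical Python mirror of the VECTOR/LOOP/XSAVE restructuring.
N_TRIALS = 16  # assumed fixed number of trials per case, as in the post

# (ID, NumberOfLooks) rows taken from the example layout above.
wide = [(1, 10), (2, 5), (3, 4)]


def to_weighted_rows(rows, n_trials):
    """Return two (id, e_none, count) rows per case.

    e_none = 0 carries the non-event count, e_none = 1 the event count,
    so weighting by count reproduces n_trials observations per case.
    """
    out = []
    for case_id, looks in rows:
        if not 0 <= looks <= n_trials:
            raise ValueError(f"looks={looks} is outside 0..{n_trials}")
        out.append((case_id, 0, n_trials - looks))
        out.append((case_id, 1, looks))
    return out


weighted = to_weighted_rows(wide, N_TRIALS)
for row in weighted:
    print(*row)
```

The printed rows should match the ID / E_NonE / Count table shown earlier in this post. The range check is an addition for the sketch; the SPSS version would silently produce a negative count if NumberOfLooks exceeded 16.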
-----
OTOH:

> "When the target response is a
> number of events occurring in a set of trials, the target field
> contains the number of events and you can select an additional field
> containing the number of trials."

Sounds like the FM is referring to a "field" in the data file, i.e., a
variable!
Maybe this is as simple as
COMPUTE number_of_trials=16.
(or the appropriate number)
and adding number_of_trials to the appropriate slot in the dialog.

> 'good folks' at SPSS and IBM, who would only answer
> questions about how to install the software...!

Must be INCREDIBLY BORING doing teksport these days at IBM/SPSS.
HTH, David
.....

hma...@yorku.ca

Jul 7, 2011, 1:30:20 PM

Thank you all for your advice. I thought it might be best to give you
the full 'breakdown' of my data structure so you can best advise.

I have three subjects (it is animal research), and there are repeated
measures. The dependent (aka target) variable is the number of looks,
and as previously noted, each set of counts was out of 24 total
trials, which I'd like to use as a denominator, if possible.

I then have two additional factors -- type of trial (2 levels), and
level of effort (3 levels). So, a typical portion of my data file
would look like this:

ID  Trialtype  Level  Numlooks
1   0          1      16
1   0          2      13
1   0          3      8
1   1          1      4
1   1          2      2
1   1          3      1
etc.

So, for each subject, there are several lines of data, one for
each combination of factors, plus the number of looks for that
combination. I chose to use the GLMM because it allowed me to specify
that multiple lines of data corresponded to a single subject, which
other models apparently couldn't do.

I should note that if I weight cases according to the "Numlooks"
variable, I can no longer enter that variable as my target variable in
the GLMM.

At any rate, in the meantime, I shall try to re-organize my data as
suggested above, but any further advice would be great. What a
wonderful community this is -- I can't tell you how much I have
appreciated the help offered thus far!
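
To make the "looks out of 24 trials" idea concrete, the sketch below turns the six example rows into proportions and empirical log-odds, the scale a binary logistic model works on. It is a descriptive illustration in Python, not a fitted model and not SPSS output; `N_TRIALS`, `rows`, `empirical_logit`, and `interaction` are names invented for the example, and the contrast shown (level 1 versus level 3 across the two trial types) is just one assumed way of writing an interaction as a difference of differences.

```python
import math

N_TRIALS = 24  # each count is out of 24 trials, per the post

# (ID, Trialtype, Level, Numlooks) rows copied from the excerpt above.
rows = [
    (1, 0, 1, 16), (1, 0, 2, 13), (1, 0, 3, 8),
    (1, 1, 1, 4), (1, 1, 2, 2), (1, 1, 3, 1),
]


def empirical_logit(events, trials):
    """Log-odds of an event; undefined when events is 0 or trials."""
    if events <= 0 or events >= trials:
        raise ValueError("log-odds undefined for 0 or all events")
    return math.log(events / (trials - events))


logits = {}
for _id, trialtype, level, looks in rows:
    logits[(trialtype, level)] = empirical_logit(looks, N_TRIALS)
    print(trialtype, level, round(looks / N_TRIALS, 3),
          round(logits[(trialtype, level)], 3))

# Interaction on the logit scale as a difference of differences:
# (level 1 minus level 3) within trial type 0, minus the same within 1.
interaction = ((logits[(0, 1)] - logits[(0, 3)])
               - (logits[(1, 1)] - logits[(1, 3)]))
print(round(interaction, 3))
```

Note that cells with 0 or 24 looks have no finite log-odds, which is why the helper raises an error instead of returning a value.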

David

Jul 7, 2011, 1:42:50 PM

OTOH, have you tried computing a new variable and entering that into
the GLMM dialog?
COMPUTE N_Trials=24.
.....
HTH, David

hma...@yorku.ca

Jul 16, 2011, 11:21:48 AM

I have an update and an important question. I seem to have
successfully analyzed my data as number of events within the total
number of trials (what I was trying unsuccessfully to do in the
GENLINMIXED analysis), using the Generalized Estimating Equations
analysis tool. Is this a valid way of analyzing my data?

My data are counts of a binary measure (Look/No look), and they are
repeatedly measured for three subjects, across two other factors. In
the GEE analysis, I can specify the "subject ID" variable as a
repeated measure, and can use a binary logistic model with the number
of counts as my dependent variable (and can specify the total number
of trials).

Does this sound like a valid way of analyzing this data? (From what I
can tell, GEE is an alternative to GENLINMIXED, which just seems to be
riddled with problems in SPSS 19).

Finally, if I am most interested in the interaction between the two
factors, should I run a model with ONLY the interaction as a fixed
effect, or should I run the full factorial model (i.e., including main
effects of each of the two individual factors)?

I know it's a Saturday, but ANY help would be greatly appreciated!!!!

Rich Ulrich

Jul 16, 2011, 5:28:56 PM

On Sat, 16 Jul 2011 08:21:48 -0700 (PDT), hma...@yorku.ca wrote:

>I have an update and an important question. I seem to have
>successfully analyzed my data as number of events within the total
>number of trials (what I was trying unsuccessfully to do in the
>GENLINMIXED analysis), using the Generalized Estimating Equations
>analysis tool. Is this a valid way of analyzing my data?
>
>My data are counts of a binary measure (Look/No look), and they are
>repeatedly measured for three subjects, across two other factors. In
>the GEE analysis, I can specify the "subject ID" variable as a
>repeated measure, and can use a binary logistic model with the number
>of counts as my dependent variable (and can specify the total number
>of trials).
>
>Does this sound like a valid way of analyzing this data? (From what I
>can tell, GEE is an alternative to GENLINMIXED, which just seems to be
>riddled with problems in SPSS 19).

I don't recognize any pitfalls. But I haven't used either program, so
this is only a weak stamp of approval.

>
>Finally, if I am most interested in the interaction between the two
>factors, should I run a model with ONLY the interaction as a fixed
>effect, or should I run the full factorial model (i.e., including main
>effects of each of the two individual factors).

There is a general rule that you almost never want to look at an
interaction term without also including the main effects of the
factors involved. It makes relatively little difference for a
balanced design, except when the main effects would reduce the error
notably.

If Subject is one of the main effects, I would expect that it
should be included to reduce the error. If there is a logical
way to recode the effects so that your particular interaction
of interest is a main effect - sometimes this makes sense - then
you could recode and proceed.

--
Rich Ulrich

hma...@yorku.ca

Jul 16, 2011, 9:53:07 PM

Thanks for the reply -- can anyone else second Rich's stamp of
approval?

A final question -- In the output, if I look at the "Tests of model
effects", I get a non-significant effect of the interaction. However,
if I look at the pairwise contrasts within the interaction (selected
before I ran the analysis, since I didn't know the interaction would
be non-significant), there are MANY VERY significant differences
(p = .000), and the "Overall test results" and corresponding Wald
Chi-square there are significant. Why would this be? (In my
understanding, this latter set of results is "based on the linearly
independent pairwise comparisons among the estimated marginal means",
which I understand is different from testing the effect of each
parameter within the model as a whole. I just don't understand why
everything would appear to be SO significant between the estimated
marginal means, and yet not have a significant effect in the model,
albeit reasonably close to significant: p = .105.)

Any speculation?

Bruce Weaver

Jul 18, 2011, 9:29:58 AM

Jos Twisk discusses GEE vs multilevel models in his book on longitudinal data analysis (at least partially available via Google Books, I think). He also compares them on the same dataset in this article:

http://jech.bmj.com/content/59/8/706.full

One other link you might find useful:

http://www.stat.columbia.edu/~cook/movabletype/archives/2006/12/generalized_est.html