
Why use Multiple Baseline?


Jeffrey J. Skowron

Dec 10, 1998
to BEHA...@listserv.nodak.edu
Listmembers,

I recently had my dissertation proposal meeting. I am conducting a study
in a facility where I have access to 100+ potential subjects. I am using a
multiple baseline across subjects design, and plan on visually inspecting
graphed data to determine effectiveness. One of my committee members asked
why I didn't just divide the 100+ subjects into control and experimental
groups, and then use inferential statistics to evaluate differences between
the groups. She went on to say that it was her understanding that multiple
baseline designs were most commonly used when you couldn't get enough
subjects. I gave her my response, which must've been acceptable because
she accepted my proposal. I am wondering how fellow listmembers would have
responded.
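
[Editorial aside: for readers outside single-case methodology, the logic of a multiple baseline across subjects can be sketched with simulated data. Everything below is hypothetical — the session counts, levels, and effect size are made up and are not the study's. Each subject stays in baseline until a staggered point; a convincing demonstration shows change only when, and for whom, the intervention is introduced.]

```python
import random
from statistics import mean

random.seed(42)  # reproducible illustration

def simulate_subject(start, n_sessions=12, base=8.0, effect=5.0, noise=0.3):
    """One subject's session-by-session responding: a stable baseline level,
    then a level change beginning at that subject's intervention point."""
    return [base - (effect if s >= start else 0.0) + random.gauss(0, noise)
            for s in range(n_sessions)]

# The intervention is introduced at a different session for each subject;
# behavior should change only at each subject's own introduction point.
starts = {"S1": 3, "S2": 6, "S3": 9}
data = {name: simulate_subject(start) for name, start in starts.items()}

for name, start in starts.items():
    pre, post = mean(data[name][:start]), mean(data[name][start:])
    print(f"{name}: baseline mean {pre:.1f}, treatment mean {post:.1f}")
```

[Visual inspection would look for the level change tracking each staggered introduction — that staggering is what rules out history or maturation as rival explanations.]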

A second methods question arose: I was planning on reporting overall,
occurrence, and non-occurrence reliability for my measures. It was
suggested that I just use Kappa. I have been taught both methods for
accounting for chance agreement. What are your views? (The consensus for
my dissertation was to report Kappa as well as the other figures.)
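
[Editorial aside: for concreteness, here is a sketch of how the competing indices come out of the same pair of interval records. The data are hypothetical; the definitions follow the usual interval-agreement conventions, with occurrence/non-occurrence agreement ignoring intervals where both observers scored the other category.]

```python
# Two observers' interval records (1 = occurrence scored), hypothetical data.
obs1 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
obs2 = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
n = len(obs1)

a = sum(x == 1 and y == 1 for x, y in zip(obs1, obs2))  # both scored occurrence
d = sum(x == 0 and y == 0 for x, y in zip(obs1, obs2))  # both scored non-occurrence
b = sum(x == 1 and y == 0 for x, y in zip(obs1, obs2))  # disagreements
c = sum(x == 0 and y == 1 for x, y in zip(obs1, obs2))

overall = (a + d) / n            # point-by-point agreement on all intervals
occurrence = a / (a + b + c)     # ignores intervals both scored as non-occurrence
non_occurrence = d / (d + b + c) # ignores intervals both scored as occurrence

# Cohen's kappa corrects observed agreement for chance agreement,
# estimated from each observer's marginal occurrence rate.
p1, p2 = (a + b) / n, (a + c) / n
pe = p1 * p2 + (1 - p1) * (1 - p2)
kappa = (overall - pe) / (1 - pe)

print(overall, occurrence, round(non_occurrence, 3), round(kappa, 3))
# → 0.8 0.6 0.714 0.583
```

[With a very high-rate or very low-rate behavior the indices diverge sharply, which is exactly why committees argue about them.]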

I want to emphasize that I'm not trying to be lazy here. I am in a largely
non-behavior-analytic program, where single-case/time-series designs are
not the norm for a dissertation. I anticipated these questions and went
into my meeting prepared to answer them, with references if need be. I'm
just curious as to how you would have responded. I think the question on
multiple baseline vs. "big N" statistics goes beyond flaws with
conclusions drawn from inferential statistics, into more philosophical and
ethical issues about what it is we really should be studying and what our
dependent variables should be.

Thank You,
Jeffrey Skowron
jsko...@psych.umass.edu

---------------------------------------------
To join the Behavior Analysis (Behav-An) forum, send the command
SUBSCRIBE BEHAV-AN YOURFIRSTNAME YOURLASTNAME;
to leave the forum, send the command
SIGNOFF BEHAV-AN
to LIST...@LISTSERV.NODAK.EDU or, if you experience difficulties, write to BEHAV-AN...@LISTSERV.NODAK.EDU.

Jim Cowardin

Dec 10, 1998
to BEHA...@listserv.nodak.edu
There should be a lot of responses on this one, Jeffrey.

A behavioral Single-Subject Design looks at the individual organism's
behavior. After all, you teach individuals, not groups. So while group
designs "account" for variability mathematically, SS designs reveal
variability that often provides the most important data in the whole study.

Og Lindsley once illustrated this point with a story about a client in a
sheltered workshop. This person had great celeration data on a few
isolated days, but lousy performance at other times. It became clear that
when he had to work with a partner at his table, his performance was bad.
On days when his performance was very good, the partner was absent.
Without identifying this variability specifically, and thus being able to
study it, that very important information would have been obscured.

The fact that you have a lot of subjects does not change this at all. In
fact, an N of 100 is not very large in absolute terms, though relatively
speaking it may be; a lot of statistical studies do not have sufficiently
large Ns. Statistics are good for some human studies, but you have to be
careful when you take them out of the cornfield.

Jim Cowardin

John W. Jacobson

Dec 11, 1998
to BEHA...@listserv.nodak.edu
Jeffrey,

I looked at the reliability indices a long time ago, but things have not
changed much in this area in the past decade or so. The statistical arguments
go, generally, that Kappa is superior to % agreement, and the within-groups
correlation derived from ANOVA is superior to that. The latter measure is
related to classical measurement theory and is appraised as an r for
significance. The formula is (sum of the products between groups/square root
of (sum of the squares group 1 times sum of the squares group 2).

See Keppel (1973), p. 498, Design and analysis: A researcher's handbook.

or see discussions by Cohen in later articles.

John Jacobson

John W. Jacobson

Dec 11, 1998
to BEHA...@listserv.nodak.edu
Whoops, that should be

(sum of the products between groups)/


square root of (sum of the squares group 1 times sum of the squares group
2).
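
[Editorial aside: read as deviation-score quantities, the corrected formula is the Pearson r. A quick arithmetic check with made-up numbers, not from any study:]

```python
from math import sqrt
from statistics import mean

g1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # first set of scores (made up)
g2 = [2.0, 1.0, 4.0, 3.0, 5.0]  # second set of scores (made up)

m1, m2 = mean(g1), mean(g2)
sum_products = sum((x - m1) * (y - m2) for x, y in zip(g1, g2))  # numerator
ss1 = sum((x - m1) ** 2 for x in g1)  # sum of squares, set 1
ss2 = sum((y - m2) ** 2 for y in g2)  # sum of squares, set 2

r = sum_products / sqrt(ss1 * ss2)
print(r)  # → 0.8
```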

Joseph Cautilli

Dec 11, 1998
to BEHA...@listserv.nodak.edu
On Thu, 10 Dec 1998, Jeffrey J. Skowron wrote:

> Listmembers,
>
> I recently had my dissertation proposal meeting. I am conducting a study
> in a facility where I have access to 100+ potential subjects. I am using a
> multiple baseline across subjects design, and plan on visually inspecting
> graphed data to determine effectiveness. One of my committee members asked
> why I didn't just divide the 100+ subjects into control and experimental
> groups, and then use inferential statistics to evaluate differences between
> the groups. She went on to say that it was her understanding that multiple
> baseline designs were most commonly used when you couldn't get enough
> subjects. I gave her my response, which must've been acceptable because
> she accepted my proposal. I am wondering how fellow listmembers would have
> responded.

Different designs are used to address different questions. The question
that you are trying to answer is one that highlights how each individual
responds to the intervention, and one which highlights the variations of
each individual. Still, 100 subjects in a single-subject design will give
you a lot of data. It might be that you are biting off more than you can
chew. What do you think?

Joe

Robert C. Townsend, Jr.

Dec 11, 1998
to BEHA...@listserv.nodak.edu
see: hayes, s.c., barlow, d.h., & nelson, r.o. (2nd edition in press). the
scientist-practitioner: research and accountability in the age of managed
care. new york: allyn & bacon.

based on my reading of the above, i would've suggested that time-series
designs are particularly suited to your question, and that group designs
are not able to answer questions at the level of the individual, e.g.
"gordon paul's (1969) question." it's a 'level of analysis' question, no?

congratulations, BTW.

tuna.

Jeffrey J. Skowron

Dec 11, 1998
to BEHA...@listserv.nodak.edu
On Thu, 10 Dec 1998, Joseph Cautilli wrote:

>Still, 100 subjects in a single-subject design will give
>you a lot of data. It might be that you are biting off more than you can
>chew. What do you think?

While I have access to 100+ subjects, I am only going to use about 12
(weekly 30-45 minute behavioral observations on 100 subjects for 3 months is
definitely more than I can chew, and probably more than I could even bite
off). My committee member's question was along the lines of "why would you
only use a dozen subjects in a multiple baseline design when you have
access to many more?"

Thanks for the reply.

By the way Joe, any news with the Behavior Analysis SIG?

Thanks,
Jeff Skowron
jsko...@psych.umass.edu

Robert W. Montgomery, Ph.D.

Dec 11, 1998
to BEHA...@listserv.nodak.edu
Jeffrey,

Simple-

Clinical Significance vs. Statistical Significance, or Robust Impact vs.
Sometimes Meaningless Impact. It's a question of power, and single-case
designs will (when reviewed properly) always reject less robust interventions
than statistical analysis will. Plus, you can review individual impact with
the single-case design, where the statistical analysis allows for no comment
on odd cases.
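
[Editorial aside: that power point can be put in numbers. A sketch from hypothetical summary statistics — every value below is made up — using Student's t computed from summary data, assuming equal group sizes and standard deviations:]

```python
from math import sqrt

# Hypothetical group-comparison summary: a tiny mean difference, huge N.
n = 5000          # subjects per group
mean_diff = 1.0   # treatment mean minus control mean
sd = 10.0         # (pooled) standard deviation in each group

se_diff = sd * sqrt(2 / n)   # standard error of the difference in means
t = mean_diff / se_diff      # with df = 9998, |t| > 1.96 is "significant"
cohens_d = mean_diff / sd    # standardized effect size: 0.1 is trivial

print(round(t, 2), cohens_d)  # → 5.0 0.1
```

[The group result is statistically reliable yet clinically negligible; visual inspection of individual graphs would never call a tenth of a standard deviation "effective".]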

Of course I could be wrong. ;->

-Robert

On Thu, 10 Dec 1998, Jeffrey J. Skowron
wrote:

> Listmembers,
>
> I recently had my dissertation proposal meeting. I am conducting a study
> in a facility where I have access to 100+ potential subjects. I am using a
> multiple baseline across subjects design, and plan on visually inspecting
> graphed data to determine effectiveness. One of my committee members asked
> why I didn't just divide the 100+ subjects into control and experimental
> groups, and then use inferential statistics to evaluate differences between
> the groups. She went on to say that it was her understanding that multiple
> baseline designs were most commonly used when you couldn't get enough
> subjects. I gave her my response, which must've been acceptable because
> she accepted my proposal. I am wondering how fellow listmembers would have
> responded.
>
> A second methods question arose: I was planning on reporting overall,
> occurrence, and non-occurrence reliability for my measures. It was
> suggested that I just use Kappa. I have been taught both methods for
> accounting for chance agreement. What are your views? (The consensus for
> my dissertation was to report Kappa as well as the other figures.)
>
> I want to emphasize that I'm not trying to be lazy here. I am in a largely
> non-behavior-analytic program, where single-case/time-series designs are
> not the norm for a dissertation. I anticipated these questions and went
> into my meeting prepared to answer them, with references if need be. I'm
> just curious as to how you would have responded. I think the question on
> multiple baseline vs. "big N" statistics goes beyond flaws with
> conclusions drawn from inferential statistics, into more philosophical and
> ethical issues about what it is we really should be studying and what our
> dependent variables should be.
>
> Thank You,
> Jeffrey SKowron

> jsko...@psych.umass.edu

Robert W. Montgomery, Ph.D.
P. O. Box 1072
Roswell, Georgia 30077-1072
E-mail: Psy...@Panther.GSU.Edu
http://www.behavior-consultant.com

Joseph Cautilli

Dec 11, 1998
to BEHA...@listserv.nodak.edu
The BA SIG is still going well. We have membership dues collected from about
23 members. We are looking into what can be done on our budget for a basic
newsletter. Currently, the leadership group is trying to put together a
piece for The Behavior Therapist (John Forsyth is primary author and
heading up the article). All other suggestions are, of course, welcome.


Thanks for asking,

Joe


On Thu, 10 Dec 1998, Jeffrey J. Skowron wrote:

> On Thu, 10 Dec 1998, Joseph Cautilli wrote:
>
> >Still, 100 subjects in a single-subject design will give
> >you a lot of data. It might be that you are biting off more than you can
> >chew. What do you think?
>
> While I have access to 100+ subjects, I am only going to use about 12
> (weekly 30-45 minute behavioral observations on 100 subjects for 3 months is
> definitely more than I can chew, and probably more than I could even bite
> off). My committee member's question was along the lines of "why would you
> only use a dozen subjects in a multiple baseline design when you have
> access to many more?"
>
> Thanks for the reply.
>
> By the way Joe, any news with the Behavior Analysis SIG?
>
> Thanks,
> Jeff Skowron

Oliver Mudford

Dec 11, 1998
to BEHA...@listserv.nodak.edu
Some views on the question about methods for computing and reporting
reliability of direct observational data:

From: Jeffrey J. Skowron <jsko...@psych.umass.edu>

>>A second methods question arose: I was planning on reporting overall,
>>occurrence, and non-occurrence reliability for my measures. It was
>>suggested that I just use Kappa. I have been taught both methods for
>>accounting for chance agreement. What are your views? (The consensus for
>>my dissertation was to report Kappa as well as the other figures.)


This consensus avoids having to argue for one method or another unless that
is a major part of your dissertation. By reporting the results of all
methods, those who prefer to judge reliability by kappa can do so, and those
who judge the quality of data by consideration of interobserver agreement
methods will be happy too. One caution about kappa: beware of significance
testing, as the large N of observations you will get from time samples or
continuous measurement of behaviors can mislead. Also, as Cohen (1960,
p. 44) states, " . . . to know merely that kappa is beyond chance is trivial
since one expects much more than this in the way of reliability in
psychological measurement".
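
[Editorial aside: that caution can be illustrated numerically. The sketch below uses hypothetical counts and the approximate large-sample standard error given by Cohen (1960); both the counts and the choice of test are illustrative assumptions.]

```python
from math import sqrt

# Hypothetical 2x2 agreement counts from 2000 observation intervals:
# a = both scored occurrence, b/c = disagreements, d = both non-occurrence.
a, b, c, d = 50, 150, 150, 1650
n = a + b + c + d

po = (a + d) / n                    # observed agreement: 0.85, looks fine
p1, p2 = (a + b) / n, (a + c) / n   # each observer's occurrence rate
pe = p1 * p2 + (1 - p1) * (1 - p2)  # chance agreement: 0.82
kappa = (po - pe) / (1 - pe)        # about 0.17 -- weak by any standard

# Approximate large-sample standard error for kappa (Cohen, 1960).
se = sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
z = kappa / se  # comfortably past 1.96 at N = 2000 despite the weak kappa

print(round(kappa, 3), round(z, 2))
```

[Exactly Cohen's point: beyond-chance is a trivial bar, and a weak kappa "passes" significance purely because N is large.]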

Last, if you want to ensure that everyone is happy with your method for
reporting on the quality of your data, look at Johnston & Pennypacker (1993)
on "calibration" of observers. Personally, I like that a lot. In my
limited experience, though, journal editors like it less. So I guess it has
not been a particularly useful approach for me to take.

Oliver Mudford (Psychology, Keele U., England)

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational
and Psychological Measurement, 20, 37-46.

Johnston, J. M., & Pennypacker, H. S. (1993). Strategies and tactics of
behavioral research (2nd ed.). Hillsdale, NJ: Erlbaum.

Hank Pennypacker

Dec 11, 1998
to BEHA...@listserv.nodak.edu
Jeff:

> I think the question on
>multiple baseline vs. "big N" statistics goes beyond flaws with
>conclusions drawn from inferential statistics, into more philosophical and
>ethical issues about what it is we really should be studying and what our
>dependent variables should be.
>

You got that right! There are two distinct subject matters involved; one is
the behavior of individual organisms, measured continuously and directly,
and the other is various phenomena inferred from aggregate measures upon
which inferential statistical manipulations have been performed. Once people
understand that, there is a little less reluctance to accept our methods.

BTW, it sounds like you did a fine job!

Cheers,

Hank Pennypacker
