CUSUM Bootstrapping problem

Puliyel

Sep 21, 2009, 5:08:04 AM

to MedS...@googlegroups.com

Dear All

I wonder if anyone on the list can please help me with this bootstrapping problem.
I have raw data, in sequence, of failures and successes with treatment.

There were 5 failures and 30 successes.
The sequence of failures (F) and successes (S) was as follows:

SSSSSSSSSSSSSSSSSSSFSSSSSFSSFSSFFSS

For CUSUM calculations each success gets a score of +2/7 and each failure gets a score of -12/7
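
For readers who want to check the arithmetic, here is a minimal Python sketch of this scoring. The names SEQUENCE, SCORES and cusum are illustrative only and are not taken from any software mentioned in this thread; the opening run of 19 successes is written as "S" * 19.

```python
from fractions import Fraction
from itertools import accumulate

# The 35 outcomes from the post: 19 successes, then the mixed tail.
SEQUENCE = "S" * 19 + "FSSSSSFSSFSSFFSS"
SCORES = {"S": Fraction(2, 7), "F": Fraction(-12, 7)}

def cusum(sequence, scores=SCORES):
    """Return the running total of scores after each patient."""
    return list(accumulate(scores[outcome] for outcome in sequence))

path = cusum(SEQUENCE)
print("highest:", max(path))
print("lowest :", min(path))
print("final  :", path[-1])
```

With these scores the running total peaks at 38/7 after the first 19 successes, dips to -4/7 at the fifth failure, and returns to exactly zero at the end, because 30 x 2/7 equals 5 x 12/7.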

Using bootstrapping techniques (reordering the sequence 1000 times) I want to calculate the 95% confidence limits for CUSUM

I have not been able to get the software I downloaded for Excel to give me the confidence limits for this data using CUSUM.

I would appreciate help from anyone familiar with this tool.

Background:

I am only a novice, but my understanding is this:
CUSUM stands for cumulative sum charts and is a form of 'time series analysis'. Bootstrapping in this time series is done for 'change point analysis'. More details are given in this excellent paper by Wayne Taylor:

http://www.variation.com/cpa/tech/changepoint.html

Suppose ordinarily there is one failure in 5. The failures don't come regularly like this:
SSSSFSSSSFSSSSFSSSSFSSSSFSSSSFSSSSFSSSSF

Purely by chance, you could well have:
FFFSSSSSSSSSSS

Bootstrapping of the time series will help you define the limits (of failures or success coming together) that can occur by chance.

I was trying to draw the limits for the data
SSSSSSSSSSSSSSSSSSSFSSSSSFSSFSSFFSS
to apply further, to find when an experiment is running outside the limits of control.
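
One way the reordering described in this post (1000 reorderings, 95% limits) might be sketched in Python is shown here. It is an assumption-laden illustration, not the Excel software mentioned above: it shuffles the observed outcomes without replacement and takes pointwise 2.5th and 97.5th percentiles of the CUSUM at each patient. The function name reorder_band, the fixed seed and the simple nearest-rank percentile are all choices made for this sketch.

```python
import random
from itertools import accumulate

SEQUENCE = "S" * 19 + "FSSSSSFSSFSSFFSS"   # the 35 outcomes from the post
SCORE = {"S": 2 / 7, "F": -12 / 7}

def cusum(outcomes):
    """Running total of scores after each patient."""
    return list(accumulate(SCORE[o] for o in outcomes))

def reorder_band(sequence, n_reorderings=1000, seed=0):
    """Pointwise 2.5th and 97.5th percentiles of the CUSUM over random reorderings."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    outcomes = list(sequence)
    paths = []
    for _ in range(n_reorderings):
        rng.shuffle(outcomes)          # reorder without replacement
        paths.append(cusum(outcomes))
    lo_index = int(0.025 * (n_reorderings - 1))
    hi_index = int(0.975 * (n_reorderings - 1))
    lower, upper = [], []
    for position in range(len(outcomes)):
        column = sorted(path[position] for path in paths)
        lower.append(column[lo_index])
        upper.append(column[hi_index])
    return lower, upper

lower, upper = reorder_band(SEQUENCE)
observed = cusum(SEQUENCE)
# Print lower limit, observed CUSUM and upper limit at a few patients.
for patient in (10, 20, 30, 35):
    i = patient - 1
    print(patient, round(lower[i], 2), round(observed[i], 2), round(upper[i], 2))
```

Because the scores sum to zero over the whole sequence, every reordering ends at zero, so a band built this way closes up at patient 35; what it describes is how far the path can wander in between.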

Thank you in anticipation

Sincerely

Jacob Puliyel

Head Pediatrics

St Stephens Hospital

Tis Hazari

Delhi

--
___________________________
Jacob M. Puliyel MD MRCP MPhil

eFax 00 44 7092-124285
Phone 00 91 11 23946388
00 91 9868035091

Adrian Sayers

Sep 21, 2009, 10:23:02 AM

to meds...@googlegroups.com

Might be worth googling David spigelhatler. I know he has used the
CUSUM method in identifying overly numerous deaths. I am not sure how
he calculated the intervals, but I expect his papers detail this.

bw
Adrian


Ted Harding

Sep 21, 2009, 10:48:33 AM

to meds...@googlegroups.com

If you just google on "David spigelhatler" you will end up with
an enormous haystack -- within which will be several needles,
but you will have to do a lot of sifting to find them!

A good search phrase would be

spiegelhalter bristol shipman

Have a look at:

http://intqhc.oxfordjournals.org/cgi/content/abstract/15/1/7
Risk-adjusted sequential probability ratio tests:
applications to Bristol, Shipman and adult cardiac surgery
DAVID SPIEGELHALTER, OLIVIA GRIGG, ROBIN KINSMAN
and TOM TREASURE
International Journal for Quality in Health Care 15:7-13 (2003)

http://stats-www.open.ac.uk/PHsurv/Spiegelhalter.pdf
[Slides for a talk by David Spiegelhalter on the general use
of CUSUM and Control Charts in detecting untoward trends]
"Extreme multiplicity: monitoring large numbers of indicators
and areas or institutions"
Open University 21 May 2008
[Good examples of graphs, and list of References at the end.]

You will find numerous other candidates with the Google search
suggested above.

Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.H...@manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 21-Sep-09 Time: 15:48:29
------------------------------ XFMail ------------------------------

Martin Holt

Sep 21, 2009, 10:52:39 AM

to meds...@googlegroups.com

That's David "Spiegelhalter"...involved in the Bristol inquiry and Harold
Shipman.

bw,
Martin Holt

Frank Isackson

Sep 21, 2009, 6:45:41 PM

to meds...@googlegroups.com

The paper mentioned in the other responses, "Risk-adjusted sequential probability ratio tests...", is available at

http://intqhc.oxfordjournals.org/cgi/reprint/15/1/7

Frank Isackson


greybeard

Sep 23, 2009, 12:01:06 AM

to MedStats


The CUSUM method in that paper is designed for analysis of a variable
measured on a continuous scale. Your data is binary.

Why not ask how likely it would be to see 10 S's in a row? It's fairly
simple ... if the probability of an S is 4/5, then the probability
that the next 10 experiments will not have a single F is (1-1/5)^10
or about 0.11.

The probability that the next 19 experiments would be all S's is
around 0.014. It passes the conventional 5% threshold at 14 straight
S's but you certainly do have a Sequential Probability Problem
("multiplicity" in the Spiegelhalter [not "Spigelhalter" or
"Spigelhatler"] presentation). You are going to need to think
seriously about how many of these sequences you are going to see in a
month and how many "false alarms" you can tolerate.

Another way of thinking about this is that after 20 experiments you
would expect 4 F's and your observed number of F's was 1. How unlikely
is that assuming that the F-process is Poisson? Not very unlikely I
would say.
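
The figures quoted in this reply can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the Poisson calculation takes "1 or fewer failures when 4 are expected" as the event of interest, which is one reading of the question posed above.

```python
import math

P_SUCCESS = 4 / 5   # one failure in five, as assumed in the reply

def prob_all_success(n, p=P_SUCCESS):
    """Probability that the next n experiments are all successes."""
    return p ** n

def shortest_run_below(alpha, p=P_SUCCESS):
    """Smallest run length whose all-success probability falls below alpha."""
    n = 1
    while p ** n >= alpha:
        n += 1
    return n

def poisson_cdf(k, mean):
    """P(X <= k) for a Poisson variable with the given mean."""
    return sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k + 1))

print(round(prob_all_success(10), 3))   # 0.107
print(round(prob_all_success(19), 3))   # 0.014
print(shortest_run_below(0.05))         # 14
print(round(poisson_cdf(1, 4.0), 3))    # 0.092
```

The last figure, about 0.09, is in line with the "not very unlikely" verdict: it does not reach the conventional 5% threshold.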

--
David Winsemius, MD

Puliyel

Sep 23, 2009, 1:45:17 AM

to meds...@googlegroups.com

Dear David.

Thanks for that novel insight of looking at probability of getting 4 F in a row.

However, in CUSUM I expect you are more tolerant of 4 F after 20 S than you would be of 4 F after 2 S. (The failures may cross the limits of tolerance in the latter case, whereas in the former the 20 S have taken the cumulative sum so high that 4 F will only bring it close to the zero line for CUSUM.)
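
To put numbers on that comparison, here is a small sketch that assumes the same scores as in the original post (+2/7 per success, -12/7 per failure); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

SCORE = {"S": Fraction(2, 7), "F": Fraction(-12, 7)}

def cusum(outcomes):
    """Running total of scores after each patient."""
    return list(accumulate(SCORE[o] for o in outcomes))

late_failures = cusum("S" * 20 + "F" * 4)    # 4 F after 20 S
early_failures = cusum("S" * 2 + "F" * 4)    # 4 F after 2 S
print(late_failures[-1], early_failures[-1])  # -8/7 -44/7
```

After 20 successes the four failures leave the sum at -8/7, just under the zero line, whereas after only 2 successes they drive it down to -44/7.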

The purpose is 'change point analysis', as Wayne Taylor writes in the paper I referenced earlier, and to pick up deviations that are most likely not due to chance.

Sincerely

Jacob Puliyel

Puliyel

Nov 11, 2009, 3:11:54 AM

to MedS...@googlegroups.com

Dear All

Two months ago I posted this bootstrapping problem for a clinical trial we were doing.

We have now developed software that uses it for 'CUSUM (cumulative sum) limits' calculations.

The background

RCTs are the best way to study a new intervention. However, they are very expensive (and so RCTs are often done by the pharmaceutical industry promoting the new intervention). We wondered if CUSUM, as used in industry for quality control, could be used, at least initially, to check whether a new intervention does more harm or more good than traditional treatment.

The Proof of Concept

I am attaching a small study as 'proof of concept'. When we acquired the data for the study, the software had not yet been developed. The pragmatic stopping rule we adopted while acquiring the data was to temporarily stop the trial (pending full development of the software) if the CUSUM with the new drug exceeded the overall rate of failure with the standard drug (if the CUSUM of failures with the new drug crossed the zero line).

I have uploaded the software at
http://jacob.puliyel.com/foresee/

This now allows the intervention to be compared to 'standard therapy' in real time (meaning a new CUSUM graph can be drawn with each new patient treated) and so the lag phase before an intervention is declared as 'causing more harm,' is minimized.

I would greatly appreciate any feedback (including negative feedback!) on this.

Regards

Jacob Puliyel

Neeraj CUSUM.doc

Martin Holt

Nov 11, 2009, 10:04:29 AM

to meds...@googlegroups.com

Dear Jacob,

Thank you for this interesting post. Some years ago I worked in QA/QC in a
clinical diagnostics environment, so read your proposal closely. Especially
since it seems you are proposing replacing RCT methodology with the CUSUM
approach.

From Page 7, "However, RCT's have inherent problems, especially in the
context of trials in children. According to Mc Culloch and colleagues, RCT's
require large samples, long duration, difficult blinding and are very
expensive [10] and it is difficult to recruit cases [11]. Parents find the concept
of equipoise between trial drugs and the need for blinded randomization
difficult to understand." Are you suggesting that the CUSUM approach
improves on these conditions whilst remaining effective ?

In your latest posting you say, "The pragmatic stopping rule we adopted
while acquiring the data was to temporarily stop the trial (pending full
development of the software) if the CUSUM with the new drug exceeded the
overall rate of failure with the standard drug (if the CUSUM of failures
with the new drug crossed the zero line)." RCT's also employ stopping rules.
I'm finding it difficult to understand why the CUSUM approach with a
stopping rule is better than an RCT approach with a stopping rule. This is
worsened because I don't recognise that if "the new drug exceeded the
overall rate of failure with the standard drug" it is equivalent to "if the
CUSUM of failures with the new drug crossed the zero line". This might be
me, I might be rusty, but if for example the new drug scored -0.25, say,
over a number of occasions while the limit was -2, it would pass each time
whilst the CUSUM would steadily decline. I'm wary, in case your definition
of CUSUM (enclosed in your software) is different to mine.

Page 2:
"What this study adds
Nebulised hypertonic saline is at least as good as standard treatment with
nebulised Epinephrine." In standard equivalence trials, showing this
requires a larger sample size. Whilst open to a new idea, again I wonder if
the CUSUM approach is as valid as RCT, yet achieves this ?

Page 6:
"A Cochrane review of the use of Epinephrine found evidence that it was more
effective when used in the outpatient setting but no evidence of benefit
when used in inpatients when compared against either placebo or Salbutamol [3]."
Page 5:
"Nebulised bronchodilators like Salbutamol, Ipravent and Epinephrine have
been used by some in treatment of bronchiolitis. A Cochrane meta-analysis
has not found these drugs to be useful [2]"
Page 5:
"It is clear that it is for this temporary but perceptible relief of
symptoms that these drugs are used."
You require an active comparator, and Epinephrine seems commonly recognised
as such, but care is required as formal studies suggest it may not be
effective in the long term, in your setting (in patient). So to say that
your drug is at least as effective as Epinephrine might not prove much.

I wondered if a crossover method might be used ? Or might this limit study
participants to being not too ill.

Analysis of RCT data allows for adjustments to be made (eg age). How could
this be achieved with CUSUM ?

Page 13: a strength of the study is early stoppage in real time if
necessary. What about false negatives ?

Table 1 goes from Score 0 to 3, but at the bottom refers briefly to scores 4
to 9.

Figure 2 at patient 3 shows a sharp drop in the bootstrapping lines, in 8
out of 10, yet no alteration in gradient is seen in the blue, CUSUM
graph....seems strange ?

Re-reading this, I haven't said much positive about the paper.....sorry. I
like the idea, and would appreciate in words and diagram how you translate
your data into CUSUM, using a real example. When I used it, I would have
found it straightforward to do so (what does bootstrapping achieve here ? ~
this question probably reflects more on me than the paper !)

Best Regards,

Martin Holt


Puliyel

Nov 11, 2009, 2:24:46 PM

to m861...@btinternet.com, MedS...@googlegroups.com

Dear Martin
Thank you for the detailed letter. I will answer your questions in red just under each question below.
Warm regards
Jacob

On Wed, Nov 11, 2009 at 8:34 PM, Martin Holt <m861...@btinternet.com> wrote:

> Dear Jacob,
>
> Thank you for this interesting post. Some years ago I worked in QA/QC in a
> clinical diagnostics environment, so read your proposal closely. Especially
> since it seems you are proposing replacing RCT methodology with the CUSUM
> approach.
>
> From Page 7, "However, RCT's have inherent problems, especially in the
> context of trials in children. According to Mc Culloch and colleagues, RCT's
> require large samples, long duration, difficult blinding and are very
> expensive [10] and it is difficult to recruit cases [11]. Parents find the concept
> of equipoise between trial drugs and the need for blinded randomization
> difficult to understand." Are you suggesting that the CUSUM approach
> improves on these conditions whilst remaining effective ?

I am suggesting that CUSUM may be a less expensive way to look at a new mode of treatment and compare it to standard therapy. Here, of course, we lose the advantages of randomization, blinding etc. that RCTs have, and we are looking only at historical controls.

> In your latest posting you say, "The pragmatic stopping rule we adopted
> while acquiring the data was to temporarily stop the trial (pending full
> development of the software) if the CUSUM with the new drug exceeded the
> overall rate of failure with the standard drug (if the CUSUM of failures
> with the new drug crossed the zero line)."

This pragmatic rule needs explanation.

After the first part of the trial with the standard drug (epinephrine for bronchiolitis) we hoped to do the bootstrapping and draw the control lines before we used the study drug (hypertonic saline). Unfortunately we had difficulties with the bootstrapping. Rather than delay the second part of the study, we chose to use the 'pragmatic stopping rule', i.e. to recruit cases and use the study drug as long as the CUSUM stayed above the zero line (and we planned to suspend the study until the limit lines were drawn if the CUSUM for failures crossed the zero line).

> RCT's also employ stopping rules.
> I'm finding it difficult to understand why the CUSUM approach with a
> stopping rule is better than an RCT approach with a stopping rule. This is
> worsened because I don't recognise that if "the new drug exceeded the
> overall rate of failure with the standard drug" it is equivalent to "if the
> CUSUM of failures with the new drug crossed the zero line". This might be
> me, I might be rusty, but if for example the new drug scored -0.25, say,
> over a number of occasions while the limit was -2, it would pass each time
> whilst the CUSUM would steadily decline. I'm wary, in case your definition
> of CUSUM (enclosed in your software) is different to mine.

I hope the explanation above answers the question. Ordinarily the trial with the study drug should be stopped when failures cross the -2SD line.

We planned to use the zero-line stopping rule only temporarily, until the 2SD lines were drawn. In the event we were able to complete the study, as the failures with the study drug happened only at the end of the study.
RCTs also use stopping rules, but those involve one or two 'mid-study analyses' of data. Here I am suggesting we can monitor CUSUM with each patient recruited (not after a third of the study is done).

> Page 2:
> "What this study adds
> Nebulised hypertonic saline is at least as good as standard treatment with
> nebulised Epinephrine." In standard equivalence trials, showing this
> requires a larger sample size. Whilst open to a new idea, again I wonder if
> the CUSUM approach is as valid as RCT, yet achieves this ?

The question of sample size for a valid CUSUM study needs to be looked at. I am not confident our sample size was anywhere near adequate, but I thought it was enough to test the concept of using CUSUM for such comparisons.

> Page 6:
> "A Cochrane review of the use of Epinephrine found evidence that it was more
> effective when used in the outpatient setting but no evidence of benefit
> when used in inpatients when compared against either placebo or Salbutamol [3]."
> Page 5:
> "Nebulised bronchodilators like Salbutamol, Ipravent and Epinephrine have
> been used by some in treatment of bronchiolitis. A Cochrane meta-analysis
> has not found these drugs to be useful [2]"
> Page 5:
> "It is clear that it is for this temporary but perceptible relief of
> symptoms that these drugs are used."
> You require an active comparator, and Epinephrine seems commonly recognised
> as such, but care is required as formal studies suggest it may not be
> effective in the long term, in your setting (in patient). So to say that
> your drug is at least as effective as Epinephrine might not prove much.

I agree completely, and that is why we brought up all those Cochrane meta-analyses. As you say, showing that our drug is at least as effective as Epinephrine might not prove much. My question is: can we use this CUSUM analysis tool to compare a new drug against 'standard therapy'?

> I wondered if a crossover method might be used ? Or might this limit study
> participants to being not too ill.
>
> Analysis of RCT data allows for adjustments to be made (eg age). How could
> this be achieved with CUSUM ?

Perhaps only patients of the same age can be compared.

> Page 13: a strength of the study is early stoppage in real time if
> necessary. What about false negatives ?

The stopping rule will be 'crossing of the -2SD lower line'. The bootstrapping process makes allowance for the random clustering of failures.

> Table 1 goes from Score 0 to 3, but at the bottom refers briefly to scores 4
> to 9.

There are 5 rows and each row has a maximum score of 3. The highest score one can achieve is 15.

> Figure 2 at patient 3 shows a sharp drop in the bootstrapping lines, in 8
> out of 10, yet no alteration in gradient is seen in the blue, CUSUM
> graph....seems strange ?

The blue line is the actual sequence with the standard drug. Ten random reorderings of the data are also shown.

To clarify, the blue line is not the mean of the data from the 10 iterations.

> Re-reading this, I haven't said much positive about the paper.....sorry.

No apologies are called for. I sent it to the experts for their honest feedback. I am not saying this is better than an RCT, only that RCTs are difficult to do, so can we consider this method even if it is not as good?

> I like the idea, and would appreciate in words and diagram how you translate
> your data into CUSUM, using a real example. When I used it, I would have
> found it straightforward to do so (what does bootstrapping achieve here ? ~
> this question probably reflects more on me than the paper !)

You ask: what does bootstrapping achieve here?

In the study with the standard drug we have a rate of failures to successes. However, given the same rate of failure, the sequence of failures can differ, and so the highest and lowest CUSUM scores in that sequence can differ (Figure 2 shows 10 such randomly reordered sequences). Bootstrapping with 10000 iterations can help examine the highest and lowest scores achieved by chance, by randomly reordering the data.
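
As a rough illustration of that idea, the sketch below records the highest and lowest CUSUM reached in each of 10000 random reorderings. It is not the uploaded software. The 35-outcome sequence and the scores from the first post in this thread are used as stand-in data (an assumption, since the study data are not given here), and the function name and seed are arbitrary.

```python
import random
from itertools import accumulate

SEQUENCE = "S" * 19 + "FSSSSSFSSFSSFFSS"   # stand-in data from the first post
SCORE = {"S": 2 / 7, "F": -12 / 7}

def extremes_under_reordering(sequence, iterations=10_000, seed=1):
    """Sorted lists of the lowest and highest CUSUM seen in each reordering."""
    rng = random.Random(seed)
    outcomes = list(sequence)
    lows, highs = [], []
    for _ in range(iterations):
        rng.shuffle(outcomes)                       # random reordering
        path = list(accumulate(SCORE[o] for o in outcomes))
        lows.append(min(path))
        highs.append(max(path))
    return sorted(lows), sorted(highs)

lows, highs = extremes_under_reordering(SEQUENCE)
print("lowest CUSUM in any reordering :", round(lows[0], 2))
print("highest CUSUM in any reordering:", round(highs[-1], 2))
# Value that roughly 5% of reorderings reach or go below at their lowest point.
print("5th percentile of the lowest point:", round(lows[len(lows) // 20], 2))
```

No reordering can go lower than -60/7 (all five failures first) or higher than 60/7 (all thirty successes first), so the simulated extremes always fall between those two values.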

Thank you again for taking so much effort to study this paper.
I hope I have answered all your questions. My purpose is not to promote one type of therapy (hypertonic saline or epinephrine) but to see if CUSUM is a valid (quick and less expensive) method to compare therapies.

Martin Holt

Nov 12, 2009, 8:45:19 AM

to meds...@googlegroups.com

Thank you Jacob for expanding on the points I raised. I do have one further question, now that I understand the role of bootstrapping, etc.

Years ago, when I routinely used Shewhart and CUSUM plots, the CUSUM plot worked as follows. It would start at the mid-line (zero line). If the next observation was +0.3, say, the CUSUM score became 0.3. If the next observation was also 0.3, the CUSUM became 0.6. And so on: the CUSUM score was really a measure of bias. It was possible to have a number of individual observations all of which were "within limits", but the CUSUM might cross one of the limits and so flag that the observations were biased. (But not necessarily failures). Does this accord with your understanding of CUSUM ? If so, for a particular failure rate of the standard drug, bootstrapping would produce +/- 2SD limits, but for the test drug to be seen to fail, the CUSUM would need to cross the limit in the same number of observations or less, wouldn't it ? ~ the test drug might not fail at all but might eventually pass the limit because it is slightly biased. If that's right, one then gets into the problems of clusters of failures, etc, with the new drug that you are addressing with the standard drug by bootstrapping.
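
The behaviour described here, where every observation is within limits but the running sum is not, can be sketched as follows. The limit of 2 is a hypothetical figure borrowed from the "-2" mentioned earlier in the thread, and first_crossing is an illustrative helper, not part of any CUSUM package.

```python
from fractions import Fraction
from itertools import accumulate

def first_crossing(observations, limit):
    """1-based position at which |CUSUM| first exceeds the limit, else None."""
    for position, total in enumerate(accumulate(observations), start=1):
        if abs(total) > limit:
            return position
    return None

observations = [Fraction(3, 10)] * 10   # ten readings of +0.3 each
limit = 2                               # hypothetical control limit

# Every single reading is comfortably within the limit...
print(all(abs(x) <= limit for x in observations))   # True
# ...but the accumulated bias crosses it at the 7th reading (7 x 0.3 = 2.1).
print(first_crossing(observations, limit))          # 7
```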

Does this make sense ?

Puliyel

Nov 12, 2009, 9:52:41 AM

to meds...@googlegroups.com, m861...@btinternet.com

Dear Martin
That is correct.

Assume that with the standard drug, failures occur 1 in 5.
But the sequence need not be SSSSFSSSSF.
It could well be SFFSSSSSSS.

Bootstrapping examines the limits of CUSUM by randomly reordering the data.
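
The two orderings above can be compared directly. The scores used here are an assumption: they follow the pattern of the +2/7 and -12/7 in the original post (which equal 2f and -2(1-f) for a failure rate f = 5/35), giving +2/5 and -8/5 when failures occur 1 in 5.

```python
from fractions import Fraction
from itertools import accumulate

FAILURE_RATE = Fraction(1, 5)
# Assumed scoring pattern: +2f per success, -2(1 - f) per failure.
SCORE = {"S": 2 * FAILURE_RATE, "F": -2 * (1 - FAILURE_RATE)}

def cusum(outcomes):
    """Running total of scores after each patient."""
    return list(accumulate(SCORE[o] for o in outcomes))

regular = cusum("SSSSFSSSSF")
clustered = cusum("SFFSSSSSSS")
print("regular  : low", min(regular), "high", max(regular))      # low 0 high 8/5
print("clustered: low", min(clustered), "high", max(clustered))  # low -14/5 high 2/5
```

Both sequences contain the same two failures and end at zero, yet the clustered ordering drops to -14/5 while the regular one never goes below zero. That spread is what the random reordering is meant to map out.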

-------------------------------------------
I think the method should work if one has a large sample with standard drug. But I am not a statistician and perhaps there are a hundred pitfalls I am unaware of. Instead of making an ass of myself by sending it for publication, I decided to ask what this group thought of the method.

Sample size calculation for the standard drug trial is one of the problems that need sorting.

Sincerely
Jacob
