Leading vs. Lagging Indicators, sample size, and confidence

Franklin Mellott

Aug 20, 2014, 3:40:46 PM
to piface-d...@googlegroups.com
Hello,
 
I'm looking for some help correlating leading indicators with lagging indicators. The basic idea is that an increase in the number of red-light violations correlates with an increase in car accidents at the same locations where you're measuring.
 
Let's say for now that I track a single metric as a leading indicator and a single metric as a lagging indicator. If I wanted to claim there is a correlation between the two, how large would my sample need to be for me to be, say, 75% confident that there is a correlation? What if I wanted to be 90% confident?
 
Then, what happens if you have cascading indicators (ordered from most leading to most lagging): Tier 4 events, then Tier 3 events, then Tier 2 events, and finally Tier 1 events? How would the calculation above change if you wanted to establish that a rise in the number of Tier 4 events (least serious) is correlated with a rise in Tier 1 events?
 
I'm a recovering physics major, and I have enough understanding of statistics to be dangerous, but I know this is one where I need some help.
 
Thanks,
Frank

Lenth, Russell V

Aug 26, 2014, 5:27:48 PM
to franklin...@gmail.com, piface-d...@googlegroups.com

I am sorry I have let this languish unanswered for so long. Basically, my instinct is that it could be very difficult to get enough data. That is based only on reading newspaper reports here in Iowa, where people have tried to determine whether red-light cameras at intersections have a deterrent effect on accidents. There just aren’t enough accidents, and they happen in too many places, to get a handle on it.

 

I suppose you could try something simple, like the app for sample size for R-square. For just the correlation between two variables, specify that there is 1 regressor and enter for rho^2 the square of the smallest correlation you’d consider important in a practical sense. For example, with rho = .3, rho^2 = .09, and you need n = 85 to get 80% power of detecting it. The problem is that, realistically, I am guessing rho is quite small.
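
As a rough cross-check on the n = 85 figure above, the sample size for detecting a correlation can be approximated with the Fisher z-transformation. This is only a minimal sketch and not the method the sample-size app uses; the two-sided 5% significance level and the function name n_for_correlation are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def n_for_correlation(rho, power=0.80, alpha=0.05):
    """Approximate n needed to detect a true correlation rho (two-sided test).

    Fisher z approximation: atanh(r) is roughly normal with standard
    error 1 / sqrt(n - 3). alpha = 0.05 is an assumed default, not a
    value given in the thread.
    """
    if not 0 < abs(rho) < 1:
        raise ValueError("rho must be nonzero and strictly between -1 and 1")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = z.inv_cdf(power)           # quantile for the desired power
    effect = math.atanh(abs(rho))        # correlation on the Fisher z scale
    return math.ceil(((z_alpha + z_power) / effect) ** 2 + 3)


print(n_for_correlation(0.3))   # smallest correlation of practical interest
print(n_for_correlation(0.1))   # a much weaker correlation
```

With rho = .3 and 80% power this approximation gives 85, in line with the figure above. With rho = .1 the requirement grows to several hundred observations, which is the practical difficulty when rho is small.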

 

I don’t have any ideas for the multi-tier situation.

 

You might try posting this question on Cross Validated (stats.stackexchange.com). A lot of people look at that site, so you have a better chance of getting some ideas there.

 

Russ
