A friend of mine recommended this:
Your outcome is the % increase. Theoretically, I suppose costs could
decrease, but I'm guessing it's pretty rare. Anyway, suffice to say
that you might consider a log transform or other sort of transform.
You also might try to see if you can get numbers, rather than %
increase per year. And if you get those, then you might also do a log
transform. Anyway, all I'm suggesting is that whatever outcome you
regress (or do whatever) on, you should do some serious EDA to see if
the assumptions of regression (or whatever model) are met.
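
If you go the log route, note that a raw % increase can be zero (or
negative), where a plain log is undefined. One workaround is to log the
growth factor 1 + r instead of r itself. This is my assumption, not
something the data dictates. A minimal sketch in Python (the function
name is made up for illustration):

```python
import math

def log_growth(pct_increase):
    """Turn a fractional increase (0.03 means 3%) into a log growth factor."""
    growth_factor = 1.0 + pct_increase
    if growth_factor <= 0:
        # A drop of 100% or more has no log growth factor.
        raise ValueError("increase must be greater than -100%")
    return math.log(growth_factor)

# Illustrative rates only, including a zero and a small decrease.
rates = [0.00, 0.03, 0.04, -0.01]
log_rates = [log_growth(r) for r in rates]
```

For increases of a few percent, log(1 + r) is very close to r, so the
transform changes little here. That is part of why the EDA comes first.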
Having said that, let's say you nailed down a continuous outcome
variable and you're calling it outc. I'd start by getting the data
long, like this:
st lawyr year outc
AK none 2001 0.01
AK none 2002 0.03
AK none 2003 0.02
AL 2004 2001 0.06
AL 2004 2002 0.05
etc. We have one record per state per year, so with 8 years of data
you'd have 400 records.
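
If your data starts out wide (one row per state, one column per year,
as in your table), the reshaping can be sketched like this in Python.
The function name and the tuple layout are hypothetical. The numbers
are the ones from the example rows above.

```python
def wide_to_long(wide_rows, years):
    """Expand (st, lawyr, [one value per year]) into one record per state-year."""
    long_rows = []
    for st, lawyr, values in wide_rows:
        if len(values) != len(years):
            raise ValueError("need exactly one value per year for " + st)
        for year, outc in zip(years, values):
            long_rows.append(
                {"st": st, "lawyr": lawyr, "year": year, "outc": outc}
            )
    return long_rows

years = [2001, 2002]
wide = [
    ("AK", None, [0.01, 0.03]),  # None stands for "no law passed"
    ("AL", 2004, [0.06, 0.05]),
]
long_rows = wide_to_long(wide, years)
```

With 50 states and 8 years, the same call would return the 400 records
mentioned above.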
Next I'd generate a variable "law" which is 1 if and only if smoking
legislation is in effect that year:
st lawyr year outc law
AK none 2001 0.01 0
AK none 2002 0.03 0
AL 2004 2003 0.06 0
AL 2004 2004 0.05 0
AL 2004 2005 0.05 1
AL 2004 2006 0.02 1
etc.
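
A sketch of how "law" could be generated, in Python. It follows the
table above, where AL (law year 2004) still has law = 0 in 2004 and
law = 1 from 2005 on, so the law is treated as in effect starting the
year after it passed. If you'd rather count the law year itself, change
`>` to `>=`. The function name is made up.

```python
def law_indicator(lawyr, year):
    """Return 1 if the law counts as in effect in `year`, else 0.

    lawyr is the year the law passed, or None if the state has no law.
    Assumption: the law takes effect the year after it passes.
    """
    if lawyr is None:
        return 0
    return 1 if year > lawyr else 0

# Reproduces the AL rows of the table above.
al_law = [law_indicator(2004, y) for y in (2003, 2004, 2005, 2006)]
```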
Next, I'd do regression, accounting for clustering at the state level,
with indicator variables for the year. If you're using Stata, it
might look like this:
xi: glm outc i.year law, vce(cluster st) family(whatever) link(whatever)
[By "whatever" I don't mean that you should just put whatever you
feel, but you should decide based on the distribution of the outc
variable.]
You'll get back output with a coefficient for each year* (indicating
the overall change in "outc" that year from the previous year in a
state with no smoking law), and also a coefficient for law (indicating
the effect of a smoking law on outc per year). If the coefficient for
"law" is statistically significantly negative (or < 1 if you're using
some sort of multiplicative model), then smoking laws are good. If
the coefficient is statistically significantly positive, then they're
bad.
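
If you don't have Stata handy, here is a rough Python sketch of the
simplest version of that model: Gaussian family, identity link, which
is just OLS. It is not a replacement for glm with other families. It
uses the fact that including year indicators gives the same "law"
coefficient as subtracting each year's mean from outc and law first.
The standard error is a plain state-clustered sandwich estimate with no
small-sample correction, so it will not match Stata's output exactly.
The function name, the state "AZ" and all the numbers are made up.

```python
from collections import defaultdict

def law_effect(rows):
    """OLS coefficient on law (with year indicators) and a clustered SE.

    rows: dicts with keys st, year, outc, law.
    """
    # Per-year totals of outc and law, plus a count, to compute year means.
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for r in rows:
        t = totals[r["year"]]
        t[0] += r["outc"]
        t[1] += r["law"]
        t[2] += 1

    # Subtracting year means is equivalent to including year indicators.
    y_dm, l_dm = [], []
    for r in rows:
        t = totals[r["year"]]
        y_dm.append(r["outc"] - t[0] / t[2])
        l_dm.append(r["law"] - t[1] / t[2])

    sxx = sum(l * l for l in l_dm)
    if sxx == 0:
        raise ValueError("law never varies across states within a year")
    beta = sum(l * y for l, y in zip(l_dm, y_dm)) / sxx

    # Sum score contributions within each state, then apply the sandwich formula.
    score = defaultdict(float)
    for r, l, y in zip(rows, l_dm, y_dm):
        score[r["st"]] += l * (y - beta * l)
    se = sum(s * s for s in score.values()) ** 0.5 / sxx
    return beta, se

# Made-up data: a year effect minus exactly 0.01 wherever law == 1.
year_effect = {2004: 0.05, 2005: 0.05, 2006: 0.02}
rows = []
for st, lawyr in [("AK", None), ("AL", 2004), ("AZ", 2005)]:
    for year, base in year_effect.items():
        law = 1 if lawyr is not None and year > lawyr else 0
        rows.append({"st": st, "year": year, "law": law,
                     "outc": base - 0.01 * law})

beta, se = law_effect(rows)
```

Because the fake data has no noise, beta comes out at -0.01 (up to
rounding) and se at essentially zero. With real data you'd compare
beta to se to judge significance.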
(*Actually I think you won't get a coefficient for the first year. The
constant term will represent the year-on-year increase for the first
year, and the coefficients for the other years will be the difference
between each of those years and the first year. If you add the option
"noconstant" you might get around this.)
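
To make the two codings concrete, here is a small Python sketch of what
one row of regressors looks like with and without a constant. It only
illustrates the idea and is not a claim about what xi or glm actually
generate. The function and its `noconstant` argument are hypothetical.

```python
def design_row(year, law, years, noconstant=False):
    """Build the regressors for one state-year record.

    With a constant: [1, dummies for every year but the first, law].
    Without:         [one dummy for every year, law].
    """
    if noconstant:
        return [1 if year == y else 0 for y in years] + [law]
    later_years = years[1:]  # the first year is absorbed by the constant
    return [1] + [1 if year == y else 0 for y in later_years] + [law]

years = [2001, 2002, 2003]
with_const = design_row(2002, 0, years)
without_const = design_row(2002, 0, years, noconstant=True)
```

In the first coding, the coefficient on the 2002 dummy is the gap
between 2002 and 2001. In the second, it is the 2002 level itself.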
An advantage of this method is that it doesn't assume anything about
how cost increases are spread over time. For example, you can
allow costs to shoot up from 2003 to 2004, then stay constant from
2004 to 2005.
On Sep 14, 12:10 pm, Anthony <ajdam...@gmail.com> wrote:
> Hello, I'm trying to run some sort of epidemiological analysis or test
> on whether smoking legislation is at all linked to the health care
> spending rates of increase. I have data for all fifty states
> (structured just like the random table with ten example states below),
> with each state's a) rates of increase in health spending for every
> year, and b) year that some sort of anti-smoking legislation passed
> (if at all).
> I'd like to know if there's a suitable epidemiological test that could
> use data in this format in order to determine whether or not the rate
> of increase in health spending slows down in the years after smoking
> legislation has passed (the Law Year on the left of the table)...
> On top of that, I'd also like to know if (and then how) I would need
> to account for the population size of each state...
> St. Law Year 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
> 1 None 0% 3% 3% 4% 1% 1% 2% 1% 4% 3% 2%
> 2 1998 2% 4% 5% 1% 1% 3% 1% 2% 1% 4% 2%
> 3 None 1% 3% 3% 0% 3% 4% 3% 2% 4% 1% 1%
> 4 None 2% 3% 2% 4% 5% 1% 4% 4% 0% 4% 3%
> 5 2003 3% 4% 3% 3% 1% 1% 2% 3% 2% 4% 5%
> 6 None 1% 3% 4% 4% 1% 4% 2% 0% 3% 5% 4%
> 7 2005 1% 2% 3% 2% 2% 5% 0% 4% 3% 5% 1%
> 8 2006 0% 4% 2% 1% 4% 3% 1% 3% 2% 3% 5%
> 9 None 4% 3% 1% 1% 3% 5% 1% 2% 3% 5% 2%
> 10 2001 1% 3% 0% 4% 3% 0% 4% 1% 5% 3% 2%
> Any advice on how I might begin to analyze data like this would be
> tremendously appreciated.
> Thanks,
> Anthony Damico