Help with three way Anova speed

Muhammad Kaleem

Jun 9, 2023, 7:03:26 AM
to pystatsmodels
Hi,

I am running a three-way ANOVA using the statsmodels stats.anova_lm function. My data set has 100000 time points, with one continuous (dependent) variable and three categorical (independent) variables. Two of the independent variables have 100 possible values each, and the third has 12. On my system (3 GHz, 16 GB RAM), a single ANOVA run takes 7 seconds, and I need to run the ANOVA 10100 times in my analysis.
My code works like this:

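import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Three categorical factors, each of length 120000
c1 = np.arange(12).repeat(10000)
c2 = np.arange(100).repeat(1200)
c3 = np.arange(12).repeat(10000)
# Continuous dependent variable
continuous = np.random.rand(120000)
df = pd.DataFrame({"c1": c1, "c2": c2, "c3": c3, "v": continuous})
# Main-effects OLS fit, followed by the type 2 ANOVA table
temp_model = ols("v ~ C(c1) + C(c2) + C(c3)", data=df).fit()
temp_tbl = sm.stats.anova_lm(temp_model, typ=2)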

Can someone please help with the speed issues or point out anything that I am doing wrong?

Best,
Muhammad

josef...@gmail.com

Jun 9, 2023, 10:35:57 AM
to pystat...@googlegroups.com
I don't see anything to speed this up.
On my notebook it takes around 5.4 seconds, of which 3 seconds are in `fit` (mainly pinv, I guess) and a bit over 2 seconds in patsy dmatrices.
(When I tried method="qr", it was even slower than the default pinv.)

There would be a large speedup in repeated estimation if exog stays the same and the pinv_wexog can be reused.
If both endog and exog differ across cases, then I don't see a possible speedup.
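
The reuse case can be sketched with plain NumPy. This is a minimal sketch, assuming the design matrix is identical across runs and only the response changes. The helper names (`dummies`, `design`, `make_ssr`) and the toy sizes are hypothetical, and this is not statsmodels' own implementation. For a main-effects-only model, the type 2 sum of squares of a factor is the increase in residual sum of squares when that factor is dropped, which is what the sketch computes for many responses at once.

```python
import numpy as np

def dummies(codes):
    """Treatment-coded dummy columns for one factor (first level dropped)."""
    levels = np.unique(codes)
    return (codes[:, None] == levels[1:]).astype(float)

rng = np.random.default_rng(0)
n = 2000  # toy size, much smaller than the data in the question
factors = {
    "c1": rng.integers(0, 4, n),
    "c2": rng.integers(0, 6, n),
    "c3": rng.integers(0, 3, n),
}
blocks = {name: dummies(codes) for name, codes in factors.items()}
const = np.ones((n, 1))

def design(names):
    """Intercept plus the dummy blocks of the named factors."""
    return np.hstack([const] + [blocks[k] for k in names])

def make_ssr(X):
    """Compute pinv(X) once; return a function giving the residual SS per column of Y."""
    pinv_X = np.linalg.pinv(X)
    def ssr(Y):
        resid = Y - X @ (pinv_X @ Y)
        return np.einsum("ij,ij->j", resid, resid)
    return ssr

# One-off cost: a pseudo-inverse for the full model and for each reduced model.
X_full = design(list(factors))
ssr_full = make_ssr(X_full)
ssr_reduced = {
    name: make_ssr(design([k for k in factors if k != name]))
    for name in factors
}

# Repeated cost: only matrix products, one column of Y per response.
Y = rng.random((n, 25))
df_resid = n - np.linalg.matrix_rank(X_full)
s_full = ssr_full(Y)
f_stats = {}
for name in factors:
    df_factor = blocks[name].shape[1]  # assumes a full-rank design
    ss_factor = ssr_reduced[name](Y) - s_full
    f_stats[name] = (ss_factor / df_factor) / (s_full / df_resid)
```

Each entry of `f_stats` is an array with one F statistic per column of `Y`. The sketch does not help when the factors themselves change between runs, since the pseudo-inverses would then have to be recomputed each time.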

Treating c2 as continuous reduces the time to around 1.7 seconds on my computer.

Josef



