Large dataset with categories: UI unresponsive and unusable (test case?)

jo...@theabrahams.ca

Mar 17, 2022, 1:42:56 PM
to Wizard User Group
Hi there. I have a dataset with 1.78 million records. One variable has about 1000 categories, and I'm trying to estimate an interaction coefficient for each category with one real-valued variable, plus one other (non-interacted) real-valued variable, and no intercept.
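The model described above can be written as y_i = β_c(i) · x1_i + γ · x2_i + ε_i, with one slope β per category, a single shared coefficient γ, and no intercept. The following is a minimal sketch, not how Wizard or R actually fit it: it computes the least-squares estimates in one pass by accumulating per-category sums and then eliminating the category slopes from the normal equations (a Schur-complement, or block-elimination, step). The function name `fit_category_slopes` and the argument names `y`, `x1`, `x2`, `cat` are hypothetical.

```python
from collections import defaultdict


def fit_category_slopes(y, x1, x2, cat):
    """Least squares for y_i = beta[cat_i] * x1_i + gamma * x2_i (no intercept).

    Hypothetical helper for illustration. One pass accumulates per-category
    sums; the per-category slopes are then eliminated from the normal
    equations (Schur complement) to solve for gamma, so memory grows with the
    number of categories rather than with rows x categories.
    """
    s11 = defaultdict(float)  # sum of x1^2 within each category
    s12 = defaultdict(float)  # sum of x1*x2 within each category
    s1y = defaultdict(float)  # sum of x1*y within each category
    s22 = 0.0                 # sum of x2^2 over all rows
    s2y = 0.0                 # sum of x2*y over all rows
    for yi, ai, bi, ci in zip(y, x1, x2, cat):
        s11[ci] += ai * ai
        s12[ci] += ai * bi
        s1y[ci] += ai * yi
        s22 += bi * bi
        s2y += bi * yi

    # A category whose x1 values are all zero has no identifiable slope.
    for c, v in s11.items():
        if v == 0.0:
            raise ValueError(f"category {c!r}: x1 is all zero")

    # Reduced equation for gamma after substituting
    # beta_c = (s1y_c - s12_c * gamma) / s11_c into the gamma row.
    denom = s22 - sum(s12[c] ** 2 / s11[c] for c in s11)
    numer = s2y - sum(s12[c] * s1y[c] / s11[c] for c in s11)
    if denom <= 1e-12 * max(s22, 1.0):
        # x2 is (numerically) proportional to x1 within every category.
        raise ValueError("gamma is not identifiable")
    gamma = numer / denom
    beta = {c: (s1y[c] - s12[c] * gamma) / s11[c] for c in s11}
    return beta, gamma
```

With about 1000 categories this keeps only a few thousand running sums. For comparison, a dense 1.78 million × 1001 design matrix would hold about 1.78 billion entries (roughly 14 GB as 8-byte floats); how Wizard actually builds its design matrix is not known from this report.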

Test 1:
* Create a new project
* Tell Wizard 2 to treat the categorical variable as categories
* Click the New Model button, select the dependent variable, the two real-valued variables, and the categorical variable, and click "Build Model".

It beach balls for about 4 minutes at this point. What's it doing? Shouldn't the UI thread stay responsive, even while it's building the model? When it comes back, the old window seems to have disappeared, a new window pops up (on my other monitor), and it says "An error occurred while running the regression."

Test 2:
* Delete model
* Filter down to 179,000 records
* Create New Model

It beach balls for XXX minutes before again closing the window and opening a brand-new window. This time there are some results.

This is on my 14-core (28-thread) iMac Pro.

When I try to change the model by deselecting any variable, it beach balls for 4 minutes after each click, unless I completely uncheck the categorical variable.

When I finally get around to specifying the model I actually want, it complains "Not enough data.  Try fewer filters". 

I was able to run the same regression in R, which shows it's not a data problem.

Can this be fixed? I'm not sure why the UI thread is blocking; perhaps a categorical variable with 1000 values is a use case you didn't consider or test for?

Let me know if you would like the data; I can strip the identifying columns and send it to you.

John Abraham