Note: If I say something that's factually wrong, please correct me.
I'll be referring to statsmodels/pystatsmodels as SM.
Problem: The time series models in SM "carry around" their data. What I mean is, the model object retains the entire dataset it was fitted on. For many applications this behavior is fine - you normally work with one model per time series, R does exactly this, etc.
However, in a "machine learning" context, this becomes a problem. If one wants to save their model (e.g. ARIMA) in SM, they can't just save the parameters (e.g. ARIMA coefficients and, maybe, starting values) - they must save the whole time series. There is no (apparent) way to transfer parameters from one model to another.
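To make the pain concrete, here's a toy sketch (NOT real SM code, just the shape of the behavior I mean): because the model object holds the full series, pickling the "model" drags all the data along with it.

```python
import pickle

# Toy stand-in for how SM's time-series models behave:
# the model object keeps the whole series it was fitted on.
class CarriesDataModel:
    def __init__(self, endog):
        self.endog = endog  # the entire series lives on the model object
    def fit(self):
        # pretend "coefficient": just the series mean
        self.params = [sum(self.endog) / len(self.endog)]
        return self

series = list(range(100_000))
fitted = CarriesDataModel(series).fit()
blob = pickle.dumps(fitted)
# The pickle is dominated by the data, not the one fitted parameter:
print(len(blob) > 100_000)  # True
```

Saving just `fitted.params` would be a few bytes; saving the fitted model is hundreds of kilobytes, almost all of it training data.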
By contrast, scikit-learn has an estimator object that can be "fitted": fitting updates its internal parameter values but does NOT store the entire dataset on the object (unless the algorithm inherently requires it, e.g. k-nearest-neighbors).
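For contrast, here's the same idea in the scikit-learn style (again a toy sketch, not actual sklearn code): `fit()` learns a parameter from the data but keeps only the parameter, so the pickled model is tiny.

```python
import pickle

# Toy sketch of the scikit-learn pattern:
# fit() learns from the data but stores only the fitted parameter.
class SklearnStyleEstimator:
    def fit(self, X, y):
        # least squares through the origin: coef = sum(x*y) / sum(x*x)
        self.coef_ = sum(a * b for a, b in zip(X, y)) / sum(a * a for a in X)
        return self  # X and y are NOT kept on the object
    def predict(self, X):
        return [self.coef_ * a for a in X]

X = list(range(1, 100_000))
y = [2 * a for a in X]
est = SklearnStyleEstimator().fit(X, y)
print(est.coef_)                      # 2.0 -- the only state the model keeps
print(len(pickle.dumps(est)) < 200)   # True: no dataset inside
```

With this pattern, transferring parameters between models is trivial: serialize `coef_`, or just assign it onto a fresh estimator.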
Another, really old, thread (here) has some arguments for SM's architecture; however, I really need the model to be decoupled from the data.
Is there any way to do this currently with SM? I've been trying to find a workaround from the API side, but so far without success. It feels a bit like I'll have to modify the internals by hand, which would be unfortunate. Or write it myself, which would be even more unfortunate. ;)
P.S. I was about to go on a rant discussing a way to appease both camps, but that would cause significant API changes, and I'm pretty sure backwards-compatibility is really important to a lot of folks, so I won't waste my breath/typing/time. :)