I guess you are just at the limit for in-memory computing on a regular computer.
The two main discussions were about out-of-memory computation and about using sparse matrices for dummy variables.
In your case, it might be possible to save just enough memory to get it to run, but I'm not sure how much slack there is.
I find it a bit puzzling that numpy tries to allocate the full (682542, 123) array for the dot product.
(maybe exog needs to be Fortran-contiguous for the linalg)
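A quick way to check the memory layout; the array here is just a small stand-in for your exog:

```python
import numpy as np

# Stand-in for the design matrix (the real one is ~682542 x 123).
exog = np.random.default_rng(0).normal(size=(1000, 5))
print(exog.flags['C_CONTIGUOUS'], exog.flags['F_CONTIGUOUS'])  # True False

# If the linalg wants Fortran order, this conversion makes a full
# temporary copy, which could be where the extra allocation comes from.
exog_f = np.asfortranarray(exog)
print(exog_f.flags['F_CONTIGUOUS'])  # True
```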
Did you check that you didn't use much memory already before creating the model?
Then we would need to check individual parts.
e.g. how much does memory consumption increase when creating the model?
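Something like this would show the increase, using the stdlib tracemalloc module (numpy reports its allocations to tracemalloc); the allocation here is just a scaled-down stand-in for creating the model:

```python
import tracemalloc
import numpy as np

tracemalloc.start()
base, _ = tracemalloc.get_traced_memory()

# Stand-in for "create the model": allocate a float64 array,
# scaled down from the real 682542 x 123 exog.
exog = np.ones((10000, 123))

current, peak = tracemalloc.get_traced_memory()
print(f"increase: {(current - base) / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
tracemalloc.stop()
```

Wrapping the actual model creation (and then the fit call) in the same pair of measurements would show which step blows up.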
If lbfgs works, then we would need to check whether, and which, hessian/cov_params computations will work.
IIRC, fit(method="xxx", max_start_irls=0, skip_hessian=True) should work for the scipy optimizers.
skip_hessian is mainly an internal keyword, and I'm not sure it works on the user side.
It should prevent the hessian from being computed automatically.
Also, you could try using discrete.Poisson instead of GLM-Poisson.
It's implemented specifically for the Poisson family and might need less memory, but I never checked.
The big solutions, out-of-memory computation and sparse matrices, are likely not coming to statsmodels in the near future.
Josef