defensive programming


May 18, 2022, 10:48:40 PM
to pystatsmodels
I haven't looked at programming discussion in a while, but ran into this:

statsmodels is still pretty restrictive in its code: internally we use only numpy arrays, plus some selective, explicit use of pandas data structures.

We don't dispatch to other numpy-like data structures, so we don't take advantage of libraries like dask for out-of-core or distributed processing, of automatic differentiation libraries, or even of sparse arrays/matrices.

The advantage of only using numpy arrays (np.asarray) for serious computation is that we don't have to worry about whether other data structures are using the same definitions for various functions and methods. 

This reduces the number of bugs we can get from ambiguous behavior.
On the other hand, supporting additional data structures takes a lot of explicit work, and so far that support is still very limited.
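
A minimal sketch of the np.asarray convention described above. The helper name `demean` is made up for illustration and is not a statsmodels function; the point is only that the input is converted once at the boundary, so everything after that line runs with numpy semantics.

```python
import numpy as np
import pandas as pd


def demean(x):
    """Hypothetical helper: subtract the mean using numpy semantics only."""
    # Convert once at the boundary; lists, Series and ndarrays all become ndarray.
    x = np.asarray(x, dtype=float)
    return x - x.mean()


data = [1.0, 2.0, 3.0, 4.0]
from_list = demean(data)
from_series = demean(pd.Series(data))
```

Both calls return a plain ndarray, so any pandas index or metadata is dropped and would have to be reattached explicitly by the caller.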

One of my old examples:
numpy's var defaults to ddof=0 while pandas' var defaults to ddof=1, which makes a mess of writing a t-test that works for both.
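
To make the ddof example concrete, here is a small sketch. The function `tstat_1samp` is a hypothetical name used only for illustration, not the statsmodels implementation; it converts the input with np.asarray and passes ddof=1 explicitly, so the result does not depend on which library's default applies.

```python
import numpy as np
import pandas as pd

data = [1.0, 2.0, 3.0, 4.0]
arr = np.asarray(data)
ser = pd.Series(data)

# Same method name, different default degrees-of-freedom correction.
var_np = arr.var()   # numpy default: ddof=0
var_pd = ser.var()   # pandas default: ddof=1


def tstat_1samp(x, popmean=0.0):
    """Hypothetical one-sample t statistic with an explicit ddof."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    std = x.std(ddof=1)  # explicit, so the input type does not matter
    return (x.mean() - popmean) / (std / np.sqrt(n))


t_from_array = tstat_1samp(arr)
t_from_series = tstat_1samp(ser)
```

Calling `x.var()` or `x.std()` directly on whatever the user passed in would give different t statistics for the array and the Series, which is the ambiguity the conversion avoids.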
