Some comments inserted below...
I'm not sure what you have in mind here. Feel free to use or adapt what I have there as you see fit.
In a sense, the equations in the code I provided are very general because they support any model that can be expressed as a set of MLE equations with additively separable log-likelihood terms for each observation. (CGM provide an even more general approach for m-estimators, encompassing both non-linear and linear, but additively separable, GMM and MLE models.)
There are probably a variety of other diagnostics that apply to all MLE estimators (or all m-estimators) and that could be applied across a wide variety of models, provided the models supply either gradient (by obs) + Hessian or score (by obs) + information matrix. For example, if the OLS model, which does derive from the Likelihood model, implemented methods for the score (by obs) and information, then the clustered standard errors could be computed very easily and efficiently from the likelihood-based (or m-based) standard error formulas.
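To make the OLS example concrete, here is a rough sketch of how a clustered sandwich covariance could be built from nothing more than a per-observation score and a Hessian. The function names (score_obs, hessian, cluster_cov) are made up for illustration and are not any library's API. The error variance is fixed at 1 because it cancels in the sandwich, and no finite-sample correction is applied.

```python
import numpy as np

def score_obs(beta, y, X):
    """Per-observation score of the Gaussian log-likelihood w.r.t. beta
    (error variance fixed at 1; it cancels in the sandwich formula)."""
    resid = y - X @ beta
    return X * resid[:, None]          # shape (n_obs, n_params)

def hessian(beta, y, X):
    """Hessian of the summed log-likelihood w.r.t. beta (same variance convention)."""
    return -(X.T @ X)

def cluster_cov(score, hess, groups):
    """Sandwich covariance with scores summed within clusters.
    Hypothetical helper; no small-sample adjustment."""
    groups = np.asarray(groups).ravel()
    labels, idx = np.unique(groups, return_inverse=True)
    summed = np.zeros((labels.size, score.shape[1]))
    np.add.at(summed, idx, score)      # one summed score row per cluster
    meat = summed.T @ summed
    bread = np.linalg.inv(-hess)
    return bread @ meat @ bread
```

Nothing in cluster_cov is specific to OLS: any model that can hand back a score by observation and a Hessian evaluated at its estimate could reuse it unchanged. With every observation in its own cluster it reduces to the usual heteroskedasticity-robust (HC0) form.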
That looks pretty interesting (I'm not familiar with the paper). There's an interesting design problem here in that you have these more general classes of models that nest a bunch of interesting special cases (e.g. m-class estimators > MLE estimators > Generalized Linear models > OLS) and in theory you would probably want to be able to supply diagnostics that apply to the widest class of models. So, for example, any model that can be expressed as MLE, you should be able to get the standard set of model tests (Wald, LR, LM, AIC? ...). For more specialized models there will be more efficient ways to calculate some of the needed quantities (e.g. standard closed forms for the score and info matrices for OLS) or more model-specific tests.
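As a rough illustration of that nesting, here is a minimal sketch in which a general base class supplies generic fallbacks and a specialized model overrides them with closed forms. The class and method names (MLEModel, GaussianLinearModel, loglike_obs, score_obs, aic) are hypothetical and only meant to show the shape of the design, not the layout of any existing package. The linear model assumes a known unit error variance to keep it short.

```python
import numpy as np

class MLEModel:
    """Hypothetical base class: a subclass only has to supply loglike_obs."""

    def __init__(self, y, X):
        self.y = np.asarray(y, dtype=float)
        self.X = np.asarray(X, dtype=float)

    def loglike_obs(self, params):
        raise NotImplementedError

    def loglike(self, params):
        return self.loglike_obs(params).sum()

    def score_obs(self, params, eps=1e-6):
        # Generic fallback: central finite differences of loglike_obs.
        params = np.asarray(params, dtype=float)
        cols = []
        for j in range(params.size):
            step = np.zeros_like(params)
            step[j] = eps
            diff = self.loglike_obs(params + step) - self.loglike_obs(params - step)
            cols.append(diff / (2 * eps))
        return np.column_stack(cols)

    def aic(self, params):
        # Available to every subclass without extra work.
        return 2 * len(params) - 2 * self.loglike(params)


class GaussianLinearModel(MLEModel):
    """Linear regression with known unit error variance (an assumption
    made here for brevity)."""

    def loglike_obs(self, params):
        resid = self.y - self.X @ np.asarray(params, dtype=float)
        return -0.5 * np.log(2 * np.pi) - 0.5 * resid ** 2

    def score_obs(self, params, eps=None):
        # Closed form replaces the slower numerical fallback.
        resid = self.y - self.X @ np.asarray(params, dtype=float)
        return self.X * resid[:, None]
```

The point of the split is that a diagnostic written against MLEModel keeps working for every subclass, while a specialized model can still swap in a faster or more accurate formula where one exists.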
Anyway, I guess it would be good to make sure that all models correctly derive from the more general model classes and supply as many of the base class methods as possible. That's not currently the case (I got many NotImplementedErrors in trying to get my code working).
Thanks,
dm