Regarding the citation: not sure. Maybe the original DIC paper. Also check out the DIC page of the BUGS software.
One problem with them is that, with a hierarchical model, it is not clear what the effective number of parameters is.
For the individual models:
optimize('ML') does not use quantiles; it uses maximum likelihood (ML) on the original data. I didn't implement AIC/BIC, but it should be fairly easy. To get the log-likelihood, you can just take the observed nodes and sum their logp:
maybe something like this:
sum([x.logp for x in model.get_observeds()['node']])
Then use the equations from Wikipedia.
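For reference, here is a rough sketch of those equations in Python. These are the standard definitions, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L). The function names and the arguments `n_params` (number of free parameters k) and `n_obs` (number of data points n) are made up for illustration; you have to count them yourself for your model. `log_likelihood` would be the summed logp from the snippet above.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical numbers, only to show the call pattern.
# In practice log_likelihood would be the summed logp of the observed nodes.
example_loglik = -100.0
print(aic(example_loglik, n_params=3))
print(bic(example_loglik, n_params=3, n_obs=100))
```

Lower values are better for both, and only differences between models fitted to the same data are meaningful.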
Also, a general comment:
All information criteria come with their own assumptions. A much better approach is cross-validation. You cannot go wrong with that. But sometimes it's better not to argue with the reviewer...
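To make the cross-validation idea concrete, here is a minimal, generic sketch of k-fold cross-validation that compares two models by their held-out log-likelihood. It does not use the model classes discussed above: the two toy Gaussian models, the simulated data, and all function names are assumptions made up for illustration, and you would swap in your own fit and likelihood functions.

```python
import math
import random

def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and sd sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def fit_gaussian(data):
    """ML fit with free mean and sd."""
    mu = sum(data) / len(data)
    var = sum((x - mu) ** 2 for x in data) / len(data)
    return mu, math.sqrt(var)

def fit_zero_mean(data):
    """ML fit of a restricted model: mean fixed at 0, sd free."""
    var = sum(x ** 2 for x in data) / len(data)
    return 0.0, math.sqrt(var)

def kfold_heldout_loglik(data, fit, logpdf, k=5, seed=0):
    """Sum of log-likelihoods of each fold under a model fitted to the other folds."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    total = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [data[i] for i in idx if i not in held_out]
        params = fit(train)  # fit on the training folds only
        total += sum(logpdf(data[i], *params) for i in test_idx)
    return total

# Simulated data (hypothetical): 200 draws from a normal with mean 2, sd 1.
rng = random.Random(1)
data = [rng.gauss(2.0, 1.0) for _ in range(200)]

cv_free = kfold_heldout_loglik(data, fit_gaussian, gaussian_logpdf)
cv_zero = kfold_heldout_loglik(data, fit_zero_mean, gaussian_logpdf)
print("free mean:", cv_free, "zero mean:", cv_zero)
```

The model with the higher held-out log-likelihood predicts unseen data better. No parameter count is needed, which is why this sidesteps the effective-number-of-parameters problem.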