Carlo,
The BIC is not defined for hierarchical models. The DIC can be seen as a generalization to this setting, though it is closer to a generalization of the AIC, which is meant to measure how well a model predicts out-of-sample data rather than to identify the 'true' model. It is generally a reasonable approximation when pD (the effective number of parameters, which is part of the DIC calculation) is << n, the number of observations.
The DIC's bias toward more complex models can be reduced somewhat with later versions of the DIC, e.g. the one by Plummer (2008). This is not calculated automatically in PyMC (it is computationally costly) but is available in JAGS -- and from my experience there, and also some notes from Plummer himself, its penalty is almost always very close to twice that of the standard DIC. So if you wish, you can simply multiply the pD that HDDM gives you (the penalty term) by 2 and adjust the DIC accordingly; since DIC = mean deviance + pD, this amounts to adding pD once more to the reported DIC. I haven't verified, though, that this selects the correct model for data generated from a known DDM.
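To make the arithmetic concrete, here is a minimal Python sketch of that adjustment. It assumes the usual decomposition DIC = mean deviance + pD; the function name and the numbers are made up for illustration and are not part of HDDM, so plug in whatever DIC and pD your fitted models report.

```python
def doubled_penalty_dic(dic, p_d):
    """Rough stand-in for a Plummer-style criterion: double the pD penalty.

    Assumes dic = mean_deviance + p_d, so the result equals dic + p_d.
    This is the rule of thumb described above, not Plummer's exact calculation.
    """
    mean_deviance = dic - p_d          # recover the fit term
    return mean_deviance + 2.0 * p_d   # apply the doubled penalty


# Hypothetical values for a simpler and a more complex model
simple = doubled_penalty_dic(dic=1000.0, p_d=20.0)
complex_ = doubled_penalty_dic(dic=990.0, p_d=45.0)
print(simple, complex_)
```

In this made-up case the standard DIC favors the more complex model (990 vs. 1000), while the doubled penalty reverses the ranking, which is the kind of change to look for.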
As mentioned in previous posts, I would also suggest doing a posterior predictive check to complement any model selection metric. Note also that with more complex models the posteriors will generally be wider, which itself provides a form of penalty: it becomes harder to detect a difference in parameters (between conditions, groups, etc.). Thus if you see a significant difference (little to no overlap in posteriors), this is in spite of the increased complexity. (This is roughly Kruschke's argument.)
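One simple way to quantify "little to no overlap" is the fraction of posterior draws in which one parameter exceeds the other. The sketch below is only an illustration: the helper name is hypothetical, and the two arrays are simulated stand-ins for the posterior traces you would pull from your own fitted model (same length, draws from the same chain).

```python
import numpy as np


def prob_greater(samples_a, samples_b):
    """Fraction of paired posterior draws in which a > b."""
    a = np.asarray(samples_a, dtype=float)
    b = np.asarray(samples_b, dtype=float)
    return float(np.mean(a > b))


# Simulated stand-ins for two posterior traces (NOT real model output)
rng = np.random.default_rng(0)
cond1 = rng.normal(1.2, 0.1, size=5000)
cond2 = rng.normal(0.8, 0.1, size=5000)

p = prob_greater(cond1, cond2)
print(p)
```

A value close to 0 or 1 corresponds to little overlap; wider posteriors from a more complex model pull it toward 0.5, which is the implicit penalty described above.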
Michael