Hi,
I think that's a great idea. Note, however, that the MCMC algorithms currently implemented cannot be parallelized, except trivially by running multiple chains at once. A slew of recent papers describe methods that allow MCMC to run in parallel on subsets of the data (see e.g.
http://arxiv.org/abs/1311.4780 or
http://arxiv.org/abs/1402.4102 or Max Welling's papers), which seems more appropriate for Spark.
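To make the subset idea concrete, here is a minimal sketch in plain NumPy. It is not PyMC code, and it is not meant to reproduce the exact algorithm of any of the papers above. It assumes a toy model (Gaussian data with known variance and a Gaussian prior on the mean), a hand-written random-walk Metropolis sampler, and one simple combination rule: approximate each sub-posterior by a Gaussian and multiply them. All function names are made up for illustration.

```python
import numpy as np

def metropolis(log_post, x0, n_samples, step, rng):
    """Random-walk Metropolis sampler for a single scalar parameter."""
    samples = np.empty(n_samples)
    x, lp = x0, log_post(x0)
    for i in range(n_samples):
        prop = x + step * rng.normal()
        lp_prop = log_post(prop)
        # Accept with probability min(1, exp(lp_prop - lp)).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

def subposterior_logp(shard, n_shards, sigma=1.0, prior_sd=10.0):
    """Unnormalized log sub-posterior: shard likelihood times prior**(1/n_shards)."""
    def logp(mu):
        loglik = -0.5 * np.sum((shard - mu) ** 2) / sigma**2
        logprior = -0.5 * mu**2 / prior_sd**2
        return loglik + logprior / n_shards
    return logp

def combine_gaussian(chains):
    """Multiply Gaussian approximations of the sub-posteriors.

    Precisions add, and the combined mean is the precision-weighted
    average of the per-shard means.
    """
    precisions = np.array([1.0 / np.var(c) for c in chains])
    means = np.array([np.mean(c) for c in chains])
    var = 1.0 / precisions.sum()
    mean = var * np.sum(precisions * means)
    return mean, var

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=4000)  # synthetic data, assumed model
shards = np.array_split(data, 4)                  # stand-in for data partitions

# Each iteration only touches its own shard, so the chains are independent.
chains = []
for shard in shards:
    logp = subposterior_logp(shard, len(shards))
    draws = metropolis(logp, x0=shard.mean(), n_samples=5000, step=0.05, rng=rng)
    chains.append(draws[1000:])  # discard burn-in

post_mean, post_var = combine_gaussian(chains)
print(post_mean, post_var)
```

Because no chain needs to see another shard's data, the loop body is the part that could in principle be mapped over partitions on a cluster, with only the per-shard means and variances sent back for the combination step. The Gaussian product is only reasonable when the sub-posteriors are close to normal.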
I'm not sure what the best way to interface with PyMC 3 would be. Would you want to replace all the likelihoods? Perhaps there is even a way to interface at the Theano level, which might make PyMC run more seamlessly.
If anything, I would imagine that you would want to go with PyMC 3 rather than 2, as it's a clean-slate rewrite with a better thought-out architecture. I think at this point the core is pretty solid, but documentation is missing.
Thomas