What component do you hope will consume your YAML? (Hiera, maybe?) Are the "modules" you're talking about Puppet modules, or some other kind? Are you expecting to do this via the "apply" face or via the "agent" face?
For the balance of this post, I'm going to assume that the answers are, in order, "I don't actually care", "Yes, Puppet modules", and "the 'apply' face".
it seems that default or merge doesn't do what I need here.
Each execution of the create_resources() function creates resources of exactly one type. It takes multiple calls to create resources of multiple types. Moreover, create_resources() is rarely used to declare classes, in part because that's what an external node classifier is for. In fact, what you are describing sounds much like a data-driven external node classifier.
It's a little bit wonky to download your Puppet manifests after invoking the catalog compiler, but I think you can do it in the one-time mode that I believe you're describing. I suggest that you approach it by writing an external node classifier that consumes the YAML, performs the SVN checkout into the right location, and emits the desired specifications for classes to apply. It looks like that last bit might just be a subtree of your overall data.
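To make that concrete, here is a rough sketch of such an ENC in Python. Treat it as an illustration under assumptions, not a finished tool: the data layout (a "modules" map of module name to SVN URL, and a "classes" subtree), the example URL, and the module directory are all hypothetical, and the data is shown inline where you would really load it from your YAML file. To stay within the standard library, it prints the result as JSON, on the assumption that Puppet's YAML parser accepts this simple JSON as YAML flow syntax; a real YAML emitter would be the safer choice.

```python
#!/usr/bin/env python3
"""Sketch of a data-driven external node classifier (ENC)."""
import json
import os
import subprocess
import sys

# Hypothetical data layout; in practice, load this from your YAML file.
NODE_DATA = {
    "modules": {
        # module name -> SVN URL (example URL, not a real repository)
        "ntp": "https://svn.example.com/puppet/ntp/trunk",
    },
    "classes": {
        # class name -> class parameters, as the ENC output format expects
        "ntp": {"servers": ["0.pool.ntp.org"]},
    },
}

# Assumed location; it must be a directory on your modulepath.
MODULE_DIR = "/etc/puppet/modules"


def checkout_modules(modules, module_dir, run=subprocess.check_call):
    """Check out each module into module_dir before any classes are emitted.

    `run` is injectable so the function can be exercised without svn.
    """
    for name, url in sorted(modules.items()):
        run(["svn", "checkout", "--quiet", url, os.path.join(module_dir, name)])


def classify(data):
    """Return the ENC result: just the 'classes' subtree of the data."""
    return {"classes": data.get("classes", {})}


def main(argv):
    # Puppet invokes the ENC with the node's certname as the first argument;
    # this sketch serves the same data to every node and ignores it.
    checkout_modules(NODE_DATA["modules"], MODULE_DIR)
    json.dump(classify(NODE_DATA), sys.stdout)
    sys.stdout.write("\n")
    return 0
```

To use something like this, you would end the script with a call such as sys.exit(main(sys.argv)), make it executable, and point Puppet at it with the node_terminus = exec and external_nodes settings. The checkout runs before anything is printed, so the modules are in place by the time Puppet learns which classes to apply.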
I do not think it is wise to try to download Puppet modules for the current Puppet run any time after Puppet has passed the ENC point.
John