The only thing I would say is to prototype for your use case.
There are a few things to keep in mind regarding very large documents (e.g., > 500KB):
1. If you're doing a full-document, replace-style update, the entire 500KB has to be serialized and sent across the wire. This can get expensive in an update-heavy deployment (there's a sketch after this list contrasting the two update styles).
2. Same goes for queries. If you're pulling back 500KB at a time, that data has to cross the network and be deserialized on the driver side; projecting only the fields you need keeps this in check (second sketch below).
3. While atomic updates are applied server-side without shipping the full document, a document that grows usually has to be rewritten on the server, as dictated by the BSON format. If you're doing lots of $push operations on a very large document, that document will be rewritten over and over, which, again, under heavy load, can get expensive (the first sketch below covers the $push case too).
4. If an embedded document is frequently manipulated on its own, it can be cheaper both client-side and server-side to store that "many" relationship in its own collection. It's also usually easier to manipulate the "many" side of a relationship when it lives in its own collection (third sketch below).
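To make points 1 and 3 concrete, here's a minimal sketch using PyMongo; the database, collection, and field names (blog, posts, comments) are hypothetical, and the point is only the difference in what crosses the wire versus what the server has to do.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
posts = client["blog"]["posts"]

post_id = "some-post-id"

# Full replace: the whole (potentially 500KB) document is pulled back,
# modified client-side, then serialized and shipped across the wire again.
big_doc = posts.find_one({"_id": post_id})
big_doc["comments"].append({"author": "dave", "text": "hi"})
posts.replace_one({"_id": post_id}, big_doc)

# Atomic $push: only the operator and the new element cross the wire,
# but the server may still have to rewrite the growing document.
posts.update_one(
    {"_id": post_id},
    {"$push": {"comments": {"author": "dave", "text": "hi"}}},
)
```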
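For point 2, a field projection is one way to avoid dragging the entire document back to the driver; again, the field names here are placeholders.

```python
from pymongo import MongoClient

posts = MongoClient("mongodb://localhost:27017")["blog"]["posts"]

# Fetch only the small metadata fields; the large embedded array
# never leaves the server or gets deserialized by the driver.
summary = posts.find_one(
    {"_id": "some-post-id"},
    {"title": 1, "updated_at": 1, "comment_count": 1},
)
```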
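And for point 4, a rough sketch of keeping the "many" side in its own collection and referencing the parent by id; the schema is made up for illustration.

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["blog"]

# Each comment is its own small document pointing back to its parent post.
db.comments.insert_one(
    {"post_id": "some-post-id", "author": "dave", "text": "hi"}
)

# Manipulating or querying comments no longer touches the large parent document.
for comment in db.comments.find({"post_id": "some-post-id"}):
    print(comment["text"])
```

The trade-off is that reading a post together with its comments now takes a second query, but each comment can be inserted, updated, or removed without rewriting the large parent document.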
If going embedded all the way works for your use case, then there's probably no problem with it. But with these extra-large documents and a heavy load, you may start to see consequences in terms of performance and/or manipulability.
What I'd recommend to Dave, and I think Durran would agree, is to do some serious prototyping and benchmarking, and let the results of that investigation determine his course.