think it's closer to the first one, but since I have not started designing the other cases, it's hard to give a general answer.
What I expect is that, in all cases, the code in charge of generating the AX tree sends a snapshot of what the user is actually seeing to the library, and gets back annotations that help improve (or, in some cases, create) the tree.
For the /content/renderer case, this call is made from inside the content layer, but I think for the other use cases (PDF, Aura, and Arc) it will be made from outside /content.
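As a rough sketch of the flow described above: the tree-generating code hands a snapshot to the library and uses the returned annotations to improve or create the tree. Every type and function name below (Snapshot, Annotation, AXNode, SnapshotAnnotator, BuildOrImproveTree, FakeAnnotator) is hypothetical and only illustrates the shape of the call; none of them is an existing Chromium API, and the real design may differ.

```cpp
#include <string>
#include <vector>

// NOTE: all names in this sketch are hypothetical, not real Chromium APIs.

// What the user is actually seeing, e.g. rendered pixels.
struct Snapshot {
  int width = 0;
  int height = 0;
  std::vector<unsigned char> pixels;
};

// One annotation the library returns for a region of the snapshot.
struct Annotation {
  int x, y, width, height;
  std::string role;
  std::string label;
};

// Minimal stand-in for an accessibility tree node.
struct AXNode {
  std::string role;
  std::string name;
  std::vector<AXNode> children;
};

// The library's interface. It does not depend on who calls it, so the
// caller can live inside the content layer or outside of it.
class SnapshotAnnotator {
 public:
  virtual ~SnapshotAnnotator() = default;
  virtual std::vector<Annotation> Annotate(const Snapshot& snapshot) = 0;
};

// Caller-side step: improve an existing tree, or create one if it is empty.
AXNode BuildOrImproveTree(AXNode tree, const Snapshot& snapshot,
                          SnapshotAnnotator& annotator) {
  if (tree.role.empty()) {
    tree.role = "root";  // No tree yet: create one from scratch.
  }
  for (const Annotation& a : annotator.Annotate(snapshot)) {
    tree.children.push_back(AXNode{a.role, a.label, {}});
  }
  return tree;
}

// Fake library implementation that always reports a single button.
class FakeAnnotator : public SnapshotAnnotator {
 public:
  std::vector<Annotation> Annotate(const Snapshot&) override {
    return {Annotation{0, 0, 10, 10, "button", "OK"}};
  }
};
```

In this sketch the only thing that changes between the /content/renderer case and the other use cases is where BuildOrImproveTree is called from; the library-side interface stays the same.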
Did I answer your question?