The issue I've had with MSA is that its semantic query layer is "smart" but doesn't accept exact phrases. In doing so it outsmarts attempts to do reliable research, and the results it returns are terrible. A more targeted search using longer title strings from single articles might give much better results, though. I do have an email out to another researcher who is rumored to have code that might help me get what I need from MSA, but it's sort of a shot in the dark. Currently I am not accessing MSA via its API.
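One way to compensate for the lack of exact-phrase matching would be to check whatever the search returns against the title you actually asked for, and throw away anything that isn't a near-exact match. Here is a rough sketch of that idea in Python. It does not talk to MSA at all: `best_title_match`, the `threshold` value of 0.9, and the example titles are all my own assumptions for illustration, and the candidate titles would have to come from wherever you get your results.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_title_match(query_title, candidate_titles, threshold=0.9):
    """Return the candidate closest to query_title, or None if none is close enough.

    threshold is a hypothetical cutoff; tune it against your own data.
    """
    best, best_score = None, 0.0
    for candidate in candidate_titles:
        score = title_similarity(query_title, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


if __name__ == "__main__":
    # Made-up titles standing in for one query and its returned results.
    results = [
        "deep learning for citation analysis - a survey",
        "Shallow parsing of recipes",
    ]
    print(best_title_match("Deep Learning for Citation Analysis: A Survey", results))
```

Longer titles should work better here for the same reason they might work better in the search itself: there is more text to match, so a high similarity score is less likely to be a coincidence.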
I don't know how smoothly 4 would run or how much time it'd take, but it sounds like you're willing to tolerate it taking a bit of time. The full-text generation is automatic, although I believe it is clipped at a certain length by default. That is configurable, though.
WRT 5, you can get the full text either locally or via the API. The API is fairly simple, but I haven't worked with it much; I have the most experience with the internal JS API. The choice seems to me to be between