Do I understand this right: what gets dragged is merely a copy of the rendered text (the "view mode" text), i.e. there is no mechanism that reaches into the actual wikitext to copy it?
If this is the case... then, yeah, I understand if it can't be much improved... :-(
Not saying you should do it, but is it even technically possible to access the actual wikitext by selecting view mode text? To get away from the need for "pattern matching" (i.e. finding the literal selected text in the wikitext as well, which is how I interpret your current solution), maybe the wikitext could produce some kind of physical position indicator, like the line number, which can be accessed in view mode. Selecting a snippet would then also convey which original rows the rendered text stems from, and those rows are what gets drag'n'dropped instead.
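To make the idea a bit more concrete, here is a rough toy sketch in Python. It is purely hypothetical and not how your plugin or the real renderer works: the function names, the "strip the markers" renderer, and modelling a selection as a range of rendered fragments are all my own assumptions. The point is only that each rendered piece remembers which source line it came from, so no text matching is needed.

```python
import re


def render_with_line_tags(wikitext: str) -> list[tuple[int, str]]:
    """Toy renderer (assumption): drop ''bold'' and //italic// markers,
    and tag every rendered fragment with its 1-based source line number."""
    rendered = []
    for line_no, line in enumerate(wikitext.split("\n"), start=1):
        visible = re.sub(r"''|//", "", line)
        rendered.append((line_no, visible))
    return rendered


def source_for_selection(wikitext: str,
                         rendered: list[tuple[int, str]],
                         first: int, last: int) -> str:
    """Return the original wikitext rows behind a selection.

    The selection is modelled (assumption) as the indices of the first and
    last rendered fragment it touches; only their line tags are used.
    """
    lines = wikitext.split("\n")
    start_line = rendered[first][0]
    end_line = rendered[last][0]
    return "\n".join(lines[start_line - 1:end_line])


wikitext = "Plain intro\n''Bold'' and //italic// text\nLast line"
rendered = render_with_line_tags(wikitext)

# The user sees and selects "Bold and italic text" (fragment 1),
# but what would be dragged is the original row with its markup intact.
print(source_for_selection(wikitext, rendered, 1, 1))
```

The obvious cost is granularity: selecting half a row still drags the whole row, and anything that renders from somewhere else (transclusions, macros) would need its own kind of tag.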
- At all realistic?
<:-)