I am looking at using the diff-match-patch routines to apply relatively small changes to potentially large strings, for example a few edits to a single paragraph within a 100,000-word text document.
If I know the range of the paragraph being modified, what would be the best way to manage this in terms of speed and memory use? Duplicating the entire string to compare it before and after the edits, knowing that 99%+ will be the same, seems unnecessarily expensive and requires holding two copies of the string in memory.
Can I copy the paragraph to a new string, apply the edits, calculate diff between the two substrings, and then add an "offset" before/after calculating the patch? I want to be able to store the change information for later retrieval, but for obvious reasons don't want to store multiple copies of the entire string.
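To make the offset idea concrete, here is a minimal sketch in Python. It uses the standard-library difflib as a stand-in for diff-match-patch so that the example is self-contained (that substitution is an assumption, not how diff-match-patch works internally), and the helper names make_local_edits and apply_edits are hypothetical. Only the paragraph is diffed, and each stored edit is shifted by the paragraph's start position so it refers to the full document.

```python
from difflib import SequenceMatcher


def make_local_edits(old_para, new_para, offset):
    """Diff two versions of one paragraph (hypothetical helper).

    Returns a list of (start, end, replacement) tuples whose positions are
    absolute in the full document, i.e. already shifted by `offset`.
    """
    edits = []
    matcher = SequenceMatcher(None, old_para, new_para, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Shift paragraph-relative positions into document coordinates.
            edits.append((offset + i1, offset + i2, new_para[j1:j2]))
    return edits


def apply_edits(text, edits):
    """Apply stored edits to the full document (hypothetical helper).

    Edits are applied back to front so that earlier positions stay valid.
    """
    for start, end, replacement in reversed(edits):
        text = text[:start] + replacement + text[end:]
    return text


document = "First paragraph.\n\nSecond paragraph with a typo.\n\nThird paragraph."
start = document.index("Second")
end = document.index("\n\nThird")

old_para = document[start:end]          # copy only the paragraph
new_para = old_para.replace("a typo", "no typos")

edits = make_local_edits(old_para, new_para, start)
restored = apply_edits(document, edits)
```

Only the edits list would need to be stored, not a second copy of the document. Note that apply_edits still builds a new full-length string for each edit, since Python strings are immutable. Whether the same shift can be applied safely to the start positions of a diff-match-patch patch is something I have not verified here.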
Any thoughts on this before I head too far down an unworkable rabbit hole would be appreciated.
Thanks!
Fletcher