Levenshtein maximum distance is greater than length of both strings (JavaScript)

Showing 1-5 of 5 messages
Levenshtein maximum distance is greater than length of both strings (JavaScript) Anders Lyman 5/22/13 1:08 PM

var dmp = new diff_match_patch();
var diffs = dmp.diff_main("709113544", "1555855732");
var distance = dmp.diff_levenshtein(diffs);
console.log(distance); // 14

I'm pretty sure this is a bug, since the API docs say: "the maximum distance is the length of the longer string".
I've logged a new issue, but I'm wondering if I've simply misunderstood something?

Thoughts?

Re: Levenshtein maximum distance is greater than length of both strings (JavaScript) Neil Fraser 5/22/13 11:06 PM
On 22 May 2013 13:08, Anders Lyman <ander...@gmail.com> wrote:
> I'm pretty sure this is a bug, since the API docs say: "the maximum distance
> is the length of the longer string".
> I've logged a new issue, but I'm wondering if I've simply misunderstood
> something?

It is a bug, but in the API docs.  That Levenshtien distance is
correct.  Neat, I didn't realize that a Levenshtien distance could be
longer than the longer string.

I'll take a closer look at this tomorrow, and if it checks out I'll
change the docs.  Thanks!

--
Neil Fraser
http://neil.fraser.name
Re: Levenshtein maximum distance is greater than length of both strings (JavaScript) Anders Lyman 5/23/13 6:01 AM

Ha! That's awesome. I've had intermittent bugs surface due to this, but this is the first time I've been able to nail it down.
I need to generate a percentage of change between two strings, so my assumption that the maximum distance is the length of the longer string is clearly what caused the aforementioned bugs...

What would be the maximum distance? Is there any easy/consistent way to determine that?

> --
> You received this message because you are subscribed to a topic in the Google Groups "Diff Match Patch" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/diff-match-patch/RXxj5-fxMKs/unsubscribe?hl=en.
> To unsubscribe from this group and all its topics, send an email to diff-match-pat...@googlegroups.com.
> To post to this group, send email to diff-mat...@googlegroups.com.
> Visit this group at http://groups.google.com/group/diff-match-patch?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Re: Levenshtein maximum distance is greater than length of both strings (JavaScript) Anders Lyman 5/23/13 7:39 AM
This seems really bizarre to me - every definition I've found of the levenshtein algorithm indicates "It is at most the length of the longer string."
For example: 
  1. http://en.wikipedia.org/wiki/Levenshtein_distance
  2. http://stackoverflow.com/questions/6087281/similarity-score-levenshtein/6087809#6087809
  3. http://stackoverflow.com/questions/2781252/probability-algorithm-finding-probable-correct-item-in-a-list-e-g-john-john/2782593#2782593 (3rd answer down)
It does mention an edge case: "If the strings are the same size, the Hamming distance is an upper bound on the Levenshtein distance."
That doesn't apply here however, as the strings have different lengths.
Re: Levenshtein maximum distance is greater than length of both strings (JavaScript) Christian LeMoussel 10/1/13 3:46 AM
I have the same probleme with this C# code :

            DiffMatchPatch.diff_match_patch dmpDiff = new DiffMatchPatch.diff_match_patch();
            List<DiffMatchPatch.Diff> lDiffs = dmpDiff.diff_main("considéré", "apprécié");
            int iIndex = dmpDiff.diff_levenshtein(lDiffs);

                        Console.WriteLine(iIndex); // 11 ??????

According to the docs on the API home page  this is not possible. The length of the longer string is equal to 9.
The minimum distance is 0 which means equality, the maximum distance is the length of the longer string.