[vim/vim] Add support for the diff() function (PR #12321)

Yegappan Lakshmanan

Apr 30, 2023, 2:54:37 PM

to vim/vim, Subscribed

Add support for computing the diff between two Lists of strings using a function.

When using an LSP plugin, whenever a buffer is modified, the modifications are sent to the language server.
To do this, LSP plugins have implemented the diff computation in Vimscript or in Lua:

https://github.com/prabirshrestha/vim-lsp/blob/master/autoload/lsp/utils/diff.vim
https://github.com/natebosch/vim-lsc/blob/master/autoload/lsc/diff.vim
https://github.com/neovim/neovim/blob/master/runtime/lua/vim/lsp/sync.lua

These computations are expensive as they compare every character and line in a buffer.
This new function will help in efficiently computing the diff.

You can view, comment on, or merge this pull request online at:

https://github.com/vim/vim/pull/12321

Commit Summary

2036da0 Add support for the diff() function

File Changes

(9 files)

—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.

Yegappan Lakshmanan

Apr 30, 2023, 3:01:03 PM

to vim/vim, Push

@yegappan pushed 1 commit.

c40f95c Fix build failure

Yegappan Lakshmanan

Apr 30, 2023, 3:04:37 PM

to vim/vim, Push

@yegappan pushed 1 commit.

aa8801e Fix build failure

codecov[bot]

Apr 30, 2023, 3:15:05 PM

to vim/vim, Subscribed

Codecov Report

Merging #12321 (aa8801e) into master (5b10a14) will increase coverage by 0.67%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##           master   #12321      +/-   ##
==========================================
+ Coverage   82.05%   82.73%   +0.67%     
==========================================
  Files         160      150      -10     
  Lines      193404   180423   -12981     
  Branches    43423    40543    -2880     
==========================================
- Hits       158699   149267    -9432     
+ Misses      21826    18192    -3634     
- Partials    12879    12964      +85

Flag	Coverage Δ
huge-clang-none	`82.73% <88.88%> (+<0.01%)`	⬆️
linux	`82.73% <88.88%> (+<0.01%)`	⬆️
mingw-x64-HUGE	`?`
mingw-x86-HUGE	`?`
windows	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/evalfunc.c	`88.88% <ø> (-1.27%)`	⬇️
src/diff.c	`83.15% <88.72%> (-0.11%)`	⬇️
src/typval.c	`89.97% <100.00%> (-0.43%)`	⬇️

... and 137 files with indirect coverage changes

Bram Moolenaar

Apr 30, 2023, 4:26:13 PM

to vim/vim, Subscribed

> Add support for computing diff between two Lists of strings using a
> function.
>
> When using a LSP plugin, whenever a buffer is modified, the
> modifications are sent to the language server.
> To do this, the LSP plugins have implemented computing the diff in
> Vimscript or in Lua:
>
> https://github.com/prabirshrestha/vim-lsp/blob/master/autoload/lsp/utils/diff.vim
> https://github.com/natebosch/vim-lsc/blob/master/autoload/lsc/diff.vim
> https://github.com/neovim/neovim/blob/master/runtime/lua/vim/lsp/sync.lua
>
> These computations are expensive as they compare every character and
> line in a buffer.
> This new function will help in efficiently computing the diff.

I suppose you made the structure of the return value work for your
purpose. It's a bit strange though: List of Dict of Dicts. And using a
special "end" item to indicate added or removed text (the help is not
clear, but I guess this is only used when a whole item was
added/removed, not when a word was added/removed).

To find inserted or removed list items requires checking whether the
"byte" item indicates the end of the string.

Wouldn't it be easier to use when returning something similar to unified
diff:
- Indicate range of deleted items (like a "-" line)
- Indicate range of inserted items (like a "+" line)
- Indicate a modified item (like a "-" line followed by a "+" line)

Could be done with a "count" item, which is negative for deleted items,
positive for inserted items and zero if items were modified.

Does this still fit in with what you need for LSP support?

Minor remarks:

+diff({list}, {list} [, {options}])
+ List compute the diff of two List of strings

"List" -> "Lists".
"of" sounds a bit wrong, using "between" makes the text wrap. How about
"diff two Lists of strings"?

--
It is too bad that the speed of light hasn't kept pace with the
changes in CPU speed and network bandwidth. -- ***@***.***>

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Yegappan Lakshmanan

Apr 30, 2023, 5:01:40 PM

to vim...@googlegroups.com, reply+ACY5DGCSX7R5F3QVXR...@reply.github.com, vim/vim, Subscribed

Hi Bram,

On Sun, Apr 30, 2023 at 1:26 PM Bram Moolenaar <vim-dev...@256bit.org> wrote:

> Add support for computing diff between two Lists of strings using a
> function.
>
> When using a LSP plugin, whenever a buffer is modified, the
> modifications are sent to the language server.
> To do this, the LSP plugins have implemented computing the diff in
> Vimscript or in Lua:
>
> https://github.com/prabirshrestha/vim-lsp/blob/master/autoload/lsp/utils/diff.vim
> https://github.com/natebosch/vim-lsc/blob/master/autoload/lsc/diff.vim
> https://github.com/neovim/neovim/blob/master/runtime/lua/vim/lsp/sync.lua
>
> These computations are expensive as they compare every character and
> line in a buffer.
> This new function will help in efficiently computing the diff.

> I suppose you made the structure of the return value work for your
> purpose. It's a bit strange though: List of Dict of Dicts. And using a
> special "end" item to indicate added or removed text (the help is not
> clear, but I guess this is only used when a whole item was
> added/removed, not when a word was added/removed).

The "end" item contains the end index and the byte offset of the diff hunk.
If the string was not present earlier, then the byte offset cannot be
calculated, so it will be -1. When a word is removed, it contains the
position of the end of the word.

> To find inserted or removed list items requires checking whether the
> "byte" item indicates the end of the string.
>
> Wouldn't it be easier to use when returning something similar to unified
> diff:
>
> - Indicate range of deleted items (like a "-" line)
> - Indicate range of inserted items (like a "+" line)
> - Indicate a modified item (like a "-" line followed by a "+" line)

The start and end index values in the returned Dict do contain the
above information (range of deleted, inserted, or modified items).

> Could be done with a "count" item, which is negative for deleted items,
> positive for inserted items and zero if items were modified.
>
> Does this still fit in with what you need for LSP support?
>
> Minor remarks:
>
> +diff({list}, {list} [, {options}])
> + List compute the diff of two List of strings
>
> "List" -> "Lists".
> "of" sounds a bit wrong, using "between" makes the text wrap. How about
> "diff two Lists of strings"?

I have updated the help text.

Regards,

Yegappan

Yegappan Lakshmanan

Apr 30, 2023, 5:02:13 PM

to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

909f0fe Update help

Yegappan Lakshmanan

Apr 30, 2023, 5:13:34 PM

to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

a7255f6 Add more examples

Yegappan Lakshmanan

Apr 30, 2023, 6:30:00 PM

to vim...@googlegroups.com, reply+ACY5DGDIGR7U3GG7EU...@reply.github.com, vim/vim, vim-dev ML, Your activity

Hi Bram,

I am attaching some examples to describe the return value of this function.

Example 1:

---------- old text -----------
aaaaaa
aaaaaa
aaaaaa
--------------------------------

---------- new text -----------
aaaxx
xxxxxx
xxaaa
----------------------------------

The modification starts at index 0 byte 3 and ends at index 2 byte 2 in the
old text. In the new text, the changes start at index 0 byte 3 and end at index
2 byte 1. In this case, the diff() function returns the following:

[{'from': {'start': {'idx': 0, 'byte': 3}, 'end': {'idx': 2, 'byte': 2}},
'to': {'start': {'idx': 0, 'byte': 3}, 'end': {'idx': 2, 'byte': 1}}}]

Example 2:

---------- old text -----------
aaaaaa
bbbbbb
cccccc
--------------------------------

---------- new text -----------
aaaaaa
----------------------------------

In the old text, the modification starts at index 1 byte 0 and ends at index 2 byte 5.
In the new text, the modification starts at index 1. But the byte cannot be calculated
as the line is removed. Similarly the end of the modification also cannot be calculated.
In this case, the diff() function returns the following:

[{'from': {'start': {'idx': 1, 'byte': 0}, 'end': {'idx': 2, 'byte': 5}},
'to': {'start': {'idx': 1, 'byte': -1}, 'end': {'idx': -1, 'byte': -1}}}]

Example 3:

---------- old text -----------
aaaaaa
--------------------------------

---------- new text -----------
aaaaaa
bbbbbb
cccccc
----------------------------------

In the old text, the modification starts at index 1. But the starting byte and ending index/byte
cannot be calculated. In the new text, the change starts at index 1 byte 0 and ends at
index 2 byte 5. In this case, the diff() function returns the following:

[{'from': {'start': {'idx': 1, 'byte': -1}, 'end': {'idx': -1, 'byte': -1}},
'to': {'start': {'idx': 1, 'byte': 0}, 'end': {'idx': 2, 'byte': 5}}}]

Hopefully the above examples clarify your questions.
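
To make the return values in the examples above concrete, here is a minimal sketch of how a caller might pull out the text that one side of a hunk covers. It is written in Python only because the Dict literals shown above are also valid Python; `hunk_text` is a hypothetical helper, not part of the PR. It assumes that "end" offsets are inclusive, that any -1 marks an empty side, and that the text is ASCII so byte offsets equal character offsets.

```python
def hunk_text(items, side):
    """Return the text covered by one side ("from" or "to") of a hunk.

    Assumptions (not confirmed by the PR): "end" offsets are inclusive,
    any -1 means this side of the hunk is empty, and the text is ASCII
    so that byte offsets can be used as string indexes.
    """
    start, end = side["start"], side["end"]
    if -1 in (start["byte"], end["idx"], end["byte"]):
        return ""
    # Slicing copies the list, so the caller's items are not modified.
    lines = items[start["idx"]:end["idx"] + 1]
    # Trim the tail first so a single-item hunk is sliced correctly.
    lines[-1] = lines[-1][:end["byte"] + 1]
    lines[0] = lines[0][start["byte"]:]
    return "\n".join(lines)


# Data from Example 1 above.
old = ["aaaaaa", "aaaaaa", "aaaaaa"]
new = ["aaaxx", "xxxxxx", "xxaaa"]
hunk = {'from': {'start': {'idx': 0, 'byte': 3}, 'end': {'idx': 2, 'byte': 2}},
        'to': {'start': {'idx': 0, 'byte': 3}, 'end': {'idx': 2, 'byte': 1}}}

print(repr(hunk_text(old, hunk["from"])))  # 'aaa\naaaaaa\naaa'
print(repr(hunk_text(new, hunk["to"])))    # 'xx\nxxxxxx\nxx'
```

Read this way, the hunk says that the first string was replaced by the second.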

Regards,

Yegappan

vim-dev ML

May 1, 2023, 1:22:47 PM

to vim/vim, vim-dev ML, Your activity

> I am attaching some examples to describe the return value of this function.
>
> Example 1:
> ---------- old text -----------
> aaaaaa
> aaaaaa
> aaaaaa
> --------------------------------
>
> ---------- new text -----------
> aaaxx
> xxxxxx
> xxaaa
> ----------------------------------
>
> The modification starts at index 0 and byte 3 and ends at index 2 byte 2 in
> the
> old text. In the new text, the changes start at index 0 byte 3 and end at
> index
> 2 byte 1. In this case, the diff() function returns the following:
>
> [{'from': {'start': {'idx': 0, 'byte': 3}, 'end': {'idx': 2, 'byte': 2}},
> 'to': {'start': {'idx': 0, 'byte': 3}, 'end': {'idx': 2, 'byte': 1}}}]

So, how does one decide whether items were inserted or removed?
I think that's by computing the size in items of "from" and doing the
same for "to" and then they turn out to be equal. If someone cares
about items being inserted/deleted this is a bit of an indirect way to
find out.

> Example 2:
> ---------- old text -----------
> aaaaaa
> bbbbbb
> cccccc
> --------------------------------
>
> ---------- new text -----------
> aaaaaa
> ----------------------------------
>
> In the old text, the modification starts at index 1 byte 0 and ends at
> index 2 byte 5.
> In the new text, the modification starts at index 1. But the byte cannot
> be calculated
> as the line is removed. Similarly the end of the modification also cannot
> be calculated.
> In this case, the diff() function returns the following:
>
> [{'from': {'start': {'idx': 1, 'byte': 0}, 'end': {'idx': 2, 'byte': 5}},
> 'to': {'start': {'idx': 1, 'byte': -1}, 'end': {'idx': -1, 'byte': -1}}}]

Now how does one compute the number of items inserted/removed? Is it
true that when "to" has an "end" with index -1, that there are no items
at all? Not sure that is always true.

> Example 3:
> ---------- old text -----------
> aaaaaa
> --------------------------------
>
> ---------- new text -----------
> aaaaaa
> bbbbbb
> cccccc
> ----------------------------------
>
> In the old text, the modification starts at index 1. But the starting byte
> and ending index/byte
> cannot be calculated. In the new text, the change starts at index 1 byte 0
> and ends at
> index 2 byte 5. In this case, the diff() function returns the following:
>
> [{'from': {'start': {'idx': 1, 'byte': -1}, 'end': {'idx': -1, 'byte': -1}},
> 'to': {'start': {'idx': 1, 'byte': 0}, 'end': {'idx': 2, 'byte': 5}}}]

So the "from" position indicates "end of the list". And the "to" info
has two items, thus two items have been added.

I hope you see that, depending on what the user of the function wants to
know, the returned value can be hard to understand.
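
The indirect computation discussed above can be sketched as follows. This is an illustration in Python (the Dict literals are reused unchanged), with hypothetical helper names `side_items` and `net_item_change`. It assumes that an "end" index of -1 always means the side contains no items, which is exactly the point questioned above and is not confirmed by the PR.

```python
def side_items(side):
    """Number of list items one side of a hunk spans.

    Assumption: an "end" index of -1 means the side is empty.
    """
    if side["end"]["idx"] == -1:
        return 0
    return side["end"]["idx"] - side["start"]["idx"] + 1


def net_item_change(hunk):
    """Positive if items were added, negative if removed, 0 otherwise."""
    return side_items(hunk["to"]) - side_items(hunk["from"])


ex1 = {'from': {'start': {'idx': 0, 'byte': 3}, 'end': {'idx': 2, 'byte': 2}},
       'to': {'start': {'idx': 0, 'byte': 3}, 'end': {'idx': 2, 'byte': 1}}}
ex2 = {'from': {'start': {'idx': 1, 'byte': 0}, 'end': {'idx': 2, 'byte': 5}},
       'to': {'start': {'idx': 1, 'byte': -1}, 'end': {'idx': -1, 'byte': -1}}}
ex3 = {'from': {'start': {'idx': 1, 'byte': -1}, 'end': {'idx': -1, 'byte': -1}},
       'to': {'start': {'idx': 1, 'byte': 0}, 'end': {'idx': 2, 'byte': 5}}}

for name, hunk in (("Example 1", ex1), ("Example 2", ex2), ("Example 3", ex3)):
    print(name, net_item_change(hunk))  # 0, -2, 2
```

A result of 0 only says the item count is unchanged; it does not distinguish "three items modified" from other same-size replacements.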

--
BEDEVERE: And that, my lord, is how we know the Earth to be banana-shaped.
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

vim-dev ML

May 1, 2023, 1:22:48 PM

to vim/vim, vim-dev ML, Your activity

> > I suppose you made the structure of the return value work for your
> > purpose. It's a bit strange though: List of Dict of Dicts. And using a
> > special "end" item to indicate added or removed text (the help is not
> > clear, but I guess this is only used when a whole item was
> > added/removed, not when a word was added/removed).
>
> The "end" item contains the end index and the byte offset of the diff hunk.
> If the string was not present earlier, then the byte offset cannot be
> calculated. So it will be -1. When a word is removed, it contains the
> position of the end of the word.

Before or after removing the word? That's probably the main confusion
I'm seeing, that one has to translate the provided information into what
it really means.

> > Wouldn't it be easier to use when returning something similar to unified
> > diff:
> >
> > - Indicate range of deleted items (like a "-" line)
> > - Indicate range of inserted items (like a "+" line)
> > - Indicate a modified item (like a "-" line followed by a "+" line)
>
> The start and end index values in the returned Dict does contain the
> above information (range of deleted or inserted or modified items).

Well, in some form. My point is that it would be easier to interpret
the returned value if you put it in the form I suggested.

Note that some information goes missing when changing N lines/items to M
lines/items: some lines/items might only have a tiny change (e.g.
indent) while other lines/items may have been completely removed or
inserted. There was someone who looked into this (mostly for how to
display a diff hunk) but I haven't heard about it for a long time.
In case this work gets picked up, it would be nice if it can be included
in the returned value.

--
If you had to identify, in one word, the reason why the
human race has not achieved, and never will achieve, its
full potential, that word would be "meetings."

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Yegappan Lakshmanan

May 2, 2023, 12:58:21 AM

to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

f2c5843 Add count to the 'from' and 'to' Dict items in the return value

Yegappan Lakshmanan

May 2, 2023, 1:18:07 AM

to vim...@googlegroups.com, reply+ACY5DGEFD6X3ENFQM7...@reply.github.com, vim/vim, vim-dev ML, Your activity

Hi Bram,

I have updated the PR to include the "count" item to "from" and "to".
It indicates the number of items added or modified.

The way to interpret the "from" and "to" items is:
The text beginning at the "start" item and ending at the "end" item in
"from" is replaced by the text beginning at the "start" item and ending
at the "end" item in "to" in a diff hunk.

Hopefully the new "count" item will help with this.

Regards,

Yegappan

Yegappan Lakshmanan

May 2, 2023, 1:24:19 AM

to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

80ac1ef Update help

Christian Brabandt

May 2, 2023, 4:28:12 AM

to vim/vim, vim-dev ML, Comment

There is also a related issue, #4241.
I am not sure the current design helps with that, however. I find the result quite hard to understand.

Yegappan Lakshmanan

May 2, 2023, 11:08:36 AM

to vim/vim, vim-dev ML, Comment

@chrisbra: The result contains the starting and ending position for the range of added/removed/modified lines in the original list and the new list. This will allow a plugin to get the new range of text that replaces the original range of text. Note that I have included both the line offset and the starting and ending byte offset for the change.

Christian Brabandt

May 2, 2023, 12:43:17 PM

to vim/vim, vim-dev ML, Comment

I am still confused by the output, even after reading the help section several times. So index is the index into the list input and byte is the start byte from that list[index]? And count is the number of changes performed? That is a bit confusing, because if an item was added to the list, like here:

" few lines added at the end
  :echo diff(['abc'], ['abc', 'def', 'ghi'])
   [{'from': {'start': {'idx': 1, 'byte': -1}, 'end': {'idx': -1, 'byte': -1},
	      'count': 0},
     'to': {'start': {'idx': 1, 'byte': 0}, 'end': {'idx': 2, 'byte': 2},
	    'count': 2}}]

the count in the to dict means two additional items have been added (appended) to the list. But then if you take this example:

  " word is removed in the middle of a string
  :echo diff(['abc def ghi'], ['abc ghi'])
   [{'from': {'start': {'idx': 0, 'byte': 4}, 'end': {'idx': 0, 'byte': 7},
	      'count': 1},
     'to': {'start': {'idx': 0, 'byte': 4}, 'end': {'idx': 0, 'byte': -1},
	    'count': 1}}]

Then count does not mean to add or remove a single item to the list, but apparently that only one single change has been done, e.g. removing from byte 4 to byte 7.

Also, I guess the help could be a bit more precise, I suppose the -1 applies only for added items in the from Dict and removed items in the to Dict

    Each item in the returned List is a Dict containing
    information about a diff hunk.  Each Dict contains the
    following items:
        from  Dict with {list1} diff hunk information
        to    Dict with {list2} diff hunk information
    The "from" and "to" Dicts contain the following items:
        start  Dict containing the starting index and byte of the
         diff hunk.  The "byte" is -1 if text is added 
         (in the "from" Dict) or removed (in the "to" Dict).
        end    Dict containing the ending index and byte of the
         diff hunk.  The "idx" and "byte" are -1 if text is
         added (in the "from" Dict) or removed (in the "to" Dict).
        count  number of items added/removed/modified in this diff
         hunk.

What would this look like if we have several hunks? Can we have that as an example as well?

Also I would have thought you would need something like oldstart,oldcount newstart,newcount which basically is the hunk header from a unified diff. So perhaps we can just also add the raw diff output per hunk as well?

Bram Moolenaar

May 2, 2023, 3:54:04 PM

to vim/vim, vim-dev ML, Comment

> I have updated the PR to include the "count" item to "from" and "to".
> It indicates the number of items added or modified.

The help for this is:

count number of items added/removed/modified in this diff
hunk.

I'm afraid this doesn't really help. First of all, I would expect the
info not inside the "from" and "to" Dicts, but beside them. And I
would expect a separate number for items added/deleted and items
modified. Something like:

Each item in the returned List is a Dict containing
information about a diff hunk. Each Dict contains the
following items:
from Dict with {list1} diff hunk information
to Dict with {list2} diff hunk information

extra Number of items added (positive) or
removed (negative)
modified Number of items that were modified
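
One possible reading of the "extra" and "modified" items, sketched in Python from the per-side "count" values in the current PR output. The function name and the pairing rule are assumptions for illustration only: items present on both sides are counted as modified, and the difference in size is counted as extra.

```python
def extra_and_modified(hunk):
    """Derive the suggested "extra" and "modified" numbers for one hunk.

    Assumption: the first min(from, to) items are treated as modified
    and the remainder as added (positive) or removed (negative).
    """
    n_from = hunk["from"]["count"]
    n_to = hunk["to"]["count"]
    return {"extra": n_to - n_from, "modified": min(n_from, n_to)}


# Only the "count" items matter here; the positions are omitted.
appended = {"from": {"count": 0}, "to": {"count": 2}}
word_removed = {"from": {"count": 1}, "to": {"count": 1}}
two_removed = {"from": {"count": 2}, "to": {"count": 0}}

print(extra_and_modified(appended))      # {'extra': 2, 'modified': 0}
print(extra_and_modified(word_removed))  # {'extra': 0, 'modified': 1}
print(extra_and_modified(two_removed))   # {'extra': -2, 'modified': 0}
```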

> The way to interpret the "from" and "to" items is:
> The text beginning at the "start" item and ending at the "end" item in
> "from" is replaced by the text beginning at the "start" item and ending
> at the "end" item in "to" in a diff hunk.

This needs the addition that the "to" item refers to the index and byte
in {list2}. This is important, since previous hunks may have added and
removed items.

This part of the help is confusing and probably not right:

The "byte" is -1 if text is added or removed.

In the "from" dict this doesn't make sense. If text is added (and
nothing modified) then "start" and "end" should be equal, the position
where the text is inserted. If text is removed then "start" gives the
start of the removed text and "end" the end. The text exists, thus
there is no reason to use -1.

It's important to get this right. We had similar issues to solve with
the listener callback function. See the help for listener_add(). It
might actually be helpful if the information is in the same form.

--
The acknowledged parents of reengineering are Michael Hammer and James Champy.
When I say they're the "parents" I don't mean they had sex - and I apologize
for making you think about it. I mean they wrote the best-selling business
book _Reengineering the Corporation_, which was published in 1993.
Businesses flocked to reengineering like frat boys to a drunken
cheerleader. (This analogy wasn't necessary, but I'm trying to get my mind
off that Hammer and Champy thing.)
(Scott Adams - The Dilbert principle)

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\
/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Yegappan Lakshmanan

May 3, 2023, 1:06:04 AM

to vim...@googlegroups.com, reply+ACY5DGDB2OCVNIHPW5...@reply.github.com, vim/vim, vim-dev ML, Comment

Hi Christian,

On Tue, May 2, 2023 at 9:43 AM Christian Brabandt <vim-dev...@256bit.org> wrote:

> I am still confused by the output, even after reading the help section several times. So index is the index into the list input and byte is the start byte from that list[index]?

Yes.

> And count is the number of changes performed?

Yes. "count" is the number of items in the list added or removed or modified in the diff hunk.

> That is a bit confusing, because if an item was added to the list, like here:
>
>   " few lines added at the end
>   :echo diff(['abc'], ['abc', 'def', 'ghi'])
>    [{'from': {'start': {'idx': 1, 'byte': -1}, 'end': {'idx': -1, 'byte': -1},
>               'count': 0},
>      'to': {'start': {'idx': 1, 'byte': 0}, 'end': {'idx': 2, 'byte': 2},
>             'count': 2}}]
>
> the count in the to dict means two additional items have been added (appended) to the list. But then if you take this example:
>
>   " word is removed in the middle of a string
>   :echo diff(['abc def ghi'], ['abc ghi'])
>    [{'from': {'start': {'idx': 0, 'byte': 4}, 'end': {'idx': 0, 'byte': 7},
>               'count': 1},
>      'to': {'start': {'idx': 0, 'byte': 4}, 'end': {'idx': 0, 'byte': -1},
>             'count': 1}}]
>
> Then count does not mean to add or remove a single item to the list, but apparently that only one single change has been done, e.g. removing from byte 4 to byte 7.

In this example, there is only one item in the "from" and "to" lists. As this item is modified, the "count" is 1 in both the "from" and "to" dicts.

> Also, I guess the help could be a bit more precise, I suppose the -1 applies only for added items in the from Dict and removed items in the to Dict

I will update the help text to clarify this.

    Each item in the returned List is a Dict containing
    information about a diff hunk.  Each Dict contains the
    following items:
        from  Dict with {list1} diff hunk information
        to    Dict with {list2} diff hunk information
    The "from" and "to" Dicts contain the following items:
        start  Dict containing the starting index and byte of the
         diff hunk.  The "byte" is -1 if text is added 
         (in the "from" Dict) or removed (in the "to" Dict).
        end    Dict containing the ending index and byte of the
         diff hunk.  The "idx" and "byte" are -1 if text is
         added (in the "from" Dict) or removed (in the "to" Dict).
        count  number of items added/removed/modified in this diff
         hunk.

And how would this look like if we have several hunks? Can we have this as example as well?

Yes. I will add additional examples for multiple diff hunks.

Also I would have thought you would need something like oldstart,oldcount newstart,newcount which basically is the hunk header from a unified diff. So perhaps we can just also add the raw diff output per hunk as well?

The raw diff output from xdiff contains the line number and count for both the original

and new lines. Are you referring to including these in the output?

Regards,

Yegappan


Christian Brabandt

May 3, 2023, 2:46:02 AM

to vim/vim, vim-dev ML, Comment

> The raw diff output from xdiff contains the line number and count for both
> the original and new lines. Are you referring to including these in the output?

Yes, the raw diff output. I think it is okay to include it, since you get it anyway from the xdl function.


Yegappan Lakshmanan

May 3, 2023, 11:33:07 AM

to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

8298a2a Add the 'added' and 'modified' items to the return value and remove the 'count' item. Add an example to show multiple diff hunks


Yegappan Lakshmanan

May 3, 2023, 11:40:17 AM

to vim...@googlegroups.com, reply+ACY5DGFHRJ725KLZR4...@reply.github.com, vim/vim, vim-dev ML, Comment

Hi Bram,

On Tue, May 2, 2023 at 12:54 PM Bram Moolenaar <vim-dev...@256bit.org> wrote:

> > I have updated the PR to include the "count" item to "from" and "to".
> > It indicates the number of items added or modified.
>
> The help for this is:
>
>     count  number of items added/removed/modified in this diff
>            hunk.
>
> I'm afraid this doesn't really help. First of all, I would expect the
> info not inside the "from" and "to" Dicts, but beside them. And I
> would expect a separate number for items added/deleted and items
> modified. Something like:
>
>     Each item in the returned List is a Dict containing
>     information about a diff hunk. Each Dict contains the
>     following items:
>         from      Dict with {list1} diff hunk information
>         to        Dict with {list2} diff hunk information
>         extra     Number of items added (positive) or
>                   removed (negative)
>         modified  Number of items that were modified

I have updated the PR to add the "added" and "modified" items for every diff hunk. The "added" item will be positive if new strings are added to the original List and will be negative if strings are removed from the original List. The "modified" item indicates the number of strings modified from the original List.

I have also removed the "count" item.
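
As a rough illustration of how a caller might read the two new items, here is a minimal sketch. DescribeHunk() is a hypothetical helper, not part of the PR, and the sample Dicts are written by hand to match the description above rather than produced by diff().

```vim
" Sketch only: DescribeHunk() is a made-up name. The "added" and
" "modified" keys follow the description in this message.
function! DescribeHunk(hunk) abort
  let parts = []
  if a:hunk.added > 0
    " Positive: strings were added to the original List.
    call add(parts, printf('%d item(s) added', a:hunk.added))
  elseif a:hunk.added < 0
    " Negative: strings were removed from the original List.
    call add(parts, printf('%d item(s) removed', -a:hunk.added))
  endif
  if a:hunk.modified > 0
    call add(parts, printf('%d item(s) modified', a:hunk.modified))
  endif
  return empty(parts) ? 'no change' : join(parts, ', ')
endfunction

echo DescribeHunk({'added': 2, 'modified': 0})
echo DescribeHunk({'added': -1, 'modified': 0})
echo DescribeHunk({'added': 0, 'modified': 1})
```

A caller that only cares whether the List length changed can look at the sign of "added" alone.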

Regards,

Yegappan


Yegappan Lakshmanan

May 3, 2023, 11:41:04 AM

to vim...@googlegroups.com, reply+ACY5DGDB2OCVNIHPW5...@reply.github.com, vim/vim, vim-dev ML, Comment

Hi Christian,

On Tue, May 2, 2023 at 9:43 AM Christian Brabandt

I have updated the help with an example for multiple diff hunks.

Regards,
Yegappan


Bram Moolenaar

May 3, 2023, 5:55:53 PM

to vim...@googlegroups.com, vim-dev ML

Thanks for using the suggestions.

One more thing to keep in mind: Creating a Dict and putting items in it
has quite a bit of overhead. The two levels of Dicts means the function
is not going to be efficient. For short lists this might not matter
much, but what if it is used on a long list? Perhaps using one Dict per
hunk would be OK without making it less usable. Might even make it
simpler to understand.

--
If the Universe is constantly expanding, why can't I ever find a parking space?

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\

Yegappan Lakshmanan

May 4, 2023, 12:32:23 AM

to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

0a108f1 Use a single level Dict for the 'from' and 'to' items in the diff() return value


Yegappan Lakshmanan

May 4, 2023, 12:41:01 AM

to vim...@googlegroups.com, vim-dev ML

Hi Bram,

> One more thing to keep in mind: Creating a Dict and putting items in it
> has quite a bit of overhead. The two levels of Dicts means the function
> is not going to be efficient. For short lists this might not matter
> much, but what if it is used on a long list? Perhaps using one Dict per
> hunk would be OK without making it less usable. Might even make it
> simpler to understand.

I have updated the PR to reduce one level in the Dict. I cannot make it a
single flat Dict as we need to separate out the "from" and "to"
position information.

Regards,
Yegappan

Yegappan Lakshmanan

May 4, 2023, 1:01:43 AM

to vim...@googlegroups.com, reply+ACY5DGFHRJ725KLZR4...@reply.github.com, vim/vim, vim-dev ML, Comment

Hi Bram,

On Tue, May 2, 2023 at 12:54 PM Bram Moolenaar <vim-dev...@256bit.org> wrote:

> This part of the help is confusing and probably not right:
>
>     The "byte" is -1 if text is added or removed.
>
> In the "from" dict this doesn't make sense. If text is added (and
> nothing modified) then "start" and "end" should be equal, the position
> where the text is inserted. If text is removed then "start" gives the
> start of the removed text and "end" the end. The text exists, thus
> there is no reason to use -1.

If the "endbyte" is not -1, then removing a character cannot be distinguished
from modifying a character as shown below:

Modifying the first character of a string:
:echo diff(['abc'], ['xbc'])
{'from': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': 0},
'to': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': 0},
'added': 0, 'modified': 1}

Adding a first character to a string:
:echo diff(['bc'], ['abc'])
{'from': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': -1},
'to': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': 0},
'added': 0, 'modified': 1}

In this case, if the "from.endbyte" is 0 instead of -1, then this case cannot
be differentiated from the previous example.

Removing the first character from a string:
:echo diff(['abc'], ['bc'])
{'from': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': 0},
'to': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': -1},
'added': 0, 'modified': 1}

In this case, if the "to.endbyte" is 0 instead of -1, then this case cannot
be differentiated from the first example.

There are some more examples where the "endbyte" or the "startbyte" is -1:

Removing a character from the middle of a string:

:echo diff(['abc'], ['ac'])
{'from': {'startidx': 0, 'startbyte': 1, 'endidx': 0, 'endbyte': 1},
'to': {'startidx': 0, 'startbyte': 1, 'endidx': 0, 'endbyte': -1},
'added': 0, 'modified': 1}

Adding a character to the middle of a string:
:echo diff(['ac'], ['abc'])
{'from': {'startidx': 0, 'startbyte': 1, 'endidx': 0, 'endbyte': -1},
'to': {'startidx': 0, 'startbyte': 1, 'endidx': 0, 'endbyte': 1},
'added': 0, 'modified': 1}

Removing the last character from a string:
:echo diff(['abc'], ['ab'])
{'from': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': 2},
'to': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': -1},
'added': 0, 'modified': 1}

Adding a character to the end of a string:
:echo diff(['ab'], ['abc'])
{'from': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': -1},
'to': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': 2},
'added': 0, 'modified': 1}

Modifying the last character in a string:
:echo diff(['abc'], ['abx'])
{'from': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': 2},
'to': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': 2},
'added': 0, 'modified': 1}

Adding a new item:
:echo diff(['a'], ['a', 'b'])
{'from': {'startidx': 1, 'startbyte': -1, 'endidx': -1, 'endbyte': -1},
'to': {'startidx': 1, 'startbyte': 0, 'endidx': 1, 'endbyte': 0},
'added': 1, 'modified': 0}

Removing an item from the end:
:echo diff(['a', 'b'], ['a'])
{'from': {'startidx': 1, 'startbyte': 0, 'endidx': 1, 'endbyte': 0},
'to': {'startidx': 1, 'startbyte': -1, 'endidx': -1, 'endbyte': -1},
'added': -1, 'modified': 0}

Regards,

Yegappan


Yegappan Lakshmanan

May 5, 2023, 12:44:27 AM

to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

1c8ea43 Update help text


Yegappan Lakshmanan

May 5, 2023, 12:46:08 AM

to vim/vim, vim-dev ML, Push

@yegappan pushed 2 commits.

127360b Add support for the diff() function using the builtin diff support.
c0e594f Update help text


Bram Moolenaar

May 5, 2023, 1:12:48 PM

to vim...@googlegroups.com, Yegappan Lakshmanan, reply+ACY5DGFHRJ725KLZR4...@reply.github.com

> > This part of the help is confusing and probably not right:
> >
> > The "byte" is -1 if text is added or removed.
> >
> > In the "from" dict this doesn't make sense. If text is added (and
> > nothing modified) then "start" and "end" should be equal, the position
> > where the text is inserted. If text is removed then "start" gives the
> > start of the removed text and "end" the end. The text exists, thus
> > there is no reason to use -1.
> >
> >
> >
> If the "endbyte" is not -1, then removing a character cannot be
> distinguished
> from modifying a character as shown below:
>
> Modifying the first character of a string:
> :echo diff(['abc'], ['xbc'])
> {'from': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': 0},
> 'to': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': 0},
> 'added': 0, 'modified': 1}
>
> Adding a first character to a string:
> :echo diff(['bc'], ['abc'])
> {'from': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': -1},
> 'to': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': 0},
> 'added': 0, 'modified': 1}
>
> In this case, if the "from.endbyte" is 0 instead of -1, then this case
> cannot be differentiated from the previous example.

When adding a character, the "endbyte" in "to" should be a positive
value, since the text is longer than it was before. Thus the
from-endbyte should be zero, indicating that the "from" text was empty,
while the to-endbyte would be one (or as many bytes as the new character
occupies), indicating the text that was inserted. This matches what
would be highlighted with DiffText.

> Removing the first character from a string:
> :echo diff(['abc'], ['bc'])
> {'from': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': 0},
> 'to': {'startidx': 0, 'startbyte': 0, 'endidx': 0, 'endbyte': -1},
> 'added': 0, 'modified': 1}
>
> In this case, if the "to.endbyte" is 0 instead of -1, then this case cannot
> be differentiated from the first example.

The from-endbyte here should be positive, indicating the length of the
text that was removed. The to-endbyte should be zero, indicating that
after the change the text is no longer there.

If no items were added or deleted, then "endbyte - startbyte" should
indicate the text affected. If this is zero, then it means that text is
not present. The other entry (from or to) then will be non-zero and
indicates the text removed or added. If both are non-zero it indicates
text was changed.
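
The rule in the paragraph above can be written down as a small sketch. ClassifyTextChange() is a hypothetical helper, and it assumes the convention proposed in this message (an exclusive "endbyte", so "endbyte - startbyte" is the length of the affected text), not what the PR currently returns. It only applies to a hunk where no items were added or deleted and the change stays within one item.

```vim
" Sketch only: ClassifyTextChange() is a made-up name and the hunk layout
" (exclusive "endbyte") is the convention proposed above.
function! ClassifyTextChange(hunk) abort
  " Length of the affected text on each side.
  let fromlen = a:hunk.from.endbyte - a:hunk.from.startbyte
  let tolen = a:hunk.to.endbyte - a:hunk.to.startbyte
  if fromlen == 0 && tolen == 0
    return 'none'
  elseif fromlen == 0
    " Nothing in "from", something in "to": text was added.
    return 'inserted'
  elseif tolen == 0
    " Something in "from", nothing in "to": text was removed.
    return 'deleted'
  endif
  return 'changed'
endfunction

" ['ab'] -> ['abc'] under the proposed convention:
echo ClassifyTextChange({
      \ 'from': {'startbyte': 2, 'endbyte': 2},
      \ 'to': {'startbyte': 2, 'endbyte': 3}})
```

With this convention the three cases no longer need a -1 marker to be told apart.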

> There are some more examples where the "endbyte" or the "startbyte" is -1:
>
> Removing a character from the middle of a string:
> :echo diff(['abc'], ['ac'])
> {'from': {'startidx': 0, 'startbyte': 1, 'endidx': 0, 'endbyte': 1},
> 'to': {'startidx': 0, 'startbyte': 1, 'endidx': 0, 'endbyte': -1},
> 'added': 0, 'modified': 1}

Here from-endbyte should be 2 (the "b" was deleted) and to-endbyte
should be 1 (same as to-startbyte, empty text, indicating the "b" that
was removed).

> Adding a character to the middle of a string:
> :echo diff(['ac'], ['abc'])
> {'from': {'startidx': 0, 'startbyte': 1, 'endidx': 0, 'endbyte': -1},
> 'to': {'startidx': 0, 'startbyte': 1, 'endidx': 0, 'endbyte': 1},
> 'added': 0, 'modified': 1}

Here from-endbyte should be 1 and to-endbyte should be 2.

> Removing the last character from a string:
> :echo diff(['abc'], ['ab'])
> {'from': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': 2},
> 'to': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': -1},
> 'added': 0, 'modified': 1}

Here from-endbyte should be 3 and to-endbyte should be 2.

> Adding a character to the end of a string:
> :echo diff(['ab'], ['abc'])
> {'from': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': -1},
> 'to': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': 2},
> 'added': 0, 'modified': 1}

Here from-endbyte should be 2 and to-endbyte should be 3.

> Modifying the last character in a string:
> :echo diff(['abc'], ['abx'])
> {'from': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': 2},
> 'to': {'startidx': 0, 'startbyte': 2, 'endidx': 0, 'endbyte': 2},
> 'added': 0, 'modified': 1}

Here both from-endbyte and to-endbyte should be 3.

> Adding a new item:
> :echo diff(['a'], ['a', 'b'])
> {'from': {'startidx': 1, 'startbyte': -1, 'endidx': -1, 'endbyte': -1},
> 'to': {'startidx': 1, 'startbyte': 0, 'endidx': 1, 'endbyte': 0},
> 'added': 1, 'modified': 0}

When adding an item the byte index doesn't have a meaning, we only need
to know the item index. Here from-endidx should be 1, indicating the
index of where the new item was added.

Having the to-start and to-end be equal suggests that the text is empty.
Only adjusting "endbyte" to the length of the text makes it difficult to
distinguish between adding a whole item and inserting text in an
existing item. Making the "end" exclusive would be simpler, then
to-endidx can be 2 and to-endbyte can remain zero.

> Removing an item from the end:
> :echo diff(['a', 'b'], ['a'])
> {'from': {'startidx': 1, 'startbyte': 0, 'endidx': 1, 'endbyte': 0},
> 'to': {'startidx': 1, 'startbyte': -1, 'endidx': -1, 'endbyte': -1},
> 'added': -1, 'modified': 0}

Also applying the exclusive index here means from-endidx should be 2.
That avoids from-start and from-end having the same values, which would
make it look like nothing was deleted. to-endidx should be the same as
to-startidx, indicating there is nothing left. The byte indexes in "to"
should also be zero, so it is clear that no text is left.

Hopefully this works out and I actually expect the computations to be
simpler and consistent.
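
To show why an exclusive end index might simplify things, here is a minimal sketch. ItemCounts() is a hypothetical helper and the sample hunks are written by hand using the values suggested above; they are not output of the current PR.

```vim
" Sketch only: ItemCounts() is a made-up name. With an exclusive "endidx"
" the number of items on each side is a plain subtraction, and an empty
" side simply has startidx == endidx.
function! ItemCounts(hunk) abort
  let fromcount = a:hunk.from.endidx - a:hunk.from.startidx
  let tocount = a:hunk.to.endidx - a:hunk.to.startidx
  return [fromcount, tocount]
endfunction

" ['a'] -> ['a', 'b']: nothing removed, one item added at index 1.
echo ItemCounts({'from': {'startidx': 1, 'endidx': 1},
      \ 'to': {'startidx': 1, 'endidx': 2}})
" ['a', 'b'] -> ['a']: one item removed, nothing added.
echo ItemCounts({'from': {'startidx': 1, 'endidx': 2},
      \ 'to': {'startidx': 1, 'endidx': 1}})
```

No special case for -1 is needed in either direction.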

--
Q: What's a light-year?
A: One-third less calories than a regular year.


Bram Moolenaar

May 5, 2023, 1:38:20 PM

to vim...@googlegroups.com, Yegappan Lakshmanan, vim-dev ML

> > One more thing to keep in mind: Creating a Dict and putting items in it
> > has quite a bit of overhead. The two levels of Dicts means the function
> > is not going to be efficient. For short lists this might not matter
> > much, but what if it is used on a long list? Perhaps using one Dict per
> > hunk would be OK without making it less usable. Might even make it
> > simpler to understand.
> >
>
> I have updated the PR to reduce one level in the Dict. I cannot make it a
> single flat Dict as we need to separate out the "from" and "to"
> position information.

Hmm, this still uses a Dict of Dicts.

It could be flattened by using a "from-" and "to-" prefix instead of
using separate Dicts. Is that getting too ugly?

There are several alternatives to represent the same information. E.g.,
instead of using a "start index" and "end index" it could be "index" and
"item count". You would then have:
from_index
from_count
to_index
to_count

That looks OK to me. Does this also work for the byte values? How
about:
from_byte
from_length
to_byte
to_length

It's probably a matter of taste what to call these, but at least taking
out a level of Dict nesting will make it simpler and more efficient.
Perhaps trying this out with the examples you can see if this also works
for corner cases.
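
For trying this out, a nested hunk can be flattened mechanically. The sketch below is only an illustration: FlattenHunk() is a hypothetical helper, and the underscore-joined key names (from_startidx and so on) are one possible spelling of the prefix idea, not something the PR defines.

```vim
" Sketch only: FlattenHunk() is a made-up name. It turns
"   {'from': {'startidx': 0, ...}, 'to': {...}, 'added': 0, ...}
" into a single-level Dict with "from_" and "to_" prefixed keys.
function! FlattenHunk(hunk) abort
  let flat = {}
  for [key, val] in items(a:hunk)
    if type(val) == v:t_dict
      " Nested Dict: copy each entry up one level, prefixed by its parent.
      for [subkey, subval] in items(val)
        let flat[key . '_' . subkey] = subval
      endfor
    else
      let flat[key] = val
    endif
  endfor
  return flat
endfunction

echo FlattenHunk({'from': {'startidx': 0, 'endidx': 0},
      \ 'to': {'startidx': 0, 'endidx': 0},
      \ 'added': 0, 'modified': 1})
```

An underscore is used instead of a hyphen so that the keys can still be accessed with the dot notation, e.g. hunk.from_startidx.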

--
I used to be indecisive, now I'm not sure.

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\

Yegappan Lakshmanan

May 6, 2023, 12:12:45 AM

to Bram Moolenaar, vim...@googlegroups.com, vim-dev ML

Hi Bram,

On Fri, May 5, 2023 at 10:38 AM Bram Moolenaar <Br...@moolenaar.net> wrote:
>
>
> > > One more thing to keep in mind: Creating a Dict and putting items in it
> > > has quite a bit of overhead. The two levels of Dicts means the function
> > > is not going to be efficient. For short lists this might not matter
> > > much, but what if it is used on a long list? Perhaps using one Dict per
> > > hunk would be OK without making it less usable. Might even make it
> > > simpler to understand.
> > >
> >
> > I have updated the PR to reduce one level in the Dict. I cannot make it a
> > single flat Dict as we need to separate out the "from" and "to"
> > position information.
>
> Hmm, this still uses a Dict of Dicts.
>
> It could be flattened by using a "from-" and "to-" prefix instead of
> using separate Dicts. Is that getting too ugly?
>

The field names will be long (from-startidx, from-startbyte, from-endidx,
from-endbyte, to-startidx, to-startbyte, to-endidx and to-endbyte).

>
> There are several alternatives to represent the same information. E.g.,
> instead of using a "start index" and "end index" it could be "index" and
> "item count". You would then have:
> from_index
> from_count
> to_index
> to_count
>
> That looks OK to me. Does this also work for the byte values? How
> about:
> from_byte
> from_length
>

The modification can end in a line different from the starting line. So the
from_length field needs to count the number of bytes in all the lines between
the starting line and the ending line (including the newline characters).
Then it will be difficult to compute the column number of the ending change
in the last line. We need to use the from-startbyte and from-endbyte fields.
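
To make the cost concrete, this is roughly what a caller would have to do to get back to an end position from a byte length. EndPosition() is a hypothetical helper; it assumes one newline byte after every line and returns an exclusive [index, byte] pair.

```vim
" Sketch only: EndPosition() is a made-up name. Given the start of a
" change and its length in bytes (newlines included), walk the lines to
" find where the change ends.
function! EndPosition(lines, idx, byte, length) abort
  let idx = a:idx
  let byte = a:byte
  let remaining = a:length
  while idx < len(a:lines)
    " Bytes left in this line, plus one for the newline that follows it.
    let avail = strlen(a:lines[idx]) - byte + 1
    if remaining < avail
      return [idx, byte + remaining]
    endif
    let remaining -= avail
    let idx += 1
    let byte = 0
  endwhile
  " The change runs to the end of the List.
  return [idx, 0]
endfunction

" 5 bytes starting at byte 1 of 'abc' cover "bc", the newline and "de".
echo EndPosition(['abc', 'def'], 0, 1, 5)
```

Every user of a length field would need a loop like this, which is the extra work that explicit start and end byte fields avoid.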

Regards,
Yegappan

rickhowe

May 6, 2023, 7:44:25 AM

to vim/vim, vim-dev ML, Comment

I am interested in this builtin function as I have developed some plugins to support word/character level diff in lines (this <https://github.com/rickhowe/diffchar.vim>) and range/area selectable diff in buffers (this <https://github.com/rickhowe/spotdiff.vim>). I implemented the O(NP) comparison algorithm in Vim script, but I have been looking for a builtin function like yours. Several months ago, I noticed that Neovim has a vim.diff() function, so I switched to using it for Neovim in my script.

Honestly, I am confused about the return value of your function. vim.diff() actually returns the diff in unified format by default, from which it is very easy to generate a shortest edit script (SES).

For example, how easy can I generate this SES from your return value?

list1 = ['a', 'b', 'c', 'a', 'b', 'b', 'a']
list2 = ['c', 'b', 'a', 'b', 'a', 'c']
SES = '--=-=+==+'
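
For readers unfamiliar with the notation: in the SES string above, "=" keeps an item, "-" drops an item of list1 and "+" takes an item from list2. The following minimal sketch replays such a script; ApplySES() is a hypothetical helper written only to illustrate the notation.

```vim
" Sketch only: ApplySES() is a made-up name. It replays an edit script
" over list1, taking inserted items from list2, and returns the result.
function! ApplySES(list1, list2, ses) abort
  let result = []
  let i = 0
  let j = 0
  for op in split(a:ses, '\zs')
    if op ==# '='
      " Common item: keep it and advance in both lists.
      call add(result, a:list1[i])
      let i += 1
      let j += 1
    elseif op ==# '-'
      " Item only in list1: skip it.
      let i += 1
    elseif op ==# '+'
      " Item only in list2: insert it.
      call add(result, a:list2[j])
      let j += 1
    endif
  endfor
  return result
endfunction

let list1 = ['a', 'b', 'c', 'a', 'b', 'b', 'a']
let list2 = ['c', 'b', 'a', 'b', 'a', 'c']
echo ApplySES(list1, list2, '--=-=+==+') == list2
```

Replaying the script over list1 yields list2, which is what makes it a valid edit script for this pair.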


Yegappan Lakshmanan

May 6, 2023, 1:34:37 PM

to vim...@googlegroups.com, reply+ACY5DGDV63DPEKQP33...@reply.github.com, vim/vim, vim-dev ML, Comment

Hi,

On Sat, May 6, 2023 at 4:44 AM rickhowe <vim-dev...@256bit.org> wrote:

> I am interested in this builtin function as I have developed some plugins to support word/character level diff in lines (this <https://github.com/rickhowe/diffchar.vim>) and range/area selectable diff in buffers (this <https://github.com/rickhowe/spotdiff.vim>). I implemented the O(NP) comparison algorithm in Vim script, but I have been looking for a builtin function like yours. Several months ago, I noticed that Neovim has a vim.diff() function, so I switched to using it for Neovim in my script.
>
> Honestly, I am confused about the return value of your function. vim.diff() actually returns the diff in unified format by default, from which it is very easy to generate a shortest edit script (SES).
>
> For example, how easy can I generate this SES from your return value?

Does your plugin use the starting line number and count in the original and new files for a diff hunk (just like shown in a unified diff output)?

If it does, then the diff() function return value contains these values. To get these values, you can use [from.startidx, from.endidx - from.startidx + 1, to.startidx, to.endidx - to.startidx + 1] for a diff hunk. I see that the Neovim vim.diff() function returns these values when using the {result_type = 'indices'} option.
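
The expression above can be wrapped in a small function. This is a sketch under assumptions: HunkIndices() is a hypothetical helper, the hunk layout is the inclusive one from the examples earlier in this thread, and treating an "endidx" of -1 as a count of zero is my reading of those examples (the plain formula would give a negative count there). The indexes stay 0-based, unlike the 1-based line numbers of a unified diff.

```vim
" Sketch only: HunkIndices() is a made-up name. Returns
" [from_start, from_count, to_start, to_count] for one hunk.
function! HunkIndices(hunk) abort
  let result = []
  for side in [a:hunk.from, a:hunk.to]
    " endidx is inclusive; -1 marks a side with no items in this hunk.
    let n = side.endidx < 0 ? 0 : side.endidx - side.startidx + 1
    call extend(result, [side.startidx, n])
  endfor
  return result
endfunction

" diff(['a'], ['a', 'b']) as shown earlier: one item added at index 1.
echo HunkIndices({'from': {'startidx': 1, 'endidx': -1},
      \ 'to': {'startidx': 1, 'endidx': 1}})
```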

Regards,

Yegappan


Yegappan Lakshmanan

May 11, 2023, 1:00:09 AM

to vim...@googlegroups.com, reply+ACY5DGDV63DPEKQP33...@reply.github.com, vim/vim, vim-dev ML, Comment

On Sat, May 6, 2023 at 4:44 AM rickhowe <vim-dev...@256bit.org> wrote:

> I am interested in this builtin function as I have developed some plugins to support word/character level diff in lines (this <https://github.com/rickhowe/diffchar.vim>) and range/area selectable diff in buffers (this <https://github.com/rickhowe/spotdiff.vim>). I implemented the O(NP) comparison algorithm in Vim script, but I have been looking for a builtin function like yours. Several months ago, I noticed that Neovim has a vim.diff() function, so I switched to using it for Neovim in my script.
>
> Honestly, I am confused about the return value of your function. vim.diff() actually returns the diff in unified format by default, from which it is very easy to generate a shortest edit script (SES).

The Neovim vim.diff() function (https://neovim.io/doc/user/lua.html#lua-diff) returns the diff between two strings. It returns either a String or a List depending on the "result_type" option value. If "result_type" is set to "unified" (which is the default), then this function returns a String which is the unified diff output between the two string arguments. If the "result_type" option is set to "indices", then vim.diff() returns a List of hunks, each a List with 4 numbers (starting line number in the first string, line count, starting line number in the second string, line count).

The return value of vim.diff() is not fully suitable for getting the change information needed for the LSP DidChangeTextDocument notification (https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocumentContentChangeEvent). For this notification, we need the range of characters in the old text (starting and ending text position) and the new text that replaces the old text.

I can modify the new diff() function to return a value similar to that returned by the Neovim vim.diff() function, i.e. either a string with the unified diff or a List with the indices. We can then add an additional option ("range" or "extended" or "position") to return the starting and ending position of the change in the old and the new text.

Bram: What do you think about this approach?

Regards,

Yegappan


rickhowe

May 11, 2023, 4:08:58 AM

to vim/vim, vim-dev ML, Comment

It is much better to provide unified diff format. For example, we can simply use a builtin func instead of external diff command in diffexpr on neovim now:

set diffexpr=MyDiff()
function! MyDiff()
  let f1 = join(readfile(v:fname_in), "\n") . "\n"
  let f2 = join(readfile(v:fname_new), "\n") . "\n"
  call writefile(split(v:lua.vim.diff(f1, f2), "\n"), v:fname_out)
endfunction

No need to modify the result. Also to generate SES:

function! SES(l1, l2)
  let l1 = join(a:l1, "\n") . "\n"
  let l2 = join(a:l2, "\n") . "\n"
  let vd = v:lua.vim.diff(l1, l2, {'result_type': 'indices'})
  let ses = ''
  let p1 = 1
  for [s1, c1, s2, c2] in vd + [[len(a:l1), 0, 0, 0]]
    if c1 == 0 | let s1 += 1 | endif
    let ses .= repeat('=', s1 - p1) . repeat('-', c1) . repeat('+', c2)
    let p1 = s1 + c1
  endfor
  return ses
endfunction

It is simple enough to make use of it.


Yegappan Lakshmanan

May 11, 2023, 9:35:30 AM

to vim/vim, vim-dev ML, Comment

@rickhowe For the two use cases that you have cited, the unified diff output format and the diff hunk indices output format are sufficient. But as I described in my earlier reply, to get the range of changed text for the language server, these two output formats are not sufficient. We need the start and end position (line and column number) of the change in the original text and the corresponding new text.
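
As a rough sketch of what the language server needs, the position information of one hunk could be turned into an LSP content change like this. HunkToContentChange() is a hypothetical helper, the hunk is assumed to carry exclusive end positions in its "from" part, and byte offsets are used directly as LSP "character" offsets, which only holds for ASCII text (LSP counts UTF-16 code units by default).

```vim
" Sketch only: HunkToContentChange() is a made-up name and the hunk layout
" is an assumption. "newtext" is the replacement text taken from the new
" List by the caller.
function! HunkToContentChange(hunk, newtext) abort
  let f = a:hunk.from
  " LSP positions are zero-based line/character pairs; the end is exclusive.
  let startpos = {'line': f.startidx, 'character': f.startbyte}
  let endpos = {'line': f.endidx, 'character': f.endbyte}
  return {'range': {'start': startpos, 'end': endpos}, 'text': a:newtext}
endfunction

" ['abc'] -> ['ac']: the "b" at byte 1 is replaced by nothing.
echo HunkToContentChange({'from': {'startidx': 0, 'startbyte': 1,
      \ 'endidx': 0, 'endbyte': 2}}, '')
```

Neither the unified diff nor the indices output carries the "character" part of these positions, which is the gap described above.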


Bram Moolenaar

May 11, 2023, 11:08:23 AM

to vim...@googlegroups.com, Yegappan Lakshmanan, reply+ACY5DGDV63DPEKQP33...@reply.github.com

Yegappan wrote:

> On Sat, May 6, 2023 at 4:44 AM rickhowe <vim-dev...@256bit.org> wrote:
>
> > I am interested in this builtin function as I have developed some plugins
> > to support word/character level diff in lines (this

> > <https://github.com/rickhowe/diffchar.vim>) and range/area selectable
> > diff in buffers (this <https://github.com/rickhowe/spotdiff.vim>). I

The unified diff will contain less information: it only shows
added/removed/modified lines, not column information. It then needs
to be parsed, with each sequence of digits converted to a number. For just
getting the information this adds overhead, thus it would only be useful
if a unified diff is actually needed.

Also being able to make a diff between two strings with NL characters
seems hardly useful. I would not know where such a string comes from.
When using buffer lines we have a list of strings. If needed, split()
can be used before passing the text to diff().

--
How To Keep A Healthy Level Of Insanity:
13. Go to a poetry recital and ask why the poems don't rhyme.

Bram Moolenaar

May 11, 2023, 11:08:44 AM

to vim/vim, vim-dev ML, Comment

> It is much better to provide unified diff format. For example, we can
> simply use a builtin func instead of external diff command in diffexpr
> on neovim now:
>
> ```
> set diffexpr=MyDiff()
> function! MyDiff()
> let f1 = join(readfile(v:fname_in), "\n") . "\n"
> let f2 = join(readfile(v:fname_new), "\n") . "\n"
> call writefile(split(v:lua.vim.diff(f1, f2), "\n"), v:fname_out)
> endfunction
> ```

Using join() is very inefficient, lots of memory allocation and moving
text around. The 'diffexpr' was really intended for using an external
program, that's why it uses files. When using some internal diff
implementation going through files is a weird detour. Thus this example
is not relevant.

> No need to modify the result. Also to generate SES:
>
> ```
> function! SES(l1, l2)
> let l1 = join(a:l1, "\n") . "\n"
> let l2 = join(a:l2, "\n") . "\n"
> let vd = v:lua.vim.diff(l1, l2, {'result_type': 'indices'})
> let ses = ''
> let p1 = 1
> for [s1, c1, s2, c2] in vd + [[len(a:l1), 0, 0, 0]]
> if c1 == 0 | let s1 += 1 | endif
> let ses .= repeat('=', s1 - p1) . repeat('-', c1) . repeat('+', c2)
> let p1 = s1 + c1
> endfor
> return ses
> endfunction
> ```
> It is simple enough to make use of it.

What is SES?

--
Giving a commit hash to refer to a patch is like giving longitude and
lattitude to refer to a city.

/// Bram Moolenaar -- ***@***.*** -- http://www.Moolenaar.net \\\

/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///


Bram Moolenaar

May 18, 2023, 5:24:07 AM

to vim...@googlegroups.com, Yegappan Lakshmanan, vim-dev ML

[resend, picky postmaster rejected the message]

> > There are several alternatives to represent the same information. E.g.,
> > instead of using a "start index" and "end index" it could be "index" and
> > "item count". You would then have:
> > from_index
> > from_count
> > to_index
> > to_count
> >
> > That looks OK to me. Does this also work for the byte values? How
> > about:
> > from_byte
> > from_length
>
> The modification can end in a line different from the starting line. So the
> from_length field needs to count the number of bytes in all the lines between
> the starting line and the ending line (including the newline characters).
> Then it will be difficult to compute the column number of the ending change
> in the last line. We need to use the from-startbyte and from-endbyte fields.

Right, if the "from" spans more than one line then "from_length" isn't
what we want. Using "from-startbyte" and "from-endbyte" should be OK,
it's just that the names are a bit long. This implies we should also
have "to-startbyte" and "to-endbyte", right?

In case the change only covers one item, the length can easily be
computed from "endbyte - startbyte".

Anyway, I hope this avoids the nested Dicts without making it
complicated.

--
All good vision statements are created by groups of people with bloated
bladders who would rather be doing anything else.
(Scott Adams - The Dilbert principle)

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\

/// \\\
\\\ sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

Magnus Groß

May 31, 2023, 4:56:15 AM

to vim/vim, vim-dev ML, Comment

> What is SES?

It stands for shortest edit script: https://arxiv.org/abs/2208.08823

> I can modify the new diff() function to return a value similar to that
> returned by the
> Neovim vim.diff() function. i.e. either a string with the unified diff or
> a List with the indices.
> We can then add an additional option ("range" or "extended" or "position")
> to return
> the starting and ending position of the change in the old and the new text.
>
> Bram: What do you think about this approach?

I think adding an option to select the behaviour would be a good idea, as it allows for all use cases.
Even though Bram is right about the join() being inefficient as a diffexpr, the unified diff may still be useful for some other use cases.
But of course it's @brammool's call in which direction to go here.

Bram Moolenaar

May 31, 2023, 7:48:21 AM

to vim...@googlegroups.com, Magnus Groß

> > What is SES?
>
> It stands for shortest edit script: https://arxiv.org/abs/2208.08823
>
> > I can modify the new diff() function to return a value similar to that
> > returned by the
> > Neovim vim.diff() function. i.e. either a string with the unified diff or
> > a List with the indices.
> > We can then add an additional option ("range" or "extended" or "position")
> > to return
> > the starting and ending position of the change in the old and the new text.
> >
> > Bram: What do you think about this approach?
>
> I think adding an option to select the behaviour would be a good idea,
> as it allows for all usecases.

> Even though Bram is right about the `join()` being inefficient as a `diffexpr`, the unified diff may still be useful for some other usecases.

> But of course it's @brammool call in which direction to go here.

As mentioned before, the proposed return value includes column numbers,
it has more information than a unified diff. It should not be difficult
to turn the returned information into a unified diff if really needed.
Although I doubt it would be useful in more than a few cases.

It's easy to think of all kinds of options to add to return the diff in
various forms. But do we really need it? I would rather just provide the
basic functionality, what would be slow to do in Vim script, and return
the result in a form that is easy to use for several purposes.

--
From "know your smileys":
8-O "Omigod!!" (done "rm -rf *" ?)

Christian Brabandt

Jan 23, 2024, 5:32:41 PM

to vim/vim, vim-dev ML, Comment

@yegappan I think this is a good idea:

> I can modify the new diff() function to return a value similar to that
> returned by the Neovim vim.diff() function. i.e. either a string with the unified diff or
> a List with the indices.
> We can then add an additional option ("range" or "extended" or "position")
> to return the starting and ending position of the change in the old and the new text.

Can you make this change please?

Yegappan Lakshmanan

Jan 24, 2024, 10:57:19 AM

to vim/vim, vim-dev ML, Comment

> @yegappan I think this is a good idea:
>
> > I can modify the new diff() function to return a value similar to that
> > returned by the Neovim vim.diff() function. i.e. either a string with the unified diff or
> > a List with the indices.
> > We can then add an additional option ("range" or "extended" or "position")
> > to return the starting and ending position of the change in the old and the new text.
>
> Can you make this change please?

@chrisbra I will work on this in a few days.

Yegappan Lakshmanan

Jan 31, 2024, 9:07:19 PM

to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

fe6bc5f Support returning unified diff output

Yegappan Lakshmanan

Jan 31, 2024, 9:17:34 PM

to vim/vim, vim-dev ML, Push

@yegappan pushed 1 commit.

22025c5 Fix build error

Yegappan Lakshmanan

Jan 31, 2024, 9:50:55 PM

to vim/vim, vim-dev ML, Comment

> @yegappan I think this is a good idea:
>
> > I can modify the new diff() function to return a value similar to that
> > returned by the Neovim vim.diff() function. i.e. either a string with the unified diff or
> > a List with the indices.
> > We can then add an additional option ("range" or "extended" or "position")
> > to return the starting and ending position of the change in the old and the new text.
>
> Can you make this change please?

I have updated the PR to return either the unified diff or the indices. In a later PR, I will
add the support for returning the range information which is needed for LSP.

Christian Brabandt

Feb 1, 2024, 3:59:05 PM

to vim/vim, vim-dev ML, Comment

Thanks @yegappan

Christian Brabandt

Feb 1, 2024, 4:21:03 PM

to vim/vim, vim-dev ML, Comment

Closed #12321 via fa37835.

errael

Feb 2, 2024, 11:29:51 AM

to vim/vim, vim-dev ML, Comment

@yegappan Somewhat off topic...

I work with a port of a 3 way merge tool. I'm wondering if having the kind of information provided by diff() might be useful for adding features. The tool has up to 4 buffers involved in a diff. This new diff() is the closest I've seen for getting info about diffs (but I could easily have missed something). I'm wondering how hard it would be to get this info for active diff buffers.

This is not a feature request; I'm curious if diff()s implementation would be a good guide for getting the data on active diff buffers if it turns out to be useful for the merge tool.

Yegappan Lakshmanan

Feb 2, 2024, 3:48:49 PM

to vim/vim, vim-dev ML, Comment

> @yegappan Somewhat off topic...
>
> I work with a port of a 3 way merge tool. I'm wondering if having the kind of information provided by diff() might be useful for adding features. The tool has up to 4 buffers involved in a diff. This new diff() is the closest I've seen for getting info about diffs (but I could easily have missed something). I'm wondering how hard it would be to get this info for active diff buffers.
>
> This is not a feature request; I'm curious if diff()s implementation would be a good guide for getting the data on active diff buffers if it turns out to be useful for the merge tool.

@errael What sort of information are you looking for in an active diff buffer? Which lines are added, removed and modified in a diff buffer?

rickhowe

Feb 3, 2024, 1:54:17 AM

to vim/vim, vim-dev ML, Comment

Hi,

Thank you for providing diff(). I checked how it works while comparing with nvim's vim.diff() as follows.

diff(['a'], ['x', 'a'])
unified: @@ -0,0 +1 @@
indices: {0, 0, 0, 1} *
nvim:    [0, 0, 1, 1]

diff(['a'], ['a', 'x'])
unified: @@ -1,0 +2 @@
indices: {1, 0, 1, 1} *
nvim:    [1, 0, 2, 1]

diff(['x', 'a'], ['a'])
unified: @@ -1 +0,0 @@
indices: {0, 1, 0, 0} *
nvim:    [1, 1, 0, 0]

diff(['a', 'x'], ['a'])
unified: @@ -2 +1,0 @@
indices: {1, 1, 1, 0} *
nvim:    [2, 1, 1, 0]

*: {from_idx, from_count, to_idx, to_count}

I am a bit confused about idx. In unified, idx 0 always means '^'. But in indices, idx 0 means '^' if count = 0, otherwise the first index of the hunk. In nvim, vim.diff() returns indices that are equivalent to the values in the unified output.

Is it difficult to follow vim.diff() as indices to make it simple and consistent between vim and nvim?

Yegappan Lakshmanan

Feb 3, 2024, 10:52:12 AM

to vim...@googlegroups.com, reply+ACY5DGAUUDZ2NPTLFP...@reply.github.com, vim/vim, vim-dev ML, Comment

Hi,

On Fri, Feb 2, 2024 at 10:54 PM rickhowe <vim-dev...@256bit.org> wrote:

> Hi,
>
> Thank you for providing diff(). I checked how it works while comparing with nvim's vim.diff() as follows.

Note that nvim's vim.diff() is a Lua function whereas the newly introduced diff() is a
Vim builtin function. Lua uses 1-based indexing whereas Vimscript uses 0-based
indexing. So the indices returned by the diff() function are 0-based whereas the
indices returned by the vim.diff() Lua function are 1-based.

An interpretation of the indices for the below examples is inline.

> diff(['a'], ['x', 'a'])
> unified: @@ -0,0 +1 @@
> indices: {0, 0, 0, 1} *
> nvim:    [0, 0, 1, 1]

One string at index 0 in {list2} is inserted before the string at index 0 in {list1}.

> diff(['a'], ['a', 'x'])
> unified: @@ -1,0 +2 @@
> indices: {1, 0, 1, 1} *
> nvim:    [1, 0, 2, 1]

One string at index 1 in {list2} is inserted before the string at index 1 in {list1}.

> diff(['x', 'a'], ['a'])
> unified: @@ -1 +0,0 @@
> indices: {0, 1, 0, 0} *
> nvim:    [1, 1, 0, 0]

One string at index 0 in {list1} is removed from {list2} at index 0.

> diff(['a', 'x'], ['a'])
> unified: @@ -2 +1,0 @@
> indices: {1, 1, 1, 0} *
> nvim:    [2, 1, 1, 0]

One string at index 1 in {list1} is removed from {list2} at index 1.

> *: {from_idx, from_count, to_idx, to_count}

When the from_count is zero, the strings from to_idx to (to_idx + to_count) in {list2} are inserted at from_idx in {list1}.

When the to_count is zero, the strings from from_idx to (from_idx + from_count) are removed from {list2} at to_idx.

> I am a bit confused about idx. In unified, idx 0 always means '^'. But in indices, idx 0 means '^' if count = 0, otherwise the first index of the hunk. In nvim, vim.diff() returns indices that are equivalent to the values in the unified output.

We can add the "idx 0 means '^' if count is 0, otherwise it is the first index of the hunk" note to
the help text.

> Is it difficult to follow vim.diff() as indices to make it simple and consistent between vim and nvim?

Yes. The Vimscript diff() function should use 0-based indexing. This helps the caller in
directly using the returned index to access the lists.
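
As an illustration of using the returned indices directly, here is a minimal Vim script sketch. It assumes a Vim build that already includes diff() with the 'indices' output and the from_idx/from_count/to_idx/to_count keys shown in this thread. HunkItems() is a hypothetical helper name, not part of Vim.

```vim
" Sketch: collect the removed and added items for every hunk that diff()
" reports with {'output': 'indices'}. HunkItems() is a hypothetical helper.
function! HunkItems(list1, list2) abort
  let result = []
  for h in diff(a:list1, a:list2, {'output': 'indices'})
    " The indices are 0-based, so they can be used as List slice bounds
    " without any adjustment. A zero count means nothing on that side.
    let removed = h.from_count > 0 ? a:list1[h.from_idx : h.from_idx + h.from_count - 1] : []
    let added = h.to_count > 0 ? a:list2[h.to_idx : h.to_idx + h.to_count - 1] : []
    call add(result, {'removed': removed, 'added': added})
  endfor
  return result
endfunction
```

For the first example above, HunkItems(['a'], ['x', 'a']) should give a single hunk with an empty 'removed' list and ['x'] as the 'added' list.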

Regards,

Yegappan

errael

Feb 3, 2024, 12:53:59 PM

to vim/vim, vim-dev ML, Comment

> > @yegappan Somewhat off topic...
> > I work with a port of a 3 way merge tool. I'm wondering if having the kind of information provided by diff() might be useful for adding features. The tool has up to 4 buffers involved in a diff. This new diff() is the closest I've seen for getting info about diffs (but I could easily have missed something). I'm wondering how hard it would be to get this info for active diff buffers.
> > This is not a feature request; I'm curious if diff()s implementation would be a good guide for getting the data on active diff buffers if it turns out to be useful for the merge tool.
>
> @errael What sort of information are you looking for in an active diff buffer? Which lines are added, removed and modified in a diff buffer?

I'm only beginning to explore outstanding issues on the original merge tool. So I don't really know what information I might need. When I saw this PR/issue it seemed worth exploring. The merge tool mostly uses vim builtin commands, like :diffget and :diffput.

There's a request for a command that merges both sides into the result; there have been times I wanted that myself. Seems it might be tricky. Maybe a popup that shows some options and lets the user pick the one that's closest to what they want; the user will probably still need to do some editing to get the right result. I'm wondering if having the specific diff information would help. I'm hoping I can determine the text from both sides (I mean the text that would be used by vim's :diffput/:diffget) and then I need to come up with some algorithms to provide options on how to combine both sides.

rickhowe

Feb 4, 2024, 6:23:04 AM

to vim/vim, vim-dev ML, Comment

> When the from_count is zero, the strings from to_idx to (to_idx + to_count) in {list2} are inserted at from_idx in {list1}. When the to_count is zero, the strings from from_idx to (from_idx + from_count) are removed from {list2} at to_idx.

Let me make sure the relation between unified and indices.

unified: @@ -from_line,from_count +to_line,to_count @@
indices: {from_idx, from_count, to_idx, to_count}

for both from_ and to_:

if 0 < count
  line = idx + 1
else
  line = idx
endif

Is that correct?

Yegappan Lakshmanan

Feb 4, 2024, 12:03:39 PM

to vim/vim, vim-dev ML, Comment

> > When the from_count is zero, the strings from to_idx to (to_idx + to_count) in {list2} are inserted at from_idx in {list1}. When the to_count is zero, the strings from from_idx to (from_idx + from_count) are removed from {list2} at to_idx.
>
> Let me make sure the relation between unified and indices.
> unified: @@ -from_line,from_count +to_line,to_count @@
> indices: {from_idx, from_count, to_idx, to_count}

The unified diff uses line numbers which start at 1 whereas indices returns
List indexes which start at 0.

> for both from_ and to_:
>
> if 0 < count
>   line = idx + 1
> else
>   line = idx
> endif
>
> Is that correct?

No. In both the cases, the line number is idx + 1.

Some examples below:

  :echo diff(['1', '2'], ['0', '1', '2'], {'output': 'indices'})
   [{'from_count': 0, 'to_idx': 0, 'to_count': 1, 'from_idx': 0}]

  :echo diff(['1', '2'], ['1', '2', '3'], {'output': 'indices'})
   [{'from_count': 0, 'to_idx': 2, 'to_count': 1, 'from_idx': 2}]

In both the examples, the from_count is 0 and the to_count is 1. In the first example, the string at index 0 (to_idx) in the second List is inserted at index 0 (from_idx) in List1. In the second example, the string at index 2 (to_idx) in the second List is inserted at index 2 (from_idx) in List1.

rickhowe

Feb 4, 2024, 9:42:18 PM

to vim/vim, vim-dev ML, Comment

I checked again on your examples about how unified relates to indices.

  :echo diff(['1', '2'], ['0', '1', '2'], {'output': 'indices'})
   [{'from_count': 0, 'to_idx': 0, 'to_count': 1, 'from_idx': 0}]

unified: @@ -0,0 +1 @@

from_line (0) = from_idx (0) (where from_count = 0)
to_line (1) = to_idx (0) + 1 (where 0 < to_count)

  :echo diff(['1', '2'], ['1', '2', '3'], {'output': 'indices'})
   [{'from_count': 0, 'to_idx': 2, 'to_count': 1, 'from_idx': 2}]

unified: @@ -2,0 +3 @@

from_line (2) = from_idx (2) (where from_count = 0)
to_line (3) = to_idx (2) + 1 (where 0 < to_count)

> No. In both the cases, the line number is idx + 1.

The line number is not always idx + 1 but is idx if count = 0, right?
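
The relation described here can be written down as a small Vim script sketch. It is based only on the four comparison examples earlier in this thread and on the two examples above, and it assumes a unified diff with a context length of 0; it is not taken from the Vim source. HunkRange() and HunkHeader() are hypothetical helper names.

```vim
" Sketch: rebuild the "@@ -l,c +l,c @@" header of a zero-context unified
" diff from one hunk of the 'indices' output. Both functions are
" hypothetical helpers, derived from the examples in this thread.
function! HunkRange(idx, n) abort
  " With a non-zero count the line number is idx + 1. With a zero count
  " the examples show the unified output using idx itself.
  let lnum = a:n > 0 ? a:idx + 1 : a:idx
  " The unified format omits the count when it is exactly 1.
  return a:n == 1 ? string(lnum) : lnum .. ',' .. a:n
endfunction

function! HunkHeader(h) abort
  let from = HunkRange(a:h.from_idx, a:h.from_count)
  let to = HunkRange(a:h.to_idx, a:h.to_count)
  return '@@ -' .. from .. ' +' .. to .. ' @@'
endfunction
```

With the hunk {'from_idx': 2, 'from_count': 0, 'to_idx': 2, 'to_count': 1} from the second example, HunkHeader() returns "@@ -2,0 +3 @@", which matches the unified output quoted above.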

rickhowe

Feb 7, 2024, 7:02:15 AM

to vim/vim, vim-dev ML, Comment

This is a different question. It looks like diff() calls a function which is set in &diffexpr. &diffopt says that internal is ignored when &diffexpr is set. It is not necessary for diff() to work along with those options, right?

Yegappan Lakshmanan

Feb 7, 2024, 11:16:26 AM

to vim/vim, vim-dev ML, Comment

> This is a different question. It looks like diff() calls a function which is set in &diffexpr. &diffopt says that internal is ignored when &diffexpr is set. It is not necessary for diff() to work along with those options, right?

This is a bug. Can you open a separate issue for this? I will open a PR to fix this.

errael

Feb 7, 2024, 11:54:02 AM

to vim/vim, vim-dev ML, Comment

Excuse my ignorance of the details of this, but a few days ago, in #12321 (comment)

There's

> I checked again on your examples about how unified relates to indices.
>
> [snip]
>
> The line number is not always idx + 1 but is idx if count = 0, right?

Does this need to be addressed/documented/???

Yegappan Lakshmanan

Feb 7, 2024, 1:24:22 PM

to vim/vim, vim-dev ML, Comment

> I checked again on your examples about how unified relates to indices.
>
>   :echo diff(['1', '2'], ['0', '1', '2'], {'output': 'indices'})
>    [{'from_count': 0, 'to_idx': 0, 'to_count': 1, 'from_idx': 0}]
>
> diff() returns as unified: @@ -0,0 +1 @@ then,
> from_line (0) = from_idx (0) (where from_count = 0)
> to_line (1) = to_idx (0) + 1 (where 0 < to_count)

The values returned for the 'indices' option always refer to the item indices in
List1 and List2. So the corresponding line number is idx + 1.

The unified diff output returns line numbers. But there is a special case when
a change is made to the very first line. In this case it returns 0.

What is the use case for comparing the unified diff output and the indices?

>   :echo diff(['1', '2'], ['1', '2', '3'], {'output': 'indices'})
>    [{'from_count': 0, 'to_idx': 2, 'to_count': 1, 'from_idx': 2}]
>
> diff() returns as unified: @@ -2,0 +3 @@ then,
> from_line (2) = from_idx (2) (where from_count = 0)
> to_line (3) = to_idx (2) + 1 (where 0 < to_count)
>
> > No. In both the cases, the line number is idx + 1.
>
> The line number is not always idx + 1 but is idx if count = 0, right?

Yegappan Lakshmanan

Feb 7, 2024, 1:24:37 PM

to vim/vim, vim-dev ML, Comment

> Excuse my ignorance of the details of this, but a few days ago, in #12321 (comment)
>
> There's
>
> > I checked again on your examples about how unified relates to indices.
> > [snip]
> >
> > The line number is not always idx + 1 but is idx if count = 0, right?
>
> Does this need to be addressed/documented/???

This needs a documentation update.

rickhowe

Feb 8, 2024, 9:17:54 AM

to vim/vim, vim-dev ML, Comment

> The unified diff output returns line numbers. But there is a special case when a change is made to the very first line. In this case it returns 0.

Yes, that is why I am asking. Let me make sure that the line number is not always idx + 1 but is idx if count = 0.

> What is the use case for comparing the unified diff output and the indices?

My plugin will use a builtin diff function in vim and nvim and handle both outputs. I need to understand the exact difference between them. And, for example in &diffexpr, it may be possible that a function reads v:fname_in and v:fname_new files, handles and refines those diff as indices, and writes to v:fname_out file as unified.

Yegappan Lakshmanan

Feb 10, 2024, 11:51:05 AM

to vim/vim, vim-dev ML, Comment

> > The unified diff output returns line numbers. But there is a special case when a change is made to the very first line. In this case it returns 0.
>
> Yes, that is why I am asking. Let me make sure that the line number is not always idx + 1 but is idx if count = 0.

I have created PR #14010 to add support for specifying the optional
unified diff context length. If you specify the 'context' length as 1 or above, then you will get 1 as the
line number in these cases.

rickhowe

Feb 11, 2024, 8:55:20 PM

to vim/vim, vim-dev ML, Comment

> > Yes, that is why I am asking. Let me make sure that the line number is not always idx + 1 but is idx if count = 0.
>
> I have created PR #14010 to add support for specifying the optional unified diff context length. If you specify the 'context' length as 1 or above, then you will get 1 as the line number in these cases.

I am confused again. I do not know how useful the context length is in vim script. I just want to make sure the relation between unified and indices.

Is that same as ctxlen in vim.diff() on nvim?

Yegappan Lakshmanan

Feb 11, 2024, 9:12:42 PM

to vim/vim, vim-dev ML, Comment

> > > Yes, that is why I am asking. Let me make sure that the line number is not always idx + 1 but is idx if count = 0.
> >
> > I have created PR #14010 to add support for specifying the optional unified diff context length. If you specify the 'context' length as 1 or above, then you will get 1 as the line number in these cases.
>
> I am confused again. I do not know how useful the context length is in vim script. I just want to make sure the relation between unified and indices.
>
> Is that same as ctxlen in vim.diff() on nvim? If so, I will not use it (use 0 as default).

Yes. It is same as the ctxlen item in nvim.

rickhowe

Feb 11, 2024, 9:25:12 PM

to vim/vim, vim-dev ML, Comment

> > Is that same as ctxlen in vim.diff() on nvim? If so, I will not use it (use 0 as default).
>
> Yes. It is same as the ctxlen item in nvim.

OK but why its default is 1, which is different from vim.diff()?

rickhowe

Feb 11, 2024, 9:43:44 PM

to vim/vim, vim-dev ML, Comment

Another topic:
If you update the diff.txt help file, it is time to drop the external "diff" command from the example. For example, in the diff-diffexpr section, how about introducing your diff() like this:

Example (this does almost the same as 'diffexpr' being empty): >

set diffexpr=MyDiff()
function MyDiff()
  let in = readfile(v:fname_in)
  let new = readfile(v:fname_new)
  let out = diff(in, new, {'icase': &diffopt =~ "icase", 'iwhite': &diffopt =~ "iwhite"})
  call writefile(split(out, "\n"), v:fname_out)
endfunction

Yegappan Lakshmanan

Feb 12, 2024, 12:33:49 AM

to vim/vim, vim-dev ML, Comment

> > > Is that same as ctxlen in vim.diff() on nvim? If so, I will not use it (use 0 as default).
> >
> > Yes. It is same as the ctxlen item in nvim.
>
> OK but why its default is 1, which is different from vim.diff()?

When the context length is 0, diff optimizes the output and the line number in a diff hunk can
be 0 in a few cases. When the context length is 1, then the number of cases where the line
number is 0 is reduced and this will help in avoiding confusion with the line number.

Note that the Linux "diff" command uses a default value of 3 for the context length.

Yegappan Lakshmanan

Feb 12, 2024, 12:36:11 AM

to vim/vim, vim-dev ML, Comment

> Another topic: If you update the diff.txt help file, it is time to drop the external "diff" command from the example. For example, in the diff-diffexpr section, how about introducing your diff() like this:
> Example (this does almost the same as 'diffexpr' being empty): >
>
> set diffexpr=MyDiff()
> function MyDiff()
>   let in = readfile(v:fname_in)
>   let new = readfile(v:fname_new)
>   let out = diff(in, new, {'icase': &diffopt =~ "icase", 'iwhite': &diffopt =~ "iwhite"})
>   call writefile(split(out, "\n"), v:fname_out)
> endfunction

Can you create a PR with this example added to diff.txt (in addition to the existing example
for using an external command)?

rickhowe

Feb 12, 2024, 1:12:41 AM

to vim/vim, vim-dev ML, Comment

> > OK but why its default is 1, which is different from vim.diff()?
>
> When the context length is 0, diff optimizes the output and the line number in a diff hunk can be 0 in a few cases. When the context length is 1, then the number of cases where the line number is 0 is reduced and this will help in avoiding confusion with the line number.

No. Line number 0 is not a special case and there is no confusion in the existing diff. And I need diff() to optimize the result by default. What confuses me is the relation between unified and 0-based indices.

> Note that the Linux "diff" command uses a default value of 3 for the context length.

Yes, because it is a command and a user can directly see the result. We are talking about the diff() function.

I found that MyDiff() using diff() does not work, because vim does not accept context:1 unified format. diff-diffexpr section says:

For a unified diff no context lines can be used.
Using "diff -u" will NOT work, use "diff -U0".

Do we need to always specify context:0 to diff()?

Yegappan Lakshmanan

Feb 12, 2024, 12:11:32 PM

to vim/vim, vim-dev ML, Comment

> > > OK but why its default is 1, which is different from vim.diff()?
> >
> > When the context length is 0, diff optimizes the output and the line number in a diff hunk can be 0 in a few cases. When the context length is 1, then the number of cases where the line number is 0 is reduced and this will help in avoiding confusion with the line number.
>
> No. Line number 0 is not a special case and there is no confusion in the existing diff. And I need diff() to optimize the result by default. What confuses me is the relation between unified and 0-based indices.
>
> > Note that the Linux "diff" command uses a default value of 3 for the context length.
>
> Yes, because it is a command and a user can directly see the result. We are talking about the diff() function.
>
> I found that MyDiff() using diff() does not work, because vim does not accept context:1 unified format. diff-diffexpr section says:
>
>     For a unified diff no context lines can be used.
>     Using "diff -u" will NOT work, use "diff -U0".
>
> Do we need to always specify context:0 to diff()?

Good point. I missed this note in the documentation about the unified diff expecting context length of 0.
We can use 0 as the default context length for the diff() function.
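
To make the outcome of this exchange concrete, here is a hedged sketch of the 'diffexpr' example with the context length spelled out. It assumes a Vim version in which diff() accepts the 'context' option proposed in PR #14010. Passing 'context': 0 explicitly means the result does not depend on which default ends up being used. The function name MyDiff follows the example earlier in the thread.

```vim
" Sketch of a 'diffexpr' function built on diff(). Assumes diff() supports
" the 'output', 'context', 'icase' and 'iwhite' options.
function! MyDiff() abort
  let opts = split(&diffopt, ',')
  let out = diff(readfile(v:fname_in), readfile(v:fname_new), {
        \ 'output': 'unified',
        \ 'context': 0,
        \ 'icase': index(opts, 'icase') >= 0,
        \ 'iwhite': index(opts, 'iwhite') >= 0,
        \ })
  " 'diffexpr' only accepts a unified diff without context lines, hence
  " the explicit 'context': 0 above.
  call writefile(split(out, "\n"), v:fname_out)
endfunction
set diffexpr=MyDiff()
```

Splitting 'diffopt' on commas and looking up exact items avoids treating "iwhiteall" or "iwhiteeol" as "iwhite", which a plain pattern match on the option string would do.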