[vim/vim] Make charidx() and utf16idx() return value consistent with byteidx() return value (PR #12503)

Yegappan Lakshmanan

unread,

Jun 7, 2023, 9:44:16 PM6/7/23

to vim/vim, Subscribed

The byteidx() and byteidxcomp() functions return the length of a string in bytes when the specified character
index is equal to the number of characters in the string. But the charidx() function returns -1 when the specified
byte index is equal to the number of bytes in the string. Same behavior applies to the utf16idx() function also.
Make the return value of these two functions consistent with the byteidx() function.

You can view, comment on, or merge this pull request online at:

https://github.com/vim/vim/pull/12503

Commit Summary

231add9 Make charidx() and utf16idx() return value consistent with byteidx() return value

File Changes

(4 files)

Patch Links:

—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.

Yegappan Lakshmanan

unread,

Jun 7, 2023, 10:30:02 PM6/7/23

to vim/vim, Push

@yegappan pushed 1 commit.

bc28764 Fix test failure

—
View it on GitHub or unsubscribe.
You are receiving this because you are subscribed to this thread.

codecov[bot]

unread,

Jun 7, 2023, 10:40:33 PM6/7/23

to vim/vim, Subscribed

Codecov Report

Merging #12503 (bc28764) into master (5bf0428) will increase coverage by 0.65%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #12503      +/-   ##
==========================================
+ Coverage   82.09%   82.74%   +0.65%     
==========================================
  Files         160      150      -10     
  Lines      193606   180421   -13185     
  Branches    43469    40547    -2922     
==========================================
- Hits       158933   149288    -9645     
+ Misses      21834    18186    -3648     
- Partials    12839    12947     +108

Flag	Coverage Δ
huge-clang-none	`82.74% <100.00%> (-0.01%)`	⬇️
linux	`82.74% <100.00%> (-0.01%)`	⬇️
mingw-x64-HUGE	`?`
mingw-x86-HUGE	`?`
windows	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/evalfunc.c	`88.97% <ø> (-1.30%)`	⬇️
src/strings.c	`92.94% <100.00%> (-0.46%)`	⬇️

... and 140 files with indirect coverage changes

—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.

Bram Moolenaar

unread,

Jun 8, 2023, 12:10:39 PM6/8/23

to vim/vim, Subscribed

Closed #12503 via 577922b.

—
Reply to this email directly, view it on GitHub.

You are receiving this because you are subscribed to this thread.

Yee Cheng Chin

unread,

Jun 20, 2023, 5:04:38 PM6/20/23

to vim/vim, Subscribed

May I ask what the motivation is for such a change? This seems counter to any programming language libraries that I'm aware of. You will end up returning an "index" that's clearly out of bounds and seems to actively encourage unsafe programming where you now have to check for length or -1 to tell if the result is invalid. Maybe I'm missing something by it seems more like byteidx is the undesired behavior here.

—
Reply to this email directly, view it on GitHub.

You are receiving this because you are subscribed to this thread.

Yee Cheng Chin

unread,

Jun 20, 2023, 5:08:50 PM6/20/23

to vim/vim, Subscribed

I guess this is to make it so that you can do something like byteidx(charidx(…)) and vice versa and have it return itself? And you are essentially using charidx double duty as a fictional strcharlen() function (since strlen() returns bytes and we don't have a function to return number of Unicode characters in a string)?

—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.

Yegappan Lakshmanan

unread,

Jun 21, 2023, 10:04:23 AM6/21/23

to vim...@googlegroups.com, reply+ACY5DGEZBXA22NEM4K...@reply.github.com, vim/vim, Subscribed

Hi,

On Tue, Jun 20, 2023 at 2:04 PM Yee Cheng Chin <vim-dev...@256bit.org> wrote:

May I ask what the motivation is for such a change? This seems counter to any programming language libraries that I'm aware of. You will end up returning an "index" that's clearly out of bounds and seems to actively encourage unsafe programming where you now have to check for length or -1 to tell if the result is invalid. Maybe I'm missing something by it seems more like byteidx is the undesired behavior here.

A LSP server can return a character position which is at the end of a line (equal to the number

of characters in the line). This happens, for example, when doing insert-mode completion at the

end of a line. In this case, if charidx() is used to translate the returned LSP position to a character

position, it returns -1. So the plugin needs to check for this case and then use strchars() and use

the return value as the character position:

https://github.com/yegappan/lsp/blob/main/autoload/lsp/offset.vim#L55

With this PR, the charidx() function returns the number of characters in the line if the supplied index

is equal to the string length and the plugin can directly use the return value avoid an additional

call to strchars().

Regards,

Yegappan

vim-dev ML

unread,

Jun 21, 2023, 10:04:39 AM6/21/23

to vim/vim, vim-dev ML, Your activity

Hi,

On Tue, Jun 20, 2023 at 2:04 PM Yee Cheng Chin ***@***.***>

wrote:

> May I ask what the motivation is for such a change? This seems counter to
> any programming language libraries that I'm aware of. You will end up
> returning an "index" that's clearly out of bounds and seems to actively

> encourage unsafe programming where you now have to check for length *or*

> -1 to tell if the result is invalid. Maybe I'm missing something by it
> seems more like byteidx is the undesired behavior here.
>
>
> A LSP server can return a character position which is at the end of a line

(equal to the number

of characters in the line). This happens, for example, when doing
insert-mode completion at the
end of a line. In this case, if charidx() is used to translate the
returned LSP position to a character
position, it returns -1. So the plugin needs to check for this case and
then use strchars() and use
the return value as the character position:

https://github.com/yegappan/lsp/blob/main/autoload/lsp/offset.vim#L55

With this PR, the charidx() function returns the number of characters in
the line if the supplied index

is equal to the string length and the plugin can directly use the return
value avoid an additional
call to strchars().

Regards,
Yegappan

—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.

Yegappan Lakshmanan

unread,

Jun 21, 2023, 10:07:57 AM6/21/23

to vim...@googlegroups.com, reply+ACY5DGC2O4DIYRRDDY...@reply.github.com, vim/vim, Subscribed

Hi,

On Tue, Jun 20, 2023 at 2:08 PM Yee Cheng Chin <vim-dev...@256bit.org> wrote:

I guess this is to make it so that you can do something like byteidx(charidx(…)) and vice versa and have it return itself? And you are essentially using charidx double duty as a fictional strcharlen() function (since strlen() returns bytes and we don't have a function to return number of Unicode characters in a string)?

A LSP server can encode a position offset using UTF-8 or UTF-16 or UTF-32 encoding as described here:

https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments.

A Vim LSP plugin needs to decode and encode the character position in a file to the corresponding

encoding supported by the LSP server. For example, see the functions in the following file:

https://github.com/yegappan/lsp/blob/main/autoload/lsp/offset.vim

This PR helps in simplifying some of the code in the above functions.

Regards,
Yegappan

vim-dev ML

unread,

Jun 21, 2023, 10:08:13 AM6/21/23

to vim/vim, vim-dev ML, Your activity

Hi,

On Tue, Jun 20, 2023 at 2:08 PM Yee Cheng Chin ***@***.***>

wrote:

> I guess this is to make it so that you can do something like
> byteidx(charidx(…)) and vice versa and have it return itself? And you are
> essentially using charidx double duty as a fictional strcharlen()
> function (since strlen() returns bytes and we don't have a function to
> return number of Unicode characters in a string)?
>
>
>
A LSP server can encode a position offset using UTF-8 or UTF-16 or UTF-32
encoding as described here:
https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments

..

A Vim LSP plugin needs to decode and encode the character position in a
file to the corresponding
encoding supported by the LSP server. For example, see the functions in
the following file:

https://github.com/yegappan/lsp/blob/main/autoload/lsp/offset.vim

This PR helps in simplifying some of the code in the above functions.

Regards,
Yegappan

—
Reply to this email directly, view it on GitHub, or unsubscribe.
You are receiving this because you are subscribed to this thread.

Reply all

Reply to author

Forward