[vim/vim] Refactor the UTF-8 hot paths in src/mbyte.c so codepoint decoding and… (PR #19649)

mattn

12:20 AM (13 hours ago)

to vim/vim, Subscribed

Refactor the UTF-8 hot paths in src/mbyte.c so codepoint decoding and byte-length validation are done once and reused by callers. This removes repeated utf_ptr2char()/utf_ptr2len() work in utf_ptr2cells(), utf_ptr2cells_len(), utfc_ptr2char(), utfc_ptr2char_len(), and the shared length helpers used by redraw and display-width code paths.

Benchmark: 8.31% faster in a redraw-heavy UTF-8 editing scenario with long lines full of Japanese text, emoji, combining characters, and full-width glyphs. The measured case used 4000 long UTF-8 lines with list, linebreak, number, and cursorline enabled, then repeatedly performed cursor moves, horizontal scrolling, virtcol('$'), and redraw! calls.

You can view, comment on, or merge this pull request online at:

https://github.com/vim/vim/pull/19649

Commit Summary

cb752ea Refactor the UTF-8 hot paths in src/mbyte.c so codepoint decoding and byte-length validation are done once and reused by callers. This removes repeated utf_ptr2char()/utf_ptr2len() work in utf_ptr2cells(), utf_ptr2cells_len(), utfc_ptr2char(), utfc_ptr2char_len(), and the shared length helpers used by redraw and display-width code paths.

File Changes

(1 file)

M src/mbyte.c (254)

—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.

mattn

1:01 AM (12 hours ago)

to vim/vim, Push

@mattn pushed 1 commit.

268537f perf/opt-mbyte-utf8-decode

mattn

2:04 AM (11 hours ago)

to vim/vim, Push

@mattn pushed 1 commit.

ff922f8 fix: validate continuation bytes in utf_ptr2char_and_len_len() for truncated sequences

mattn

2:14 AM (11 hours ago)

to vim/vim, Push

@mattn pushed 1 commit.

e9e0903 fix: suppress unused parameter warning for c1 in utf_iscomposinglike_char()

mattn

2:32 AM (11 hours ago)

to vim/vim, Subscribed

mattn left a comment (vim/vim#19649)

Many call sites use multiple utf_ptr2xxx() helpers on the same bytes, for example to get both the decoded character and its length. That means we end up walking the same UTF-8 sequence more than once and doing redundant work in hot paths.

If we provide a way to get both values in a single pass, ASCII might become a little slower in some places due to the extra plumbing, but multibyte text should consistently benefit from it.
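
To illustrate the single-pass idea, here is a minimal sketch of a decoder that returns the character and its byte length together. It is not the code from this PR: the names utf8_decoded_T and utf8_decode_once() are hypothetical, and the sketch is deliberately simplified. It handles only 1- to 4-byte sequences, does not reject overlong encodings or surrogates, and assumes that an invalid or truncated sequence should be reported as the raw first byte with length 1.

```c
#include <stddef.h>

/* Hypothetical result type for illustration; not part of Vim's mbyte.c. */
typedef struct {
    int c;    /* decoded codepoint, or the raw first byte on invalid input */
    int len;  /* bytes consumed: 1..4, or 0 when size is 0 */
} utf8_decoded_T;

/*
 * Decode one UTF-8 sequence from "p", reading at most "size" bytes.
 * The lead byte is classified once; the same loop both validates the
 * continuation bytes and accumulates the codepoint.
 */
static utf8_decoded_T utf8_decode_once(const unsigned char *p, size_t size)
{
    utf8_decoded_T r = { 0, 0 };
    int need;
    int c;

    if (size == 0)
        return r;

    /* Default result: treat the first byte as a single-byte character. */
    r.c = p[0];
    r.len = 1;

    if (p[0] < 0x80)
        return r;                       /* ASCII fast path */

    if ((p[0] & 0xe0) == 0xc0) {
        need = 2; c = p[0] & 0x1f;
    } else if ((p[0] & 0xf0) == 0xe0) {
        need = 3; c = p[0] & 0x0f;
    } else if ((p[0] & 0xf8) == 0xf0) {
        need = 4; c = p[0] & 0x07;
    } else {
        return r;                       /* stray continuation or invalid lead byte */
    }

    if ((size_t)need > size)
        return r;                       /* sequence truncated by the buffer end */

    for (int i = 1; i < need; i++) {
        if ((p[i] & 0xc0) != 0x80)
            return r;                   /* not a continuation byte */
        c = (c << 6) | (p[i] & 0x3f);
    }

    r.c = c;
    r.len = need;
    return r;
}
```

A caller that needs both values can make one call and read .c and .len from the result, rather than walking the same bytes once for the character and again for the length. ASCII input pays only for the extra struct return, which matches the trade-off described above.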