[vim/vim] Refactor the UTF-8 hot paths in src/mbyte.c so codepoint decoding and… (PR #19649)


mattn

12:20 AM (13 hours ago)
to vim/vim, Subscribed

Refactor the UTF-8 hot paths in src/mbyte.c so codepoint decoding and byte-length validation are done once and reused by callers. This removes repeated utf_ptr2char()/utf_ptr2len() work in utf_ptr2cells(), utf_ptr2cells_len(), utfc_ptr2char(), utfc_ptr2char_len(), and the shared length helpers used by redraw and display-width code paths.
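As a rough illustration of "decode once, reuse in callers", here is a minimal sketch. It is not the code from this PR: `decoded_T`, `decode_once()`, `toy_cells()` and `str_cells()` are hypothetical names, the width table is a toy stand-in for Vim's real tables, and the decoder skips overlong-form and surrogate checks.

```c
#include <stddef.h>

/* Hypothetical result type: codepoint and byte length from one decode. */
typedef struct {
    int c;      /* decoded codepoint, or the raw lead byte if invalid */
    int len;    /* number of bytes consumed (1 for invalid input) */
} decoded_T;

/* Decode one UTF-8 sequence from a NUL-terminated string in a single pass.
 * Simplified: no overlong or surrogate rejection. */
static decoded_T decode_once(const unsigned char *p)
{
    decoded_T r = { p[0], 1 };
    int need, i, c;

    if (p[0] < 0x80)
        return r;                               /* ASCII */
    if ((p[0] & 0xE0) == 0xC0)      { need = 1; c = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { need = 2; c = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { need = 3; c = p[0] & 0x07; }
    else
        return r;                               /* invalid lead byte */

    for (i = 1; i <= need; i++) {
        /* A NUL byte fails this test too, so we never read past the end. */
        if ((p[i] & 0xC0) != 0x80)
            return r;
        c = (c << 6) | (p[i] & 0x3F);
    }
    r.c = c;
    r.len = need + 1;
    return r;
}

/* Toy width table (assumption): kana/CJK and full-width forms take 2 cells. */
static int toy_cells(int c)
{
    if ((c >= 0x3040 && c <= 0x9FFF) || (c >= 0xFF01 && c <= 0xFF60))
        return 2;
    return 1;
}

/* The caller decodes each character once and reuses both the codepoint
 * (for the width lookup) and the length (to advance the pointer). */
static int str_cells(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    int cells = 0;

    while (*p != 0) {
        decoded_T d = decode_once(p);
        cells += toy_cells(d.c);
        p += d.len;
    }
    return cells;
}
```

With separate "get char" and "get length" helpers, the loop in `str_cells()` would classify the lead byte and walk the continuation bytes twice per character; here it happens once.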

Benchmark: 8.31% faster in a redraw-heavy UTF-8 editing scenario with long lines full of Japanese text, emoji, combining characters, and full-width glyphs. The measured case used 4000 long UTF-8 lines with list, linebreak, number, and cursorline enabled, then repeatedly performed cursor moves, horizontal scrolling, virtcol('$'), and redraw! calls.


You can view, comment on, or merge this pull request online at:

  https://github.com/vim/vim/pull/19649

Commit Summary

  • cb752ea Refactor the UTF-8 hot paths in src/mbyte.c so codepoint decoding and byte-length validation are done once and reused by callers. This removes repeated utf_ptr2char()/utf_ptr2len() work in utf_ptr2cells(), utf_ptr2cells_len(), utfc_ptr2char(), utfc_ptr2char_len(), and the shared length helpers used by redraw and display-width code paths.

File Changes

(1 file)


Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
Message ID: <vim/vim/pull/19649@github.com>

mattn

1:01 AM (12 hours ago)
to vim/vim, Push

@mattn pushed 1 commit.

  • 268537f perf/opt-mbyte-utf8-decode


Message ID: <vim/vim/pull/19649/before/cb752ea2fc75d07e0cc0b86ea1649c2364e66ca2/after/268537ffef95541487adaac7a38d4c559d15b789@github.com>

mattn

2:04 AM (11 hours ago)
to vim/vim, Push

@mattn pushed 1 commit.

  • ff922f8 fix: validate continuation bytes in utf_ptr2char_and_len_len() for truncated sequences
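To illustrate the kind of check this commit message describes, here is a hedged sketch of a length-bounded decoder. `decode_bounded()` is a hypothetical name and this is not the body of utf_ptr2char_and_len_len(); it only shows the idea that a sequence cut short by the buffer size, or one with a bad continuation byte, falls back to a single raw byte.

```c
#include <stddef.h>

/* Decode one UTF-8 sequence from a buffer of "size" bytes (not necessarily
 * NUL-terminated). Stores the codepoint in *cp and returns the byte length.
 * Truncated or malformed sequences yield the raw lead byte with length 1.
 * Simplified: no overlong or surrogate rejection. */
static int decode_bounded(const unsigned char *p, size_t size, int *cp)
{
    int need, i, c;

    if (size == 0) {
        *cp = 0;
        return 0;
    }
    *cp = p[0];
    if (p[0] < 0x80)
        return 1;                               /* ASCII */
    if ((p[0] & 0xE0) == 0xC0)      { need = 1; c = p[0] & 0x1F; }
    else if ((p[0] & 0xF0) == 0xE0) { need = 2; c = p[0] & 0x0F; }
    else if ((p[0] & 0xF8) == 0xF0) { need = 3; c = p[0] & 0x07; }
    else
        return 1;                               /* invalid lead byte */

    /* The lead byte promises need + 1 bytes; refuse if the buffer is shorter. */
    if ((size_t)need >= size)
        return 1;

    /* Every trailing byte must look like 10xxxxxx. */
    for (i = 1; i <= need; i++) {
        if ((p[i] & 0xC0) != 0x80)
            return 1;
        c = (c << 6) | (p[i] & 0x3F);
    }
    *cp = c;
    return need + 1;
}
```

The bounds test comes before the loop so that no byte beyond `size` is ever read, which matters when the caller passes a slice of a longer line.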

Message ID: <vim/vim/pull/19649/before/268537ffef95541487adaac7a38d4c559d15b789/after/ff922f86cb2f2a702625ec0f39af59b82ac2cc96@github.com>

mattn

2:14 AM (11 hours ago)
to vim/vim, Push

@mattn pushed 1 commit.

  • e9e0903 fix: suppress unused parameter warning for c1 in utf_iscomposinglike_char()
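For readers unfamiliar with this class of warning, a small generic sketch follows. The function below is hypothetical and is not utf_iscomposinglike_char(); the commit message does not say which mechanism the PR uses, so the `(void)` cast here is just one common portable option.

```c
/* Hypothetical two-argument predicate whose first parameter is kept for
 * interface symmetry but is not needed by this simplified body. */
static int is_composing_like(int c1, int c2)
{
    (void)c1;   /* intentionally unused: silences -Wunused-parameter */

    /* Toy rule (assumption): treat the Combining Diacritical Marks block,
     * U+0300..U+036F, as composing. */
    return c2 >= 0x0300 && c2 <= 0x036F;
}
```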

Message ID: <vim/vim/pull/19649/before/ff922f86cb2f2a702625ec0f39af59b82ac2cc96/after/e9e0903556adaffca6f2485c5946ec574db161fd@github.com>

mattn

2:32 AM (11 hours ago)
to vim/vim, Subscribed
mattn left a comment (vim/vim#19649)

Many call sites use multiple utf_ptr2xxx() helpers on the same bytes, for example to get both the decoded character and its length. That means we end up walking the same UTF-8 sequence more than once and doing redundant work in hot paths.

If we provide a way to get both values in a single pass, ASCII might become a little slower in some places due to the extra plumbing, but multibyte text should consistently benefit from it.
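One way to picture the trade-off described here is a single-pass helper with an inlined ASCII fast path, so that plain ASCII pays only one comparison before returning. This is a sketch under assumptions, not code from the PR: `next_char()`, `next_char_slow()` and `count_chars()` are hypothetical names, and the decoder omits overlong and surrogate checks.

```c
#include <stddef.h>

/* Slow path: decode a multibyte sequence and advance *pp past it.
 * Invalid input is consumed one raw byte at a time. */
static int next_char_slow(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    int need = (p[0] & 0xE0) == 0xC0 ? 1
             : (p[0] & 0xF0) == 0xE0 ? 2
             : (p[0] & 0xF8) == 0xF0 ? 3 : 0;
    int c = p[0] & (0x3F >> need);      /* payload bits of the lead byte */
    int i;

    if (need == 0) {                    /* invalid lead byte */
        *pp = p + 1;
        return p[0];
    }
    for (i = 1; i <= need; i++) {
        if ((p[i] & 0xC0) != 0x80) {    /* also stops at a NUL terminator */
            *pp = p + 1;
            return p[0];
        }
        c = (c << 6) | (p[i] & 0x3F);
    }
    *pp = p + need + 1;
    return c;
}

/* Return the character at *pp and advance *pp in the same pass.
 * The caller must check for the NUL terminator before calling. */
static inline int next_char(const unsigned char **pp)
{
    const unsigned char *p = *pp;

    if (*p < 0x80) {                    /* fast path: one compare for ASCII */
        *pp = p + 1;
        return *p;
    }
    return next_char_slow(pp);
}

/* Example caller: count characters while touching each byte once. */
static size_t count_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p != 0) {
        (void)next_char(&p);
        n++;
    }
    return n;
}
```

Whether such a fast path fully offsets the extra plumbing for ASCII depends on the compiler and the call site, so it would need measuring rather than assuming.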


Message ID: <vim/vim/pull/19649/c4044321797@github.com>
