[vim/vim] perf: fast path for split() with a single-byte literal separator (PR #19708)


mattn

Mar 16, 2026, 12:05:43 AM
to vim/vim, Subscribed

split() with a single-byte literal separator (e.g. ",", ":", "/") is an extremely common pattern in Vim script, yet it currently goes through the full regexp compile-and-match path every time. This patch adds a fast path that detects a plain single-byte, non-metacharacter pattern and uses vim_strchr() to scan instead, skipping vim_regcomp() / vim_regexec() entirely.

Regexp patterns, multi-byte separators, and the default whitespace pattern are unaffected and still take the existing code path.
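
A minimal sketch of the idea described above, not the code from the PR: it assumes plain libc strchr() as a stand-in for vim_strchr(), and the names is_single_byte_literal() and count_separators() are hypothetical. The metacharacter list is the one quoted later in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical predicate: true when `pat` is exactly one ASCII byte
// that has no regexp meaning (metacharacter list as quoted in the review).
static int
is_single_byte_literal(const char *pat)
{
    assert(pat != NULL);
    if (pat[0] == '\0' || pat[1] != '\0')
        return 0;                       // empty or longer than one byte
    if ((unsigned char)pat[0] >= 0x80)
        return 0;                       // not ASCII
    return strchr(".^$~[]\\*?+|{}()", pat[0]) == NULL;
}

// Hypothetical scan: count separators with strchr() only, so no regexp
// program is compiled or executed.
static size_t
count_separators(const char *str, char sep)
{
    size_t      n = 0;
    const char  *p;

    assert(str != NULL && sep != '\0');
    for (p = strchr(str, sep); p != NULL; p = strchr(p + 1, sep))
        ++n;
    return n;
}
```

With this predicate, "," qualifies for the fast path, while ".", ", " and an empty pattern would fall back to the regexp path.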

Benchmark: split(join(range(1000), ','), ',') × 10,000 iterations

        Before    After     Speedup
Time    3.297 s   0.904 s   3.6×

You can view, comment on, or merge this pull request online at:

  https://github.com/vim/vim/pull/19708

Commit Summary

  • f9852ae perf: fast path for split() with a single-byte literal separator

File Changes

(1 file)

—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
Message ID: <vim/vim/pull/19708@github.com>

mattn

Mar 16, 2026, 12:31:44 AM
to vim/vim, Push

@mattn pushed 1 commit.

  • 6ff94ca perf: extend split() fast path to any literal string pattern

mattn

Mar 16, 2026, 1:11:47 AM
to vim/vim, Push

@mattn pushed 1 commit.

  • eb071f9 fix: preserve empty split items

mattn

Mar 16, 2026, 1:31:23 AM
to vim/vim, Push

@mattn pushed 1 commit.

  • 4f407db fix: avoid codestyle false positive

Char

Mar 16, 2026, 2:50:24 AM
to vim/vim, Subscribed
char101 left a comment (vim/vim#19708)

How about adding a condition that the previous char is not \ in

for (p = pat; *p != NUL; p += mb_ptr2len(p))
	if (*p < 0x80
		&& vim_strchr((char_u *)".^$~[]\\*?+|{}()", *p) != NULL)
	    return FALSE;

so that an escaped pattern such as \.\. is also treated as literal?
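
One way this suggestion could look, as a hedged sketch only: under the default 'magic' setting \. matches a literal dot, whereas escapes such as \( or \| are special, so not every backslash pair is literal. The helper below (hypothetical name unescape_literal_pat()) accepts only a small set of escapes and writes the unescaped separator to a buffer, since the fast path would have to search for "." rather than "\.". The set ".*~[\" is an assumption that should be checked against :help pattern, and the byte-wise loop assumes UTF-8 or a single-byte encoding; Vim code would step with mb_ptr2len().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: if `pat` is literal once backslash escapes are
// removed, store the unescaped text in `out` and return 1, else return 0.
static int
unescape_literal_pat(const char *pat, char *out, size_t outsize)
{
    // Bytes with regexp meaning when unescaped (list quoted from the patch).
    static const char meta[] = ".^$~[]\\*?+|{}()";
    // ASSUMPTION: escapes that yield the plain character under 'magic'.
    static const char escapable[] = ".*~[\\";
    size_t      n = 0;
    const char  *p;

    assert(pat != NULL && out != NULL);
    if (*pat == '\0')
        return 0;
    for (p = pat; *p != '\0'; ++p)
    {
        char c = *p;

        if (c == '\\')
        {
            c = *++p;                   // look at the escaped byte
            if (c == '\0' || strchr(escapable, c) == NULL)
                return 0;               // trailing '\' or special escape
        }
        else if (strchr(meta, c) != NULL)
            return 0;                   // unescaped metacharacter
        if (n + 1 >= outsize)
            return 0;                   // no room: use the regexp path
        out[n++] = c;
    }
    out[n] = '\0';
    return 1;
}
```

Returning 0 whenever there is any doubt keeps the fallback to the existing regexp path as the safe default.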

Christian Brabandt

Mar 17, 2026, 4:02:18 PM
to vim/vim, Subscribed

@chrisbra commented on this pull request.


In src/evalfunc.c:

>  
     if (rettv_list_alloc(rettv) == FAIL)
 	goto theend;
     if (typeerr)
 	goto theend;
 
+    if (literal)
+    {
+	patlen = (int)STRLEN(pat);
+	while (*str != NUL || keepempty)
+	{
+	    p = (char_u *)strstr((char *)str, (char *)pat);
+	    end = p == NULL ? str + STRLEN(str) : p;

can we avoid the strlen() inside the loop?
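
A hedged sketch of one way to address this: compute the end of the input once before the loop and reuse it when no further match is found. As quoted, STRLEN(str) appears to be evaluated only when strstr() returns NULL, so hoisting mainly makes the cost explicit. The function name field_lengths() is hypothetical, and for simplicity this version records every field, including empty ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: store the byte length of each field of `str`
// separated by the literal `pat`; returns the number of fields found.
static size_t
field_lengths(const char *str, const char *pat, size_t *lens, size_t max)
{
    const char  *str_end;
    size_t      patlen;
    size_t      n = 0;

    assert(str != NULL && pat != NULL && *pat != '\0' && lens != NULL);
    str_end = str + strlen(str);        // computed once, outside the loop
    patlen = strlen(pat);

    for (;;)
    {
        const char *p = strstr(str, pat);
        const char *end = p != NULL ? p : str_end;

        if (n < max)
            lens[n] = (size_t)(end - str);
        ++n;
        if (p == NULL)
            break;
        str = p + patlen;               // continue after the separator
    }
    return n;
}
```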


In src/evalfunc.c:

>  
     if (rettv_list_alloc(rettv) == FAIL)
 	goto theend;
     if (typeerr)
 	goto theend;
 
+    if (literal)
+    {
+	patlen = (int)STRLEN(pat);
+	while (*str != NUL || keepempty)
+	{
+	    p = (char_u *)strstr((char *)str, (char *)pat);

Hm, does strstr() handle non utf-8 multibyte chars correctly?
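
To illustrate the concern: valid UTF-8 is self-synchronizing, so a byte-wise strstr() cannot match in the middle of a character, but double-byte encodings such as Shift_JIS allow trail bytes in the ASCII range. The sketch below is an assumption-laden illustration, not code from the PR: sjis_char_len() is a hypothetical stand-in for mb_ptr2len() that hard-codes the Shift_JIS lead-byte ranges, and find_literal_dbcs() is a hypothetical search that only tests character boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for mb_ptr2len(), hard-coded for Shift_JIS:
// lead bytes 0x81-0x9F and 0xE0-0xFC start a two-byte character.
static size_t
sjis_char_len(const unsigned char *p)
{
    unsigned char c = p[0];

    if (((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC))
            && p[1] != '\0')
        return 2;
    return 1;
}

// Hypothetical search: like strstr(), but candidates are tried only at
// character boundaries, so a trail byte is never taken for a separator.
static const char *
find_literal_dbcs(const char *str, const char *pat)
{
    size_t      patlen;
    const char  *s;

    assert(str != NULL && pat != NULL && *pat != '\0');
    patlen = strlen(pat);
    for (s = str; *s != '\0'; s += sjis_char_len((const unsigned char *)s))
        if (strncmp(s, pat, patlen) == 0)
            return s;
    return NULL;
}
```

For the bytes 0x83 0x41 followed by "A", strstr(s, "A") matches at offset 1 (the trail byte 0x41), whereas this search returns offset 2. Separators below 0x40 such as "," ":" and "/" cannot appear as Shift_JIS trail bytes, so the issue would mainly affect letters and characters like "@" or "_".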

Copilot

Mar 17, 2026, 4:10:29 PM
to vim/vim, Subscribed

@Copilot commented on this pull request.

Pull request overview

This PR optimizes the Vimscript split() builtin by adding a fast path for purely-literal separator patterns, avoiding regex compilation/execution for common cases while leaving regexp and default-whitespace behavior on the existing code path.

Changes:

  • Add is_literal_pat() helper to detect patterns with no regexp metacharacters (with multibyte-safe scanning).
  • Implement a literal-separator split loop using strstr() and byte-length advancement instead of vim_regcomp()/vim_regexec().


In src/evalfunc.c:

> +					&& *str != NUL && p != NULL
+					&& end < p + patlen))

In the literal fast path, the condition end < p + patlen is redundant because end == p when p != NULL and patlen > 0 for literal patterns. Keeping this comparison makes the empty-item handling harder to read; consider simplifying the condition to the minimal checks needed to preserve split()'s "discard leading/trailing empties, keep middle empties" semantics.
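
A standalone sketch of what the simplified condition might look like, assuming libc strstr() and a hypothetical split_literal() that records (start, length) pairs instead of building a Vim list. Here `n > 0` plays the role of "the result list is non-empty". Since a successful strstr() with a non-empty pattern implies both `end == p` and a non-empty `str`, only `p != NULL` is tested for middle empties.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical result item: a slice of the input string.
typedef struct { const char *start; size_t len; } item_T;

// Hypothetical literal split: leading/trailing empty items are dropped
// unless `keepempty` is set; empty items in the middle are kept.
static size_t
split_literal(const char *str, const char *pat, int keepempty,
              item_T *out, size_t max)
{
    size_t patlen;
    size_t n = 0;

    assert(str != NULL && pat != NULL && *pat != '\0' && out != NULL);
    patlen = strlen(pat);
    while (*str != '\0' || keepempty)
    {
        const char *p = strstr(str, pat);
        const char *end = p != NULL ? p : str + strlen(str);

        // No "end < p + patlen" test: it always holds when p != NULL.
        if (keepempty || end > str || (n > 0 && p != NULL))
        {
            if (n < max)
            {
                out[n].start = str;
                out[n].len = (size_t)(end - str);
            }
            ++n;
        }
        if (p == NULL)
            break;
        str = p + patlen;
    }
    return n;
}
```

For the input ",a,,b," with separator ",", this yields three items ("a", "", "b") without keepempty and five items with it.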


In src/evalfunc.c:

> +    static int
+is_literal_pat(char_u *pat)
+{
+    char_u  *p;
+
+    if (pat == NULL || *pat == NUL)
+	return FALSE;
+
+    // Check that no character in the pattern has regexp meaning.
+    // Use mb_ptr2len() to skip over multi-byte characters safely so that
+    // trail bytes are never mistaken for ASCII metacharacters.
+    for (p = pat; *p != NUL; p += mb_ptr2len(p))
+	if (*p < 0x80
+		&& vim_strchr((char_u *)".^$~[]\\*?+|{}()", *p) != NULL)
+	    return FALSE;
+
+    return TRUE;
+}
+

The new helper is_literal_pat() is defined in the middle of the forward-declaration block (between builtin function prototypes). This breaks the pattern in this file where prototypes are grouped together and function bodies start after the list; consider moving is_literal_pat() either below the prototype section (after the last static void f_* declaration) or closer to f_split() to keep declarations and definitions clearly separated.
