[vim/vim] perf: fast path for split() with a single-byte literal separator (PR #19708)

mattn

Mar 16, 2026, 12:05:43 AM

to vim/vim, Subscribed

split() with a single-byte literal separator (e.g. ",", ":", "/") is an extremely common pattern in Vim script, yet it currently goes through the full regexp compile-and-match path every time. This patch adds a fast path that detects a plain single-byte, non-metacharacter pattern and uses vim_strchr() to scan instead, skipping vim_regcomp() / vim_regexec() entirely.

Regexp patterns, multi-byte separators, and the default whitespace pattern are unaffected and still take the existing code path.
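For illustration only, the single-byte check and scan described above could look roughly like the standalone sketch below. It is not the code from the patch: it uses plain strchr() instead of vim_strchr(), and the names is_single_byte_literal() and count_fields() are hypothetical. The metacharacter set is the one quoted later in this thread.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: nonzero when pat is exactly one ASCII byte that
// is not in the metacharacter set used by the patch.
static int
is_single_byte_literal(const char *pat)
{
    unsigned char c;

    if (pat == NULL || pat[0] == '\0' || pat[1] != '\0')
        return 0;
    c = (unsigned char)pat[0];
    return c < 0x80 && strchr(".^$~[]\\*?+|{}()", c) == NULL;
}

// Count the fields a split on sep would visit, using strchr() only.
// Assumes sep is not NUL; no regexp is compiled or executed.
static size_t
count_fields(const char *str, char sep)
{
    size_t n = 1;
    const char *p;

    for (p = strchr(str, sep); p != NULL; p = strchr(p + 1, sep))
        ++n;
    return n;
}
```

A caller would test the pattern once with is_single_byte_literal() and fall back to the regexp path whenever it returns zero.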

Benchmark: split(join(range(1000), ','), ',') × 10,000 iterations

        Before     After      Speedup
Time    3.297 s    0.904 s    3.6×

You can view, comment on, or merge this pull request online at:

https://github.com/vim/vim/pull/19708

Commit Summary

f9852ae perf: fast path for split() with a single-byte literal separator

File Changes

(1 file)

M src/evalfunc.c (39)

—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.

mattn

Mar 16, 2026, 12:31:44 AM

to vim/vim, Push

@mattn pushed 1 commit.

6ff94ca perf: extend split() fast path to any literal string pattern


mattn

Mar 16, 2026, 1:11:47 AM

to vim/vim, Push

@mattn pushed 1 commit.

eb071f9 fix: preserve empty split items


mattn

Mar 16, 2026, 1:31:23 AM

to vim/vim, Push

@mattn pushed 1 commit.

4f407db fix: avoid codestyle false positive


Char

Mar 16, 2026, 2:50:24 AM

to vim/vim, Subscribed

char101 left a comment (vim/vim#19708)

How about adding a condition that the previous character is not a backslash in

for (p = pat; *p != NUL; p += mb_ptr2len(p))
	if (*p < 0x80
		&& vim_strchr((char_u *)".^$~[]\\*?+|{}()", *p) != NULL)
	    return FALSE;

That would let an escaped pattern such as \.\. be treated as literal.
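A rough standalone sketch of that idea follows; unescape_literal_pat() is a hypothetical name, not something in the patch. Two assumptions are baked in. First, the literal search needs the unescaped text (".." rather than \.\.), so the helper writes a copy. Second, it assumes 'magic' semantics, where a backslash before +, ?, |, {, ( or ) produces an operator rather than a literal, and sequences such as \s are character classes, so only a small set of escapes is accepted. It walks bytes, which is safe for UTF-8 but not for double-byte encodings.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns 1 and writes the unescaped separator to
// out when pat is literal once backslash escapes are resolved, else 0.
static int
unescape_literal_pat(const char *pat, char *out, size_t outsize)
{
    static const char meta[] = ".^$~[]\\*?+|{}()";
    // Assumption: with 'magic', a backslash makes these bytes literal.
    static const char escapable[] = ".^$~[*\\";
    size_t n = 0;

    if (pat == NULL || *pat == '\0' || outsize == 0)
        return 0;
    for (; *pat != '\0'; ++pat)
    {
        unsigned char c = (unsigned char)*pat;

        if (c == '\\')
        {
            // Look at the escaped byte; reject \+, \s, a trailing
            // backslash, and anything else not known to be literal.
            c = (unsigned char)*++pat;
            if (c == '\0' || strchr(escapable, c) == NULL)
                return 0;
        }
        else if (c < 0x80 && strchr(meta, c) != NULL)
            return 0;
        if (n + 1 >= outsize)
            return 0;   // does not fit: let the regexp path handle it
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return 1;
}
```

Anything the helper rejects would simply continue to use the existing regexp path, so being conservative here costs speed but not correctness.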


Christian Brabandt

Mar 17, 2026, 4:02:18 PM

to vim/vim, Subscribed

@chrisbra commented on this pull request.

In src/evalfunc.c:

>  
     if (rettv_list_alloc(rettv) == FAIL)
 	goto theend;
     if (typeerr)
 	goto theend;
 
+    if (literal)
+    {
+	patlen = (int)STRLEN(pat);
+	while (*str != NUL || keepempty)
+	{
+	    p = (char_u *)strstr((char *)str, (char *)pat);
+	    end = p == NULL ? str + STRLEN(str) : p;

Can we avoid calling STRLEN() inside the loop?
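One possible way to do that, sketched outside of Vim with plain C types: compute the end of the input once before the loop and reuse it whenever strstr() finds no further separator. The function name split_literal() and its output arrays are hypothetical, and the sketch keeps every item, including empty ones.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical sketch: record the start and length of each piece of str
// separated by the literal pat. Returns the number of pieces stored.
static size_t
split_literal(const char *str, const char *pat,
              const char **starts, size_t *lens, size_t max)
{
    size_t patlen = strlen(pat);
    const char *str_end = str + strlen(str);    // computed once
    size_t n = 0;

    if (patlen == 0)
        return 0;
    while (n < max)
    {
        const char *p = strstr(str, pat);
        // No strlen() here: the tail always ends at str_end.
        const char *end = (p == NULL) ? str_end : p;

        starts[n] = str;
        lens[n] = (size_t)(end - str);
        ++n;
        if (p == NULL)
            break;
        str = p + patlen;
    }
    return n;
}
```

Because str only ever advances within the same buffer, str_end stays valid for the whole loop.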

In src/evalfunc.c:

>  
     if (rettv_list_alloc(rettv) == FAIL)
 	goto theend;
     if (typeerr)
 	goto theend;
 
+    if (literal)
+    {
+	patlen = (int)STRLEN(pat);
+	while (*str != NUL || keepempty)
+	{
+	    p = (char_u *)strstr((char *)str, (char *)pat);

Hm, does strstr() handle non-UTF-8 multibyte characters correctly?
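To make the concern concrete: in a double-byte encoding such as Shift-JIS, the trail byte of a two-byte character can fall in the ASCII range (0x40 to 0x7E), so a byte-wise strstr() can report a match in the middle of a character. The sketch below is a standalone illustration, not Vim code. sjis_char_len() is a hypothetical stand-in for mb_ptr2len() hard-coded to the Shift-JIS lead-byte ranges, and find_literal_dbcs() only compares at character boundaries.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical stand-in for mb_ptr2len(): byte length of the character
// at p, assuming Shift-JIS lead bytes 0x81-0x9F and 0xE0-0xFC.
static size_t
sjis_char_len(const unsigned char *p)
{
    if (((p[0] >= 0x81 && p[0] <= 0x9F) || (p[0] >= 0xE0 && p[0] <= 0xFC))
            && p[1] != '\0')
        return 2;
    return 1;
}

// Find pat in str, testing only positions where a character starts.
static const char *
find_literal_dbcs(const char *str, const char *pat)
{
    size_t patlen = strlen(pat);
    const unsigned char *p = (const unsigned char *)str;

    if (patlen == 0)
        return NULL;
    for (; *p != '\0'; p += sjis_char_len(p))
        if (strncmp((const char *)p, pat, patlen) == 0)
            return (const char *)p;
    return NULL;
}
```

For the bytes 0x83 0x41 followed by an ASCII 'A', strstr() finds "A" at offset 1, inside the two-byte character, while the boundary-aware search finds it at offset 2. UTF-8 does not have this problem because its trail bytes are always 0x80 or above.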


Copilot

Mar 17, 2026, 4:10:29 PM

to vim/vim, Subscribed

@Copilot commented on this pull request.

Pull request overview

This PR optimizes the Vimscript split() builtin by adding a fast path for purely-literal separator patterns, avoiding regex compilation/execution for common cases while leaving regexp and default-whitespace behavior on the existing code path.

Changes:

- Add is_literal_pat() helper to detect patterns with no regexp metacharacters (with multibyte-safe scanning).
- Implement a literal-separator split loop using strstr() and byte-length advancement instead of vim_regcomp()/vim_regexec().

In src/evalfunc.c:

> +					&& *str != NUL && p != NULL
+					&& end < p + patlen))

In the literal fast path, the condition end < p + patlen is redundant because end == p when p != NULL and patlen > 0 for literal patterns. Keeping this comparison makes the empty-item handling harder to read; consider simplifying the condition to the minimal checks needed to preserve split()'s "discard leading/trailing empties, keep middle empties" semantics.
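As a hedged illustration of what the simplified check might reduce to, the standalone sketch below counts the items a literal split would produce. count_split_items() is a hypothetical name and the code is not taken from the patch. It relies on the observation above: when the separator is found and patlen > 0, end == p, so the only remaining question for an empty item is whether something has already been added to the list.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical sketch: count items for a literal separator, dropping
// leading and trailing empty items unless keepempty is set.
static size_t
count_split_items(const char *str, const char *pat, int keepempty)
{
    size_t patlen = strlen(pat);
    const char *str_end = str + strlen(str);
    size_t n = 0;

    if (patlen == 0)
        return 0;
    while (*str != '\0' || keepempty)
    {
        const char *p = strstr(str, pat);
        const char *end = (p == NULL) ? str_end : p;

        // Keep: everything with keepempty, any non-empty item, and an
        // empty item that follows an earlier item and precedes a separator.
        if (keepempty || end > str || (n > 0 && p != NULL))
            ++n;
        if (p == NULL)
            break;
        str = p + patlen;
    }
    return n;
}
```

With this shape, p != NULL already implies *str != NUL, so that test can be dropped from the empty-item branch as well.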

In src/evalfunc.c:

> +    static int
+is_literal_pat(char_u *pat)
+{
+    char_u  *p;
+
+    if (pat == NULL || *pat == NUL)
+	return FALSE;
+
+    // Check that no character in the pattern has regexp meaning.
+    // Use mb_ptr2len() to skip over multi-byte characters safely so that
+    // trail bytes are never mistaken for ASCII metacharacters.
+    for (p = pat; *p != NUL; p += mb_ptr2len(p))
+	if (*p < 0x80
+		&& vim_strchr((char_u *)".^$~[]\\*?+|{}()", *p) != NULL)
+	    return FALSE;
+
+    return TRUE;
+}
+

The new helper is_literal_pat() is defined in the middle of the forward-declaration block (between builtin function prototypes). This breaks the pattern in this file where prototypes are grouped together and function bodies start after the list; consider moving is_literal_pat() either below the prototype section (after the last static void f_* declaration) or closer to f_split() to keep declarations and definitions clearly separated.
