split() with a single-byte literal separator (e.g. ",", ":", "/") is an extremely common pattern in Vim script, yet it currently goes through the full regexp compile-and-match path every time. This patch adds a fast path that detects a plain single-byte, non-metacharacter pattern and uses vim_strchr() to scan instead, skipping vim_regcomp() / vim_regexec() entirely.
Regexp patterns, multi-byte separators, and the default whitespace pattern are unaffected and still take the existing code path.
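A minimal, self-contained sketch of the single-byte fast path described above. It is written as plain C rather than against Vim's internals: standard strchr() stands in for vim_strchr(), and the hypothetical item_T and split_single_byte() names record each item as an offset and length instead of appending to a Vim list. The empty-item handling assumes split()'s documented rule: a leading or trailing empty item is dropped unless keepempty is set, and empty items between separators are kept.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// One split() item, as an offset and length into the input string.
// (Hypothetical type for this sketch; Vim appends to a list instead.)
typedef struct {
    size_t start;
    size_t len;
} item_T;

// Sketch of a single-byte literal fast path (hypothetical helper, not the
// code in the PR). strchr() stands in for Vim's vim_strchr().
// Returns the number of items stored in items[] (at most max_items).
static int split_single_byte(const char *s, char sep, int keepempty,
                             item_T *items, int max_items)
{
    const char *str = s;
    int n = 0;

    assert(sep != '\0');   // strchr() would match the terminating NUL
    while (*str != '\0' || keepempty)
    {
        const char *p = strchr(str, sep);   // next separator, or NULL
        const char *end = p != NULL ? p : str + strlen(str);

        // Keep non-empty items, every item with keepempty, and empty
        // items that sit between two separators.
        if (keepempty || end > str || (n > 0 && *str != '\0' && p != NULL))
        {
            if (n == max_items)
                break;
            items[n].start = (size_t)(str - s);
            items[n].len = (size_t)(end - str);
            ++n;
        }
        if (p == NULL)
            break;      // no more separators: that was the last item
        str = p + 1;    // the separator is exactly one byte long
    }
    return n;
}
```

For example, "a,,b" yields three items with the middle one empty, matching split('a,,b', ','), and a leading or trailing comma adds no item unless keepempty is set.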
Benchmark: split(join(range(1000), ','), ',') × 10,000 iterations
| | Before | After | Speedup |
|---|---|---|---|
| Time | 3.297 s | 0.904 s | 3.6× |
https://github.com/vim/vim/pull/19708
(1 file)
—
Reply to this email directly, view it on GitHub.
You are receiving this because you are subscribed to this thread.
@mattn pushed 1 commit.
@mattn pushed 1 commit.
@mattn pushed 1 commit.
How about adding a condition that the previous character is not \ in

    for (p = pat; *p != NUL; p += mb_ptr2len(p))
        if (*p < 0x80 && vim_strchr((char_u *)".^$~[]\\*?+|{}()", *p) != NULL)
            return FALSE;

That would allow an escaped pattern such as \.\. to be treated as literal.
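One way the suggestion could work, as a hypothetical helper (unescape_literal_pat() is not part of the PR): accept a backslash only in front of characters that it turns back into plain characters, and hand the unescaped string to the literal path. The sketch assumes the pattern is interpreted with 'magic' set, and it is deliberately stricter than "the previous char is not \", because in that mode sequences such as \+, \| and \( are operators rather than escapes. It also assumes UTF-8 text, so bytes >= 0x80 are copied unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: returns 1 and writes the unescaped literal to out
// (buffer of outsize bytes) if pat contains no unescaped metacharacter and
// only backslash escapes that mean the plain character under 'magic'.
// Returns 0 if the pattern needs the regexp engine or does not fit.
static int unescape_literal_pat(const char *pat, char *out, size_t outsize)
{
    static const char meta[] = ".^$~[]\\*?+|{}()";  // same set as the PR
    static const char escapable[] = ".^$~[\\*";     // literal after '\'
    size_t n = 0;

    assert(pat != NULL && out != NULL && outsize > 0);
    if (*pat == '\0')
        return 0;
    for (const char *p = pat; *p != '\0'; ++p)
    {
        char c = *p;

        if (c == '\\')
        {
            c = *++p;   // inspect the escaped character
            if (c == '\0' || strchr(escapable, c) == NULL)
                return 0;   // trailing '\' or an escape with regexp meaning
        }
        else if ((unsigned char)c < 0x80 && strchr(meta, c) != NULL)
            return 0;       // unescaped metacharacter
        if (n + 1 >= outsize)
            return 0;       // no room for the character plus the NUL
        out[n++] = c;
    }
    out[n] = '\0';
    return 1;
}
```

With this, \.\. becomes the two-byte literal "..", while a.b and \+ still take the regexp path.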
@chrisbra commented on this pull request.
In src/evalfunc.c:
>
if (rettv_list_alloc(rettv) == FAIL)
goto theend;
if (typeerr)
goto theend;
+ if (literal)
+ {
+ patlen = (int)STRLEN(pat);
+ while (*str != NUL || keepempty)
+ {
+ p = (char_u *)strstr((char *)str, (char *)pat);
+ end = p == NULL ? str + STRLEN(str) : p;
Can we avoid the strlen() inside the loop?
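A sketch of one answer to the question above: compute the end of the input once, before the loop, and reuse that pointer when no further separator is found. The function name literal_item_lengths() and the lens[] output are hypothetical; to keep the focus on the pointer arithmetic, it records every item as if keepempty were set, and it uses standard strstr() as in the quoted diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical sketch: stores the length of each item of str split on the
// literal pat into lens[] (at most max_items) and returns the item count.
// Every item is recorded, as if keepempty were set.
static int literal_item_lengths(const char *str, const char *pat,
                                size_t *lens, int max_items)
{
    assert(str != NULL && pat != NULL);
    const char *str_end = str + strlen(str);   // one strlen() per call
    size_t patlen = strlen(pat);
    int n = 0;

    if (patlen == 0)
        return 0;       // an empty pattern never takes the literal path
    while (n < max_items)
    {
        const char *p = strstr(str, pat);      // next separator, or NULL
        const char *end = p == NULL ? str_end : p;

        lens[n++] = (size_t)(end - str);
        if (p == NULL)
            break;      // the last item runs to the precomputed end
        str = p + patlen;
    }
    return n;
}
```

For "ab::c::" split on "::", this records lengths 2, 1 and 0, and strlen() is called once for the input regardless of how many items there are.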
In src/evalfunc.c:
> + p = (char_u *)strstr((char *)str, (char *)pat);
Hm, does strstr() correctly handle multibyte characters in non-UTF-8 encodings?
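The concern applies to double-byte encodings such as cp932 (Shift_JIS), where the trail byte of a character can fall in the ASCII range: 0x81 0x40 is the full-width space, so a byte-wise strstr() for "@" reports a match inside that character. UTF-8 avoids this because every byte of a multi-byte sequence is >= 0x80. The sketch below is hypothetical (sjis_is_lead() and sjis_find_byte() are not Vim functions) and hard-codes the cp932 lead-byte ranges; inside Vim, stepping with mb_ptr2len() as is_literal_pat() already does would be the natural equivalent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   // strstr(), used for comparison in the example

// cp932 lead bytes (0x81-0x9F, 0xE0-0xFC) start a two-byte character whose
// trail byte may be in the ASCII range.
static int sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

// Hypothetical character-aware search for a single-byte separator in a
// cp932 string: steps over whole characters so a trail byte is never
// compared against sep. Returns NULL if sep does not occur.
static const char *sjis_find_byte(const char *s, char sep)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(sep != '\0');
    while (*p != '\0')
    {
        if (sjis_is_lead(*p) && p[1] != '\0')
            p += 2;                        // skip the whole two-byte char
        else if (*p == (unsigned char)sep)
            return (const char *)p;
        else
            ++p;
    }
    return NULL;
}
```

For the bytes 0x81 0x40 '@' 'x', strstr(s, "@") returns s + 1 (inside the full-width space), while sjis_find_byte(s, '@') returns s + 2, the real separator.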
@Copilot commented on this pull request.
This PR optimizes the Vimscript split() builtin by adding a fast path for purely literal separator patterns, avoiding regex compilation and execution for common cases while leaving regexp and default-whitespace behavior on the existing code path.
Changes:
- Adds an is_literal_pat() helper to detect patterns with no regexp metacharacters (with multibyte-safe scanning).
- Adds a literal fast path in f_split() that uses strstr() and byte-length advancement instead of vim_regcomp()/vim_regexec().
In src/evalfunc.c:
> + && *str != NUL && p != NULL
+ && end < p + patlen))
In the literal fast path, the condition end < p + patlen is redundant because end == p when p != NULL and patlen > 0 for literal patterns. Keeping this comparison makes the empty-item handling harder to read; consider simplifying the condition to the minimal checks needed to preserve split()'s "discard leading/trailing empties, keep middle empties" semantics.
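To illustrate the review point, the two predicates below compare the condition with and without the end < p + patlen term. They are hypothetical stand-ins: the part of the condition before the quoted lines is assumed to mirror the existing regexp loop in f_split() (keepempty || end > str || (list not empty && ...)), and list_len stands in for the list length. On the literal path end == p whenever p != NULL, and patlen > 0, so the dropped term is always true.

```c
#include <assert.h>
#include <stddef.h>

// Condition in the form quoted in the diff (assumed leading part).
// p + patlen is only evaluated when p != NULL, thanks to short-circuiting.
static int keep_item_orig(int keepempty, size_t list_len, const char *str,
                          const char *end, const char *p, size_t patlen)
{
    return keepempty || end > str
        || (list_len > 0 && *str != '\0' && p != NULL
            && end < p + patlen);
}

// Simplified form: same result whenever end == p for a found separator,
// which the literal path guarantees.
static int keep_item(int keepempty, size_t list_len, const char *str,
                     const char *end, const char *p)
{
    assert(p == NULL || end == p);
    return keepempty || end > str
        || (list_len > 0 && *str != '\0' && p != NULL);
}
```

Both forms drop a leading empty item (nothing appended yet), keep an empty item between two separators, and keep everything when keepempty is set.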
In src/evalfunc.c:
> + static int
+is_literal_pat(char_u *pat)
+{
+ char_u *p;
+
+ if (pat == NULL || *pat == NUL)
+ return FALSE;
+
+ // Check that no character in the pattern has regexp meaning.
+ // Use mb_ptr2len() to skip over multi-byte characters safely so that
+ // trail bytes are never mistaken for ASCII metacharacters.
+ for (p = pat; *p != NUL; p += mb_ptr2len(p))
+ if (*p < 0x80
+ && vim_strchr((char_u *)".^$~[]\\*?+|{}()", *p) != NULL)
+ return FALSE;
+
+ return TRUE;
+}
+
The new helper is_literal_pat() is defined in the middle of the forward-declaration block (between builtin function prototypes). This breaks the pattern in this file where prototypes are grouped together and function bodies start after the list; consider moving is_literal_pat() either below the prototype section (after the last static void f_* declaration) or closer to f_split() to keep declarations and definitions clearly separated.