Problems of new regexp engine (that we know)


mattn

May 20, 2013, 8:16:55 PM
to vim...@googlegroups.com
* Some plugins don't work
* \%u is disabled
* test64 contains tests for multi-byte
* test95 doesn't pass without enc=utf-8

Here is the Japanese discussion:
https://github.com/vim-jp/issues/issues/390#issuecomment-18181263

Thanks.

Marc Weber

May 20, 2013, 9:30:10 PM
to vim_dev
1) Having the new regex engine fail *silently* when something is not
supported is not an option - it should throw an error IMHO, so that
people know that something went wrong.

2) https://gist.github.com/MarcWeber/5616733
I've created an unfinished QuickCheck script to compare the old and the
new engine. However, because the "new engine" is not documented other
than "should work on most syntax files" and "does not implement
everything", I'm not sure what to include in the tests.

In summary, it looks promising. I expected to find more issues.

It found these cases behaving differently:

The first [] is always the regex, the second is the string to match
against (using matchlist()).

1)
RegexTests [\_F] ["\NULa"]
\NUL is the 0 byte - which is read by readfile() (not using 'b' flag)

new: ['a', '', '', '', '', '', '', '', '', '']
old: ['^@', '', '', '', '', '', '', '', '', '']

2) And all the others seem to be UTF-8 related:
echo '1' =~ '\%#=1\o{\?Ä\Z'
echo '1' =~ '\%#=2\o{\?Ä\Z'

From what I tested I got no segfaults, and most generated tests seem to
pass. Please note that I count "new engine finding something which is
not implemented" and "old engine does not parse regex" as success.
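The QuickCheck-style comparison described above can be sketched in Python. This is a minimal illustration, not the script from the gist: the two engines are passed in as plain functions (`reference` is a hypothetical stand-in built on Python's `re` module that returns something shaped like `matchlist()` without the padding), patterns use Python `re` syntax rather than Vim's, and the acceptance rule is the one stated above: identical results, an error from the old engine, or an empty result from the new engine all count as a pass.

```python
import random
import re

# Building blocks for random patterns (Python re syntax, standing in for
# Vim's). Each atom gets at most one multi, so every pattern is valid.
ATOMS = ["a", "b", ".", "[ab]", r"\d"]
MULTIS = ["", "*", "+", "?"]


def random_pattern(rng, max_atoms=4):
    n = rng.randint(1, max_atoms)
    return "".join(rng.choice(ATOMS) + rng.choice(MULTIS) for _ in range(n))


def random_subject(rng, max_len=6):
    return "".join(rng.choice("ab1\n") for _ in range(rng.randint(0, max_len)))


def run(engine, pattern, subject):
    """Call an engine; any exception is recorded as ["error"]."""
    try:
        return engine(pattern, subject)
    except Exception:
        return ["error"]


def acceptable(old, new):
    """Same rule as the Vim script: agreement, an old-engine error,
    or an empty new-engine result (unsupported item) counts as a pass."""
    return old == new or old == ["error"] or new == []


def differential_test(old_engine, new_engine, trials=500, seed=0):
    """Return (pattern, subject, old, new) for every unacceptable difference."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        pattern, subject = random_pattern(rng), random_subject(rng)
        old = run(old_engine, pattern, subject)
        new = run(new_engine, pattern, subject)
        if not acceptable(old, new):
            failures.append((pattern, subject, old, new))
    return failures


def reference(pattern, subject):
    """Hypothetical engine: whole match plus groups, or [] if no match."""
    m = re.search(pattern, subject)
    return [m.group(0)] + list(m.groups()) if m else []
```

Passing `reference` as both engines should report nothing; passing a deliberately different second engine (for example one that searches with `re.DOTALL`) is expected to report cases where `.` meets a newline. The fixed seed keeps a failing run reproducible.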

Another test is this: (first is regex, second is the string to match
against):
[ú\Z] [""]

I cannot reproduce this using plain VimL code like this:

let reg = 'ú\Z'
let t = ""
echo matchlist('\%#=1'.reg, t)
echo matchlist('\%#=2'.reg, t)

Setting t to '1', however, causes the difference.

result of matchlist:
new: ['', '', '', '', '', '', '', '', '', '']
old: []

So maybe with t="" this is a readfile related issue, too?

RegexTests [\p\+] ["\236a"]
new: ['a', '', '', '', '', '', '', '', '', '']
old: ['ìa', '', '', '', '', '', '', '', '', '']
Again a UTF-8 issue, as is this one:
RegexTests [¤\|\Z] ["a"]

Tested with version: Included patches: 1-981

I'm not sure what influences Vim's UTF-8 handling.
I have &encoding=utf-8 set.

Client-server communication fails about once every 500 times - no bytes
are returned.

Marc Weber

Ken Takata

May 20, 2013, 9:45:50 PM
to vim...@googlegroups.com
Hi,

2013/05/21 Tue 9:16:55 UTC+9 mattn wrote:
> * Some plugins don't work
> * \%u is disabled
> * test64 contains tests for multi-byte
> * test95 doesn't pass without enc=utf-8

* Strange condition on line 1094:
if (*regparse == 'n' || *regparse == 'n')

\%u seems to be disabled because of lack of testing.
See the comment on line 1104.

Thanks,
Ken Takata

Yasuhiro MATSUMOTO

May 21, 2013, 5:26:05 AM
to vim...@googlegroups.com
One more. \ze does not work.


--
- Yasuhiro Matsumoto

Bram Moolenaar

May 21, 2013, 6:04:55 AM
to Marc Weber, vim_dev

Marc Weber wrote:

> 1) Having the new regex engine fail *silently* when something is not
> supported is not an option - it should throw an error IMHO, so that
> people know that something went wrong.

It happens if you set 'regexpengine' to 2.

> 2) https://gist.github.com/MarcWeber/5616733
> I've created an unfinished QuickCheck script to compare the old and the
> new engine. However, because the "new engine" is not documented other
> than "should work on most syntax files" and "does not implement
> everything", I'm not sure what to include in the tests.
>
> In summary, it looks promising. I expected to find more issues.

More testing is always good. I'm using a bunch of syntax highlighted
files. Found a few problems. And found one problem in the old engine!
It's not sufficient though.

> It found these cases behaving differently:
>
> The first [] is always the regex, the second is the string to match
> against (using matchlist()).
>
> 1)
> RegexTests [\_F] ["\NULa"]
> \NUL is the 0 byte - which is read by readfile() (not using 'b' flag)
>
> new: ['a', '', '', '', '', '', '', '', '', '']
> old: ['^@', '', '', '', '', '', '', '', '', '']

I fixed \F yesterday. Hmm, but you say you include patch 981.
What is the Vim command to reproduce this? The newline character should
represent a NUL. However, a string cannot contain a NUL.

> 2) And all the others seem to be UTF-8 related:
> echo '1' =~ '\%#=1\o{\?Ä\Z'
> echo '1' =~ '\%#=2\o{\?Ä\Z'

Yes, that looks like a bug.

> From what I tested I got no segfaults, and most generated tests seem to
> pass. Please note that I count "new engine finding something which is
> not implemented" and "old engine does not parse regex" as success.
>
> Another test is this: (first is regex, second is the string to match
> against):
> [ú\Z] [""]
>
> I cannot reproduce this using plain VimL code like this:
>
> let reg = 'ú\Z'
> let t = ""
> echo matchlist('\%#=1'.reg, t)
> echo matchlist('\%#=2'.reg, t)
>
> Setting t to '1', however, causes the difference.
>
> result of matchlist:
> new: ['', '', '', '', '', '', '', '', '', '']
> old: []
>
> So maybe with t="" this is a readfile related issue, too?

Isn't this the same for the old and the new engine?


> RegexTests [\p\+] ["\236a"]
> new: ['a', '', '', '', '', '', '', '', '', '']
> old: ['ìa', '', '', '', '', '', '', '', '', '']
> Again a UTF-8 issue, as is this one:
> RegexTests [¤\|\Z] ["a"]

Yes, apparently \236 is not seen as a printable character.

> Tested with version: Included patches: 1-981
>
> I'm not sure what influences Vim's UTF-8 handling.
> I have &encoding=utf-8 set.

Need more testing...

> Client-server communication fails about once every 500 times - no bytes
> are returned.

Timing issue?

--
JOHN CLEESE PLAYED: SECOND SOLDIER WITH A KEEN INTEREST IN BIRDS, LARGE MAN
WITH DEAD BODY, BLACK KNIGHT, MR NEWT (A VILLAGE
BLACKSMITH INTERESTED IN BURNING WITCHES), A QUITE
EXTRAORDINARILY RUDE FRENCHMAN, TIM THE WIZARD, SIR
LAUNCELOT
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

/// Bram Moolenaar -- Br...@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///

mattn

May 21, 2013, 6:31:17 AM
to vim...@googlegroups.com, Marc Weber
On Tuesday, May 21, 2013 7:04:55 PM UTC+9, Bram Moolenaar wrote:

Some syntax highlighting doesn't work even with re=1.
For example:

1. type "google"
2. :syn clear | syn match ErrorMsg "\<go*\|go"

Using a Vim older than patch 970:

http://go-gyazo.appspot.com/a16424e8636121c6.png

Using Vim with patch 970 or later:

http://go-gyazo.appspot.com/4296eca38c88953d.png

Bram Moolenaar

May 21, 2013, 6:44:10 AM
to mattn, vim...@googlegroups.com, Marc Weber
It works OK for me. Remember that after doing :set re=1
you need to do "syn clear", because the compiled pattern
is cached.

Nevertheless, it's a bug in the new engine:

echo matchlist('google', '\%#=1\<go*\|go')
echo matchlist('google', '\%#=2\<go*\|go')

--
NEIL INNES PLAYED: THE FIRST SELF-DESTRUCTIVE MONK, ROBIN'S LEAST FAVORITE
MINSTREL, THE PAGE CRUSHED BY A RABBIT, THE OWNER OF A DUCK

mattn

May 21, 2013, 6:48:34 AM
to vim...@googlegroups.com, mattn, Marc Weber
On Tuesday, May 21, 2013 7:44:10 PM UTC+9, Bram Moolenaar wrote:
> Yasuhiro Matsumoto wrote:
>
> > On Tuesday, May 21, 2013 7:04:55 PM UTC+9, Bram Moolenaar wrote:
> >
> > Some syntax highlighting doesn't work even with re=1.
> > For example:
> >
> > 1. type "google"
> > 2. :syn clear | syn match ErrorMsg "\<go*\|go"
> >
> > Using a Vim older than patch 970:
> >
> > http://go-gyazo.appspot.com/a16424e8636121c6.png
> >
> > Using Vim with patch 970 or later:
> >
> > http://go-gyazo.appspot.com/4296eca38c88953d.png
>
> It works OK for me. Remember that after doing :set re=1
> you need to do "syn clear", because the compiled pattern
> is cached.
>
> Nevertheless, it's a bug in the new engine:
>
> echo matchlist('google', '\%#=1\<go*\|go')
> echo matchlist('google', '\%#=2\<go*\|go')

Sorry, I made a mistake in this post.

> Some syntax highlighting doesn't work even with re=1.

Some syntax highlighting doesn't work even with re=0.

mattn

May 21, 2013, 7:00:31 AM
to vim...@googlegroups.com, mattn, Marc Weber
> you need to do "syn clear", because the compiled pattern

I tested this after restarting Vim each time.

Bram Moolenaar

May 21, 2013, 7:27:28 AM
to mattn, vim...@googlegroups.com

Yasuhiro Matsumoto wrote:

> * Some plugins don't work

Can you please come up with reproducible examples?

> * \%u is disabled

You mean it does not work:

echo matchlist('yes no', '\%#=1\%u0020')
echo matchlist('yes no', '\%#=2\%u0020')

However, this does work:
echo matchlist('yes no', '\%#=0\%u0020')

It falls back to the old engine automatically.
So, what is the problem?

> * test64 contains tests for multi-byte

Which ones? I suppose you mean that the file contains multi-byte
characters. I can fix that.

> * test95 doesn't pass without enc=utf-8

I sent out a patch for that. Please verify it works.

--
Did Adam and Eve have navels?

Marc Weber

May 21, 2013, 7:52:27 AM
to Bram Moolenaar, vim...@vim.org
> I fixed \F yesterday. Hmm, but you say you include patch 981.
> What is the Vim command to reproduce this?
[1]

> > let reg = 'ú\Z'
> > let t = ""
> > echo matchlist('\%#=1'.reg, t)
> > echo matchlist('\%#=2'.reg, t)
> Isn't this the same for the old and the new engine?
Right, for whatever reason I can no longer reproduce the "" case!?
Strange. But t = "1" is still valid [2].

> Timing issue?


How to reproduce [1] and [2]:

Run this in Vim, uncommenting either case:

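python << EOF

def case(a, b):
    # Write the string(s) to match against and the regex to the two files
    # that RegexTest() below reads with readfile()
    with open("/tmp/VIM_REGEX_TEST_STRINGS", "w") as f:
        f.write(a)
    with open("/tmp/VIM_REGEX_TEST", "w") as f:
        f.write(b)

# case 1
# case("\0a", "\\_F")

# case 2
case("1", "ú\\Z")
EOF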

fun! RegexTest() abort
  echom 'starting'
  for regex in readfile('/tmp/VIM_REGEX_TEST')
    for str in readfile('/tmp/VIM_REGEX_TEST_STRINGS')
      try
        " old (backtracking) engine; an error is recorded as ['error']
        try
          let old = matchlist(str, '\%#=1'.regex)
        catch /.*/
          let old = ['error']
        endtry
        " new (NFA) engine
        try
          let new = matchlist(str, '\%#=2'.regex)
        catch /.*/
          let new = ['error']
        endtry
        " pass: same result, old engine can't parse it, or new engine gives up
        if old == new || old == ['error'] || new == []
          continue
        else
          echom 'bad =='
          echom 'new: '.string(new)
          echom 'old: '.string(old)
          echom 'returning 0'
          return '0'
        endif
      catch /.*/
        echoe v:exception
        echom 'returning exception 0'
        return '0'
      endtry
    endfor
  endfor
  echom 'returning 1'
  return '1'
endf

echo RegexTest()

Bram Moolenaar

May 21, 2013, 9:55:19 AM
to Marc Weber, vim...@vim.org

Marc Weber wrote:

> > I fixed \F yesterday. Hmm, but you say you include patch 981.
> > What is the Vim command to reproduce this?
> [1]
>
> > > let reg = 'ú\Z'
> > > let t = ""
> > > echo matchlist('\%#=1'.reg, t)
> > > echo matchlist('\%#=2'.reg, t)
> > Isn't this the same for the old and the new engine?
> Right, for whatever reason I can no longer reproduce the "" case!?
> Strange. But t = "1" is still valid [2].

The test has the arguments swapped. It should be:

let reg = 'ú\Z'
let t = "t"
echo matchlist(t, '\%#=1'.reg)
echo matchlist(t, '\%#=2'.reg)

Still fails.

> > Timing issue?
>
>
> How to reproduce [1] and [2]:
>
> Run this in Vim, uncommenting either case:
>
> python << EOF

That's complicated. This should do the same:

echo matchlist("1", '\%#=1ú\Z')
echo matchlist("1", '\%#=2ú\Z')

It's the same problem as above.

The second one doesn't work like that; does it require the NUL to be in
a buffer line? No, I can't reproduce this. Can you reproduce the
difference with any Vim search or expression?


--
FATAL ERROR! SYSTEM HALTED! - Press any key to continue doing nothing.

Bram Moolenaar

May 21, 2013, 10:27:34 AM
to Ken Takata, vim...@googlegroups.com

Ken Takata wrote:

> 2013/05/21 Tue 9:16:55 UTC+9 mattn wrote:
> > * Some plugins don't work
> > * \%u is disabled
> > * test64 contains tests for multi-byte
> > * test95 doesn't pass without enc=utf-8
>
> * Strange condition on line 1094:
> if (*regparse == 'n' || *regparse == 'n')

Yeah, that's clearly wrong. Now I wonder what it was supposed to be.
'r' perhaps?

> \%u seems to be disabled because of lack of testing.
> See the comment on line 1104.

This is only inside a range. Yeah, this function is much too long.
So this is about \d, \o, etc. inside []. Not about \%d etc.
Look at line 846.

--
People who want to share their religious views with you
almost never want you to share yours with them.

Bram Moolenaar

May 21, 2013, 11:51:47 AM
to Yasuhiro MATSUMOTO, vim...@googlegroups.com

Yasuhiro Matsumoto wrote:

> One more. \ze does not work.

I disabled that, because it's flawed in the new engine, so the automatic
selection should use the old engine, right?

--
Mushrooms always grow in damp places and so they look like umbrellas.

Christian Brabandt

May 21, 2013, 3:21:10 PM
to vim...@googlegroups.com
Hi Bram!

On Tue, 21 May 2013, Bram Moolenaar wrote:

>
> Ken Takata wrote:
>
> > 2013/05/21 Tue 9:16:55 UTC+9 mattn wrote:
> > > * Some plugins don't work
> > > * \%u is disabled
> > > * test64 contains tests for multi-byte
> > > * test95 doesn't pass without enc=utf-8
> >
> > * Strange condition on line 1094:
> > if (*regparse == 'n' || *regparse == 'n')
>
> Yeah, that's clearly wrong. Now I wonder what it was supposed to be.
> 'r' perhaps?

Yes, from the context I think so.

regards,
Christian
--
So-called superstition rests on a far greater depth and
delicacy than unbelief.
-- Johann Wolfgang von Goethe (to Riemer, 12 Dec. 1806)

John Marriott

May 21, 2013, 4:05:45 PM
to vim...@googlegroups.com
Hi all,

I apologise if this has already been covered elsewhere, but there is so much going on with the new regexp engine that I'm finding it difficult to keep up.

I have all patches for 7.3 from 1 to 1000. Setting regexpengine to 2 in my .vimrc on HP-UX and _vimrc on Win64 gives these messages when opening a C source file (say main.c from Vim's source):
Error detected while processing /trace/tjmt1/vim/73/vim73/runtime/syntax/c.vim:
line  154:
E475: Invalid argument: cBracket^Itransparent start='\[\|<::\@!' end=']\|:>' end='}'me=s-1 contains=ALLBUT,cBlock,@cParenGroup,cErrInParen,cCppParen,cCppBracket,cCppString,@Spell
line  344:
E475: Invalid argument: cCppOutWrapper^Istart="^\s*\(%:\|#\)\s*if\s\+0\+\s*\($\|//\|/\*\|&\)" end=".\@=\|$" contains=cCppOutIf,cCppOutElse fold
Press ENTER or type command to continue

After the file is loaded, closing square brackets are highlighted.

Opening the syntax file c.vim gives:
Error detected while processing /trace/tjmt1/vim/73/vim73/runtime/syntax/vim.vim:
line  114:
E475: Invalid argument: vimInsert^Imatchgroup=vimCommand start="^[: \t]*\(\d\+\(,\d\+\)\=\)\=a\%[ppend]$"^Imatchgroup=vimCommand end="^\.$""
line  115:
E475: Invalid argument: vimInsert^Imatchgroup=vimCommand start="^[: \t]*\(\d\+\(,\d\+\)\=\)\=c\%[hange]$"^Imatchgroup=vimCommand end="^\.$""
line  116:
E475: Invalid argument: vimInsert^Imatchgroup=vimCommand start="^[: \t]*\(\d\+\(,\d\+\)\=\)\=i\%[nsert]$"^Imatchgroup=vimCommand end="^\.$""
line  120:
E475: Invalid argument: vimBehave^I"\<be\%[have]\>" skipwhite nextgroup=vimBehaveModel,vimBehaveError
line  128:
E475: Invalid argument: vimFiletype^I"\<filet\%[ype]\(\s\+\I\i*\)*"^Iskipwhite contains=vimFTCmd,vimFTOption,vimFTError
line  141:
E475: Invalid argument: vimAugroup^Istart="\<aug\%[roup]\>\s\+\h\w*" end="\<aug\%[roup]\>\s\+[eE][nN][dD]\>"^Icontains=vimAugroupKey,vimAutoCmd,@vimAugroupList keepend

<a lot of lines removed>

line  733:
E475: Invalid argument: vimEmbedError start=+mz\%[scheme]\s*<<\s*\z(.*\)$+ end=+^\z1$+
line  749:
E475: Invalid argument: vimAugroupSyncA^Igroupthere NONE^I"\<aug\%[roup]\>\s\+[eE][nN][dD]"
Press ENTER or type command to continue

Of course this doesn't happen when regexpengine is 1.

Cheers
John

Christian Brabandt

May 21, 2013, 4:12:58 PM
to vim...@googlegroups.com
Hi John!

On Wed, 22 May 2013, John Marriott wrote:

> Hi all,
>
> I apologise if this has already been covered elsewhere, but there is so much
> going on with the new regexp engine that I'm finding it difficult to keep up.
>
> I have all patches for 7.3 from 1 to 1000. Setting regexpengine to 2 in my
> .vimrc on HP-UX and _vimrc on Win64 gives these messages when opening a C
> source file (say main.c from vim's source):
> Error detected while processing /trace/tjmt1/vim/73/vim73/runtime/syntax/c.vim:
> line 154:
> E475: Invalid argument: cBracket^Itransparent start='\[\|<::\@!' end=']\|:>'

\@ assertions are not supported yet by the new engine.

> line 114:
> E475: Invalid argument: vimInsert^Imatchgroup=vimCommand start="^[: \t]*\(\d\+\
> (,\d\+\)\=\)\=a\%[ppend]$"^Imatchgroup=vimCommand end="^\.$""

\%[...] is not supported yet by the new engine.

And also not supported yet are the many \%X atoms and \_[...]
collections.

I am not sure whether all of those need to be fixed before the release.
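For simple cases, an optional sequence such as `a\%[ppend]` (which matches "a", "ap", "app", ... up to "append") can be rewritten as nested optional groups, something an engine without `\%[]` support can already handle. A minimal Python sketch of that rewrite, assuming the brackets hold only plain literal characters (no `[abc]` collections or `\(\)` groups inside) and emitting Python `re` syntax rather than Vim's; `expand_optional_sequence` is a hypothetical helper, not part of Vim:

```python
import re


def expand_optional_sequence(prefix, optional):
    r"""Rewrite Vim's  prefix\%[optional]  as nested optional groups in
    Python re syntax: ("a", "ppend") -> a(?:p(?:p(?:e(?:n(?:d)?)?)?)?)?
    Only plain literal characters are supported inside the brackets."""
    body = ""
    for ch in reversed(optional):
        # Each character is optional, but only if all earlier ones matched.
        body = "(?:" + re.escape(ch) + body + ")?"
    return re.escape(prefix) + body
```

Vim's help describes `\%[]` as matching as much of the list as possible; the greedy `?` on each nested group gives the same "longest prefix" behaviour in this restricted case.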

regards,
Christian
--
Mother's last words:
"I went and sorted your floppy disks."

James McCoy

May 21, 2013, 4:28:35 PM
to vim_dev


On May 21, 2013 4:06 PM, "John Marriott" <basi...@internode.on.net> wrote:
> I have all patches for 7.3 from 1 to 1000. Setting regexpengine to 2 in my .vimrc

> [...]

> Of course this doesn't happen when regexpengine is 1.

As I understand it, the default setting is 0 for a reason. The new engine isn't going to handle all the cases the existing engine does. The intention of the default setting is to use the new one where it's better and fall back to the old one when needed, so I don't think you should generally be running with only the new engine active.
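The automatic selection described above can be pictured as "try the NFA compiler first, fall back when it reports an unsupported item". The Python sketch below only illustrates that idea and is not Vim's C implementation: `Unsupported`, `compile_nfa`, `compile_backtracking` and `compile_pattern` are hypothetical, the substring check is a crude stand-in for real parsing, and the `UNSUPPORTED` list just collects items this thread names as not yet handled (`\%[`, `\@`, `\ze`, `\%u`).

```python
class Unsupported(Exception):
    """Raised by the hypothetical NFA compiler for items it cannot handle."""


# Items named in this thread as not (yet) handled by the new engine.
UNSUPPORTED = [r"\%[", r"\@", r"\ze", r"\%u"]


def compile_nfa(pattern):
    for item in UNSUPPORTED:
        if item in pattern:
            raise Unsupported(item)
    return ("nfa", pattern)


def compile_backtracking(pattern):
    return ("backtracking", pattern)


def compile_pattern(pattern, regexpengine=0):
    """0 = automatic, 1 = old engine only, 2 = NFA engine only."""
    # A leading \%#=N in the pattern overrides the option, as in the
    # matchlist() examples in this thread.
    if pattern.startswith("\\%#=") and pattern[4:5] in ("0", "1", "2"):
        regexpengine = int(pattern[4])
        pattern = pattern[5:]
    if regexpengine == 1:
        return compile_backtracking(pattern)
    if regexpengine == 2:
        # No fallback: an unsupported item surfaces as an error.
        return compile_nfa(pattern)
    try:
        return compile_nfa(pattern)
    except Unsupported:
        return compile_backtracking(pattern)
```

With the default of 0, `a\%[ppend]` quietly ends up on the old engine, while forcing 2 turns the same item into an error, roughly the situation behind the E475 messages reported above.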

Cheers,
James

Bram Moolenaar

May 21, 2013, 4:45:53 PM
to John Marriott, vim...@googlegroups.com

John Marriott wrote:

> <html>
> <head>
> <meta content="text/html; charset=ISO-8859-1"
> http-equiv="Content-Type">
> </head>

Can you please send text messages?

As mentioned, don't set 'regexpengine' to 2 except for testing.

--
THEOREM: VI is perfect.
PROOF: VI in roman numerals is 6. The natural numbers < 6 which divide 6 are
1, 2, and 3. 1+2+3 = 6. So 6 is a perfect number. Therefore, VI is perfect.
QED
-- Arthur Tateishi

Christian Brabandt

May 21, 2013, 5:24:27 PM
to vim...@googlegroups.com

On Tue, 21 May 2013, Christian Brabandt wrote:

> And also not supported yet are \_[...]
> collections.

BTW: This patch enables the \_[...] collections.

diff --git a/src/regexp_nfa.c b/src/regexp_nfa.c
--- a/src/regexp_nfa.c
+++ b/src/regexp_nfa.c
@@ -679,9 +679,7 @@

/* "\_[" is collection plus newline */
if (c == '[')
- /* TODO: make this work
- * goto collection; */
- return FAIL;
+ goto collection;

/* "\_x" is character class plus newline */
/*FALLTHROUGH*/
@@ -891,7 +889,7 @@
}
break;

-/* collection: */
+collection:
case Magic('['):
/*
* Glue is emitted between several atoms from the [].


regards,
Christian
--
Skepticism is the mark - and even the pose - of the educated
mind.
-- John Dewey

h_east

May 22, 2013, 5:28:02 AM
to vim...@googlegroups.com
Hi, Bram

I found a bug in the NFA regexp engine.

Prerequisite
vim 7.3.1004
set re=0

How to reproduce
vim -N -u NONE -i NONE --noplugin
o<Esc>
/\n* or /\_.*

Actual
Vim goes into an infinite loop.

How to regain control (on Linux):
pgrep -lf vim
kill -9 nnn <-- nnn is the Vim process ID reported by pgrep.


I attached a patch.
Please check it.

Best regards,
Hirohito Higashi

nfa_regexp_infiniteloop.patch

Bram Moolenaar

May 22, 2013, 8:38:16 AM
to Christian Brabandt, vim...@googlegroups.com

Christian Brabandt wrote:

> On Tue, 21 May 2013, Christian Brabandt wrote:
>
> > And also not supported yet are \_[...]
> > collections.
>
> BTW: This patch enables the \_[...] collections.
>
> diff --git a/src/regexp_nfa.c b/src/regexp_nfa.c
> --- a/src/regexp_nfa.c
> +++ b/src/regexp_nfa.c
> @@ -679,9 +679,7 @@
>
> /* "\_[" is collection plus newline */
> if (c == '[')
> - /* TODO: make this work
> - * goto collection; */
> - return FAIL;
> + goto collection;
>
> /* "\_x" is character class plus newline */
> /*FALLTHROUGH*/
> @@ -891,7 +889,7 @@
> }
> break;
>
> -/* collection: */
> +collection:
> case Magic('['):
> /*
> * Glue is emitted between several atoms from the [].
>

That just enables it, doesn't fix the problems caused.

I believe the pattern that failed was \_[0-9]\?\>.
Somehow the \_[] eats the line break and then \> fails to match.

Discovered using the dnsmasq syntax, especially the DnsmasqIPv4
match. Using syntax files as a wild test for regexp works quite well.
But it does not find everything (and I still have too few files to test
with).

Here are some more patterns that still fail:

Multi-byte problem? (Marc Weber)
echo matchlist('1', '\%#=1\o{\?Ä\Z')
echo matchlist('1', '\%#=2\o{\?Ä\Z')

Difference in matching this pattern: (Marc Weber)
echo matchlist("t", '\%#=1ú\Z')
echo matchlist("t", '\%#=2ú\Z')

Difference in matching this pattern:
echo matchlist('google', '\%#=1\<go*\|go')
echo matchlist('google', '\%#=2\<go*\|go')

Difference in matching this pattern: (Marc Weber)
echo matchlist("\na", '\%#=1\_F')
echo matchlist("\na", '\%#=0\_F')
echo matchlist("\na", '\%#=2\_F')

--
If you don't get everything you want, think of
everything you didn't get and don't want.

Bram Moolenaar

May 22, 2013, 10:39:26 AM
to h_east, vim...@googlegroups.com
Great to see a patch for a regexp bug. I'll try it out tonight.

It would be nice if we can add a test for every problem we find,
so that we make sure it doesn't come back later.

--
CART DRIVER: Bring out your dead!
There are legs stick out of windows and doors. Two MEN are fighting in the
mud - covered from head to foot in it. Another MAN is on his hands in
knees shovelling mud into his mouth. We just catch sight of a MAN falling
into a well.
"Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

Marc Weber

May 22, 2013, 10:50:22 AM
to vim_dev
Hi Bram,

I'd be interested in a high-level overview of what can/should be done
before the release can happen.

This way people who are willing to help (possibly me) can pick a task,
document that they are working on it, and get it done.

Does such an overview exist? If not, a Titanpad or a public Google
document would be a nice fit.
Thoughts?

Marc Weber

Yasuhiro MATSUMOTO

May 22, 2013, 1:02:19 PM
to vim...@googlegroups.com, h_east
Currently, I'm trying to fix \%#, \%23c and \%<23c.

https://gist.github.com/mattn/5626661

I'll send a patch in later.

mattn

May 23, 2013, 8:12:38 AM
to vim...@googlegroups.com, h_east
Bram, I fixed:

* \%#
* \%23c \%23l
* \%<23c \%<23l
* \%>23c \%>23l
* Treat leading "*" as star character
* \_[
* \@!

https://gist.github.com/mattn/5626661

Please check this patch.

- Yasuhiro Matsumoto

Bram Moolenaar

May 23, 2013, 1:47:18 PM
to Yasuhiro MATSUMOTO, vim...@googlegroups.com, h_east

Yasuhiro Matsumoto wrote:

> Currently, I'm trying to fix \%#, \%23c and \%<23c.
>
> https://gist.github.com/mattn/5626661
>
> I'll send a patch in later.

I very much appreciate the help.

Please also write tests. Lots of bugs slipped through because we don't
have sufficient testing.


--
DENNIS: Look, strange women lying on their backs in ponds handing out
swords ... that's no basis for a system of government. Supreme
executive power derives from a mandate from the masses, not from some
farcical aquatic ceremony.

Bram Moolenaar

May 23, 2013, 1:47:20 PM
to Marc Weber, vim_dev
The main work now is to fix bugs in the new regexp engine and make it faster.

There are items at the top of the todo file, but most of them are
patches that I need to review.

Another thing is to discuss the recent changes to the Python API and
what should still be included in the coming week.

Friday is the deadline for new features and larger changes. After that
it's bug fixing only!

--
OLD WOMAN: Well, how did you become king, then?
ARTHUR: The Lady of the Lake, her arm clad in the purest shimmering samite,
held Excalibur aloft from the bosom of the water to signify by Divine
Providence ... that I, Arthur, was to carry Excalibur ... That is
why I am your king!

Marc Weber

May 23, 2013, 2:44:45 PM
to Bram Moolenaar, vim_dev
As I've said: having Vim read plugin/*.vim files and add a
runtimepath/python-lib dir (if it exists) to sys.path would be genius.

Then people could use only plugin/plugin.py and have it lazily load its
code using "import".
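Marc's proposal can be sketched in plain Python. This illustrates the idea only and does not describe an existing Vim feature as of this thread: the `python-lib` directory name comes from the message above, `add_python_lib_dirs` is a hypothetical helper, and inside Vim the runtimepath string could presumably be obtained with `vim.eval('&runtimepath')`. Escaped commas inside runtimepath entries are deliberately not handled.

```python
import os
import sys


def add_python_lib_dirs(runtimepath, subdir="python-lib", path=None):
    """For each entry of a comma-separated 'runtimepath' value, append
    {entry}/{subdir} to the module search path if that directory exists.
    Returns the directories that were added.

    Simplification: escaped commas inside entries are not handled.
    """
    if path is None:
        path = sys.path
    added = []
    for entry in runtimepath.split(","):
        candidate = os.path.join(os.path.expanduser(entry), subdir)
        if os.path.isdir(candidate) and candidate not in path:
            path.append(candidate)
            added.append(candidate)
    return added
```

With the directories on `sys.path`, a small plugin/plugin.py could do its `import` inside the functions that need the code, so the bulk of a plugin is only loaded on first use.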

Marc Weber

Ben Fritz

May 23, 2013, 3:16:17 PM
to vim...@googlegroups.com, Marc Weber
On Thursday, May 23, 2013 12:47:20 PM UTC-5, Bram Moolenaar wrote:
>
> Friday is the deadline for new features and larger changes. After that
> it's bug fixing only!

Friday in 8 days, I assume, not Friday tomorrow?

Bram Moolenaar

May 23, 2013, 3:53:21 PM
to Ben Fritz, vim...@googlegroups.com, Marc Weber
Yes, end of May.

--
Hanson's Treatment of Time:
There are never enough hours in a day, but always too
many days before Saturday.

Yasuhiro MATSUMOTO

May 23, 2013, 7:11:55 PM
to vim...@googlegroups.com, h_east
OK, I'll do it.
However, I suspect there are some cases where the new regexp engine is
slower than the original. This change makes the HTML syntax file work,
but re=2 seems to be slower than re=1.

Bram Moolenaar

May 24, 2013, 9:15:47 AM
to Yasuhiro MATSUMOTO, vim...@googlegroups.com, h_east

Yasuhiro Matsumoto wrote:

> OK, I'll do it.
> However, I suspect there are some cases where the new regexp engine is
> slower than the original. This change makes the HTML syntax file work,
> but re=2 seems to be slower than re=1.

The NFA engine is known to be slower on simple patterns, but much faster
on complicated patterns. Especially patterns with "*" or "\+" should be
faster, because the backtracking engine makes many attempts and retries,
while the NFA engine explores all possible solutions in parallel.

After tuning and fixing bugs I want to only use the NFA engine for
complicated patterns, then we should really see the advantage.
The line length also matters.

Of course we can't use the NFA engine for items it does not support,
thus a mix of "*" and any not supported item would result in falling
back to the (slow) old engine.
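The speed difference can be seen in a toy setting. The Python sketch below is an illustration only, not Vim's C code: it supports just literal characters, `.` and a postfix `*`, and requires the whole text to match. `backtrack` tries one alternative at a time and retries on failure; `nfa` keeps the set of pattern positions that are still alive and advances all of them together over each character, so its work stays within pattern length times text length. The step counters exist only to make the cost visible.

```python
def parse(pattern):
    """Split a toy pattern into (char, starred) items; '.' matches any char."""
    items, i = [], 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        items.append((pattern[i], starred))
        i += 2 if starred else 1
    return items


def char_ok(c, t):
    return c == "." or c == t


def backtrack(items, text):
    """Depth-first search over alternatives, like a backtracking engine."""
    steps = 0

    def match(i, j):
        nonlocal steps
        steps += 1
        if i == len(items):
            return j == len(text)
        c, starred = items[i]
        if starred:
            # Greedy: first try to eat one more char, then try leaving the star.
            if j < len(text) and char_ok(c, text[j]) and match(i, j + 1):
                return True
            return match(i + 1, j)
        return j < len(text) and char_ok(c, text[j]) and match(i + 1, j + 1)

    return match(0, 0), steps


def nfa(items, text):
    """Thompson-style simulation: advance all live states in lockstep."""
    steps = 0

    def closure(states):
        # A starred item may be skipped without consuming input. The
        # 'result' set keeps this finite, however many items can match "".
        result, stack = set(), list(states)
        while stack:
            i = stack.pop()
            if i in result:
                continue
            result.add(i)
            if i < len(items) and items[i][1]:
                stack.append(i + 1)
        return result

    current = closure({0})
    for t in text:
        nxt = set()
        for i in current:
            steps += 1
            if i < len(items) and char_ok(items[i][0], t):
                nxt.add(i if items[i][1] else i + 1)
        current = closure(nxt)
        if not current:
            break
    return len(items) in current, steps
```

With the pattern `a*a*a*a*a*a*b` and fifteen `a`s (no `b`), the backtracking version makes tens of thousands of calls before giving up, while the simulation does seven checks per character. On a short, simple pattern the set bookkeeping can make the simulation the slower of the two, in line with the remark above about simple patterns.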


--
It is too bad that the speed of light hasn't kept pace with the
changes in CPU speed and network bandwidth. -- <wie...@porcupine.org>

mattn

May 26, 2013, 9:23:11 PM
to vim...@googlegroups.com, Yasuhiro MATSUMOTO, h_east
On Friday, May 24, 2013 2:47:18 AM UTC+9, Bram Moolenaar wrote:
> Please also write tests. Lots of bugs slipped through because we don't
> have sufficient testing.


https://gist.github.com/mattn/5626661

I added test96.

Bram Moolenaar

May 27, 2013, 5:32:51 AM
to mattn, vim...@googlegroups.com, h_east
Great, thanks.

I fixed all the known problems in the NFA engine. I was looking into
making it work faster. However, one syntax that is known to be slow,
XML, falls back to the old engine. Thus we need to add the missing
features first.

Adding the \@<= item will be difficult though. I wonder if there is any
non-backtracking regexp engine that does something like this.

--
I learned the customs and mannerisms of engineers by observing them, much the
way Jane Goodall learned about the great apes, but without the hassle of
grooming.
(Scott Adams - The Dilbert principle)