RFE: support POSIX standard and developing RE's


L. A. Walsh

Mar 24, 2016, 6:27:35 AM
to Vim Users, vim...@googlegroups.com
POSIX already has two official RE's: the modern RE's (extended RE's,
as in grep -E) and the "obsolete RE's" found in ed, called "basic RE's".

Additionally, for the past few years, more GNU utils (like grep -P)
have started supporting a third type of RE called
PCRE [Perl Compatible RE's], which seems to be on its way
to becoming a third official type of RE.

Would it be possible to add these three RE's, each invoked with its
own flag, alongside the existing RE's rather than as a replacement
for any of them?

This would help those who know the POSIX-compatible RE's that
are becoming more widespread, and would allow easier, direct
use (cut & paste) of the alternate RE's. Specifically, it would
make it easier to define these expressions in shell variables and/or
Vim macros, for easier portability and usability between
Vim and other POSIX & GNU utils. Note that in the past few years
PCRE has also added Python-specific features to its
syntax to allow for easier porting of Python features.

Probably (or maybe) best of all, as all of these RE's are
becoming more prevalent in POSIX, Unix and Linux environments,
it would be a great benefit for people to be able to switch
to alternate RE's based on familiarity and the greater
uniformity in these classes.

It seems this would lower the learning curve for RE usage in
Vim, which often differs idiosyncratically from those standards,
requiring much trial and error and wasted time to get
Vim-compatible RE's that are equivalent to other
industry-standard RE's.

Anyway, I thought I'd mention this, since Vim already has
multiple RE's that are incompatible with existing standards, and I
thought that providing a few "new POSIX-compat RE's" would
only help in making Vim easier to use.

Thanks for your time!
-linda


Of course,

Christian Brabandt

Mar 24, 2016, 8:04:02 AM
to Vim Users, vim...@googlegroups.com
There is https://github.com/vim/vim/issues/99
You might want to check whether this works for you.

Best,
Christian
--
May you live as long as you want, and want as long as you live!

BPJ

Mar 24, 2016, 8:13:16 AM
to vim...@googlegroups.com, vim...@googlegroups.com
If I'm not mistaken that's "extended" as in /x,  a different sense from "extended" as in ERE.

I would like to have "extended as in /x" FWIW.

--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_use" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

shawn wilson

Mar 24, 2016, 2:06:42 PM
to vim...@googlegroups.com, vim...@googlegroups.com

Instead of implementing one or another regex type in core, it might be better to know about and hook into libs for their regex engines. For example, libperl for perl's engine when +perl or libpcre as another option. IDK if you can do the same with python, I think you can with ruby and IIRC Lua uses libpcre.

L. A. Walsh

Apr 12, 2016, 1:46:02 PM
to Vim Users, vim...@googlegroups.com
Christian Brabandt wrote:
> There is https://github.com/vim/vim/issues/99
> You might want to check, if this works for you.
----
    If vim supported posix extended RE's, then, like, say grep,
it could also support Perl RE's, from the PCRE library.  Perl supports
the "/x" to ignore whitespace for readability.  I.e. the author was saying
they wanted to implement some flavor of PCRE's, but really wanted the "/x"
feature, which would have been a lot easier to do from Vim's current
feature set.

    If Vim could _at least_ support extended 'RE's, and if it was done
in a modular fashion, then it seems adding other 'RE' engines would be
easier.  Note, I don't know about current benchmarks, but PCRE was the
fastest 'RE' engine out of any of the standard 'RE' engines, as well as,
by far, the most expressive.  Perl even bent over backwards to implement
Python-RE specific features to make it easy to port Python-RE's along
with all the POSIX RE's.

-----------------------------------------------------------------
BPJ wrote:
>> There is https://github.com/vim/vim/issues/99
>> You might want to check, if this works for you.
>
> If I'm not mistaken that's "extended" as in /x,  a different sense from "extended" as in ERE.
>
> i would like to have "extended as in /x" FWIW.

If vim could include the PCRE engine, you'd have this automatically.
And you are right: "/x" is not the same as POSIX extended RE's, but is the
same as PCRE's "/x" switch.
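
For readers unfamiliar with the "/x" sense of "extended": a minimal sketch of the idea, using Python's `re.VERBOSE` flag as a stand-in because it behaves the same way for this purpose. Python's `re` is neither Perl nor PCRE, and the date pattern and sample string are made up for illustration.

```python
import re

# Compact form: correct, but hard to read at a glance.
compact = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

# Same pattern in "/x" style: whitespace is ignored and comments are allowed.
verbose = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    re.VERBOSE,
)

sample = "posted 2016-04-12 to vim_use"
compact_groups = compact.search(sample).groups()
verbose_groups = verbose.search(sample).groups()
print(compact_groups == verbose_groups)
```

Both objects match the same strings; only the way the pattern is written differs, which is why "/x" is unrelated to the POSIX BRE/ERE distinction.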



L. A. Walsh

Apr 12, 2016, 2:00:01 PM
to vim...@googlegroups.com, vim...@googlegroups.com
shawn wilson wrote:
>
> Instead of implementing one or another regex type in core, it might be
> better to know about and hook into libs for their regex engines. For
> example, libperl for perl's engine when +perl or libpcre as another
> option. IDK you can do the same with python, I think you can with ruby
> and IIRC Lua uses libpcre.
>
----
I mostly agree. While I would *want* and love the +perl/libpcre
option @ compile time, I wouldn't want it to change the standard/current
vim RE, else millions of lines of current macros would break. Choosing
which RE, whether it be POSIX ex-RE's, or PCRE's (which I prefer even
more), needs to be done in a _sane_ orthogonal way.

Sadly, the method grep uses -{F,G,E,P} = fixed-strings, basic (often
called obsolete RE's, though supported as default in most GNU
tools for compat. purposes), Extended, and PCRE's.

Maybe having "\X" & "\P" for extended and pcre's would be a start,
though I'd _like_ to see a way of choosing different RE's for
use in macros & .vim files (for compat), and a 2nd option for
interactive RE's (thus eliminating the need for the "/[vmMVXP]"
on each search or substitute).

Note that PCRE's are frequently provided as a 4th option to the
3 POSIX RE's in gnu and other tools -- they are not just used in
Perl (as the name might lead some to think). I think that's one
reason why Perl added the python-specific features so it could
still retain its "most feature rich" status (but that's speculation
on my part).

The thing that blew me away -- PCRE's are faster than even
plain-text searches (let alone the Basic and Extended ones).




Christian Brabandt

Apr 12, 2016, 3:15:30 PM
to Vim Users, vim...@googlegroups.com
On Tue, 12 Apr 2016, L. A. Walsh wrote:

> Christian Brabandt wrote:
> >There is https://github.com/vim/vim/issues/99
> >You might want to check, if this works for you.
> ----
> If vim supported posix extended RE's, then, like, say grep,
> it could also support Perl RE's, from the PCRE library. Perl supports
> the "/x" to ignore whitespace for readability. I.e. the author was saying
> they wanted to implement some flavor of PCRE's, but really wanted the "/x"
> feature, which would have been alot easier to do from Vim's current
> feature set.

The thing is, Vim's RE's support atoms that other RE engines do not
support. Think about e.g. \_. \< \%l \%'m

That makes adding another RE engine hard.


Best,
Christian
--
Hello, guinea pig owner!

Erik Christiansen

Apr 14, 2016, 6:14:46 AM
to Vim Users, vim...@googlegroups.com
On 12.04.16 21:15, Christian Brabandt wrote:
> On Di, 12 Apr 2016, L. A. Walsh wrote:
> > If vim supported posix extended RE's,

Some of us have been asking for that for around a decade now.
So many unix utilities support POSIX "Modern" EREs, that it is the best
standard to conform to. There's then only one regex dialect to learn.
(Cue horn fanfare and singing angels)

> The thing is, Vims RE support atoms, that other RE engines do not
> support. Think about e.g. \_. \< \%l \%'m

Never heard of 'em, and don't waste wet RAM on dialect tricks which
won't work in grep and awk, and ... , as it just leads to frustration.

> That makes adding another RE engine hard.

If so, it's only hard once, not every day, as with cross-tool regex
chaos.

It is Unix which "is the IDE", not any single application. The "Eclectic
Rubbish Lister" has wandered off into insular dialect land.
Unfortunately Vim has committed the same folly. Now it is time to pay
the piper.

Where to from here, then? To bring order, can we not _finally_ adopt
POSIX EREs, adding the parochial \_. \< \%l \%'m stuff as extensions?

I did compile Vim with a POSIX ERE regex engine many years ago. It
worked fine, but the help broke, there wasn't time to fix that, and I
only used it for a couple of months.

So substituting an improved RE engine is not difficult. Extending that
to add vimishness might take a little longer, but it has been done in
the existing engine. It would be wonderful if that could be done in my
lifetime.

Personally, I'd settle for a compile option which simply substituted
POSIX EREs, without breaking the help. The vimishness could then go
hang. Would that also suffice for the others advocating POSIX ?

A subsequent step might then be to add vimishness, and make the new
engine mainstream? VEREs anyone?

Erik

Christian Brabandt

Apr 14, 2016, 8:40:30 AM
to vim...@googlegroups.com
On 2016-04-14 12:14, Erik Christiansen wrote:
> On 12.04.16 21:15, Christian Brabandt wrote:
>> On Di, 12 Apr 2016, L. A. Walsh wrote:
>> > If vim supported posix extended RE's,
>
> Some of us have been asking for that for around a decade now.

Well, some of us have been asking for other features even longer
and have even contributed code. Look at the vartabs feature

> So many unix utilities support POSIX "Modern" EREs, that it is the best
> standard to conform to. There's then only one regex dialect to learn.
> (Queue horn fanfare and singing angels)

And that is an argument for what, considering that vi comes from a time
when BREs were the default RE dialect?

>> The thing is, Vims RE support atoms, that other RE engines do not
>> support. Think about e.g. \_. \< \%l \%'m
>
> Never heard of 'em, and don't waste wet RAM on dialect tricks which
> won't work in grep and awk, and ... , as it just leads to frustration.

See, if you really want to discuss this seriously, you should try to be
polite and not troll. Then you should know that just because you don't
need a feature does not mean we should not implement it. And perhaps you
should spend a little time in :h pattern.txt and read what those patterns
are for, before you come to the conclusion that this is not needed.

>
>> That makes adding another RE engine hard.
>
> If so, it's only hard once, not every day, as with cross-tool regex
> chaos.

Please look at all the bugs that needed to be fixed when integrating the
second engine before you say this.

> It is Unix which "is the IDE", not any single application. The
> "Eclectic
> Rubbish Lister" has wandered off into insular dialect land.
> Unfortunately Vim has committed the same folly. Now it is time to pay
> the piper.
>
> Where to from here, then? To bring order, can we not _finally_ adopt
> POSIX EREs, adding the parochial \_. \< \%l \%'m stuff as extensions?

Sure. Code speaks louder than words, and someone has to make the effort.
The fact that this has not been done could mean that nobody really
cared about POSIX ERE compatibility.

> I did compile Vim with a POSIX ERE regex engine many years ago. It
> worked fine, but the help broke, there wasn't time to fix that, and I
> only used it for a couple of months.


I really, really doubt this was ever possible. Please tell us exactly
which version this was and which POSIX ERE engine you used. The current
codebase uses a lot of the Vim-specific regex functions, so I would be
surprised if this actually compiled.


> So substituting an improved RE engine is not difficult. Extending that
> to add vimishness might take a little longer, but it has been done in
> the existing engine. It would be wonderful if that could be done in my
> lifetime.
>
> Personally, I'd settle for a compile option which simply substituted
> POSIX EREs, without breaking the help. The vimishness could then go
> hang. Would that also suffice for the others advocating POSIX ?
>
> A subsequent step might then be to add vimishness, and make the new
> engine mainstream? VEREs anyone?

Well, nobody prevents you from contributing ;)

Best,
Christian

Erik Christiansen

Apr 14, 2016, 10:35:24 AM
to vim...@googlegroups.com
On 14.04.16 14:40, Christian Brabandt wrote:
> Am 2016-04-14 12:14, schrieb Erik Christiansen:
> >So many unix utilities support POSIX "Modern" EREs, that it is the best
> >standard to conform to. There's then only one regex dialect to learn.
> >(Queue horn fanfare and singing angels)
>
> And that is an argument for what, considering that vi comes from a time,
> where BRE where the default RE dialect?

Consistent regexes across unix utilities. Perhaps I was not sufficiently
explicit in that regard? I note the deep attachment to obsolete BREs
expressed above, but the rest of the world has moved on to modern EREs.

O'Reilly's "Mastering Regular Expressions" mentions that "POSIX
standardized the workings of over 70 programs, including traditional
regex-wielding tools such as awk, ed, egrep, expr, grep, and sed."
(And mutt, lex, ...)

$ man 7 regex

> >>The thing is, Vims RE support atoms, that other RE engines do not
> >>support. Think about e.g. \_. \< \%l \%'m
> >
> >Never heard of 'em, and don't waste wet RAM on dialect tricks which
> >won't work in grep and awk, and ... , as it just leads to frustration.
>
> See, if you really want to discuss seriously, you should try to be
> polite and do not troll.

Please accept my apologies for expressing a personal preference. I
didn't realise that you would not permit that.

> Then you should know, that just because you don't need a feature, does
> not mean, we should not implement it.

Excuse me, but now you are fantasising, I submit. How can you possibly
extrapolate that from one personal preference. (Admittedly expressed
without your prior permission)

> And perhaps you should spend a little time in :h pattern.txt and read
> what those patterns are for, before you come to the conclusion that
> this is not needed.

Where have I said that it is not needed? My consistent cross-tool use of
posix idiom means that I get along fine without it, but I have made no
suggestions about anyone else's usage. My migration proposal was careful
to avoid inconvenience to adherents of obsolete regex dialects.

> >
> >>That makes adding another RE engine hard.
> >
> >If so, it's only hard once, not every day, as with cross-tool regex
> >chaos.
>
> Look at all the bugs, that were needed to be fixed when integrating
> the second engine, before you say this please.

Hmmm ... that sounds slightly irritable. Even if several bugfixes follow
the initial porting of a posix regex engine, it is still a one-time
cycle, rather than endless years of user inconvenience. (I'm retired,
after several decades of software development, and do tend to forget the
pain of bugfixes after the product is in the field. But in embedded
systems we did an awful lot of testing, and few escaped to the wild.)

> >It is Unix which "is the IDE", not any single application. The
> >"Eclectic Rubbish Lister" has wandered off into insular dialect land.
> >Unfortunately Vim has committed the same folly. Now it is time to pay
> >the piper.
> >
> >Where to from here, then? To bring order, can we not _finally_ adopt
> >POSIX EREs, adding the parochial \_. \< \%l \%'m stuff as
> >extensions?
>
> Sure. Codes speakes louder than words. And someone has to make the
> effort. And the fact that this has not been done could mean, that
> nobody really cared about POSIX ERE compatibility.

Noted. It wasn't considered important in 2013 either, when the topic
last had a significant airing that I recall.

> >I did compile Vim with a POSIX ERE regex engine many years ago. It
> >worked fine, but the help broke, there wasn't time to fix that, and I
> >only used it for a couple of months.
>
>
> I really really doubt this was ever possible. Please tell us exactly
> what version this was and what POSIX ERE engine you used. The current
> codebase uses a lot of the vim specific regex functions, so I would be
> surprised, that this actually compiled.

If we doubt each other's truthfulness, then there is little to discuss.
Perhaps you have had a hard day, and are not thinking clearly. This
post:

https://groups.google.com/forum/#!msg/vim_use/s6cUfZs7SYo/w1_9QOXONpQJ

reminds me that some function renaming and parameter wrangling through
wrappers was used to hook up to the needed functions. A little makefile
tweak would have been in order. The result gave me posix regexes in
searches, which was what I sought. Help broke, and I didn't test
scripting. (If I need scripted regexes, I'll use awk or sed, thank you.)

It was around a decade ago. There may be older related posts in the
archive.

>
> >So substituting an improved RE engine is not difficult. Extending
> >that to add vimishness might take a little longer, but it has been
> >done in the existing engine. It would be wonderful if that could be
> >done in my lifetime.
> >
> >Personally, I'd settle for a compile option which simply substituted
> >POSIX EREs, without breaking the help. The vimishness could then go
> >hang. Would that also suffice for the others advocating POSIX ?
> >
> >A subsequent step might then be to add vimishness, and make the new
> >engine mainstream? VEREs anyone?
>
> Well, nobody prevents you from contributing ;)

That is true. And in retirement, there should be time enough, in theory.

Erik

LCD 47

Apr 14, 2016, 11:00:43 AM
to vim...@googlegroups.com
On 15 April 2016, Erik Christiansen <dva...@internode.on.net> wrote:
> On 14.04.16 14:40, Christian Brabandt wrote:
> > Am 2016-04-14 12:14, schrieb Erik Christiansen:
> > >So many unix utilities support POSIX "Modern" EREs, that it is the
> > >best standard to conform to. There's then only one regex dialect to
> > >learn. (Queue horn fanfare and singing angels)
> >
> > And that is an argument for what, considering that vi comes from a
> > time, where BRE where the default RE dialect?
>
> Consistent regexes across unix utilities. Perhaps I was not
> sufficiently explicit in that regard? I note the deep attachment to
> obsolete BREs expressed above, but the rest of the world has moved on
> to modern EREs.
>
> O'Reilly's "Mastering Regular Expressions" mentions that "POSIX
> standardized the workings of over 70 programs, including traditional
> regex-wielding tools such as awk, ed, egrep, expr, grep, and sed."
> (And mutt, lex, ...)
[...]

"Consistent regexes across unix utilities"? You sir are either a
troll, or simply have no idea what you're talking about.

Take a look here:

https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

So why ERE instead of PCRE? Oniguruma? RE2?

Actually, try something simpler:

$ grep --version
grep (GNU grep) 2.24
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

$ echo 'foo bar' | egrep -o '[[:<:]]bar'
grep: Invalid character class name

That's because GNU grep has its own \< and \> instead of POSIX
[[:<:]] and [[:>:]]. Have you considered starting a crusade to convince
GNU people to adhere to POSIX conventions, in the name of consistency?
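
The same kind of divergence shows up outside grep. As a small illustration (Python's `re` is used only as one more dialect; it is not one of the tools discussed above), the word boundary is spelled `\b` there, while the GNU-style `\<` is accepted without complaint but means a literal `<`:

```python
import re

line = "foo bar"

# Python's re spells the word boundary as \b.
boundary = re.search(r"\bbar", line)

# A backslash before punctuation merely escapes it in Python's re, so \< is a
# literal '<': the GNU-style pattern is accepted but no longer matches here.
gnu_style = re.search(r"\<bar", line)
literal = re.search(r"\<bar", "foo <bar")

print(boundary.span(), gnu_style, literal.group())
```

A pattern pasted from one tool into another can therefore fail loudly, as in the egrep example, or change meaning silently, as in this one.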

As for Vim: its regexes have features not present in any other
language. People use them, and thousands of plugins and syntax files
rely on them. You're asking to break all of them because you _prefer_
something else? Wake up please. It would have been nice if Vim regexes
had a nicer syntax. It's some 20 years too late to change them now.

/lcd

Ben Fritz

Apr 14, 2016, 11:25:17 AM
to vim_use, vim...@googlegroups.com

I wonder if a different approach might help.

Vim already has :perldo, :pydo, etc. Perhaps a :perlmatch, :pymatch, etc. could be added for basic searching in those languages?

There is also a patch in the todo list for :bvimgrep. Maybe a :bgrep command could also be added. I think that would allow searching the current buffer using whatever tool you like.

Danek Duvall

Apr 14, 2016, 1:07:04 PM
to Vim Users, vim...@googlegroups.com
It sounded to me like the request wasn't for a new RE engine, but a new RE
syntax, and one that would only be turned on explicitly by a flag.

Perhaps this conflicts with existing syntax, I'm not sure, but you could
say something like "/<ERE>/SE", which would search for the ERE "<ERE>",
because the Syntax flag was set to Extended regular expression. Similarly,
"/<PCRE>/SP". Those regular expressions would be handed off to the
appropriate engine (if support were available), and if those syntaxes
didn't support all the features of Vim's regular expression language, no
problem. The person writing those expressions would know their
limitations, and not use features not available in those languages.
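
A rough sketch of that hand-off, under loud assumptions: the "/pattern/SX" flag syntax is only the suggestion made above, not anything Vim implements, and `ENGINES`, `parse_search` and `search` are hypothetical names. Python's own `re` stands in for an optional engine registered under the letter "P".

```python
import re
from typing import Callable, Dict, Optional, Tuple

Span = Optional[Tuple[int, int]]


def _python_re(pattern: str, text: str) -> Span:
    """Stand-in engine: Python's re, used only because it ships with Python."""
    m = re.search(pattern, text)
    return m.span() if m else None


# Hypothetical registry of optional engines, keyed by the letter after 'S'.
ENGINES: Dict[str, Callable[[str, str], Span]] = {"P": _python_re}


def parse_search(command: str) -> Tuple[str, Optional[str]]:
    """Split '/pattern/SX' into (pattern, 'X'); without an S flag return (pattern, None)."""
    body = command[1:] if command.startswith("/") else command
    pattern, sep, flag = body.rpartition("/")
    if sep and len(flag) == 2 and flag[0] == "S":
        return pattern, flag[1]
    return body, None


def search(command: str, text: str) -> Span:
    """Dispatch the pattern to the engine named by the syntax flag."""
    pattern, syntax = parse_search(command)
    if syntax is None:
        raise NotImplementedError("native Vim patterns are outside this sketch")
    engine = ENGINES.get(syntax)
    if engine is None:
        raise LookupError(f"no engine available for syntax {syntax!r}")
    return engine(pattern, text)


print(search(r"/b\w+/SP", "foo bar"))
```

An unsupported flag such as "SE" raises `LookupError` here, which corresponds to the "if support were available" caveat.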

That's vastly different than the recent change to the new engine, where it
needed to support the same features as the old one. And much more
tractable, I'd think. More a matter of defining the interface and setting
appropriate expectations with regard to availability and differences from
"normal" vim regexes.

Danek

Erik Christiansen

Apr 15, 2016, 5:54:58 AM
to vim...@googlegroups.com
On 14.04.16 18:00, LCD 47 wrote:
>
> As for Vim: its regexes have features not present in any other
> language. People use them, and thousands of plugins and syntax files
> rely on them. You're asking to break all of them because you _prefer_
> something else?

Please quote a whole message where I have done anything remotely like
that. (Nothing out of context, please.) My proposed renovation was
careful to avoid _any_ impact on existing uses, by providing POSIX
only as a compile option - i.e. totally unseen by unenthused users.
(Last two paragraphs of my first post on this thread.)

Looking back on the wording, I can see that it was incautious, and not
remotely conducive to winning over adherents of the old BREs. It did not
occur to me that it would result in misunderstanding through partial
reading.

> Wake up please. It would have been nice if Vim regexes
> had a nicer syntax. It's some 20 years too late to change them now.

While POSIX conformity is not desired, it will also be a thousand years
too soon. ;-)

Anyone who had actually read both my posts would understand that I do
not advocate replacement, but addition.

Erik

LCD 47

Apr 15, 2016, 7:24:25 AM
to vim...@googlegroups.com
On 15 April 2016, Erik Christiansen <dva...@internode.on.net> wrote:
> On 14.04.16 18:00, LCD 47 wrote:
> >
> > As for Vim: its regexes have features not present in any other
> > language. People use them, and thousands of plugins and syntax
> > files rely on them. You're asking to break all of them because you
> > _prefer_ something else?
>
> Please quote a whole message where I have done anything remotely like
> that. (Nothing out of context, please.) My proposed renovation was
> careful to avoid _any_ impact on existing uses, but providing POSIX
> only as a compile option - i.e. totally unseen by unenthused users.
> (Last two paragraphs of my first post on this thread.)
[...]

Oh, you didn't state it explicitly, you just aren't thinking things
through to their final consequences. To spell it out for you:

(1) You can't change the syntax for / and s/// because that would break
just about every script out there.

(2) You can't add an _option_ to change the syntax for / and s/// on the
fly either, because all scripts out there would have to be aware of
the new option.

(3) This leaves you with adding new commands, that would do / and s///
with a different regex syntax. However, all !@#$... keys are
already taken, and are widely in use by other thousands of scripts.

(4) Which means you'd have to implement new / and s/// as : commands.
However, you already have :perldo and :pydo for that, yet nobody
seems to be in a hurry to write wrappers for them as replacement for
/ and s///.

Now, why would (4) be? Because nobody can think of any reasonable
user interface for it? Not really, Tim Pope's "abolish" has a :S///
command:

https://github.com/tpope/vim-abolish

Then perhaps because the existing regexes are actually usable after
all, despite being weird at first?

I'm just scratching the surface here. Something that neither you
nor any of the other proponents of improved regex engines seem to see is
that s/// is actually _very_ different from search alone. There are a
zillion libraries for regex search (all slightly incompatible with one
another, by the way), but none for search and replace. Sure, they all
have APIs that allow in principle to implement replacing, but the common
APIs still make it very hard to have equivalents for Vim's \zs and \ze.
The closest you can get to \zs is Perl's \K, but that's quirky, and
few people trust it enough to use it. Well, if you'd dig through the
sources of Perl, Ruby, and the like, you'd eventually find out that Vim
has actually done it right, while everything else is still way behind.
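
To make the \zs / \ze point concrete, here is a minimal sketch of the usual workarounds in an engine that lacks them. Python's `re` is used as the example engine (it has neither \zs nor \K, and its look-behind must be fixed-width); the sample text and replacement values are made up.

```python
import re

text = "width=80 height=24"

# Vim:  :s/\w\+=\zs\d\+/0/g   (the match starts after the '=')
# A variable-length prefix cannot go in a look-behind here, so it is
# captured and put back by the replacement instead.
by_group = re.sub(r"(\w+=)\d+", r"\g<1>0", text)

# When the prefix has a fixed width, a look-behind gives the same result.
by_lookbehind = re.sub(r"(?<==)\d+", "0", text)

# Vim:  :s/\w\+\ze=/k/g   (the match ends before the '=')
# \ze maps directly onto a look-ahead.
by_lookahead = re.sub(r"\w+(?==)", "k", text)

print(by_group, "|", by_lookbehind, "|", by_lookahead)
```

The capture-and-restore form works, but the prefix becomes part of the match, which matters as soon as match positions or overlapping matches are involved.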

Did it occur to you that what you're asking for might not actually
make as much sense as it first seems, rather than Vim developers being
stubborn about it?

/lcd

Nikolay Aleksandrovich Pavlov

Apr 15, 2016, 3:38:04 PM
to vim...@googlegroups.com
About this I think the following:

1. If implemented, PCRE syntax support should work like the current engine
selection option, though without the possibility of using `set re` to
switch to PCRE globally. I.e. `\%#=P` as the first characters of the
pattern will enable PCRE syntax and select libpcre (or whatever will
be used) as the engine. But `set re=P` is an error, and `P` is a literal
`P` and not some number. This way it will not break anything existing.
2. `g/` and `g?` are still free. As well as `z/` and `z?`. These
should be reserved for users because it is natural to map `g/` to
`/\%#=P`. But there is no need to create such a standard mapping:
maybe some time later a better RE syntax (e.g. perl6) will be added, and
this mapping is trivial for users to write. The mapping should be in
the vimrc_example though.
3. Most of the bugs created by introducing the new RE engine were bugs in
the new engine itself, not in the code that selects which engine to use.
If it is possible to employ libpcre or the regex engine from libc, then
most of these bugs will not occur, because the only new code is the code
that integrates the regex engine. Integration code is not going to be as
huge as a regex engine itself, and regex engines like libpcre are
thoroughly tested.
4. Even though Vim regex syntax sucks, it is not missing many features.
Basically I miss named capturing groups (i.e. `(?<name>…)`),
recursive regular expressions (i.e. `(?R…)`), Unicode support (i.e.
\p{character set description based on unicode properties}), and sometimes
I think that I could make use of (?{expr}) (a zero-width atom that
executes Perl code when it is matched, and which always matches when it
has a chance to; this is Perl only, not a PCRE feature). In
applications which use libpcre I miss non-fixed-width look-behinds:
this is the only way I know to match non-escaped something (i.e. the
only way I know to check whether the number of preceding backslashes is
even or odd).

So as @LCD 47 said “what you are asking for might not actually
make as much sense as it first seems”. It is better to have these
features in core than to have PCRE syntax support.
5. One cannot implement PCRE-enabled s/// based on :perldo or :pydo.
Some regular expressions may match more than one line (and I *do* use
these), so one needs the whole buffer. This is going to be slow on
large buffers. This is also going to be hacky: one cannot take the
whole buffer contents, run the regular expression and assign the buffer
contents back, because that would make Vim think that the whole buffer
was replaced. So one needs to compute where the change starts and ends
and replace only those lines. Likely :undojoin must not be
forgotten either.

So it is not that easy to make a :s command with PCRE syntax, and doing
so does not provide much benefit. Of course there is no hurry.
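
The "compute where the change starts and ends" step from point 5 could look roughly like the sketch below. It is an illustration only: `substitute_lines` is a hypothetical helper, Python's `re` stands in for whatever engine would really be used, and the buffer is modelled as a plain list of strings rather than a real Vim buffer.

```python
import re
from typing import List, Optional, Tuple


def substitute_lines(
    lines: List[str], pattern: str, repl: str
) -> Optional[Tuple[int, int, List[str]]]:
    """Run a (possibly multi-line) substitution over the whole buffer text.

    Returns (first, old_end, new_lines), meaning lines[first:old_end] should
    be replaced by new_lines, or None when nothing changed.
    """
    old_text = "\n".join(lines)
    new_text = re.sub(pattern, repl, old_text)
    if new_text == old_text:
        return None
    new = new_text.split("\n")

    # Skip the unchanged lines at the top.
    limit = min(len(lines), len(new))
    first = 0
    while first < limit and lines[first] == new[first]:
        first += 1

    # Skip the unchanged lines at the bottom, without crossing the prefix.
    tail = 0
    while tail < limit - first and lines[-1 - tail] == new[-1 - tail]:
        tail += 1

    return first, len(lines) - tail, new[first:len(new) - tail]


buffer_lines = ["a", "foo", "bar", "z"]
change = substitute_lines(buffer_lines, r"foo\nbar", "foobar")
print(change)
```

Only the slice reported by the helper would then be written back, so the editor sees a small change instead of a whole-buffer replacement. Joining and splitting the entire buffer is still the slow part on large files, as noted above.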

Eric Christopherson

Apr 15, 2016, 4:03:51 PM
to vim_use
Just FYI:

The name Perl-Compatible Regular Expressions is a misnomer. PCRE is not strictly Perl-compatible (and I'm guessing Perl doesn't deal 100% appropriately when fed PCRE either, although it has picked up at least some of PCRE's extensions). It's not part of the Perl project.
 

shawn wilson

Apr 15, 2016, 6:58:55 PM
to vim...@googlegroups.com

That's why I listed libperl *and* libpcre. I definitely find libpcre lacking (being a perl user).

Linda W

Apr 27, 2016, 7:09:18 PM
to lcd...@gmail.com, vim...@googlegroups.com
LCD 47 wrote:
> On 15 April 2016, Erik Christiansen <dva...@internode.on.net> wrote:
>> On 14.04.16 14:40, Christian Brabandt wrote:
>>> On 2016-04-14 12:14, Erik Christiansen wrote:
>>>> So many unix utilities support POSIX "Modern" EREs, that it is the best standard to conform to.
>>> And that is an argument for what, considering that vi comes from a
>>> time, where BRE where the default RE dialect?
----
    The argument for that: vim descends from 'vi' which was the visual
editor version of 'ed'.  Use of 'ed' has nearly evaporated, however,
'sed' the stream version of 'ed' (both gnu and early unix utils) DOES
offer ERE's as an option, thus answering your question -- i.e. if 'vim'
stayed current with current versions of its ancestors, it would already
have the option.  'sed', 'grep' and others have done what any living
program does -- they evolve.  'vim' has yet to evolve in this area.


>> Consistent regexes across unix utilities. Perhaps I was not
>> sufficiently explicit in that regard? I note the deep attachment to
>> obsolete BREs expressed above, but the rest of the world has moved on
>> to modern EREs.
    
BRE's are compatible with ERE's.  If you *only* use BRE syntax, then any
prog using ERE's "should" still work the same for you.  PCRE isn't 100%
backwards compatible because they chose to make it slightly easier to use
than ERE's.  Example: supporting '/x' at the end to allow & ignore embedded
whitespace for readability.  Second example:  "\" *ALWAYS* means to take
the next character as a literal -- thus no special cases and no special
rules to remember (except on the fifth Thursday that falls after
a new moon in February...:-) ).

>> O'Reilly's "Mastering Regular Expressions" mentions that "POSIX
>> standardized the workings of over 70 programs, including traditional
>> regex-wielding tools such as awk, ed, egrep, expr, grep, and sed."
>> (And mutt, lex, ...)
> [...]
>
>     "Consistent regexes across unix utilities"?  You sir are either a
> troll, or simply have no idea what you're talking about.
  
----
    And you are an aggressive nerdbutt!  Did you even bother to fact check
your statement before calling names?  It does say "POSIX -- An attempt at
standardization" -- and how they got most programs that POSIX described
to support BRE's, ERE's or both.  It also brought up the problem of
locales and unicode.  To date, I believe Perl's RE has the most
comprehensive coverage of unicode of any RE by a wide margin.  You can
specify chars by charname, codepoint, or just "typing them in".  If your
favorite RE doesn't handle the basics -- like upper & lower case of
all the characters handled in all the languages included in Unicode, it
doesn't begin to handle the needs of a multi-lingual world.


    Take a look here:

https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines

    So why ERE instead of PCRE?  Oniguruma?  RE2?
  
Actually ERE is at least a standard, and at least has that going for it,
though it keeps looking like PCRE will join the group.  POSIX moves at
a glacial pace except on matters of great unimportance.  Actually POSIX is
really 'dead', as the current entity calling itself POSIX doesn't believe in or adhere to the original POSIX's mission statement of being a declarative body (telling people what is there and the commonalities they can rely on), whereas the new POSIX's mission is to dumb down the interface and *prescribe* behaviors that weren't there before.  But to talk about that would raise my BP by about 20 points and be fairly worthless.


    Actually, try something simpler:

$ grep --version
grep (GNU grep) 2.24
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

$ echo 'foo bar' | egrep -o '[[:<:]]bar'
grep: Invalid character class name
  
What is [[:<:]]?  What standard is it a part of?  Never seen it before.
Um:
perl -we 'use strict; use P;
> my $a="now < there";
> $a =~ /[[:<:]] there/;
> '
POSIX class [:<:] unknown in regex; marked by <-- HERE in m/[[:<:] <-- HERE ] there/ at -e line 3.
----
Oh, You made up your own syntax!   I see.  It's not part of any standard.

egrep doesn't support PCRE's extended character classes -- and it is
fully compliant in documenting the fact that it only supports
text-matching (grep -F), BRE (grep), ERE (grep -E) and PCRE's (grep -P).

What documentation or standard are you claiming grep isn't following in
regard to its RE engine?

Could you point me at the bug report?



    That's because GNU grep has its own \< and \> instead of POSIX
[[:<:]] and [[:>:]].  Have you considered starting a crusade to convince
GNU people to adhere to POSIX conventions, in the name of consistency?

    As for Vim: its regexes have features not present in any other
language.
Such as?  You could support a vim-compatible RE in perl if you wanted to
write one -- it allows plugins and compatibility -- can Vim support
PCRE's?  That library is all ready to be plugged in, so show me how
vim's static and arcane RE syntax is better than, say, perl's?

You can embed code in the middle of a perl RE to handle any matching case (there are also many security provisions that you must comply with to use
such features, but they are there).
  People use them, and thousands of plugins and syntax files
rely on them. 
Yeah -- well people use PCRE's and millions of people rely on them.
Javascript's RE was almost entirely derived from perl's when it was
implemented.  Show me 1 webbrowser that has builtin support for full
vimscript and vim's RE.

BTW, I use vim every day as my code editor -- so I'm not exactly knocking it -- only your ignorance in how superior it is.



 You're asking to break all of them because you _prefer_
something else?  
----
    Who is asking anyone to break anything?   I wrote the original note on this topic, and I neither mentioned nor wanted having new RE's replace
the current ones.  I even made suggestions about integration in my previous email like:



Maybe having "\X" & "\P" for extended and pcre's would be a start,
though I'd _like_ to see a way of choosing different RE's for
use in macros & .vim files (for compat), and a 2nd option for
interactive RE's (thus eliminating the need for the "/[vmMVXP]"
on each search or substitute).



Wake up please. 

Learn how to read before you awaken others to your ignorant state.

-l

LCD 47

Apr 28, 2016, 4:09:04 AM4/28/16
to vim...@googlegroups.com
On 27 April 2016, Linda W <v...@tlinx.org> wrote:
[...]
> BRE's are compatible with ERE's. If you *only* use BRE syntax, then
> any prog using ERE's "should" still work the same for you.

Right, I suppose I should have stopped reading here. :) I didn't
(against my better judgement), presumably because:

[...]
> And you are an aggressive nerdbutt!

Thank you for noticing. It might be wise to keep that in mind for
the purpose of further interactions. :)

On 27 April 2016, Linda W <v...@tlinx.org> wrote:
> LCD 47 wrote:
[...]
> > Actually, try something simpler:
> >
> > $ grep --version
> > grep (GNU grep) 2.24
> > Copyright (C) 2016 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.
> >
> > Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.
> >
> > $ echo 'foo bar' | egrep -o '[[:<:]]bar'
> > grep: Invalid character class name
> >
> What is [[:<:]]? What standard is it a part of? Never seen it before.
> Um:
> perl -we 'use strict; use P;
> > my $a="now < there";
> > $a =~ /[[:<:]] there/;
> > '
> POSIX class [:<:] unknown in regex; marked by <-- HERE in m/[[:<:] <--
> HERE ] there/ at -e line 3.
> ----
> Oh, You made up your own syntax!

I didn't, honestly. I didn't have to either, BSD came up with it
first:

http://man.openbsd.org/re_format.7
https://www.freebsd.org/cgi/man.cgi?query=re_format&sektion=7
http://netbsd.gw.com/cgi-bin/man-cgi?re_format+7+NetBSD-current

> I see. It's not part of any standard.

I could have sworn it was part of POSIX.1, and OpenBSD's
re_format(7) also seemed to imply that it is:

http://man.openbsd.org/re_format.7

Other people seem to believe the same:

http://www.regular-expressions.info/refwordboundaries.html

But I couldn't confirm that with the standard:

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html

-- and NetBSD's re_format(7) actually notes that [:<:] and [:>:] are
extensions:

http://netbsd.gw.com/cgi-bin/man-cgi?re_format+7+NetBSD-current

So you're right, it isn't part of POSIX. My bad, I should have
dug deeper, sorry about that. Regardless, it might still make a
worthy crusade for you BRE slayers to fight: unification of grep(1)
syntax across the various UNIX platforms. Then turn to sed(1) next.
They are in even wider use than Vim. :)
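
For comparison, the start-of-word match that '[[:<:]]bar' (BSD) and '\<bar' (GNU) are both after can be sketched in a Perl-style engine. This is only a minimal sketch, assuming Python's 're' module as a stand-in for PCRE; the helper name word_starts is made up for illustration, and it assumes the word begins with a word character.

```python
import re

def word_starts(word, text):
    """Return the offsets where `word` begins at the start of a word."""
    # \b is a word boundary (start OR end of a word); the lookahead
    # (?=\w) keeps only boundaries followed by a word character,
    # i.e. the start of a word.
    pattern = r"\b(?=\w)" + re.escape(word)
    return [m.start() for m in re.finditer(pattern, text)]

print(word_starts("bar", "foo bar"))  # [4]
print(word_starts("bar", "foobar"))   # []
```

So the same idea ends up with three spellings across three families of tools, which is the inconsistency being argued about.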

[...]
> > As for Vim: its regexes have features not present in any other
> > language.
> Such as?
[...]

An example that comes to mind is \ze.
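
For readers who don't know it: \ze marks the end of the match, so /foo\zebar finds "foobar" but only "foo" counts as matched. The closest thing in Perl-style engines is a lookahead, which is roughly (not exactly) equivalent. A minimal sketch, assuming Python's 're' module as the Perl-style engine; the names FOO_BEFORE_BAR and replace_foo_before_bar are hypothetical.

```python
import re

# Vim:  /foo\zebar   finds "foobar", but the match ends after "foo"
# Here: foo(?=bar)   the lookahead checks for "bar" without consuming it
FOO_BEFORE_BAR = re.compile(r"foo(?=bar)")

def replace_foo_before_bar(text, replacement):
    """Replace "foo" only where "bar" immediately follows."""
    return FOO_BEFORE_BAR.sub(replacement, text)

print(replace_foo_before_bar("foobar foobaz", "X"))  # Xbar foobaz
```

In Vim the same substitution would be written as :s/foo\zebar/X/.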

The other end of this ship seems to have sailed some two weeks ago,
so I won't waste your time with re-hashing old arguments.

/lcd

BPJ

Apr 28, 2016, 12:56:00 PM4/28/16
to vim_use
I wrote:
> The main drawback with using :perldo, which I do when I need to match
> Unicode properties or change combining marks separately from their base
> characters is that it has no equivalent of Vim RE's /c modifier. I have
> actually considered trying to write a function emulating the functionality
> using Vim's perl API, if only I can get my head around the latter.

I have written a function which does a substitution over a range
with :perl s/// while emulating vim RE's /c flag as nearly as I
could manage:

<https://gist.github.com/bpj/e9ba4914dd269b30c620bf7cb030b292>

Feel free to try it out!

/bpj

L. A. Walsh

May 4, 2016, 4:34:25 PM5/4/16
to vim...@googlegroups.com, vim...@googlegroups.com
Ben Fritz wrote:
> I wonder if a different approach might help.
> Vim already has :perldo, :pydo, etc. Perhaps a :perlmatch, :pymatch, etc. could be added for basic searching in those languages?
>
> There is also a patch in the todo list for :bvimgrep. Maybe a :bgrep command could also be added. I think that would allow searching the current buffer using whatever tool you like.
>
----
The perldo/pydo approaches seem (from what earlier folks said)
to be too limited and perhaps too slow to be that useful. Those
commands don't work on the actual text, but on an internal form w/o
the EOL that doesn't allow for splitting or combining lines.

Another ugly Q comes up -- what is the internal form of the chars?
I.e. UTF-8? If so, that's at least good from a perl-compat point of
view, but if not that could be another whole mess. Even in perl --
depends on what I'm doing, but I'll often localize the 'end-of-line' char
to 'null', and allow a single 'read' to read in an entire file into a
single scalar. Sometimes it's faster to do manipulations on the whole
buffer than try to use multiple manipulations on each line.

Ex: a mail message which has an empty line at the end of the header.
It's easier for mail processing to combine header lines that have been
split over multiple lines into 1 line. Trying to special case the
indents each time you pass over the headers for something really slows
down the traversal. But it's usually necessary (at least for sanity's
sake) to make more than one pass over the header lines for sorting and
routing, but at the end of the header processing, the rest of the text
can be dumped out w/1 write statement -- a big win over writing things
out a line at a time, especially when writing over a network.
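
The whole-buffer approach to the header example can be sketched as follows. This is a minimal sketch in Python rather than perl, assuming the message is already in memory as one string with LF line endings; the function name unfold_headers is hypothetical, and collapsing each continuation to a single space is a simplification.

```python
import re

def unfold_headers(message):
    """Join folded header lines; leave the body untouched."""
    # The first empty line separates the header from the body.
    header, sep, body = message.partition("\n\n")
    # A header continuation line starts with a space or a tab, so one
    # substitution over the whole header joins it to the previous line.
    unfolded = re.sub(r"\n[ \t]+", " ", header)
    return unfolded + sep + body

msg = "Subject: a long\n subject line\nTo: x@example.org\n\nBody\n  indented\n"
print(unfold_headers(msg))
```

A line-at-a-time interface can't do this in one step, because the substitution has to see the newline between two lines.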

I still feel 'pain' with older mailers that use a 4K R/W size on network
I/O, which is positively painful on modern networks that use Jumbo
packets of 9k or more. You are literally throwing away 55% of the
bandwidth on 1G or faster networks, and that's just the loss from the
low end. By the time you've progressed up the network stack, a multi-meg
email with a few pics attached and I'll see 30-60s send-times --- most
of it between sendmail and local client over a dedicated 10gb connection,
whereas using SMB, I've seen (*past tense*, w/all the new win10 updates
being forced upon Win7 users) file transfer speeds range between
400-600MB/s (w/a mostly cpu-bound limit). As MS has changed their
network stack to allow multiple cores to use more than one connection to a
server (a change that, I'm told, will only benefit win8 and above), they've
really done harm to win7, which has the multi-streaming code put in its
stack, but is administratively prohibited from using it.
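
One reading of the 55% figure above, as a back-of-the-envelope sketch: it assumes each 4096-byte write goes out in its own 9000-byte jumbo frame and ignores protocol headers, both of which are simplifications. The function name unused_fraction is made up for illustration.

```python
def unused_fraction(write_size, frame_size):
    """Fraction of each frame left empty when one write fills one frame."""
    return 1 - write_size / frame_size

print(f"{unused_fraction(4096, 9000):.1%}")  # 54.5%
```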

I'm not sure what would be faster -- trying to split file mods by line, or
handing back a huge array of pointers to text blocks (where each text
block would have to be marked as line-delineated, or 'raw'). That
sounds more like something where a tunable heuristic would be the most
flexible route -- not something likely to be seen in the near future, I'm
guessing.

Sigh.
