Alternative regexp patterns for perl \K.

hongy...@gmail.com

Mar 14, 2022, 10:23:15 PM
I'm looking for alternative regexp patterns for the following operation:

$ echo 'public , save :: NKPTS ! - max. no. of kpoints' | grep -Po '::[ ]*\K[^ ]+'
NKPTS

Any hints?

Regards,
HZ

Ed Morton

Mar 15, 2022, 12:40:06 PM
google. dear god learn to use google.

hongy...@gmail.com

Mar 17, 2022, 9:27:07 PM
I tried googling it, and it seems that this feature of Perl is a very peculiar implementation that has no direct counterpart in other languages.

HZ

Janis Papanagnou

Mar 17, 2022, 9:55:16 PM
Try: man pcrepattern

If you want help you should either explain what you actually want to
match, or tell us what \K is supposed to do in the above grep context,
especially when you use non-standard patterns and extensions that are
not widely supported.

Janis


Janis Papanagnou

Mar 17, 2022, 10:29:11 PM
Avoid non-standard options (like grep's -P and -o); use standard tools:

sed 's/.*::[ ]*\([^ ]\+\).*/\1/'


Janis
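A note on the sed command above: `\+` is itself a GNU extension to basic regular expressions and is not part of the BRE syntax in POSIX Issue 7. The sketch below is one way to stay within that syntax, using the interval `\{1,\}` for "one or more". It also adds `-n` with the `p` flag so that non-matching lines print nothing, which is closer to what `grep -o` does. The function name `extract` is only an illustration, not something from the thread.

```shell
# Hypothetical helper: print the first non-blank token after the last "::" on each line.
# Uses only POSIX BRE constructs: \( \) for grouping and \{1,\} for "one or more".
extract() {
    sed -n 's/.*::[ ]*\([^ ]\{1,\}\).*/\1/p'
}

matched=$(printf '%s\n' 'public , save :: NKPTS ! - max. no. of kpoints' | extract)
unmatched=$(printf '%s\n' 'no double colon here' | extract)

printf 'matched=[%s] unmatched=[%s]\n' "$matched" "$unmatched"
```

Because the leading `.*` is greedy, this picks the token after the last `::` on a line, whereas the grep version reports a match for every `::`. For lines with a single `::` the results agree.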


hongy...@gmail.com

Mar 17, 2022, 10:39:38 PM
Thank you. The following version is clearer:

$ echo 'public , save :: NKPTS ! - max. no. of kpoints' | sed -E 's/.*::[ ]*([^ ]+).*/\1/'
NKPTS

HZ

hongy...@gmail.com

Mar 17, 2022, 10:43:45 PM
On Friday, March 18, 2022 at 9:55:16 AM UTC+8, Janis Papanagnou wrote:
> On 18.03.2022 02:27, hongy...@gmail.com wrote:
> > On Wednesday, March 16, 2022 at 12:40:06 AM UTC+8, Ed Morton wrote:
> >> On 3/14/2022 9:23 PM, hongy...@gmail.com wrote:
> >>> I'm looking for alternative regexp patterns for the following operation:
> >>>
> >>> $ echo 'public , save :: NKPTS ! - max. no. of kpoints' | grep -Po '::[ ]*\K[^ ]+'
> >>> NKPTS
> >>>
> >>> Any hints?
> >>>
> >>> Regards,
> >>> HZ
> >> google. dear god learn to use google.
> >
> > I tried googling it, and it seems that this feature of Perl is a very
> > peculiar implementation that has no direct counterpart in other
> > languages.
> Try: man pcrepattern

Yes. It does include the following related description:

$ man pcrepattern |grep -A14 -B2 'The escape sequence \\K'
Resetting the match start

The escape sequence \K causes any previously matched characters not to be included in the final matched sequence. For example, the pattern:

foo\Kbar

matches "foobar", but reports that it has matched "bar". This feature is similar to a lookbehind assertion (described below). However, in this case, the
part of the subject before the real match does not have to be of fixed length, as lookbehind assertions do. The use of \K does not interfere with the
setting of captured substrings. For example, when the pattern

(foo)\Kbar

matches "foobar", the first substring is still set to "foo".

Perl documents that the use of \K within assertions is "not well defined". In PCRE, \K is acted upon when it occurs inside positive assertions, but is
ignored in negative assertions. Note that when a pattern such as (?=ab\K) matches, the reported start of the match can be greater than the end of the match.

Best,
HZ
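The man page excerpt above can be tried directly. The sketch below assumes a grep built with PCRE support (for `-P`) and with the `-o` option, neither of which POSIX Issue 7 requires. It first uses `\K` to drop the `::` prefix from the reported match, then gets the same token without `\K` by matching the prefix as well and stripping it in a second step. The variable names are illustrative.

```shell
line='public , save :: NKPTS ! - max. no. of kpoints'

# \K discards everything matched so far, so only the token after "::" is reported.
with_k=$(printf '%s\n' "$line" | grep -Po '::[ ]*\K[^ ]+')

# Without \K: report the prefix too, then remove it with a second command.
without_k=$(printf '%s\n' "$line" | grep -o '::[ ]*[^ ][^ ]*' | sed 's/^::[ ]*//')

printf 'with \\K:    %s\nwithout \\K: %s\n' "$with_k" "$without_k"
```

The prefix `::[ ]*` has variable length, which is why the thread's pattern uses `\K` rather than a lookbehind assertion.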

Janis Papanagnou

Mar 19, 2022, 9:44:23 PM
On 18.03.2022 03:39, hongy...@gmail.com wrote:
> On Friday, March 18, 2022 at 10:29:11 AM UTC+8, Janis Papanagnou wrote:
>> On 15.03.2022 03:23, hongy...@gmail.com wrote:
>>> I'm looking for alternative regexp patterns for the following operation:
>>>
>>> $ echo 'public , save :: NKPTS ! - max. no. of kpoints' | grep -Po '::[ ]*\K[^ ]+'
>>> NKPTS
>>>
>>> Any hints?
>> Avoid non-standard options (like grep's -P and -o), use standard tools
>>
>> sed 's/.*::[ ]*\([^ ]\+\).*/\1/'
>
> Thank you. The following version is clearer:

Maybe clearer but obviously non-standard.

>
> $ echo 'public , save :: NKPTS ! - max. no. of kpoints' | sed -E 's/.*::[ ]*([^ ]+).*/\1/'
> NKPTS

What is option -E doing?
It's neither defined by POSIX nor available in my version of sed.
(I suppose it makes the regexp meta-character escapes unnecessary,
and you instead would have to escape the meta-characters that are
used literally?)

Janis
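For readers who would rather sidestep the `-E` question, here is a rough sketch with awk, which POSIX does specify. It treats `::` plus any following spaces as the field separator and prints the first whitespace-delimited word of the second field. The names `token` and `words` are illustrative, and nobody in the thread proposed this approach.

```shell
line='public , save :: NKPTS ! - max. no. of kpoints'

# A multi-character -F value is an extended regular expression in awk.
# NF > 1 skips lines with no "::"; split with " " uses default whitespace splitting.
token=$(printf '%s\n' "$line" |
    awk -F'::[ ]*' 'NF > 1 { split($2, words, " "); print words[1] }')

printf '%s\n' "$token"
```

Unlike the sed versions, this picks the token after the first `::` on a line rather than the last.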


hongy...@gmail.com

Mar 19, 2022, 11:02:40 PM
$ sed --help | grep -A2 -- '^[ ]*-E'
-E, -r, --regexp-extended
use extended regular expressions in the script
(for portability use POSIX -E).

$ sed --version
sed (GNU sed) 4.7

HZ
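To make the effect of `-E` concrete, the sketch below runs the same substitution in both syntaxes, assuming a sed that accepts `-E` (GNU and BSD seds do, per the documentation quoted in this thread). It also shows the flip side Janis guessed at: in extended syntax a literal parenthesis in the pattern has to be escaped. Variable names are illustrative.

```shell
line='public , save :: NKPTS ! - max. no. of kpoints'

# BRE (default): grouping is written \( \), and "one or more" as X followed by X*.
bre=$(printf '%s\n' "$line" | sed 's/.*::[ ]*\([^ ][^ ]*\).*/\1/')

# ERE (-E): ( ) and + are operators without backslashes.
ere=$(printf '%s\n' "$line" | sed -E 's/.*::[ ]*([^ ]+).*/\1/')

# Literal parentheses: plain in BRE, backslash-escaped in ERE.
bre_paren=$(printf 'f(x)\n' | sed 's/(x)/(y)/')
ere_paren=$(printf 'f(x)\n' | sed -E 's/\(x\)/(y)/')

printf 'BRE: %s  ERE: %s\n' "$bre" "$ere"
printf 'BRE: %s  ERE: %s\n' "$bre_paren" "$ere_paren"
```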

Keith Thompson

Mar 20, 2022, 5:14:01 PM
That appears to be an error in GNU sed. Here's the relevant excerpt
from sed's "info" documentation:

'-E'
'-r'
'--regexp-extended'
Use extended regular expressions rather than basic regular
expressions. Extended regexps are those that 'egrep' accepts; they
can be clearer because they usually have fewer backslashes.
Historically this was a GNU extension, but the '-E' extension has
since been added to the POSIX standard
(http://austingroupbugs.net/view.php?id=528), so use '-E' for
portability. GNU sed has accepted '-E' as an undocumented option
for years, and *BSD seds have accepted '-E' for years as well, but
scripts that use '-E' might not port to other older systems. *Note
Extended regular expressions: ERE syntax.

The austingroupbugs.net web page is an enhancement request, not an
actual update to POSIX. POSIX itself:

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html

does not mention the "-E" option.

I'll submit a bug report for GNU sed.

The enhancement request was submitted in 2011. The resolution is
"Accepted As Marked" and the status is "Applied", so I'm not entirely
sure what's going on. But in any case, The Open Group Base
Specifications Issue 7, 2018 edition doesn't mention "-E".

(Janis, what version of sed are you using?)

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

Janis Papanagnou

Mar 21, 2022, 12:38:50 AM
On 20.03.2022 22:13, Keith Thompson wrote:
>
> (Janis, what version of sed are you using?)

I'm working on a "legacy" (sort of) system...

$ sed --version
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.

My statement "nor available in my version of sed"
was meant as "not documented in the sed man page"
and also not displayed as an option when calling sed
without arguments.

Given your quote that "GNU sed has accepted '-E'
as an undocumented option for years" I confirmed
its (undocumented) existence also on my system.

Janis
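Since `-E` can be accepted without being documented, as on the system described above, a script can probe for it at run time instead of relying on the help text. The following is a minimal sketch of that idea. The helper name `sed_supports_ere` is made up for illustration, and the fallback branch uses only basic regular expression syntax.

```shell
# Hypothetical helper: succeeds only if this sed accepts -E and applies ERE grouping.
sed_supports_ere() {
    [ "$(printf 'ab\n' | sed -E 's/(a)(b)/\2\1/' 2>/dev/null)" = "ba" ]
}

line='public , save :: NKPTS ! - max. no. of kpoints'

if sed_supports_ere; then
    result=$(printf '%s\n' "$line" | sed -E -n 's/.*::[ ]*([^ ]+).*/\1/p')
else
    # Fallback in basic syntax, avoiding the GNU-only \+ operator.
    result=$(printf '%s\n' "$line" | sed -n 's/.*::[ ]*\([^ ][^ ]*\).*/\1/p')
fi

printf '%s\n' "$result"
```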

Geoff Clare

Mar 21, 2022, 10:11:09 AM
Keith Thompson wrote:

> The enhancement request was submitted in 2011. The resolution is
> "Accepted As Marked" and the status is "Applied", so I'm not entirely
> sure what's going on. But in any case, The Open Group Base
> Specifications Issue 7, 2018 edition doesn't mention "-E".

"Applied" means the edits have been made in the (troff) source of SUS.
In this specific case the edit was applied long enough ago that it
was included in the latest draft (2.1) of the next revision (Issue 8)
that was made available to reviewers in August 2021.

--
Geoff Clare <net...@gclare.org.uk>

Keith Thompson

Mar 21, 2022, 5:35:16 PM