
gawk regexp question


Kenny McCormack

Dec 1, 2022, 4:04:20 AM
I have a regexp like:

/^.*[?/]word[=/]/

and it seems to work as expected. Notice that neither of the weird/special
characters (? or /) is escaped (i.e., preceded with \) inside of [].

Am I correct in assuming this is OK? Is there a list anywhere of what is
and isn't "special" (i.e., needing to be escaped) inside of []?


Manuel Collado

Dec 1, 2022, 5:30:51 AM
On 01/12/2022 at 10:04, Kenny McCormack wrote:
> I have a regexp like:
>
> /^.*[?/]word[=/]/
>
> and it seems to work as expected. Notice that neither of the weird/special
> characters (? or /) is escaped (i.e., preceded with \) inside of [].
>
> Am I correct in assuming this is OK? Is there a list anywhere of what is
> and isn't "special" (i.e., needing to be escaped) inside of []?
>

The gawk manual says:

"To include one of the characters ‘\’, ‘]’, ‘-’, or ‘^’ in a bracket
expression, put a ‘\’ in front of it. For example:
[d\]]
matches either ‘d’ or ‘]’. Additionally, if you place ‘]’ right after
the opening ‘[’, the closing bracket is treated as one of the characters
to be matched."

Don't know if this also applies to other awk variants.

--
Manuel Collado - http://mcollado.z15.es

Kenny McCormack

Dec 1, 2022, 7:40:44 AM
In article <651e12f0-d7c7-fc22...@users.sourceforge.net>,
Manuel Collado <m-co...@users.sourceforge.net> wrote:
...
>The gawk manual says:
>
>"To include one of the characters ‘\’, ‘]’, ‘-’, or ‘^’ in a bracket
>expression, put a ‘\’ in front of it. For example:
> [d\]]
>matches either ‘d’ or ‘]’. Additionally, if you place ‘]’ right after
>the opening ‘[’, the closing bracket is treated as one of the characters
>to be matched."

OK, so it is just those four (\, ], -, and ^).

I think "-" is also OK (i.e., doesn't need to be escaped) if it is the first
character inside of [].

>Don't know if this also applies to other awk variants.

Nobody cares anymore about "other awk variants".
(This is a Good Thing...)


Janis Papanagnou

Dec 3, 2022, 7:46:13 PM
On 01.12.2022 11:30, Manuel Collado wrote:
>
> The gawk manual says:
>
> "To include one of the characters ‘\’, ‘]’, ‘-’, or ‘^’ in a bracket
> expression, put a ‘\’ in front of it. For example:
> [d\]]
> matches either ‘d’ or ‘]’. Additionally, if you place ‘]’ right after
> the opening ‘[’, the closing bracket is treated as one of the characters
> to be matched."
>
> Don't know if this also applies to other awk variants.

The old Awk "Bible" says:
"Inside a character class, all characters have their literal meaning,
except for the quoting character \ , ^ at the beginning, and - between
two characters."

And for meta-characters generally it says that single meta-characters
match themselves, and otherwise need to be \-escaped to preserve their
literal meaning.

I suppose that's what we can expect from other awks, including older ones.
(Test cases might be []] and [[] vs. [\]] and [\[].)

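For more recent tools, POSIX defines bracket expressions in its BRE section
(POSIX awk uses EREs, which reuse that definition), and it also covers the
bracket characters themselves. (With respect to the bracket symbols it gets a
bit more complicated, though, because of the collating syntaxes: [. .]
collating symbols and [= =] equivalence classes.)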

Janis
