
Question about regex with negated character class


Roger L Costello

Apr 25, 2022, 12:33:10 PM
Hi Folks,

On page 12 of the Flex specification it says this:

"A negated character class such as [^A-Z] will match a newline
unless \n (or an equivalent escape sequence) is one of the characters
explicitly present in the negated character class (e.g., [^A-Z\n]).
This is unlike how many other regular expression tools treat negated
character classes ..."

Is that last sentence true? Does Flex behave differently from other regex
engines with regard to negated character classes?

I just tested the [^A-Z] regex at https://regex101.com/ and with every regex
engine on that web page it matches the newline in a test string containing
one. In other words, Flex behaves just like all the other regex engines. I
conclude that the last sentence in the Flex manual is not correct. Do you agree?

/Roger
[It may have been true 30 years ago, but they all match \n in a pattern
now. On the other hand, grep won't match a newline because it does the
matching one line at a time. -John]
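As a quick illustration of the point being discussed, here is a small sketch using Python's re module as one example of "another regex engine" (an assumption; it shows only Python's behavior and says nothing directly about Flex). The sample string is made up for illustration.

```python
import re

# Made-up sample: lowercase letters, a newline, then uppercase letters.
sample = "ab\nCD"

# [^A-Z] excludes only the uppercase ASCII letters, so "\n" is matched too.
plain = re.findall(r"[^A-Z]", sample)

# Listing \n inside the class excludes it explicitly, like Flex's [^A-Z\n].
no_newline = re.findall(r"[^A-Z\n]", sample)

print(plain)       # ['a', 'b', '\n']
print(no_newline)  # ['a', 'b']
```

So in this engine, too, a negated class matches a newline unless \n is listed in it.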

Kaz Kylheku

Apr 25, 2022, 10:40:35 PM
On 2022-04-25, Roger L Costello <cost...@mitre.org> wrote:
> Hi Folks,
>
> On page 12 of the Flex specification it says this:
>
> "A negated character class such as [^A-Z] will match a newline
> unless \n (or an equivalent escape sequence) is one of the characters
> explicitly present in the negated character class (e.g., [^A-Z\n]).
> This is unlike how many other regular expression tools treat negated
> character classes ..."

I suspect this is a documentation mistake (in terms of the remark it
makes about other regex implementations).

There is something special in Flex with regard to newlines: namely, the
any-character regular expression . (dot) does not match every character;
it excludes the newline. The documenter might have momentarily gotten
their wires crossed, misremembering which behavior is the special one.
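For comparison, Python's re module (used here only as an example engine, not as Flex) treats dot the same way by default: . skips the newline unless the re.DOTALL flag is given, and the negated class [^\n] behaves like the default dot. This is a sketch with a made-up input string.

```python
import re

sample = "a\nb"

# By default, dot does not match the newline character.
default_dot = re.findall(r".", sample)            # ['a', 'b']

# With re.DOTALL, dot matches every character, including "\n".
dotall_dot = re.findall(r".", sample, re.DOTALL)  # ['a', '\n', 'b']

# A negated class that lists \n explicitly behaves like the default dot.
bracket_form = re.findall(r"[^\n]", sample)       # ['a', 'b']

print(default_dot, dotall_dot, bracket_form)
```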

Or else, I also agree with John that it may in fact be a remark about
regex implementations in line-oriented text processing utilities, which
(in their standard forms, e.g. POSIX) don't have multi-line matching
features in which \n appears as a character.
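The line-oriented case can be sketched like this. grep_lines is a hypothetical helper that imitates grep by splitting the input into lines and dropping each newline before matching; it is not how any particular grep is implemented, only an illustration of why the newline never reaches the regex in such tools.

```python
import re


def grep_lines(pattern: str, text: str) -> list[str]:
    """Hypothetical grep-like helper: match each line separately, with the
    line terminator already stripped by splitlines()."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


text = "ABC\n\nxyz\n"

# Whole-buffer matching: each "\n" is a candidate for [^A-Z].
whole = re.findall(r"[^A-Z]", text)

# Line-at-a-time matching: the empty line has no characters left to match,
# and "ABC" contains only excluded letters, so only "xyz" is reported.
lines = grep_lines(r"[^A-Z]", text)

print(whole)  # ['\n', '\n', 'x', 'y', 'z', '\n']
print(lines)  # ['xyz']
```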

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal