Hyphens never adjacent to char type in char class?

David Wahlstedt

Aug 1, 2024, 1:24:57 PM
to PCRE2 discussion list
Hello,

It seems as if hyphens are *never* allowed adjacent to character types in a character class, even in cases where it could make sense:

echo -n '[\W--a]'|test_pcre2 a
PCRE2 version 10.45-DEV 2024-06-09 (8-bit)
/[\W--a]/debug
Failed: error 150 at offset 3: invalid range in character class
a

(I have a little test script, `test_pcre2`, that runs pcre2test on a given pattern and subject.)

Here, the \W is not part of the range: the range starts with '-', so the error message is not entirely correct. If I instead write `[--a\W]`, it works fine, of course.

The reason I ask is that I am working on a parser for PCRE2 and want to know exactly when hyphens and character types may or may not be adjacent to each other in character classes. I believe it is never allowed, right? I can always let the parser be forgiving and add a post-check that rejects certain expressions.

Best regards,
David


David Wahlstedt

Aug 1, 2024, 1:31:45 PM
to PCRE2 discussion list
I forgot to say: the same question applies to POSIX named sets as well as character types. Correct?

David Wahlstedt

Aug 1, 2024, 1:42:30 PM
to PCRE2 discussion list
Sorry for the noise, I was wrong. The expression `[ --\W]` is valid, and has a hyphen adjacent to a character type.
My previous example was wrong because in `[\W--a]`, `\W--` is probably seen as an attempt to form a range, which is wrong. Alternatively, one could decide that hyphens can never form ranges with character types, so that the range is `--a`, preceded by `\W`; that's how I thought about it.
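For a parser, the behaviour described so far can be encoded roughly as follows. This is a minimal Python sketch, not PCRE2's own code: the names `parse_class`, `read_atom` and `ClassError` are hypothetical, the grammar is a deliberately small subset (no leading `^`, no `]` as a leading literal, and the only escapes understood are the types `\d \D \h \H \s \S \v \V \w \W`), treating POSIX `[:name:]` sets exactly like character types is an assumption, and so is treating a final hyphen as a literal. The rule it encodes is the one observed in this thread: a hyphen next to a type is fine when it is a literal or a range end point (`[ --\W]`, `[--a\W]`), but a hyphen that would act as the range operator with a type on either side is rejected (`[\W--a]`).

```python
# Hypothetical sketch of PCRE2-style hyphen handling inside a character class.
# Assumed subset: pattern is "[...]" with no leading "^" and no "]" as the
# first literal; the only escapes understood are the character types below,
# and POSIX "[:name:]" sets are assumed to behave like character types.
TYPE_ESCAPES = set("dDhHsSvVwW")


class ClassError(ValueError):
    """Rejected class; offset is relative to the whole pattern."""

    def __init__(self, message: str, offset: int):
        super().__init__(f"{message} at offset {offset}")
        self.offset = offset


def read_atom(body: str, i: int):
    """Return (kind, text, next_index); kind is "type" or "char"."""
    if body.startswith("[:", i):
        end = body.index(":]", i + 2)  # ValueError if the POSIX set is unterminated
        return "type", body[i:end + 2], end + 2
    if body[i] == "\\":
        if i + 1 < len(body) and body[i + 1] in TYPE_ESCAPES:
            return "type", body[i:i + 2], i + 2
        raise ClassError("escape not handled by this sketch", i + 1)
    return "char", body[i], i + 1


def parse_class(pattern: str):
    """Split "[...]" into chars, ranges and types, rejecting type ranges."""
    if len(pattern) < 3 or pattern[0] != "[" or pattern[-1] != "]":
        raise ValueError("expected a non-empty class like [...]")
    body = pattern[1:-1]  # body index n is pattern offset n + 1
    items, i = [], 0
    while i < len(body):
        kind, text, j = read_atom(body, i)
        # A following hyphen is the range operator unless it is the last char.
        if not (body[j:j + 1] == "-" and j + 1 < len(body)):
            items.append((kind, text))
            i = j
            continue
        if kind == "type":
            # e.g. [\W--a]: the hyphen after \W would start a range
            raise ClassError("invalid range in character class", j + 1)
        kind2, text2, k = read_atom(body, j + 1)
        if kind2 == "type":
            # e.g. [a-\W]: a character type cannot end a range
            raise ClassError("invalid range in character class", j + 2)
        if text2 < text:
            raise ClassError("range out of order in character class", j + 2)
        items.append(("range", (text, text2)))
        i = k
    return items


if __name__ == "__main__":
    for pat in (r"[ --\W]", r"[--a\W]", r"[\W-]", r"[\W--a]"):
        try:
            print(pat, "->", parse_class(pat))
        except ClassError as err:
            print(pat, "->", err)
```

Run as a script, it accepts the first three patterns and rejects `[\W--a]` at offset 3, the offset pcre2test reported for the hyphen after `\W`; offsets for the other error cases are this sketch's own choice. The forgiving-parse-plus-post-check design mentioned earlier would simply move the two type checks into a separate pass.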

Philip Hazel

Aug 2, 2024, 9:27:29 AM
to David Wahlstedt, PCRE2 discussion list

> I was wrong. The expression `[ --\W]` is valid, and has a hyphen adjacent to a character type.

Indeed; a quick run of pcre2test shows how PCRE2 interprets it:

PCRE2 version 10.44 2024-06-07 (8-bit)
`[ --\W]`
------------------------------------------------------------------
  0  36 Bra
  3     [\x00-/:-@[-^`{-\xff] (neg)
 36  36 Ket
 39     End
------------------------------------------------------------------
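The compiled class in this dump can be cross-checked by hand. The Python sketch below assumes 8-bit mode without UCP and PCRE2's default (C locale) character tables, so that `\W` is every byte except `[0-9A-Za-z_]`; the helper names `show` and `render` are hypothetical, and the rendering only approximates the pcre2test dump format. It builds the byte set for `[ --\W]` and prints its runs, showing that the range ' '..'-' (0x20 to 0x2D) already lies inside `\W` and so leaves no separate trace in the compiled class.

```python
# Sketch (assumptions: 8-bit mode, no UCP, PCRE2's default C-locale tables,
# so \w is exactly [0-9A-Za-z_] and every byte >= 0x80 is a non-word byte).
WORD = (
    set(range(ord("0"), ord("9") + 1))
    | set(range(ord("A"), ord("Z") + 1))
    | set(range(ord("a"), ord("z") + 1))
    | {ord("_")}
)
NON_WORD = set(range(256)) - WORD                     # \W over all byte values
SPACE_TO_HYPHEN = set(range(ord(" "), ord("-") + 1))  # the range ' '..'-'


def show(c: int) -> str:
    """Printable ASCII as itself, anything else as \\xHH."""
    return chr(c) if 32 <= c < 127 else f"\\x{c:02x}"


def render(members: set[int]) -> str:
    """Print a byte set as runs, roughly like the pcre2test debug dump."""
    parts, c = [], 0
    while c < 256:
        if c not in members:
            c += 1
            continue
        start = c
        while c + 1 < 256 and (c + 1) in members:
            c += 1
        parts.append(show(start) if c == start else f"{show(start)}-{show(c)}")
        c += 1
    return "[" + "".join(parts) + "]"


cls = SPACE_TO_HYPHEN | NON_WORD     # [ --\W]
print(SPACE_TO_HYPHEN <= NON_WORD)   # True: the range adds nothing new
print(render(cls))                   # [\x00-/:-@[-^`{-\xff]
```

Under those assumptions the rendered set matches the class line in the dump exactly.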

> `\W--` is probably seen as an attempt to form a range, which is wrong

Yes. It seemed to me that that was likely to be the case, which is why I made PCRE2 fail it. Interestingly, Perl is more relaxed and treats a minus following \W as a literal, as you assumed it was. Guess I should take another look at the Perl documentation in this area.

Regards,
Philip
