Name syntax for backtracking control?


David Wahlstedt

Aug 4, 2024, 6:14:37 AM
to PCRE2 discussion list
Hi,
Which characters are allowed in the names given in (*VERB:NAME)
for backtracking control in PCRE2?
It seems to be everything except '/' and ')'. Is that correct?
For instance,
(*MARK: 5 46(5|46&T😀)
works fine (the name starts with a space and a tab).

Are the leading spaces included in the name?

Best regards,
David

Philip Hazel

Aug 4, 2024, 7:10:25 AM
to David Wahlstedt, PCRE2 discussion list
I quote from the pcre2pattern man page:

QUOTE
By default, for compatibility with Perl, a name is any sequence of characters
that does not include a closing parenthesis. The name is not processed in
any way, and it is not possible to include a closing parenthesis in the name.
This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result
is no longer Perl-compatible.

When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to verb names
and only an unescaped closing parenthesis terminates the name. However, the
only backslash items that are permitted are \Q, \E, and sequences such as
\x{100} that define character code points. Character type escapes such as \d
are faulted.

A closing parenthesis can be included in a name either as \) or between \Q
and \E. In addition to backslash processing, if the PCRE2_EXTENDED or
PCRE2_EXTENDED_MORE option is also set, unescaped whitespace in verb names is
skipped, and #-comments are recognized, exactly as in the rest of the pattern.
PCRE2_EXTENDED and PCRE2_EXTENDED_MORE do not affect verb names unless
PCRE2_ALT_VERBNAMES is also set.

The maximum length of a name is 255 in the 8-bit library and 65535 in the
16-bit and 32-bit libraries. If the name is empty, that is, if the closing
parenthesis immediately follows the colon, the effect is as if the colon were
not there. Any number of these verbs may occur in a pattern. Except for
(*ACCEPT), they may not be quantified.
ENDQUOTE

Leading spaces are significant in PCRE2; I think they are in Perl as well but I haven't checked.

Regards,
Philip
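The quoted rules can be modelled in a few lines. This is a minimal Python sketch, not PCRE2's actual parser: parse_verb_default and parse_verb_alt are hypothetical helpers written for illustration. The ALT_VERBNAMES variant handles only \), \Q...\E and \x{...}, rejects every other escape, and does not model the PCRE2_EXTENDED whitespace and comment handling. The length check assumes the 255 limit counts bytes (8-bit code units) of the UTF-8 encoded name.

```python
import re

# Quoted limit for the 8-bit library. Assumption: it is counted in 8-bit
# code units, i.e. bytes of the UTF-8 encoded name.
MAX_NAME_8BIT = 255

# \x{hh...}: a code point escape, one of the escapes permitted in
# PCRE2_ALT_VERBNAMES mode.
_HEX_ESCAPE = re.compile(r"x\{([0-9A-Fa-f]+)\}")


def _checked(name: str) -> str:
    """Enforce the maximum name length."""
    if len(name.encode("utf-8")) > MAX_NAME_8BIT:
        raise ValueError("verb name too long")
    return name


def parse_verb_default(text: str) -> tuple[str, str]:
    """Default (Perl-compatible) rule for '(*VERB)' or '(*VERB:NAME)'.

    The name is every character after the colon up to the first ')'.
    Nothing is processed, so spaces, tabs, '/' and '\\' are kept as-is.
    An empty name is the same as no colon: both return ''.
    """
    if not text.startswith("(*"):
        raise ValueError("not a backtracking verb")
    close = text.find(")")
    if close < 0:
        raise ValueError("missing ')'")
    verb, _, name = text[2:close].partition(":")
    return verb, _checked(name)


def parse_verb_alt(text: str) -> tuple[str, str]:
    """Simplified PCRE2_ALT_VERBNAMES rule: escapes are processed and only an
    unescaped ')' ends the name. Supports \\), \\Q...\\E and \\x{...} only."""
    if not text.startswith("(*"):
        raise ValueError("not a backtracking verb")
    i = 2
    while i < len(text) and text[i] not in ":)":
        i += 1
    if i == len(text):
        raise ValueError("missing ')'")
    verb = text[2:i]
    if text[i] == ")":
        return verb, ""
    i += 1  # step past the colon
    out: list[str] = []
    quoting = False  # True between \Q and \E
    while i < len(text):
        if quoting:
            if text.startswith("\\E", i):
                quoting, i = False, i + 2
            else:
                out.append(text[i])  # everything is literal, including ')'
                i += 1
        elif text[i] == ")":
            return verb, _checked("".join(out))
        elif text[i] != "\\":
            out.append(text[i])
            i += 1
        elif text.startswith("Q", i + 1):
            quoting, i = True, i + 2
        elif text.startswith("E", i + 1):
            i += 2  # assumed: a stray \E is ignored, as elsewhere in patterns
        elif text.startswith(")", i + 1):
            out.append(")")
            i += 2
        else:
            m = _HEX_ESCAPE.match(text, i + 1)
            if m is None:
                raise ValueError(f"escape not permitted in verb name at offset {i}")
            out.append(chr(int(m.group(1), 16)))
            i = m.end()
    raise ValueError("missing ')'")
```

For David's pattern, parse_verb_default keeps the leading space and tab as part of the name. For r"(*MARK:a\)b)", the default rule gives the name 'a\' (the backslash is not processed), while parse_verb_alt gives 'a)b'.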



David Wahlstedt

Aug 4, 2024, 9:04:10 AM
to PCRE2 discussion list
Thank you!
Sorry, I should have read that more carefully. However, '/' seems not to be allowed, at least in pcre2test:

PCRE2 version 10.45-DEV 2024-06-09 (8-bit)
/(*MARK:a/y_)/debug
** Unrecognized modifier 'y' in 'y_)/debug'

Best regards,
David

Philip Hazel

Aug 4, 2024, 11:18:55 AM
to David Wahlstedt, PCRE2 discussion list
No, in your example, / is being taken as the delimiter of the pattern.

$ pcre2test zz
PCRE2 version 10.44 2024-06-07 (8-bit)
"(*MARK:a/y_)"debug
------------------------------------------------------------------
  0  10 Bra
  3     *MARK a/y_
 10  10 Ket
 13     End
------------------------------------------------------------------
Capture group count = 0
May match empty string
Subject length lower bound = 0


Regards,
Philip


David Wahlstedt

Aug 4, 2024, 1:08:38 PM
to PCRE2 discussion list
But doesn't that mean '/' cannot be part of the name?

Philip Hazel

Aug 5, 2024, 3:21:07 AM
to David Wahlstedt, PCRE2 discussion list
No, '/' is just like any other character in the name. In your example that failed, you used / as the pattern delimiter, so your pattern was terminated prematurely and 'y_)/debug' was interpreted as a modifier list. My example uses " as the pattern delimiter, so / is part of the pattern and is treated like any other character in the MARK name.
Regards,
Philip
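Philip's explanation can be made concrete with a simplified model of how a pcre2test pattern line is split. This is a hypothetical Python sketch, not pcre2test's code: split_pattern_line takes the first character as the delimiter (assumed here to be any non-alphanumeric character other than backslash or white space), ends the pattern at the next occurrence of that character, and treats the rest of the line as the modifier list. It does not model backslash-escaped delimiters or patterns that continue onto further lines.

```python
def split_pattern_line(line: str) -> tuple[str, str]:
    """Hypothetical, simplified model of a pcre2test pattern line.

    Returns (pattern, modifiers). The pattern ends at the first repeat of the
    delimiter, so a '/' inside a '/'-delimited pattern cuts it short.
    """
    if not line:
        raise ValueError("empty line")
    delim = line[0]
    # Assumed delimiter rule: not alphanumeric, not backslash, not white space.
    if delim.isalnum() or delim.isspace() or delim == "\\":
        raise ValueError(f"invalid delimiter {delim!r}")
    end = line.find(delim, 1)
    if end < 0:
        raise ValueError("pattern not terminated on this line")
    return line[1:end], line[end + 1:]


for line in ['/(*MARK:a/y_)/debug', '"(*MARK:a/y_)"debug']:
    pattern, modifiers = split_pattern_line(line)
    print(f"pattern={pattern!r} modifiers={modifiers!r}")
# pattern='(*MARK:a' modifiers='y_)/debug'
# pattern='(*MARK:a/y_)' modifiers='debug'
```

With '/' as the delimiter, the modifier list starts with 'y', which is consistent with the "Unrecognized modifier 'y'" error above; with '"' as the delimiter, the full name a/y_ reaches PCRE2.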

