regexp standards

ben.c...@gmail.com

Jun 7, 2007, 2:08:09 AM
Can anyone tell me if Tcl 8.4 conforms to any of the various regex
standards (POSIX, PCRE, etc.)?
Is there any way to force compatibility?
cheers,
Ben

Larry W. Virden

Jun 7, 2007, 7:50:13 AM


Do you know of a test suite which would permit someone to test
compliance?

As far as I know, Tcl's regular expression code was not originally
written with compliance to any of the "standards" (most of which are
not really standards) as a goal. And in fact, Tcl's regular
expressions were the first (or nearly so) to handle Unicode ...

Michael Schlenker

Jun 7, 2007, 3:04:16 PM
b...@spam.com wrote:
IIRC someone once wrote a wrapper for PCRE; you might just use that if
it's important.

Tcl surely does not conform to PCRE. POSIX I don't know; it might, as
the re_syntax man page at least mentions POSIX once or twice.


Michael

Bryan Oakley

Jun 7, 2007, 3:10:18 PM

Shouldn't that be "PCRE doesn't conform to Tcl REs"? :-)


--
Bryan Oakley
http://www.tclscripting.com

bill...@alum.mit.edu

Jun 7, 2007, 4:58:14 PM
Tcl regexps can be pretty well characterized as POSIX with some but
not all Perl extensions, which doesn't fit any standard exactly. In
any case, most standards do not define what happens beyond the ASCII
range.

Allodoxaphobia

Jun 7, 2007, 6:10:22 PM

Standards are GREAT! And, there's soooooooo many to choose from!!

Kevin Kenny

Jun 12, 2007, 9:59:22 AM

As I understand it, POSIX supports three sorts of regular expressions,
'simple' regular expressions (deprecated), 'basic' regular expressions
and 'extended' regular expressions. Basic regular expressions are what
you see in 'sed' and 'grep', while extended regular expressions are
what you get with 'grep -E' or 'egrep'.

The Tcl commands that accept regular expressions can be made to use
POSIX-compliant ones by beginning the regular expression with embedded
options. A regular expression that begins with the four characters
(?b) is interpreted as a POSIX-style basic regular expression;
a regular expression that begins with (?e) is a POSIX-style extended
regular expression. Tcl's default is to use 'advanced' regular
expressions, which are similar to the ones used in Perl and Python.
They are *not* identical; Tcl's are superior for Unicode handling;
Perl's arguably handle "mixed greediness" better, but I don't
claim to understand Perl's ill-documented rules for that case.
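
A minimal sketch of the embedded options described above, in case it
helps. The sample strings are made up for illustration; the one
pattern is tried under each of the three flavours:

```tcl
# Default (advanced RE): + is a quantifier.
puts [regexp {^a+$} aaa]

# (?b) selects a basic RE, where + is an ordinary character,
# so the pattern matches the literal string "a+" rather than "aaa".
puts [regexp {(?b)^a+$} aaa]
puts [regexp {(?b)^a+$} a+]

# (?e) selects an extended RE, where + is a quantifier again.
puts [regexp {(?e)^a+$} aaa]

# In a basic RE, grouping is written \( ... \).
puts [regexp {(?b)^\(ab\)*$} abab]
```

The option has to come at the very start of the pattern, and it applies
to the whole expression.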

PCRE is a specific regular expression library written by
Philip Hazel. It is of note that Perl itself does not use the
PCRE library, and in fact the two are not fully compatible;
thus Perl's own regular expressions are not "Perl-compatible!"

Virtually all of the additional functionality of Tcl's
advanced regular expression syntax is added in
such a way that it would be either a syntax error (for example,
'(?:...)') or a nonsensical request (for example, '...*?') in
conventional EREs, so few users ever need to request POSIX-compatible
REs explicitly, but '(?e)' is there if you need it.
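
To illustrate the two constructs mentioned above, here is a rough
sketch. The input strings and variable names are invented for the
example, and the second half only reports what the regexp engine does
with the same syntax under '(?e)' rather than assuming a particular
error message:

```tcl
# Advanced-RE-only syntax: a non-capturing group and a non-greedy quantifier.
regexp {(?:ab)+(c)} ababc whole sub
puts "whole=$whole sub=$sub"   ;# (?:ab) does not capture, so (c) is group 1

regexp {<.*?>} {<a><b>} lazy    ;# non-greedy: stops at the first >
regexp {<.*>}  {<a><b>} greedy  ;# greedy: runs to the last >
puts "lazy=$lazy greedy=$greedy"

# The same constructs with (?e) prepended. catch traps any compile
# error so the script reports the outcome instead of aborting.
foreach re {{(?e)(?:ab)+} {(?e)<.*?>}} {
    if {[catch {regexp $re ababc} msg]} {
        puts "$re -> error: $msg"
    } else {
        puts "$re -> accepted, result $msg"
    }
}
```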

--
73 de ke9tv/2, Kevin
