
[perl #37371] [TODO] word boundary cclass


Leopold Toetsch

Oct 7, 2005, 5:11:28 AM
to bugs-bi...@rt.perl.org
# New Ticket Created by Leopold Toetsch
# Please include the string: [perl #37371]
# in the subject line of all future correspondence about this issue.
# <URL: https://rt.perl.org/rt3/Ticket/Display.html?id=37371 >


During opcode cleanup the find_word_boundary opcode ceased to exist
(there was no is_word_boundary).

We probably want to have this as a builtin "character class".

1) create a new enum entry in include/parrot/cclass.h
enum_cclass_word_boundary

2) implement it inside charset/*.c: is_cclass / find_cclass /
find_not_cclass, by calling these functions with enum_cclass_word as
char class.
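
A minimal sketch of what step 2 could look like, written as standalone C rather than against the real Parrot charset interface. The names is_word_char, is_word_boundary and find_word_boundary are hypothetical, and is_word_char is an ASCII-only stand-in for an is_cclass lookup with enum_cclass_word:

```c
#include <ctype.h>

/* Hypothetical stand-in for is_cclass(enum_cclass_word, ...):
 * ASCII letters, digits and underscore count as word characters.
 * Offsets outside the string are treated as non-word. */
static int is_word_char(const char *s, int len, int pos)
{
    unsigned char c;

    if (pos < 0 || pos >= len)
        return 0;
    c = (unsigned char)s[pos];
    return isalnum(c) || c == '_';   /* || yields exactly 0 or 1 */
}

/* A boundary sits just before offset pos when exactly one of the
 * characters at pos - 1 and pos is a word character (as with \b). */
static int is_word_boundary(const char *s, int len, int pos)
{
    return is_word_char(s, len, pos - 1) != is_word_char(s, len, pos);
}

/* First offset in [pos, len] that is a word boundary, or -1 if none. */
static int find_word_boundary(const char *s, int len, int pos)
{
    for (; pos <= len; pos++) {
        if (is_word_boundary(s, len, pos))
            return pos;
    }
    return -1;
}
```

For the string "foo bar" this reports boundaries at offsets 0, 3, 4 and 7. A real implementation would go through the charset's own class tables instead of ctype.h.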

See also the \b assertion in perldoc perlre.

leo

Patrick R. Michaud via RT

Nov 16, 2005, 9:41:56 AM
to perl6-i...@perl.org
> [leo - Fri Oct 07 02:11:28 2005]:

>
> During opcode cleanup the find_word_boundary opcode ceased to exist
> (there was no is_word_boundary).
>
> We probably want to have this as a builtin "character class".

I think we can just deprecate (or omit) find_word_boundary altogether as
an opcode. I don't know the full history of find_word_boundary, but I
speculate it was created in anticipation of being useful for regular
expression matching. However, PGE doesn't use find_word_boundary, nor
do I see anywhere that it'll likely be used.

Someone who needs find_word_boundary can generally get there via the
find_cclass and find_not_cclass ops. In particular:

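    .local int pos
    # ...$S0 is string to search, pos is the offset to start from...
    $I0 = length $S0
    # $I1: first offset at or after pos holding a word character
    $I1 = find_cclass .CCLASS_WORD, $S0, pos, $I0
    # $I2: first offset at or after pos holding a non-word character
    $I2 = find_not_cclass .CCLASS_WORD, $S0, pos, $I0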

The greater of $I1 and $I2 gives the next word boundary after pos: one of
the two calls matches at pos itself, and the other stops where the class
changes. Of course, if we already know from previous information that the
character at pos is .CCLASS_WORD (or not .CCLASS_WORD), we can get the next
word boundary with just one call to find_not_cclass (or find_cclass).
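
To make the "greater of the two" reasoning concrete, here is a rough C model of it. It is only a sketch: find_word and find_not_word are hypothetical ASCII-only stand-ins for find_cclass and find_not_cclass with .CCLASS_WORD, and they assume a failed search returns the string length.

```c
#include <ctype.h>

/* ASCII-only approximation of the word class (assumption). */
static int word_char(unsigned char c)
{
    return isalnum(c) || c == '_';
}

/* Models find_cclass: first offset >= pos holding a word character,
 * or len when there is none (assumed failure value). */
static int find_word(const char *s, int len, int pos)
{
    while (pos < len && !word_char((unsigned char)s[pos]))
        pos++;
    return pos;
}

/* Models find_not_cclass: first offset >= pos holding a non-word
 * character, or len when there is none (assumed failure value). */
static int find_not_word(const char *s, int len, int pos)
{
    while (pos < len && word_char((unsigned char)s[pos]))
        pos++;
    return pos;
}

/* One of the two searches matches at pos itself; the other stops
 * where the class changes, so the larger result is the boundary. */
static int next_word_boundary(const char *s, int len, int pos)
{
    int w = find_word(s, len, pos);
    int n = find_not_word(s, len, pos);
    return w > n ? w : n;
}
```

With "foo bar", starting at offset 1 yields 3, starting at 3 yields 4, and starting at 4 yields 7 (the end of the string).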

So, I vote to just eliminate find_word_boundary altogether.

Pm
