can't get this regex to work

inhahe

Oct 10, 2009, 9:48:19 PM
to Regex
Hi, I'm working with Python and trying to interpret mIRC color/decoration codes.

I can do it this way:

mircre = re.compile ('(?:\x0b(?:10|11|12|13|14|15|0\\d|\\d)'
'(?:'
',(?:10|11|12|13|14|15|0\\d|\\d)'
')?)|\x02|\xA5|\xA2')

But it's kind of awkward: as I iterate over the matches, I don't have the
text ahead of each match, so I have to format the text behind a match
according to the previous match. That means the last piece of text won't
be processed unless there's a match at the end.
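One way around the trailing-text problem, sketched here under assumptions: keep the end offset of the previous match, emit the text between matches as you go, and after the loop emit whatever text is left. The code set is borrowed from the later regexes in this thread (\x03 plus color digits, \x02, \x1F, \x16) rather than the one above, and the function name `split_codes` is hypothetical.

```python
import re

# Hypothetical code pattern, borrowed from the later posts in this thread.
CODE_RE = re.compile(r"\x03(?:1[0-5]|0?\d)(?:,(?:1[0-5]|0?\d))?|\x02|\x1F|\x16")


def split_codes(line):
    """Return (code, following_text) pairs; code is '' for leading text."""
    pairs = []
    code, pos = "", 0  # the code that applies to the text starting at pos
    for m in CODE_RE.finditer(line):
        text = line[pos:m.start()]
        if code or text:  # skip an empty chunk before a leading code
            pairs.append((code, text))
        code, pos = m.group(), m.end()
    # Flush the text after the last code, even if no code follows it.
    if code or line[pos:]:
        pairs.append((code, line[pos:]))
    return pairs


print(split_codes("hi\x02bold\x0304red"))
```

Because the tail is flushed explicitly after the loop, the last piece of text is handled whether or not a code appears at the end of the line.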

So I tried doing this instead, to match (beginning of line or mIRC
code)(text), so that I'll always get pairs of (mirc_code,
following_text), or ('', following_text) if it's the first text on the
line and it doesn't begin with an mIRC code:

mircre = re.compile ('('
'(?:'
'(?:\x0b(?:10|11|12|13|14|15|0\\d|\\d)'
'(?:'
',(?:10|11|12|13|14|15|0\\d|\\d)'
')?'
')'
'|'
'\x02'
'|'
'\xA5'
'|'
'\xA2'
')'
'|'
'^'
')'
'(.*?)')


The only problem is that it doesn't work: the (.*?) always yields ''.
Can anybody tell me why? Thanks.
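A different technique that yields the same (code, following_text) pairs without any lazy group: `re.split` with a capturing group keeps each matched separator in its result, so the pieces alternate text, code, text, code, and so on, and can be zipped into pairs. This is a sketch; the code set (taken from the later posts) and the helper name `code_pairs` are assumptions for illustration.

```python
import re

# One capturing group: re.split keeps every matched code in the result list.
CODE_SPLIT = re.compile(r"(\x03(?:1[0-5]|0?\d)(?:,(?:1[0-5]|0?\d))?|\x02|\x1F|\x16)")


def code_pairs(line):
    parts = CODE_SPLIT.split(line)  # [text0, code1, text1, code2, text2, ...]
    pairs = list(zip(parts[1::2], parts[2::2]))
    if parts[0]:  # leading text that no code precedes
        pairs.insert(0, ("", parts[0]))
    return pairs


print(code_pairs("hi\x02bold\x0304red"))
```

When the line starts with a code, `re.split` returns an empty first element, which the `if parts[0]` check drops, so no spurious ('', '') pair appears.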

inhahe

Oct 12, 2009, 11:28:06 AM
OK, so I kind of solved it. I found out that the reason it won't work
is that a non-greedy group matches as little as possible, so if nothing
comes after it in the pattern, it matches the empty string.
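A minimal demonstration of that behavior, using \x02 as the only code to keep it short (the sample string is made up for illustration):

```python
import re

line = "\x02bold\x02plain"

# Nothing follows the lazy group, so the engine stops as soon as it can:
# the text group matches the empty string every time.
lazy_at_end = re.findall(r"(\x02)(.*?)", line)

# A lookahead gives the lazy group a point it has to reach, so it expands
# up to the next \x02 or the end of the string.
lazy_with_stop = re.findall(r"(\x02)(.*?)(?=\x02|$)", line)

print(lazy_at_end)     # both text groups are ''
print(lazy_with_stop)  # 'bold' and 'plain'
```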

My new regex (with some other things modified too) is this:

mircre = re.compile("""
(
(?:
\x03\\d{1,2}
(?:,\\d{1,2})?
)
|\x02|\x1F|\x16|^
)
([^\x03\x02\x1F\x16]*)
""", re.VERBOSE)

The only problem is that now, if we receive a string with a \x03 in
it that has no color number after it, we won't capture it in either
group, so I'm not sure how else I could do it. Maybe I could do a
look-ahead with the whole regex repeated, but that's kind of lame.
Really, I just want a way to find all occurrences of a regex that ends
with a non-greedy term and have that term consume everything up to the
next occurrence.

Is that possible in Python?
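A possible fix for the stray \x03, sketched as an assumption rather than a complete mIRC parser: make the color digits optional, so a lone \x03 still lands in group 1 as a code (in mIRC a bare \x03 is typically used to turn colors off). The text group keeps the negated character class from the regex above, so no lazy group or look-ahead is needed.

```python
import re

# The digits after \x03 are optional, so a lone \x03 is still captured as a code.
# Group 2 is the negated character class from the regex above.
mircre = re.compile(r"(\x03(?:\d{1,2}(?:,\d{1,2})?)?|\x02|\x1F|\x16|^)([^\x03\x02\x1F\x16]*)")

pairs = mircre.findall("hi\x0304red\x03plain")
print(pairs)
```

The code alternatives come before `^`, so a line that starts with a code yields that code in group 1 rather than an empty leading pair.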

inhahe

Oct 12, 2009, 12:16:03 PM
OK, I found a general solution to the problem:

>>> re.findall("(a.*?)(?=\\1|$)", "abcabcabc")
['abc', 'abc', 'abc']

It seems a little hackish, but it's better than repeating the whole regex
in a look-ahead.

If anyone knows of a better way, though, that'd be good.
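A caution about the backreference version: \1 matches the exact text that group 1 captured, not another match of the same pattern, so the look-ahead only stops at an identical repeat. It works on "abcabcabc" because every item happens to be the same. Looking ahead for whatever starts an item (here just `a`) stops at any next item; the sample string "abcabd" is made up for illustration.

```python
import re

# \1 must match the same text the group already captured ("abc"),
# so a different next item ("abd") does not stop the lazy match.
with_backref = re.findall("(a.*?)(?=\\1|$)", "abcabd")

# Looking ahead for the start of the next item stops at any item.
with_start = re.findall("(a.*?)(?=a|$)", "abcabd")

print(with_backref)  # ['abcabd']
print(with_start)    # ['abc', 'abd']
```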

inhahe

Oct 13, 2009, 12:15:32 AM
I noticed a problem, though.  

mircre = re.compile(""" 
                      ( 
                        (?:
                          \x03(?:1[0-5]|0?\\d)
                          (?:,(?:1[0-5]|0?\\d))?
                        )
                        |\x02|\x1F|\x16|^)
                      (.*?)(?=\\1|$)
                    """, re.VERBOSE)  

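won't return ('', sometext) for sometext that occurs at the beginning of the string. Instead, for the first match it returns ('', '') and sometext is lost: when group 1 matches ^, it captures an empty string, so \1 is empty and the look-ahead succeeds right away.

I had to resort to fixing it by doing: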

mircre = re.compile(""" 
                      ( 
                        (?:
                          \x03(?:1[0-5]|0?\\d)
                          (?:,(?:1[0-5]|0?\\d))?
                        )
                        |\x02|\x1F|\x16|^)
                      (.*?)
                      (?=(?:\x03(?:1[0-5]|0?\\d)(?:,(?:1[0-5]|0?\\d))?)|\x02|\x1F|\x16|$)
                    """, re.VERBOSE)  

That basically means repeating the whole pattern, except for the ^, in the look-ahead.
I'd like to think there's a way around that?
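Since the pattern is just a Python string, one way to avoid typing it twice is to write the code alternatives once and format them into both places. This is a sketch; the name `CODE` and the f-string composition are my own choices, and the code set is the one from the regex above. The regex still contains the pattern twice, but only one copy has to be maintained. (The third-party `regex` module also supports subroutine calls such as `(?1)`, which reuse a group's pattern inside the same regex; the stdlib `re` module does not.)

```python
import re

# mIRC control codes from the regex above, written once.
CODE = r"\x03(?:1[0-5]|0?\d)(?:,(?:1[0-5]|0?\d))?|\x02|\x1F|\x16"

# Group 1: a code, or '' at the start of the string.
# Group 2: the text up to the next code or the end.
# The ^ appears only in group 1, not in the look-ahead.
mircre = re.compile(rf"((?:{CODE})|^)(.*?)(?=(?:{CODE})|$)")

print(mircre.findall("hi\x02bold\x0304red"))
```

Because the code alternatives are tried before `^`, a string that starts with a code does not produce an empty ('', '') pair.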

