Problem with regular expression

Joan Miller

Mar 7, 2010, 5:32:28 AM

I would like to convert the first string to upper case. But this regular
expression is not matching the first string between quotes.

re.sub("'(?P<id>\w+)': [^{]", "\g<id>FOO", str)

# string to non-matching
'foo': {

# strings to matching
'bar': 'bar2'
'bar': None
'bar': 0
'bar': True

So, for example, from the first matching string I would like to get:
'BAR': 'bar2'

Any help, please?
Thanks in advance

News123

Mar 7, 2010, 7:52:13 AM

Hi Joan,

I'm a little slow today and don't exactly understand your question.

Could you perhaps add some examples of input lines and what you would
like to extract?

example:
input = "first word to Upper"
output = "FIRST word to Upper"

bye

N

Steve Holden

Mar 7, 2010, 8:03:08 AM

to pytho...@python.org

Well your pattern is identifying the right bits, but re.sub() replaces
everything it matches:

>>> import re
>>> strings = """\
... 'bar': 'bar2'
... 'bar': None
... 'bar': 0
... 'bar': True""".split("\n")
>>> for s in strings:
...     print(re.sub("'(?P<id>\w+)': [^{]", "\g<id>FOO", s))
...
barFOObar2'
barFOOone
barFOO
barFOOrue
>>>

What you need to fix is the replacement. Take a look at the
documentation for re.sub: you will need to provide a function to apply
the upper-case transformation, and the example there should show you how.
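
A minimal sketch of the function-based replacement described above, assuming Python 3. The helper name upper_key and the extra "rest" group are illustrative and not from the thread; capturing the trailing text lets the function put it back unchanged, so only the key is altered.

```python
import re

# Capture the key and the text that follows it so the latter can be restored.
PATTERN = re.compile(r"'(?P<id>\w+)'(?P<rest>: [^{])")

def upper_key(match):
    """Rebuild the matched text with only the key upper-cased."""
    return "'" + match.group("id").upper() + "'" + match.group("rest")

lines = ["'foo': {", "'bar': 'bar2'", "'bar': None", "'bar': 0", "'bar': True"]
converted = [PATTERN.sub(upper_key, line) for line in lines]
for line in converted:
    print(line)
```

re.sub() calls the function once per match and inserts whatever string it returns, which is why the function has to return the whole replacement and not just the key.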

regards
Steve

Tim Chase

Mar 7, 2010, 8:35:31 AM

to Joan Miller, pytho...@python.org

Joan Miller wrote:
> I would like to convert the first string to upper case. But this regular
> expression is not matching the first string between quotes.
>
> re.sub("'(?P<id>\w+)': [^{]", "\g<id>FOO", str)

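Well, my first thought is that you're not using raw strings, so
you may not be using the regexps and replacements you think you are.
(Here \w and \g happen to survive, because Python leaves unrecognized
escapes alone, but raw strings are the safer habit.)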

r"'(?P<id>\w+)': [^{]"

will match the lines of interest. The replacement will eat the
opening & closing single-quote, colon, and first character.
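
A small sketch of both points, assuming Python 3. The \b pattern is an illustrative example that is not in the original post; it is used because \b is an escape that Python itself interprets, so the difference between a plain and a raw string is visible.

```python
import re

# In a plain string "\b" is a single backspace character; in a raw string the
# backslash is kept, so the regex engine sees a word-boundary assertion.
plain = "\bbar\b"
raw = r"\bbar\b"
print(len(plain), len(raw))

found_plain = re.search(plain, "'bar': 0")  # looks for literal backspaces
found_raw = re.search(raw, "'bar': 0")      # looks for the word "bar"
print(found_plain, found_raw)

# With raw strings the original call still consumes the quotes, the colon,
# the space and the first character of the value.
eaten = re.sub(r"'(?P<id>\w+)': [^{]", r"\g<id>FOO", "'bar': None")
print(eaten)
```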

> # string to non-matching
> 'foo': {
>
> # strings to matching
> 'bar': 'bar2'
> 'bar': None
> 'bar': 0
> 'bar': True
>
> So, for example, from the first matching string I would like to get:
> 'BAR': 'bar2'

I think you'd have to use a function/lambda to do the
case-conversion:

re.sub(
    r"'(?P<id>\w+)(?=': [^{])",
    lambda m: "'" + m.group('id').upper(),
    string_of_interest
)
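
To see the lookahead version at work, here is a sketch that runs it over the test cases from the original post, assuming Python 3. The variable names data, key_pattern and result are illustrative.

```python
import re

data = """\
# string to non-matching
'foo': {

# strings to matching
'bar': 'bar2'
'bar': None
'bar': 0
'bar': True"""

# The lookahead (?=...) checks what follows the key without consuming it,
# so the closing quote, colon and value are left in place.
key_pattern = re.compile(r"'(?P<id>\w+)(?=': [^{])")
result = key_pattern.sub(lambda m: "'" + m.group("id").upper(), data)
print(result)
```

Because only the opening quote and the key are part of the match, the lambda has to put back just that opening quote.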

Or you could just forgo regexps and use regular string functions
like split(), startswith(), and upper()
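
One possible shape for the regexp-free route, as a sketch assuming Python 3 and one entry per line. The helper name upper_key and the exact checks are assumptions, not something spelled out in the thread.

```python
def upper_key(line):
    """Upper-case a leading quoted key unless its value starts with '{'."""
    if not line.startswith("'"):
        return line
    parts = line.split(": ", 1)  # split into key and value at the first ": "
    if len(parts) != 2:
        return line
    key, value = parts
    if not key.endswith("'") or not value or value.startswith("{"):
        return line
    return key.upper() + ": " + value

lines = ["'foo': {", "'bar': 'bar2'", "'bar': None", "'bar': 0", "'bar': True"]
converted = [upper_key(line) for line in lines]
for line in converted:
    print(line)
```

Calling upper() on the quoted key is safe here because the quote characters are unaffected by case conversion.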

-tkc

Paul McGuire

Mar 7, 2010, 9:10:55 AM

On Mar 7, 4:32 am, Joan Miller <pelok...@gmail.com> wrote:
> I would like to convert the first string to upper case. But this regular
> expression is not matching the first string between quotes.
>

Is using pyparsing overkill? Probably. But your time is valuable,
and pyparsing lets you knock this out in less time than it probably
took to write your original post.

Use pyparsing's pre-defined expression sglQuotedString to match your
entry key in quotes:

key = sglQuotedString

Add a parse action to convert to uppercase:

key.setParseAction(lambda tokens:tokens[0].upper())

Now define the rest of your entry value (be sure to add the negative
lookahead so we *don't* match your foo entry):

entry = key + ":" + ~Literal("{")

If I put your original test cases into a single string named 'data', I
can now use transformString to convert all of your keys that don't
point to '{'ed values:

print(entry.transformString(data))

Giving me:

# string to non-matching
'foo': {

# strings to matching
'BAR': 'bar2'
'BAR': None
'BAR': 0
'BAR': True

Here's the whole script:

from pyparsing import sglQuotedString, Literal

# 'data' is assumed to hold the original test cases as a single string
key = sglQuotedString
key.setParseAction(lambda tokens: tokens[0].upper())
entry = key + ":" + ~Literal("{")

print(entry.transformString(data))

And I'll bet that if you come back to this code in 3 months, you'll
still be able to figure out what it does!

-- Paul