REGEX Question


md

Jul 30, 2014, 3:03:12 PM
to python_in...@googlegroups.com
Hey guys,

I am fairly new to regular expressions.

When I want to get the delimiter between a filename and framenumber I am using the following regex.

import re

f = "myawesome_filename_v1_0001.exr"

pat = re.compile('[\.|\_]\d+[\.]\w{3}')
x = re.search(pat, f)
print(x.group())
delimiter = x.group()[0]
print('delimiter : ' + delimiter)

This works specifically for the case where you might have filename.####.ext or filename_####.ext. 

Question is ... is this a decent/efficient way to do this ? 








Paul Molodowitch

Jul 30, 2014, 3:56:12 PM
to python_inside_maya
Well... string parsing is always going to be a bit of a crapshoot.  Especially if you're trying to come up with something like parsing out the framenumbers for a very general case, there will always be things that break it.  In general, it's best if you can enforce some sort of uniformity on your filenames, so you can have a certain amount of assurance that it's going to work as intended.

However - to your specific regex - if you're looking specifically to catch cases that look like:

filename.####.ext
filename_####.ext

...you're on the right track, but there are a few things I would fix.

First, some general advice - ALWAYS get in the habit of prefixing your regex pattern strings with "r", like:   r'myPattern'
The 'r' marks it as a raw string, meaning backslashes aren't treated as escape sequences.  Since python keeps the backslash for sequences it doesn't recognize, MOST of the time it won't make a difference, but it's best to just get in the habit now.  I also like to always triple quote my regex patterns, for cases when I need to search for a quote mark...
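
To make the raw-string point concrete, here is a small sketch (the sample strings are made up for illustration). `\b` is one of the sequences Python does recognize, so without the `r` prefix it becomes a backspace character before the regex engine ever sees it:

```python
import re

text = 'a cat sat'

# Without the r prefix, '\b' is a backspace character (length 1),
# so the pattern looks for backspace + "cat" + backspace.
plain = '\bcat\b'
# With the r prefix the backslash survives and \b means "word boundary".
raw = r'\bcat\b'

plain_match = re.search(plain, text)
raw_match = re.search(raw, text)

print(len('\b'), len(r'\b'))   # 1 2
print(plain_match)             # None
print(raw_match.group())       # cat

# Triple quotes let the pattern contain both kinds of quote mark.
quoted = re.compile(r'''["'](\w+)["']''')
quoted_name = quoted.search('name = "beauty"').group(1)
print(quoted_name)             # beauty
```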

Second - you probably want to make sure that your expression only matches at the END of the string - the easiest way to do that is with a "$" character at the end.  Otherwise, if your file is:

f = "myawesome_filename_ver_1_frame.0001.exr"

...you'll get the wrong result.  Also, I wouldn't limit your file extensions to only 3 characters - ".jpeg" and ".tiff" pop up pretty frequently. So at least allow 4 characters (and possibly just make it 1+).  Finally, you'll probably want to start using groups - at minimum, around the framenumbers.  So my final take would look something like:

# Inside [...] a "|" is a literal pipe, not alternation, so [._] is enough.
pat = re.compile(r'''[._](\d+)\.(\w{3,4})$''')
print(pat.search(f).groups())
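
A quick sketch of what the `$` anchor and the wider extension buy you. The example name above still happens to parse correctly without the anchor, because `_1` is followed by `_` rather than a dot. The filenames below are hypothetical, picked so that an earlier chunk of the name also looks like delimiter + digits + dot + three characters:

```python
import re

unanchored = re.compile(r'[._](\d+)\.(\w{3})')
anchored = re.compile(r'[._](\d+)\.(\w{3,4})$')

# Hypothetical name with an earlier chunk that resembles a frame number.
f = 'render_2.tmp_0001.exr'

# search() returns the leftmost match, so the unanchored pattern
# stops at "_2.tmp" and never reaches the real frame number.
loose = unanchored.search(f).groups()
strict = anchored.search(f).groups()
print(loose)    # ('2', 'tmp')
print(strict)   # ('0001', 'exr')

# A four-character extension only matches in full once {3,4} is allowed.
tiff = anchored.search('plate.0101.tiff').groups()
print(tiff)     # ('0101', 'tiff')
```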

If you want to get fancy, python regular expressions allow you to name your groups, which I like.  The main downside is that the (?P<name>...) syntax started out as a python extension and not every regex engine accepts it, so if you try to copy your regex somewhere else, it may not work:

pat = re.compile(r'''[._](?P<frame>\d+)\.(?P<ext>\w{3,4})$''')
print(pat.search(f).groupdict())
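
One thing to watch with either version: `search()` returns None when nothing matches, so calling `.groups()` or `.groupdict()` directly will raise an AttributeError on a name without a frame number. A minimal sketch of a guard, where `split_frame` is just a hypothetical helper name:

```python
import re

FRAME_RE = re.compile(r'[._](?P<frame>\d+)\.(?P<ext>\w{3,4})$')

def split_frame(filename):
    """Return (frame, ext) from the end of filename, or None if it doesn't match."""
    match = FRAME_RE.search(filename)
    if match is None:
        return None
    # Groups can be read by name instead of by position.
    return match.group('frame'), match.group('ext')

print(split_frame('myawesome_filename_v1_0001.exr'))   # ('0001', 'exr')
print(split_frame('no_frame_number.exr'))              # None
```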


- Paul

PS - Is that Kurt Russell from "The Thing", or do you just happen to look just like him?  Either way, it's awesome...

--
You received this message because you are subscribed to the Google Groups "Python Programming for Autodesk Maya" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_m...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/python_inside_maya/91317257-669e-49ae-8394-b26f4ea1b441%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Justin Israel

Jul 30, 2014, 4:19:40 PM
to python_in...@googlegroups.com
As Paul mentioned, your regex can either cover a limited number of cases or be as general as possible. Yours was pretty limited to specific delimiters and 3-character extensions. Both yours and Paul's corrected versions also don't account for negative frame numbers, which you might have if your sequences include simulation ramp-ups. You could do something like this to go one step further:

rx = re.compile(r'[._](-?\d+)\.([0-9a-z]+)$', re.I)
print(rx.search("foo_bar_-1234.JPG").groups())

While the delimiter is still limited to two types of characters, the extension now allows only numbers and letters (of any length), and the match is case-insensitive.
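
Since the original question was about getting the delimiter, here is a rough sketch that wraps the delimiter in its own group and turns the frame into an int. The `parse` helper and the choice to return the padding width are my own assumptions, not something from the posts above:

```python
import re

rx = re.compile(r'([._])(-?\d+)\.([0-9a-z]+)$', re.I)

def parse(filename):
    """Return (delimiter, frame, padding, ext), or None if there is no match."""
    match = rx.search(filename)
    if match is None:
        return None
    delimiter, frame, ext = match.groups()
    # int() handles the leading minus sign and drops zero padding, so the
    # digit count is kept separately in case the name has to be rebuilt.
    padding = len(frame.lstrip('-'))
    return delimiter, int(frame), padding, ext.lower()

print(parse('foo_bar_-1234.JPG'))   # ('_', -1234, 4, 'jpg')
print(parse('foo.0010.exr'))        # ('.', 10, 4, 'exr')
```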
 


Justin Israel

Jul 30, 2014, 4:58:50 PM
to python_in...@googlegroups.com
Also, did you happen to take a glance at some of the examples being used in that fileseq lib I had previously recommended? In that lib, he splits off the extension first and just deals with parsing the end of the basename. It has some regex examples in there for dealing with both concrete file paths and ranges. 
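
To give a rough idea of that "split the extension off first" approach, here is a minimal sketch. This is not the code from fileseq, only an illustration of the idea using `os.path.splitext`, and `split_sequence_path` is a made-up name:

```python
import os
import re

# Delimiter plus a (possibly negative) frame number at the end of the basename.
TAIL_RE = re.compile(r'([._])(-?\d+)$')

def split_sequence_path(path):
    """Return (name, delimiter, frame, ext); delimiter and frame are None if absent."""
    base, ext = os.path.splitext(path)   # ext keeps its leading dot
    match = TAIL_RE.search(base)
    if match is None:
        return base, None, None, ext
    delimiter, frame = match.groups()
    return base[:match.start()], delimiter, frame, ext

print(split_sequence_path('myawesome_filename_v1_0001.exr'))
# ('myawesome_filename_v1', '_', '0001', '.exr')
print(split_sequence_path('notes.txt'))
# ('notes', None, None, '.txt')
```

With the extension handled separately, the regex no longer has to guess how long an extension can be.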

md

Jul 30, 2014, 5:08:40 PM
to python_in...@googlegroups.com
This is really great info ... much appreciated.

Paul ... that is indeed KR from The Thing  !  I used to have hair like that .. a decade ago ! hahaha.

Risto Jankkila

Aug 1, 2014, 4:18:01 PM
to python_in...@googlegroups.com
There's been a similar thread in the past.

Anyway.. fileseq lib is great, but if you want to write the code yourself I recommend the link below:

Cheers,
Risto






--
-------------------------------------------
Risto Jankkila
mob. +44 (0)77 6741 9890 (UK)
mob. +358 (0)40 5422 625 (FI)
ristoj...@gmail.com
-------------------------------------------
