grep command for random digits separated by periods, in parentheses

MSimbron

Apr 1, 2011, 1:43:51 PM
to TextWrangler Talk
First-time reader, hopefully with a simple problem...

I have a document that's hundreds of pages long with lots of random numbers
(it's a technical document). I've been tasked with extracting and cataloging
the images, each of which has a number that looks like

(01.01.001) or
(07.02.004) or
(03.10.015)

When I do a search for
\d.\d.\d

it normally gives me mostly the results I need. This time, though, I'm
getting lots of random measurements in my results, which makes the task
much more difficult. Does anyone have an idea of how to do
a specific search for a left parenthesis followed by a number,
a period, a number, a period, a number, and then a closing
parenthesis?

In a nutshell, HELP

Thanks!

Max

Christopher Bort

Apr 1, 2011, 6:53:33 PM
to textwr...@googlegroups.com

Two observations about your search pattern \d.\d.\d. First, it
doesn't have any quantifiers. Second, . matches any character,
not just a literal '.'. Your pattern matches a single numeric
digit followed by any single character, followed by a single
digit, followed by any single character, followed by a single
digit. For example, it would match the string '1#2M3', which is
clearly not like your intended target strings. To match patterns
as in your example, you'd use something more like:

\(\d{2}\.\d{2}\.\d{3}\)

Which works like so:

\( matches an open parenthesis
\d{2} matches exactly two numeric digits
\. matches a single literal .
\d{3} matches exactly three digits
\) matches a close parenthesis
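To see the difference concretely, here is a minimal sketch using Python's re module rather than TextWrangler's grep (these particular constructs should behave the same in both). It runs the original \d.\d.\d pattern and the escaped pattern over a made-up line of text; the sample string is hypothetical, not taken from the actual document.

```python
import re

# Hypothetical sample: two figure numbers plus measurement-like noise.
SAMPLE = "See (01.01.001) and (07.02.004). Tolerance 3.250 mm, part 1#2M3, ref 12.5.7"

# Original pattern: "." is unescaped, so it matches ANY character.
NAIVE = re.compile(r"\d.\d.\d")
# Suggested pattern: literal parentheses and periods, fixed digit counts.
SPECIFIC = re.compile(r"\(\d{2}\.\d{2}\.\d{3}\)")

naive_hits = NAIVE.findall(SAMPLE)
specific_hits = SPECIFIC.findall(SAMPLE)

print("naive:   ", naive_hits)     # ['1.001', '2.004', '3.250', '1#2M3', '2.5.7']
print("specific:", specific_hits)  # ['(01.01.001)', '(07.02.004)']
```

Note that the naive pattern picks up only fragments of the real figure numbers ('1.001', '2.004') while also catching the measurement, the '1#2M3' junk, and '2.5.7'.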

It might also be written something like:

\((?:\d{2}\.){2}\d{3}\)
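The two spellings should accept exactly the same strings. A small Python sketch with hypothetical test strings checks that, and also shows one practical reason for the non-capturing (?:...) group when matches are collected by a script: Python's findall returns group contents instead of the whole match whenever the pattern contains a capturing group.

```python
import re

EXPLICIT = re.compile(r"\(\d{2}\.\d{2}\.\d{3}\)")
# (?:...) groups "two digits + period" without capturing; {2} repeats the group.
GROUPED = re.compile(r"\((?:\d{2}\.){2}\d{3}\)")

# Hypothetical test strings: two valid figure numbers, four near misses.
samples = ["(01.01.001)", "(03.10.015)", "(1.01.001)", "(01.01.01)", "01.01.001", "(01x01.001)"]
for s in samples:
    explicit_ok = EXPLICIT.fullmatch(s) is not None
    grouped_ok = GROUPED.fullmatch(s) is not None
    assert explicit_ok == grouped_ok  # both spellings accept and reject the same strings
    print(f"{s:13} {grouped_ok}")

# With a capturing (\d{2}\.) instead, findall yields only the group's last repetition.
print(re.findall(r"\((\d{2}\.){2}\d{3}\)", "(03.10.015)"))  # ['10.']
print(GROUPED.findall("(03.10.015)"))                       # ['(03.10.015)']
```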

If the number of digits in each part of the pattern might vary,
you would want to use the + quantifier instead of the more
specific {2} and {3}:

\(\d+\.\d+\.\d+\)
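Here's a rough Python sketch comparing the fixed-width and + versions on hypothetical text where one figure number has shorter fields. Since the original task was to catalog the images, it also collects the unique numbers; that cataloging step is an illustrative assumption, not something proposed in the thread.

```python
import re

FIXED = re.compile(r"\(\d{2}\.\d{2}\.\d{3}\)")
# \d+ means "one or more digits", so each field may be any length.
FLEXIBLE = re.compile(r"\(\d+\.\d+\.\d+\)")

# Hypothetical text: one figure number has shorter fields, and one repeats.
text = "Fig (01.01.001), Fig (7.2.4), Fig (03.10.015), again (01.01.001), size 2.5.10 mm"

fixed_hits = FIXED.findall(text)
flexible_hits = FLEXIBLE.findall(text)
print(fixed_hits)     # ['(01.01.001)', '(03.10.015)', '(01.01.001)']
print(flexible_hits)  # ['(01.01.001)', '(7.2.4)', '(03.10.015)', '(01.01.001)']

# A minimal catalog: unique figure numbers in order of first appearance
# (dict preserves insertion order in Python 3.7+).
catalog = list(dict.fromkeys(flexible_hits))
print(catalog)        # ['(01.01.001)', '(7.2.4)', '(03.10.015)']
```

The unparenthesized '2.5.10' is ignored by both patterns, because the escaped parentheses are required.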

>In a nutshell, HELP

In a nutshell, the Searching with Grep chapter in the TW manual
is an excellent primer on regular expressions.
--
Christopher Bort
<top...@thehundredacre.net>

Roy McCoy

Apr 1, 2011, 6:55:56 PM
to textwr...@googlegroups.com
MSimbron wrote:

\d finds only one digit and an unescaped period finds any character.
Your task looks very easy; you just have to escape the periods and the
parentheses. Is

\(\d+\.\d+\.\d+\)

what you want?


Roy McCoy
UEA, Rotterdam NL

Jean-Christophe Helary

Apr 29, 2011, 10:28:23 AM
to textwr...@googlegroups.com

On 29 Apr. 11, at 07:33, MSimbron wrote:

> This worked GREAT. Thanks again. I will reread the TW manual, but I
> admit it didn't make much sense to me.

http://mac4translators.blogspot.com/2011/04/introduction-to-regular-expressions.html

That should be simple enough to get you started. But the manual is really well written and easy to read.


Jean-Christophe Helary
----------------------------------------
fun: http://mac4translators.blogspot.com
work: http://www.doublet.jp (ja/en > fr)
tweets: http://twitter.com/brandelune

MSimbron

Apr 28, 2011, 6:33:52 PM
to TextWrangler Talk
This worked GREAT. Thanks again. I will reread the TW manual, but I
admit it didn't make much sense to me.

On Apr 1, 3:53 pm, Christopher Bort <top...@thehundredacre.net> wrote:

Christopher Bort

Apr 29, 2011, 2:09:06 PM
to textwr...@googlegroups.com
On 4/28/11 at 3:33 PM, msim...@gmail.com (MSimbron) wrote:

>This worked GREAT. Thanks again. I will reread the TW manual, but I
>admit it didn't make much sense to me.

Regular expressions are one of those topics that tend to be
fairly opaque when you first encounter them, until you have an
'Aha!' moment and all of a sudden they make perfect sense.
Read the manual slowly and deliberately, starting with the
basics, and don't move on to the 'advanced' sections until you
understand the basics. As you read, you can have TW running
with some (any) text document and the Find dialog open. Then, as
you're reading, you can easily try out simple patterns to see
how they work. When you're writing more complex real-world
patterns, it's sometimes helpful to build them one simple part
at a time, testing as you go by performing finds on your actual
target text until you have a pattern that fully matches the text
you're looking for. Once you've learned regular expressions as
TW implements them, using the primer in the TW manual, you can
move on to many other sources, such as the O'Reilly book
'Mastering Regular Expressions,' to learn about how they're
implemented in other environments. If you work with text a lot,
regex is well worth the effort to learn; once you get it, you'll
wonder how you ever got by without it.
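The build-it-one-piece-at-a-time approach carries over directly to scripting. Here's a minimal Python sketch of that workflow, using a hypothetical line of text in place of the real document; each step adds one piece to the pattern, so you can see which piece changed the result.

```python
import re

# Hypothetical target text standing in for the real document.
text = "Figure (01.01.001) is 3.250 mm wide; see also (07.02.004)."

# Build the pattern one piece at a time and look at the matches after each step.
steps = [
    r"\d{2}.",                    # two digits + ANY character (unescaped period)
    r"\d{2}\.",                   # two digits + a literal period
    r"\d{2}\.\d{2}\.\d{3}",       # the whole number, no parentheses yet
    r"\(\d{2}\.\d{2}\.\d{3}\)",   # the whole number inside literal parentheses
]
results = {}
for pattern in steps:
    results[pattern] = re.findall(pattern, text)
    print(f"{pattern:26} {results[pattern]}")
```

The first step shows the unescaped-period problem from earlier in the thread: '001', '250', and '004' match because '.' accepts any character, and escaping it in the second step removes them.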
--
Christopher Bort
<top...@thehundredacre.net>
