Deleting lines that meet certain criteria

sequoyah

Feb 10, 2009, 1:53:20 PM

to TextWrangler Talk

How do I use TextWrangler to delete lines in a text file according to
some criteria? I want to delete lines that begin with a certain string
AND also contain another string.

I have tried filters in the search options, but these are for files,
not lines. Does this require Grep? If so, can someone help me with
that?

Thanks,

Al

Patrick Woolsey

Feb 10, 2009, 3:17:53 PM

to textwr...@googlegroups.com

sequoyah <sequoyah...@sbcglobal.net> sez:

[...]

Although you can do this with a search, using:

Text -> Process Lines Containing

will probably be easier.

(You can also employ a grep pattern with this command; whether you need
to will depend on what you want to do.)

Regards,

Patrick Woolsey
==
Bare Bones Software, Inc. <http://www.barebones.com>
P.O. Box 1048, Bedford, MA 01730-1048

sequoyah

Feb 10, 2009, 3:51:40 PM

to TextWrangler Talk

Thanks for the response Patrick,

I can get that to work for one string of text, but how do I make it
work for lines containing two or more text strings criteria?

Example: lines that start with "%SSiPrshMark" AND also contain a
string such as "|fillcros.eps|".

Also can Text -> Process Lines Containing be used on a folder of
files?

Thanks,

Al

On Feb 10, 12:17 pm, Patrick Woolsey <pwool...@barebones.com> wrote:

Tom Robinson

Feb 10, 2009, 4:27:03 PM

to textwr...@googlegroups.com

> I can get that to work for one string of text, but how do I make it
> work for lines containing two or more text strings criteria?
>
> Example: lines that start with "%SSiPrshMark" AND also contain a
> string such as "|fillcros.eps|".

The regular expression you're after is:

^%SSiPrshMark.*\|fillcros.eps\|.*

The caret anchors the pattern to the start of the line, .* matches
any run of characters, and the backslash escapes each vertical bar
so it's not interpreted as the alternation ("or") operator.
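
For anyone who wants to try the pattern outside TextWrangler, here is a minimal Python sketch. The sample lines and variable names are made up for illustration, and it assumes Python's re module treats this simple pattern the same way TextWrangler's grep does. The dot in "fillcros.eps" is left unescaped as in the pattern above, so it matches any character at that position.

```python
import re

# Pattern from the post above: the line starts with %SSiPrshMark and
# later contains |fillcros.eps| (bars escaped so they are literal).
PATTERN = re.compile(r"^%SSiPrshMark.*\|fillcros.eps\|.*", re.MULTILINE)

# Hypothetical sample text; only the first line meets both criteria.
sample = (
    "%SSiPrshMark: 1.0 2.0 |fillcros.eps| 0\n"
    "%SSiPrshMark: 1.0 2.0 |other.eps| 0\n"
    "%SSiPressSheet: |fillcros.eps|\n"
)

matches = PATTERN.findall(sample)
print(matches)
```

The second line fails the "contains" test and the third fails the "begins with" test, so neither is returned.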

> Also can Text -> Process Lines Containing be used on a folder of
> files?

No, but you could AppleScript TextWrangler to process a group of files.

Cheers

Patrick Woolsey

Feb 10, 2009, 4:41:11 PM

to textwr...@googlegroups.com

sequoyah <sequoyah...@sbcglobal.net> sez:

[...]

>I can get that to work for one string of text, but how do I make it
>work for lines containing two or more text strings criteria?
>
>Example: lines that start with "%SSiPrshMark" AND also contain a
>string such as "|fillcros.eps|".

To do this, you will want to use a grep pattern; for example:

^%SSiPrshMark.+?\|fillcros.eps\|.+?$

(Note in particular that I've used ^ and $ to anchor the match to the line
start and end respectively, and \ to escape the vertical bars | which occur
in the second string.)

>Also can Text -> Process Lines Containing be used on a folder of
>files?

Not directly, though you can apply it to multiple files via an AppleScript,
or by running a text factory created in BBEdit.

sequoyah

Feb 10, 2009, 9:58:58 PM

to TextWrangler Talk

Thank you Patrick and Tom,

Both of your Grep REs work equally well both in Text > Process Lines
Containing and in Search/Replace where I replace with \r.

The Search/Replace method is preferable because it can process a whole
folder of files without needing a script.

Thanks again,

Al

sequoyah

Feb 10, 2009, 11:49:46 PM

to TextWrangler Talk

Actually replace with \r does not do what I wanted. I wanted not to be
left with an empty line between the preceding and the following
lines.

So I still need help with that.

Thanks,

Al

Jan Erik Moström

Feb 11, 2009, 12:57:45 AM

to textwr...@googlegroups.com

On 11 feb 2009, at 05:49, sequoyah wrote:

> So I still need help with that.

\r\r

would match the end of a line followed by an empty line. I didn't read
the start of the thread, so I'm not able to put this into context.

jem

sequoyah

Feb 11, 2009, 1:16:35 AM

to TextWrangler Talk

Thanks for the response, but that puts in twice as many blank lines.

Al

Jan Erik Moström

Feb 11, 2009, 2:10:02 AM

to textwr...@googlegroups.com

On 11 feb 2009, at 07:16, sequoyah wrote:

> Thanks for the response, but that puts in twice as many blank lines.

I meant that by searching for

\r\r+

and replacing with

\r

you would delete empty lines
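
The same search and replace can be sketched in Python. This is only an illustration: the sample text is invented, and it uses real \r characters because Python matches the literal character, whereas TextWrangler's \r stands for a line break in the document.

```python
import re

# Hypothetical text with blank lines left behind (\r line breaks).
text = "line one\r\r\rline two\rline three\r\r"

# Same idea as the post: a break followed by one or more further
# breaks is replaced by a single break.
collapsed = re.sub(r"\r\r+", "\r", text)
print(repr(collapsed))
```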

sequoyah

Feb 11, 2009, 3:37:36 AM

to TextWrangler Talk

OK, that does work!

Thank you very much.

Al

sequoyah

Feb 11, 2009, 1:28:38 PM

to TextWrangler Talk

Synthesizing what I learned here:

searching with either

^%SSiPrshMark.+?\|fillcros.eps\|.+?$\r

or

^%SSiPrshMark.*\|fillcros.eps\|.*\r

and replacing with nothing, will delete these lines and not leave any
blank lines.
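
A rough Python equivalent of this one-step deletion, for checking the idea outside TextWrangler. The sample text is invented, and \n stands in for TextWrangler's \r because that is the line break Python strings normally use.

```python
import re

# Hypothetical three-line sample; only the first line should be deleted.
text = (
    "%SSiPrshMark: 1 2 |fillcros.eps| 0\n"
    "keep this line\n"
    "%SSiPrshMark: 3 4 |other.eps| 0\n"
)

# The second pattern from the post, with \n in place of \r.
pattern = re.compile(r"^%SSiPrshMark.*\|fillcros.eps\|.*\n", re.MULTILINE)

# Replacing with nothing removes the line and its line break together,
# so no blank line is left behind.
cleaned = pattern.sub("", text)
print(cleaned)
```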

Thanks again to Patrick, Tom, and Jan.

Al

sequoyah

Feb 11, 2009, 2:26:44 PM

to TextWrangler Talk

New question.

The sample line:

%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 432.00000 1
36.00000 0

has the general format:

%SSiPressSheet: <Width> <Height> <PunchX> <PunchY> <Style> <GuideDist>
<Flags> <CtrMarkLen>

I need a Grep pattern to change <GuideDist> parameter in all lines of
this general format to a value of 4, so that the sample line would
become:

%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 432.00000 4
36.00000 0

Note that the number of digits for some parameters is variable, so
that the first one in the sample line could be 2.50000 instead
of 2520.00000.

Can someone help with this, or suggest a more appropriate forum for
this type of question?

Thanks in advance,

Al

Christopher Bort

Feb 11, 2009, 3:16:42 PM

to textwr...@googlegroups.com

On 02/11/09 11:26, sequoyah...@sbcglobal.net (sequoyah) wrote:

> New question.
>
> The sample line:
>
> %SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 432.00000 1
> 36.00000 0
>
> has the general format:
>
> %SSiPressSheet: <Width> <Height> <PunchX> <PunchY> <Style> <GuideDist>
> <Flags> <CtrMarkLen>
>
> I need a Grep pattern to change <GuideDist> parameter in all lines of
> this general format to a value of 4, so that the sample line would
> become:
>
> %SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 432.00000 4
> 36.00000 0
>
> Note that the number of digits for some parameters is variable, so
> that the first one in the sample line could be 2.50000 instead
> of 2520.00000.

Assuming the fields are delimited by single spaces use

match string:
^((?:[^ ]+ ){7})\d+

replacement string:
\14

--
Christopher Bort
<top...@thehundredacre.net>
<http://www.thehundredacre.net/>

sequoyah

Feb 11, 2009, 3:56:22 PM

to TextWrangler Talk

Thank you for the response Christopher.

My test file is known to contain 5 lines with that general format but
with different values for the <GuideDist> field.

Using ^((?:[^ ]+ ){7})\d+ finds 68 instances because it is too general
(omits %SSiPressSheet).

Using ^%SSiPressSheet((?:[^ ]+ ){7})\d+ finds the correct number.

Now for the replacement,

\14 replaces the line with " 36.00000 0", which is not at all what I
wanted. I just need to change the value of the <GuideDist> field (the
8th field) to a value of 1. So the line needs to be replaced with
itself, but with a value of 1 for that field.

But thanks for trying.

Al

On Feb 11, 12:16 pm, Christopher Bort <top...@thehundredacre.net>
wrote:

Christopher Bort

Feb 11, 2009, 4:54:11 PM

to textwr...@googlegroups.com

On 02/11/09 12:56, sequoyah...@sbcglobal.net (sequoyah) wrote:

>Thank you for the response Christopher.
>
>My test file is known to contain 5 lines with that general format but
>with different values for the <GuideDist> field.
>
>Using ^((?:[^ ]+ ){7})\d+ finds 68 instances because it is too general
>(omits %SSiPressSheet).

Really? It matches the correct string (%SSiPressSheet:
2520.00000 1656.00000 0.00000 0.00000 0 432.00000 ) for me in
your example. The match becomes the subpattern \01 in the
replacement string. The \d+ is whatever number is in the eighth
field, and it should be replaced with a literal 4. Hence, the
replacement string of \014 (see below).

>Using ^%SSiPressSheet((?:[^ ]+ ){7})\d+ finds the correct number.
>
>Now for the replacement,
>
>\14 replaces the line with " 36.00000 0", which is not at all what I
>wanted. I just need to change the value of the <GuideDist> field (the
>8th field) to a value of 1. So the line needs to be replaced with
>itself, but with a value of 1 for that field.

Sorry, my error. The string should be \014
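
The \14 versus \014 confusion comes from a digit directly following a backreference. The exact rules differ between regex engines; as a comparison only, here is a minimal Python sketch, where the unambiguous spelling is \g<1>4. It assumes the sample line is a single line (it is wrapped in the posts above).

```python
import re

line = ("%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 "
        "432.00000 1 36.00000 0")

# Group 1 captures the first seven space-terminated fields (label
# included); \d+ then matches the number that follows them.
pattern = re.compile(r"^((?:[^ ]+ ){7})\d+")

# \g<1>4 means "group 1, then a literal 4". A bare \14 would be read
# by Python as a reference to group 14.
result = pattern.sub(r"\g<1>4", line)
print(result)
```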

sequoyah

Feb 11, 2009, 5:31:27 PM

to TextWrangler Talk

OK Christopher,

Searching with ^%SSiPressSheet((?:[^ ]+ ){7})\d+

and replacing with

\014 almost works, but it leaves out "%SSiPressSheet" in the replaced
line.

replacing with

%SSiPressSheet\014

does the job I need so that all lines such as

%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 432.00000 1
36.00000 0

%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 432.00000 2
36.00000 0
%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 432.00000 3
36.00000 0

are replaced by

%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 432.00000 4
36.00000 0

And I noticed now that I have been making a mistake in the earlier
posts referring to the <GuideDist>, when it should be the <Flags>
field. I am a newbie at Grep, so I don't know why my modifications to
your search/replace terms work. If you would like to take a look, I
can email you the file I am working on.

Thanks for the help.

Al

On Feb 11, 1:54 pm, Christopher Bort <top...@thehundredacre.net>
wrote:

sequoyah

Feb 11, 2009, 6:37:03 PM

to TextWrangler Talk

New question about deleting lines again.

The sample line:

%SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 1 0.00000 ||
0.00000 0 100 100 100 100 7 1 1 1 0 0.00000 0.00000 0 0

has the general format:

%SSiPrshMark: <PointRX> <PointRY> <ExtentRDX> <ExtentRDY> <Type>
<Size>
<Name> <Rotation> <ColorType> <Color1> <Color2> <Color3> <Color4>
<DupFlags> <SigMod> <SigStart> <Delivery> <IParam2> <RParam1>
<RParam2>

I need a Grep pattern to find lines of this general form having a
value of 1 for the <Type> field (the 5th field after "%SSiPrshMark:")
so that I can use it to delete ONLY lines that have the value of 1 for
that field.

Can someone help me with this?

Thanks in advance,

Al

Christopher Bort

Feb 12, 2009, 11:49:45 AM

to textwr...@googlegroups.com

On 02/11/09 14:31, sequoyah...@sbcglobal.net (sequoyah) wrote:

>OK Christopher,
>
>Searching with ^%SSiPressSheet((?:[^ ]+ ){7})\d+
>
>and replacing with
>
>\014 almost works, but it leaves out "%SSiPressSheet" in the replaced
>line.

That's because you're not capturing it with your match
expression. I.e., it's not included inside the parentheses.

Try using my original match expression with the corrected
replacement string:

^((?:[^ ]+ ){7})\d+

and

\014

This works for me with your example line

%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0
432.00000 1 36.00000 0

If you want to change some field other than the <GuideDist> that
you originally specified, just change the quantifier (the {7}) and
the replacement string accordingly.
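
To illustrate changing the quantifier, here is a small Python sketch. The helper name set_field and its parameters are hypothetical, not part of TextWrangler. It also uses [^ ]+ instead of \d+ for the target field, so that a whole decimal value is replaced; that is a variation on the pattern above, not the pattern itself.

```python
import re


def set_field(line: str, fields_before: int, new_value: str) -> str:
    """Replace the space-delimited field that follows `fields_before` fields.

    The leading label (for example "%SSiPressSheet:") counts as a field.
    """
    pattern = re.compile(r"^((?:[^ ]+ ){%d})[^ ]+" % fields_before)
    # A function replacement avoids any backreference/digit ambiguity.
    return pattern.sub(lambda m: m.group(1) + new_value, line)


line = ("%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 "
        "432.00000 1 36.00000 0")

flags_changed = set_field(line, 7, "4")          # same as {7} above
guide_changed = set_field(line, 6, "100.00000")  # one field earlier
print(flags_changed)
print(guide_changed)
```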

>And I noticed now that I have been making a mistake in the earlier
>posts referring to the <GuideDist>, when it should be the <Flags>
>field. I am a newbie at Grep, so I don't know why my modifications to
>your search/replace terms work. If you would like to take a look, I
>can email you the file I am working on.

Sorry, I don't have that kind of time at the moment.

sequoyah

Feb 12, 2009, 1:03:11 PM

to TextWrangler Talk

Hi Christopher,

You're right. Searching with
^(%SSiPressSheet(?:[^ ]+ ){7})\d+

and replacing with

\014

does change all instances of the sample line with any value in the 7th
field to instances with a 7th field value of 4.

I appreciate the time taken for this.

Could you in the near future give me a pattern to search for lines of
this form:

%SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 1 0.00000 ||
0.00000 0 100 100 100 100 7 1 1 1 0 0.00000 0.00000 0 0

but ONLY if the 5th field is 1? I need to delete ONLY those from the
file. That will give me all I need to complete the project.

Thank you very much,

Al

On Feb 12, 8:49 am, Christopher Bort <top...@thehundredacre.net>
wrote:

Tom Robinson

Feb 12, 2009, 2:59:08 PM

to textwr...@googlegroups.com

On 2009-02-13, at 07:03, sequoyah wrote:

> Could you in the near future give me a pattern to search for lines of
> this form:
>
> %SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 1 0.00000 ||
> 0.00000 0 100 100 100 100 7 1 1 1 0 0.00000 0.00000 0 0
>
> but ONLY if the 5th field is 1? I need to delete ONLY those from the
> file. That will give me all I need to complete the project.

See how far you get with these clues ;-)

- Use a ^ to anchor your search to the start of a line.
- You can search for %SSiPrshMark: as is
- [.\d]+ will search for a group of digits (including decimal point):
The square brackets search for any character inside the brackets, \d
is shorthand for 0123456789, and + searches for whatever is inside the
brackets 1 or more times. i.e. This pattern will match one of your
fields above.
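
The third clue can be tried on its own. A minimal Python sketch, assuming Python's re handles this character class the same way; the sample is a shortened copy of the line quoted above.

```python
import re

# [.\d]+ : one or more characters, each of which is a dot or a digit.
FIELD = re.compile(r"[.\d]+")

sample = "%SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 1 0.00000"

# The label contains no dots or digits, so only the number fields match.
fields = FIELD.findall(sample)
print(fields)
```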

I suggest starting with a find command and selecting Use Grep.

Write back if you get stuck.

sequoyah

Feb 12, 2009, 3:19:31 PM

to TextWrangler Talk

Thanks for responding Tom,

Well I don't get very far, because you gave me no clue for how to be
selective about the value in the 5th field. All I can do is

^%SSiPrshMark:[.\d]+

which results in

The Pattern ^%SSiPrshMark:[.\d]+ was not found.

But there are tons of those lines in the file, only some of which have
a 1 in the 5th field. which are the ones I want to delete.

More help please.

Al

On Feb 12, 11:59 am, Tom Robinson <barefootg...@tomrobinson.co.nz>
wrote:

J Walton

Feb 12, 2009, 3:56:14 PM

to textwr...@googlegroups.com

On Thu, Feb 12, 2009 at 12:19 PM, sequoyah
<sequoyah...@sbcglobal.net> wrote:

> More help please.

Al,

I think you will get much more from this process if you actually learn
how to use regular expressions, rather than simply finish this one
task. That's more what this list is about, and I know I have benefited
from others taking their time to help me with my questions. But
before I send an email to the list I make sure I have exhausted
*every* other resource - web pages, TextWrangler's documentation.

I can't contribute much to this list - there's too many people that
know way more than I do about this - but on other lists where I'm the
expert (printing, retouching, color, 3d graphics, etc.) I am always up
to help others as long as I'm not doing all of their work for them.
Even then I'll do their work for them if they've tried everything they
can possibly do already.

Eventually if you go to the well too much it runs dry.

So, does anybody have some good GREP learning resources? I have a good
site bookmarked at home, but I'm at work right now. It doesn't take
long to figure out what the commands do, and you can always use the
search dialog to test your expressions as you go.

I hope I'm not being too negative here, it just seems like we're all
painting Tom Sawyer's fence.

J

Tom Robinson

Feb 12, 2009, 4:49:51 PM

to textwr...@googlegroups.com

On 2009-02-13, at 09:19, sequoyah wrote:

> Well I don't get very far, because you gave me no clue for how to be
> selective about the value in the 5th field. All I can do is
>
> ^%SSiPrshMark:[.\d]+

Did you try searching for ^%SSiPrshMark: by itself and seeing what
TextWrangler found?

Did you try searching for [.\d]+ and seeing what TextWrangler found?

Then think how you might search for 2 fields of numbers at the same
time and we'll go from there.

Similar to what 'J' wrote: I teach people to fish, not go down to the
shop, buy the fish, take it home, and cook it for them ;-)

sequoyah

Feb 12, 2009, 5:43:27 PM

to TextWrangler Talk

OK Tom,

Here's where I am at:

^%SSiPrshMark:.[.\d]+.[.\d]+.[.\d]+.[.\d]+.+?$\r

is finding almost all of them, but not if the first number field has a
negative (or positive) sign. Oddly, other fields with signs are OK. Why
is the sign a problem only for the first numerical field, and not
for the others?

And most important, I have not succeeded in finding ONLY lines with a
1 in the 5th field.

So how do I fix my fishing gear?

Al

On Feb 12, 1:49 pm, Tom Robinson <barefootg...@tomrobinson.co.nz>
wrote:

sequoyah

Feb 12, 2009, 6:12:50 PM

to TextWrangler Talk

^%SSiPrshMark:.+[.\d]+.[.\d]+.[.\d]+.[.\d]+.[1]+.+?$\r

Adding a + sign in front of the first [.\d]+ seems to take care of the
sign problem (why?), but having a .[1]+ in the 5th position does not
select ONLY lines with a 1 in that position. Why not?

Al

Tom Robinson

Feb 12, 2009, 6:20:36 PM

to textwr...@googlegroups.com

> Here's where I am at:
>
> ^%SSiPrshMark:.[.\d]+.[.\d]+.[.\d]+.[.\d]+.+?$\r

Nice work :-)

> is finding almost all of them, but not if the first number field has a
> negative (or positive) sign. Oddly, other fields with signs are OK. Why
> is the sign a problem only for the first numerical field, and not
> for the others?

A dot searches for any character and should only be used when you want
a wildcard. You're searching for spaces so put them in the regular
expression. Wildcards can cause too much data to be found.

Remember [.\d] is a shortcut for [.0123456789] so you can add more
characters inside the brackets to include them in the search.

> And most important, I have not succeeded in finding ONLY lines with a
> 1 in the 5th field.

You're searching for a 1 so put that into the expression.

The remaining part of the line can be included with something like .*\r
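
A small Python sketch of the difference between dot delimiters and literal spaces. The two sample lines are shortened, invented variants of the %SSiPrshMark line, and Python's re is assumed to behave like TextWrangler's grep for these patterns. It suggests why a sign in a later field seemed to be fine: with dots as delimiters, all four number groups can be satisfied inside the first number alone.

```python
import re

signed_first = "%SSiPrshMark: -12.50000 1669.50000 9.00000 9.00000 1"
signed_second = "%SSiPrshMark: 1255.50000 -1669.50000 9.00000 9.00000 1"

# Dots used as delimiters, as in the attempt quoted above.
dotted = re.compile(r"^%SSiPrshMark:.[.\d]+.[.\d]+.[.\d]+.[.\d]+")

# Literal spaces as delimiters, with a hyphen added to each class
# (placed first so it is read as a literal character, not a range).
spaced = re.compile(r"^%SSiPrshMark: [-.\d]+ [-.\d]+ [-.\d]+ [-.\d]+")

miss = dotted.search(signed_first)    # None: the class cannot match "-"
loose = dotted.search(signed_second)  # matches, but only inside field 1
exact = spaced.search(signed_second)  # matches all four fields

print(miss)
print(loose.group(0))
print(exact.group(0))
```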

Christopher Bort

Feb 12, 2009, 6:47:07 PM

to textwr...@googlegroups.com

On 02/12/09 15:12, sequoyah...@sbcglobal.net (sequoyah) wrote:

>^%SSiPrshMark:.+[.\d]+.[.\d]+.[.\d]+.[.\d]+.[1]+.+?$\r
>
>Adding a + sign in front of the first [.\d]+ seems to take care of the
>sign problem (why?)

That's not what you've done. The + character has special meaning
in regular expressions, as does the dot (.), so immediately
before your first [.\d]+, you've got '.+', which matches one or
more of any character. To match a literal +, you need to escape
it as \+. Also, I think you really want the dot to match a
space, so why not use either a literal space or \s?

>but having a .[1]+ in the 5th position does not select ONLY lines
>with a 1 in that position. Why not?

.[1]+ matches any character followed by one or more 1. For
instance, it would match X111111111111. I don't believe this is
what you're looking for. Also [1] is a character class that
contains only one character. It is equivalent to a literal 1. No
need to use a class.
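
These points are easy to verify with a short Python sketch (Python's re is assumed to agree with TextWrangler's grep on them; the test strings are invented apart from the X111111111111 example above).

```python
import re

# .[1]+ is "any one character, then one or more 1s"; [1] is the same as 1.
with_class = re.fullmatch(r".[1]+", "X111111111111")
without_class = re.fullmatch(r".1+", "X111111111111")

# Unescaped, + is a quantifier; escaped as \+ it matches a literal plus.
literal_plus = re.search(r"\+[.\d]+", "offset +12.5 units")

print(with_class is not None, without_class is not None)
print(literal_plus.group(0))
```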

I think that you could really benefit from a careful reading of
the chapter on searching with GREP in the TW user manual. The
above demonstrates fairly clearly that you are lacking an
understanding of the basic syntax, which is described rather
well in the section on writing search patterns. I understand
that you may be under time pressure to get your current task
done, but investing the time in a thorough reading of the docs
will save you much time in the longer term.

sequoyah

Feb 13, 2009, 12:35:51 AM

to TextWrangler Talk

The criticisms are well taken. I have been doing some homework. Here
is where I am at now:

^%SSiPrshMark: +[.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r

and this almost does it. But the files I am needing to process contain
instances of these lines that I need to delete in which the first
field sometimes has a negative sign in front, sometimes not. So it
seems to me that the first field needs to include some "OR" logic such
as .\d OR -.\d.

I could not find anything about this in the TextWrangler Help or in
the Manual's index. So by trial and error I stumbled on these:

^%SSiPrshMark: +[.-\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r
^%SSiPrshMark: +[-.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r

both of which to my surprise find both the ones with the negative
signs as well as the ones with no sign.

So using either of those search patterns to search, and replacing with
nothing does the job.

These files are templates used with a prepress software called Preps,
and the lines I am deleting are a particular type of mark. The
template editor for this software has a graphical user interface in
which I can see that TextWrangler indeed removed those marks from the
template.

Why use TextWrangler for this when there is a Preps template editor?
Because with this method I can make the change to entire folders of
these files. With the Preps editor I would need to remove these marks
one by one, processing each template file one at a time.

So thank you all for the guidance given. But I would appreciate an
explanation of why my first +[.-\d] and +[-.\d] terms find both
occurrences with and without negative signs.

Regards,

Al

On Feb 12, 3:47 pm, Christopher Bort <top...@thehundredacre.net>
wrote:

Patrick Woolsey

Feb 13, 2009, 8:33:32 AM

to textwr...@googlegroups.com

sequoyah <sequoyah...@sbcglobal.net> sez:

>The criticisms are well taken. I have been doing some homework. Here
>is where I am at now:
>
>^%SSiPrshMark: +[.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r
>
>and this almost does it. But the files I am needing to process contain
>instances of these lines that I need to delete in which the first
>field sometimes has a negative sign in front, sometimes not. So it
>seems to me that the first field needs to include some "OR" logic such
>as .\d OR -.\d.

The method for doing this is covered by the section "Using Alternation" in
Ch. 8 of the PDF manual.

sequoyah

Feb 13, 2009, 11:01:22 AM

to TextWrangler Talk

Hi Patrick,

Thanks for the tip. Based on my reading of that section of the manual
I come up with

^%SSiPrshMark: +[-.\d|.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r

which indeed works for negative as well as unsigned instances of the
first term.

But why did these others

^%SSiPrshMark: +[.-\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r
^%SSiPrshMark: +[-.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r

work for that as well?

Thanks,

Al

On Feb 13, 5:33 am, Patrick Woolsey <pwool...@barebones.com> wrote:

Christopher Bort

Feb 13, 2009, 6:48:02 PM

to textwr...@googlegroups.com

It's a slow afternoon at $DAYJOB, so away we go...

On 02/13/09 08:01, sequoyah...@sbcglobal.net (sequoyah) wrote:

>Hi Patrick,
>
>Thanks for the tip. Based on my reading of that section of the manual
>I come up with
>
>^%SSiPrshMark: +[-.\d|.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r

That's getting to be quite a Frankenstein's monster. Let's look
at it one part at a time:

^ Anchors your expression to the beginning of a line.

%SSiPrshMark: Matches the literal string '%SSiPrshMark:'

<space>+ Matches one or more space characters. I think this is not what
    you're trying to do. If I understand correctly, you really want to
    match a single space as the field delimiter following
    '%SSiPrshMark:'. If I recall, you added the + in an attempt to
    handle the number in the first field being signed. It doesn't do
    that, so it should be removed.

[-.\d|.\d]+ This is your attempt to use alternation to account for a
    possible negative sign (-), as Patrick suggested. Alternation uses
    parentheses to enclose the alternate strings, not square brackets.
    Square brackets define character classes, so [-.\d|.\d] is read as
    a single class (hyphen, dot, digit, or a literal |) rather than as
    two alternatives. It only appears to work because the hyphen is in
    the class. I think what you really want here is

    -?[.\d]+

    which matches zero or one hyphen (-) followed by one or more digit
    or dot characters. That is, it will match a decimal that may or may
    not have a negative sign.

[.\d]+.[.\d]+.[.\d]+<space>
    Each [.\d]+ matches one or more digit or dot characters. It will
    match a decimal number. (It will also match any string of numbers
    with more than one dot, like 12.34.45.78, but I take it that
    shouldn't be a problem for your current task.)

    Each dot (.) in between matches one of any character. I think that
    you're really trying to match field delimiting spaces, so you should
    replace them with literal spaces, as you've done with the last
    <space>.

    Since this pattern repeats four times (including the one that
    precedes these three), you can compact your expression by using a
    quantifier. That is, replace

    '-?[.\d]+ [.\d]+ [.\d]+ [.\d]+ '

    with

    -?(?:[.\d]+ ){4}

    The (?: and ) group the enclosed expression without capturing
    matches to a subpattern for replacement. The {4} makes it match
    exactly four repetitions of the preceding grouping. If any of the
    fields can be negative decimals, rather than just the first one,
    move the -? inside the grouping:

    (?:-?[.\d]+ ){4}

1+<space> Matches one or more 1 followed by a space. It will match not
    only '1 ' but also '1111111111111111111 ' which is, I think, not
    what you want. If you want to match only a single 1, remove the +.

.+? This is valid, but probably not what you intended. A ? placed after
    a quantifier such as + makes it non-greedy, so .+? matches one or
    more of any character, but as few as possible. You're apparently
    trying to match everything to the end of the line here, so I'd
    replace this with [^\r]* which will match zero or more of anything
    other than <return>.

$\r This is, at best, redundant. The $ anchors the expression to the end
    of a line, but then your match doesn't include the <return> that
    you want to delete along with the line, so you've added a \r to
    match that. However, \r implies an end of line, so the $ is
    superfluous. It is not a syntax error, since $ may appear anywhere
    in a pattern, but it adds nothing here. Remove it and simply use \r.

With the above in mind, the following works with the example
line you gave previously:

^%SSiPrshMark: (?:-?[.\d]+ ){4}1 [^\r]*\r

Testing it here with TextWrangler, it matches your example line of:

%SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 1 0.00000 ||
0.00000 0 100 100 100 100 7 1 1 1 0 0.00000 0.00000 0 0

It also matches variations of the example where one or more of
the first four number fields are signed, and it only matches
variations where the fifth field is 1.

Season to taste and enjoy.
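
For checking the final pattern outside TextWrangler, here is a minimal Python sketch. The four sample lines are shortened, invented variants of the example line, and \n replaces TextWrangler's \r because that is the line break Python strings normally use.

```python
import re

# The final pattern from above, with \n in place of \r.
PATTERN = re.compile(
    r"^%SSiPrshMark: (?:-?[.\d]+ ){4}1 [^\n]*\n", re.MULTILINE
)

text = (
    "%SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 1 0.00000 || 0\n"
    "%SSiPrshMark: -12.50000 1669.50000 9.00000 9.00000 1 0.00000 || 0\n"
    "%SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 2 0.00000 || 0\n"
    "%SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 11 0.00000 || 0\n"
)

# Lines whose fifth field is exactly 1 are deleted, signed or not;
# lines with 2 or 11 in the fifth field are kept.
cleaned = PATTERN.sub("", text)
print(cleaned)
```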

Christopher Bort

Feb 13, 2009, 7:10:43 PM

to textwr...@googlegroups.com

Doh! Let's try this again with more rational line wraps. Sorry
for the noise. 8-(


sequoyah

Feb 13, 2009, 7:42:50 PM

to TextWrangler Talk

WOW! Thank you very very very much Christopher. Excellent dissection
of my Frankenstein's monster and well thought out explanation.

One of the problems for a beginner in reg ex like me is that after
digging through the Help and the Manual one comes up with stuff that
works and there's no reason to think there's anything wrong with it.
There's more than one incident of that in this thread. That's why I
was asking why my two earlier attempts at dealing with the signed
values had worked:

^%SSiPrshMark: +[.-\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r
^%SSiPrshMark: +[-.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r

As I said earlier, these didn't just work in TextWrangler, but the
saved result presented no problem for the Preps application to open
and use the file. So I get reinforcement that I got it right.

I will study your post at length. Thank you very much for taking the
time.

Al

On Feb 13, 4:10 pm, Christopher Bort <top...@thehundredacre.net>
wrote:

> Doh! Let's try this again with more rational line wraps. Sorry
> for the noise. 8-(
>
> It's a slow afternoon at $DAYJOB, so away we go...
>

Christopher Bort

Feb 13, 2009, 9:17:22 PM

to textwr...@googlegroups.com

On 02/13/09 16:42, sequoyah...@sbcglobal.net (sequoyah) wrote:

>WOW! Thank you very very very much Christopher. Excellent dissection
>of my Frankenstein's monster and well thought out explanation.

Sometimes, when leading a horse to the edge of the stream
doesn't work, one has to toss him in to get him to
see where the water is... ;-)

>One of the problems for a beginner in reg ex like me is that after
>digging through the Help and the Manual one comes up with stuff that
>works and there's no reason to think there's anything wrong with it.
>There's more than one incident of that in this thread.

Regex syntax can be somewhat opaque and the initial learning
curve is rather steep. The best way to learn it is to write as
many expressions as you can and see what works and what doesn't,
with a good reference like the TW manual handy. Once you get
past that initial curve, though, it flattens out and you'll
wonder how you were ever able to stumble through the day without
regular expressions.

>That's why I was asking why my two earlier attempts at dealing with
>the signed values had worked:
>
>^%SSiPrshMark: +[.-\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r
>^%SSiPrshMark: +[-.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r
>
>As I said earlier, these didn't just work in TextWrangler, but the
>saved result presented no problem for the Preps application to open
>and use the file. So I get reinforcement that I got it right.
>
>I will study your post at length. Thank you very much for taking
>the time.

De nada. It was an itch that I had to scratch. 8^)
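
On the question of why a term such as [-.\d]+ matched both signed and unsigned values: a hyphen placed first in a character class is taken literally, so the class simply means "hyphen, dot, or digit". The Python sketch below illustrates that, under the assumption that Python's re agrees with TextWrangler's grep on this point; the test strings are invented. It also shows why -?[.\d]+ is the tighter choice.

```python
import re

# Hyphen first in the class: read literally, so the class is
# "hyphen, dot, or digit", which covers signed and unsigned numbers.
loose = re.compile(r"[-.\d]+")
signed = loose.fullmatch("-1255.50000")
unsigned = loose.fullmatch("1255.50000")

# The class also accepts hyphens anywhere, which is looser than intended.
odd = loose.fullmatch("1-2-3")

# -?[.\d]+ allows at most one hyphen, and only at the start.
strict = re.compile(r"-?[.\d]+")
rejected = strict.fullmatch("1-2-3")

print(signed is not None, unsigned is not None)
print(odd is not None, rejected is None)
```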
