[...]
Although you can do this with a search, using:
Text -> Process Lines Containing
will probably be easier.
(You can also employ a grep pattern with this command; whether you need
to depends on what you want to do.)
Regards,
Patrick Woolsey
==
Bare Bones Software, Inc. <http://www.barebones.com>
P.O. Box 1048, Bedford, MA 01730-1048
The regular expression you're after is:
^%SSiPrshMark.*\|fillcros.eps\|.*
The caret anchors the pattern to the start of the line, .* matches any
run of characters (including none), and the backslash escapes each
vertical bar so it is taken literally rather than as the alternation
operator.
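
For anyone who wants to try the pattern outside TextWrangler, here is a minimal Python sketch. It assumes Python's re module treats this pattern the same way TextWrangler's grep does, and the three sample lines are made up for illustration.

```python
import re

# Pattern from the message above. The dot in "fillcros.eps" is left
# unescaped as in the original, so it matches any character there.
pattern = re.compile(r"^%SSiPrshMark.*\|fillcros.eps\|.*")

# Hypothetical sample lines, invented for illustration.
lines = [
    "%SSiPrshMark: 10.0 20.0 |fillcros.eps| 0",
    "%SSiPrshMark: 10.0 20.0 |other.eps| 0",
    "%SSiPressSheet: 1 2 |fillcros.eps| 3",
]

# Keep only lines that start with the marker AND contain the string.
matching = [line for line in lines if pattern.match(line)]
print(matching)
```

Only the first line satisfies both criteria, which is the "AND" behaviour asked for.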
> Also can Text -> Process Lines Containing be used on a folder of
> files?
No, but you could AppleScript TextWrangler to process a group of files.
Cheers
[...]
>I can get that to work for one string of text, but how do I make it
>work for lines containing two or more text strings criteria?
>
>Example: lines that start with "%SSiPrshMark" AND also contain a
>string such as "|fillcros.eps|".
To do this, you will want to use a grep pattern; for example:
^%SSiPrshMark.+?\|fillcros.eps\|.+?$
(Note in particular that I've used ^ and $ to anchor the match to the line
start and end respectively, and \ to escape the vertical bars | which occur
in the second string.)
>Also can Text -> Process Lines Containing be used on a folder of
>files?
Not directly, though you can apply it to multiple files via an AppleScript,
or by running a text factory created in BBEdit.
> So I still need help with that.
\r\r
would match a line break followed by an empty line. I didn't read the
start of the thread, so I'm not able to put this into context.
jem
> Thanks for the response, but that puts in twice as many blank lines.
I meant that by searching for
\r\r+
and replacing with
\r
you would delete empty lines
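
A quick way to convince yourself of this is a small Python sketch. One assumption to note: Python strings normally use "\n" for a line break where TextWrangler's grep uses "\r", so the pattern is adapted accordingly. The sample text is invented.

```python
import re

# Invented sample: two blank lines after "first", one after "second".
text = "first\n\n\nsecond\n\nthird\n"

# A run of two or more line breaks is replaced with a single one,
# which removes the empty lines in between.
collapsed = re.sub(r"\n\n+", "\n", text)
print(repr(collapsed))
```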
> New question.
>
> The sample line:
>
> %SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 432.00000 1
> 36.00000 0
>
> has the general format:
>
> %SSiPressSheet: <Width> <Height> <PunchX> <PunchY> <Style> <GuideDist>
> <Flags> <CtrMarkLen>
>
> I need a Grep pattern to change <GuideDist> parameter in all lines of
> this general format to a value of 4, so that the sample line would
> become:
>
> %SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 432.00000 4
> 36.00000 0
>
> Note that the number of digits for some parameters is variable, so
> that the first one in the sample line could be 2.50000 instead of
> 2520.00000.
Assuming the fields are delimited by single spaces use
match string:
^((?:[^ ]+ ){7})\d+
replacement string:
\14
--
Christopher Bort
<top...@thehundredacre.net>
<http://www.thehundredacre.net/>
>OK Christopher,
>
>Searching with ^%SSiPressSheet((?:[^ ]+ ){7})\d+
>
>and replacing with
>
>\014 almost works, but it leaves out "%SSiPressSheet" in the replaced
>line.
That's because you're not capturing it with your match
expression. I.e., it's not included inside the parentheses.
Try using my original match expression with the corrected
replacement string:
^((?:[^ ]+ ){7})\d+
and
\014
This works for me with your example line
%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0
432.00000 1 36.00000 0
If you want to change some field other than the <GuideDist> that
you originally specified, just change the quantifier (the {7}) and
the replacement string accordingly.
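
As a rough cross-check outside TextWrangler, the same substitution can be sketched in Python. This assumes the sample line is a single line (it is wrapped in the mail), and it uses Python's \g<1> group syntax in place of TextWrangler's \01, since Python would read a bare \14 as "group 14".

```python
import re

# The sample line from the thread, joined back into one line.
line = ("%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 "
        "432.00000 1 36.00000 0")

# Group 1 captures the first seven space-terminated tokens (the label
# plus six fields); \d+ then matches the digits of the next field.
pattern = re.compile(r"^((?:[^ ]+ ){7})\d+", re.MULTILINE)

# \g<1> means "group 1" unambiguously, followed by a literal 4.
result = pattern.sub(r"\g<1>4", line)
print(result)
```

Changing the {7} moves the edit to a different field, as described above.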
>And I noticed now that I have been making a mistake in the earlier
>posts referring to the <GuideDist>, when it should be the <Flags>
>field. I am a newbie at Grep, so I don't know why my modifications to
>your search/replace terms work. If you would like to take a look, I
>can email you the file I am working on.
Sorry, I don't have that kind of time at the moment.
> Could you in the near future give me a pattern to search for lines of
> this form:
>
> %SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 1 0.00000 ||
> 0.00000 0 100 100 100 100 7 1 1 1 0 0.00000 0.00000 0 0
>
> but ONLY if the 5th field is 1? I need to delete ONLY those from the
> file. That will give me all I need to complete the project.
See how far you get with these clues ;-)
- Use a ^ to anchor your search to the start of a line.
- You can search for %SSiPrshMark: as is
- [.\d]+ will search for a group of digits (including decimal point):
The square brackets search for any character inside the brackets, \d
is shorthand for 0123456789, and + searches for whatever is inside the
brackets 1 or more times. i.e. This pattern will match one of your
fields above.
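
To see what [.\d]+ picks up on its own, here is a small Python sketch. It assumes Python's re module behaves like TextWrangler's grep for this character class; the input is the start of the sample line with the label removed.

```python
import re

# First six fields of the sample line, without the leading label.
fields = "1255.50000 1669.50000 9.00000 9.00000 1 0.00000"

# Each match is one run of digits and dots, i.e. one field.
numbers = re.findall(r"[.\d]+", fields)
print(numbers)
```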
I suggest starting with a find command and selecting Use Grep.
Write back if you get stuck.
> More help please.
Al,
I think you will get much more from this process if you actually learn
how to use regular expressions, rather than simply finish this one
task. That's more what this list is about, and I know I have benefited
from others taking their time to help me with my questions. But
before I send an email to the list I make sure I have exhausted
*every* other resource - web pages, TextWrangler's documentation.
I can't contribute much to this list - there's too many people that
know way more than I do about this - but on other lists where I'm the
expert (printing, retouching, color, 3d graphics, etc.) I am always up
to help others as long as I'm not doing all of their work for them.
Even then I'll do their work for them if they've tried everything they
can possibly do already.
Eventually if you go to the well too much it runs dry.
So, does anybody have some good GREP learning resources? I have a good
site bookmarked at home, but I'm at work right now. It doesn't take
long to figure out what the commands do, and you can always use the
search dialog to test your expressions as you go.
I hope I'm not being too negative here, it just seems like we're all
painting Tom Sawyer's fence.
J
> Well I don't get very far, because you gave me no clue for how to be
> selective about the value in the 5th field. All I can do is
>
> ^%SSiPrshMark:[.\d]+
Did you try searching for ^%SSiPrshMark: by itself and seeing what
TextWrangler found?
Did you try searching for [.\d]+ and seeing what TextWrangler found?
Then think how you might search for 2 fields of numbers at the same
time and we'll go from there.
Similar to what 'J' wrote: I teach people to fish, not go down to the
shop, buy the fish, take it home, and cook it for them ;-)
Nice work :-)
> is finding almost all of them, but not if the first number field has a
> negative (or positive) sign. Oddly, other fields with signs are OK. Why
> is the sign a problem for the first numerical field only, and not
> for the others?
A dot searches for any character and should only be used when you want
a wildcard. You're searching for spaces so put them in the regular
expression. Wildcards can cause too much data to be found.
Remember [.\d] is a shortcut for [.0123456789] so you can add more
characters inside the brackets to include them in the search.
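
A minimal Python sketch of that idea, assuming Python's re module handles the class the same way as TextWrangler's grep. The signed values are invented. A hyphen placed first inside the brackets is taken literally, and + has no special meaning inside a class.

```python
import re

# The class now accepts a hyphen and a plus sign as well as dots and digits.
signed = re.compile(r"[-+.\d]+")

# Invented sample with a negative, a positive and an unsigned value.
found = signed.findall("-12.50000 +3.00000 9.00000")
print(found)
```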
> And most important, I have not succeeded in finding ONLY lines with a
> 1 in the 5th field.
You're searching for a 1 so put that into the expression.
The remaining part of the line can be included with something like .*\r
>^%SSiPrshMark:.+[.\d]+.[.\d]+.[.\d]+.[.\d]+.[1]+.+?$\r
>
>Adding a + sign in front of the first [.\d]+ seems to take care of the
>sign problem (why?)
That's not what you've done. The + character has special meaning
in regular expressions, as does the dot (.), so immediately
before your first [.\d]+, you've got '.+', which matches one or
more of any character. To match a literal +, you need to escape
it as \+. Also, I think you really want the dot to match a
space, so why not use either a literal space or \s?
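
A small Python sketch shows why the '.+' only appears to handle the sign. It assumes Python's re module matches these patterns the way TextWrangler's grep does; the shortened sample line is invented, and the capturing parentheses are added only to show what '.+' consumed.

```python
import re

# Invented, shortened sample line with a signed first field.
line = "%SSiPrshMark: -1255.50000 1669.50000"

# ".+" is greedy: it swallows the space, the sign and nearly everything
# else, leaving just one character for the trailing [.\d]+.
loose = re.match(r"^%SSiPrshMark:(.+)[.\d]+", line)
print(repr(loose.group(1)))

# An escaped plus matches a literal "+" character and nothing else.
plus = re.match(r"\+[.\d]+", "+9.00000")
no_plus = re.match(r"\+[.\d]+", "9.00000")
print(plus.group(0), no_plus)
```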
>but having a .[1]+ in the 5th position does not select ONLY lines
>with a 1 in that position. Why not?
.[1]+ matches any character followed by one or more 1. For
instance, it would match X111111111111. I don't believe this is
what you're looking for. Also [1] is a character class that
contains only one character. It is equivalent to a literal 1. No
need to use a class.
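
Both points can be checked with a short Python sketch, assuming Python's re module agrees with TextWrangler's grep on these simple patterns. The test strings are made up.

```python
import re

# ".[1]+" accepts any character followed by one or more 1s.
matches_junk = re.fullmatch(r".[1]+", "X111111111111") is not None

# A one-character class and the bare literal find exactly the same things.
class_form = re.findall(r"[1]", "a1b11")
literal_form = re.findall(r"1", "a1b11")

print(matches_junk, class_form == literal_form)
```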
I think that you could really benefit from a careful reading of
the chapter on searching with GREP in the TW user manual. The
above demonstrates fairly clearly that you are lacking an
understanding of the basic syntax, which is described rather
well in the section on writing search patterns. I understand
that you may be under time pressure to get your current task
done, but investing the time in a thorough reading of the docs
will save you much time in the longer term.
>The criticisms are well taken. I have been doing some homework. Here
>is where I am at now:
>
>^%SSiPrshMark: +[.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r
>
>and this almost does it. But the files I am needing to process contain
>instances of these lines that I need to delete in which the first
>field sometimes has a negative sign in front, sometimes not. So it
>seems to me that the first field needs to include some "OR" logic such
>as .\d OR -.\d.
The method for doing this is covered by the section "Using Alternation" in
Ch. 8 of the PDF manual.
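
The manual section is not reproduced here, but as a generic illustration of alternation, here is a minimal Python sketch. It assumes Python's re module follows the same alternation syntax as TextWrangler's grep; the sample values are invented.

```python
import re

# Alternation: parentheses group the alternatives and "|" separates
# them. (?: ... ) groups without capturing.
number = re.compile(r"^(?:-[.\d]+|[.\d]+)$")

# Invented samples: unsigned, signed, and two that should be rejected.
samples = ["1255.50000", "-1255.50000", "--1255.5", "x12"]
accepted = [s for s in samples if number.match(s)]
print(accepted)
```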
On 02/13/09 08:01, sequoyah...@sbcglobal.net (sequoyah) wrote:
>Hi Patrick,
>
>Thanks for the tip. Based on my reading of that section of the manual
>I come up with
>
>^%SSiPrshMark: +[-.\d|.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r
That's getting to be quite a Frankenstein's monster. Let's look
at it one part at a time:
^               Anchors your expression to the beginning of a line.
%SSiPrshMark:   Matches the literal string '%SSiPrshMark:'
<space>+        Matches one or more space characters. I think this is not
                what you're trying to do. If I understand correctly, you
                really want to match a single space as the field delimiter
                following '%SSiPrshMark:'. If I recall, you added the + in
                an attempt to handle the number in the first field being
                signed. It doesn't do that so it should be removed.
[-.\d|.\d]+     This is your attempt to use alternation to account for a
                possible negative sign (-), as Patrick suggested.
                Alternation uses parentheses to enclose the alternate
                strings, not square brackets. Square brackets are used to
                define character classes and [-.\d|.\d] doesn't make any
                sense as a character class. I think what you really want
                here is
                    -?[.\d]+
                which matches zero or one hyphen (-) followed by one or
                more digit or dot characters. That is, it will match a
                decimal that may or may not have a negative sign.
[.\d]+.[.\d]+.[.\d]+<space>
                Each [.\d]+ matches one or more digit or dot characters.
                It will match a decimal number. (It will also match any
                string of numbers with more than one dot, like 12.34.45.78,
                but I take it that shouldn't be a problem for your current
                task.)
                Each dot (.) in between matches one of any character. I
                think that you're really trying to match field delimiting
                spaces, so you should replace them with literal spaces, as
                you've done with the last <space>.
                Since this pattern repeats four times (including the one
                that precedes these three), you can compact your expression
                by using a quantifier. That is, replace
                    '-?[.\d]+ [.\d]+ [.\d]+ [.\d]+ '
                with
                    -?(?:[.\d]+ ){4}
                The (?: and ) group the enclosed expression without
                capturing matches to a subpattern for replacement. The
                {4} makes it match exactly four repetitions of the
                preceding grouping. If any of the fields can be negative
                decimals, rather than just the first one, move the -?
                inside the grouping:
                    (?:-?[.\d]+ ){4}
1+<space>       Matches one or more 1 followed by a space. It will match
                not only '1 ' but also '1111111111111111111 ' which is, I
                think, not what you want. If you want to match only a
                single 1, remove the +.
.+?             This isn't doing anything useful here. A ? placed after a
                quantifier makes it non-greedy, so .+? matches one or more
                of any character, but as few as possible. Because the
                match still has to reach the end of the line, the ? makes
                no difference. You're apparently trying to match
                everything to the end of the line here, so I'd replace
                this with [^\r]* which will match zero or more of
                anything other than <return>.
$\r             This is, at best, redundant. The $ anchors the expression
                to the end of a line, but then your match doesn't include
                the <return> that you want to delete along with the line,
                so you've added a \r to match that. However, \r implies an
                end of line, so the $ is superfluous. It may also be a
                syntax error that it's not at the end of the expression,
                but I'm not certain of that. In any case, remove it and
                simply use \r.
With the above in mind, the following works with the example
line you gave previously:
^%SSiPrshMark: (?:-?[.\d]+ ){4}1 [^\r]*\r
Testing it here with TextWrangler, it matches your example line of:
%SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 1 0.00000 ||
0.00000 0 100 100 100 100 7 1 1 1 0 0.00000 0.00000 0 0
It also matches variations of the example where one or more of
the first four number fields are signed, and it only matches
variations where the fifth field is 1.
Season to taste and enjoy.
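
For anyone who wants to repeat the test outside TextWrangler, here is a minimal Python sketch. Two assumptions: "\n" stands in for TextWrangler's "\r" line break, and the example line is treated as a single line (it is only wrapped in the mail). The signed and non-1 variants are made up from the example.

```python
import re

# Final pattern from above, with "\n" in place of "\r".
pattern = re.compile(r"^%SSiPrshMark: (?:-?[.\d]+ ){4}1 [^\n]*\n",
                     re.MULTILINE)

# Example line from the thread, joined back into one line.
sample = ("%SSiPrshMark: 1255.50000 1669.50000 9.00000 9.00000 1 0.00000 || "
          "0.00000 0 100 100 100 100 7 1 1 1 0 0.00000 0.00000 0 0\n")
# Variation with a signed first field (should also be deleted).
signed = sample.replace("1255.50000", "-1255.50000")
# Variation whose fifth field is 0 (should be kept).
other = sample.replace("9.00000 1 0.00000", "9.00000 0 0.00000")
# An unrelated line (should be kept).
keep = ("%SSiPressSheet: 2520.00000 1656.00000 0.00000 0.00000 0 "
        "432.00000 1 36.00000 0\n")

# Deleting a line means replacing the whole match, line break
# included, with nothing.
cleaned = pattern.sub("", sample + signed + other + keep)
print(cleaned)
```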
>WOW! Thank you very very very much Christopher. Excellent dissection
>of my Frankenstein's monster and well thought out explanation.
Sometimes, when leading a horse to the edge of the stream
doesn't work, one has to toss him in to get him to
see where the water is... ;-)
>One of the problems for a beginner in reg ex like me is that after
>digging through the Help and the Manual one comes up with stuff that
>works and there's no reason to think there's anything wrong with it.
>There's more than one incident of that in this thread.
Regex syntax can be somewhat opaque and the initial learning
curve is rather steep. The best way to learn it is to write as
many expressions as you can and see what works and what doesn't,
with a good reference like the TW manual handy. Once you get
past that initial curve, though, it flattens out and you'll
wonder how you were ever able to stumble through the day without
regular expressions.
>That's why I was asking why my two earlier attempts at dealing with
>the signed values had worked:
>
>^%SSiPrshMark: +[.-\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r
>^%SSiPrshMark: +[-.\d]+.[.\d]+.[.\d]+.[.\d]+ 1+ .+?$\r
>
>As I said earlier, these didn't just work in TextWrangler, but the
>saved result presented no problem for the Preps application to open
>and use the file. So I get reinforcement that I got it right.
>
>I will study your post at length. Thank you very much for taking
>the time.
De nada. It was an itch that I had to scratch. 8^)