Having trouble with this grep - removing after commas

josh

Aug 12, 2010, 5:21:39 PM

to TextWrangler Talk

I have this grep, but I think I made a mistake somewhere:
find: ^(.*),.*$
replace: \1

What I'm trying to do is remove the first comma and everything after it, for
example:
josh,hello,there,now,today,tomorrow
hi,now,text,wrangler,let,go

to become:
josh
hi

Can someone tell me where I went wrong with that grep?
My grep only deletes the last comma on the line (and whatever follows it), not everything from the first comma on.
Thanks!

Lee Smith

Aug 12, 2010, 6:05:22 PM

to textwr...@googlegroups.com

Can you post a sample of the text?

> --
> You received this message because you are subscribed to the
> "TextWrangler Talk" discussion group on Google Groups.
> To post to this group, send email to textwr...@googlegroups.com
> To unsubscribe from this group, send email to
> textwrangler...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/textwrangler?hl=en
> If you have a feature request or would like to report a problem,
> please email "sup...@barebones.com" rather than posting to the group.

josh

Aug 12, 2010, 8:11:28 PM

to TextWrangler Talk

Here's a sample of what I'm dealing with:
jo...@aol.com,smith,,126.555.13.126,https://www.textjosh.com,8/12/10 0:00,export_2222
jo...@aol.com,smith,,126.555.13.126,https://www.textjosh.com,8/12/10 0:00,export_2222
jo...@aol.com,smith,,126.555.13.126,https://www.textjosh.com,8/12/10 0:00,export_2222

Some lines may have more or fewer commas, but that's pretty much what
most of my files look like, format included.
I'm using this grep:
find: ^(.*),.*$
replace: \1

but it only replaces the last comma (and the content after it). I want to
replace everything from the first comma on, so that only the first part remains. In this
case the result would be:
jo...@aol.com
jo...@aol.com
jo...@aol.com

thanks!

Christopher Bort

Aug 12, 2010, 8:27:24 PM

to textwr...@googlegroups.com

You want to use the non-greedy quantifier *? instead of the
greedy * . Also, since . does not match return characters by
default, you can omit the end-of-line anchor ($) if you like.
Change your search pattern to

^(.*?),.*
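To see the difference concretely, here is a minimal sketch using Python's `re` module as a stand-in for TextWrangler's grep engine (an assumption; the two are expected to agree on patterns this simple). The helper `keep_first_field` is hypothetical: it just applies a pattern and replaces the match with \1. The input lines are the examples from the first post.

```python
import re

# Example lines from the first post in the thread.
lines = [
    "josh,hello,there,now,today,tomorrow",
    "hi,now,text,wrangler,let,go",
]

def keep_first_field(line: str, pattern: str) -> str:
    """Hypothetical helper: replace the match with group 1, like 'replace: \\1'."""
    return re.sub(pattern, r"\1", line)

# Greedy: (.*) backs off only as far as the LAST comma.
greedy = [keep_first_field(line, r"^(.*),.*$") for line in lines]

# Non-greedy: (.*?) stops at the FIRST comma.
lazy = [keep_first_field(line, r"^(.*?),.*") for line in lines]

print(greedy)  # ['josh,hello,there,now,today', 'hi,now,text,wrangler,let']
print(lazy)    # ['josh', 'hi']
```

The greedy result reproduces the original complaint: only the last comma and what follows it are removed.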

--
Christopher Bort
<top...@thehundredacre.net>
<http://www.thehundredacre.net/>
Skype: topherbort

wi...@serensoft.com

Aug 12, 2010, 8:18:51 PM

to textwr...@googlegroups.com

Try

^([^,]*),.*$

--
will trillich
"I just try to make sure that the laziest thing I can do at any moment is what I should be doing." -- matt.might.net

will trillich

Aug 12, 2010, 8:21:12 PM

to textwr...@googlegroups.com

When your FIND pattern is (.*), it matches as MANY characters as it can. This is called being "greedy" (you'll see that term when you read about regular expressions, and this is what it's referring to). So for (.*)(.*), which one "gobbles" up all the text? It's the first one. (.*) means zero or more of ANY character, matching as MANY characters as possible, so it snags the whole string here. The second (.*) then matches the zero characters left after the first match, so it's always empty.

You can specify non-greedy matches by adding a question mark after the + or *, as in (.*?)(.*). Here the first () group matches the FEWEST possible characters (none at all), so the second (.*) matches the whole string. You may be able to

So (.*),(.*) will match as many characters as possible in the first () group, then the final comma in the string, and then whatever follows it.
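The three cases above can be checked with a short sketch in Python's `re` (used here as a stand-in for TextWrangler's grep; the sample string is the first example line from the thread). `.groups()` returns what each () group captured:

```python
import re

text = "josh,hello,there,now,today,tomorrow"

# Greedy: the first group swallows everything, leaving the second group empty.
greedy_pair = re.match(r"(.*)(.*)", text).groups()

# Non-greedy: the first group takes as little as possible (nothing here).
lazy_pair = re.match(r"(.*?)(.*)", text).groups()

# Greedy with a comma: the first group runs up to the LAST comma.
last_comma = re.match(r"(.*),(.*)", text).groups()

print(greedy_pair)  # ('josh,hello,there,now,today,tomorrow', '')
print(lazy_pair)    # ('', 'josh,hello,there,now,today,tomorrow')
print(last_comma)   # ('josh,hello,there,now,today', 'tomorrow')
```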

What you're looking for is specifically the first comma, so [^,] is a better solution here:

([^,]*),(.*)

That matches zero or more characters that are NOT commas, then that first comma, then ANY characters after it. \1 will contain whatever was BEFORE the first comma (because [^,]* has to stop as soon as it reaches a comma).
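A minimal Python sketch of the [^,] version, again using `re` as a stand-in for TextWrangler; the second string, with an empty first field, is a made-up input for illustration:

```python
import re

text = "josh,hello,there,now,today,tomorrow"

# [^,]* cannot cross a comma, so group 1 stops at the FIRST comma
# even though * is greedy; group 2 takes everything after it.
m = re.match(r"([^,]*),(.*)", text)
first, rest = m.group(1), m.group(2)
print(first)  # josh
print(rest)   # hello,there,now,today,tomorrow

# Made-up input whose first field is empty: group 1 is simply ''.
empty_first = re.match(r"([^,]*),(.*)", ",x,y").group(1)
print(repr(empty_first))  # ''
```

Because the stopping point is built into the character class, this pattern does not depend on greedy versus non-greedy behavior at all.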

Make sense? Regular expressions -- pretty powerful stuff!

On Thu, Aug 12, 2010 at 4:40 PM, wi...@serensoft.com <wi...@serensoft.com> wrote:

--
will trillich
"I think it would be worse to expect nothing than to be disappointed." -- Anne (with an 'e') Shirley

josh

Aug 12, 2010, 6:54:52 PM

to TextWrangler Talk

jo...@aol.com,josh,123.555.22.555,https://www.josh.com,8/12/10 0:00,josh_now
jo...@aol.com,josh,123.555.22.555,https://www.josh.com,8/12/10 0:00,josh_now
jo...@aol.com,josh,123.555.22.555,https://www.josh.com,8/12/10 0:00,josh_now

That's pretty much what it looks like. Sometimes there are more or fewer
commas, but this is what the majority of my files look like.

thanks!

On Aug 12, 6:05 pm, Lee Smith <leew1...@gmail.com> wrote:

wi...@serensoft.com

Aug 12, 2010, 5:48:52 PM

to textwr...@googlegroups.com

Well that was only a partial (but still useful) answer... Lemme try again:

wi...@serensoft.com

Aug 12, 2010, 5:40:29 PM

to textwr...@googlegroups.com

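Right -- ^(.*) means "at beginning-of-line, find zero or more of ANY character", which is what it's doing: it grabs as much of the line as it can and only backs off far enough to leave one comma for the rest of the pattern, so it stops at the LAST comma.

Then ,.*$ means find a comma and anything that follows it to end-of-line.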

What you want is the caret inside the square [] character-list brackets:

^([^,]*),.*$

which means
^ at beginning of line
[^,] any character EXCEPT a comma
* zero or more times
,.* followed by a comma and then any character, zero or more times
$ up to the end-of-line

Note that if you have a line with NO comma, this will not match, so the line is left as it is (which may be what you're after anyway).
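A minimal Python sketch of the whole find/replace over several lines at once. The three-line document is made up for illustration, and `re.MULTILINE` is used so that ^ and $ match at every line, which is assumed to mirror how TextWrangler applies the pattern line by line:

```python
import re

# Made-up document: two lines with commas, then one line with no comma at all.
document = "josh,hello,there\nhi,now,text\nnocommahere"

# MULTILINE lets ^ and $ match at the start and end of every line.
# '.' does not match a newline, so ',.*$' never runs past the end of a line.
pattern = re.compile(r"^([^,]*),.*$", re.MULTILINE)
result = pattern.sub(r"\1", document)

print(result)
# josh
# hi
# nocommahere
```

The last line has no comma, so the pattern finds no match there and the line comes through unchanged.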

The square brackets are very interesting:

[a-f] finds a, b, c, d, e or f.
[^a-f] finds ANY character EXCEPT a, b, c, d, e or f

When ^ is the first item inside the [] character-list brackets, ^ means NOT, as in [^,] means anything BUT a comma

When ^ is outside of the [] brackets then it means beginning-of-line

(Hmm, I wonder if the ^ caret has a defined meaning if it's NOT at the beginning of a [] character-list, hmm...)
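A small Python sketch of the bracket behavior described above. The sample string is made up, and the last case answers the caret question only for Python's `re`, where a ^ that isn't first inside [] is an ordinary character; whether TextWrangler's engine agrees isn't settled by this thread:

```python
import re

sample = "face^bag xyz"

# [a-f]: any single character in the range a..f.
in_range = re.findall(r"[a-f]", sample)

# [^a-f]: any single character NOT in a..f (spaces and ^ included).
not_in_range = re.findall(r"[^a-f]", sample)

# In Python's re, a ^ that is not first inside [] is just a literal caret.
literal_caret = re.findall(r"[x^]", sample)

print(in_range)       # ['f', 'a', 'c', 'e', 'b', 'a']
print(not_in_range)   # ['^', 'g', ' ', 'x', 'y', 'z']
print(literal_caret)  # ['^', 'x']
```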

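As a more general-purpose solution, where you're parsing simple CSV data (without quoted fields or embedded commas in the field data), try this:

^([^,]*,){3}([^,]*)(,[^,]*){8}$

If you were looking for the fourth field of 12, it'd be in \2. Note that a repeated group only keeps its last repetition, so \1 would contain just the third field (plus its comma) and \3 just the last comma and field 12. To capture fields 1-3 and the last 8 as a whole, wrap each repetition in an outer group: ^(([^,]*,){3})([^,]*)((,[^,]*){8})$ puts fields 1-3 in \1, the fourth field in \3, and the last 8 in \4. Modify the {#} numbers according to your needs.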
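A Python sketch of the 12-field pattern on a made-up line f1,...,f12 (using `re` as a stand-in for TextWrangler). It also shows a detail worth knowing: a repeated group such as ([^,]*,){3} only keeps its last repetition, so capturing several fields at once needs an outer group around the repetition:

```python
import re

# Made-up 12-field line: f1,f2,...,f12 (no quotes or embedded commas).
line = ",".join(f"f{i}" for i in range(1, 13))

m = re.match(r"^([^,]*,){3}([^,]*)(,[^,]*){8}$", line)
print(m.group(2))  # f4 (the fourth field)
print(m.group(1))  # f3, (only the last of the three repetitions)
print(m.group(3))  # ,f12 (only the last of the eight repetitions)

# Outer groups capture each run of fields as a whole; numbering shifts to 1, 3, 4.
m2 = re.match(r"^(([^,]*,){3})([^,]*)((,[^,]*){8})$", line)
print(m2.group(1))  # f1,f2,f3,
print(m2.group(3))  # f4
print(m2.group(4))  # ,f5,f6,f7,f8,f9,f10,f11,f12
```

Because of the trailing $, this pattern only matches lines with exactly 12 fields; lines with more or fewer commas are left alone.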

On Thu, Aug 12, 2010 at 4:21 PM, josh <joaso...@gmail.com> wrote:

josh

Aug 12, 2010, 11:30:33 PM

to TextWrangler Talk

Wow, thanks so much for the in-depth description! It works great, and I'll
definitely know my way around better because of your explanation!!!
Thanks so much!!
