Deleting columns of a CSV file, Grep?

1,369 views
Skip to first unread message

Daniel Shepherd

unread,
Oct 11, 2013, 7:39:45 AM10/11/13
to textwr...@googlegroups.com
Hi,

I have a huge CSV file which, surprise surprise my actual spreadsheet program baulks at trying to open!

However it opens perfectly in Textwranger. I actually only need one column of data and would like to remove the rest. The column is actually after the third comma. Would there be anyway with Grep to delete things up tot he third comma and delete everything after the forth comma?

Kind Regards
Dan

Steve

unread,
Oct 11, 2013, 10:52:23 AM10/11/13
to textwr...@googlegroups.com
Absolutely.

Search:
^(?:"[^"\r]*",|[^,"]*,){3}("[^"\r]*",|[^,"]*,).*

Replace:
\1

Here is an example .csv that I tested it on:


FirstCol,SecondCol,ThirdCol,FourthCol,FifthCol,LifeUniverseEverything,Lolcatz
,,,,,,true
Money,Fame,Power,Cheese,Smurf,RainbowBrite,false
Gold,Fortune,AC,"Mouse, maybe?",Blue Man Group,Power Rangers,true
Bling,Bling,DC,"Smile, and show your teeth, for the camera!",Navi,"String with commas, like this, or that",false

I specifically chose the 4th column to contain commas (hence the quotes around it) to ensure this could capture even those cases (don't know what your data looks like, but it certainly couldn't hurt to work with edge cases). I also threw in a few spaces (shouldn't matter for comma-separated).

Here's a breakdown of the regex:

^

Start of line, so you don't start in any arbitrary location in the file.

(?:"[^"\r]*",|[^,"]*,){3}

The '?:' at the beginning of the group means that the '\1' in the replacement ignores it as a "saved group". It looks for either:
[quotation mark][characters that are NOT newlines/quotation marks][quotation mark][comma]
...or...
[characters that are NOT commas/quotation marks][comma]

and both repeat 3 times (modify the trailing {3} to whatever you want to select the nth column).

("[^"\r]*",|[^,"]*,).*

Same as above, except that this group is saved for the '\1' replacement.

.*

All columns from n+1 and forward, to the end of the line.


-Steve

Daniel Shepherd

unread,
Oct 11, 2013, 4:02:56 PM10/11/13
to textwr...@googlegroups.com
You are a genius Steve, it took  a minute to wrap my head around this one but I got - such a faster way to work with CSV files than memory inefficient database apps. Thanks so much - your posts are all getting saved to Evernote!

Dan Ramirez

unread,
Apr 22, 2014, 3:22:46 PM4/22/14
to textwr...@googlegroups.com
Is there a way to do these in reverse? I want to delete a specific column. Thanks.

Steve Fullmer

unread,
Apr 22, 2014, 3:40:34 PM4/22/14
to textwr...@googlegroups.com
Dan, so instead of removing columns 1-2 and 4-* and preserving column 3, you want to preserve columns 1-2 and 4-* and remove 3?

Search:
^((?:"[^"\r]*",|[^,"]*,){2})(?:"[^"\r]*",|[^,"]*,)(.*)$

Replace:
\1\2

This will keep columns 1-2 and 4-*, then delete the 3rd column (change the {2} to whatever 'n-1' column you want to remove, so {5} to remove column 6, {0} to remove column 1, etc.)

-Steve



--
This is the TextWrangler Talk public discussion group.
If you have a feature request or would like to report a problem,
please email "sup...@barebones.com" instead of posting here.
---
You received this message because you are subscribed to the Google Groups "TextWrangler Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to textwrangler...@googlegroups.com.

Dan Ramirez

unread,
Apr 22, 2014, 5:19:34 PM4/22/14
to textwr...@googlegroups.com
Thanks a lot, It worked per file, but does not work on "Multi-File Search". I'm on version 4.5.7

Dan Ramirez

unread,
Apr 22, 2014, 6:54:56 PM4/22/14
to textwr...@googlegroups.com
My bad, user error. My filters were not set correctly.

tchamba junias

unread,
Feb 6, 2018, 4:07:14 PM2/6/18
to TextWrangler Talk

Rich Siegel

unread,
Feb 6, 2018, 4:15:36 PM2/6/18
to textwr...@googlegroups.com
On 2/6/18 at 3:56 PM, tchamb...@gmail.com (tchamba junias) wrote:

>I have a huge CSV file which, surprise surprise my actual
>spreadsheet program baulks at trying to open!
>
>However it opens perfectly in Textwranger. I actually only need
>one column of data and would like to remove the rest. The
>column is actually after the third comma. Would there be anyway
>with Grep to delete things up tot he third comma and delete
>everything after the forth comma?

I recommend that you use BBEdit for this purpose, since it
contains a "Columns" command on the Edit menu which is
purpose-built for working with column-delimited data.

<https://www.barebones.com/products/bbedit/>

R.
--
Rich Siegel Bare Bones Software, Inc.
<sie...@barebones.com> <http://www.barebones.com/>

Someday I'll look back on all this and laugh... until they
sedate me.

Reply all
Reply to author
Forward
0 new messages