Split CSV into multiple smaller files

588 views
Skip to first unread message

Jake

unread,
Dec 15, 2014, 12:58:22 PM12/15/14
to textwr...@googlegroups.com
Hi There!

I have a very large CSV files that is really a combination of several tables. Each table is delineated in the file by lines at the start and end of each table, For example, the file is formatting like this (where the stuff in italics is sample fake data):

START_LEVELS
LEVES, TABLE, COLUMN, HEADERS
1,1,1,1
2,2,2,2
3,3,3,3
END_LEVELS
START_TARGETS
TARGETS, TABLE, COLUMN, HEADERS, EXAMPLE
4,4,4,4,4
5,5,5,5,5
6,6,6,6,6
7,7,7,7,7
END_TARGETS

I want to find a way to programmatically parse this file and split each table into it's own new .csv file. So the above example would result in one file called "'levels.csv" that looks like this:

LEVES, TABLE, COLUMN, HEADERS
1,1,1,1
2,2,2,2
3,3,3,3

and another file called "targets.csv" that looks like this:

TARGETS, TABLE, COLUMN, HEADERS, EXAMPLE
4,4,4,4,4
5,5,5,5,5
6,6,6,6,6
7,7,7,7,7

How do I do that in TextWrangler? Can I do it with an Applescript to make it fully automated?

Thanks!

Christopher Stone

unread,
Dec 18, 2014, 4:05:47 AM12/18/14
to TextWrangler-Talk
On Dec 15, 2014, at 11:58, Jake <ja...@jakedegroot.com> wrote:
I have a very large CSV files that is really a combination of several tables. Each table is delineated in the file by lines at the start and end of each table, For example, the file is formatting like this (where the stuff in italics is sample fake data):
How do I do that in TextWrangler? Can I do it with an Applescript to make it fully automated?
______________________________________________________________________

Hey Jake,

Well, that's not hard to do.

This script operates on the front window in TextWrangler.

-------------------------------------------------------------------------------------------
# Auth: Christopher Stone <script...@thestoneforge.com>
# dCre: 2014/12/18 00:05
# dMod: 2014/12/18 00:28 
# Appl: TextWrangler
# Task: Parse a single file collection of .csv data into separate files.
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @TextWrangler, @csv, @parse
-------------------------------------------------------------------------------------------

set csvFileContent to {}
tell application "TextWrangler"
  tell front text window
    set _file to its file as alias
    select insertion point before its text
    repeat
      set foundReco to find "(?ms)^START_.+?^END_.+?$" selecting match true ¬
        options {case sensitive:false, extend selection:false, returning results:true, search mode:grep, showing results:true}
      if found of foundReco = true then
        set end of csvFileContent to found text of foundReco
      else
        exit repeat
      end if
    end repeat
  end tell
end tell
tell application "Finder" to set _parent to parent of _file as text
repeat with i in csvFileContent
  set _start to (paragraph 1 of i)
  set _end to (paragraph -1 of i)
  if _start starts with "START_" and _end starts with "END_" then
    set AppleScript's text item delimiters to "_"
    set filePath to _parent & ((text items 2 thru -1 of _start) as text) & ".csv"
    if exTant(filePath) then error "File exists!"
    set AppleScript's text item delimiters to linefeed
    set _content to ((paragraphs 2 thru -2 of (contents of i)) as text) & linefeed
    writeUTF8(_content, filePath)
  else
    error "Something is wrong with the START_ or END_ delimiters of the data-set."
  end if
end repeat

-------------------------------------------------------------------------------------------
--» HANDLERS
-------------------------------------------------------------------------------------------
on exTant(_path) # Takes an HFS, Posix, or ~/Posix path as input.
  local _path
  try
    if _path starts with "~/" then
      set _path to (POSIX path of (path to home folder as text)) & text 3 thru -1 of _path
    end if
    if _path starts with "/" then
      alias POSIX file _path
    else if _path contains ":" then
      alias _path
    end if
    return true
  on error
    return false
  end try
end exTant
-------------------------------------------------------------------------------------------
on writeUTF8(_text, _file)
  try
    if _file starts with "~/" then
      set _file to POSIX path of (path to home folder as text) & text 3 thru -1 of _file
    end if
    set fRef to open for access _file with write permission
    set eof of fRef to 0
    write _text to fRef as «class utf8»
    close access fRef
  on error e number n
    try
      close access fRef
    on error e number n
      error "Error in writeUTF8() handler!" & return & return & e
    end try
  end try
end writeUTF8
-------------------------------------------------------------------------------------------

I don't know what your definition of very large file is, but if it wasn't monstrous I'd much rather do the job with AppleScript + the Satimage.osax and exclude TextWrangler from the equation.  Processing will be quite a lot faster.

This script operates on the selected file in the Finder.

-------------------------------------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2014/12/18 12:55
# dMod: 2014/12/18 01:04 
# Appl: TextWrangler
# Task: Parse a single file collection of .csv data into separate files.
# Libs: None
# Osax: Satimage.osax { REQUIRED! } { http://tinyurl.com/dc3soh }
# Tags: @Applescript, @Script, @TextWrangler, @csv, @parse
-------------------------------------------------------------------------------------------

tell application "Finder" to set _sel to selection as alias list
if length of _sel = 1 then set _file to item 1 of _sel
set AppleScript's text item delimiters to ":"
set _parent to ((text items 1 thru -2 of (_file as text)) as text) & ":"
set foundList to find text "(?m)^START_.+?$[\\r\\n](.+?)[\\r\\n]^END_.+?$" in _file using ¬
  "\\1" with regexp, all occurrences and string result
if foundList ≠ {} then
  set AppleScript's text item delimiters to ","
  repeat with i in foundList
    set _name to text item 1 of i
    set newFilePath to _parent & _name & ".csv"
    writetext i to file newFilePath
  end repeat
end if

-------------------------------------------------------------------------------------------

Neither one of these scripts has very much in the way of error-checking, and I would certain add more for a production script.

Both scripts put the new .csv files in the same folder as the original file.

--
Best Regards,
Chris

Christopher Stone

unread,
Dec 18, 2014, 8:46:37 PM12/18/14
to TextWrangler-Talk
On Dec 18, 2014, at 03:05, Christopher Stone <listm...@suddenlink.net> wrote:
This script operates on the selected file in the Finder.
______________________________________________________________________

Hey Jake,

I made some small changes to the Satimage.osax-based script (below) after getting my hands on an actual data-set (via MacScripter).

The TextWrangler-based script appears to work as is, and since there are only 5 splits in 'main.csv' processing is quite fast.

--
Best Regards,
Chris

-------------------------------------------------------------------------------------------
# Auth: Christopher Stone <script...@thestoneforge.com>
# dCre: 2014/12/18 12:55
# dMod: 2014/12/18 19:39
# Appl: TextWrangler
# Task: Parse a single file collection of .csv data into separate files.
# Libs: None
# Osax: Satimage.osax { REQUIRED! } { http://tinyurl.com/dc3soh }
# Tags: @Applescript, @Script, @TextWrangler, @csv, @parse
-------------------------------------------------------------------------------------------

tell application "Finder" to set _sel to selection as alias list
if length of _sel = 1 then set _file to item 1 of _sel
set AppleScript's text item delimiters to ":"
set _parent to ((text items 1 thru -2 of (_file as text)) as text) & ":"
set filePathList to find text "^START_(\\w+)$" in _file using "\\1" with regexp, all occurrences and string result
set filePathList to change "(.+)" into _parent & "\\1" & ".csv" in filePathList with regexp without case sensitive
set csvList to find text "(?m)^START_.+?$[\\r\\n](.+?)[\\r\\n]^END_.+?$" in _file using ¬
  "\\1" with regexp, all occurrences and string result
if (length of filePathList) = (length of csvList) then
  repeat with i from 1 to length of csvList
    writetext item i of csvList to file (item i of filePathList)
Reply all
Reply to author
Forward
0 new messages