Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Parsing a .CSV file using Tcl

3,325 views
Skip to first unread message

Ignoramus

unread,
Jan 22, 2006, 8:37:54 PM1/22/06
to
Hi

I have about ~ 250KB of data in a spreadsheet (saved as a .CSV file).

Is there a way to write Tcl script, so that I can parse the file and
import the data from the spreadsheet into a MySQL database based on the
column /rows' fields of the spreadsheet? The data in the spreadsheet is
both text and numerics.

Thanks

Gerald W. Lester

unread,
Jan 23, 2006, 12:29:43 AM1/23/06
to

Take a look at the CSV module of TclLib.

suchenwi

unread,
Jan 23, 2006, 2:12:34 AM1/23/06
to
Besides using the csv package, you can also roll it yourself. CSV is
mostly easy to parse: each record is a line in a text file; fields are
separated by, well, comma (or semicolon...) You could do it like this:
set f [open $filename]
while {[gets $f line]>=0} {
set fields [split $line ","]
puts "first field is [lindex $fields 0]"
puts "tenth field is [lindex $fields 9]"
...
}
close $f

Michael Siemon

unread,
Jan 23, 2006, 2:22:36 AM1/23/06
to
In article <1138000354....@z14g2000cwz.googlegroups.com>,
"suchenwi" <richard.suchenw...@siemens.com> wrote:

Yes, but problems arise in cases where the individual fields contain
the separator character; a simple "split" won't work right in such
cases. If you know in advance that won't arise, split will work OK.

But then there are also issues of quotes being silently supplied
in the source of the CSV, and whether one should or should not
strip these from the fields. Where I can control the input, I'm
happy to use split and lindex the result; otherwise, ...

Steve Landers

unread,
Jan 23, 2006, 2:26:08 AM1/23/06
to

And hope that you're input file doesn't contain commas within fields.

IMO it is much better and safer to use the tcllib module.

Steve

Sean Woods

unread,
Jan 23, 2006, 9:16:34 AM1/23/06
to

In addition to the tcllib package "csv", you can also import text files
directly through MySQL using the LOAD DATA command:

http://www.modwest.com/help/kb.phtml?qid=253&cat=6

Krzysztof Meres

unread,
Feb 22, 2021, 12:34:10 PM2/22/21
to
I found quite straightforward and robust way of parsing a .CSV file (file.csv in this example) using Tcl (based on split command utilization). The key is to make a quoted contents of an Excel cell a sub-list element which is not subject to splitting the list with comma:

set lines [split [exec cat file.csv] \n]
foreach line $lines {
set noQuotes [split $line \"]
set lineList {}
foreach {unquoted quoted} $noQuotes {
eval lappend lineList [split [string trim $unquoted ,] ,]
lappend lineList $quoted
}
if [expr [llength $noQuotes] %2] then {set lineList [lrange $lineList 0 end-1]}

Resulting lineList is a Tcl list containing rows of columns of originally comma separated values with preserved quoted values even if they contain commas.

Regards,

Krzysztof Meres

Ted Nolan <tednolan>

unread,
Feb 22, 2021, 12:58:00 PM2/22/21
to
In article <82994a4d-aa5f-4e01...@googlegroups.com>,
I don't know if it existed in 2006, but there is a Tcllib package for
handling CSV data now:

https://core.tcl-lang.org/tcllib/doc/trunk/embedded/md/tcllib/files/modules/csv/csv.md


--
columbiaclosings.com
What's not in Columbia anymore..

Robert Heller

unread,
Feb 22, 2021, 2:00:40 PM2/22/21
to
At 22 Feb 2021 17:57:55 GMT t...@loft.tnolan.com (Ted Nolan <tednolan>) wrote:

>
> Content-Type: text/plain
The csv and struct::matrix packages that are part of tcllib are the go-to
packages whenever I need to do anything with a CSV file. *I* generally create
a SNIT type that delegates to a struct::matrix object installed as a component
and use the csv package's ::csv::read2matrix to load the CSV file into the
matrix.

>

--
Robert Heller -- Cell: 413-658-7953 GV: 978-633-5364
Deepwoods Software -- Custom Software Services
http://www.deepsoft.com/ -- Linux Administration Services
hel...@deepsoft.com -- Webhosting Services

Rich

unread,
Feb 22, 2021, 2:27:38 PM2/22/21
to
Do you really think that after 15 years, "Ignoramus" is still waiting
for a reply to this posting?

As for parsing csv, just use the Tcllib csv module, as it has already
been debugged over the ensuing 15 years and now should handle most all
of the odd edge cases in the CSV format.

http://tmml.sourceforge.net/doc/tcllib/csv.html

Now, some critiques of your code:

First - the snippet you posted does not run, due to syntax errors:

$ ./usenet-csv
missing close-brace
while executing
"foreach line $lines {"
(file "./usenet-csv" line 4)

Second - this part:

> set lines [split [exec cat file.csv] \n]

is a non-Tcl, non-portable way to read in the CSV file (it also
may or may not handle character encodings properly depending on the
rest of your system settings).

Third - for this part:

> eval lappend lineList [split [string trim $unquoted ,] ,]

eval is almost never necessary in modern Tcl. And for the few times it
is, you have to be careful to properly quote evaled variables lest you
open yourself up to Tcl injection attacks. This line should likely
work properly without the eval, as you are simply doing an lappend.

Fourth - here:

> set lineList {}

You reset lineList in the middle of the line reading loop, which forces
use of lineList before the loop iterates, otherwise only the last line
remains in the list when your outer loop terminates.

So, I created this bit of code to create a csv file:

#!/usr/bin/tclsh

package require csv
set fd [open file.csv {WRONLY CREAT TRUNC}]
# embedded " within a field
puts $fd [csv::join [list "col 1" "col 2" "col \"3\"" "col 4"]]
# embedded , within a field
puts $fd [csv::join [list c1 c2 c,3 c4]]
# both " and , embedded within a field
puts $fd [csv::join [list c1 c\",2 c3 c4]]
# newline within a field
puts $fd [csv::join [list c1 c2 c\n3 c4]]
close $fd

Running it creates this as "file.csv":

col 1,col 2,"col ""3""",col 4
c1,c2,"c,3",c4
c1,"c"",2",c3,c4
c1,c2,"c
3",c4

And I created this, using the Tcllib module, to show that the Tcllib
module correctly reads this file back:

#!/usr/bin/tclsh

package require csv
package require struct::queue
struct::queue q
set fd [open file.csv RDONLY]
csv::read2queue $fd q
set seq 0
foreach row [q get [q size]] {
puts "Row [incr seq]: $row"
}
close $fd

Running it results in:

Row 1: {col 1} {col 2} {col "3"} {col 4}
Row 2: c1 c2 c,3 c4
Row 3: c1 c\",2 c3 c4
Row 4: c1 c2 {c
3} c4

Which is a correct result.

Correcting your syntax error and appending a "puts [join $lineList]" to
the end of your code, then running it, results in only this for the
same input .csv file:

3 ,c4

Moving the "puts [join $lineList]" to inside your foreach loop (and
changing it to "puts "Row [incr seq]: $lineList" to get a "Row 1:"...
indicator) results in:

Row 1: {col 1} {col 2} {col } 3 {} {col 4}
Row 2: c1 c2 c,3 c4
Row 3: c1 c ,2 c3 c4
Row 4: c1 c2 c
Row 5: 3 ,c4

Which is an incorrect result. The first and third rows have had an
extra column falsely inserted, all of the embedded quotes have been
stripped away, the fourth row has been split into two separate rows,
and the false fifth row has had an extra comma inserted that does not
belong.

Moral: Just use the existing module. The CSV format has enough sharp
edge cases that it is by far more difficult to parse than it at first
appears. The existing module's already been through having to handle
the sharp edge cases, take advantage of that history.

0 new messages