
Tcl performance with large files


Gregory Deal

Jul 7, 1999, 3:00:00 AM
I'm using Tcl/Tk 8.0 to read a 90 MB file, pull some fields from specific
lines, and fill some arrays (4 arrays, each with about 200,000 items).
Unfortunately, this takes ~7 hrs (running at 100% CPU) on an HP9000/J2240 with
plenty of memory, and no swapping. Is this typical performance? I pulled out
any redundant calculations from the loop. Should I preprocess the data with
perl and just have Tcl read that in and load up the arrays? Thanks for any
help or guidance.
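
For concreteness, a minimal sketch of the kind of loop I mean (the line
pattern, field positions, and array names below are made up, not my real
code):

set f [open bigfile.dat r]
while {[gets $f line] >= 0} {
    if {[string match "REC *" $line]} {
        set fields [split $line]
        set ids([lindex $fields 1])  [lindex $fields 2]
        set vals([lindex $fields 1]) [lindex $fields 3]
    }
}
close $f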

Richard.Suchenwirth

Jul 7, 1999, 3:00:00 AM
to Gregory Deal

Lots of data; that seems to call for a database rather than a Tcl
script. But maybe you can ease the load on Tcl: if the big file is plain
text and the lines that interest you have a recognizable pattern, pipe
it through sed, grep, or awk (or Perl, as you suggested). Here are some
shots in the dark:
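
For instance, Tcl can open such a prefilter as a command pipeline
itself; grep and the pattern ^REC here are assumptions, substitute
whatever marks your interesting lines:

set fh [open "|grep ^REC bigfile.dat" r]
while {[gets $fh line] >= 0} {
    # ...only the matching lines ever reach the Tcl loop
}
close $fh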

When you process lines in Tcl, it may help to replace several regsubs
with one regexp.
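
For example, one call with subgroups can grab several fields at once.
The whitespace-separated layout is an assumption, and note that 8.0's
regexp predates the \s and \d shorthands, hence the plain classes:

if {[regexp {^REC +([^ ]+) +([^ ]+) +([^ ]+)} $line match key v1 v2]} {
    set a($key) $v1   ;# a and b are stand-in array names
    set b($key) $v2
}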

When calculating with expr, curly-brace the whole expression, so it is
compiled once instead of being re-parsed on every evaluation:
expr {$foo + $bar * $baz} ;# instead of
expr $foo + $bar * $baz

Incrementing counters is best done with
incr me ;# instead of
set me [expr $me+1]

If you can extract a representative test sample from the big file, do
that and time the various program alternatives to find out which is
fastest. Also, by turning parts of the test processing on or off,
identify the parts that eat most of the time; they'll profit most from
optimization.
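
The built-in time command is good for that: it runs a script N times
and reports microseconds per iteration. processSample here is a
hypothetical proc wrapping whichever variant you're measuring:

puts [time {processSample $testData} 100]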

Post non-confidential code examples here on c.l.t for more detailed help
(never guaranteed, but often forthcoming).

--
Schoene Gruesse/best regards, Richard Suchenwirth -- tel. +49-7531-86 2703
RC DT2, Siemens Electrocom GmbH, Buecklestr. 1-5, D-78467 Konstanz, Germany
My opinions were not necessarily, or will not necessarily be, mine.

Bob Techentin

Jul 7, 1999, 3:00:00 AM
Gregory Deal wrote:
>
> I'm using Tcl/Tk 8.0 to read a 90 MB file, pull some fields from specific
> lines, and fill some arrays (4 arrays, each with about 200,000 items).
> Unfortunately, this takes ~7 hrs (running at 100% CPU) on an HP9000/J2240 with
> plenty of memory, and no swapping. Is this typical performance? I pulled out
> any redundant calculations from the loop. Should I preprocess the data with
> perl and just have Tcl read that in and load up the arrays? Thanks for any
> help or guidance.

Gregory,

Look at a discussion of Tcl Performance on
http://purl.org/thecliff/tcl/wiki/TclPerformance

It is hard to tell why your code is slow without seeing it. My guess
would be that you are reading the 90 MB file with the default buffer
size. See the section on "slurping up data files" towards the bottom of
the Tcl Performance page. If this is the primary issue, fixing it could
make a 50x speed difference in your code.
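
The idiom is roughly this: read the whole file in one gulp by passing
its size to read, then split into lines in memory (file and variable
names below are placeholders):

set f [open bigfile.dat r]
set data [read $f [file size bigfile.dat]]
close $f
foreach line [split $data \n] {
    # pull fields out and fill the arrays here
}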

Good luck,
Bob
--
Bob Techentin techenti...@mayo.edu
Mayo Foundation (507) 284-2702
Rochester MN, 55905 USA http://www.mayo.edu/sppdg/sppdg_home_page.html

Bryan Oakley

Jul 7, 1999, 3:00:00 AM
Gregory Deal wrote:
>
> I'm using Tcl/Tk 8.0 to read a 90 MB file, pull some fields from specific
> lines, and fill some arrays (4 arrays, each with about 200,000 items).
> [...]

My gut reaction is this program is taking waaaayyyyy too long to
execute. I would think an HP9K could process a 90 MB file in a matter of
a few tens of minutes, max. For example, I can set 800,000 array
elements to a static value in under a minute on a Pentium II class
machine. I can read a 35mb file in less than two minutes. I find it hard
to believe that actually processing that data could take almost seven
hours. Well, ok, I can believe it, but I also believe that can be cut
down by an order of magnitude.

My guess is that there is a bottleneck in how you read the data in
and/or write it out. Look for a long thread on I/O performance from the
past couple of weeks. The nutshell summary is that it's significantly
faster to call read with the actual size of the data than to rely on the
relatively small default buffer size (eg: [read $fileid [file size $file]]).

There may also be some subtle optimizations that could make a huge
difference: putting curly braces around expressions, limiting use of
eval, making sure you aren't causing a lot of list->string conversions
(or vice versa), that sort of thing; a contrived example of that last
point follows. But without seeing the code, it's hard to say.
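
Every Tcl 8 value caches one internal representation, and bouncing a
value between list and string commands rebuilds that cache each time
(made-up names, just to show the shape of the problem):

set fields [split $line]          ;# fields holds a list
set fields [string trim $fields]  ;# a string again; the list rep is gone
set n [llength $fields]           ;# reparsed back into a list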

--
Bryan Oakley mailto:oak...@channelpoint.com
ChannelPoint, Inc. http://purl.oclc.org/net/oakley

Education is full of oversimplified lies which can be
refined into the truth later.

Volker Hetzer

Jul 9, 1999, 3:00:00 AM
Gregory Deal wrote:
>
> I'm using Tcl/Tk 8.0 to read a 90 MB file, pull some fields from specific
> lines, and fill some arrays (4 arrays, each with about 200,000 items).
> [...]
Find out the file size beforehand and use a read command with the
appropriate number of bytes. I tried this with a 63 MB file. With a
plain read, it started swapping after 90 minutes, having allocated
several hundred MB. With a read given the right number of bytes, it
allocated 63 MB of memory and read the whole file in a few minutes.

Greetings!
Volker
