
Parallel.ForEach in Tcl


Nicolas Robert

Sep 17, 2020, 1:45:14 PM
A few days ago I needed to read a file (> 8 GB). I tried with Tcl, but unfortunately it took quite a long time...
I then tried the Parallel.ForEach class (https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?redirectedfrom=MSDN&view=netcore-3.1) in VB.NET, and that is really much faster.
For those who know VB.NET, it looks like this:
Sub Main(args As String())

    Dim po As New ParallelOptions
    po.MaxDegreeOfParallelism = System.Environment.ProcessorCount

    Dim value As Integer = 0

    Parallel.ForEach(File.ReadLines(args(0)), po, Sub(line)

        If line.Contains(args(1)) Then
            value = 1
        End If

    End Sub)

    Console.WriteLine(value)

End Sub

Do you know if this could be done in Tcl?

Nicolas

Gerald Lester

Sep 17, 2020, 5:40:19 PM
How did you try to read the file in Tcl?


--
+----------------------------------------------------------------------+
| Gerald W. Lester, President, KNG Consulting LLC |
| Email: Gerald...@kng-consulting.net |
+----------------------------------------------------------------------+

Nicolas Robert

Sep 18, 2020, 1:41:48 AM
> How did you try to read the file in Tcl?

like this:
set fp [open $file r]

while {[gets $fp line] != -1} {
    if {[string match $what $line]} {
        # ...
    }
}

close $fp

heinrichmartin

Sep 18, 2020, 4:35:17 AM
On Thursday, September 17, 2020 at 7:45:14 PM UTC+2, Nicolas Robert wrote:
> A few days ago I needed to read a file (> 8gb) , I tried with Tcl but unfortunately a little long time...
> I tried with this class Parallel.ForEach (https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?redirectedfrom=MSDN&view=netcore-3.1) in Vb.net and and this is really much improved.
> It looks like this to those who know :
> Sub Main(args As String())
>
> Dim po As New ParallelOptions
> po.MaxDegreeOfParallelism = System.Environment.ProcessorCount
>
> Dim value As Integer = 0
>
> Parallel.ForEach(File.ReadLines(args(0)), po, Sub(line)
>
> If line.Contains(args(1)) Then
> value = 1
> End If

Are you looking for "grep --files-with-matches --fixed-strings" (grep -lF (small L, capital F))? Someone more familiar with scripting on Windows might know an equivalent command there.
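
From Tcl that could be a one-liner along these lines (a sketch; it assumes a grep executable is on the PATH and uses -q instead of -l, since the original example only needs a 0/1 result):

# exec raises an error when grep exits non-zero (no match),
# so [catch] gives 0 for "not found" and the negation gives 1 for "found"
set value [expr {![catch {exec grep -qF -- $what $file}]}]
puts $value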

Ralf Fassel

Sep 18, 2020, 6:30:58 AM
* Nicolas Robert <nicolasro...@gmail.com>
Well, processing >8GB this way *will* take some time...
Did you put that code into a proc so it gets at least byte-compiled?

And for the example code I would surely add a 'break' after the first
match is found if all that's done is 'set value 1'.
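
A minimal sketch combining both points (wrap the loop in a proc so it gets byte-compiled, and stop at the first match), assuming only a found/not-found result is needed:

proc Scan {file what} {
    set fp [open $file r]
    set value 0
    while {[gets $fp line] != -1} {
        if {[string match $what $line]} {
            set value 1
            break   ;# no need to read the rest of the file
        }
    }
    close $fp
    return $value
}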

HTH
R'

Nicolas Robert

Sep 18, 2020, 6:51:06 AM
Le vendredi 18 septembre 2020 à 12:30:58 UTC+2, Ralf Fassel a écrit :
> * Nicolas Robert <nicolasro...@gmail.com>
> | > How did you try to read the file in Tcl?
> >
> | like this :
> | set fp [open $file r]
> >
> | while {[gets $fp line] != -1} {
> | if {[string match $what $line]} {
> | # ...
> | }
> | }
> >
> | close $fp
> Well, processing >8GB this way *will* take some time...
> Did you put that code into a proc so it gets at least byte-compiled?

Yes, it is in a proc.

> And for the example code I would surely add a 'break' after the first
> match is found if all that's done is 'set value 1'.

You are right, but these are just examples (not the final code).

> Are you looking for "grep --files-with-matches --fixed-strings" (grep -lF (small L, capital F))? Someone more familiar with scripting on Windows might know an equivalent command there.

Didn't think about that...

Maybe my question wasn't clear?
How do I process lines in multiple threads, in parallel?

Nicolas

Gerald Lester

Sep 18, 2020, 8:18:13 AM
Is it reading the file or processing the lines where you are spending
your time?

Are you doing something that could be better optimized in the processing
of the matching lines?

If it is really in processing the contents and you can't do anything
better there, then you want to look at the threads package -- likely a
thread pool.

But first put in some measurements to verify that you are in fact
optimizing the correct thing.
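
For example, something along these lines separates the two costs (a sketch; ProcessLine is a hypothetical stand-in for whatever is done with a matching line):

set fp [open $file r]
puts "read only:      [time {while {[gets $fp line] != -1} {}}]"
close $fp

set fp [open $file r]
puts "read + process: [time {
    while {[gets $fp line] != -1} {
        if {[string match $what $line]} {ProcessLine $line}
    }
}]"
close $fp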

ted brown

Sep 18, 2020, 2:32:19 PM
On 9/18/2020 3:51 AM, Nicolas Robert wrote:

> Maybe my question wasn’t clear?
> How to process lines in multiple threads in parallel ?
>

I would look into getting Ashok's book, "The Tcl Programming Language",
where chapter 22 has some good examples of doing thread pools. I'd
buy the PDF version so you can just copy/paste his examples as a
starting point.

However, as others have said, you should first be sure that threads are
the solution to your performance problem. I would do a few simple things
first, like timing that 8gb file read twice, once with the string match
commented out. If it turns out it's the [gets] that's causing all the
time, I don't think threads would help much. I would also compare that
to some other utility that can read the large file. Is the file just a
large text file?



Arjen Markus

Sep 18, 2020, 4:00:01 PM
It may be that if you read the file in large chunks and split the chunks into lines yourself,
you get better performance. This is merely speculation, but it would mean
less I/O overhead - at the cost of more complex processing.
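
A minimal sketch of that idea (read a block, split it on newlines, and carry the incomplete last line over to the next block; $file and the 64 KB block size are just placeholders):

set fp [open $file r]
set partial ""
while {![eof $fp]} {
    set data $partial[read $fp 65536]
    set lines [split $data \n]
    set partial [lindex $lines end]          ;# "" or an incomplete line
    foreach line [lrange $lines 0 end-1] {
        # process $line here
    }
}
if {$partial ne ""} {
    # process the final line (it had no trailing newline)
}
close $fp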

Regards,

Arjen

ted brown

Sep 18, 2020, 4:13:35 PM
On 9/18/2020 12:59 PM, Arjen Markus wrote:
> It may be that if you read the file in large chunks and split up the chunks in lines yourself,
> you get a better performance. This is merely speculation, but it would mean
> that you less fewer I/O overhead - at the cost of more complex processing.
>

Good idea, using [read]. In fact, depending on what the OP is trying to
do, perhaps those [string match]'s could be done on the large chunks
directly, without first breaking up the chunk into lines. More info from
OP is needed.


heinrichmartin

Sep 18, 2020, 4:29:03 PM
Maybe even match in a separate process (e.g. [open "| grep ..."] or [spawn grep ... ; expect ...]).
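
For the pipe variant, a sketch (assumes grep is available; Tcl reads grep's matching lines from the pipe):

set pipe [open |[list grep -F -- $what $file] r]
while {[gets $pipe line] != -1} {
    # each $line is a line of the file containing $what
}
catch {close $pipe}   ;# grep exits non-zero when nothing matched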

Alexandru

Sep 18, 2020, 6:47:46 PM
I thought I'd give implementing a procedure that reads blocks of lines from large files using "read" a try.
The result is pretty disappointing, though maybe it's possible to write the procedures more efficiently than I did.
I tested my procedure on a large ASCII file of 700 MB: block reading took 7 seconds, line-by-line reading took 9 seconds.



## Read a block of "nchars" from specified channel "fid".
# \param fid channel identifier
# \param nchars number of chars to read from channel
# \param mainblockname variable name that will store the read block
# \param secblockname helper variable name only for internal use
proc FileReadBlockwise {fid nchars mainblockname secblockname} {
    upvar $mainblockname lines1
    upvar $secblockname lines2
    set lines [split [read $fid $nchars] \n]
    lset lines 0 [concat $lines2 [lindex $lines 0]]
    set lines2 [lindex $lines end]
    set lines1 [lrange $lines 0 end-1]]
    return
}

proc ReadLargeFile {path nchars} {
    set data1 ""
    set data2 ""
    set f [open $path r]
    set count 0
    while {1} {
        incr count
        FileReadBlockwise $f $nchars data1 data2
        if {[eof $f]} {
            set data1 [concat $data1 $data2]
            break
        }
    }
    close $f
    puts $count
}

proc ReadSmallFile {path} {
    set f [open $path r]
    while {[gets $f line] >= 0} {
    }
    close $f
}

set path "c:/largefile.txt"
puts [time {ReadLargeFile $path 100000} 1]
puts [time {ReadSmallFile $path} 1]


Uwe Klein

Sep 19, 2020, 3:02:58 PM
this example uses [fileevent]

run with
script.tcl pattern <list of files>

#!/usr/bin/tclsh


set ::HitCount 0
set ::Tags [ list A B C D E F G H I J K L M]

puts stderr $argv

set ::SearchTerm [ lindex $argv 0 ]
set files [ lrange $argv 1 end ]


proc feventhandler {handle tag} {
    if {[eof $handle]} {
        close $handle
    } else {
        set line [gets $handle]

        puts -nonewline stderr $tag
        if {[string first $::SearchTerm $line] >= 0} {
            incr ::HitCount
        }
    }
}

# open all files and activate the eventhandler:

foreach file $files tag $::Tags {
    if { $file == "" } break
    set fd [ open $file RDONLY ]
    fconfigure $fd -buffering line
    fileevent $fd readable [list feventhandler $fd $tag]

    puts stderr "$fd :: $file"
}

# you need the vwait, otherwise tclsh will just exit at once.
# no eventloop active by default.

catch {vwait forever}

puts "HitCount: $::HitCount"
exit
# end

Uwe

ted brown

Sep 19, 2020, 9:37:44 PM
On 9/18/2020 3:47 PM, Alexandru wrote:

> I thought, I'll give a try implementing a procedure that reads blocks of lines from large files using "read".
> The result is pretty disapointing, thogh maybe it's possible to write the procedures more efficient than I did.
> I tested my procedure on a large ASCII file of 700 MB and block reading was 7 seconds, line by line reading was 9 seconds.


I created a 10-million-line file, 68 chars a line + cr/lf -> 700 MB, in a
ramdisk.

I ran your program on my 4 GHz i7 4790K on Windows 10:
4.5 and 7 seconds for the [read] vs. [gets] using 32-bit Tcl.


With just the [read $fid $nchars] (no split etc.), the 4.5 reduces to 2.2.
With just the [read] and the [split], it takes 3.4 seconds.
The 2.2 for just [read] becomes 2.0 using 64-bit Tcl.

# Using some old cygwin unix programs takes: (exec'd from tcl)
# 2.8 secs for a md5sum.exe
# .6 secs for grep.exe -c (the only match is the last line)
# .8 secs wc -l (wordcount just count lines)
# 3.2 secs cp a b
# .4 secs [file copy a b]


So it looks like grep, as Heinrichmartin suggested, is the winner so far.

Nicolas Robert

Sep 21, 2020, 11:59:19 AM
Thank you all for your proposals.
Thanks Alexandru and Uwe for sharing your code; I will draw inspiration from it...

Nicolas

Nicolas Robert

Sep 23, 2020, 1:18:54 PM
Alexandru,
I tried your code but without success... the data is incomplete... but I used it as a starting point to look for different solutions.
[read] split by chunk size + a coroutine seems like a good solution.

Below is how I generate my file:
set fp [open "file" w+]

for {set index 0} {$index < 130000000} {incr index} {
    puts $fp "line > $index [string repeat "x" 44]"
}
close $fp

Reading a file of ~8 GB line by line with a chunk size of 4096, I get a time of ~44 seconds (VB.NET: ~18 seconds).
Can this code be optimized? What do you think?
Could the Thread package be helpful?

Nicolas

# tclsh readlargefile.tcl "file"

proc DataSizeRead {chan size} {
    # Read file by chunk size
    # default : 4096

    yield [info coroutine]

    while 1 {
        set data [read $chan $size]
        if {$data eq ""} {
            break
        }

        yield $data
    }
}

proc ProcessDataLine {data} {
    # process data line...
    foreach line $data {
        if {$line eq ""} continue
        # ...
    }
}

proc ReadFile {file {size 4096}} {

    set dataline ""
    set currentdata ""

    set fp [open $file r]

    coroutine readlargefile DataSizeRead $fp $size

    while {[llength [info commands readlargefile]]} {

        set data [readlargefile]

        # merge currentdata or not...
        if {$dataline ne ""} {
            set currentdata ${dataline}${data}
        } else {
            set currentdata $data
        }

        set lines [split $currentdata "\n"]

        # check if currentdata is complete
        if {[string match "*\n" $currentdata]} {
            set dataline ""
        } else {
            set dataline [lindex $lines end]
            set lines [lrange $lines 0 end-1]
        }

        ProcessDataLine $lines ; # process data lines...
    }

    # last line...
    if {$dataline ne ""} {
        set currentdata $dataline

        if {$currentdata ne ""} {
            set lines [split $currentdata "\n"]

            ProcessDataLine $lines ; # process data lines...
        }
    }

    close $fp
}

set file [lindex $argv 0]

set timestart [clock clicks -milliseconds]
ReadFile $file
puts [expr {[clock clicks -milliseconds] - $timestart}]








Rich

Sep 23, 2020, 2:14:14 PM
Nicolas Robert <nicolasro...@gmail.com> wrote:
> proc ProcessDataLine {data} {
> # process data line...
> foreach line $data {

This will not do what you think it does.

You are passing a string to a command that expects a list. This will
cause Tcl to perform implicit string to list conversion, and the result
will not be what you expect (i.e., $data will not be broken into "lines").

Additionally, depending upon what characters are in the string, the
conversion will sometimes work, and sometimes will abort with an error
(note that this abort with an error usually occurs after three months
of usage, after you've long forgotten much of what you wrote, when just
the right character pattern arrives).

What you really want here is something like this:

set lines [split $data \n]
foreach line [lrange $lines 0 end-1] {
...
}
# do something here with the partial line that is usually left at
# [lindex $lines end] because reading by 4k chunks does not always
# align with "line boundaries".

Nicolas Robert

Sep 23, 2020, 4:21:33 PM

Rich,

Sorry, I don't understand what you wrote.

> You are passing a string to a command that expects a list. This will
> cause Tcl to perform implicit string to list conversion, and the result
> will not be what you expect (i.e., $data will not be broken into "lines").
>

What is the difference between this :

set lines [split $currentdata "\n"]

# check if currentdata is complete
if {[string match "*\n" $currentdata]} {
    set dataline ""
} else {
    set dataline [lindex $lines end]
    set lines [lrange $lines 0 end-1]
}

ProcessDataLine $lines ; # process data lines...

and what you wrote :

> set lines [split $data \n]
> foreach line [lrange $lines 0 end-1] {
> ...
> }

> Additionally, depending upon what characters are in the string, the
> conversion will sometimes work, and sometimes will abort with an error
> (note that this abort with an error usually occurs after three months
> of usage, after you've long forgotten much of what you wrote, when just
> the right character pattern arrives).
>

I agree with you.
I have tried with a file where this type of characters is present and it works

line > 660468 åé,ààbØ<eÛeééééé =?a/Ø*é)§kj&klj´cklj,/e*d
line > 660469 kljbcÛ)/,ééééé/åe>/d<fbfÁa898bfkljÁÁÁ>a/
line > 660470 +é@§kjdklj,898@+f,/.b =/Á§kj&>c#+bcf/àà+
line > 660471 =´åc<Á+kljÛØakljeØ<éØ898c´§kj,bc =?))´@
line > 660472 *,&ààÁ§kj§kjØ898&é>ààÁ.&e<àà? =898@åbå).Û<

Can you show me a sequence of characters that would not be readable?

> # do something here with the partial line that us usually left at
> # [lindex $lines end] because reading by 4k chunks does not always
> # align with "line boundaries".

if {[string match "*\n" $currentdata]} {
...

Is it not enough if I check like this?

Nicolas

Rich

Sep 23, 2020, 5:04:41 PM
Nicolas Robert <nicolasro...@gmail.com> wrote:
>
> Rich ,
>
> Sorry, I don't understand what you wrote.
>
>> You are passing a string to a command that expects a list. This will
>> cause Tcl to perform implicit string to list conversion, and the result
>> will not be what you expect (i.e., $data will not be broken into "lines").
>>
>
> What is the difference between this :
>
> set lines [split $currentdata "\n"]

The fact that I didn't read far enough into your code sample... Sorry
for the confusion.

I misinterpreted it as ultimately passing the "data" which came from
[read] on to the foreach ... $data. You've got the splitting into
lines in the final proc, where you also handle the fact that a block
read does not always align to a line boundary.

> Can you show me a sequence of characters that will be not readable.

No, because it looks like you did the right thing, and I just jumped to
an incorrect conclusion before reading through your entire code example.

>> # do something here with the partial line that us usually left at
>> # [lindex $lines end] because reading by 4k chunks does not always
>> # align with "line boundaries".
>
> if {[string match "*\n" $currentdata]} {
> ...
>
> It's not enough if I check like this ?

No, that works. But the result of doing that will be at least two
scans through the data which has been read in. The first scan to find
the last \n character. Then a second scan to split it up into lines
via the [split] command you do have present.

If you read in the block, and simply split the block on \n, you get a
list of lines with one scan pass over the block (and it is the one scan
pass that is required for your use as well) with either a partial line
or an empty string in the final element of the resulting list.

You can then iterate over list elements 0 to end-1 (which is fast to do
in Tcl 8+ because lists are O(1) access time), which will process all
the whole lines that were in the block, and list element "end" will
either be a partial line or an empty string, both of which could simply
be prepended to the next block (i.e. either via "append partial [read
... 4096]" or as "set block $partial[read ... 4096]"). Or, if you do
not want yet another temporary variable, as: "set block [lindex $data
end][read ... 4096]".

And you may (if you decide to make this change) need to add a check for
the last list element being not an empty string after you drop out of
your main loop if you want to handle a last partial line that is
lacking a terminating newline character.
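
Put together, the loop described above might look like this (a sketch; ProcessDataLine as in the earlier post, with the trailing partial line handled after the loop):

set fp [open $file r]
set partial ""
while {![eof $fp]} {
    set block $partial[read $fp 4096]
    set lines [split $block \n]
    set partial [lindex $lines end]        ;# empty string or a partial line
    ProcessDataLine [lrange $lines 0 end-1]
}
if {$partial ne ""} {
    ProcessDataLine [list $partial]        ;# last line lacked a newline
}
close $fp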

Ralf Fassel

unread,
Sep 24, 2020, 10:01:41 AM9/24/20
to
* Nicolas Robert <nicolasro...@gmail.com>
| For reading a file of ~8 gb line by line by chunk size of 4096 , I
| have a time of ~44 seconds. (vb.net ~18 seconds)
| Can this code be optimized ? what do you think?

Using 'string index' instead of 'string match' for the EOL-check and
simply reusing $dataline instead of checking for empty gives me a
speedup of ~20%. Simply using read without coroutines is slightly faster
than using coroutines.

Data are on SSD. If using a regular HDD, times are roughly doubled.

---------------------------
Your version, unmodified:
% tclsh $TMPDIR/xxx/readlargefile.tcl "file"
51438

---------------------------
Your version, modified:
@@ -36,16 +36,12 @@
set data [readlargefile]

# merge currentdata or not...
- if {$dataline ne ""} {
- set currentdata ${dataline}${data}
- } else {
- set currentdata $data
- }
+ set currentdata ${dataline}${data}

set lines [split $currentdata "\n"]

# check if currentdata is complete
- if {[string match "*\n" $currentdata]} {
+ if {[string index $currentdata end] eq "\n"} {
set dataline ""
} else {
set dataline [lindex $lines end]

% tclsh8.6 readlargefile.tcl "file"
40233
% tclsh8.6 readlargefile.tcl "file"
40309
---------------------------
No coroutine:
@@ -29,23 +13,20 @@

set fp [open $file r]

- coroutine readlargefile DataSizeRead $fp $size
-
- while {[llength [info commands readlargefile]]} {
+ while {1} {

- set data [readlargefile]
+ set data [read $fp $size]
+ if {$data eq ""} {
+ break
+ }

# merge currentdata or not...
- if {$dataline ne ""} {
- set currentdata ${dataline}${data}
- } else {
- set currentdata $data
- }
+ set currentdata ${dataline}${data}

set lines [split $currentdata "\n"]

# check if currentdata is complete
- if {[string match "*\n" $currentdata]} {
+ if {[string index $currentdata end] eq "\n"} {
set dataline ""
} else {
set dataline [lindex $lines end]

% tclsh8.6 readlargefile3.tcl "file"
38605
% tclsh8.6 readlargefile3.tcl "file"
38900
---------------------------

Using 'gets' directly:
while {[gets $fp line] >= 0} {
ProcessSingleDataLine $line
}
% tclsh readlargefile2.tcl "file"
120038

HTH
R'

Nicolas Robert

Sep 25, 2020, 7:14:04 AM
> Using 'string index' instead of 'string match' for the EOL-check and
> simply reusing $dataline instead of checking for empty gives me a
> speedup of ~20%. Simply using read without coroutines is slightly faster
> than using coroutines.
>

Thanks Ralf.
I have a question:
if my chunk size is 1 character or 4096, the time is the same... do you know why?

Nicolas

Ralf Fassel

Sep 25, 2020, 7:35:24 AM
* Nicolas Robert <nicolasro...@gmail.com>
| If my chunk size is 1 character or 4096 the time is the same... do you know why ?

Look again, "this can't happen¹". I get a massive slowdown reading 1 char at
a time, and I would say this is to be expected: repeatedly calling a Tcl
proc (4096 times vs. 1 time) should definitely slow down the whole process.

R'
---
¹ http://www.catb.org/jargon/html/C/can-t-happen.html

Nicolas Robert

Sep 26, 2020, 2:22:54 AM
Forget this question, I'm stupid!

Nicolas

Schelte

Sep 26, 2020, 10:36:07 AM
On Fri, 18 Sep 2020 15:47:43 -0700, Alexandru wrote:
> I thought, I'll give a try implementing a procedure that reads blocks of
> lines from large files using "read".
> The result is pretty disapointing, thogh maybe it's possible to write
> the procedures more efficient than I did.
> I tested my procedure on a large ASCII file of 700 MB and block reading
> was 7 seconds, line by line reading was 9 seconds.
>
Your code for reading the file in blocks is buggy. You use [concat] to
join two strings. But [concat] is defined to join lists. As a result, you
get an extra space in the lines that run across block boundaries.

But worse is your command "set lines1 [lrange $lines 0 end-1]]". There's
one ] too many. This means the list [lrange $lines 0 end-1] is shimmered
to a string to be able to add that ] at the end. That makes the block
reading appear much slower than it should be.

Your block reading code is also very complicated with all the juggling of
partial lines. I would code it as:

proc ReadLargeFile {path nchars} {
    set f [open $path r]
    while {![eof $f]} {
        set data [read $f $nchars]
        append data [gets $f]
        foreach line [split $data \n] {
            # Process the line
        }
    }
    close $f
}

This reads until the first newline after the specified blocksize, making
things a lot simpler.


Schelte.

ted brown

Sep 26, 2020, 2:23:51 PM
Good catch, that does speed it up, but my timing for just the [read]
is still 2/3 of the total time and is ~5x what it takes for grep.exe alone.
Upping the read size by 10x reduces the time for just the [read]
by 1/3, so something must be pretty costly in the [read] code.

Rich

Sep 26, 2020, 2:51:51 PM
ted brown <tedbr...@gmail.com> wrote:
> is ~ 5x what it takes for grep.exe alone

Be careful comparing against grep. Depending upon exactly which grep
version you are using, the grep source has been *highly* optimized for
speed:

https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

The whole article is a good read, but here's the quick summary:

Anyway, just FYI, here's a quick summary of where GNU grep gets its
speed. Hopefully you can carry these ideas over to BSD grep.

#1 trick: GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT
BYTE.

#2 trick: GNU grep is fast because it EXECUTES VERY FEW
INSTRUCTIONS FOR EACH BYTE that it *does* look at.

You'll have a *very* hard time approaching grep's performance using Tcl
code alone, because just the overhead of being a scripting language is
going to start you out a fair amount behind grep performance wise
before you code anything in Tcl.

ted brown

Sep 26, 2020, 3:21:15 PM
I've even done a single complete 700mb [read] using 64bit tcl and it's
still slower than grep (and actually a bit slower than [read] with say,
50k blocks). I would think that while grep might be able to avoid
"looking" at every byte, it would still need to "read" in every byte.

I found that a [file copy a b] is faster than the grep, and yet [read]
with nothing else should be faster still, I would think, unless [read]
has to do something with memory or whatever that takes a lot of time.
And in my test, I didn't even store the result of the [read] and made
sure to not leave it for the proc return (did a return "").

A few calls to [read] should be mostly some C code, so the scripting
language shouldn't be an issue, except for what it needs to do to store
the memory somewhere.


Rich

Sep 26, 2020, 3:53:31 PM
ted brown <tedbr...@gmail.com> wrote:
> On 9/26/2020 11:51 AM, Rich wrote:
>> ted brown <tedbr...@gmail.com> wrote:
>>> is ~ 5x what it takes for grep.exe alone
>>
>> Be careful comparing against grep. Depending upon exactly which grep
>> version you are using, the grep source has been *highly* optimized for
>> speed:
>>
>> https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
>>
>> The whole article is a good read, but here's the quick summary:
>>
>> Anyway, just FYI, here's a quick summary of where GNU grep gets its
>> speed. Hopefully you can carry these ideas over to BSD grep.
>>
>> #1 trick: GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT
>> BYTE.
>>
>> #2 trick: GNU grep is fast because it EXECUTES VERY FEW
>> INSTRUCTIONS FOR EACH BYTE that it *does* look at.
>>
>> You'll have a *very* hard time approaching grep's performance using Tcl
>> code alone, because just the overhead of being a scripting language is
>> going to start you out a fair amount behind grep performance wise
>> before you code anything in Tcl.
>>
> I've even done a single complete 700mb [read] using 64bit tcl and it's
> still slower than grep (and actually a bit slower than [read] with say,
> 50k blocks). I would think that while grep might be able to avoid
> "looking" at every byte, it would still need to "read" in every byte.

Nope, if you read the article, you also find that grep is mmaping the
file, which means it does not actually read every byte in from the file
either.

> I found that a [file copy a b] is faster than the grep, and yet [read]
> with nothing else should be faster still, I would think, unless [read]
> has to do something with memory or whatever that takes a lot of time.
> And in my test, I didn't even store the result of the [read] and made
> sure to not leave it for the proc return (did a return "").

Unless the optimizer has gotten really smart, a read to nowhere (not
stored in a variable) still has to create a Tcl_Obj, which then simply
gets deallocated when the proc returns.

Plus, I'd have to check the sources, but read, without a byte count,
might be reallocating the buffer space as it reads more data, which
will often incur a memory copy plus the allocation overhead.

> A few calls to [read] should be mostly some C code, so the scripting
> language shouldn't be an issue, except for what it needs to do to store
> the memory somewhere.

Yes, but Tcl's "memory" system is a whole lot more complicated than a
basic C read loop. And if read does reallocations and memory copies
for a call without an explicit size, that's likely the cause of a
portion of the difference in performance.

If you are reading from a file (and have enough ram to hold the entire
file) then try this for a read performance test:

set fd [open bigfile {RDONLY}]
set data [read $fd [file size bigfile]]

And see how it performs.

ted brown

Sep 26, 2020, 5:06:34 PM
On 9/26/2020 12:53 PM, Rich wrote:
>
> Nope, if you read the article, you also find that grep is mmaping the
> file, which means it does not actually read every byte in from the file
> either.

The trick they use to avoid scanning every byte is to be able to skip,
but if your pattern is "foobar" how can you skip over more than 6-7
bytes without possibly missing a match. So, you'll end up touching every
page (just not every byte on every page) and so it would need to be
read-faulted in even if done via mmap.

Their use of mmap is mostly so they can do unbuffered input, i.e. the
kernel can read the data right into the user's memory rather than into a
kernel buffer and then copy to the user memory.

My use of grep was to sorta get a minimum time for just reading through
a file. On my system, I use a ramdisk and I don't think it does file
caching for the ramdisk, but I'm not positive.

I then read that same file from the ramdisk using [read] and that's
where I got my timings from which are more than 5x longer.


>

>
> Unless the optimizer has gotten really smart, a read to nowhere (not
> stored in a variable) still has to create a Tcl_Obj, which then simply
> gets deallocated when the proc returns.

My thinking is also that the cost to do the [read] is mostly somewhere
in the tcl object code plus memory allocation.


> Yes, but Tcl's "memory" system is a whole lot more complicated than a
> basic C read loop. And if read does reallocations and memory copies
> for a call without an explicit size, that's likely the cause of a
> portion of the difference in performance.

So it probably isn't something in [read] itself, but the memory
operations it needs to do by calling something else.

>
> If you are reading from a file (and have enough ram to hold the entire
> file) then try this for a read performance test:
>
> set fd [open bigfile {RDONLY}]
> set data [read $fd [file size bigfile]]

Same as my previous tests, however, it needs 64 bit tcl to work. Took
2.5 secs for a 700 mb file.

Schelte

Sep 26, 2020, 6:36:05 PM
On Sat, 26 Sep 2020 12:21:09 -0700, ted brown wrote:
> I found that a [file copy a b] is faster than the grep, and yet [read]
> with nothing else should be faster still, I would think, unless [read]
> has to do something with memory or whatever that takes a lot of time.
> And in my test, I didn't even store the result of the [read] and made
> sure to not leave it for the proc return (did a return "").

In the test procs, translation is not specifically set, so it is left at
the default of "auto". That means that [read] needs to look for \r, \n,
and \r\n and turn that into \n. I suspect that setting translation to
binary (or opening the files as rb) will show another speed increase. But
it's too late in the day for me to test how significant that increase
is.
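
A quick way to check that difference (a sketch; the absolute numbers will of course depend on the machine and the file):

set f [open $path r]    ;# default -translation auto: read scans for \r, \n, \r\n
puts "text:   [time {read $f} 1]"
close $f

set f [open $path rb]   ;# binary: no EOL translation, no encoding conversion
puts "binary: [time {read $f} 1]"
close $f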


Schelte.

ted brown

Sep 26, 2020, 9:08:24 PM
Wow, now that's interesting. I never knew that about [read]. It might
also explain why I can't read that entire 700mb file in one [read] using
32 bit tcl. It gets an error and a dialog pops up saying,

----------------
Fatal Error

unable to realloc 529,126,623 bytes
----------------
(my commas)

However, doing,

set fd [open $path {RDONLY BINARY}]

it still gets that error under 32 bit tcl.

However, under 64 bit tcl, it works and drops from the ~ 2.7 secs time
to only 1.1 so that's a nice improvement and getting closer to the
theoretical minimum time.

And also, doing a [string length $data] in the 2 cases returns a
difference exactly the number of lines in the file, so it did remove all
the cr's when set to binary.


ted brown

Sep 26, 2020, 9:13:54 PM
On 9/26/2020 6:08 PM, ted brown wrote:

> And also, doing a [string length $data] in the 2 cases returns a
> difference exactly the number of lines in the file, so it did remove all
> the cr's when set to binary.
>
>

Think I stated that backwards. With binary, it should keep the cr's.

Christian Gollwitzer

Sep 27, 2020, 1:10:46 AM
Am 27.09.20 um 03:13 schrieb ted brown:
It's not only the \r\n expansion. If you read the file as binary, then
Tcl also performs conversion to Unicode, depending upon the encoding of
the file. The binary flag to read is equivalent to

fconfigure $fd -encoding binary -translation binary

So this, of course, will change what your program does, not only the speed.

Christian

Christian Gollwitzer

Sep 27, 2020, 3:28:07 AM
Am 27.09.20 um 07:10 schrieb Christian Gollwitzer:
> If you read the file as binary, then
> Tcl also performs conversion to Unicode,

of course that should have been "if you read the file as text, ..." or
"if you don't read it as binary"

Christian

Uwe Klein

Sep 27, 2020, 4:18:21 AM
Am 26.09.2020 um 21:53 schrieb Rich:
> Nope, if you read the article, you also find that grep is mmaping the
> file, which means it does not actually read every byte in from the file
> either.

That IMHO is a bit of a misconception.

mmap creates a mapping of a file's disk space to memory.
(If you want to look at all the bytes, in the end all those bytes have been
retrieved from disk.)

Accessing that memory creates a "fill from file" fault for that block.
What is moved into system activity, and loses quite a bit of overhead, is
the work of moving bytes from disk into structured (mmap: flat) memory
storage.

I seem to remember that there were some mmap packages around for tcl.

Uwe


Uwe Klein

Sep 27, 2020, 4:26:13 AM
Am 27.09.2020 um 03:08 schrieb ted brown:
> On 9/26/2020 3:35 PM, Schelte wrote:
>> On Sat, 26 Sep 2020 12:21:09 -0700, ted brown wrote:
>>> I found that a [file copy a b] is faster than the grep, and yet [read]
>>> with nothing else should be faster still, I would think, unless [read]
>>> has to do something with memory or whatever that takes a lot of time.
>>> And in my test, I didn't even store the result of the [read] and made
>>> sure to not leave it for the proc return (did a return "").
>>
>> In the test procs, translation is not specifically set, so it is left at
>> the default of "auto". That means that [read] needs to look for \r, \n,
>> and \r\n and turn that into \n. I suspect that setting translation to
>> binary (or opening the files as rb) will show another speed increase. But
>> it's too late in the day for me to test how significantly that increase
>> is.
>>
>>
>> Schelte.
>>
>
> Wow, now that's interesting. I never knew that about [read]. It might
> also explain why I can't read that entire 700mb file in one [read] using
> 32 bit tcl. It gets an error and a dialog pops up saying,
>
> ----------------
> Fatal Error
>
> unable to realloc 529,126,623 bytes
> ----------------
> (my commas)
>
Hmm that looks like allocatable memory is limited.

you read into a tcl variable.
that variable ( content) grows, and grows, ...
until the initially allocated memory is depleted.

next step realloc for a bigger block. rinse, repeat.

your error happens when this realloc does not find
a suitable "onepiece" block of memory.

IMU tcl does not use "chunking" of variable content.

Uwe


ted brown

Sep 27, 2020, 4:59:14 AM
On 9/27/2020 1:26 AM, Uwe Klein wrote:

> you read into a tcl variable.
> that variable ( content) grows, and grows, ...
> until the initially allocated memory is depleted.
>
> next step realloc for a bigger block. rinse, repeat.
>
> your error happens when this realloc does not find
> a suitable "onepiece" block of memory.
>

The test was with a single,

set data [read $fd [file size $path]]

So, data is only getting set once. How [read] might be breaking up
things internally to provide the return value of the whole file, I don't
know. Maybe it's the one that is needing to grow a bigger buffer.

But it's not the translation and encoding, now that it's all binary.




Christian wrote:

> fconfigure $fd -encoding binary -translation binary

It would appear we get that automatically with [open] and {RDONLY BINARY}.



I went back and tested Schelte's method of mixing [read] and [gets] with
an [append], and there's no detectable difference from [read] alone.

The sweet spot for block size was 40k-200k for my 700mb file, which then
takes only .6 secs or half the time of reading the whole file in a
single slurp.

But just splitting that 700 MB brings us back up to 2.4 seconds,
and we haven't even searched the input yet.

Looking at that paper on grep tricks linked to by Rich, a couple of its
ideas might be useful.

1. They don't look for the newlines in a block, which for us would mean
don't do the split.
2. Only if there's a match in a block, go back and look for the line
boundaries.
3. Use Boyer–Moore algorithm for speed, we can do [string first].

So, I get 2.5s on my file with [string first] which has only 1 match in
the last line. If all you want to know is a count of matches (don't need
line numbers etc.) then the whole job is 2.5s vs .6 for grep --count.
Not too bad.
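
A sketch of that block-scanning approach (hypothetical helper; it counts occurrences with [string first] on raw blocks, keeping a small tail so matches spanning block boundaries are not lost):

proc CountMatches {path what {size 200000}} {
    set fp [open $path rb]
    set overlap [expr {[string length $what] - 1}]
    set count 0
    set prev ""
    while {![eof $fp]} {
        set block $prev[read $fp $size]
        set idx 0
        while {[set idx [string first $what $block $idx]] >= 0} {
            incr count
            incr idx [string length $what]
        }
        # keep the last len-1 chars; a whole match can never fit in them,
        # so nothing is counted twice
        set prev [string range $block [expr {[string length $block] - $overlap}] end]
    }
    close $fp
    return $count
}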

ted brown

Sep 27, 2020, 5:15:12 AM
On 9/27/2020 1:59 AM, ted brown wrote:

> I went back and tested Schelte's method of mixing [read] and [gets] with
> an [append], and there's no  detectable difference from [read] alone.
>

No "timing" difference that is, so the mixing and appending cost next to
nothing and is a much easier method to code. (I never would have thought
you could mix them, great tip).


Uwe Klein

Sep 27, 2020, 7:01:21 AM
Am 27.09.2020 um 10:59 schrieb ted brown:
> On 9/27/2020 1:26 AM, Uwe Klein wrote:
>
>> you read into a tcl variable.
>> that variable ( content) grows, and grows, ...
>> until the initially allocated memory is depleted.
>>
>> next step realloc for a bigger block. rinse, repeat.
>>
>> your error happens when this realloc does not find
>> a suitable "onepiece" block of memory.
>>
>
> The test was with a single,
>
> set data [read $fd [file size $path]]
>
> So, data is only getting set once. How [read] might be breaking up
> things internally to provide the return value of the whole file, I don't
> know. Maybe it's the one that is needing to grow a bigger buffer.

file size is not known.

tcl allocates a "reasonable" block to store read data.
while busily transferring data
tcl notices that this block is now full and needs
to be reallocated to a larger block.

This repeats until EOF is seen and the final size
is apparent.

Uwe


ted brown

Sep 27, 2020, 12:14:32 PM
That's interesting, I bet it explains why it's slower to [read] the
whole file.

When trying larger [read] sizes, it gradually sped up until it got to
about 200k bytes, and then started slowing down. I wonder if that
"reasonable" value is fixed, and what it is. Keeping under that should
avoid any reallocations.

And I guess that means it has to both allocate a bigger buffer and then
copy the current buffer over to it so it's all contiguous. I should
look at the code for [read].



Rich

Sep 27, 2020, 12:46:18 PM
Uwe Klein <u...@klein-habertwedt.de> wrote:
> Am 26.09.2020 um 21:53 schrieb Rich:
>> Nope, if you read the article, you also find that grep is mmaping the
>> file, which means it does not actually read every byte in from the file
>> either.
>
> That IMHO is a bit of a misconception.

The point of the referenced article is that grep is as fast as it is by
*not* examining every byte in the file. I.e., it uses a search
algorithm that allows it to skip over bytes while still finding what it
is looking for.

If the search it is performing is such that it can skip over a page of
bytes without having to look at them, then that page is never read into
memory from disk.

Now, yes, in most uses, it is not going to be able to skip over enough
bytes to miss touching a page somewhere, so in most usage, the file
will be paged in. But, mmap is often faster than read in many
instances because it avoids reading into the kernel disk cache, then
copying the memory from kernel disk cache to a stdlib buffer, then
further copying the data from the stdlib buffer into a buffer in the
grep process. The kernel disk cache memory page simply "appears"
within the memory of the process, with no copying (other than the DMA
from disk into the disk cache page).

And even if one uses libc's raw low-level read that avoids stdlib
buffers, there is still the copy from kernel memory to process memory
that mmap avoids.

> mmap creates a mapping of hd space of a file to memory.
> ( If you want to look at all bytes in the end all those bytes have been
> retrieved from diskspace.)

But that is just what the grep article I linked to said, grep does not
actually look at all the bytes. It skips over (i.e., does not look at)
a large number of them.

> I seem to remember that there were some mmap packages around for tcl.

There was, I'm not sure of its maintenance status.

Uwe Klein

Sep 27, 2020, 12:54:14 PM
Am 27.09.2020 um 18:14 schrieb ted brown:
> And I guess that means it has to both allocate a bigger buffer and then
> copy the current buffer over to it so it's all contiguous. I should
> look at the code for [read].
>
>
>

Use gdb and break on realloc. (And try to catch the initial allocation;
this may be buried in Tcl. I seem to remember that Tcl layers something
of its own over allocation from the system.)

See what happens.

Uwe


Christian Gollwitzer

Sep 27, 2020, 1:37:11 PM
Am 27.09.20 um 18:54 schrieb Uwe Klein:
> Use gdb and break on realloc.( and try to catch the initial allocaction.
> this may be burried in tcl. I seem to remember that tcl layers something
> on its own over allocation from the system. )

If you recompile Tcl with -DPURIFY, it uses the standard allocation
functions (calloc, malloc, realloc) - this is a debug feature, because
otherwise memory debug programs like valgrind won't work. It reduces the
speed by roughly 2x.

Christian

Nicolas Robert

Oct 3, 2020, 6:43:13 AM
I tried to use the Thread package as suggested. tpool::post seems good.
But I see that the time has increased (x3). My feeling is that my code is not being processed in parallel at all.
Maybe I'm not going in the right direction?

Below is my code:

proc DataSizeRead {chan size} {

    yield [info coroutine]

    while 1 {
        set data [read $chan $size]
        if {$data eq ""} {
            break
        }

        yield $data
    }
}

proc ReadFile {file {size 8000}} {

    set dataline ""
    set currentdata ""
    set tpoollist {}

    package req Thread

    set tp [tpool::create -maxworkers 8 -initcmd {
        proc Processline {data} {
            # process data line...
            foreach line $data {
                if {$line eq ""} continue
                #...
            }
        }
    }]

    set fp [open $file r]

    coroutine readlargefile DataSizeRead $fp $size

    while {[llength [info commands readlargefile]]} {

        set data [readlargefile]

        # merge currentdata...
        set currentdata ${dataline}${data}

        set lines [split $currentdata "\n"]

        # check if currentdata is complete
        if {[string index $currentdata end] eq "\n"} {
            set dataline ""
        } else {
            set dataline [lindex $lines end]
            set lines [lrange $lines 0 end-1]
        }

        tpool::post -detached -nowait $tp [list Processline $lines]
    }

    # last line...
    if {$dataline ne ""} {
        set lines [split $dataline "\n"]
        tpool::post -detached -nowait $tp [list Processline $lines]
    }

    close $fp

    tpool::release $tp
}

set size 10000000
set file [lindex $argv 0]

set TIME_start [clock clicks -milliseconds]
ReadFile $file $size
puts [expr {[clock clicks -milliseconds] - $TIME_start}]

Nicolas



stefan

Oct 7, 2020, 5:34:32 AM
Hi Nicolas!

> Do you know if this could do in TCL ?

I am not fully aware how Parallel.ForEach is implemented in .NET exactly, but given that the threading model etc. is fundamentally different to Tcl's, there won't be an equivalent. Besides, you never posted the time measurement of your .NET implementation, for comparison? What kind of improvement did you measure when compared to Tcl's straightforward variant using while/gets?

That said, I was curious, and came up with the below implementation on top of Thread's tpool:

Bottom line is, for a file size of ~700MB [*] and the workload of counting lines, I see on my MBP 15" 2019 times of around ~2s. The more work (time) is actually done (spent) when processing each chunk, the more beneficial this design will become. But, as always, everyone should run their own measurements.

The main ideas underlying my implementation are:

o Hand out the [read] step also to the worker threads, as chunks, rather than blocking the scheduling thread itself.
o Worker threads follow a leader/follower pattern so that the one file channel can be safely passed around. To pass between a leader and a follower (via the scheduler), [chan pipe]s are used.
o In a second step, once passed on the file channel to a follower, the leader continues by processing the respective chunk.

-------------------%<-------------------
package require Thread

namespace eval ::par {
proc schedule {tp fh chunksize} {

variable jobs
variable done

lassign [chan pipe] pr pw

thread::detach $pw

set handler [list {pr job fh tp chunksize} {
variable jobs
variable done
set result [chan read $pr]
### 1: EOF signals that a job has finished reading a
### chunk, and released the file channel
if {[chan eof $pr]} {
chan close $pr
thread::attach $fh
### 2: Has the file been completely consumed?
if {![chan eof $fh]} {
## 2a: NO, schedule the next job.
thread::detach $fh
lappend jobs [schedule $tp $fh $chunksize]
} else {
## 2b: YES, signal the overall job schedule is complete.
set done 1
}
} else {
# puts $result
}
} [namespace current]]

chan configure $pr -blocking 0

set job [tpool::post -nowait $tp [list apply {{fh pw chunksize} {
### 1: Read the chunk
thread::attach $fh
thread::attach $pw
set data [chan read $fh $chunksize]
append data [chan gets $fh]
# set offset [chan tell $fh]
thread::detach $fh
### 2: Release the file channel to a follower job, to be scheduled next
chan close $pw
### 3: Process the chunk, the more time is spent here, the more any par pays off.
llength [split $data \n]
}} $fh $pw $chunksize]]

chan event $pr readable [list apply $handler $pr $job $fh $tp $chunksize]
return $job
}

proc process {file workers {chunks ""}} {

variable jobs
variable done

if {$chunks eq ""} {
set chunks $workers
}

set tp [tpool::create -minworkers $workers -maxworkers $workers]
set fsize [file size $file]

set chunksize [expr {int(ceil($fsize/$chunks))}]

set fh [open $file r]
thread::detach $fh

lappend jobs [schedule $tp $fh $chunksize]

### 1: Join point: Have all jobs been scheduled?
vwait [namespace current]::done

set results [list]
### 2: Join point: Have all jobs been completed?
while {[llength $jobs]} {
set ready [tpool::wait $tp $jobs jobs]
foreach r $ready {
lappend results [tpool::get $tp $r]
}
}
catch {chan close $fh}
unset -nocomplain jobs
unset -nocomplain done
return [tcl::mathop::+ {*}$results]
}
}

::par::process "data.txt" 12
-------------------%<-------------------

HTH, Stefan

[*] Created by executing:
base64 -b 68 /dev/urandom | head -c 734003200 > data.txt

stefan

Oct 7, 2020, 5:37:27 AM
> My feeling is that my code is not being processed in parallel at all

This feeling is right: You would have to enter the Tcl event loop at some point, e.g., using tpool::wait, otherwise your program just exits without waiting for the jobs to complete. See my implementation.
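
In outline, the join point could look like this (a sketch; $chunks stands for whatever list of line blocks is produced, and the jobs must be posted without -detached so their results can be collected):

set jobs {}
set results {}
foreach chunk $chunks {
    lappend jobs [tpool::post -nowait $tp [list Processline $chunk]]
}

# wait until every posted job has completed (tpool::wait services the event loop)
while {[llength $jobs]} {
    set ready [tpool::wait $tp $jobs jobs]
    foreach job $ready {
        lappend results [tpool::get $tp $job]
    }
}
tpool::release $tp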

Stefan

stefan

Oct 7, 2020, 5:44:39 AM
> -------------------%<-------------------
> package require Thread

Damn it, this ended up being posted unreadable. I created a gist:

https://gist.github.com/mrcalvin/aaf8af0c208b1da5204d525b5763871c

Stefan




dave bruchie

Oct 8, 2020, 3:13:35 AM
Another example.

The example (proc test at bottom) counts lines from over 1Gb random data in about 1.4 sec with 8 child threads (see example) on a 4 core hyperthreaded processor, about 7 seconds using a single child thread. Things are a lot slower if the file is not binary (note the [fconfigure $fp -translation binary] in the example test proc at the bottom).

# used to get the event loop running
# could also use thread::wait, vwait ...
package require Tk
package require Thread

proc mkFile {file size} {
# generate a file of random bytes
set fp [open $file w]
fconfigure $fp -translation binary
for {set i 0} {$i < $size} {incr i 4} {
puts -nonewline $fp [binary format I [expr {0xFFFFFFFF&int(2**32*rand())}]]
}
close $fp
}


# the main routine controlling the threads
proc procBlocksOfLines {fp procFunc accResultsFunc reportScript size {numThreads 4}} {
# set up threads and process a file as blocks of complete lines
# the trailing end-of-line is removed from the last line of each block

# setup internal state
variable stuff
# file handle
set stuff(fp) $fp
# minimum block size (char)
set stuff(size) $size
# func to accumulate results in calling thread
set stuff(accResults) $accResultsFunc
# script to generate report in calling thread
set stuff(reportResults) $reportScript
# setup threads
for {set i 0} {$i < $numThreads} {incr i} {
# get a list of threads ready to process blocks
lappend stuff(ready) [thread::create [format {
# function to process a block of lines using procFunc (runs in child threads)
proc procBlockOfLines {cid fp size} {
# read a block, append more data to the next end of line
set data [read $fp $size][gets $fp]
# return fp to controller
thread::transfer $cid $fp
thread::send -async $cid [list doneWithFP]
# process block and return results to calling thread
thread::send -async $cid [list blockResults [thread::id] [apply {%s} $data]]}
thread::wait
} $procFunc]]
}
# used to detect when all blocks are processed
set stuff(numActiveThreads) $i
# calling thread starts with the file pointer
set stuff(fpInUse) 0
# process first block
sendBlock
}


proc doneWithFP {} {
# notifies calling thread a child thread has returned the fp
variable stuff
set stuff(fpInUse) 0
if {0 < [llength $stuff(ready)]} {
# at least one child thread is ready, and fp is unused
sendBlock
}
}


proc blockResults {tid data} {
# processes results from a child thread using supplied accResultsFunc
# also generates report when processing is complete
variable stuff
# start processing next block
lappend stuff(ready) $tid
if {!$stuff(fpInUse)} {
# fp is not in use, and a child thread is available
# also updates number of active child threads (to detect processing is complete)
sendBlock
}
# accumulate results
apply $stuff(accResults) $data
# if processing is complete
if {0 == $stuff(numActiveThreads)} {
# generate report using reportScript
eval $stuff(reportResults)
}
}


proc sendBlock {} {
# send a block to a child thread
# assumes at least one child thread is ready, and the fp is not in use by a child thread
variable stuff

if {"" ne $stuff(fp)} {
if {![eof $stuff(fp)]} {
# unprocessed data remains
# send fp to a thread to read a block
set stuff(fpInUse) 1
set tid [lpop stuff(ready)]
thread::transfer $tid $stuff(fp)
thread::send -async $tid "procBlockOfLines [thread::id] $stuff(fp) $stuff(size)"
return
} else {
# done with file
close $stuff(fp)
# mark file as closed
set stuff(fp) ""
# fall through to clean up ready threads
}
}
# file is closed
# release any ready threads
foreach id $stuff(ready) {
thread::release $id
# when no threads are active, processing is complete
incr stuff(numActiveThreads) -1
}
set stuff(ready) [list]
}



proc test {} {
# array for report data
variable results
set results(total) 0
# save start time for report
set results(startTime) [clock milliseconds]
# setup file to process (over 1 Gb of random bytes in my case)
set fp [open tmp.txt r]
# pick binary to minimize processing in [read] and [gets]
fconfigure $fp -translation binary
# process blocks of characters containing complete lines given:
# channel (fp)
# apply style func to count lines, using one string input, returns data for following func
# apply style func to accumulate total in results(total) from above block processing func
# script to report total lines and elapsed time (using data in results array)
# string block minimum size
# number of threads (default 4)
procBlocksOfLines $fp \
{{data} {
llength [split $data \n]
}} \
{{num} {
variable results
incr results(total) $num
}} {
variable results
puts "total: $results(total) in [expr {[clock milliseconds]-$results(startTime)}] ms"
} \
1000000 \
8
}

Nicolas Robert

Oct 8, 2020, 1:07:14 PM
Dave, Stefan: thank you, and especially for the comments in your code (always useful!!)

Dave,
Not tested yet (small problem: I don't have version 8.7 and you use the lpop command; to test it I found examples on the Wiki for this command in 8.6).

Stefan,

Bingo !!
time : exec ParallelForEach.exe "largefile.txt" > ~ 26 seconds (file = ~ 8gb)
time : ::par::process "largefile.txt" 12 > ~ 29 seconds (file = ~ 8gb)
time : tclsh.exe readlargefile.tcl "largefile.txt" (Ralf fassel modification of my coroutine) > ~ 45 seconds (file = ~ 8gb)

In Stefan's code... (Dave, yours is a little more complicated to understand; I will do a little reading before going to bed... but I think the philosophy is identical to Stefan's, maybe...)

If I understand... :

** You start a job with :
:: Read ; gets ; and split $data
** You start a second job with :
:: Read ; gets ; and split $data
** You start a third job with :
:: Read ; gets ; and split $data
etc... etc...
until end of file

And You wait until the "jobs" is finished

In .NET it's different (this is my understanding of the function from what I have read):

** start loop
:: Read a line
:: Create a task to process this line ' no wait here, the taskscheduler will start the task as soon as a thread is available
** end loop until the end of the file

Finally, in both examples the Tcl [read] command is the big part of the job; I feel that .NET is more transparent with its read function (it may be doing the same thing, I have not read the source code).

Nicolas

dave bruchie

Oct 9, 2020, 1:38:52 PM
In mine:

The processing threads get a message with the file pointer. They read a block of lines. Send a doneWithFP message back to the controlling thread. Process the lines, then send a blockResults message back to the calling thread, and wait for the next message.

The controlling thread builds the processing threads and puts them in a ready list, sends the file pointer to the last thread in the ready list, and waits for doneWithFP or blockResults messages.

doneWithFP checks that there is at least one ready thread, if so it calls sendBlock. Finally it waits for more messages.

blockResults adds the processing thread to the thread ready list. If the file pointer is not in use it calls sendBlock to start processing another block. Then it saves the new results. After that it checks if all processing is done (no active threads), if so the final report is generated. Finally it returns to wait for more messages.

SendBlock assumes a thread is ready, and the file pointer is not in use (checked by callers). If the file pointer is not at end of file, it removes a thread from the ready list, sends it the file pointer, and waits for the next message. If the file pointer is at end of file, it is closed, and marked empty (""). Once the file is closed (or marked empty) any threads still in the ready list are released, and the active threads count is updated. Finally it returns to the caller.

Looking back at your original post, what you want is something like tcllib's fileutil::foreachLine, but using multiple threads.
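
For reference, the single-threaded tcllib version of that idea looks roughly like this (a sketch; $what is the string being searched for):

package require fileutil

set count 0
::fileutil::foreachLine line "largefile.txt" {
    if {[string first $what $line] >= 0} {incr count}
}
puts $count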

Dave B

aotto1968

Oct 12, 2020, 4:05:01 PM
On 17.09.20 19:45, Nicolas Robert wrote:
> A few days ago I needed to read a file (> 8gb) , I tried with Tcl but unfortunately a little long time...
> I tried with this class Parallel.ForEach (https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?redirectedfrom=MSDN&view=netcore-3.1) in Vb.net and and this is really much improved.
> It looks like this to those who know :
> Sub Main(args As String())
>
> Dim po As New ParallelOptions
> po.MaxDegreeOfParallelism = System.Environment.ProcessorCount
>
> Dim value As Integer = 0
>
> Parallel.ForEach(File.ReadLines(args(0)), po, Sub(line)
>
> If line.Contains(args(1)) Then
> value = 1
> End If
>
> End Sub)
>
> Console.WriteLine(value)
>
> End Sub
>
> Do you know if this could do in TCL ?
>
> Nicolas
>

Hi,

For testing it is good to have a proper test environment:

> The task is to find a needleString in a LARGE FILE

1. create a 4GB large file with random data on LINUX
This is used as TEMPLATE for the NEXT step.

> head -c 4G </dev/urandom >test.data

2. add a "needleString" at RANDOM positions
Create a NEW file with TEMPLATE and additional Xnum needleString

> bash ./.populate.tcl test.data 255 "hello_world"
> 255 x "hello_world" at random positions

.populate.tcl:
START ============================================================

if {$argc != 3} {
puts stderr "usage: $argv0 filename number string"
exit 1
}

proc myputs {str} {
puts -nonewline stdout $str
puts -nonewline $::logFH $str
}

foreach {fn nu st} $argv break
set sz [file size $fn]
set ssz [string length $st]
set logFH [open $fn.log w]

# create SORTED list of "nu" x random numbers between "0" and "sz"
set poL [list]
for {set i 0} {$i < $nu} {incr i} {
lappend poL [expr {round(rand() * $sz)}]
}
set poL [lsort -integer $poL]

myputs "POSITIONS --- ( num=[llength $poL] ) --------\n"
set idx 1
foreach p $poL {
myputs [format {%-10s, } $p]
if {($idx % 8) == 0} { myputs "\n" }
incr idx
}
myputs "\n"
myputs "^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"

# create difference list of SORTED list
set poD [list]
set poC 0
foreach pos $poL {
lappend poD [expr {$pos-$poC}]
set poC $pos
}

# write "nu" x "st" into "fn" at random position
if {![file exists $fn.new]} {file copy $fn $fn.new}
set fh [open $fn.new w]
foreach pos $poD {
seek $fh $pos current
puts -nonewline $fh $st
seek $fh -$ssz current
}
close $fh

END =========================================================

3. use a SINGLE-PROC tcl-script as benchmark example

> bash ./.grep.tcl test.data.new hello_world

.grep.tcl
START ============================================================

if {$argc != 2} {
puts stderr "usage: $argv0 filename string"
exit 1
}
foreach {fn st} $argv break

set sz [file size $fn]
set bk [expr {1024 * 1024 * 1024}]

set fh [open $fn rb]
set pos 0
set poL [list]
set ovZ [expr {[string length $st] - 1}]; # overlap of blocks

while {true} {
set dt [read $fh [expr {$bk + $ovZ}]]
foreach r [regexp -all -inline -indices "$st" $dt] {
foreach {p w} $r break
lappend poL [expr {$pos + $p}]
}
if {[eof $fh]} break
seek $fh -$ovZ current
incr pos $bk
}
close $fh

puts "POSITIONS --- ( num=[llength $poL] ) -------------------"
set idx 1
foreach p $poL {
puts -nonewline [format {%-10s, } $p]
if {($idx % 8) == 0} { puts "" }
incr idx
}
puts ""
puts "^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^"

END =========================================================
