On 17.09.20 19:45, Nicolas Robert wrote:
> A few days ago I needed to read a file (> 8 GB). I tried with Tcl, but unfortunately it took a long time...
> I tried the Parallel.ForEach method (https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreach?redirectedfrom=MSDN&view=netcore-3.1) in VB.NET, and it was much faster.
> For those who know it, it looks like this:
> Sub Main(args As String())
>
> Dim po As New ParallelOptions
> po.MaxDegreeOfParallelism = System.Environment.ProcessorCount
>
> Dim value As Integer = 0
>
> Parallel.ForEach(File.ReadLines(args(0)), po, Sub(line)
>
> If line.Contains(args(1)) Then
> value = 1
> End If
>
> End Sub)
>
> Console.WriteLine(value)
>
> End Sub
>
> Do you know if this could be done in Tcl?
>
> Nicolas
>
Hi,
For testing it is good to have a proper test environment:
> The task is to find a needleString in a LARGE FILE
1. create a 4GB large file with random data on LINUX
This is used as TEMPLATE for the NEXT step.
> head -c 4G </dev/urandom >test.data
2. insert a "needleString" at RANDOM positions
Create a NEW file (test.data.new) from the TEMPLATE and overwrite N random positions with the needleString
> tclsh ./.populate.tcl test.data 255 "hello_world"
> 255 x "hello_world" at random positions
.populate.tcl:
START ============================================================
if {$argc != 3} {
puts stderr "usage: $argv0 filename number string"
exit 1
}
proc myputs {str} {
puts -nonewline stdout $str
puts -nonewline $::logFH $str
}
foreach {fn nu st} $argv break
set sz [file size $fn]
set ssz [string length $st]
set logFH [open $fn.log w]
# create SORTED list of "nu" x random numbers between "0" and "sz"
set poL [list]
for {set i 0} {$i < $nu} {incr i} {
lappend poL [expr {round(rand() * $sz)}]
}
set poL [lsort -integer $poL]
myputs "POSITIONS --- ( num=[llength $poL] ) --------\n"
set idx 1
foreach p $poL {
myputs [format {%-10s, } $p]
if {($idx % 8) == 0} { myputs "\n" }
incr idx
}
myputs "\n"
myputs "^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n"
# create difference list of SORTED list
set poD [list]
set poC 0
foreach pos $poL {
lappend poD [expr {$pos-$poC}]
set poC $pos
}
# write "nu" x "st" into "fn" at random position
if {![file exists $fn.new]} {file copy $fn $fn.new}
# open with r+ (not w): mode w would truncate the copied data
set fh [open $fn.new r+]
fconfigure $fh -translation binary
foreach pos $poD {
seek $fh $pos current
puts -nonewline $fh $st
seek $fh -$ssz current
}
close $fh
END =========================================================
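A note on the write step: the bytes already in the copy only survive if the file is opened for update (r+) and in binary mode. A minimal sketch of that step, assuming a hypothetical helper "plantNeedles" and a small scratch file "plant_demo.bin" (neither is part of the script above), and using absolute offsets instead of the difference list:

```tcl
# Hypothetical helper (not from the original script): overwrite bytes of an
# existing file with "st" at each absolute offset in "positions".
proc plantNeedles {fn positions st} {
    set fh [open $fn r+]                ;# r+ = update mode, existing content is kept
    fconfigure $fh -translation binary  ;# no newline or encoding translation
    foreach pos $positions {
        seek $fh $pos start             ;# absolute offset from the start of the file
        puts -nonewline $fh $st
    }
    close $fh
}

# Small demonstration on a 64-byte scratch file (assumed name).
set demo plant_demo.bin
set fh [open $demo w]
fconfigure $fh -translation binary
puts -nonewline $fh [string repeat "." 64]
close $fh

plantNeedles $demo {5 40} "hello_world"
puts "size after planting: [file size $demo]"
```

With absolute seeks the "seek back by the needle length" step is not needed; the difference list in the script above reaches the same offsets with relative seeks.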
3. use a SINGLE-PROC tcl-script as benchmark example
> tclsh ./.grep.tcl test.data.new hello_world
.grep.tcl:
START ============================================================
if {$argc != 2} {
puts stderr "usage: $argv0 filename string"
exit 1
}
foreach {fn st} $argv break
set sz [file size $fn]
set bk [expr {1024 * 1024 * 1024}]
set fh [open $fn rb]
set pos 0
set poL [list]
set ovZ [expr {[string length $st] - 1}]; # overlap of blocks
while {true} {
set dt [read $fh [expr {$bk + $ovZ}]]
# "***=" makes regexp treat the needle as a literal string, not as a pattern
foreach r [regexp -all -inline -indices -- "***=$st" $dt] {
foreach {p w} $r break
lappend poL [expr {$pos + $p}]
}
if {[eof $fh]} break
seek $fh -$ovZ current
incr pos $bk
}
close $fh
puts "POSITIONS --- ( num=[llength $poL] ) -------------------"
set idx 1
foreach p $poL {
puts -nonewline [format {%-10s, } $p]
if {($idx % 8) == 0} { puts "" }
incr idx
}
puts ""
puts "^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^"
END =========================================================
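Since the needle is a fixed string, the block scan can also be written with "string first" instead of regexp. A minimal sketch, assuming a hypothetical proc "findAll" with a configurable block size and a scratch file "grep_demo.bin" (these names are illustrations, not part of .grep.tcl). Unlike "regexp -all", it also reports overlapping occurrences:

```tcl
# Hypothetical helper: return the byte offsets of every occurrence of the
# literal string "st" in file "fn", reading the file in blocks of "bk" bytes.
proc findAll {fn st {bk 1048576}} {
    set ovZ [expr {[string length $st] - 1}]  ;# overlap between two blocks
    set fh [open $fn r]
    fconfigure $fh -translation binary
    set pos 0                                 ;# file offset of the current block
    set hits [list]
    while {true} {
        set dt [read $fh [expr {$bk + $ovZ}]]
        set from 0
        while {[set p [string first $st $dt $from]] >= 0} {
            lappend hits [expr {$pos + $p}]
            set from [expr {$p + 1}]
        }
        if {[eof $fh]} break
        seek $fh -$ovZ current                ;# re-read the overlap with the next block
        incr pos $bk
    }
    close $fh
    return $hits
}

# Small demonstration: a tiny block size forces matches across block borders.
set demo grep_demo.bin
set fh [open $demo w]
fconfigure $fh -translation binary
puts -nonewline $fh "abcneedlexyzneedle"
close $fh

set found [findAll $demo "needle" 4]
puts "offsets: $found"
```

The overlap is one byte shorter than the needle, so a match that straddles a block border is seen exactly once and a match can never lie entirely inside the overlap.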