Concurrent solution: Which is the most efficient way to read STDIN lines 100s of MB long?


Const V

Jun 13, 2022, 2:21:58 PM
to golang-nuts

This is related to a previous discussion, but this time it is about finding a concurrent solution.

I'm posting my solution. Any feedback and suggestions for improvement are welcome!

I need to write a program that reads STDIN and should output every line that contains a search word "test" to STDOUT. 

The input is a 100MB string.

Using 8 cores, the concurrent version is about 2 times faster (3.81s vs. 7.47s).

Each goroutine works on 12.5MB (100MB / 8 cores). The worst-case scenario is when the searched string is at the end of the input.

If one goroutine finds it I can stop the other 7 from working. This would help when the searched string is in the middle.

The only way to interrupt bytes.Contains seems to be copying its source code and modifying it so that the search can be stopped.

------------------------------------------------

Machine: Mac mini M1, 16 GB:

------------------------------------------------

7.47s  BigGrepBytes

3.81s  BigGrepBytes1_Concurrent

------------------------------------------------


Here is the CPU profile (pprof) from the benchmark run:

---

Type: cpu
Time: Jun 13, 2022 at 10:34am (PDT)
Duration: 124.30s, Total samples = 207.43s (166.88%)
Active filters:
   focus=Benchmark
   hide=benchm
   show=Benchmark
      flat  flat%   sum%        cum   cum%
     7.47s  3.60% 21.86%      7.47s  3.60%  Benchmark_100MB_End__20charsBigGrepBytes_
     3.81s  1.84% 34.38%      3.81s  1.84%  Benchmark_100MB_End__20charsBigGrepBytes1_Concurrent

---

Non-concurrent version:

---

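import (
    "bytes"
    "io"
)

// BigGrepBytes reads all of r and, if find occurs anywhere in it, writes the
// whole input to w; otherwise it writes " \n". It does not work line by line.
func BigGrepBytes(r io.Reader, w io.Writer, find []byte) {
    var b bytes.Buffer
    _, _ = b.ReadFrom(r) // read the entire input into memory (error ignored)

    if bytes.Contains(b.Bytes(), find) {
        w.Write(b.Bytes()) // match somewhere: echo the whole input
    } else {
        w.Write([]byte(" \n")) // no match: write a placeholder line
    }
}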

Concurrent version:

--

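import (
    "bytes"
    "io"
)

func BigGrepBytesCh1(r io.Reader, w io.Writer, find []byte, cores int) {
    ch := make(chan bool, cores) // buffered, so goroutines never block after we stop receiving
    overlap := len(find)         // chunks overlap so a match across a boundary is not missed

    var b bytes.Buffer
    _, _ = b.ReadFrom(r) // read the entire input into memory (error ignored)

    // Split the input into one chunk per core and search each chunk concurrently.
    data := b.Bytes()
    filelen := len(data)
    chunksize := filelen / cores
    for i := 0; i < cores; i++ {
        start := i * chunksize
        end := min(start+chunksize+overlap, filelen) // min is a builtin since Go 1.21; older versions need a helper
        if i == cores-1 {
            end = filelen // the last chunk must also cover the remainder of filelen / cores
        }
        go BytesContainsCh1(data, start, end, find, ch)
    }

    found := false
    for i := 0; i < cores; i++ {
        if <-ch {
            found = true
            w.Write(data) // any chunk matched: echo the whole input
            break         // remaining goroutines can still send into the buffered channel and exit
        }
    }
    if !found {
        w.Write([]byte(" \n"))
    }
}

// BytesContainsCh1 reports on ch whether b[start:end] contains find.
func BytesContainsCh1(b []byte, start int, end int, find []byte, ch chan bool) {
    ch <- bytes.Contains(b[start:end], find)
}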

--

Howard C. Shaw III

Jun 21, 2022, 10:37:06 AM
to golang-nuts
There seems to be a conflict between these two statements:
>  should output every line that contains a search word "test" to STDOUT. 
and
>  If one goroutine finds it I can stop the other 7 from working. 

If it is supposed to output EVERY line containing test, what is the logic that says you can stop the other goroutines? Is this because you have done something ahead of time that separates the input into lines, and so you know all the goroutines are operating on the same line? 

It looks, frankly, more like this is code to 'repeat STDIN on STDOUT if test is present anywhere in STDIN,' with no cognizance of lines at all.

If that is what you intend, rather than the described task, then you don't need to worry about stopping the other goroutines at all!

"Program execution begins by initializing the main package and then invoking the function main. When that function invocation returns, the program exits. It does not wait for other (non-main) goroutines to complete." -- https://go.dev/ref/spec#Program_execution

Simply returning from main once you have read STDIN and, as soon as any goroutine reports a match, written to STDOUT will end the program, whether or not other goroutines are still running.

Howard

