goroutine benchmark results interpretation for anonymous func?

168 views
Skip to first unread message

Const V

unread,
Jun 12, 2022, 4:16:30 AM6/12/22
to golang-nuts
I'm implementing the following algorithm - sending result from bytes.Contains -
true or false to 1) channel directly and 2) via func.
I'm expecting 1)to be faster but actually 2) is faster:

4.99s vs 6.47s 

1) via channel:

for i := 0; i < cores; i++ {
  start := i * chunksize
  end := min(start+chunksize+overlap, filelen)
  go func(start, end, i int, quit chan string) {
      ch <- bytes.Contains(b.Bytes()[start:end], find)
   }(start, end, i, quit)

6.47s   Benchmark_100MB_Concurrent

2) via func:

for i := 0; i < cores; i++ {
  start := i * chunksize
  end := min(start+chunksize+overlap, filelen)
  go func(start, end, i int, quit chan string) {
      BytesContainsCh1(b.Bytes(), start, end, find, ch)
   }(start, end, i, quit)

func BytesContainsCh1(b []byte, start int, end int, find []byte, ch chan bool) { // <1>
   ch <- bytes.Contains(b[start:end], find)
}

4.99s Benchmark_100MB_Concurrent


Brian Candler

unread,
Jun 12, 2022, 5:54:51 AM6/12/22
to golang-nuts
On Sunday, 12 June 2022 at 09:16:30 UTC+1 Const V wrote:
  go func(start, end, i int, quit chan string) {
      BytesContainsCh1(b.Bytes(), start, end, find, ch)
   }(start, end, i, quit)

I note that this could be further simplified to:

go BytesContainsCh1(b.Bytes(), start, end, find, ch)
 
Maybe the compiler does this optimisation automatically.  Does it make any difference to your timings?

If you want to understand the difference you might need to look at the assembly language generated. See what happens with the number of stack frames allocated, whether the unused argument 'i' is elided in one of the cases, and so on.

Const V

unread,
Jun 12, 2022, 12:17:23 PM6/12/22
to golang-nuts
I already have a go routine on the anonymous function:
go func(start, end, i int, quit chan string) {

You are suggesting doing this?
go func(start, end, i int, quit chan string) {
      go BytesContainsCh1(b.Bytes(), start, end, find, ch)
   }(start, end, i, quit)

Brian Candler

unread,
Jun 12, 2022, 5:11:57 PM6/12/22
to golang-nuts
No: I'm suggesting exactly what I wrote.  Starting a goroutine looks like this:

go <function>(<args>)

It doesn't have to be an anonymous function, it can be a "real" function.  Hence this is perfectly valid:

go BytesContainsCh1(b.Bytes(), start, end, find, ch)

Const V

unread,
Jun 12, 2022, 7:55:11 PM6/12/22
to golang-nuts
The way I'm using is "b" is a large buffer - hundreds of MB.
I want to stop processing after the string is found using "quit".

for i := 0; i < cores; i++ {
         go func(start, end, i int, quit chan string) {
               ch <- bytes.Contains(b.Bytes()[start:end], find)
               select {
               case <-quit:
                    quit <- "bye"
                     return
               }
         }(start, end, i, quit)
}
for i := 0; i < cores; i++ {
        if <-ch { // the result from bytes.Contains is true so I want to stop processing the other routines.
            quit <- "quit"
         }
 }

the whole point was the question is why if I use 
BytesContainsCh1(b.Bytes(), start, end, find, ch)
the processing is faster: 4.99s 
rather using 
ch <- bytes.Contains(b.Bytes()[start:end], find)
6.47s 

which takes longer.
The logic says it should be otherwise because BytesContainsCh1 is a function call and it should be slower.

Tamás Gulácsi

unread,
Jun 13, 2022, 12:49:50 AM6/13/22
to golang-nuts
ch <- bytes.Contains(b.Bytes()[start:end], find)
               select {
               case <-quit:
                    quit <- "bye"
                     return
               }

Don't you want a "default:" case in there? This way all your goroutines are kept alive and waiting for that quit - which only ONE goroutine will get!
Use close(quit) - that will signal all goroutines!

BTW your bytes.Contains won't break mid-execution, so it either executes, or it haven't even started yet on "quit".
You have to break the problem to smaller chunks and loop on it, checking the quit channel on each iteration.

Or accept that this few hundred MBs is not big, and forget all the complexity, just bytes.Contains each part (/core) and write the result into a make([]bool, core), at the right index,
and at last iterate over this result slice and check whether any found sth.
You'll still need a WaitGroup for waiting all goroutines.

Const V

unread,
Jun 13, 2022, 2:20:29 AM6/13/22
to golang-nuts
All valid questions.
Still my original question remains - the timing I've got is inuntitive.

Brian Candler

unread,
Jun 13, 2022, 12:10:44 PM6/13/22
to golang-nuts
On Monday, 13 June 2022 at 01:55:11 UTC+2 Const V wrote:
The way I'm using is "b" is a large buffer - hundreds of MB.
I want to stop processing after the string is found using "quit".

for i := 0; i < cores; i++ {
         go func(start, end, i int, quit chan string) {
               ch <- bytes.Contains(b.Bytes()[start:end], find)
               select {
               case <-quit:
                    quit <- "bye"
                     return
               }
         }(start, end, i, quit)

You missed

  start := i * chunksize
  end := min(start+chunksize+overlap, filelen)

But even if that's what you meant, I'm not understanding what you're trying to do with this 'quit' channel.  The two steps in the goroutine (bytes.Contains, followed by <-quit) happen sequentially, one after the other. "bytes.Contains" is not interruptible; it will scan the whole string you've given, until either it finds the string or it exhausts the string.  You simply have to wait for it to complete.

Furthermore, I think that trying to parallize a string search this way is almost certain not to bring any benefits, and is very likely to be slower. This is because the bottleneck in string searching is not the CPU core; unless the data is already in cache (which it won't be, if it's hundreds of megabytes), the bottleneck is the bandwidth into main RAM (DRAM).

A single thread scanning from start to end of the buffer has the advantages that (a) all accesses are sequential, which is an access pattern DRAMs are optimised for, and (b) it can stop as soon as the search has found the string.  Splitting the search across (say) 4 cores will make things worse. You'll still be limited by RAM bandwidth, but the access patterns will jump around, and as I've just explained, you won't be able to terminate the other searches early.

How do your two original benchmarks compare to doing the simple option, i.e. just

bytes.Contains(b.Bytes(), find)

- with no goroutines?

Const V

unread,
Jun 13, 2022, 2:02:44 PM6/13/22
to golang-nuts

"bytes.Contains" is not interruptible - yes. I can improve the performance of the concurrent implementation

by stopping processing in other goroutines  if the string is found in one of it,

by using the original source code in my program to make it interruptible.


I'm analyzing 100MB string.

Utilizing 8 cores is giving 2 times faster solution with concurrent version.

Each coroutine is working on 12.5MB. The worst case scenario is when the string error is at the end of the input.

If one goroutine finds it I can stop the other 7 form working. This will be good if the searched string is in the middle.

------------------------------------------------

Machine Mac-mini M1 16 MB:

------------------------------------------------

7.47s  BigGrepBytes

3.81s  BigGrepBytes1_Concurrent

------------------------------------------------

I'll post more + benchmark in another thread with a better focus on concurrent.

Reply all
Reply to author
Forward
0 new messages