Re: [go-nuts] Some questions about GC and map[int]int

275 views
Skip to first unread message

Kurtis Rader

unread,
Nov 24, 2023, 2:13:04 AM11/24/23
to Zhao Weng, golang-nuts
You only need to ask for help once. You sent essentially the same message twice. Also, your example program doesn't provide any time for the GC to run in any meaningful manner as far as I can tell. So I am confused how your trivial program relates to a real world program. That is, while simple reproductions of a problem are always a good idea that does not seem to be the case with your example reproduction. No real program would instantiate a huge map then immediately terminate. I think you need to provide more context and a more realistic program to reproduce the problem.

On Thu, Nov 23, 2023 at 10:10 PM 'Zhao Weng' via golang-nuts <golan...@googlegroups.com> wrote:
Dear Gophers:

I have some questions about GC and map[int]int, please help.

consider the following program:

```go
package main

import (
"fmt"
"os"
"runtime/pprof"
)

func main() {
f, err := os.OpenFile("memory.pb.gz", os.O_RDWR|os.O_CREATE|os.O_TRUNC, 0666)
if err != nil {
panic(err)
}

m := make(map[int]int)
for i := 0; i < 10e6; i++ {
m[i] = 2 * i
}

if err := pprof.WriteHeapProfile(f); err != nil {
panic(err)
}
}
```

after inserting 10 million key/value pair, the result of this memory profile show inuse_objects count ~ 100k

```bash
go tool pprof -inuse_objects memory.pb.gz
Type: inuse_objects
Time: Nov 23, 2023 at 3:40pm (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 127450, 100% of 127450 total
      flat  flat%   sum%        cum   cum%
    127450   100%   100%     127450   100%  main.main
         
```

in my view this inuse_objects is found by GC by following the bucket and old_buckets pointer of hmap

```go
// A header for a Go map.
type hmap struct {
// Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
// Make sure this stays in sync with the compiler's definition.
count int // # live cells == size of map. Must be first (used by len() builtin)
flags uint8
B uint8 // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
hash0 uint32 // hash seed

buckets unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
nevacuate uintptr // progress counter for evacuation (buckets less than this have been evacuated)

extra *mapextra // optional fields
}
```

but according to this doc: https://go101.org/optimizations/6-map.html

If the key type and element type of a map both don't contain pointers, then in the scan phase of a GC cycle, the garbage collector will not scan the entries of the map. This could save much time.


My question is:
maybe GC should not follow the bucket and old_buckets pointer of hmap?
or the doc above just forbid GC from scan the entries not the bucket and old_buckets pointer of hmap, and larger map[int]int do increase GC scan overhead by having more buckets and old_buckets?

Thanks

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/b7049ce2-d5a8-49bc-ae44-7a73777494f4n%40googlegroups.com.


--
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank

Zhao Weng

unread,
Nov 24, 2023, 4:54:24 AM11/24/23
to golang-nuts
Sorry about the redundant spam like mail I send to the mail box, I do delete the conversation on the google groups, I send later emails for correcting typos and replace screenshot to text. I'm so sorry for that.

Zhao Weng

unread,
Nov 24, 2023, 5:07:56 AM11/24/23
to golang-nuts
And sorry for long monologue missing the whole point. It all boil down to one simple question:

Does larger map[int]int do increase GC scan overhead by having more buckets and old_buckets (which is reference by pointer) ?

The trivial program above seems to suggest positive since it shows a huge map[int]int can have many inuse_objects.

For context, I have a long running service which has a long-lived map[int]int in memory, we constantly add new data to it and maintain its size by calling delete() when oversize,  we assume there should not result in GC overhead growth as time goes by. 

Thanks

Zhao Weng

unread,
Nov 24, 2023, 5:44:20 AM11/24/23
to golang-nuts
with a new program I seems to reproduce the problem:

```go
package main

import (
"fmt"
"runtime"
"time"
)

func main() {
m := make(map[int]int)
for i := 0; i < 10e6; i++ {
m[i] = 2 * i
}

st := time.Now()
runtime.GC()
fmt.Printf("GC in %f seconds, len(m):%d\n", time.Since(st).Seconds(), len(m))

// delete old data and add new data, while keeping len(m) == 10e6
for run := 1; run < 11; run++ {
for i := 0; i < 10e6; i++ {
delete(m, i*run)
}
for i := 0; i < 10e6; i++ {
m[i*(run+1)] = i * (run + 1)
}
}

st = time.Now()
runtime.GC()
fmt.Printf("after delete and refill 10 runs, GC in %f seconds, len(m):%d\n", time.Since(st).Seconds(), len(m))

// delete old data and add new data, while keeping len(m) == 10e6
for run := 11; run < 101; run++ {
for i := 0; i < 10e6; i++ {
delete(m, i*run)
}
for i := 0; i < 10e6; i++ {
m[i*(run+1)] = i * (run + 1)
}
}
st = time.Now()
runtime.GC()
fmt.Printf("after delete and refill 100 runs, GC in %f seconds, len(m):%d\n", time.Since(st).Seconds(), len(m))
}
```

And the output is:
```bash
GC in 0.003220 seconds, len(m):10000000
after delete and refill 10 runs, GC in 0.023603 seconds, len(m):10000000
after delete and refill 100 runs, GC in 0.053591 seconds, len(m):10000000
```

for each run I empty the map[int]int and refill it with 10 million item, and as it goes GC duration seems to increase.

Brian Candler

unread,
Nov 24, 2023, 8:20:34 AM11/24/23
to golang-nuts
A map is a dynamic data structure, which allocates more pieces (buckets) as the map grows. Google "golang map internals" to find articles and videos which explain the data structure in more detail.

Clearly, all parts of that structure must be tracked by the GC, otherwise they would get garbage collected.

If the keys and values are ints, then they're stored directly in hash slots and no further garbage tracking is required. If they're pointers, then additionally the GC would have to track the memory used by the objects pointed to.

Zhao Weng

unread,
Nov 24, 2023, 10:43:25 AM11/24/23
to golang-nuts
So in this case, if I understand it correctly, statement of this doc: https://go101.org/optimizations/6-map.html

If the key type and element type of a map both don't contain pointers, then in the scan phase of a GC cycle, the garbage collector will not scan the entries of the map. This could save much time.

just mean we can save further garbage tracking, but buckets will be tracked no matter what, Am I right?

Brian Candler

unread,
Nov 24, 2023, 11:09:25 AM11/24/23
to golang-nuts
Yes, it says it will not scan the entries of the map (i.e. the keys and the values held within the map).

But if the map points to (say) 100,000 separate hash buckets, then the memory allocated for those buckets clearly must be marked as not eligible for garbage collection.

Zhao Weng

unread,
Nov 26, 2023, 9:14:04 PM11/26/23
to golang-nuts
@Brian thanks a lot for confirmation!!!
Reply all
Reply to author
Forward
0 new messages