Garbage collector consumes more CPU with more cores

Richard Gooch

Oct 2, 2015, 1:45:04 AM
to golang-nuts
  Hi, all. It seems that the more CPU cores you have, the more CPU time is spent in the garbage collector. This is with go1.5 and 1.5.1.

I have a programme where I have one goroutine pinned to an OS thread, which creates a lot of objects (it's building a tree representing a file-system). The "main" goroutine is normally sitting in a select loop waiting for input on a channel (waiting for a new tree, then dropping the reference to the old tree, and calling runtime.GC()). It's on the order of 10 seconds for this cycle.

So far I've been testing on my desktop (4 core hyperthread system, so 8 hardware threads). On this system, the unpinned OS threads use a variable amount of CPU, but it tends to range from 6 to 15% (each). I see 8-9 OS threads like this. Today I tried it for the first time on a system with 40 hardware threads. Now I see ~40 OS threads using 10 to 30% CPU time. The system load average is also higher. If I call runtime.GOMAXPROCS(2) on my desktop I see the number of unpinned OS threads drop to 4 and the load average goes down. I don't see a clear change in the CPU percentage.

I ran the profiler and it looks like the majority of the time is spent in the garbage collector and low-level lock implementation. It looks like there is a scalability problem with the garbage collector: the more threads there are, the more CPU time each thread spends in the GC, and the more total CPU time on the system (not just per core) is spent.

Is this expected behaviour?

BTW: for fun I turned off the goroutine which runs the file-system scan and the CPU load for the entire programme drops to zero, as I expected.

Regards,

Richard....

Giulio Iotti

Oct 2, 2015, 6:04:16 AM
to golang-nuts
On Friday, October 2, 2015 at 7:45:04 AM UTC+2, Richard Gooch wrote:
> Is this expected behaviour?

I don't think it's wrong that if your program is faster and creates more garbage, more CPU time is spent in garbage collection.

If the relation between garbage and time spent collecting is not linear, I don't think the current GC makes any guarantees about that. It only makes guarantees about stop-the-world pause times.

Does your program actually run faster? (Process more trees per second, for example.)

-- 
Giulio Iotti

Ian Lance Taylor

Oct 2, 2015, 9:50:39 AM
to Richard Gooch, golang-nuts, Austin Clements
There may be a scaling problem. I don't know.

That said, the GC in Go 1.5 is designed so that the overall
program's memory allocation does not spike. If one goroutine is
working hard allocating memory, other goroutines are going to work
sweeping the heap looking for garbage to collect. It sounds like your
program is going to defeat that strategy, assuming most of the memory
you allocate is going to your tree and is therefore not collectable.
The tuning system is going to see allocation spiking more and more and
is going to try harder and harder to stop it.

The GC is limited to taking about 25% of the available CPU, so giving
it more goroutines just gives it more of a chance to look for
non-existent garbage.

I'm CC'ing austin so that he can tell me if I'm completely wrong.

Ian

Richard Gooch

Oct 2, 2015, 9:57:03 AM
to golang-nuts

My programme is not faster. The limiting factor is the pinned goroutine (which is pinned to an OS thread which is niced), and that takes ~17 seconds for each cycle. At the end of each cycle, it sends the data over a channel and starts a new scan cycle. The "main" goroutine receives that, performs a quick comparison (a small fraction of a second), and discards the previous scan result. For over 95% of the time, the "main" goroutine is blocked in the select call. And yet, during that time, a lot of CPU time is being chewed up in the garbage collector and locking.

Regards,

Richard....

Richard Gooch

Oct 2, 2015, 10:09:11 AM
to golang-nuts, rg+go...@safe-mbox.com, aus...@google.com

Interesting that you say that. I actually put in an explicit call to GC() after the end of each scan cycle because I found that reduced the amount of memory used (MemStats.Sys or RSS). I expect that's because I'm creating and deleting objects faster than the GC can handle.
 
> The GC is limited to taking about 25% of the available CPU, so giving
> it more goroutines just gives it more of a chance to look for
> non-existent garbage.
>
> I'm CC'ing austin so that he can tell me if I'm completely wrong.

I think I would be better off if I had control over the GC. Specifically, if I could turn off automatic GC and pick when and where (i.e. in a goroutine pinned to an OS thread) I did it. That would stop the background CPU consumption. I'm better off with a 100 ms stall than the high CPU consumption.

Regards,

Richard....

Michael Jones

Oct 2, 2015, 10:39:19 AM
to Richard Gooch, golang-nuts, aus...@google.com
You can control GC in an important sense, especially in the cyclic application you describe. Manage your own pool of objects that the tree is built from. Put your allocator in there, with a free list and recycling logic for "free" operations. This way the total memory use is no more than two trees, or one tree plus the fraction of the old tree that you can release as you process it in parallel with building a new one.

I do this all the time and essentially have GB of data in play with nearly zero garbage creation until the program finishes. (Nearly zero but not zero because of miscellaneous non-pool allocations and certain anti-memory-waste strategies.) It is simple and fast.

Austin Clements

Oct 2, 2015, 12:22:26 PM
to Ian Lance Taylor, Richard Gooch, golang-nuts
This isn't entirely true. The runtime is designed to *dedicate* 25% of the CPU to GC during a GC cycle; however, it will also give any idle time to the garbage collector to help it finish faster. Richard, it sounds like your program doesn't have much parallelism (please correct me if I'm wrong), so if you're giving it 40 CPUs, but the application is only using one or two, the garbage collector is happy to suck up the idle time on those other CPUs in order to finish the GC cycle faster.

If you run with GODEBUG=gctrace=1 set, you can see exactly how much time the garbage collector is using for various activities. The format is documented here: https://godoc.org/runtime (search for gctrace). I suspect you'll see a very high idle CPU usage to mark phase duration ratio.

It's entirely possible there are also scaling problems. I did run a GC benchmark on an 80 core machine last week where the application itself was able to use all 80 cores and the garbage collector fared pretty well, but that was just one benchmark and I was focusing on stop-the-world time.

Austin Clements

Oct 2, 2015, 12:26:28 PM
to Richard Gooch, golang-nuts
It shouldn't be possible to create objects faster than the GC can handle. It will scale up its activity (possibly at the cost of your application's throughput!) to make sure the application can't outpace it.

>> The GC is limited to taking about 25% of the available CPU, so giving
>> it more goroutines just gives it more of a chance to look for
>> non-existent garbage.
>>
>> I'm CC'ing austin so that he can tell me if I'm completely wrong.
>
> I think I would be better off if I had control over the GC. Specifically, if I could turn off automatic GC and pick when and where (i.e. in a goroutine pinned to an OS thread) I did it. That would stop the background CPU consumption. I'm better off with a 100 ms stall than the high CPU consumption.

I don't particularly recommend this, but if you understand your application well enough to take complete control of GC, you can run your application with GOGC=-1 in the environment to disable automatic GC scheduling and only use runtime.GC().

Richard Gooch

Oct 2, 2015, 12:30:29 PM
to golang-nuts, rg+go...@safe-mbox.com, aus...@google.com
On Friday, 2 October 2015 07:39:19 UTC-7, Michael Jones wrote:
> You can control GC in an important sense, especially in the cyclic application you describe. Manage your own pool of objects that the tree is built from. Put your allocator in there, with a free list and recycling logic for "free" operations. This way the total memory use is no more than two trees, or one tree plus the fraction of the old tree that you can release as you process it in parallel with building a new one.
>
> I do this all the time and essentially have GB of data in play with nearly zero garbage creation until the program finishes. (Nearly zero but not zero because of miscellaneous non-pool allocations and certain anti-memory-waste strategies.) It is simple and fast.

I think that approach will still leave a lot of work for the GC because I allocate a lot of slices (directory entries). I'd need to implement a slab allocator for the different slice lengths. This doesn't sound easy anymore :-/

Regards,

Richard....

Richard Gooch

Oct 2, 2015, 12:39:40 PM
to golang-nuts, ia...@golang.org, rg+go...@safe-mbox.com, aus...@google.com

Log attached.

Richard Gooch

Oct 2, 2015, 12:42:54 PM
to golang-nuts, rg+go...@safe-mbox.com, aus...@google.com

That made it worse. On my 8 HT system, the unpinned OS threads now take 30-50% CPU time and the scanning thread is running slower. Same effect with GOGC=0, it seems.

Regards,

Richard....

Austin Clements

Oct 2, 2015, 12:46:52 PM
to Richard Gooch, golang-nuts
Oh, sorry. Set GOGC=off. (The value used inside the runtime is -1, but, in fact, GOGC=-1 will set the internal value to 0, just as you saw. :)

⚛

Oct 2, 2015, 12:48:20 PM
to golang-nuts
In each cycle, is 99% of the new tree the same as the old tree?



Richard Gooch

Oct 2, 2015, 12:56:35 PM
to golang-nuts, rg+go...@safe-mbox.com, aus...@google.com
On Friday, 2 October 2015 09:46:52 UTC-7, Austin Clements wrote:
> Oh, sorry. Set GOGC=off. (The value used inside the runtime is -1, but, in fact, GOGC=-1 will set the internal value to 0, just as you saw. :)

Cough. Well, the background GC is doing something useful :-) With GOGC=off I'm using a lot more memory. I expect that's because I do a lot of slice creation and then create new slices for different inode types and drop the initial slice. For each directory. Hm. I've been considering switching to using interface{} for all these objects and using type switches, but I wonder about the performance impact of that, both when accessing fields and on the GOB encoder (which is already pretty slow with a 100k object tree)... That approach would reduce the number of alloc/dealloc pairs.

Regards,

Richard....

Richard Gooch

Oct 2, 2015, 1:02:38 PM
to golang-nuts
On Friday, 2 October 2015 09:48:20 UTC-7, ⚛ wrote:
> In each cycle, is 99% of the new tree the same as the old tree?

I'd say 90% the same. 99% if I started re-using slices, I guess, but that will be more complicated.

P.S. please Cc: me on replies. I don't look at the Groups web page that often.

Regards,

Richard....

Bakul Shah

Oct 2, 2015, 1:44:08 PM
to Richard Gooch, golang-nuts
On Fri, 02 Oct 2015 09:30:29 PDT Richard Gooch <rg+go...@safe-mbox.com> wrote:
>
> I think that approach will still leave a lot of work for the GC because I
> allocate a lot of slices (directory entries). I'd need to implement a slab
> allocator for the different slice lengths. This doesn't sound easy anymore
> :-/

So a slice holds directory entries and you append to it one at
a time? Can you create a slice with a much larger capacity
and then trim?

From quickly scanning this thread, looks like you are
repeatedly doing the equivalent of "find / ..." on a
filesystem. If you can explain in some detail what your code
does, we might be able to provide more specific suggestions.
There is more than one way to skin a cat (or "find" in this
case:)

This is more likely a program design issue. GC frees you from
low level memory management details but you still have to
understand the memory needs of your program and accordingly
architect your code.

Austin Clements

Oct 2, 2015, 1:55:06 PM
to Richard Gooch, golang-nuts, Ian Lance Taylor
In this log, it looks like ~4 to 5 of the 8 CPUs are going unused by the application, so the GC is happy to make use of those to finish the GC cycles faster.

Richard Gooch

Oct 2, 2015, 10:27:48 PM
to golang-nuts, rg+go...@safe-mbox.com
On Friday, 2 October 2015 10:44:08 UTC-7, Bakul Shah wrote:
> On Fri, 02 Oct 2015 09:30:29 PDT Richard Gooch <rg+go...@safe-mbox.com> wrote:
> >
> > I think that approach will still leave a lot of work for the GC because I
> > allocate a lot of slices (directory entries). I'd need to implement a slab
> > allocator for the different slice lengths. This doesn't sound easy anymore
> > :-/
>
> So a slice holds directory entries and you append to it one at
> a time?  Can you create a slice with a much larger capacity
> and then trim?

I used to do that but it consumed more memory. See below.
 
> From quickly scanning this thread, looks like you are
> repeatedly doing the equivalent of "find / ..." on a
> filesystem.  If you can explain in some detail what your code
> does, we might be able to provide more specific suggestions.
> There is more than one way to skin a cat (or "find" in this
> case:)

I call Readdirnames(), then I walk the directory entries, call syscall.Lstat() and add each entry to one of 4 slices, depending on whether it's a regular file, symlink, device node/pipe or a directory.

What I used to do was create each of those slices with capacity = num dirents, but that of course wastes memory, as I mentioned above. The next thing I did was to allocate new slices of the correct length for each type and deref the oversized slices. That brought down allocated memory (after a GC()) but pushed up Sys memory consumption. So I gave up on that and just leave it up to append() to decide how much memory to allocate.

As I mentioned upthread, I've considered just creating a slice of interface{} and sticking the different entry types in there, but I've been concerned that will have a performance cost since I'll need a bunch of type switches all over. I don't have enough experience with benchmarking Go to know if that's a concern, but it's work I was hoping to avoid (or at least defer:-). Somebody who knows please feel free to speak up now :-)

Regards,

Richard....

Tamás Gulácsi

Oct 3, 2015, 1:36:28 AM
to golang-nuts
You can try sync.Pool to reuse those result slices, if you don't want to juggle with the old/actual pairs.

Maybe try Readdir in a loop, limiting the slice size. (Readdirnames also sorts, but that is not much work.)

Generally: don't create garbage, reuse everything!

Taru Karttunen

Oct 3, 2015, 5:44:18 AM
to Richard Gooch, golang-nuts
On 02.10 06:57, Richard Gooch wrote:
> My programme is not faster. The limiting factor is the pinned goroutine
> (which is pinned to an OS thread which is niced), and that takes ~17
> seconds for each cycle. At the end of each cycle, it sends the data over a
> channel and starts a new scan cycle. The "main" goroutine receives that,
> performs a quick comparison (a small fraction of a second), and discards
> the previous scan result. For over 95% of the time, the "main" goroutine is
> blocked in the select call. And yet, during that time, a lot of CPU time is
> being chewed up in the garbage collector and locking.

So your program is essentially:
A: scan whole directory tree and send to B
B: calculate difference between directory trees received from A

Is there any reason you couldn't:

A: scan directories comparing to the old state updating it as
things are processing and send differences to B when done
B: do whatever you want with the differences

This would drastically reduce your need for allocations.

- Taru Karttunen

Richard Gooch

Oct 3, 2015, 1:32:34 PM
to golang-nuts, rg+go...@safe-mbox.com

Yes, it would, but at any time the programme can receive an RPC request to get the last scan result, and I don't want to wait until the current scan completes (in some cases it can take a long time, such as an hour). Encoding an object tree that is being mutated isn't safe. So the flow is:
- scanning goroutine sends completed scan results over channel
- main goroutine receives new scan results and compares with the last scan, updates global pointer and generation count if different
- RPC handler compares generation count with what's in the request and if different responds with scan result.

Regards,

Richard....

Michael Jones

Oct 3, 2015, 5:08:53 PM
to Richard Gooch, golang-nuts
Filesystem notification seems a natural way here.


Michael

-- 
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Richard Gooch

Oct 3, 2015, 5:29:32 PM
to golang-nuts, rg+go...@safe-mbox.com
On Saturday, 3 October 2015 14:08:53 UTC-7, Michael Jones wrote:
> Filesystem notification seems a natural way here.


I'm doing full checksum comparisons as that catches file corruption as well as change events that the kernel is aware of. Also, the full checksum scan is required at least once - even if you want to ignore file corruption - since I need to know (from off-machine) if files are deviating from their required state. Inotify is a potential optimisation for later, but it's not a replacement for checksumming.

Regards,

Richard....

Richard Gooch

Oct 4, 2015, 12:26:00 AM
to golang-nuts, rg+go...@safe-mbox.com, aus...@google.com
On Friday, 2 October 2015 09:46:52 UTC-7, Austin Clements wrote:
> Oh, sorry. Set GOGC=off. (The value used inside the runtime is -1, but, in fact, GOGC=-1 will set the internal value to 0, just as you saw. :)

A second followup on GOGC=off: this not only disables the automatic GC, but it also disables runtime.GC(), which is not what I think you expected, and definitely isn't what I wanted.

Is there a way to disable automatic GC while still letting calls to runtime.GC() clean up memory?

Regards,

Richard....

Giulio Iotti

Oct 4, 2015, 2:21:58 AM
to golang-nuts, rg+go...@safe-mbox.com
Another problem with inotify is that there is a hard limit on the number of watches per process. Since you'll need to monitor each directory (and each subdirectory), you'll hit the limit quite soon. (Correct me if I'm wrong.)

Another thing you could try is to structure your program more or less as Taru said above: create two tree objects, A and B. While scanning, update B and respond to RPC calls with A (the "stale tree"). You will need a warmup time on the first run, but subsequently you will generate very little garbage.

Again, without knowing your actual code and implementation: spawn multiple workers that hash file contents until your code is I/O bound. Here too, recycle the hash.Hash instances (there should be a Reset function.)

Tell us more about the overall architecture before tuning the GC.
-- 
Giulio Iotti

Sokolov Yura

Oct 4, 2015, 5:21:10 AM
to golang-nuts
Why do you pin the walking goroutine? Why don't you parallelize the walk in the natural way (i.e. a goroutine per thread, with some kind of concurrency limit)?

Michael Jones

Oct 4, 2015, 8:19:58 AM
to Sokolov Yura, golang-nuts
Good suggestion. On that front, I wrote a parallel version of the standard filesystem walker last year. Maybe something like this would be of interest to Richard once he gets his design for memory reuse solved:

https://groups.google.com/forum/#!searchin/golang-nuts/walk$20jones$20parallel/golang-nuts/FplDIeViOq4/dFctpG5MS-kJ

…note that it is often 10x faster (scroll to the last post), and one of the sample programs (dup at https://github.com/MichaelTJones) does a nice hierarchical job of testing for sameness.

Michael

> On Oct 4, 2015, at 2:21 AM, Sokolov Yura <funny....@gmail.com> wrote:
>
> Why do you pin walking goroutine? Why don't you parallelize walk by natural way (i.e. goroutine per thread with some kind of concurrency limit).
>

Richard Gooch

Oct 4, 2015, 6:26:19 PM
to golang-nuts
On Sunday, 4 October 2015 02:21:10 UTC-7, Sokolov Yura wrote:
> Why do you pin the walking goroutine? Why don't you parallelize the walk in the natural way (i.e. a goroutine per thread, with some kind of concurrency limit)?

Because the scanner can be CPU intensive. The scanning code rate limits the amount of I/O bandwidth but if the tree being scanned fits within the page cache, there is no I/O to the underlying media and thus no rate limiting applies. So, to limit the impact of the scanner on the system, the scanning thread is niced. The rest of the programme is left at normal priority since it's supposed to be doing very little (responding to occasional RPC requests) and it should be responsive to the RPC requests.

My work-around is to call runtime.GOMAXPROCS(2): one thread for the (niced) scanner and another thread for everything else. Even so, I see 3-4 OS threads chewing CPU (ignoring the scanner thread, which is easy to identify since it's niced:-). I wanted two OS threads total.

Regards,

Richard....

Richard Gooch

Oct 4, 2015, 6:29:22 PM
to golang-nuts, funny....@gmail.com
On Sunday, 4 October 2015 05:19:58 UTC-7, Michael Jones wrote:
> Good suggestion. On that front, I wrote a parallel version of the standard filesystem walker last year. Maybe something like this would be of interest to Richard once he gets his design for memory reuse solved:
>
> https://groups.google.com/forum/#!searchin/golang-nuts/walk$20jones$20parallel/golang-nuts/FplDIeViOq4/dFctpG5MS-kJ
>
> …note that it is often 10x faster (scroll to the last post), and one of the sample programs (dup at https://github.com/MichaelTJones) does a nice hierarchical job of testing for sameness.

That sounds nice, but my goals are a scanner with minimal footprint. The systems where it's running have real work to do :-)

Regards,

Richard....

Matt Harden

Oct 4, 2015, 7:03:11 PM
to Richard Gooch, golang-nuts, funny....@gmail.com

--

Richard Gooch

Oct 4, 2015, 7:08:26 PM
to golang-nuts, rg+go...@safe-mbox.com, funny....@gmail.com
On Sunday, 4 October 2015 16:03:11 UTC-7, Matt Harden wrote:

I feel like we're looping. Upthread I said that checksum scanning detects file corruption and other events the kernel doesn't know about. Hence, *notify APIs are not a solution for this problem.
 

Юрий Соколов

Oct 5, 2015, 1:15:24 AM
to Richard Gooch, golang-nuts

What happens if you set GOMAXPROCS(1)?

More general question to the core team: is a locked thread counted against GOMAXPROCS?


Tamás Gulácsi

Oct 5, 2015, 1:53:22 AM
to golang-nuts
What do you mean by "the kernel doesn't know about"? Is there such a thing?
Except for hardware errors, all filesystem access goes through the kernel, so it does know about it.

Richard Gooch

Oct 5, 2015, 2:22:44 AM
to golang-nuts
On Sunday, 4 October 2015 22:53:22 UTC-7, Tamás Gulácsi wrote:
> What do you mean by "the kernel doesn't know about"? Is there such a thing?
> Except for hardware errors, all filesystem access goes through the kernel, so it does know about it.

Yes, hardware errors. They do happen. Firmware bugs. Kernel bugs. Also exploit code (either directly writing to the block device, memory, leveraging kernel bugs, etc.). If you care about data integrity or intrusion detection, you do a crypto hash checksum (I use SHA-512 because that currently has the best performance vs. integrity checking tradeoff).

Regards,

Richard....

aro...@gmail.com

Oct 5, 2015, 3:04:15 AM
to golang-nuts, rg+go...@safe-mbox.com


On Friday, October 2, 2015 at 7:27:48 PM UTC-7, Richard Gooch wrote:
> I call Readdirnames(), then I walk the directory entries, call syscall.Lstat() and add each entry to one of 4 slices, depending on whether it's a regular file, symlink, device node/pipe or a directory.
>
> What I used to do was create each of those slices with capacity = num dirents, but that of course wastes memory, as I mentioned above. The next thing I did was to allocate new slices of the correct length for each type and deref the oversized slices. That brought down allocated memory (after a GC()) but pushed up Sys memory consumption. So I gave up on that and just leave it up to append() to decide how much memory to allocate.
>
> As I mentioned upthread, I've considered just creating a slice of interface{} and sticking the different entry types in there, but I've been concerned that will have a performance cost since I'll need a bunch of type switches all over. I don't have enough experience with benchmarking Go to know if that's a concern, but it's work I was hoping to avoid (or at least defer:-). Somebody who knows please feel free to speak up now :-)

Imagine that you store a pre-allocated slice of "type TypedEntry struct { typ uint8; val interface{} }".  Then, instead of type switching, you just check the typ field.  Not much CPU overhead.  Type switches should be approximately the same overhead.  The type assertion itself should cost nothing.

Other options are:
* Store everything in a single giant backing array, and then after scanning copy to four other slices.  However, that briefly requires a lot more memory.
* Store everything in a single giant backing array, then re-order it so that the different types are in contiguous sections, and then create four slices into that array.

I think that sorting the items in one large array should be incredibly fast since it's a trivial memory operation, and then slicing that to make four smaller slices is approximately zero meaningful overhead, so that's my favorite option.

- Augusto

Richard Gooch

Oct 6, 2015, 12:52:08 AM
to golang-nuts, rg+go...@safe-mbox.com
On Sunday, 4 October 2015 22:15:24 UTC-7, Sokolov Yura wrote:

> What happens if you set GOMAXPROCS(1)?


I get the pinned (niced) OS thread, and 1-2 "main" OS threads. However, the aggregate CPU time is ~100% of one core. So, even though there are still multiple OS threads running, only one at a time seems to run. Which is good enough for my purposes.
 

Richard Gooch

Oct 8, 2015, 4:11:07 PM
to golang-nuts, rg+go...@safe-mbox.com
On Monday, 5 October 2015 21:52:08 UTC-7, Richard Gooch wrote:
> On Sunday, 4 October 2015 22:15:24 UTC-7, Sokolov Yura wrote:
>
>> What happens if you set GOMAXPROCS(1)?
>
> I get the pinned (niced) OS thread, and 1-2 "main" OS threads. However, the aggregate CPU time is ~100% of one core. So, even though there are still multiple OS threads running, only one at a time seems to run. Which is good enough for my purposes.

For the curious, I've spent some time refactoring my code so that I can increase the sharing between the "previously scanned" tree and the "scan in progress" tree. I've even thrown in some small object pools/stacks to reduce the load on the GC. A common pattern in the code is: create new inode, open file, read, compute checksum, compare and replace with old inode if identical, insert inode into tree, cleanup, move onto the next entry. For directories, a stack size of 10 is sufficient (max depth of tree, since the code is recursive), and for other inodes a stack size of 1 is sufficient for the common case (no change in the file-system between scans).

Nevertheless, the GC still has a lot of work to do. I find that I can drastically reduce the amount of Sys memory use by calling runtime.GC() frequently, but that of course slows things down a lot. I presume this is because lots of temporary objects are being created and then go out of scope in the call stack, but they are not freed as they go out of scope, even if they were never exposed outside the goroutine or (in some cases) not even outside the function which created them. I can keep implementing new pools for more of the short-lived objects, but this is getting painful. I miss having a normal stack :-/

Are there plans for better freeing of objects which are not exposed outside the scope in which they were created?

Regards,

Richard....

Austin Clements

Oct 15, 2015, 4:21:03 PM
to Richard Gooch, golang-nuts
You're right that GOGC=off also disables runtime.GC(). This is probably the right thing to do, too, since GOGC=off is really meant for debugging whether problems are related to GC, so it makes sense for it to turn off all forms of GC.

AFAIK, this means there's no way to completely disable automatic GC while still allowing runtime.GC() calls. The closest alternative would be to set GOGC to a really high value, which will effectively disable automatic GC.

j...@tendermint.com

Jan 10, 2016, 5:41:04 PM
to golang-nuts, rg+go...@safe-mbox.com
Have you tried using this?  https://golang.org/pkg/sync/#Pool