How to do vdso calls in my own code?


Pure White

Apr 26, 2021, 4:24:09 AM
to golang-nuts
Hi all,

I'm trying to get the time using `CLOCK_REALTIME_COARSE` and `CLOCK_MONOTONIC_COARSE` for performance reasons, which means making the vDSO call in hand-written assembly. That is, I want to reimplement `time.Now` using `CLOCK_REALTIME_COARSE` and `CLOCK_MONOTONIC_COARSE`.

I looked at the code in the runtime and found that issue #20427 indicates I need to switch to g0 for vDSO calls. I tried two methods, but neither is good.

## The first method

The first method I tried was to copy the code from the runtime and simply change the clockid, but this requires copying all the runtime type definitions as well, so that the compiler generates "go_asm.h" for me.

The code runs well, but it is really ugly and unmaintainable, as the type definitions may change across Go versions.

## The second method

The second method I tried was to link to `runtime.systemstack` with `go:linkname` and use it to make the vDSO calls:
```go
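import _ "unsafe" // required for //go:linkname

// A function declared without a body also needs a .s file (possibly empty) in the package.
//go:linkname systemstack runtime.systemstack
//go:noescape
func systemstack(fn func())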
```

My code is something like this:

```go
// now calls the vDSO and is implemented in assembly.
func now() (sec int64, nsec int32, mono int64)

func Now() {
	var sec, mono int64
	var nsec int32
	// Run now() on the system (g0) stack.
	systemstack(func() {
		sec, nsec, mono = now()
	})
	// ... logic copied from time.Now()
}
```

The code runs well without `-race` (though passing tests don't prove it is correct), but I hit a fatal error in `-race` mode.
For detailed information, I've filed an issue: https://github.com/golang/go/issues/45768.

## The right way?

So what is the right way to make a vDSO call outside the runtime?

Thanks very much!

Ian Lance Taylor

Apr 26, 2021, 6:40:17 PM
to Pure White, golang-nuts
The right way is to use cgo. But probably the cost of making the cgo
call will overwhelm the speed advantage of using the coarse clocks.

Using go:linkname to call systemstack is completely unsupported, and
is even less maintainable than your first method.

I have to admit that I don't see a way to do this at all. What is the
application? How much difference will it make to use the coarse
timestamps?

Ian

Pure White

Apr 26, 2021, 11:58:07 PM
to Ian Lance Taylor, golang-nuts
Hi Ian,
As you said, the cost of making the cgo call is too high for my situation.

I know using go:linkname to call systemstack is magic and neither recommended nor supported, but I don't have another choice.

My application will call `time.Now` very often to get the timestamp, so this optimization is useful for me.

Putting this aside, I think Go should provide a way to do everything I need outside the runtime. Exporting something like `runtime.systemstack` seems reasonable to me, as there may be other needs to call a function on a large stack such as the g0 stack. What do you think?

Ian Lance Taylor

Apr 27, 2021, 12:14:07 AM
to Pure White, golang-nuts

runtime.systemstack is a runtime-internal function. It does not make
sense to expose it. The kind of code that can run on the system stack
is quite limited. It's true that making a non-blocking system call is
the kind of code that is OK, but most code would lead directly to a
program crash. It's not Go style to give programs fragile and
dangerous mechanisms like that.

I really encourage you to explain in detail what you are doing, if you
can. That may help lead people to finding a solution that can work.
The position right now is that your program must call
clock_gettime(CLOCK_REALTIME_COARSE) without using cgo. As far as I
can tell that is impossible today, though of course it could be added
to future releases of Go. But first let's find out whether that is in
fact the right solution.

Ian

Pure White

Apr 27, 2021, 12:40:20 AM
to Ian Lance Taylor, golang-nuts
Thanks for your reply!

In my situation, we trace frequently, and we need to call `time.Now` in each trace span.
`time.Now` sometimes costs 1%-2% of CPU in the flame graph, so I think this optimization is worthwhile.

Although this optimization is a nice-to-have for us, there may be other situations, such as fintech, where it is necessary.

Manlio Perillo

Apr 27, 2021, 10:43:15 AM
to golang-nuts
On Monday, April 26, 2021 at 10:24:09 UTC+2 Pure White wrote:
> So I really want to know what is the right way to do vdso call outside runtime?

What about using a different function instead of time.Now, and using RawSyscall?

Manlio 

Kurtis Rader

Apr 27, 2021, 11:42:25 AM
to Pure White, golang-nuts
This feels like an "XY problem", though it is probably not an example of that class of problems. Nonetheless, as Ian points out, the Go community can probably provide more useful answers if given more context about why you feel the need for this optimization. In my experience, applications that call the equivalent of `gettimeofday()` so frequently that the call becomes a bottleneck tend to have more serious problems. See also questions such as https://stackoverflow.com/questions/58189790/do-clock-monotonic-and-clock-monotonic-coarse-have-the-same-base.



--
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank

Ian Lance Taylor

Apr 27, 2021, 11:51:46 AM
to Manlio Perillo, golang-nuts
On Tue, Apr 27, 2021 at 7:43 AM Manlio Perillo <manlio....@gmail.com> wrote:
>
> On Monday, April 26, 2021 at 10:24:09 UTC+2 Pure White wrote:
>>
>> So I really want to know what is the right way to do vdso call outside runtime?
>>
> What about using a different function instead of time.Now, and using RawSyscall?

That wouldn't be a VDSO call. VDSO calls are in general faster than
system calls. There is more background at
https://man7.org/linux/man-pages/man7/vdso.7.html.

Ian

Manlio Perillo

Apr 27, 2021, 11:54:38 AM
to golang-nuts
Yes, that wouldn't be a VDSO call. But since a VDSO call would require cgo, maybe RawSyscall would be more efficient?


Manlio

Amnon

Apr 27, 2021, 12:13:26 PM
to golang-nuts
https://blog.cloudflare.com/its-go-time-on-linux/

A bit old, but still relevant.

Also https://github.com/dterei/gotsc may be useful depending on your requirements.

Ian Lance Taylor

Apr 27, 2021, 12:13:57 PM
to Manlio Perillo, golang-nuts
The meaningful comparison is to time.Now, which uses VDSO.  I suppose it is possible that RawSyscall of clock_gettime with a coarse clock would be faster than time.Now, but I would be surprised.

Ian

Michael Pratt

Apr 27, 2021, 12:54:25 PM
to Ian Lance Taylor, Manlio Perillo, golang-nuts
I'm not sure if calling through the VDSO is the best solution to this specific issue, though it does sound like a case that would certainly benefit.

Regardless, one fairly clean way we could support this would be to make x/sys/unix.ClockGettime (and Gettimeofday) call through the VDSO rather than performing the syscall. That is always a valid operation (the VDSO will make the syscall if it doesn't support the specific options passed), and I think would solve this problem without even changing the API. It seems we don't do that already for simplicity of implementation and prior lack of need.


Michael Pratt

Apr 27, 2021, 12:59:57 PM
to Ian Lance Taylor, Manlio Perillo, golang-nuts
Oops, I should say the syscall package could do this. x/sys/unix has the extra complexity of not being tied to a Go release.

Manlio Perillo

Apr 27, 2021, 1:02:51 PM
to golang-nuts
On Tuesday, April 27, 2021 at 00:40:17 UTC+2 Ian Lance Taylor wrote:
> The right way is to use cgo. But probably the cost of making the cgo
> call will overwhelm the speed advantage of using the coarse clocks.

Is it technically possible to generate fast, unguarded cgo code using some directive like `//cgo:fastcall`?

Thanks
Manlio 

Ian Lance Taylor

Apr 27, 2021, 3:22:30 PM
to Michael Pratt, Manlio Perillo, golang-nuts
On Tue, Apr 27, 2021 at 9:59 AM Michael Pratt <mpr...@google.com> wrote:
>
> Oops, I should say the syscall package could do this. x/sys/unix has the extra complexity of not being tied to a Go release.

It's a good idea, although I'll note that the syscall package does not
currently define ClockGettime.

I think it would be manageable to use build tags to do this in x/sys/unix.

Ian

Ian Lance Taylor

Apr 27, 2021, 3:27:42 PM
to Manlio Perillo, golang-nuts
It is technically possible, but unlikely to be implemented. There is
some discussion at https://golang.org/issue/42469.

Ian

Pure White

Apr 27, 2021, 11:26:17 PM
to golang-nuts
I think this is a good idea, but I suppose it will run into the same problem I hit: the syscall package is outside the runtime.

Ian Lance Taylor

Apr 28, 2021, 12:23:05 AM
to Pure White, golang-nuts
On Tue, Apr 27, 2021 at 8:26 PM Pure White <wu.pur...@gmail.com> wrote:
>
> I think this is a good idea. But I suppose this will meet the same problem I have met ---- syscall package is outside of the runtime.

We can introduce hooks into the runtime that are only for the standard
library, because we know exactly how they will be used.

Ian

Pure White

Apr 28, 2021, 1:22:49 AM
to Ian Lance Taylor, golang-nuts
Wow, sounds great!
Really looking forward to this feature!

Ian Lance Taylor

unread,
Apr 28, 2021, 12:11:50 PM4/28/21
to Pure White, golang-nuts
In the meantime we both sent speedups for time.Now. I measured
time.Now as taking about 30 nanoseconds per call before the speedup,
and you measured it as taking about 40 nanoseconds per call before the
speedup. That makes me wonder about your code: if a 40 nanosecond
function is taking 1% to 2% of your execution time, then if your
program is CPU bound you must be calling it a truly extraordinary
number of times.
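To make that concrete with a back-of-the-envelope calculation, assuming one fully busy core and the 40 ns per-call figure: if a fraction $f$ of CPU time goes to calls that each take $t_{\text{call}}$, the call rate $N$ is

$$
N = \frac{f}{t_{\text{call}}}, \qquad
\frac{0.01}{40\ \text{ns}} = 2.5 \times 10^{5}\ \text{calls/s}, \qquad
\frac{0.02}{40\ \text{ns}} = 5 \times 10^{5}\ \text{calls/s}
$$

per core, that is, hundreds of thousands of `time.Now` calls every second.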

Ian

Pure White

Apr 28, 2021, 11:04:58 PM
to Ian Lance Taylor, golang-nuts
Hello, 
We trace heavily and need to get the time for every function call, which is why I'd like to optimize this.

Robert Engels

Apr 28, 2021, 11:58:37 PM
to Pure White, Ian Lance Taylor, golang-nuts
If you are timing operations so short that the overhead is 1-2%, just time every 1000 calls; that reduces the overhead to a minimum, and it normally makes no difference when the operations are that short. It is similar to how benchmarking works: time the total and divide by the number of operations.
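A minimal sketch of this batching idea: read the clock once per batch and report the mean per-operation time. The helper `meanPerOp` and the batch size of 1000 are illustrative choices. As the reply below points out, averaging hides individual slow calls.

```go
package main

import (
	"fmt"
	"time"
)

// meanPerOp runs op n times but reads the clock only twice,
// so the cost of time.Now is amortized over the whole batch.
func meanPerOp(n int, op func()) time.Duration {
	if n <= 0 {
		return 0
	}
	start := time.Now()
	for i := 0; i < n; i++ {
		op()
	}
	return time.Since(start) / time.Duration(n)
}

func main() {
	var sum int
	mean := meanPerOp(1000, func() { sum++ })
	fmt.Println("mean per op:", mean, "sum:", sum)
}
```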


Pure White

Apr 29, 2021, 12:15:34 AM
to Robert Engels, Ian Lance Taylor, golang-nuts
Thanks for your suggestion, but that way we would not be able to capture abnormal calls.

Pure White

Apr 29, 2021, 12:20:34 AM
to Robert Engels, Ian Lance Taylor, golang-nuts
Anyway, I think we all agree on optimizing `time.Now`; Ian and I both sent PRs for this. Thanks to everyone for taking part.

Amnon

Apr 29, 2021, 2:20:10 AM
to golang-nuts
Why not use the TSC timer for your timings, and convert TSC cycle times into durations during 
post-run analysis?

Robert Engels

Apr 29, 2021, 7:47:58 AM
to Amnon, golang-nuts
The point is that if you are timing "so much" that a 1-2% overhead matters to throughput (so you must have tons of very small operations), sampling will still capture enough that a representative event will be recorded.

Still, optimizing time.Now() is always a good idea, but I am guessing much of the overhead is the function call itself, so it would need to be turned into a compiler intrinsic (as in Java). I'm also guessing that you are suffering from sampling bias, and even if you fix this you are going to detect a 1-2% overhead somewhere else in the path.

You are going to end up using an in-memory histogram, because recording the event is going to be more expensive than time.Now(), unless you pause the program when an anomaly is detected, which is usually not viable.

1-2% overhead is actually very good for performance monitoring on a general-purpose OS with a garbage-collected language like Go.


