time.Now performance

Albert Strasheim

Nov 7, 2012, 5:33:41 AM
to golan...@googlegroups.com
Hello all

We have an app on linux/amd64 that calls time.Now quite frequently.

We're interested in the UnixNano value, so I did some benchmarks that use the vDSO work from a while back:


I think it might be of interest:

package time_test

import (
	"syscall"
	"testing"
	"time"
)

// now builds a time.Time from gettimeofday(2), which goes through the vDSO
// on linux/amd64 (microsecond resolution; error ignored for brevity).
func now() time.Time {
	var tv syscall.Timeval
	syscall.Gettimeofday(&tv)
	return time.Unix(0, syscall.TimevalToNsec(tv))
}

func BenchmarkTimeNow(b *testing.B) {
	for i := 0; i < b.N; i++ {
		time.Now()
	}
}

func BenchmarkNowGettimeofday(b *testing.B) {
	for i := 0; i < b.N; i++ {
		now()
	}
}

X5675 running 3.3.8-1.fc16.x86_64:

BenchmarkTimeNow 1000000 1030 ns/op
BenchmarkNowGettimeofday 2000000 767 ns/op

E5-2670 running 3.4.7-1.fc16.x86_64:

BenchmarkTimeNow 1000000 1124 ns/op
BenchmarkNowGettimeofday 2000000 759 ns/op

i7-3720QM running 3.6.3-1.fc17.x86_64:

BenchmarkTimeNow 5000000 422 ns/op
BenchmarkNowGettimeofday 20000000 85.7 ns/op

syscall.Time is also a winner if you only need second precision.
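
For reference, a minimal sketch of that second-resolution path, reusing the imports from the benchmark above (nowSeconds is just an illustrative name, and the error from syscall.Time is ignored here):

// nowSeconds builds a time.Time with one-second resolution from time(2).
func nowSeconds() time.Time {
	var t syscall.Time_t
	syscall.Time(&t) // one syscall, second resolution
	return time.Unix(int64(t), 0)
}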

Go version here is 96fde1b15506. Not quite ready for 64-bit ints yet.

The E5-2670 numbers seem a bit off. I'll update to 3.6 soon and retest. I'm guessing it's a kernel thing since I can't imagine the i7 being that much faster, but maybe it is...

Maybe there's some scope here for tweaking the time.Now implementation on linux/amd64?

Regards

Albert

Brad Fitzpatrick

Nov 7, 2012, 7:38:50 AM
to Albert Strasheim, golan...@googlegroups.com
Both methods are just microsecond resolution anyway, right?  So I don't see why changing time·now in pkg/runtime/sys_linux_amd64.s would be objectionable, unless the case of the VDSO being unavailable (I don't think it ever is, on Linux versions we support?) is much slower.

File a bug at least.

Anthony Martin

Nov 7, 2012, 8:24:08 AM
to Albert Strasheim, golan...@googlegroups.com
This is a one-line change:

diff -r c0761b6a5160 src/pkg/runtime/sys_linux_amd64.s
--- a/src/pkg/runtime/sys_linux_amd64.s	Fri Nov 02 20:46:47 2012 +1100
+++ b/src/pkg/runtime/sys_linux_amd64.s	Wed Nov 07 05:05:43 2012 -0800
@@ -104,7 +104,7 @@
 TEXT time·now(SB), 7, $32
 	LEAQ	8(SP), DI
 	MOVQ	$0, SI
-	MOVQ	$0xffffffffff600000, AX
+	MOVQ	runtime·__vdso_gettimeofday_sym(SB), AX
 	CALL	AX
 	MOVQ	8(SP), AX	// sec
 	MOVL	16(SP), DX	// usec

Intel Core 2 Duo (2.16 GHz) running 3.6.5-1-ARCH

benchmark                   old ns/op    new ns/op    delta
BenchmarkTimeNow                 1862         1031  -44.63%
BenchmarkNowGettimeofday         1153         1162   +0.78%

Cheers,
Anthony

Russ Cox

Nov 7, 2012, 9:12:19 AM
to Anthony Martin, Albert Strasheim, golang-dev
If someone wants to puzzle through the format of the time data just
lying there in user memory, it might even be possible to get
nanosecond precision, like we have on OS X.

Russ

minux

Nov 7, 2012, 2:02:48 PM
to Russ Cox, Anthony Martin, Albert Strasheim, golang-dev
I believe the newer VDSO provides a purely user-space gtod using rdtsc
(just like what Darwin did).

This could explain the huge performance difference between time(2)
and gettimeofday(2) in the OP's tests.

Russ Cox

Nov 7, 2012, 2:11:59 PM
to minux, Anthony Martin, Albert Strasheim, golang-dev
>> If someone wants to puzzle through the format of the time data just
>> lying there in user memory, it might even be possible to get
>> nanosecond precision, like we have on OS X.
>
> I believe the newer VDSO provides a purely user-space gtod using rdtsc
> (just like what Darwin did).
>
> This could explain the huge performance difference between time(2)
> and gettimeofday(2) in OP's tests.

I agree, but after doing all that work it returns microseconds.
On Darwin, because we were forced to do the work ourselves, we were able
to get nanoseconds.

Russ

minux

Nov 7, 2012, 2:28:17 PM
to Russ Cox, Anthony Martin, Albert Strasheim, golang-dev
Just digging through the kernel source code, it seems we can use the vDSO version
of clock_gettime to get nanosecond precision and at the same time benefit
from the fast user-space implementation.
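
For illustration, a rough sketch of reading CLOCK_REALTIME via clock_gettime from Go user code; this goes through the ordinary syscall path (not the vDSO fast path the runtime change would use), so it only demonstrates the nanosecond-resolution interface:

package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// clockGettimeNanos reads CLOCK_REALTIME via clock_gettime(2) and returns
// nanoseconds since the Unix epoch (errors ignored for brevity).
func clockGettimeNanos() int64 {
	const clockRealtime = 0 // CLOCK_REALTIME
	var ts syscall.Timespec
	syscall.Syscall(syscall.SYS_CLOCK_GETTIME, clockRealtime,
		uintptr(unsafe.Pointer(&ts)), 0)
	return syscall.TimespecToNsec(ts)
}

func main() {
	fmt.Println(clockGettimeNanos())
}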

The reasons I don't recommend we implement the calculation code ourselves:
1. I don't believe the data structure is exported, so it might change.
2. The kernel has two clock sources: one is rdtsc, the other is hpet (the high
precision event timer), and I don't think we want to touch the hpet thing.

Relevant section of vdso source code:

Dave Cheney

Nov 7, 2012, 4:32:58 PM
to minux, Russ Cox, Anthony Martin, Albert Strasheim, golang-dev
If the new code is using RDTSC then it will be highly arch-specific, as it took Intel several goes to get the TSC working properly. From my notes that implies Nehalem-based cores only.

Anthony Martin

Nov 7, 2012, 8:30:28 PM
to minux, Russ Cox, Albert Strasheim, golang-dev
minux <minu...@gmail.com> once said:
> The reasons I don't recommend we implement the calculation code ourselves:
> 1. I don't believe the data structure is exported, so it might change.
> 2. The kernel has two clock sources: one is rdtsc, the other is hpet (the
> high precision event timer), and I don't think we want to touch the hpet thing.

I agree with both points here. And, indeed, the kernel's
clock data structure has changed since 2.6.32. Originally,
it had a function pointer that you would call to get the
cycle counter value. As of 3.6.6, however, it contains a
mode flag describing which clock source to read from.

Anthony

Russ Cox

Nov 7, 2012, 8:32:11 PM
to Anthony Martin, minux, Albert Strasheim, golang-dev
I have not caught up with today's deluge of mail yet but if someone
sends that 1-line change to use the vdso implementation, LGTM.

Anthony Martin

Nov 7, 2012, 9:15:00 PM
to Dave Cheney, minux, Russ Cox, Albert Strasheim, golang-dev
Dave Cheney <da...@cheney.net> once said:
> If the new code is using RDTSC then it will be highly arch specific
> as it took intel several goes to get the TSC working properly. From my
> notes that implies Nehalem based cores only.

The user-space implementation on Linux will fall back
to the gettimeofday system call if the TSC is unstable
and the HPET timers are not supported.

Anthony

minux

Nov 8, 2012, 5:46:39 AM
to Anthony Martin, Dave Cheney, Russ Cox, Albert Strasheim, golang-dev
I created https://codereview.appspot.com/6814103/ to use vDSO clock_gettime
on linux/amd64. Besides a 3ns performance improvement (27.4ns -> 24.4ns), time.Now()
now gets real nanosecond precision on linux/amd64.

Anthony Martin

Nov 8, 2012, 7:10:42 AM
to minux, Dave Cheney, Russ Cox, Albert Strasheim, golang-dev
minux <minu...@gmail.com> once said:
> I created https://codereview.appspot.com/6814103/ to use vDSO clock_gettime
> on linux/amd64. Besides a 3ns performance improvement (27.4ns -> 24.4ns),
> time.Now() now gets real nanosecond precision on linux/amd64.

You mean "resolution", correct? None of these timer
implementations will give you nanosecond precision
in user space.

Anthony

minux

Nov 8, 2012, 7:29:43 AM
to Anthony Martin, Dave Cheney, Russ Cox, Albert Strasheim, golang-dev
Right. Sorry for the confusion. It is nanosecond resolution.

Maxim Khitrov

Nov 8, 2012, 7:38:04 AM
to Anthony Martin, Dave Cheney, minux, Russ Cox, Albert Strasheim, golang-dev
How is the current TSC value converted to Unix time? Does the
implementation store some base TSC reading with a corresponding
gettimeofday value, or is there some other mechanism involved? I'm
just wondering if something similar could be used on Windows with the
QueryPerformanceCounter function in order to get better resolution.
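
To sketch the general mechanism the question describes (pair a wall-clock reading with a raw counter value at calibration, then derive later timestamps from the counter delta), here is an illustration only, not the kernel's or Windows' actual implementation; the type and field names are made up:

package clock

import "time"

// clockBase pairs a wall-clock reading with a raw counter value (e.g. TSC or
// QueryPerformanceCounter) captured at the same instant, plus the counter frequency.
type clockBase struct {
	wall  time.Time // wall clock at calibration
	ticks uint64    // raw counter at calibration
	hz    uint64    // counter ticks per second
}

// now converts a later raw counter reading into a timestamp by scaling the
// elapsed tick delta by the frequency and adding it to the calibration time.
func (c *clockBase) now(ticks uint64) time.Time {
	delta := ticks - c.ticks
	secs := delta / c.hz
	nanos := (delta % c.hz) * uint64(time.Second) / c.hz
	return c.wall.Add(time.Duration(secs)*time.Second + time.Duration(nanos))
}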

minux

Nov 8, 2012, 8:00:05 AM
to Maxim Khitrov, Anthony Martin, Dave Cheney, Russ Cox, Albert Strasheim, golang-dev
The Linux kernel user-context syscall implementation:
http://lxr.linux.no/linux+v3.6.6/arch/x86/vdso/vclock_gettime.c
and code to update the data structure:
I think to use something like this, kernel involvement is necessary.