Are you seeing the average response time / latency of the cache from outside?
If so, you should see lots of really quick responses, and a few slow ones that pull the average up to what you're seeing.
Hi,

External measurements probably show a more accurate picture.
First of all, internal latency numbers only include time spent doing actual work, but don't include HTTP parsing (by net/http) and network overhead.
Secondly, latency measured internally always looks better because it doesn't include application stalls that happened outside of the measured code.
Imagine that it takes 10ms for net/http to parse the request (e.g. due to a STW pause) and 1ms to run the handler. The real request latency is 11ms in this case, but if measured internally it is only 1ms. This is known as coordinated omission.
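To make the gap concrete, here is a minimal sketch (not from the original thread) of the usual internal measurement pattern; the clock only starts once net/http has already accepted the connection, read and parsed the request, and scheduled the handler goroutine, so any stall before that point never shows up in the number:

package main

import (
	"log"
	"net/http"
	"time"
)

func handler(w http.ResponseWriter, r *http.Request) {
	start := time.Now() // the "internal" clock starts here...
	// ... do the actual work ...
	w.Write([]byte("ok"))
	// ...and misses everything that happened before this function was called
	// (connection accept, request parsing, GC pauses, scheduling delays).
	log.Printf("internal latency: %v", time.Since(start))
}

func main() {
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}

An external measurement (the client, a load generator, or nginx's $request_time in front) covers the whole span and therefore does catch those stalls.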
# cat access.log-20170318 | grep "17/Mar/2017:03:42:17" | awk '{ print $15,$16 }' | sort | uniq -c
2056 0.000 0.000
200 0.001 0.000
1313 0.001 0.001
3 0.002 0.001
9 0.002 0.002
5 0.003 0.003
3 0.004 0.004
4 0.005 0.005
5 0.006 0.006
4 0.007 0.007
2 0.008 0.007
5 0.008 0.008
1 0.009 0.009
I recommend watching this video; it has lots of useful information about latency measurement: https://www.youtube.com/watch?v=lJ8ydIuPFeU
My approach is usually this: when a problem like this occurs, I very quickly switch from random guessing at what the problem could be into a mode where I try to verify the mental model I have of the system. Your mental model is likely wrong, and thus it is leading you astray as to what the problem might be. So I start devising metrics that can support the mental model I have. Often, once your model is corrected, you start understanding the pathology of the system. I tend to start from the bottom and work up through the layers, trying to verify in each layer that I'm seeing behavior that isn't out of the ordinary compared to the mental model I have.
* At 4000 req/s, we are implicitly assuming that each request looks the same; otherwise that is a weak metric as an indicator of system behavior. Are they the same and do they take the same amount of work? If we log the slowest request every 5 seconds, what does it look like compared to one of the typical ones? (See the sketch after this list.)
* The 99th percentile ignores the 40 slowest queries. What do the 99.9th, 99.99th, ... and max percentiles look like?
* What lies between the external measurement and the internal measurement? Can we inject a metric for each of those?
* The operating system and environment are only doing work for us, and not for someone else (e.g. because the machine is virtualized, or some other process is running on it).
* There is enough bandwidth.
* Caches have hit/miss rates that look about right.
* The cache also caches negative responses. That is, if an element is not present in the backing store, a lookup in the cache will not fail on repeated requests and go to said backing store.
* 15% CPU load means we are spending ample amounts of time waiting. What are we waiting on?
* Start measuring foreign support systems further down the chain. Don't trust your external partners, especially if they are a network connection away. What are the latencies of the things we wait on down the line?
* Are we measuring the right thing in the internal measurements? If the window between external/internal is narrow, then chances are we are doing the wrong thing on the internal side.
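A minimal sketch of the "log the slowest request every 5 seconds" idea mentioned above, assuming a plain net/http server (the tracker variables and the trivial handler are made up for illustration):

package main

import (
	"log"
	"net/http"
	"sync"
	"time"
)

// Hypothetical tracker for the slowest request seen in the current window.
var (
	slowestMu  sync.Mutex
	slowestDur time.Duration
	slowestURL string
)

// withSlowestTracking wraps a handler and remembers the slowest request.
func withSlowestTracking(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		d := time.Since(start)
		slowestMu.Lock()
		if d > slowestDur {
			slowestDur, slowestURL = d, r.URL.String()
		}
		slowestMu.Unlock()
	})
}

func main() {
	// Every 5 seconds, log the slowest request of the window and reset.
	go func() {
		for range time.Tick(5 * time.Second) {
			slowestMu.Lock()
			log.Printf("slowest request in last 5s: %v (%s)", slowestDur, slowestURL)
			slowestDur, slowestURL = 0, ""
			slowestMu.Unlock()
		}
	}()

	http.Handle("/", withSlowestTracking(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})))
	log.Fatal(http.ListenAndServe(":8080", nil))
}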
> * The 99th percentile ignores the 40 slowest queries. What do the 99.9th, 99.99th, ... and max percentiles look like?

I have no answer to this question, and I don't know how it can help me.
Yep, that's also the main question! I log and graph nginx $request_time, and I log and graph the internal function time. What is in between I can't log; it's:
- the local network (TCP);
- work in kernel/user space;
- Go GC and other runtime work;
- the Go fasthttp machinery before my HTTP handler is called.
> * Caches have hit/miss rates that look about right.

In my application these are not true caches; in reality it's a dictionary loaded from the database and used in calculations.
> * 15% CPU load means we are spending ample amounts of time waiting. What are we waiting on?

Maybe, or maybe the 32 cores can simply handle the 4k rps. How can I find out what my app is waiting on?
> * Are we measuring the right thing in the internal measurements? If the window between external/internal is narrow, then chances are we are doing the wrong thing on the internal side.

Could you explain this?
As you can see, your hypothesis is not true: more than 99 percent of requests are really fast and take less than 1 millisecond! I'm trying to find out what happens in that remaining 1 percent!
On Sun, Mar 19, 2017 at 4:58 PM Alexander Petrovsky <askj...@gmail.com> wrote:
> * The 99th percentile ignores the 40 slowest queries. What do the 99.9th, 99.99th, ... and max percentiles look like?
> I have no answer to this question, and I don't know how it can help me.

Usually, the maximum latency is a better indicator of trouble than a 99th percentile in my experience. If you improve the worst case, then surely the other cases are likely to follow. However, there are situations where this will hurt the median (50th percentile) latency. Usually this trade-off is okay, but there are a few situations where it might not be.
> Yep, that's also the main question! I log and graph nginx $request_time, and I log and graph the internal function time. What is in between I can't log: the local network (TCP); work in kernel/user space; Go GC and other runtime work; the Go fasthttp machinery before my HTTP handler is called.

The kernel and the GC can be dynamically inspected. I'd seriously consider profiling as well, in a laboratory environment. Your hypothesis is that none of these have a discrepancy, but they may have.
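For the GC part, one common starting point (my suggestion here, not something prescribed in the thread) is to run the binary with GODEBUG=gctrace=1, which prints one line per collection to stderr, or to poll runtime.ReadMemStats and watch the recent pause times:

package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	// Poll GC statistics once per second; alternatively, running the
	// binary with GODEBUG=gctrace=1 prints one line per collection.
	var m runtime.MemStats
	for range time.Tick(time.Second) {
		runtime.ReadMemStats(&m)
		// PauseNs is a circular buffer of recent stop-the-world pause times.
		last := m.PauseNs[(m.NumGC+255)%256]
		log.Printf("GCs=%d last pause=%v total pause=%v",
			m.NumGC, time.Duration(last), time.Duration(m.PauseTotalNs))
	}
}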
> * Caches have hit/miss rates that look about right.
> In my application these are not true caches; in reality it's a dictionary loaded from the database and used in calculations.

Perhaps the code in https://godoc.org/golang.org/x/text is of use for this? It tends to be faster than maps because it utilizes compact string representations and tries. Of course, it requires you to show that the problem is with the caching sublayer first.
> * 15% CPU load means we are spending ample amounts of time waiting. What are we waiting on?
> Maybe, or maybe the 32 cores can simply handle the 4k rps. How can I find out what my app is waiting on?

blockprofile is my guess at what I would grab first. Perhaps the tracing functionality as well. You can also add metrics at each blocking point in order to get an idea of where the system is going off. Functionality like dtrace would be nice, but I'm not sure Go has it, unfortunately.
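A small sketch of turning on the block profile and exposing it through net/http/pprof (the side port 6060 is just an example):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/ handlers, including /debug/pprof/block
	"runtime"
)

func main() {
	// Record every blocking event (mutex waits, channel waits, etc.).
	// A rate of 1 is the most detailed but adds some overhead; a larger
	// value samples less aggressively.
	runtime.SetBlockProfileRate(1)

	// Serve pprof on a side port; then inspect with:
	//   go tool pprof http://localhost:6060/debug/pprof/block
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}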
> * Are we measuring the right thing in the internal measurements? If the window between external/internal is narrow, then chances are we are doing the wrong thing on the internal side.
> Could you explain this?

There may be a bug in the measurement code, so you should probably go over it again. One common fault of mine is to place the measurement around the wrong functions, so I think they are detecting more than they are. A single regular expression that is only hit in corner cases can be enough to mess with a performance profile. Another common mistake is to not have an appropriate decay parameter on your latency measurements, so that older requests eventually get removed from the latency graph [0].

In general, as the amount of work a system processes goes up, it gets more sensitive to fluctuations in latency. So even at a fairly low CPU load, you may still have some spiky behavior hidden by the smoothing of the CPU load measure, and this can contribute to added congestion.

[0] A decaying Vitter's algorithm R implementation, or Tene's HdrHistogram, is preferable. HdrHistogram is interesting in that it uses a floating-point representation for its counters: one array for exponents, one array for mantissas. It allows very fast accounting (nanoseconds) and provides precise measurements around 0 at the expense of precision at, say, 1 hour. That is usually okay, because if you waited 1 hour you don't care whether it was really 1 hour and 3 seconds, but at 1µs you really care about being precise.
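For reference, a minimal sketch of recording latencies into an HdrHistogram from Go; I'm assuming the github.com/codahale/hdrhistogram port here, which may or may not be the package you'd pick:

package main

import (
	"fmt"
	"time"

	"github.com/codahale/hdrhistogram" // one Go port of HdrHistogram (assumed)
)

func main() {
	// Track latencies between 1µs and 30s with 3 significant figures.
	h := hdrhistogram.New(1, int64(30*time.Second/time.Microsecond), 3)

	// Record a request's latency in microseconds.
	record := func(d time.Duration) {
		h.RecordValue(int64(d / time.Microsecond))
	}
	record(300 * time.Microsecond)
	record(15 * time.Millisecond)

	for _, q := range []float64{50, 99, 99.9, 99.99} {
		fmt.Printf("p%v: %dµs\n", q, h.ValueAtQuantile(q))
	}
	fmt.Printf("max: %dµs\n", h.Max())
}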
> As you can see, your hypothesis is not true: more than 99 percent of requests are really fast and take less than 1 millisecond! I'm trying to find out what happens in that remaining 1 percent!

I was probably not clear enough with my explanation. In 99% of cases net/http (or fasthttp) parsing will be very fast (a few microseconds) and won't add much to the internally measured latency. However, in 1% of cases there could be a GC stop-the-world pause, or the Go runtime decides to use the request goroutine to assist the GC, or there is some sub-optimal scheduling decision or I/O, and the request will take longer; but this will never be reflected in the internally measured time.
https://golang.org/cmd/trace/ can be used to find out what is happening inside a running Go application. If you capture a trace during an interval with request(s) taking more time than usual, then you will be able to find out what exactly takes so long (syscalls, scheduler, GC, etc.).
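A small sketch of capturing such a trace programmatically with runtime/trace (importing net/http/pprof and fetching /debug/pprof/trace?seconds=N from a running server is an alternative); the file name is just an example:

package main

import (
	"log"
	"os"
	"runtime/trace"
	"time"
)

func main() {
	// Write a 5-second execution trace to trace.out; inspect it afterwards with:
	//   go tool trace trace.out
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	time.Sleep(5 * time.Second) // ...serve traffic here instead of sleeping
	trace.Stop()
}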
Also note that there are still a few latency-related bugs in the Go runtime, e.g. https://github.com/golang/go/issues/14812, https://github.com/golang/go/issues/18155, https://github.com/golang/go/issues/18534