pprof: how to dynamically collect profile data from a live API server?


Biju

Dec 31, 2015, 2:32:05 PM
to golang-nuts
Hello all,

We are trying to track down a performance problem in our Go REST API server. Some beta users access the server through our web UI. Once in a while, some REST API calls take a long time to execute (around 10 seconds instead of the normal ~10 ms).

We are trying to leverage pprof but have not found a way to do it. Whenever we monitored via the pprof HTTP endpoints, the APIs behaved well. We would like to collect profile data (CPU, memory, and the current call stack) from the live beta server only if/when an API call runs slow. The same server is shared by multiple users and APIs, so ideally we want a solution that does not require starting, stopping, or isolating the server instance or its users. Is this possible? Could you please help?

Below is the concept we are trying to implement: collect profile data for every API call and log it if/when the request takes 2 or more seconds. The comment lines marked "FIX" are placeholders where we need help.

Same code is also available at


===================
package main

import (
    "encoding/json"
    "fmt"
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
    "time"

    "github.com/gorilla/mux"
)

func main() {
    router := mux.NewRouter()
    
    // enable pprof: forward /debug/* to the default mux, where net/http/pprof registered its handlers
    router.PathPrefix("/debug/").Handler(http.DefaultServeMux)

    router.HandleFunc("/myapi", handleAPI).Methods("GET")
    log.Fatal(http.ListenAndServe(":9090", router))
}


func handleAPI(res http.ResponseWriter, req *http.Request) {
    
    perfTracker := startPerfTracker()
    defer perfTracker.stopPerfTracker()
    
    appFunc(res, req)
}

type tracker struct {
    Start time.Time
    //FIX add field to track perf profile data
}

func startPerfTracker() *tracker {
    return &tracker{
        Start: time.Now(),
        //FIX initialize perf profile data
    }
}

func (trk *tracker) stopPerfTracker() {
    
    //FIX stop perf profile data collection
    durnSecs := time.Since(trk.Start).Seconds()
    if durnSecs >= 2 {
        // API took longer than 2 seconds
        // log CPU and Memory profile stats for this call
    }
}



type retData struct {
    StartTime string
    EndTime string
}

func appFunc(res http.ResponseWriter, req *http.Request) {
    
    res.Header().Set("Content-Type", "application/json")
    
    startTime := "Start: " + time.Now().String()
    time.Sleep(3 * time.Second)
    endTime := "End: " + time.Now().String()
    

    data := retData {
        StartTime : startTime,
        EndTime : endTime,
    }

    outgoingJSON, err := json.Marshal(data)

    if err != nil {
        log.Println(err.Error())
        http.Error(res, err.Error(), http.StatusInternalServerError)
        return
    }

    fmt.Fprint(res, string(outgoingJSON))
}
====================

Biju

Jan 14, 2016, 8:10:42 AM
to golang-nuts
Hello all,

Happy to report that we found and fixed our performance problem. It was caused by environmental issues at a third-party service we were using.

Just in case anyone else encounters a similar situation, here is how we found the root cause. I would love to hear if anyone has a better/easier solution for collecting profiler data from live servers.

Problem summary: Hard-to-reproduce performance problem in a backend API service written in Go. We had to collect profiler (pprof) data from a live server without disrupting live users. We could not use http/pprof because the problem was hard to reproduce during manual tests.

How we solved it:
* Enabled CPU and memory profiling in the production server's main function using github.com/davecheney/profile.
* Added code to back up the pprof logs before they get overwritten when the process starts.
* Added timer code at the start/end of the API entry points, as described in my last email. The timer code reported any API call that took more than one second to an error-reporting service (we use https://rollbar.com/ ; there are other options).
* Started running three load-balanced instances of our server behind nginx.
* The next time the performance problem occurred, we were notified via Rollbar. We identified the affected server instance from the error log, gracefully brought it down, and analyzed its pprof logs to narrow the problem down to the function that triggered it.

We have left the tracker code in the live server, just in case we have any issues in the future. 

Thanks!

Biju

Tamás Gulácsi

Jan 14, 2016, 8:56:42 AM
to golang-nuts
You can try https://github.com/rakyll/gom , which is almost exactly that: a continuous pprof. Maybe some persistence could be added to gom.

Theivaraj S

Oct 2, 2022, 9:45:48 PM
to golang-nuts
How do I profile a POST method for a REST API in Go?