I'm baffled by what appears to be a small memory leak in my application.
I'm hoping someone here can give me some advice beyond trying to
reproduce the problem with a smaller set of code. I'll do that if I have to,
but I'm hoping I'm just doing something obviously wrong in the code I
discuss below.
I've got a little library that was inspired by (stolen from) Rob's demo
on lexing a few years back:
I use it to make it easier to write applications that parse fixed-format records.
As part of the logic there's an Emit method that sends the current token over
a channel. It converts the token to a string and sends it back inside an Item
struct that captures the token type, its position, and the string value:
// Emit reports the current item to the client.
func (l *Lexer) Emit(t ItemType) {
	l.items <- Item{t, l.rpos - int64(l.pos-l.start), string(l.buf[l.start:l.pos])}
	l.Skip()
}
The l.buf referenced in that code is a []byte, and the Skip method either
sets l.start to l.pos, advancing the position within l.buf, or copies the
remainder of l.buf to the start of the []byte and resets l.pos and l.start
to zero.
// Skip advances over the current item without reporting it.
func (l *Lexer) Skip() {
	// We're at a point where we know we have completely read a
	// token. If we've consumed at least 90% of l.buf's capacity, shift
	// the unread content to the start of the buffer. Otherwise just
	// move l.start to the current position.
	n := cap(l.buf)
	r := n - l.pos
	if n/10 >= r {
		l.buf, l.start, l.pos = append(l.buf[0:0], l.buf[l.pos:]...), 0, 0
	} else {
		l.start = l.pos
	}
}
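To check that Emit and Skip behave as described, here is a self-contained sketch. The Item and Lexer definitions are assumptions that carry only the fields the two methods touch (in particular, rpos is assumed to be the absolute input offset of l.pos); the method bodies are the ones quoted above, modulo gofmt. It also shows that the append in Skip shifts the unread bytes within the existing backing array, so compaction itself doesn't allocate.

```go
package main

import "fmt"

// ItemType and Item are hypothetical stand-ins for the library's types.
type ItemType int

type Item struct {
	Typ ItemType
	Pos int64  // offset of the token in the whole input (assumed meaning)
	Val string // independent copy of the token bytes
}

// Lexer holds only the fields Emit and Skip use; rpos is assumed to be
// the absolute input offset corresponding to l.pos.
type Lexer struct {
	buf        []byte
	start, pos int
	rpos       int64
	items      chan Item
}

// Emit sends the current token on l.items, then advances past it.
func (l *Lexer) Emit(t ItemType) {
	l.items <- Item{t, l.rpos - int64(l.pos-l.start), string(l.buf[l.start:l.pos])}
	l.Skip()
}

// Skip compacts the buffer once at least 90% of its capacity is consumed.
func (l *Lexer) Skip() {
	n := cap(l.buf)
	r := n - l.pos
	if n/10 >= r {
		// The unread tail fits in cap, so append reuses the same array.
		l.buf, l.start, l.pos = append(l.buf[0:0], l.buf[l.pos:]...), 0, 0
	} else {
		l.start = l.pos
	}
}

func main() {
	buf := append(make([]byte, 0, 10), "0123456789"...)
	l := &Lexer{buf: buf, items: make(chan Item, 2)}
	base := &l.buf[0]

	l.pos, l.rpos = 4, 4 // pretend "0123" was scanned
	l.Emit(1)
	l.pos, l.rpos = 9, 9 // pretend "45678" was scanned; 90% consumed
	l.Emit(2)

	a, b := <-l.items, <-l.items
	fmt.Println(a.Val, a.Pos, b.Val, b.Pos, string(l.buf), &l.buf[0] == base)
	// prints: 0123 0 45678 4 9 true
}
```

Note that the first item's Val is still "0123" even though compaction overwrote buf[0] with '9': string(...) copies the bytes, so an emitted Item does not keep l.buf alive.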
In the particular application that uses the lexer library, the only time
the string is actually used is to convert it to a number with
strconv.Atoi or to a time.Time using time.Date; after that the string
(and its Item struct) fall out of reference and can be garbage
collected.
I didn't think any of this could result in a leak, but when I watch my
program over time it very slowly grows. A new lexer is created every 5 minutes,
used to parse a chunk of log file data, and then discarded. From what I've
observed, the program grows in 4-byte increments, 2 or 3 times per hour.
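Process size as reported by the OS is not the same thing as the live Go heap, so a first step worth taking is logging the runtime's own numbers once per cycle. This sketch (heapStats is a hypothetical helper) forces a collection before reading runtime.MemStats, so HeapAlloc reflects only reachable objects. runtime.NumGoroutine is included because, if the lexer runs in its own goroutine as in Rob's demo, a lexer blocked forever on l.items would keep its whole buffer reachable.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// heapStats forces a GC, then returns the bytes of reachable heap objects
// and the total bytes the runtime has obtained from the OS.
func heapStats() (heapAlloc, sys uint64) {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc, m.Sys
}

func main() {
	// In the real program this would run once per 5-minute parse cycle.
	const interval = 100 * time.Millisecond
	for i := 0; i < 3; i++ {
		live, sys := heapStats()
		fmt.Printf("live heap=%d B, from OS=%d B, goroutines=%d\n",
			live, sys, runtime.NumGoroutine())
		time.Sleep(interval)
	}
}
```

If HeapAlloc and the goroutine count stay flat across cycles while the process size creeps up, the growth is probably not a Go-level leak.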
When I look at it in the pprof profiler, I see a number of
allocations from the l.Emit method, where I generate a string
from a sub-slice of l.buf, put it into a struct, and pass it back
along a channel:
l.items <- Item{t, l.rpos - int64(l.pos-l.start), string(l.buf[l.start:l.pos])}
Am I doing something wrong in the code above, either in the
construction of the Item struct or the string? I played around with the
code to try to isolate the growth, and I believe it comes from the string
allocation. If I replace the line above with:
b := make([]byte, l.pos-l.start)
copy(b, l.buf[l.start:l.pos])
l.items <- Item{t, l.rpos - int64(l.pos-l.start), string(b)}
pprof then points to the creation of b instead. I'm confused as to why
this would be a leak, since as far as I can tell I no longer reference
the string once an iteration of my program finishes (everything falls
out of scope at that point).
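One way to see why pprof moves to b: both versions allocate on every token, and the substitute allocates twice (once for b, and again for string(b), which copies). Allocation by itself isn't a leak; memory only leaks if something keeps it reachable. This sketch counts allocations per call with testing.AllocsPerRun; direct and viaCopy are hypothetical stand-ins for the two variants of the Emit line, and the package-level sink mimics the string escaping through the channel.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is package-level so the converted strings escape to the heap,
// much as they do when an Item is sent on l.items.
var sink string

// direct mirrors string(l.buf[l.start:l.pos]) in Emit.
//
//go:noinline
func direct(buf []byte, start, pos int) {
	sink = string(buf[start:pos])
}

// viaCopy mirrors the make/copy/string(b) substitute.
//
//go:noinline
func viaCopy(buf []byte, start, pos int) {
	b := make([]byte, pos-start)
	copy(b, buf[start:pos])
	sink = string(b)
}

func main() {
	buf := make([]byte, 256)
	// A 64-byte token keeps small-buffer compiler optimizations out of the picture.
	d := testing.AllocsPerRun(1000, func() { direct(buf, 0, 64) })
	c := testing.AllocsPerRun(1000, func() { viaCopy(buf, 0, 64) })
	fmt.Printf("direct: %.0f allocs/op, via copy: %.0f allocs/op\n", d, c)
}
```

On a typical build this reports 1 and 2 allocs/op; exact counts can vary with the compiler version, which is why the token is 64 bytes and the helpers are marked noinline.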
Jim