An example of a goroutine leak and how to debug one
When I come across a function that runs a piece of code in a goroutine and uses channels for communication or cancellation, I usually take a closer look. That combination makes it easy to introduce a goroutine leak, and such leaks are easy to miss even for experienced Go developers. That's what I did with this piece of code I found in KUDO.
What is a goroutine leak?
A leaking goroutine is essentially a kind of memory leak: you start a goroutine that never terminates, so it holds on to the memory it has reserved (its stack and everything it references) for the lifetime of the process. To illustrate, I simplified the KUDO code a bit into an example of how one can introduce a goroutine leak into a project.
The code enforces a timeout and runs a periodic operation inside a goroutine. On every tick, it tries to verify some business logic (for example, asserting that something is healthy or ready); in this example, that call is simulated by sleeping and then returning.
So where is the leak? In the simplified example above, the timeout is just 1 second while the verify operation takes 10 seconds. The deadline therefore fires first and `waitReady` returns. About 9 seconds later, our goroutine receives the result from the verifier and tries to write it into `doneChan`. A send on an unbuffered channel blocks until another goroutine receives from it, and nobody is listening on that channel anymore because `waitReady` has already returned. The goroutine stays blocked on that send forever, and there's our leak!
How to find out if you have a goroutine leak?
Generally speaking, the `runtime` package is your friend here. One way is to call `runtime.NumGoroutine()` in a test before and after calling the `waitReady` function. If, once any in-flight work has had time to finish, the number of goroutines is higher than it was before `waitReady`, you have a leak.
Another option is to use goleak, a library from Uber. If you dig into its implementation, it also relies on the `runtime` package: this time it reads the stacks of all goroutines (the `runtime.Stack` function) and adds a couple of convenience functions on top.