Talk: Introduction to Go Profiling: Tools
This is a written version of a talk I gave on profiling in Go — what tools exist, how to use them, and how to read the output.
What profiling is (and isn’t)
If you’re already doing observability (logs, metrics, traces), profiling fills in the gap: it tells you where your application spends CPU time, allocates memory, or blocks on locks. It’s the difference between knowing your service is slow and knowing why it’s slow.
Go supports several profile types:
- CPU — where execution time goes
- Heap — where memory is allocated
- Goroutine — all current goroutines and their stacks
- Block — where goroutines wait on synchronization
- Mutex — lock contention
Three ways to collect profiles
1. Benchmark tests
The testing package can write profiles during benchmarks:
func BenchmarkGenerateRandomString(b *testing.B) {
	for i := 0; i < b.N; i++ {
		GenerateRandomString(10)
	}
}
go test -bench=. -cpuprofile=cpu.out
go test -bench=. -memprofile=mem.out
go test -bench=. -blockprofile=block.out
go test -bench=. -mutexprofile=mutex.out
Then analyze:
go tool pprof cpu.out
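The benchmark assumes a GenerateRandomString helper that the talk doesn't show. A throwaway sketch, purely so the example has something to measure:

```go
package main

import (
	"fmt"
	"math/rand"
)

const letters = "abcdefghijklmnopqrstuvwxyz"

// GenerateRandomString is a hypothetical stand-in for the function under
// benchmark: it builds a random string of n lowercase letters.
func GenerateRandomString(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[rand.Intn(len(letters))]
	}
	return string(b)
}

func main() {
	fmt.Println(GenerateRandomString(10))
}
```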
2. runtime/pprof
For standalone programs where you want to control exactly when profiling starts and stops:
import (
	"log"
	"os"
	"runtime/pprof"
)
func startCPUProfile() (*os.File, error) {
	f, err := os.Create("cpu.prof")
	if err != nil {
		return nil, err
	}
	if err := pprof.StartCPUProfile(f); err != nil {
		f.Close()
		return nil, err
	}
	return f, nil
}
func main() {
	f, err := startCPUProfile()
	if err != nil {
		log.Fatal(err)
	}
	// Defers run last-in-first-out: stop the profile (which flushes
	// the data) before closing the file.
	defer f.Close()
	defer pprof.StopCPUProfile()
	// your application logic
	writeHeapProfile()
}
func writeHeapProfile() {
	f, err := os.Create("heap.prof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.WriteHeapProfile(f); err != nil {
		log.Fatal(err)
	}
}
3. net/http/pprof
For long-running services, expose profiles over HTTP:
import (
	"log"
	"net/http"
	_ "net/http/pprof"
)
func main() {
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()
	// your application logic
}
Then pull profiles on demand:
# browser
open http://localhost:6060/debug/pprof/
# CPU (30 second sample)
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
# heap
go tool pprof http://localhost:6060/debug/pprof/heap
# goroutines
go tool pprof http://localhost:6060/debug/pprof/goroutine
# blocking
go tool pprof http://localhost:6060/debug/pprof/block
# mutex contention
go tool pprof http://localhost:6060/debug/pprof/mutex
Reading the output
Interactive mode
go tool pprof cpu.prof
The commands I use most:
- top — functions using the most resources
- top -cum — same, but cumulative (includes callees)
- list <function> — annotated source for a specific function
- web — call graph in the browser (needs graphviz)
Visualization
# flame graph in the browser
go tool pprof -http=:8080 cpu.prof
# PNG call graph
go tool pprof -png cpu.prof > cpu.png
# diff two profiles
go tool pprof -base=old.prof new.prof
What top output means
flat flat% sum% cum cum%
1.5s 50.00% 50.00% 2.0s 66.67% main.processData
0.5s 16.67% 66.67% 0.5s 16.67% runtime.mallocgc
0.3s 10.00% 76.67% 0.8s 26.67% main.parseInput
0.2s 6.67% 83.33% 0.3s 10.00% encoding/json.Unmarshal
flat = time in the function itself. cum = time in the function plus everything it calls. In the example, main.processData spends 1.5s in its own code and another 0.5s in its callees, for 2.0s cumulative.
Heap profiles
A heap profile carries two views: memory currently in use (live objects at the moment of the snapshot) and everything allocated over the program's lifetime, including memory already freed. Pick the view with a flag:
curl http://localhost:6060/debug/pprof/heap > heap.prof
# current memory
go tool pprof -inuse_space heap.prof
# total allocations over time
go tool pprof -alloc_space heap.prof
Beyond the basics
Continuous profiling
For production, there are tools that collect profiles continuously with low overhead (<1%): Google Cloud Profiler, Datadog, Pyroscope, Parca.
eBPF profiling
eBPF lets you profile without code changes, but you lose Go-specific context — it can’t attribute samples to Go source as accurately. Useful for system-level stuff. Tools: perf, bpftrace, Parca in eBPF mode.
Profile-guided optimization
Go 1.20+ can use CPU profiles to guide compiler optimizations:
go build -pgo=default.pgo
Common scenarios
Memory leaks
go tool pprof http://localhost:6060/debug/pprof/heap
(pprof) top -cum
(pprof) list <suspicious_function>
CPU hotspots
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
(pprof) top
(pprof) web
Goroutine leaks
curl "http://localhost:6060/debug/pprof/goroutine?debug=1"
go tool pprof http://localhost:6060/debug/pprof/goroutine
Lock contention
runtime.SetMutexProfileFraction(5) // enable in your code first
go tool pprof http://localhost:6060/debug/pprof/mutex
Things to watch out for
- Profile optimized builds. Don't profile with -gcflags debug flags.
- CPU profiles need time — 30 seconds minimum to be representative.
- The compiler inlines small functions, which affects where samples get attributed.
- net/http/pprof is safe for production. Profiling only happens when you request it.
- Use -benchmem with benchmarks to see allocation counts: go test -bench=. -benchmem
- GODEBUG=gctrace=1 prints GC stats, useful alongside heap profiles.
Resources
- Profiling Go programs (official tutorial)
- Go pprof blog post
- google/pprof on GitHub
- Efficient Go by Bartłomiej Płotka
- Dave Cheney’s performance workshop