We recently had an opportunity to compare the garbage collection (GC) behaviour of a small Go application at different heap sizes. This application was built to migrate data away from a legacy datastore. The application contains an in-memory cache which can be configured to clear itself at different rates. This means we can look at the command during a GC cycle at various heap sizes and compare the impact of this on GC behaviour.
We found unexpected and severe whole system pauses during the garbage collector’s mark phase. This was an surprising, as Go widely advertises itself as having a low pause garbage collector. In particular we found that the expected ‘stop the world’ pause was very short, but that whole system pauses experienced during the mark phase were several orders of magnitude longer. Observing that these pauses exist is important for understanding why we see such a large performance impact from GC cycles when there are plenty of CPU resources available and the reported pauses are very short.
In this post we will detail what we saw during this experiment and discuss how this impacts the way we investigate GC performance for all of our systems going forward.
It is worth noting that the system measured here was compiled with Go version 1.10. As the GC algorithm is under constant development these effects may disappear in later releases.