Cache friendliness and performance
Cache friendliness and performance is a very common topic among systems programmers. There are numerous articles explaining CPU architecture and why cache friendliness matters available on the internet. In this blog post, I would like to showcase a simple program that shows the impact on cache friendliness on performance of programs.
Simple counter array
The following program has an integer array of size 8 and 8 concurrent workers incrementing the count 1 billion times independently.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
package main
import "fmt"
import "sync"
import "time"
const concurrency = 8
var counts [concurrency]int64
func main() {
wg := new(sync.WaitGroup)
t0 := time.Now()
for i := 0; i < concurrency; i++ {
wg.Add(1)
go func(offset int) {
defer wg.Done()
for j := 0; j < 1000000000; j++ {
counts[offset]++
}
}(i)
}
wg.Wait()
fmt.Printf("Duration %v\n", time.Since(t0))
fmt.Printf("Counts: %v\n", counts)
}
Most systems have a cacheline size of 64bytes. Now, let us make a minor modification to the program by applying the following patch:
-var counts [concurrency]int64
+var counts [CACHELINE_SIZE * concurrency]int64
-counts[offset]++
+counts[CACHELINE_SIZE*offset]++
Performance has improved by 1.6x! The second program avoids false sharing and avoids performance degradation. If the size of two independent variables are less than the cacheline, these two variables could be placed in the same cacheline. If two concurrent routines modify these variables placed in the same cacheline, modification of one variable will lead to invalidation of other variable in the cache. This can lead to severe performance degradation.