If this code could walk

You Don't Need a Library for File Walking in Go

Some time ago I tested multiple Go file walk implementations on my personal blog.

At the time I was building scc and quicly discovered one of the main bottlenecks in the application was walking the file tree finding files to process. This resulted in me exploring other file walk implementations for Go in order to remove the bottleneck.

The conclusion I came to was to use github.com/karrick/godirwalk which was much faster than any of the other implementations at the time. That conclusion was written in 2018 however, and there have been multiple Go releases since then including performance changes related to files https://golang.org/doc/go1.16#os As such it seemed like a good time to revisit the tests and see if anything has changed.

All code for the tests can be found on Kablamo's GitHub. The tests were written to recursively count the number of files and directories from where the application was run and print the total count at the end. The output for each application should be identical, and this was confirmed for all tests run. Because some walk implementations use goroutines atomic.AddInt64 operations were used when counting for all implementations in order to keep the comparison as close as posible.

Walking the Linux Kernel

The first test was run against a shallow checkout of the linux kernel commit d999ade1cc86cd2951d41c11ea769cb4452c8811 with the excellent hyperfine command line tool used to run the tests. All code was compiled using Go 1.17 and run on a MacBook Air 2020 M1 with 16 GB of RAM.

bash
1
# what version of Go compiled everything?
2
3
$ go version
4
go version go1.17 darwin/arm64
5
6
# verify the output of each application is the same
7
8
$ filepathwalk && filepathwalkdir && iafan && karrick && michealtjones && readdir
9
78498
10
78498
11
78498
12
78498
13
78498
14
78498
15
16
# run the benchmark using hyperfine
17
18
$ hyperfine 'filepathwalk' 'filepathwalkdir' 'iafan' 'karrick' 'michealtjones' 'readdir'
19
Benchmark #1: filepathwalk
20
Time (mean ± σ): 236.4 ms ± 3.3 ms [User: 54.7 ms, System: 186.7 ms]
21
Range (min … max): 232.9 ms … 246.0 ms 12 runs
22
23
Benchmark #2: filepathwalkdir
24
Time (mean ± σ): 141.0 ms ± 1.3 ms [User: 42.5 ms, System: 101.1 ms]
25
Range (min … max): 139.2 ms … 145.9 ms 20 runs
26
27
Benchmark #3: iafan
28
Time (mean ± σ): 93.5 ms ± 3.4 ms [User: 117.0 ms, System: 526.8 ms]
29
Range (min … max): 91.4 ms … 110.0 ms 26 runs
30
31
Benchmark #4: karrick
32
Time (mean ± σ): 263.4 ms ± 1.5 ms [User: 57.0 ms, System: 213.5 ms]
33
Range (min … max): 260.9 ms … 266.4 ms 11 runs
34
35
Benchmark #5: michealtjones
36
Time (mean ± σ): 91.8 ms ± 3.8 ms [User: 106.7 ms, System: 549.3 ms]
37
Range (min … max): 90.0 ms … 109.6 ms 26 runs
38
39
Benchmark #6: readdir
40
Time (mean ± σ): 117.8 ms ± 2.8 ms [User: 22.5 ms, System: 92.9 ms]
41
Range (min … max): 112.1 ms … 124.3 ms 23 runs
42
43
Summary
44
'michealtjones' ran
45
1.02 ± 0.06 times faster than 'iafan'
46
1.28 ± 0.06 times faster than 'readdir'
47
1.54 ± 0.07 times faster than 'filepathwalkdir'
48
2.58 ± 0.11 times faster than 'filepathwalk'
49
2.87 ± 0.12 times faster than 'karrick'
50

graphed results

The results show that the parallel implementations now edge out all other implementations. This is the opposite of what I experienced in 2018.

Interestingly what was the fastest implementaton in 2018 github.com/karrick/godirwalk is now the slowest. It also has the biggest difference in API implementaion and as such should probably not be used anymore. Pleasingly filepath.Walk has improved a lot since I last tried this, and is now fast enough that it should be acceptable for most use cases.

For the remaining implementations the margin of difference is not as large as you would expect when compared to the new ReadDir implementation added in Go 1.16.

Walking Over My Code

Wondering if the results might be due to not having enough data to work with, I tried the implementations against all my projects. With just shy of 180,000 files in different directories, it's about 5.2 GB of content.

bash
1
# whats the size of the content we are about to look over
2
3
$ du -hs
4
5.2G .
5
6
# independently verify how many files we should be scanning
7
8
$ find . | wc -l
9
179709
10
11
# verify the output of each application is the same
12
13
$ filepathwalk && filepathwalkdir && iafan && karrick && michealtjones && readdir
14
179709
15
179709
16
179709
17
179709
18
179709
19
179709
20
21
# run the benchmark using hyperfine
22
23
$ hyperfine 'filepathwalk' 'filepathwalkdir' 'iafan' 'karrick' 'michealtjones' 'readdir'
24
Benchmark #1: filepathwalk
25
Time (mean ± σ): 530.7 ms ± 7.9 ms [User: 150.7 ms, System: 398.9 ms]
26
Range (min … max): 525.2 ms … 550.5 ms 10 runs
27
28
Benchmark #2: filepathwalkdir
29
Time (mean ± σ): 346.6 ms ± 21.5 ms [User: 133.6 ms, System: 230.4 ms]
30
Range (min … max): 329.2 ms … 401.8 ms 10 runs
31
32
Benchmark #3: iafan
33
Time (mean ± σ): 303.7 ms ± 47.7 ms [User: 290.7 ms, System: 1136.0 ms]
34
Range (min … max): 268.3 ms … 435.5 ms 10 runs
35
36
Benchmark #4: karrick
37
Time (mean ± σ): 1.750 s ± 0.017 s [User: 339.9 ms, System: 1434.1 ms]
38
Range (min … max): 1.730 s … 1.784 s 10 runs
39
40
Benchmark #5: michealtjones
41
Time (mean ± σ): 276.3 ms ± 18.3 ms [User: 271.4 ms, System: 1229.3 ms]
42
Range (min … max): 263.3 ms … 308.7 ms 10 runs
43
44
Benchmark #6: readdir
45
Time (mean ± σ): 268.7 ms ± 1.7 ms [User: 60.1 ms, System: 219.6 ms]
46
Range (min … max): 266.6 ms … 271.6 ms 11 runs
47
48
Summary
49
'readdir' ran
50
1.03 ± 0.07 times faster than 'michealtjones'
51
1.13 ± 0.18 times faster than 'iafan'
52
1.29 ± 0.08 times faster than 'filepathwalkdir'
53
1.97 ± 0.03 times faster than 'filepathwalk'
54
6.51 ± 0.08 times faster than 'karrick'
55

graphed results second run

This shows that the lower overhead promised in Go 1.16 is indeed the case. The new implementation is the fastest here, and that's without appling goroutines which could yield more performance for those willing to do so.

Go's Shoes Are Made For Walking

In 2021 the Go native file walk implementations beat any library. It's clear that filepath.Walk has improved enough that its no slower than any other similar implementaion I tried, and the new ReadDir implementation is as fast or faster than any other implementation I tried even without Go routines. It's wonderful to see such baseline performance improvements applied to Go, and it seems even a simple recompile with the latest Go version will give some impressive disk performance improvements.