Go 1.1 performance improvements

Comments:"Go 1.1 performance improvements | Dave Cheney"

URL:http://dave.cheney.net/2013/05/21/go-1-1-performance-improvements

This is the first in a series of articles analysing the performance improvements in the Go 1.1 release.

It has been reported (here, and here) that performance improvements of 30-40% are available simply by recompiling your code under Go 1.1. For linux/amd64 this holds true for a wide spectrum of benchmarks. For platforms like linux/386 and linux/arm the results are even more impressive, but I’m putting the cart before the horse.

A note about gccgo. This series focuses on the contributions that the improvements to the gc series of compilers (5g, 6g and 8g) have made to Go 1.1's performance. gccgo benefits indirectly from these improvements as it shares the same runtime and standard library, but is not the focus of this benchmarking series.

Go 1.1 features several improvements in the compilers, runtime and standard library that are directly responsible for the resulting improvements in program speed. Specifically:

Code generation improvements across all three gc compilers, including better register allocation, reduction in redundant indirect loads, and reduced code size.
Improvements to inlining, including inlining of some builtin function calls and compiler generated stub methods when dealing with interface conversions.
Reduction in stack usage, which reduces pressure on stack size, leading to fewer stack splits.
Introduction of a parallel garbage collector. The collector remains mark and sweep, but the phases can now utilise all CPUs.
More precise garbage collection, which reduces the size of the heap, leading to lower GC pause times.
A new runtime scheduler which can make better decisions when scheduling goroutines.
Tighter integration of the scheduler with the net package, leading to significantly decreased packet processing latencies and higher throughput.
Parts of the runtime and standard library have been rewritten in assembly to take advantage of specific bulk move or crypto instructions.

Introducing autobench

Few things irk me more than unsubstantiated, unrepeatable benchmarks. As this series is going to throw out a lot of numbers, and draw some strong conclusions, it was important for me to provide a way for people to verify my results on their machines.

To this end I have built a simple make based harness which can be run on any platform that Go supports to compare the performance of a set of synthetic benchmarks against Go 1.0 and Go 1.1. While the project is still being developed, it has generated a lot of useful data which is captured in the repository. You can find the project on GitHub.

https://github.com/davecheney/autobench

I am indebted to Go community members who submitted benchmark data from their machines allowing me to make informed conclusions about the relative performance of Go 1.1.

If you are interested in participating in autobench there will be a branch which tracks the performance of Go 1.1 against tip opening soon.

A picture speaks a thousand words

To better visualise the benchmark results, AJ Starks has produced a wonderful tool, benchviz, which turns the dry text based output of misc/benchcmp into rather nice graphs. You can read all about benchviz on AJ’s blog.

http://mindchunk.blogspot.com.au/2013/05/visualizing-go-benchmarks-with-benchviz.html

Following a tradition set by the misc/benchcmp tool, improvements, be they a reduction in run time or an increase in throughput, are shown as bars extending towards the right. Regressions fall back to the left.
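To make the notion of an improvement or regression concrete, here is a minimal sketch of the relative change between two timings. The function name percentDelta and the 100 to 65 ns/op figures are hypothetical, chosen only for illustration; this is not the code of misc/benchcmp or benchviz. For ns/op a negative delta is an improvement, while for a throughput figure such as MB/s the sign reads the other way.

```go
package main

import "fmt"

// percentDelta returns the relative change from before to after,
// expressed as a percentage of before. Negative means after is smaller.
func percentDelta(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	// Hypothetical benchmark that dropped from 100 ns/op to 65 ns/op.
	fmt.Printf("%+.2f%%\n", percentDelta(100, 65)) // prints "-35.00%"

	// The Append figures quoted later in this post: 36 ns/op to 100 ns/op.
	fmt.Printf("%+.2f%%\n", percentDelta(36, 100)) // prints "+177.78%"
}
```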

Go 1 benchmarks on linux/amd64

The remainder of this post will focus on linux/amd64 performance. The 6g compiler is considered to be the flagship of the gc compiler suite. In addition to code generation improvements in the front and back ends, performance critical parts of the standard library and runtime have been rewritten in assembly to take advantage of SSE2 instructions.

The data for the remainder of this article is taken from the results file linux-amd64-d5666bad617d-vs-e570c2daeaca.txt.

The go1 benchmark suite, while being a synthetic benchmark, attempts to capture some real world usages of the main packages in the standard library. In general the results support the hypothesis of a broad 30-40% improvement. Looking at the results submitted to the autobench repository, it is clear that GobDecode and Gzip have regressed; issues 5165 and 5166 have been raised, respectively. In the latter case, the switch to 64 bit ints is assumed to be at least partially to blame.

net/http benchmarks

This set of benchmarks is extracted from the net/http package and demonstrates the work that Brad Fitzpatrick, Dmitry Vyukov and many others have put into the net and net/http packages.

Of note in this benchmark set are the improvements in the ReadRequest benchmarks, which attempt to measure the decoding of an HTTP request. The improvements in the ClientServerParallel benchmarks are not currently available across all amd64 platforms, as some of them have no support for the new runtime integration with the net package. Finishing support for the remaining BSD and Windows platforms is a focus for the 1.2 cycle.
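For readers who have not looked at this kind of benchmark, the following is a rough sketch of timing the decoding of one HTTP request with http.ReadRequest. It is not the benchmark from the net/http package; the request text and the helper parseRequest are made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
	"testing"
)

// rawRequest is a made-up request in wire format, used only for illustration.
const rawRequest = "GET /index.html HTTP/1.1\r\n" +
	"Host: example.com\r\n" +
	"User-Agent: sketch\r\n" +
	"\r\n"

// parseRequest decodes a single HTTP request from its wire form.
func parseRequest(raw string) (*http.Request, error) {
	return http.ReadRequest(bufio.NewReader(strings.NewReader(raw)))
}

func main() {
	// testing.Benchmark picks b.N and reports the time per iteration.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if _, err := parseRequest(rawRequest); err != nil {
				b.Fatal(err)
			}
		}
	})
	fmt.Println("ReadRequest sketch:", res)
}
```

Building the same file with two Go releases and comparing the reported ns/op gives a small-scale version of the comparison made here.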

Runtime microbenchmarks

The final set of benchmarks presented here is extracted from the runtime package.

They are micro benchmarks of very low level parts of the runtime.

The obvious regression is the first Append benchmark. Its wall time has increased from 36 ns/op to 100 ns/op, which shows that for some append use cases there has been a regression. This may have already been addressed in tip by CL 9360043.
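As an illustration of what an append micro benchmark can look like, here is a minimal sketch. It is not the Append benchmark from the runtime package; the helper appendN and the slice length of 8 are assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"testing"
)

// appendN grows a nil slice one element at a time, so append has to
// reallocate whenever the current capacity runs out.
func appendN(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	// testing.Benchmark picks b.N and reports the time per iteration.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			appendN(8)
		}
	})
	fmt.Println("append sketch:", res)
}
```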

The big wins in the runtime benchmarks are the amazing new map code by khr which addresses issue 3886, the reduction in overhead of channel operations (thanks to Dmitry’s new scheduler), improvements in complex128 operations, and speedups in hash and memmove operations which were rewritten in 64 bit assembly.

Conclusion

For linux/amd64 on modern 64 bit Intel CPUs, the 6g compiler and runtime can generate significantly faster code. Other amd64 platforms share similar speedups, although the specific improvements vary. I encourage you to review the benchmark data in the autobench repository and, if you are able, submit your own results.

In subsequent articles I will investigate the performance improvements Go 1.1 brings to the 386 and arm platforms.
