Comments:"Escape Velocity · GitHub"
URL:https://github.com/blog/1475-escape-velocity
We work very hard to keep GitHub fast. Ruby is not the fastest programming language, so we go to great lengths benchmarking and optimizing our large codebase: our goal is to keep bringing down response times for our website even as we add more features every day. Usually this means thinking about and implementing new features with a deep concern for performance (our motto has always been "It's not fully shipped until it's fast"), but sometimes optimizing means digging deep into the codebase to find old pieces of code that are not as performant as they could be.
The key to performance tuning is always profiling, but unfortunately the current situation when it comes to profiling under Ruby/MRI is not ideal. We've been using @tmm1's experimental rblineprof for it. This little bundle of programming joy hooks into the Ruby VM and traces the stack of your process at a high frequency. This way, as your Ruby code executes, rblineprof can gather the accumulated time spent on each line of your codebase, and dump informative listings with the data. This is incredibly useful for finding hotspots in any Ruby application and optimizing them away.
Last week, we traced a standard request to our main Rails app trying to find bottlenecks in our view rendering code, and got some interesting results:
# File actionpack/lib/action_view/helpers/url_helper.rb, line 231
| def link_to(*args, &block)
0.2ms ( 897) | if block_given?
0.0ms ( 1) | options = args.first || {}
0.0ms ( 1) | html_options = args.second
0.2ms ( 3) | concat(link_to(capture(&block), options, html_options))
| else
0.3ms ( 896) | name = args.first
1.1ms ( 896) | options = args.second || {}
0.9ms ( 896) | html_options = args.third
|
-> 18.9ms ( 896) | url = url_for(options)
|
| if html_options
9.6ms ( 887) | html_options = html_options.stringify_keys
| href = html_options['href']
0.9ms ( 887) | convert_options_to_javascript!(html_options, url)
-> 66.4ms ( 887) | tag_options = tag_options(html_options)
| else
| tag_options = nil
| end
|
| href_attr = "href=\"#{url}\"" unless href
6.8ms ( 1801) | "<a #{href_attr}#{tag_options}>#{ERB::Util.h(name || url)}</a>".html_safe
| end
| def tag_options(options, escape = true)
0.2ms ( 977) | unless options.blank?
| attrs = []
| if escape
59.1ms ( 974) | options.each_pair do |key, value|
3.9ms ( 3612) | if BOOLEAN_ATTRIBUTES.include?(key)
0.0ms ( 1) | attrs << %(#{key}="#{key}") if value
| else
-> 44.4ms (10971) | attrs << %(#{key}="#{escape_once(value)}") if !value.nil?
| end
| end
| else
0.0ms ( 9) | attrs = options.map { |key, value| %(#{key}="#{value}") }
| end
6.5ms ( 3908) | " #{attrs.sort * ' '}".html_safe unless attrs.empty?
| end
| end
Surprisingly enough, the biggest hotspots in the view rendering code all had the same origin: the escape_once helper that performs HTML escaping for insertion into the view. Digging into the source code for that method, we saw that it was indeed not optimal:
def escape_once(html)
  ActiveSupport::Multibyte.clean(html.to_s).gsub(/[\"><]|&(?!([a-zA-Z]+|(#\d+));)/) { |special| ERB::Util::HTML_ESCAPE[special] }
end
escape_once performs a regex replacement (with a rather complex regex), with a table lookup in Rubyland for each replaced character. This means expensive regex matching on every call, plus a lot of temporary Ruby objects that the garbage collector will have to free later on.
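To make that cost concrete, here is a minimal standalone sketch of the same regex-based approach that runs on plain Ruby. It is an approximation, not the Rails source: the `escape_once_slow` name and the local `HTML_ESCAPE` table are assumptions for illustration, and the `ActiveSupport::Multibyte.clean` step is left out so that no Rails dependency is needed.

```ruby
# Local stand-in for the ERB::Util::HTML_ESCAPE table (assumed contents).
HTML_ESCAPE = { '&' => '&amp;', '>' => '&gt;', '<' => '&lt;', '"' => '&quot;' }.freeze

# Matches ", >, <, or any & that does not already start an entity.
ESCAPE_ONCE_REGEX = /["><]|&(?!([a-zA-Z]+|(#\d+));)/

# Hypothetical name; mirrors the shape of the method quoted above.
def escape_once_slow(html)
  # One block call and one hash lookup per match. gsub always builds a
  # new String, even when the input contains nothing to escape.
  html.to_s.gsub(ESCAPE_ONCE_REGEX) { |special| HTML_ESCAPE[special] }
end

escaped      = escape_once_slow('1 < 2 &amp; "3"')
clean_input  = 'octocat'
clean_output = escape_once_slow(clean_input)

puts escaped
puts clean_output.equal?(clean_input)
```

The second `puts` shows the part that hurts in a view render: even a string with nothing to escape comes back as a freshly allocated copy.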
Introducing Houdini, the Escapist
Houdini is a set of C APIs for performing escaping for the web. This includes HTML, hrefs, JavaScript, URIs/URLs, and XML. It also performs unescaping, but we don't talk about that because it spoils the joke on the project name. It has been designed with a focus on security (both ensuring the proper and safe escaping of all input strings, and avoiding buffer overflows or segmentation faults), but it is also highly performant.
Houdini uses different approaches for escaping and unescaping different data types: for instance, when unescaping HTML, it uses a perfect hash (generated at compile time) to match every HTML entity with the character it represents. When escaping HTML, it uses a lookup table to output escaped entities without branching, and so on.
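The lookup-table idea can be sketched in Ruby, with the caveat that this is only an illustration of the data structure and not Houdini's C code. The `table_escape` name and the set of escaped characters are assumptions, and unlike the C version this sketch still branches on every byte.

```ruby
# 256-entry table indexed by byte value; nil means "copy the byte as-is".
# The set of escaped characters here is an assumption for illustration.
ESCAPE_TABLE = Array.new(256)
ESCAPE_TABLE['&'.ord] = '&amp;'
ESCAPE_TABLE['<'.ord] = '&lt;'
ESCAPE_TABLE['>'.ord] = '&gt;'
ESCAPE_TABLE['"'.ord] = '&quot;'
ESCAPE_TABLE.freeze

# Hypothetical helper: escapes a string by indexing the table per byte.
def table_escape(str)
  out = String.new # empty binary string, so raw bytes can be appended
  str.each_byte do |byte|
    replacement = ESCAPE_TABLE[byte]
    if replacement
      out << replacement
    else
      out << byte # Integer append on a binary string adds a single byte
    end
  end
  # All escaped characters are ASCII, so multibyte sequences pass through
  # untouched and the original encoding can be restored.
  out.force_encoding(str.encoding)
end

table_result = table_escape("caf\u00e9 <b>\"hi\"</b> & more")
puts table_result
```

Indexing by byte value replaces both the regex match and the hash lookup with a single array access per input byte.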
We wrote Houdini as a C library so we could reuse it from the many programming languages we use internally at GitHub. The first implementation using them is @brianmario's EscapeUtils gem, whose custom internal escaping functions were discarded and replaced with Houdini's API, while keeping the well-known and stable external API.
We had been using EscapeUtils in some places of our codebase already, so it was an obvious choice to simply replace the default escape_once with a call to EscapeUtils.escape_html and see if we could reap any performance benefits.
When it comes to real-world performance in Ruby programs, EscapeUtils' biggest advantage (besides the clearly performant C implementation behind it) is that Houdini is able to lazily escape strings. This means it will only allocate memory for the resulting string if the input contains escapable characters. Otherwise, it will flag the string as clean and return the original version, with no extra allocations and no objects for the GC to clean up. This is a massive performance win on an average Rails view render, which escapes thousands of small strings, most of which don't need to be escaped at all.
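The lazy behavior described above can be imitated in pure Ruby. This is a sketch of the idea only, not how EscapeUtils or Houdini implement it; `lazy_escape`, `NEEDS_ESCAPING` and `ENTITIES` are hypothetical names, and `String#match?` assumes Ruby 2.4 or later.

```ruby
# Characters that force an escaped copy to be built (assumed set).
NEEDS_ESCAPING = /[&<>"]/
ENTITIES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;' }.freeze

# Hypothetical helper illustrating lazy escaping.
def lazy_escape(str)
  # Fast path: nothing to escape, so hand back the very same object.
  return str unless str.match?(NEEDS_ESCAPING)

  # Slow path: only now is a new String allocated.
  str.gsub(NEEDS_ESCAPING, ENTITIES)
end

clean = 'octocat'
dirty = '<script>alert("x")</script>'

same_object  = lazy_escape(clean).equal?(clean)
escaped_copy = lazy_escape(dirty)

puts same_object
puts escaped_copy
```

For the clean string, the caller gets its own object back, so there is nothing new for the garbage collector to track.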
The result
Once the escaping method was replaced with a call to EscapeUtils, we ran our helpful ./script/bench. This benchmarking script allows us to compare different branches of our main app and different Ruby versions or VM tweaks to see if the optimizations we are performing have any effect. It runs a specified number of GETs on any route of the Rails app, and measures the average time per request and the number of Ruby objects allocated.
$ BENCH_RUBIES="tcs" BENCH_BRANCHES="master faster-erb" ./script/bench --user tmm1 --url /github/rails -n 250
-- Switching to origin/master
250 requests to https://github.com/github/rails as tmm1
cpu time: 57,972ms total (231ms avg/req, 223ms - 262ms)
allocations: 73,387,940 objs total (293,551 objs avg/req, 293,550 objs - 293,558 objs)
-- Switching to origin/faster-erb
250 requests to https://github.com/github/rails as tmm1
cpu time: 46,525ms total (186ms avg/req, 181ms - 221ms)
allocations: 68,251,672 objs total (273,006 objs avg/req, 273,005 objs - 273,013 objs)
Not bad at all! Just by replacing the escaping function with a more optimized one, we've reduced the average request time by 45ms (from 231ms to 186ms), and we're allocating roughly 20,000 fewer Ruby objects per request. That was a lot of escaped HTML right there!
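The two figures reported by the benchmark, average CPU time and average object allocations per run, can be approximated for any block of Ruby code. The sketch below is not ./script/bench, which is internal to GitHub; `measure` is a hypothetical helper, and it assumes Ruby 2.2 or later for `GC.stat(:total_allocated_objects)`.

```ruby
# Hypothetical helper: runs the block `runs` times and reports the average
# CPU time and the average number of objects allocated per run.
def measure(runs)
  objs_before = GC.stat(:total_allocated_objects)
  cpu_before  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)

  runs.times { yield }

  cpu_after  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
  objs_after = GC.stat(:total_allocated_objects)

  {
    avg_ms: (cpu_after - cpu_before) * 1000.0 / runs,
    avg_objects: (objs_after - objs_before) / runs
  }
end

# Stand-in workload: each run allocates one Array and 100 Strings.
result = measure(250) { Array.new(100) { |i| "row #{i}" } }

puts format('%.3fms avg/run, %d objs avg/run', result[:avg_ms], result[:avg_objects])
```

Running the same workload before and after a change gives a comparison in the same spirit as the master versus faster-erb numbers above, though on a much smaller scale.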
rblineprof is still experimental, but if you're working with Ruby, make sure to check it out: @tmm1 has just added support for Ruby 2.0.
And for those of you not running Ruby, we are also open-sourcing the escaping implementation we're now using in GitHub.com as a C library, so you can wrap it and use it from your language of choice. You can find it at vmg/houdini.