Comments:" Ruby 2.1: Profiling Ruby · computer talk by @tmm1 "
URL:http://tmm1.net/ruby21-profiling/
Ruby 2.1 is shipping with rb_profile_frames()
, a new C-api for fetching ruby backtraces. The api performs no allocations and adds minimal cpu overhead making it ideal for profiling, even in production environments.
I've implemented a sampling callstack profiler for 2.1 called stackprof with this API, using techniques popularized by gperftools and @brendangregg. It works remarkably well, and provides incredible insight into the execution of your code.
For example, I recently used StackProf::Middlware
on one of our production github.com unicorn workers. The resulting profile is analyzed using bin/stackprof
:
$ stackprof data/stackprof-cpu-4120-1384979644.dump --text --limit 4================================== Mode: cpu(1000) Samples: 9145 (1.25% miss rate) GC: 448 (4.90%)================================== TOTAL (pct) SAMPLES (pct) FRAME 236 (2.6%) 231 (2.5%) String#blank? 546 (6.0%) 216 (2.4%) ActiveRecord::ConnectionAdapters::Mysql2Adapter#select 212 (2.3%) 199 (2.2%) Mysql2::Client#query_with_timing 190 (2.1%) 155 (1.7%) ERB::Util#html_escape
Right away, we see that 2.6% of cpu time in the app is spent in String#blank?
.
Let's zoom in for a closer look:
$ stackprof data/stackprof-cpu-4120-1384979644.dump --method 'String#blank?'String#blank? (lib/active_support/core_ext/object/blank.rb:80) samples: 231 self (2.5%) / 236 total (2.6%) callers: 112 ( 47.5%) Object#present? code: | 80 | def blank? 187 (2.0%) / 187 (2.0%) | 81 | self !~ /[^[:space:]]/ | 82 | end
We see that half the calls into blank?
are coming from Object#present?
.
As expected, most of the time in the method is spent in the regex on line 81. I noticed the line could be optimized slightly to remove the double negative:
def blank?- self !~ /[^[:space:]]/+ self =~ /\A[[:space:]]*\z/
end
That helped, but I was curious where these calls were coming from. Let's follow the stack up and look at the Object#present?
callsite:
$ stackprof data/stackprof-cpu-4120-1384979644.dump --method 'Object#present?'Object#present? (lib/active_support/core_ext/object/blank.rb:20) samples: 12 self (0.1%) / 133 total (1.5%) callers: 55 ( 41.4%) RepositoryControllerMethods#owner 31 ( 23.3%) RepositoryControllerMethods#current_repository callees (121 total): 112 ( 92.6%) String#blank? 6 ( 5.0%) Object#blank? 3 ( 2.5%) NilClass#blank? code: | 20 | def present? 133 (1.5%) / 12 (0.1%) | 21 | !blank? | 22 | end
So Object#present?
is almost always called on String objects (92.6%), with the majority of these calls coming from two helper methods in RepositoryControllerMethods
.
The callers in RepositoryControllerMethods
appeared quite simple, but after a few minutes of staring I discovered the fatal mistake causing repeated calls to present?
:
def owner- @owner ||= User.find_by_login(params[:user_id]) if params[:user_id].present?+ @owner ||= (User.find_by_login(params[:user_id]) if params[:user_id].present?)
end
This simple 2 byte change removed 40% of calls to String#blank?
in our app. To further minimize the cost of String#blank?
in our app, we switched to @samsaffron's pure C implementation in the fast_blank gem.
The end result of all these optimizations was a dramatic reduction in cpu time (i.e. without idle time) spent processing requests:
With 2.1 and stackprof, it's easier than ever to make your ruby code fast. Try it today!