Rails on the Run

Rails experiments by Matt Aimonetti

Browsing Posts tagged benchmarks

At work, we have an XML API that gets quite a lot of traffic. Last week I looked into improving its performance since we are expecting more traffic soon and want to make sure our response time is optimized.

My first thought was to make sure we were using an optimized ActiveSupport XmlMini backend. Rails 2.3.5 fixed some issues when using Nokogiri as the XmlMini backend, so I switched to my favorite Ruby XML lib:

ActiveSupport::XmlMini.backend = 'Nokogiri'
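In a Rails 2.3 app that one-liner typically lives in an initializer; here is a minimal sketch (the file name is my choice, and it assumes Rails 2.3.5+ with the nokogiri gem installed):

```ruby
# config/initializers/xml_mini.rb (hypothetical file name)
# Swap ActiveSupport's default REXML-based XML parsing for Nokogiri
# (libxml2-based); this affects Hash.from_xml and XML params parsing.
ActiveSupport::XmlMini.backend = 'Nokogiri'

# Parsing now goes through Nokogiri, e.g. from a console:
Hash.from_xml('<foo><bar>1</bar></foo>')  # => {"foo"=>{"bar"=>"1"}}
```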

I ran some benchmarks using ab, httperf, and JMeter, but the results were not that great. I was so sure that switching from REXML to Nokogiri would give me awesome results that I was very disappointed.

I was about to call Aaron Patterson (Nokogiri’s author) to insult him, blame him for _why’s disappearance and tell him that all his pro bono efforts were useless since my own app was not running much faster when switched to his library. As I was about to dial his number on my iPhone I had a crazy thought… maybe it was not Aaron’s fault, maybe it was mine.

So I took a break and went to play some foosball, and as I was being ridiculed by my Italian coworker, Emanuele, I realized that most of our API calls were just simple HTTP requests with no XML payload, just some query params. However, we were generating a lot of XML to send back to the client, and AS::XmlMini only takes care of XML parsing, not rendering.

The XML rendering is done by Jim Weirich's pure-Ruby Builder library, which is vendored in Rails. Builder does a good job, but I thought that a C-based library might improve the speed. A coworker of mine (James Bunch) recommended looking into faster-builder, a drop-in Builder replacement based on libxml. Unfortunately, the project doesn't seem to be maintained, so I decided to look into using the Nokogiri XML builder instead. (Also, faster-builder's author doesn't like me very much, while Aaron knows he's one of my Ruby heroes, so asking for help would be easier.)

Some people reported having tried Nokogiri as an XML builder without seeing much speed improvement. Given the amount of magic required to render an rxml template, I was not really surprised, but I decided to contact Aaron and ask whether he had tried using his lib instead of Builder in a Rails app. Aaron told me he had given it a try a while back, and he helped me get my Rails app set up to render XML templates using Nokogiri.

The next step was simple: create a benchmark app and benchmark Builder vs Nokogiri using various templates. Here are the results I got using Ruby 1.9.1 (the Ruby version we use in production) and two sets of templates:

Builder small template, time per request: 15.507 [ms] (mean)

$ ab -c 1 -n 200 http://127.0.0.1:3000/benchmarks/builder_small
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests

Server Software:        nginx/0.7.65
Server Hostname:        127.0.0.1
Server Port:            3000

Document Path:          /benchmarks/builder_small
Document Length:        216 bytes

Concurrency Level:      1
Time taken for tests:   3.101 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      114326 bytes
HTML transferred:       43200 bytes
Requests per second:    64.49 [#/sec] (mean)
Time per request:       15.507 [ms] (mean)
Time per request:       15.507 [ms] (mean, across all concurrent requests)
Transfer rate:          36.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    11   15   8.8     12      47
Waiting:        3   15   8.9     12      47
Total:         11   15   8.8     12      47

Percentage of the requests served within a certain time (ms)
  50%     12
  66%     12
  75%     13
  80%     13
  90%     35
  95%     36
  98%     38
  99%     41
 100%     47 (longest request)

Nokogiri small template, time per request: 15.354 [ms] (mean)

$ ab -c 1 -n 200 http://127.0.0.1:3000/benchmarks/noko_small
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests

Server Software:        nginx/0.7.65
Server Hostname:        127.0.0.1
Server Port:            3000

Document Path:          /benchmarks/noko_small
Document Length:        238 bytes

Concurrency Level:      1
Time taken for tests:   3.071 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      118717 bytes
HTML transferred:       47600 bytes
Requests per second:    65.13 [#/sec] (mean)
Time per request:       15.354 [ms] (mean)
Time per request:       15.354 [ms] (mean, across all concurrent requests)
Transfer rate:          37.75 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    11   15   8.6     12      39
Waiting:       11   15   8.6     12      39
Total:         11   15   8.6     12      39

Percentage of the requests served within a certain time (ms)
  50%     12
  66%     12
  75%     12
  80%     13
  90%     35
  95%     36
  98%     37
  99%     38
 100%     39 (longest request)

Running the benchmarks many times showed that Nokogiri and Builder were taking more or less the same amount of time to build a small template.

I then decided to try a bigger template, closer to what we have in production. Here are the results:

Nokogiri longer template, time per request: 31.252 [ms] (mean)

$ ab -c 1 -n 200 http://127.0.0.1:3000/benchmarks/noko
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests

Server Software:        nginx/0.7.65
Server Hostname:        127.0.0.1
Server Port:            3000

Document Path:          /benchmarks/noko
Document Length:        54398 bytes

Concurrency Level:      1
Time taken for tests:   6.250 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      10951200 bytes
HTML transferred:       10879600 bytes
Requests per second:    32.00 [#/sec] (mean)
Time per request:       31.252 [ms] (mean)
Time per request:       31.252 [ms] (mean, across all concurrent requests)
Transfer rate:          1711.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    24   31  11.3     26      62
Waiting:       23   30  11.3     24      61
Total:         24   31  11.3     26      62

Percentage of the requests served within a certain time (ms)
  50%     26
  66%     27
  75%     27
  80%     29
  90%     54
  95%     55
  98%     58
  99%     59
 100%     62 (longest request)

Builder, longer template, Time per request: 140.725 [ms] (mean)

$ ab -c 1 -n 200 http://127.0.0.1:3000/benchmarks/builder
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 100 requests
Completed 200 requests
Finished 200 requests

Server Software:        nginx/0.7.65
Server Hostname:        127.0.0.1
Server Port:            3000

Document Path:          /benchmarks/builder
Document Length:        54376 bytes

Concurrency Level:      1
Time taken for tests:   28.145 seconds
Complete requests:      200
Failed requests:        0
Write errors:           0
Total transferred:      10947000 bytes
HTML transferred:       10875200 bytes
Requests per second:    7.11 [#/sec] (mean)
Time per request:       140.725 [ms] (mean)
Time per request:       140.725 [ms] (mean, across all concurrent requests)
Transfer rate:          379.83 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:   127  141  24.6    132     331
Waiting:      126  139  23.6    130     328
Total:        127  141  24.6    132     331

Percentage of the requests served within a certain time (ms)
  50%    132
  66%    138
  75%    147
  80%    149
  90%    156
  95%    169
  98%    193
  99%    311
 100%    331 (longest request)

Wow, @tenderlove's Nokogiri just brought us a 4.5x speed improvement for this specific template. 100ms per request is probably not a big deal for most people, and I have to say that Jim did a great job with Builder. However, in my specific case, saving 100ms on a request that is called thousands of times per hour is quite important.

(The benchmark app is available on GitHub; feel free to fork it and benchmark your own templates.)

Who would have thought that a man like this could save the day?!

Aaron 'Tenderlove' Patterson

The moral of the story is that adding a bit of tenderlove to your Ruby code can make it perform much much better!

Thank you Aaron ‘Tenderlove’ Patterson!

In a previous article I took an example of bad metaprogramming and pushed people to think twice before using metaprogramming.

My main points were that:

  • you might make your code way slower if you don’t know what you are doing
  • readability might drop considerably
  • maintainability can become an issue

People left some very good comments about how to write the same module using metaprogramming and keep things fast.

Today Wycats pinged me about this post and told me that the issue was define_method, and that class_eval is effectively the same as regular code: it gets evaluated in eval.c, just like regular Ruby code. On the other hand, define_method has to marshal the proc.
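To see the difference Wycats describes in isolation, here is a minimal standalone sketch (class and method names are mine) that defines the same trivial method both ways and times the calls with the stdlib Benchmark module:

```ruby
require 'benchmark'

class Sample
  # define_method: the body is a Proc/closure the VM has to re-enter
  # on every call
  define_method(:dm_square) { 42 * 42 }

  # class_eval with a string: parsed and compiled like a regular `def`
  class_eval <<-RUBY
    def ce_square
      42 * 42
    end
  RUBY
end

s = Sample.new
n = 500_000
Benchmark.bm(15) do |x|
  x.report('define_method:') { n.times { s.dm_square } }
  x.report('class_eval:')    { n.times { s.ce_square } }
end
```

Both methods return the same result; only the call overhead differs, and the gap depends heavily on the Ruby implementation.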

I cleaned up my benchmarks using rbench, added some of the solutions provided to me and obtained the following results:

(benchmark results chart)

Here is the original/bad metaprogramming example:

  module MetaTimeDSL

    {:second => 1, 
     :minute => 60, 
     :hour => 3600, 
     :day => [24,:hours], 
     :week => [7,:days], 
     :month => [30,:days], 
     :year => [364.25, :days]}.each do |meth, amount|
      define_method "#{meth}" do
        amount = amount.is_a?(Array) ? amount[0].send(amount[1]) : amount
        self * amount
      end
      alias_method "#{meth}s".intern, "#{meth}"
    end

  end
  Numeric.send :include, MetaTimeDSL

The no-metaprogramming module is available there.

Refactored:

  module RefaMetaTimeDSL

  {:second => 1, 
   :minute => 60, 
   :hour => 3600, 
   :day => [24,:hours], 
   :week => [7,:days], 
   :month => [30,:days], 
   :year => [364.25, :days]}.each do |meth, amount|
    self.class_eval <<-RUBY
      def r_#{meth}
        self * #{amount.is_a?(Array) ? "#{amount[0]}.#{amount[1]}" : "#{amount}"}
      end
      alias_method :r_#{meth}s, :r_#{meth}
    RUBY
  end

end
Numeric.send :include, RefaMetaTimeDSL

The second refactor is the eval-based solution provided by Matt Jones, which uses class_eval like the previous refactor:

  module EvalMetaTimeDSL
    def self.included(base)
      base.class_eval do
        [ [:e_second, 1], 
          [:e_minute, 60], 
          [:e_hour, 3600], 
          [:e_day, [24,:e_hours]], 
          [:e_week, [7,:e_days]], 
          [:e_month, [30,:e_days]], 
          [:e_year, [365.25, :e_days]]].each do |meth, amount|
            amount = amount.is_a?(Array) ? amount[0].send(amount[1]) : amount
            eval "def #{meth}; self*#{amount}; end"
            alias_method "#{meth}s", meth
          end
      end
    end
  end
  Numeric.send :include, EvalMetaTimeDSL

and finally, the “better metaprogramming” version:

 module GoodMetaTimeDSL

  SECOND  = 1
  MINUTE  = SECOND * 60
  HOUR    = MINUTE * 60
  DAY     = HOUR * 24
  WEEK    = DAY * 7
  MONTH   = DAY * 30
  YEAR    = DAY * 364.25
  
  %w[SECOND MINUTE HOUR DAY WEEK MONTH YEAR].each do |const_name|
      meth = const_name.downcase
      class_eval <<-RUBY 
        def g_#{meth} 
          self * #{const_name} 
        end 
        alias g_#{meth}s g_#{meth} 
      RUBY
  end
end
Numeric.send :include, GoodMetaTimeDSL

Looking at the refactored version by Wycats, you can see he's right: the major issue with the original version was define_method. Using class_eval makes things about as fast as, and sometimes even faster than, the no-metaprogramming version.

Interestingly enough, the benchmarks show that some methods from the meta modules are faster than the ones from the no-meta module. Overall, optimized metaprogramming can be more or less as fast as non-meta code. Of course, with the new VMs coming up, things might change a little bit depending on the language implementation.

In conclusion, metaprogramming can be as fast as regular code, but that won't help your code's readability and maintainability, so make sure to only use this great trick when needed!

P.S.: here is the benchmark file if you don't believe me ;)