
Understanding Concurrent Programming with Ruby Goliath

PostRank recently released a new Ruby web server: Goliath. It uses an event loop, in the same way as node.js and nginx, to achieve a high level of concurrency, but adds some special sauce that allows traditionally complex asynchronous code to be written in a synchronous style.

For example, asynchronous Ruby code typically looks like this (using the EventMachine library):

    require 'eventmachine'
    require 'em-http'

    EM.run {
      EM::HttpRequest.new('http://www.sitepoint.com/').get.callback {|http|
        puts http.response
      }
    }

This is useful because it allows the app to do other work while the HTTP request completes (it is "non-blocking"), but to fetch two sites one after the other you need to nest callbacks:

    EM::HttpRequest.new('http://www.sitepoint.com/').get.callback {|http|
      # extract_next_url is a fake method, you get the idea
      url = extract_next_url(http.response)

      EM::HttpRequest.new(url).get.callback {|http2|
        puts http2.response
      }
    }

As you can imagine, this style quickly gets messy. Goliath lets us write the code above in the simple synchronous style we are familiar with:

    http = EM::HttpRequest.new("http://www.sitepoint.com").get

    # extract_next_url is a fake method, you get the idea
    url = extract_next_url(http.response)

    http2 = EM::HttpRequest.new(url).get

... yet behind the scenes it's still running asynchronously! Other code can be run while HTTP requests are running.

It amazes me. How does it work? Let's find out.

Fibers

According to its documentation, Goliath works its magic "using ruby fibers introduced in Ruby 1.9+". This first hint sends us to the Ruby RDoc, where we find:

Fibers are primitives for implementing light weight cooperative concurrency in Ruby. Basically they are a means of creating code blocks that can be paused and resumed, much like threads. The main difference is that they are never preempted and that the scheduling must be done by the programmer and not the VM.
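To make that description a little more concrete, here is a minimal sketch using only Ruby's standard library. The method name fiber_walkthrough and the log messages are made up for illustration; the point is just that the block runs only when resumed, and stops wherever it calls Fiber.yield.

```ruby
require 'fiber'

# Runs a two-step fiber and records the order in which things happen.
# (fiber_walkthrough is a hypothetical name used only for this sketch.)
def fiber_walkthrough
  log = []

  fiber = Fiber.new do
    log << "fiber: step 1"
    Fiber.yield               # pause here and hand control back to the caller
    log << "fiber: step 2"
  end

  log << "main: before first resume"
  fiber.resume                # runs the block up to Fiber.yield
  log << "main: fiber is paused"
  fiber.resume                # continues after Fiber.yield until the block ends
  log << "main: fiber is finished"

  log
end

puts fiber_walkthrough
```

Nothing switches behind your back here: the fiber only runs when the main code calls resume, which is what "scheduling must be done by the programmer" means in practice.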

Ugh, too many big words. Let's just dive in and start reading the Goliath code. The Goliath documentation contains a complete example of a proxy site:

    require 'goliath'
    require 'em-synchrony'
    require 'em-synchrony/em-http'

    class HelloWorld < Goliath::API
      def response(env)
        req = EM::HttpRequest.new("http://www.google.com/").get
        resp = req.response

        [200, {}, resp]
      end
    end

    # to play along at home:
    # $ gem install goliath
    # $ gem install em-http-request --pre
    # $ ruby hello_world.rb -sv

We know that for this to happen asynchronously, something funky has to be going on in that #get call, so let's try to find it. My spidey sense tells me it will be somewhere in em-synchrony/em-http...

    $ gem unpack em-synchrony
    Unpacked gem: '/Users/xavier/Code/tmp/em-synchrony-0.3.0.beta.1'
    $ cd em-synchrony-0.3.0.beta.1
    # I used tab completion on the next line to find the exact path
    $ cat lib/em-synchrony/em-http.rb

This shows:

    # em-synchrony/lib/em-synchrony/em-http.rb
    begin
      require "em-http"
    rescue LoadError => error
      raise "Missing EM-Synchrony dependency: gem install em-http-request"
    end

    module EventMachine
      module HTTPMethods
        %w[get head post delete put].each do |type|
          class_eval %[
            alias :a#{type} :#{type}
            def #{type}(options = {}, &blk)
              f = Fiber.current

              conn = setup_request(:#{type}, options, &blk)
              conn.callback { f.resume(conn) }
              conn.errback  { f.resume(conn) }

              Fiber.yield
            end
          ]
        end
      end
    end

Jackpot! Fiber! It looks like this is a patch for the existing em-http library, so before we go too far, let's find out what regular em-http code without fibers looks like. The em-http-request wiki has a handy example:

    EventMachine.run {
      http = EventMachine::HttpRequest.new('http://google.com/').get :query => {'keyname' => 'value'}

      http.errback { p 'Uh oh'; EM.stop }
      http.callback {
        p http.response_header.status
        p http.response_header
        p http.response

        EventMachine.stop
      }
    }

It looks pretty much the same as the code above, which is promising, and digging into the gem makes the similarity even more apparent.

    $ gem unpack em-http
    ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
        SocketError: getaddrinfo: nodename nor servname provided, or not known (http://rubygems.org/latest_specs.4.8.gz)
    # Oh noes it doesn't work!
    # Search for em gems
    $ gem list em-

    *** LOCAL GEMS ***

    em-http-request (1.0.0.beta.2, 0.3.0)
    em-socksify (0.1.0)
    em-synchrony (0.3.0.beta.1)
    $ gem unpack em-http-request # Ah that is probably it
    $ cd em-http-request-1.0.0.beta.2
    $ ack "get" lib/
    lib/em-http/http_connection.rb
    4:    def get options = {}, &blk; setup_request(:get, options, &blk); end

Notice on the last line that get delegates straight to setup_request, which is the same call made in the fiber example above. Yep, pretty much the same. Now we can go back to the fiber code.

    f = Fiber.current

    conn = setup_request(:#{type}, options, &blk)
    conn.callback { f.resume(conn) }
    conn.errback  { f.resume(conn) }

    Fiber.yield

What appears to be happening is that, rather than doing any work immediately when a callback fires, resume is called on the current fiber, which presumably starts that fiber back up at the point where yield was called. Checking the documentation for Fiber.yield confirms this, and its last sentence also explains how the conn variable ends up being returned from this method:

Yields control back to the context that resumed the fiber, passing along any arguments that were passed to it. The fiber will resume processing at this point when resume is called next. Any arguments passed to the next resume will be the value that this Fiber.yield expression evaluates to.
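The same pattern can be tried out without EventMachine at all. The sketch below is only an approximation: PENDING_CALLBACKS, async_fetch, sync_fetch, run_pending and fetch_demo are all hypothetical names, and the "event loop" is just an array of procs that get called later. What it does show is a callback-based call being wrapped so that Fiber.yield evaluates to the value handed to resume.

```ruby
require 'fiber'

# Hypothetical stand-in for an event loop: callbacks queued here are run
# later by run_pending, roughly the way a reactor would fire them.
PENDING_CALLBACKS = []

# Hypothetical asynchronous API: returns at once, "completes" later.
def async_fetch(url, &callback)
  PENDING_CALLBACKS << proc { callback.call("response from #{url}") }
end

# Synchronous-looking wrapper built with the pattern from em-synchrony.
def sync_fetch(url)
  f = Fiber.current
  async_fetch(url) { |response| f.resume(response) }
  Fiber.yield   # pauses here; evaluates to whatever is passed to f.resume
end

# Fires queued callbacks until none are left.
def run_pending
  PENDING_CALLBACKS.shift.call until PENDING_CALLBACKS.empty?
end

def fetch_demo
  results = []
  Fiber.new do
    first  = sync_fetch("http://example.com/a")
    second = sync_fetch("http://example.com/b")
    results << first << second
  end.resume
  run_pending
  results
end

p fetch_demo
```

Inside the fiber the two fetches read like ordinary sequential code, yet each one hands control back to the caller until its callback fires.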

Putting it to use

We now have an idea of how Goliath works its magic, though it may still be a bit fuzzy. Let's see if we have it right by trying to write some code that mimics it.

Remember, this fiber trick is just a way of tidying up callback-laden code, so we should be able to write a method without fibers first and then clean it up. I like to start with a simple example, so we're going to write a basic Goliath class that blocks for one second and then renders some text.

    class Surprise < Goliath::API
      def response(env)
        sleep 1
        [200, {}, "Surprise!"]
      end
    end

Hit it in your web browser and bingo, it waits a second. Not so fast, tiger: what happens when we issue multiple concurrent requests?

    $ ab -n 3 -c 3 127.0.0.1:9000/ | grep "Time taken"
    Time taken for tests:   3.011 seconds

Alas, our web server only served one request at a time. It's not web scale. The sleep call blocks not only our response, but the entire server. That's why we moved to evented programming in the first place. Let's try the classic EventMachine timer instead:

    class Surprise < Goliath::API
      def response(env)
        EventMachine.add_timer 1, proc {
          [200, {}, "Surprise!"]
        }
      end
    end

Of course, this doesn't work, because the #response method has to look synchronous: Goliath renders whatever it returns. In this case #add_timer returns immediately, long before the timer fires, and Goliath tries to render that return value straight away, exploding in the process. The timer goes off a second later, but by then no code is left to care about the result. We cannot deliver the result of our timer as the return value of the method.
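The problem can be seen in miniature with plain Ruby. In this sketch FAKE_TIMERS, add_fake_timer, broken_response and broken_demo are all hypothetical stand-ins, not EventMachine's API, and the fake timer returns a made-up id. The only thing it is meant to show is that a block's value goes to whoever eventually calls the block, not to the method that registered it.

```ruby
# Hypothetical stand-in for an event loop's timer list (not EventMachine).
FAKE_TIMERS = []

# Stores the callback for later and returns a made-up timer id right away.
def add_fake_timer(callback)
  FAKE_TIMERS << callback
  FAKE_TIMERS.size
end

# Same shape as the broken Surprise#response: the method's return value is
# whatever the scheduling call returns, not the array inside the proc.
def broken_response
  add_fake_timer(proc { [200, {}, "Surprise!"] })
end

def broken_demo
  returned = broken_response        # what a server would try to render
  fired    = FAKE_TIMERS.shift.call # what the callback produces later on
  { returned: returned, fired: fired }
end

p broken_demo
```

The response array does get built, but only when the timer fires, and at that point it is handed to the loop that fired it rather than to the original caller.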

We need to combine the synchronous nature of the first example with the asynchronous elements of the second: a beautiful Frankenstein. Hopefully you have already worked out that we can use fibers to stitch the two together.

    class Surprise < Goliath::API
      def response(env)
        f = Fiber.current

        EventMachine.add_timer 1, proc { f.resume }

        Fiber.yield

        [200, {}, "Surprise!"]
      end
    end

We're stealing the pattern we saw in em-synchrony/em-http above: capture the current fiber, then set up an asynchronous callback that calls resume, which continues execution from Fiber.yield. Testing this with ab, we can see that this really does solve our concurrency problem:

    $ ab -n 3 -c 3 127.0.0.1:9000/ | grep "Time taken"
    Time taken for tests:   1.009 seconds
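A rough version of the same experiment can be run without Goliath or ab. Everything in this sketch is an assumption made for illustration: TinyLoop is a toy timer loop rather than EventMachine, handle_request mirrors the shape of the #response method above, and running each request in its own fiber is a guess at how a server might drive the handlers.

```ruby
require 'fiber'

# Hypothetical miniature event loop: a list of [fire_at, callback] pairs.
class TinyLoop
  def initialize
    @timers = []
  end

  def add_timer(seconds, &callback)
    @timers << [Time.now + seconds, callback]
  end

  # Fires timers as they come due and returns once none are left.
  def run
    until @timers.empty?
      due, pending = @timers.partition { |fire_at, _| fire_at <= Time.now }
      @timers = pending
      due.each { |_, callback| callback.call }
      sleep 0.01 if due.empty?
    end
  end
end

# Same shape as Surprise#response: pause the fiber, resume it from a timer.
def handle_request(event_loop, delay)
  f = Fiber.current
  event_loop.add_timer(delay) { f.resume }
  Fiber.yield
  [200, {}, "Surprise!"]
end

# Starts `count` requests, each in its own fiber, then runs the loop.
def serve_concurrently(count, delay)
  event_loop = TinyLoop.new
  responses = []
  started = Time.now
  count.times do
    Fiber.new { responses << handle_request(event_loop, delay) }.resume
  end
  event_loop.run
  [responses, Time.now - started]
end

responses, elapsed = serve_concurrently(3, 0.2)
puts "#{responses.size} responses in #{elapsed.round(2)} seconds"
```

Because every handler yields as soon as its timer is registered, the three waits overlap, so the total should be close to one delay rather than three.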

These fibers are pretty cool.

Wrapping up

By studying the source code of Goliath and its related libraries, we discovered how it performs its asynchronous-masquerading-as-synchronous trick, and were able to put that knowledge into practice with a simple example.

To practice your code reading, here are a few further research tasks you can try:

  • Find where Goliath calls the #response method and see if there are any other hidden tricks with fibers.
  • Explore one of the other libraries for which em-synchrony provides an API, such as em-mongo.
  • rack-fiber_pool uses fibers in a similar context; investigate it and see what it uses them for.
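One hint for the first task, as a small sketch with made-up method names: Fiber.yield only works inside a fiber that something has resumed. Called from the root fiber it raises FiberError, so presumably something has to wrap each #response call in a fiber before the trick can work.

```ruby
require 'fiber'

# Calling Fiber.yield outside of any resumed fiber raises FiberError.
# (yield_from_root is a hypothetical name used only for this sketch.)
def yield_from_root
  Fiber.yield
  :no_error
rescue FiberError => e
  e.class
end

# Inside a fiber, the same call pauses the block and returns to the resumer.
def yield_inside_fiber
  fiber = Fiber.new do
    Fiber.yield :paused
    :done
  end
  [fiber.resume, fiber.resume]
end

p yield_from_root
p yield_inside_fiber
```

Finding the place where that wrapping fiber gets created is a good way into the Goliath source.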

Let us know how you go in the comments. Tune in next week for more exciting adventures in the code jungle.