Ruby Unconference 2018: Protobufs, gRPC and Ruby

Monday, 08 October 2018
Photo: Dan speaking at London Ruby Unconference 2017. Did anyone get a snap from this year? Please send it in!

One of my favourite Ruby meetups is the London Ruby Unconference, which took place last Saturday, just round the corner from Buckingham Palace.

Unlike conferences where the audience is mostly passive and speakers represent big-name tech companies, stealthily (or openly) promoting something, an 'unconference' tries to do the opposite: it encourages participation from its attendees.

In a real-world demonstration of anarchy in action, the schedule is planned on the morning of the event. Anyone can propose a session, and the audience votes with their feet. In this way, even a rough idea for a session can make it onto the agenda.

What I like about this is that I can take something I'd like to explore in Ruby, float it as a session and spend time with like-minded people working it out.

Enter Protobuf

On a recent FinTech project, I'd been tasked with learning Google's Protocol Buffers (protobuf) definition language, which describes the binary messages exchanged between web services. In high-volume, real-time trading, time is of the essence. Algorithmic trading relies on tiny changes in data to make decisions, and every nanosecond counts towards your competitive advantage.

Coming from a Ruby and Rails background, I thought we'd slain the RPC / SOAP / XML dragon long ago. HTTPS and JSON are prevalent all over the web, with RESTful conventions making API design straightforward and predictable. However, when speed is key, protobufs are well worth a look. I wanted to know whether the tooling and workflow would stand up to the standards set by Rails, Padrino and Sinatra, or whether it would be a throwback to my Enterprise JavaBeans days.

Although I'd worked with Go on the FinTech project, one of the nice things about protobufs is that, because the API endpoints and the messages they carry are defined in one standard syntax, clients and services can be written in different languages and still interoperate.

Therefore, I decided to pitch a session on getting started with gRPC and protobufs in Ruby, and see how it compared with the RESTful approach I was used to with Rails.

Getting Started

Since I was attending an event, I thought I'd use the example of attending events and demo a calendar application, using grpc to handle the backend code. With a time limit of around 45 minutes, I thought I could just about demo adding an event to a calendar, and maybe show how we could retrieve them too.

All of the code in this example is on my GitHub

Initially, I created a git repo and some folders to separate my proto definitions from the Ruby code.

git init calendar_demo
cd calendar_demo
mkdir proto
mkdir lib

To begin fleshing out what our calendar web service can do, I started with a protobuf definition file: a separate, language-neutral definition of which endpoints the API supports and which messages they exchange. This differs from the RESTful approach, where we might start with a resource representing an Event and expect the API endpoints to follow by convention.

The first draft of the protobuf looked like this:

syntax = 'proto3';

service Calendar {
  rpc AddEvent(Event) returns (EventAdded) {}
}

message Event {
  string name = 1;
  string date = 2;
}

message EventAdded {
  bool added = 1;
}

Here, we're defining a web service called Calendar, which represents the entire API. Within it, we can have a number of endpoints (rpc methods), each performing a separate task. It's somewhat similar to a class and its methods, except that the method calls are triggered by binary protobuf messages arriving over a transport layer (more on that later).

Our AddEvent endpoint needs to know about the event we are adding. We define this as a message, which I've named Event. Each message consists of a number of fields, each with a basic type such as string, float or int32: types that can be serialised (converted into binary) and deserialised at the other end. To keep things simple, I'm just considering an event as storing a name (what we're doing) and a date (when we're doing it).

Note that each field on the message is given a unique number. It's the number, not the field name, that identifies the field in the binary encoding, and this is what lets us stay backwards compatible with clients and services that haven't picked up our latest definition yet. We can add new fields with new numbers, and older code will simply ignore them. What we must not do is renumber or reuse existing fields, and changing a field's type is generally unsafe too.
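To see why the numbers matter, here is a minimal plain-Ruby sketch of how protobuf's wire format encodes string fields. Each field is prefixed with a tag built from its number and a wire type, so the field name never appears in the bytes. This is an illustration only: the helper names varint and string_field are hypothetical, and in practice the generated Event class does this for you.

```ruby
# Minimal sketch of protobuf's wire format for string fields (illustration only;
# the generated Event class does this for you). Helper names are hypothetical.

WIRE_TYPE_LEN = 2 # length-delimited: strings, bytes and nested messages

# Encode a non-negative integer as a base-128 varint: 7 bits per byte,
# with the high bit set on every byte except the last.
def varint(value)
  bytes = []
  loop do
    byte = value & 0x7F
    value >>= 7
    if value.zero?
      bytes << byte
      return bytes.pack("C*")
    end
    bytes << (byte | 0x80)
  end
end

# A string field is: tag (field number << 3 | wire type), byte length, UTF-8 bytes.
# Only the field number is encoded; the names "name" and "date" never are.
def string_field(field_number, value)
  data = value.encode(Encoding::UTF_8).b
  varint((field_number << 3) | WIRE_TYPE_LEN) + varint(data.bytesize) + data
end

encoded = string_field(1, "Ruby Unconference") + string_field(2, "2018-10-06")
puts encoded.bytesize      # 31
puts encoded.unpack1("H*") # hex dump, starting 0a11 (field 1, 17 bytes)
```

For comparison, the same event as compact JSON, {"name":"Ruby Unconference","date":"2018-10-06"}, takes 48 bytes, because every message repeats the field names.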

As RESTful developers, we might expect such an endpoint to simply acknowledge that the event was added, perhaps with an HTTP status code of 200 or 201. With protobufs, we need to define the response explicitly too, so here I define another message, EventAdded, which wraps a boolean indicating whether the event was added. In practice, we might want to include more information here, for instance whether the event was added immediately or queued for later processing.

So with this rather concise definition, we've defined a request / response for an endpoint that is designed to add an event to our calendar.

Generating Clients and Servers

Whilst the protobuf definition is short and readable, it won't do much for us until we put it into action. That means creating code that implements this definition, taking care of serialisation and deserialisation, sending and receiving messages, and routing to the right endpoint. Fortunately, the grpc toolset does most of this hard work for us, leaving us to focus on our business logic and think in terms of events and calendars.

Using the grpc-tools gem, we can side-step installing the protoc compiler and its plugins ourselves, since the gem ships with pre-built binaries. I added it to my Gemfile, along with the grpc dependency.

source "https://rubygems.org"

gem 'grpc'
gem 'grpc-tools'

I could then generate the code in a couple of commands:

bundle install
bundle exec grpc_tools_ruby_protoc -Iproto calendar.proto --ruby_out=lib --grpc_out=lib

The second command asks the bundled protoc compiler to generate code from our API definition. It looks for the protobuf file in the proto folder (-Iproto) and writes the message classes (--ruby_out) and the client and server stubs (--grpc_out) into the lib folder.

If you sift through the generated code, you will see that it generated:

  • calendar_services_pb.rb, containing a module named after our service (Calendar), with a Service class registering our AddEvent endpoint and a Stub class for clients
  • calendar_pb.rb, registering the messages (Event and EventAdded) that we had defined.

Defining the Server

Ultimately, we can't have the grpc toolset define everything for us; otherwise we'd be out of a job! We need to define what happens when the service receives the AddEvent message, using the generated service as a template. I stored the following code in lib/calendar_server.rb:

require 'grpc'
require 'calendar_services_pb'
require 'calendar_pb'

module Calendar
  class Server < Service
    def add_event(event, _call)
      # TODO - Business logic!
    end
  end
end

I used good old-fashioned inheritance to subclass the generated Service, inheriting all of the functionality generated by the toolset. We are then expected to provide a method for each endpoint we've defined, so I wrote an add_event method. The method receives an Event object, which encapsulates our name and date fields, and a call object representing the call itself (giving access to details such as its metadata). The leading underscore in _call signals that we aren't using it yet.
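The handler name follows the usual Ruby convention of turning the CamelCase rpc name from the .proto file into a snake_case method name, so AddEvent becomes add_event. This is a minimal sketch of that conversion. The function rpc_method_name is hypothetical, and grpc has its own implementation, so check GRPC::GenericService if you need the exact rule.

```ruby
# Sketch of the CamelCase -> snake_case convention that turns an rpc name
# from the .proto file into the Ruby handler method name. Illustration only;
# GRPC::GenericService has its own implementation of this conversion.
def rpc_method_name(rpc_name)
  # split a run of capitals from a following word: "HTTPServer" -> "HTTP_Server"
  name = rpc_name.gsub(/([A-Z]+)([A-Z][a-z])/, '\1_\2')
  # split a lowercase letter or digit from a following capital: "AddEvent" -> "Add_Event"
  name = name.gsub(/([a-z\d])([A-Z])/, '\1_\2')
  name.downcase
end

puts rpc_method_name("AddEvent")   # add_event
puts rpc_method_name("ShowEvents") # show_events
```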

Whilst this is a great start, nothing is running our service yet, and we haven't defined the transport layer: that is, how messages are sent and received by the service. The snippet below is adapted from the grpc.io tutorial as a way of getting started. In practice this might live elsewhere, but for now I'll add it to the bottom of the server file.

require 'grpc'
require 'calendar_services_pb'
require 'calendar_pb'

module Calendar
  class Server < Service
    def add_event(event, _call)
      # TODO - Business logic!
    end
  end
end

addr = "0.0.0.0:8080"
s = GRPC::RpcServer.new
s.add_http2_port(addr, :this_port_is_insecure)
puts("... running insecurely on #{addr}")
s.handle(Calendar::Server.new)
s.run_till_terminated

This is just enough code to run the server, but it doesn't do anything useful yet. add_event returns nil rather than an EventAdded message, so clients would get an error back instead of a response. Let's add a little business logic and naively store each Event in an array:

require 'grpc'
require 'calendar_services_pb'
require 'calendar_pb'

module Calendar
  class Server < Service
    def initialize
      # naive in-memory storage: everything is lost when the server restarts
      @events = []
    end

    # handles the AddEvent rpc: store the event and acknowledge it
    def add_event(event, _call)
      @events << event
      EventAdded.new(added: true)
    end
  end
end

addr = "0.0.0.0:8080"
s = GRPC::RpcServer.new
s.add_http2_port(addr, :this_port_is_insecure)
puts("... running insecurely on #{addr}")
s.handle(Calendar::Server.new)
s.run_till_terminated
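The @events array is shared by every call. GRPC::RpcServer serves requests from a pool of worker threads, so several add_event calls can run at the same time. A plain Array may appear to cope on MRI, but it's safer not to rely on that once the logic grows. Here is a minimal sketch of a hypothetical EventStore class, which is not part of the original demo. It guards the array with a Mutex, and a Struct stands in for the generated Event class.

```ruby
# Hypothetical EventStore (not part of the original demo): the in-memory
# array from Calendar::Server, guarded by a Mutex so that concurrent
# handler threads can share it safely.
class EventStore
  def initialize
    @events = []
    @lock = Mutex.new
  end

  # Append an event while holding the lock.
  def add(event)
    @lock.synchronize { @events << event }
    true
  end

  # Return a copy so callers (e.g. a streaming handler) can iterate
  # without holding the lock.
  def snapshot
    @lock.synchronize { @events.dup }
  end
end

# Stand-in for the generated Event message class.
Event = Struct.new(:name, :date, keyword_init: true)

store = EventStore.new
threads = 10.times.map do |t|
  Thread.new do
    100.times { |i| store.add(Event.new(name: "event #{t}-#{i}", date: "2018-10-06")) }
  end
end
threads.each(&:join)
puts store.snapshot.size # 1000
```

Inside Calendar::Server, initialize would create @store = EventStore.new, and add_event would call @store.add(event) before returning EventAdded.new(added: true).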

Now that we've got our first draft of the server, you might consider splitting your terminal in half and running the code with:

ruby -Ilib lib/calendar_server.rb

You should see the "... running insecurely" message. The server then blocks, sitting in a loop waiting for your requests to arrive!

Sending Client Requests

One drawback of protobufs from a developer's perspective is that, as a low-level binary protocol, we can't easily test our API with something like curl. Instead, we use the generated client stub to write a Ruby client. In practice this code would likely be embedded in a larger application, but for now it's enough to create calendar_client.rb in the lib folder with the following code:

require 'grpc'
require 'calendar_services_pb'
require 'calendar_pb'

stub = Calendar::Stub.new('localhost:8080', :this_channel_is_insecure)
event = Event.new(name: 'Ruby Unconference', date: '2018-10-06')
response = stub.add_event(event)
puts response.added

Here, we're instantiating a client stub that knows how to 'speak' to our endpoint, and passing it a message. The stub decodes the response into an EventAdded object, which provides a getter for the added field, so we can see whether the event was added. Running the client in the other half of the terminal should print true:

ruby -Ilib lib/calendar_client.rb

Streaming Events

One of the nice features of gRPC is its support for streaming APIs, which is great for delivering lots of data gradually, rather than building huge responses and leaving the client waiting around.

To illustrate this, I demonstrated how you might implement recurring monthly events in a new API endpoint called ShowEvents. This endpoint would stream events one at a time, generating recurring events if necessary from an initial event.

I began by extending our protobuf definition with a new endpoint. To make it a streaming API, all we need to do is add the stream keyword before the return type. I also added an EventRange message specifying the range of dates to select from, and a recurring field on the Event message.

syntax = 'proto3';

service Calendar {
  rpc AddEvent(Event) returns (EventAdded) {}
  rpc ShowEvents(EventRange) returns (stream Event) {}
}

message Event {
  string name = 1;
  string date = 2;
  bool recurring = 3;
}

message EventAdded {
  bool added = 1;
}

message EventRange {
  string from = 1;
  string to = 2;
}

Since the API definition has changed, we need to regenerate the code as before. This highlights an important rule, stated at the top of the generated files: we must not edit them by hand. Otherwise, our changes would be overwritten the next time we (or our teammates) regenerate them from the protobuf definition.

After regenerating, we only need to add a new method to our server (lib/calendar_server.rb) to support the new endpoint (I'll leave out the rest of the file for brevity). The Ruby grpc implementation makes good use of the Enumerator class here. A streaming method returns an Enumerator, and the server iterates over it, sending each value to the client as a separate message until the sequence runs out. So we can trivially stream every event in our array:

def show_events(_range, _call)
  # each without a block returns an Enumerator, which grpc streams to the client
  @events.each
end
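When called without a block, Array#each returns an Enumerator instead of iterating straight away, which is why show_events can simply return @events.each. Here is a quick plain-Ruby check. A Struct stands in for the generated Event class, and the sample events are made up.

```ruby
# Stand-in for the generated Event message class, with made-up sample events.
Event = Struct.new(:name, :date, keyword_init: true)

events = [
  Event.new(name: "Ruby Unconference", date: "2018-10-06"),
  Event.new(name: "Team retro", date: "2018-10-12")
]

stream = events.each  # no block given: returns an Enumerator, nothing iterated yet
puts stream.class     # Enumerator
puts stream.next.name # Ruby Unconference
puts stream.next.name # Team retro
# A third stream.next would raise StopIteration: the sequence has run out.
```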

Using the built-in select method, we can easily add the date range filtering, using the dates supplied in the EventRange message:

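require 'date' # at the top of lib/calendar_server.rb: Date lives in the standard library

def show_events(range, _call)
  from = Date.parse(range.from)
  to = Date.parse(range.to)

  # keep only events inside the inclusive range, then stream them
  @events.select { |event| Date.parse(event.date) <= to && Date.parse(event.date) >= from }.each
end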
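Ruby's Range#cover? is another way to express the same inclusive window. It swaps in for the two comparisons above and isn't what the demo used. Here is a sketch with a hypothetical events_between helper, with a Struct standing in for the generated Event class.

```ruby
require "date"

# Stand-in for the generated Event message class.
Event = Struct.new(:name, :date, keyword_init: true)

# Hypothetical helper: events whose date lies in the inclusive range from..to.
def events_between(events, from, to)
  window = Date.parse(from)..Date.parse(to)
  events.select { |event| window.cover?(Date.parse(event.date)) }
end

events = [
  Event.new(name: "Ruby Unconference", date: "2018-10-06"),
  Event.new(name: "New Year's Day", date: "2019-01-01")
]

events_between(events, "2018-01-01", "2018-12-31").each { |event| puts event.name }
# prints "Ruby Unconference" only
```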

Because our messages store dates as plain strings, there is quite a bit of conversion going on between String and Date, which makes the code a bit repetitive. Ruby's dynamic typing lets us be fairly relaxed about this, but protobuf messages are strongly typed, so when it comes to API design we'll be working with explicit types more often.

I found that a good way to avoid this repeated conversion was to reopen the generated Event class and add a method that returns the Ruby Date equivalent of the underlying string. This lets us keep using strings for serialisation, while working with dates in our logic.

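require 'date'

# Reopen the generated Event class (in our own file, never in calendar_pb.rb)
class Event
  # parse the date string once and memoise the result
  def actual_date
    @actual_date ||= Date.parse(date)
  end
end

def show_events(range, _call)
  from = Date.parse(range.from)
  to = Date.parse(range.to)

  @events.select { |event| event.actual_date <= to && event.actual_date >= from }.each
end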
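Reopening the generated class changes Event everywhere it is loaded. If you'd rather keep the extra method local to the files that need it, a Ruby refinement is one alternative. This is a different technique from the one used above. The sketch below uses a Struct standing in for the generated Event class and hasn't been tried against the real generated class.

```ruby
require "date"

# Stand-in for the generated Event message class.
Event = Struct.new(:name, :date, keyword_init: true)

# Refinement: adds Event#actual_date only where `using EventDates` is active.
module EventDates
  refine Event do
    def actual_date
      Date.parse(date)
    end
  end
end

using EventDates

event = Event.new(name: "Ruby Unconference", date: "2018-10-06")
puts event.actual_date.strftime("%A") # Saturday
```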

Using a lesser-known but very powerful feature of the Enumerator class, we can also pass a block to Enumerator.new and decide for ourselves what each iteration yields, by pushing values into the yielder object the block receives. This is useful when we want to define our own custom sequences.
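Here is that feature on its own. The block given to Enumerator.new receives a yielder, and each y << value hands one value to the consumer. The block only runs as far as the consumer asks, so we can take the first three values from an endless sequence of monthly dates. The start date is just sample data.

```ruby
require "date"

# An endless sequence of monthly dates. Each `y << date` hands one value to
# the consumer; the block only runs as far as the consumer asks.
monthly = Enumerator.new do |y|
  date = Date.new(2018, 10, 6)
  loop do
    y << date
    date = date >> 1 # Date#>> moves forward by whole calendar months
  end
end

p monthly.first(3).map(&:to_s) # ["2018-10-06", "2018-11-06", "2018-12-06"]
```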

Using this approach, I tested whether an event was recurring or not, and if so, would generate new dates one month ahead of the original. Sticking this code in a while loop meant I could generate an arbitrary number of events in my stream:

def show_events(range, _call)
  from = Date.parse(range.from)
  to = Date.parse(range.to)

  Enumerator.new do |y|
    @events.each do |event|
      date = event.actual_date
      # walk forward a month at a time (assume monthly, it's only a demo :)),
      # yielding only occurrences inside the range and never going past `to`
      while date <= to
        y << Event.new(name: event.name, date: date.to_s, recurring: event.recurring) if date >= from
        break unless event.recurring
        date = date >> 1
      end
    end
  end
end
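Stepping a month at a time from the previous occurrence has one caveat. Date#>> clamps to the end of shorter months, so an event on 31 January becomes 28 February and then stays on the 28th. Counting months from the original date avoids that drift. Here is a plain-Ruby sketch of a hypothetical monthly_occurrences helper.

```ruby
require "date"

# Hypothetical helper: monthly occurrences of `start` that fall in [from, to].
# Offsetting from the original date (start >> n) keeps the day of month
# stable instead of drifting after a short month.
def monthly_occurrences(start, from, to)
  Enumerator.new do |y|
    n = 0
    while (date = start >> n) <= to
      y << date if date >= from
      n += 1
    end
  end
end

start = Date.new(2018, 1, 31)
p monthly_occurrences(start, Date.new(2018, 1, 1), Date.new(2018, 4, 30)).map(&:to_s)
# ["2018-01-31", "2018-02-28", "2018-03-31", "2018-04-30"]
```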

What's more, on the client side the stub hands us the stream as a Ruby Enumerator, making it trivial to iterate over the streamed responses:

range = EventRange.new(from: '2018-01-01', to: '2018-12-31')
events = stub.show_events(range)
events.each do |event|
  puts "#{event.name} is happening on #{event.date}"
end

How fast?

So we can see that the grpc approach is pretty straightforward to work with, but how fast is it? Standard HTTP/JSON APIs suffer from the same problem that HTTP/JSON was meant to solve when compared with XML/SOAP web services: serialisation overhead. The extra bandwidth on the network, and the computing power needed to serialise and deserialise data (whether HTTP headers or JSON structures), is an overhead that must be taken into account when designing an API.

JSON is leaps and bounds simpler and faster than XML, but I did a quick benchmark to see how our calendar app compared against an API written in Sinatra. Note that both use HTTP as the transport (HTTP/2 in gRPC's case), so this mostly compares the JSON overhead and the internals of the two stacks. Below are the results for sending and receiving 1,000,000 requests to add events to our Ruby array:

Time to process 1 million requests (seconds)

Server           user          system        total         real             reqs / sec
gRPC             113.786016    27.136001     140.922017    (342.784571)     7092
Sinatra (JSON)   300.176772    119.946387    420.123159    (1050.995664)    2380

Measured by CPU time, that's roughly 3 times faster with protobufs, and the wall-clock (real) times show the same ratio! Powering away at over 7,000 requests per CPU second, grpc shortcuts the JSON serialisation and takes the prize.
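The benchmark script itself isn't included here, but the user / system / total / (real) columns look like the output of Ruby's standard Benchmark module. Below is a minimal sketch of a harness in that format. It times only a JSON round-trip of a sample event, with no server or network involved, so it measures serialisation cost alone and not the end-to-end figures above. The iteration count is an arbitrary choice.

```ruby
require "benchmark"
require "json"

N = 100_000 # iterations; smaller than 1,000,000 to keep the sketch quick
event = { name: "Ruby Unconference", date: "2018-10-06" }

# Time N serialise/deserialise round-trips of the sample event.
tms = Benchmark.measure do
  N.times { JSON.parse(JSON.generate(event)) }
end

puts Benchmark::CAPTION # "user system total real" header
puts tms.format         # times in the same layout, with real in parentheses
printf("%.0f round-trips / sec (by total CPU time)\n", N / tms.total)
```

Dividing the iteration count by total CPU time is one way to get a per-second figure. Dividing by real time instead gives wall-clock throughput, which also includes time spent waiting.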

What I learned

  • gRPC isn't a throwback to the WSDL, SOAPy hell we experienced in the 90s.
  • It's pretty easy to work with, and to define new services
  • It's interoperable, and a good way to break down Rails apps into microservices written in other languages like Go
  • It's fast: 3x faster than a JSON API in my quick benchmark
  • It's fun to meet new people at hackdays! :)

Thanks

Most of the information in this post was inspired by the excellent tutorial on grpc.io, which also has good links and documentation references on protobufs.

Had it not been for London Ruby Unconference, I might not have spent an hour or so hacking away on this demo, so thanks to Jairo and CodeScrum for putting on the event.

If any of this doesn't make sense to you, you need to take a look at my Ruby course!