Rodrigo Flores's Corner Code, Cats, Books, Coffee

Ruby Patterns: Webservice object

This is a series about Ruby Patterns, which will explain some common uses of Ruby syntax. The second post is about a webservice object. I like to call it a pattern because it is very common and tends to repeat (in a non-duplicated way) in applications based on a Service Oriented Architecture. Of course, this code may be too sophisticated for a script as small as this one, but it may be a good way to handle things in a more complex application.

So, you're given the task of writing a class that accesses a webservice and returns the info from it (e.g. the GitHub repos for a given organization). A simplistic implementation can look like this:

require 'faraday'
require 'json'

def retrieve_repos_for(org)
  connection = Faraday.new(:url => 'https://api.github.com') do |faraday|
    faraday.adapter  Faraday.default_adapter
  end

  response = JSON.parse(connection.get("/orgs/#{org}/repos").body)
  response
end

retrieve_repos_for('github').each do |repo|
  puts repo['clone_url']
end

Obviously, this is an example of a procedural implementation, so, let's make it more object oriented.

require 'faraday'
require 'json'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      connection = Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
      end

      response = JSON.parse(connection.get("/orgs/#{@organization}/repos").body)
      response
    end
  end
end

Github::Organization.new('github').repos.each do |repo|
  puts repo['clone_url']
end

Nice, we now have it inside a class. But we can extract some private methods here.

require 'faraday'
require 'json'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      response = JSON.parse(connection.get(repos_url).body)
      response
    end

    private

    def connection
      Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
      end
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end
  end
end

Github::Organization.new('github').repos.each do |repo|
  puts repo['clone_url']
end

Well, the public method seems more concise now, and we have extracted some methods that can be reused more easily. But there is still a flaw: if we call the repos method twice, it will make two requests. This is easy to solve: just add some memoization.

require 'faraday'
require 'json'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      @repos ||= JSON.parse(connection.get(repos_url).body)
    end

    private

    def connection
      Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
      end
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end
  end
end

Github::Organization.new('github').repos.each do |repo|
  puts repo['clone_url']
end
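To see what the ||= memoization buys us without hitting the network, here is a minimal sketch. FakeOrganization and its fetch_repos method are hypothetical stand-ins for the real request, and the counter only exists to show how many times the expensive call runs.

```ruby
# Hypothetical stand-in for Github::Organization: no HTTP involved.
class FakeOrganization
  attr_reader :requests

  def initialize
    @requests = 0
  end

  # Memoized: fetch_repos runs only while @repos is still nil (or false).
  def repos
    @repos ||= fetch_repos
  end

  private

  # Pretends to be the webservice call and counts how often it is made.
  def fetch_repos
    @requests += 1
    [{ 'clone_url' => 'https://example.com/repo.git' }]
  end
end

org = FakeOrganization.new
org.repos
org.repos
puts org.requests # prints 1: the second call reused the stored value
```

One caveat of this idiom: if the memoized call returns nil or false, ||= will run it again on the next call.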

We're almost done here. I'm not satisfied with JSON.parse(connection.get(repos_url).body): it is quite a complex line. Let's extract a method here.

require 'faraday'
require 'json'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      @repos ||= get(repos_url)
    end

    private

    def connection
      Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
      end
    end

    def get(url)
      JSON.parse(connection.get(url).body)
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end
  end
end

Github::Organization.new('github').repos.each do |repo|
  puts repo['clone_url']
end

The repos method seems simple enough now, and we have moved the parsing responsibility to the get method. But we can get rid of it by delegating the parsing to someone else. There is a great gem called faraday_middleware that parses the response body for me, based on the Content-Type header, and returns the parsed object (a hash, or an array for a JSON list), so let's use it.

require 'faraday'
require 'faraday_middleware'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      @repos ||= get(repos_url)
    end

    private

    def connection
      @connection ||= Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.response :json, :content_type => /\bjson$/
        # Faraday expects the adapter to be declared last in the stack
        faraday.adapter  Faraday.default_adapter
      end
    end

    def get(url)
      connection.get(url).body
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end
  end
end

I've also added a memoization on the connection (we don't need to instantiate a new one every time).
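Conceptually, the :json response middleware with the :content_type option does something like the sketch below. This is only a rough approximation in plain Ruby, not the gem's actual implementation; parse_body is a hypothetical helper written for this illustration.

```ruby
require 'json'

# Hypothetical helper, not part of faraday_middleware: parses the body only
# when the media type (ignoring parameters such as charset) matches.
def parse_body(body, content_type, pattern = /\bjson$/)
  media_type = content_type.to_s.split(';').first.to_s.strip
  media_type =~ pattern ? JSON.parse(body) : body
end

# A JSON response is turned into a Ruby hash...
p parse_body('{"name":"GitHub"}', 'application/json; charset=utf-8')
# ...while any other content type is returned untouched.
p parse_body('<html></html>', 'text/html')
```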

Two days later, a new requirement arrives: get the organization info and add it to the API. This implementation makes it really easy:

require 'faraday'
require 'faraday_middleware'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      @repos ||= get(repos_url)
    end

    def info
      @info ||= get(info_url)
    end

    private

    def connection
      @connection ||= Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.response :json, :content_type => /\bjson$/
        # Faraday expects the adapter to be declared last in the stack
        faraday.adapter  Faraday.default_adapter
      end
    end

    def get(url)
      connection.get(url).body
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end

    def info_url
      "/orgs/#{@organization}"
    end
  end
end

org = Github::Organization.new('github')

puts org.info['name']
org.repos.each do |repo|
  puts repo['clone_url']
end

Neat! It is indeed really easy to add support for new endpoints to our class. But I think it has too many responsibilities: it is also dealing with the connection to the API. Let's extract a new class that does that and reach it through a client method.

require 'faraday'
require 'faraday_middleware'

module Github
  class Client
    def initialize
      @connection = Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.response :json, :content_type => /\bjson$/
        # Faraday expects the adapter to be declared last in the stack
        faraday.adapter  Faraday.default_adapter
      end
    end

    def get(url)
      @connection.get(url).body
    end
  end

  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      @repos ||= client.get(repos_url)
    end

    def info
      @info ||= client.get(info_url)
    end

    private

    def client
      @client ||= Github::Client.new
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end

    def info_url
      "/orgs/#{@organization}"
    end
  end
end

org = Github::Organization.new('github')

puts org.info['name']
org.repos.each do |repo|
  puts repo['clone_url']
end

Now we have a pretty simple class, which I consider a final implementation. It moves the responsibility for connecting and parsing to another place, so now I only have to specify the endpoints and get (or post/put/patch/delete) them. Another improvement may be to add a condition to do something when an endpoint returns a 404.
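The 404 improvement could be sketched like this. Everything below is an assumption for the sake of illustration: the Github::NotFound error class is made up, and the connection is injected into the client (unlike the version above) so the example runs without Faraday or network access. It relies only on the response answering status and body, which Faraday responses do.

```ruby
module Github
  # Hypothetical error class, raised when the endpoint does not exist.
  class NotFound < StandardError; end

  class Client
    # The connection is injected (an assumption of this sketch) so the
    # class can be exercised without Faraday.
    def initialize(connection)
      @connection = connection
    end

    def get(url)
      response = @connection.get(url)
      raise NotFound, "#{url} was not found" if response.status == 404

      response.body
    end
  end
end

# Stand-ins that mimic the two response methods used above (status, body).
FakeResponse = Struct.new(:status, :body)

class FakeConnection
  def get(url)
    if url == '/orgs/github'
      FakeResponse.new(200, { 'name' => 'GitHub' })
    else
      FakeResponse.new(404, { 'message' => 'Not Found' })
    end
  end
end

client = Github::Client.new(FakeConnection.new)
puts client.get('/orgs/github')['name']

begin
  client.get('/orgs/does-not-exist')
rescue Github::NotFound => e
  puts e.message
end
```

Faraday also ships a :raise_error response middleware that might play a similar role, if you prefer not to check the status by hand.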

What about you? Would you recommend another improvement? Do you use something similar?

octopress and capistrano

To deploy your Octopress blog with Capistrano, follow these steps:

First, add the capistrano gem to your Gemfile. Then, after running bundle install, run bundle exec capify . (or bin/capify . if you use binstubs) to generate the Capistrano files.

After that, add content like this to your config/deploy.rb file.

# Set this forward agent option to not have to add your server's ssh public key to your repository's host authorized keys
ssh_options[:forward_agent] = true
require "bundler/capistrano"

set :keep_releases, 5
set :scm, :git
set :scm_verbose, false

# Set your repository URL
set :repository, 'YOUR REPO HERE'

# Set your application name
set :application, "YOUR APPLICATION NAME"
set :deploy_via, :remote_cache

# Set your machine user
set :user, 'YOUR SSH USER'

set :deploy_to, "/home/#{user}/apps/#{application}"
set :use_sudo, false

# Set your host, you can use the server IPs here if you don't have one yet
role :app, 'YOUR HOSTNAME', :primary => true

default_run_options[:pty] = true

namespace :octopress do
  task :generate, :roles => :app do
    run "cd #{release_path} && bundle exec rake generate"
  end
end

after 'deploy:update_code', 'deploy:cleanup'
after 'bundle:install', 'octopress:generate'

Now, you should add the production group next to the development group in your Gemfile. This way, Capistrano will be able to run octopress:generate on your server.

source "http://rubygems.org"

group :development, :production do
  gem 'rake', '~> 0.9'
  gem 'rack', '~> 1.4.1'
  gem 'jekyll', '~> 0.12'
  gem 'rdiscount', '~> 1.6.8'
  gem 'pygments.rb', '~> 0.3.4'
  gem 'RedCloth', '~> 4.2.9'
  gem 'haml', '~> 3.1.7'
  gem 'compass', '~> 0.12.2'
  gem 'sass-globbing', '~> 1.0.0'
  gem 'rubypants', '~> 0.2.0'
  gem 'rb-fsevent', '~> 0.9'
  gem 'stringex', '~> 1.4.0'
  gem 'liquid', '~> 2.3.0'
end

gem 'sinatra', '~> 1.3.5'
gem 'capistrano'

Finally, before doing the first cap deploy, do a git clone of your blog's repository on the server (or connect through ssh from your blog server to the repository server). You will need the -A option on the ssh command to forward your keys. This is needed because ssh asks for the fingerprint confirmation on the first connection to a host, and as Capistrano won't answer 'yes' to the confirmation, you should do it manually.

Doing this, you will be able to deploy your blog through Capistrano. Do you have any tips on how to improve this Capistrano recipe? Please share them in the comments :).

Ruby Patterns: Method with options

This is a series about Ruby Patterns, which will explain some common uses of Ruby syntax. The first post is about methods with options.

Let's start with an example where it can be useful. Suppose you're given an object and you should write a method that prints the object's attributes in HTML. You'll also have to follow these other requirements:

  • You may specify the attributes that you want to show (by default you should show all of them);
  • You may specify the attributes that you don't want to show (by default, you should show all of them, i.e. this list is empty);
  • You may specify the format: possible formats are :ordered_list, :unordered_list, :table (by default, present it as a :table);
  • You may specify the strftime format of the objects of Time type (default is :month/:day/:year :hour::minute :timezone in strftime format);
  • You may specify the strftime format of the objects of Date type (default is :month/:day/:year in strftime format);

So, given these requirements, let's write a function:

def object_to_html(object, only=nil, except=[], format=:table, time_format="%m/%d/%Y %H:%M %Z", date_format="%m/%d/%Y")
  only ||= object.attributes - except

  if only - except != only
    fail "You can't put in except an attribute that you've put on only"
  end

  # Code that does what we proposed
end

Some examples of the usage of this function (and what someone will say after reading your code):

object_to_html(user) # Seems OK
object_to_html(user, nil, [:encrypted_password]) # What is this nil ?
object_to_html(user, [:email, :phone, :created_at], [], :ordered_list, "%d/%m/%Y %H:%M", "%d/%m/%Y") # WTF is this empty array ?
object_to_html(product, nil, [], :table, "%d/%m/%Y %H:%M", "%d/%m/%Y") # Gosh. My eyes are bleeding. Call an ambulance.

What creepy code! If you want to change the last argument, you will have to know the default values of all the other ones to be able to change it. You will also have to know (or constantly look up in its implementation or documentation) the exact order of the arguments. Also, if you decide to remove one argument, you will have to find all the calls to this method and change every one of them. Definitely, this is what I call ugly code: difficult to use and hard to maintain.

So, this pattern comes in handy: method with options!

def object_to_html(object, options={})
  default_options = {
    :format => :table,
    :only => nil,
    :except => [],
    :time_format => "%m/%d/%Y %H:%M %Z",
    :date_format => "%m/%d/%Y"
  }

  options = options.reverse_merge(default_options)

  options[:only] ||= object.attributes - options[:except]

  if options[:only] - options[:except] != options[:only]
    fail "You can't put in except an attribute that you've put on only"
  end

  # Code that does what we proposed (now it uses the options hash instead of local variables)
end

The same usage examples come along:

object_to_html(user)
object_to_html(user, :except => [:encrypted_password])
object_to_html(user, :only => [:email, :phone, :created_at], :format => :ordered_list, :time_format => "%d/%m/%Y %H:%M")
object_to_html(product, :time_format => "%d/%m/%Y %H:%M", :date_format => "%d/%m/%Y")

Now to the explanation of how this works: the Hash#reverse_merge method plays the key role here. It merges the two hashes (the one containing the default values with the one that was given as a parameter) and, when the same key is in both, it keeps the value that belongs to the hash that received the method call (in our case, the one given as a parameter to the method). The method is called reverse_merge because the Hash#merge method solves the conflict in the opposite way: when the same key is in both, it keeps the value that belongs to the hash given as an argument to merge. We can easily use merge instead of reverse_merge if we write the line that calls reverse_merge this way:

options = default_options.merge(options)

And then the code would work almost the same way as the first version: the only difference is that instead of reading local variables we read the values from the options hash, which is an easy conversion. Also, note that we use the versions without a bang (reverse_merge, merge) and not the ones with a bang (reverse_merge!, merge!): a bang version modifies the hash it is called on and does the merge on itself, instead of leaving it untouched and returning another instance of Hash.
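The difference between the two directions of the merge, and between the bang and non-bang variants, can be sketched in plain Ruby (no ActiveSupport needed). The DEFAULTS constant and the two helper names are assumptions made for this illustration.

```ruby
DEFAULTS = { :format => :table, :except => [] }.freeze

# Non-destructive: returns a new hash, the caller's hash is untouched.
# On a key conflict, Hash#merge keeps the value from its argument.
def with_defaults(options)
  DEFAULTS.merge(options)
end

# Destructive: merge! writes the missing defaults into the caller's hash.
# The block resolves conflicts by keeping the value that was already there.
def with_defaults!(options)
  options.merge!(DEFAULTS) { |_key, given, _default| given }
end

mine = { :format => :ordered_list }

puts with_defaults(mine)[:format] # ordered_list
puts mine.key?(:except)           # false: mine was not modified

with_defaults!(mine)
puts mine.key?(:except)           # true: mine was changed in place
puts mine[:format]                # ordered_list
```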

Also, if you want to enforce that one option is always given, you can do this:

begin
  options.fetch(:format)
rescue KeyError # Hash#fetch raises KeyError when a key is not found
  raise ArgumentError, "The :format option should be passed"
end

Doing this, you will treat a missing option the same way Ruby treats a missing argument in a method call. However, I'm not a big fan of this approach: I think that if an argument is mandatory, it should be passed as a regular argument and not as an option.
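A shorter way to get the same check is to give Hash#fetch a block, which only runs when the key is missing. This is just a sketch; required_option is a hypothetical helper name.

```ruby
# Hypothetical helper: returns the option or raises when it is missing.
def required_option(options, key)
  options.fetch(key) do
    raise ArgumentError, "The :#{key} option should be passed"
  end
end

puts required_option({ :format => :table }, :format)

begin
  required_option({}, :format)
rescue ArgumentError => e
  puts e.message
end
```

Note that fetch only checks that the key is present: an explicit :format => nil passes this check.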

This pattern is heavily used on famous Ruby libraries like Rails, Bundler or Devise and plays a big part on pretty DSLs like Rails routes or Gemfiles. So, IMO, you should use it wherever you find a need to include more than 2 or 3 arguments on a function. Also, it is possible to use this on Javascript: if you take a look at jQuery's ajax function it works the same way: you can pass the url as the first argument and the other parameters (like callbacks, fallbacks, urls, etc) as an object.

A big disadvantage of doing this is that it is very difficult to know what the default options are: even looking at your code, you may only understand the default behaviour by reading through all of it (which is far from good). So, I suggest that you document the default parameters and what they mean, like Rails does with the has_many method.
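One possible way to make the defaults easier to find is to keep them in a single documented constant. The sketch below is only a suggestion: the HtmlPresenter module, the options_for method and the unknown-key check are all hypothetical, not part of the code above.

```ruby
module HtmlPresenter
  # Options accepted by HtmlPresenter.options_for:
  #   :format      - :table, :ordered_list or :unordered_list
  #   :only        - attributes to show (nil means all of them)
  #   :except      - attributes to hide
  #   :time_format - strftime format for Time values
  #   :date_format - strftime format for Date values
  DEFAULT_OPTIONS = {
    :format      => :table,
    :only        => nil,
    :except      => [],
    :time_format => "%m/%d/%Y %H:%M %Z",
    :date_format => "%m/%d/%Y"
  }.freeze

  # Returns the given options completed with the documented defaults.
  # Unknown keys are rejected so that typos do not go unnoticed.
  def self.options_for(options = {})
    unknown = options.keys - DEFAULT_OPTIONS.keys
    raise ArgumentError, "Unknown options: #{unknown.inspect}" unless unknown.empty?

    DEFAULT_OPTIONS.merge(options)
  end
end

puts HtmlPresenter.options_for(:format => :ordered_list)[:format]
```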

UPDATE: As Rafael França suggested, I'm not overriding the argument anymore. As hashes are passed by reference, when you execute merge! you actually change the hash outside the function as an undesirable side effect.

UPDATE II: Adam Meehan reminded me in the comments that Hash#reverse_merge is part of ActiveSupport and not part of the Ruby standard library. To be able to use it, you will have to require active_support/core_ext/hash.