Rodrigo Flores's Corner: Code, Cats, Books, Coffee

Letting Git Bisect Help You

Git has a wealth of tools that can improve our workflow. One of them is bisect, which is less well known because it isn't needed every day, but it proves very useful when you need to discover the exact point where a good branch became a bad one. Which commit broke something? That's where bisect comes into play.

Binary search

Bisect is based on the binary search algorithm. Given a sorted array of elements, it returns the index where an element can be found (or reports that it doesn't exist in the collection), based on the following invariant: after comparing the middle element with the one you're looking for, you can discard every element before it or after it, depending on the outcome of the comparison.

If you've ever been to a library, you may have noticed that books are sorted by author within a particular section. Say you're looking for a book whose author's last name is Martin, and you have just found out that the last book on a given shelf is by an author whose name starts with F: you can then be sure the right shelf is somewhere after this one. By following this method, you can narrow your search until you find the book you're looking for.

Binary search works the same way. If you're searching for an element x in an array with n elements, you pick the element at position n/2 and compare it with x. If x is greater, you repeat the search on the sub-array starting at (n/2)+1 and ending at n; if x is smaller, you repeat it on the sub-array starting at 1 and ending at (n/2)-1, and proceed recursively. Here's an interesting demonstration of this algorithm.

As for the number of comparisons, it is much smaller than in the naive search algorithm (where you compare the target against each element of the array): O(log n) in the worst case instead of O(n). This makes it a very efficient way to find elements in a sorted collection.
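Here's a minimal sketch of the idea in Ruby (the method name and sample array are just for illustration):

# Returns the index of target in the sorted array, or nil if it is absent
def binary_search(array, target)
  low, high = 0, array.length - 1
  while low <= high
    mid = (low + high) / 2
    case array[mid] <=> target
    when 0  then return mid      # found it
    when -1 then low = mid + 1   # target is in the upper half
    when 1  then high = mid - 1  # target is in the lower half
    end
  end
  nil
end

binary_search([2, 3, 5, 8, 13, 21], 8) # => 3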

Bisect

Bisect uses binary search to find the commit (i.e. the diff) that introduced a particular change to your branch.

To be able to know where a bug appeared, first you need a way to detect it. You can write a test for it -- or, if that's not possible, try to have at least a list of steps that reproduce it.

With this test in hand, you can start bisecting your code. Remember you must have two commits: one where you're sure the bug exists and one where you're sure that the bug does not exist. You can use hashes and tags to point to these commits. When you bisect you mark a commit as good (the test passes) or bad (the test fails).

In the diagram above, you can see the steps bisect performs (green points are good commits, and red ones are bad commits): at each step it moves to the middle of the commits after the current one (when it is good) or to the middle of the commits before it (when it is bad), ending when it finds the first bad commit (in our case commit number 6) after 3 steps.

I have generated a git repository with 1024 commits, each appending an integer (from 1 to 1024) to a text file, one integer per line. Our mission is to find which commit introduced the integer 1013 (suppose having this number in the file is a bug).

First, we find a way to quickly determine whether this number is present. Let's call this the test. Doing this is very simple:

$ grep 1013 file.txt

With this command we can start doing our bisect:

$ git bisect start

Running the grep test on master returns the undesired result (the number is present), so we will mark this commit as bad.

$ git bisect bad

Now, let's point out a good commit: suppose the first commit (7c0dcfa) is free of this bug. We can either check out that commit and mark it as good, or run git bisect good with the commit hash.

$ git bisect good 7c0dcfa
Bisecting: 511 revisions left to test after this (roughly 9 steps)
[8950f7db7e7cad0b2dc394ff9b75fc3d38c9d72a] added 512

A short way to roll those three commands into one is this:

$ git bisect start master 7c0dcfa

The start command accepts the bad and the good revisions (in that order) as arguments.

Nice, git bisect has put us in the middle of the collection. Running the test here finds nothing, so we mark this commit as good.

$ git bisect good
Bisecting: 255 revisions left to test after this (roughly 8 steps)
[a01ba83f3500b48da97c5f5c33052623aaa4161a] added 768

Again, the test returns the desired state. Good it is.

$ git bisect good
Bisecting: 127 revisions left to test after this (roughly 7 steps)
[4a4a668bf3363d09af5fd1906bc4272aacdb4495] added 896

Good again.

$ git bisect good
Bisecting: 63 revisions left to test after this (roughly 6 steps)
[9059c5b8b898159e8d1d797bff3b1febd1fd6a1c] added 960

Take a look at the message: besides the current commit, it tells you how many revisions are left to test and roughly how many more steps you should expect: 6. As the test passes again, we mark it as good.

$ git bisect good
Bisecting: 31 revisions left to test after this (roughly 5 steps)
[0c844d0b33ef297b742206ebc293f4925705b083] added 992

Again, we're in a good state.

$ git bisect good
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[0ee17eb17bd96b321a01c73eb13a8929a68b1239] added 1008

4 steps remaining, and we're in a good state.

$ git bisect good
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[dfb1e71736dcfffa2a30aecd7299f45f757c057e] added 1016

But now, running the test shows the undesired result. So, let's mark it as bad.

$ git bisect bad
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[6e6d08c374df5162fed65fed82859b69f86b936e] added 1012

We're near the end and we have a desired result, so let's mark it as good.

$ git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[1d23b7045a8accd254efa859d7fc66f1f58a59f0] added 1014

Almost at the end, we have a bad commit.

$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[458eab0eb8d808e16d98ec7039a7c53855dd9ed6] added 1013

Last step and bad it is.

$ git bisect bad
458eab0eb8d808e16d98ec7039a7c53855dd9ed6 is the first bad commit
commit 458eab0eb8d808e16d98ec7039a7c53855dd9ed6
Author: Rodrigo Flores <mail@rodrigoflores.org>
Date:   Tue Oct 21 22:31:05 2014 -0200

    added 1013

:100644 100644 7bc3db7f48a43ccf1a8cc7c26146912cc88c1009 b393a2138a96c1530f41f701ab43cca893226976 M  file.txt

So, now we finally have the commit that introduced the number 1013, as we wanted. You can review the steps taken by using git bisect log.

git bisect start
# bad: [740cdf012013dc41a39b41d4b09b57a970bfe38f] added 1024
git bisect bad 740cdf012013dc41a39b41d4b09b57a970bfe38f
# good: [7c0dcfa7514379151e0d83ffbf805850d2093538] added 1
git bisect good 7c0dcfa7514379151e0d83ffbf805850d2093538
# good: [8950f7db7e7cad0b2dc394ff9b75fc3d38c9d72a] added 512
git bisect good 8950f7db7e7cad0b2dc394ff9b75fc3d38c9d72a
# good: [a01ba83f3500b48da97c5f5c33052623aaa4161a] added 768
git bisect good a01ba83f3500b48da97c5f5c33052623aaa4161a
# good: [4a4a668bf3363d09af5fd1906bc4272aacdb4495] added 896
git bisect good 4a4a668bf3363d09af5fd1906bc4272aacdb4495
# good: [9059c5b8b898159e8d1d797bff3b1febd1fd6a1c] added 960
git bisect good 9059c5b8b898159e8d1d797bff3b1febd1fd6a1c
# good: [0c844d0b33ef297b742206ebc293f4925705b083] added 992
git bisect good 0c844d0b33ef297b742206ebc293f4925705b083
# good: [0ee17eb17bd96b321a01c73eb13a8929a68b1239] added 1008
git bisect good 0ee17eb17bd96b321a01c73eb13a8929a68b1239
# bad: [dfb1e71736dcfffa2a30aecd7299f45f757c057e] added 1016
git bisect bad dfb1e71736dcfffa2a30aecd7299f45f757c057e
# good: [6e6d08c374df5162fed65fed82859b69f86b936e] added 1012
git bisect good 6e6d08c374df5162fed65fed82859b69f86b936e
# bad: [1d23b7045a8accd254efa859d7fc66f1f58a59f0] added 1014
git bisect bad 1d23b7045a8accd254efa859d7fc66f1f58a59f0
# bad: [458eab0eb8d808e16d98ec7039a7c53855dd9ed6] added 1013
git bisect bad 458eab0eb8d808e16d98ec7039a7c53855dd9ed6
# first bad commit: [458eab0eb8d808e16d98ec7039a7c53855dd9ed6] added 1013
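Once you have the answer, git bisect reset takes you back to the branch you were on before the bisect started:

$ git bisect reset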

In this example we have 1024 commits, and finding the culprit took 10 steps. If we had twice as many commits, it would take only one more step to find the desired commit, since the worst-case number of steps grows as O(log n).

Even so, it is a tedious task to run the same command 10 times. So, let's think about a way to automate it.

Automating

Git bisect can use a script's exit code to determine whether a revision is good or bad: if it is 0, the revision is good; if it is non-zero, it is considered bad. Luckily, grep exits with 0 if it finds a line matching the desired pattern and with 1 if it doesn't. You can check this by running echo $? right after running the test.
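For example, on master (where 1013 is present) versus a number like 2048 that never appears in the file:

$ grep 1013 file.txt
1013
$ echo $?
0

$ grep 2048 file.txt
$ echo $?
1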

This is the opposite of what bisect expects for good and bad, so we need to write a shell script that negates the result (I'm not used to writing shell scripts, so please let me know if there's a cleaner way to do it).

#!/bin/sh

# grep -q exits 0 when 1013 is present (the bug), so we invert the result:
# exit 1 (bad) when it is found, exit 0 (good) when it is not
if grep -q 1013 file.txt; then
  exit 1
else
  exit 0
fi
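A slightly shorter equivalent would be to let the shell's ! operator negate grep's exit status directly:

#!/bin/sh
# exit 1 (bad) when 1013 is found, 0 (good) when it is not
! grep -q 1013 file.txt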

Let's save it to a file called test.sh and make it executable (chmod +x test.sh).

Then, after marking the good and the bad commits, you just need to run:

$ git bisect run ./test.sh
running ./test.sh
Bisecting: 255 revisions left to test after this (roughly 8 steps)
[a01ba83f3500b48da97c5f5c33052623aaa4161a] added 768
running ./test.sh
Bisecting: 127 revisions left to test after this (roughly 7 steps)
[4a4a668bf3363d09af5fd1906bc4272aacdb4495] added 896
running ./test.sh
Bisecting: 63 revisions left to test after this (roughly 6 steps)
[9059c5b8b898159e8d1d797bff3b1febd1fd6a1c] added 960
running ./test.sh
Bisecting: 31 revisions left to test after this (roughly 5 steps)
[0c844d0b33ef297b742206ebc293f4925705b083] added 992
running ./test.sh
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[0ee17eb17bd96b321a01c73eb13a8929a68b1239] added 1008
running ./test.sh
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[dfb1e71736dcfffa2a30aecd7299f45f757c057e] added 1016
running ./test.sh
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[6e6d08c374df5162fed65fed82859b69f86b936e] added 1012
running ./test.sh
Bisecting: 1 revision left to test after this (roughly 1 step)
[1d23b7045a8accd254efa859d7fc66f1f58a59f0] added 1014
running ./test.sh
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[458eab0eb8d808e16d98ec7039a7c53855dd9ed6] added 1013
running ./test.sh
458eab0eb8d808e16d98ec7039a7c53855dd9ed6 is the first bad commit
commit 458eab0eb8d808e16d98ec7039a7c53855dd9ed6
Author: Rodrigo Flores <mail@rodrigoflores.org>
Date:   Tue Oct 21 22:31:05 2014 -0200

    added 1013

:100644 100644 7bc3db7f48a43ccf1a8cc7c26146912cc88c1009 b393a2138a96c1530f41f701ab43cca893226976 M  file.txt
bisect run success

And then, you've found your broken commit without hassle.

Conclusion

You probably won't use it every day. Maybe once a week, or even once a month. But git bisect can give you a hand in finding a broken commit. Keeping the practice of making small commits, while not required, helps this process a lot: finding the culprit commit won't be of much help if you're forced to look through a gigantic diff. So, having small commits may help you in the long run, even if that means having many more commits.

What about you? Do you use bisect frequently?

Kudos to my friends @mdepolli and @p_balduino for the kind reviews!

Ruby Patterns: Webservice object

This is a series about Ruby patterns, which explains some common uses of Ruby syntax. The second post is about the webservice object. I like to call it a pattern because it is very common and tends to repeat (in a non-duplicated way) in Service-Oriented Architecture applications. Of course, this code may be too sophisticated for a script as small as this one, but it can be a good way to handle things in a more complex application.

So, you're given the task of writing a class that accesses a webservice and returns the information it provides (e.g. the GitHub repos for a given organization). A simplistic implementation could look like this:

require 'faraday'
require 'json'

def retrieve_repos_for(org)
  connection = Faraday.new(:url => 'https://api.github.com') do |faraday|
    faraday.adapter  Faraday.default_adapter
  end

  response = JSON.parse(connection.get("/orgs/#{org}/repos").body)
  response
end

retrieve_repos_for('github').each do |repo|
  puts repo['clone_url']
end

Obviously, this is a procedural implementation, so let's make it more object-oriented.

require 'faraday'
require 'json'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      connection = Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
      end

      response = JSON.parse(connection.get("/orgs/#{@organization}/repos").body)
      response
    end
  end
end

Github::Organization.new('github').repos.each do |repo|
  puts repo['clone_url']
end

Nice, we now have it inside a class. But we can extract some private methods here.

require 'faraday'
require 'json'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      response = JSON.parse(connection.get(repos_url).body)
      response
    end

    private

    def connection
      Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
      end
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end
  end
end

Github::Organization.new('github').repos.each do |repo|
  puts repo['clone_url']
end

Well, the public method seems more concise now, and we have extracted some methods that can be reused more easily. But there is a flaw: if we call the repos method twice, it will make two requests. This is easy to solve: just add some memoization.

require 'faraday'
require 'json'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      @repos ||= JSON.parse(connection.get(repos_url).body)
    end

    private

    def connection
      Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
      end
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end
  end
end

Github::Organization.new('github').repos.each do |repo|
  puts repo['clone_url']
end

We're almost done here. I'm not satisfied with JSON.parse(connection.get(repos_url).body); it seems like such a complex line. Let's extract another method.

require 'faraday'
require 'json'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      @repos ||= get(repos_url)
    end

    private

    def connection
      Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
      end
    end

    def get(url)
      JSON.parse(connection.get(url).body)
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end
  end
end

Github::Organization.new('github').repos.each do |repo|
  puts repo['clone_url']
end

The repos method seems simple enough now, and we have moved the parsing responsibility to the get method. But we can get rid of it by delegating that work to someone else. There is a great gem called faraday-middleware that parses the response for us based on the Content-Type header and returns a hash, so let's use it.

require 'faraday'
require 'faraday_middleware'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      @repos ||= get(repos_url)
    end

    private

    def connection
      @connection ||= Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
        faraday.response :json, :content_type => /\bjson$/
      end
    end

    def get(url)
      connection.get(url).body
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end
  end
end

I've also memoized the connection (we don't need to instantiate a new one every time).

Two days later, a new requirement arrives: get the organization info from the API as well. This implementation makes it really easy:

require 'faraday'
require 'faraday_middleware'

module Github
  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      @repos ||= get(repos_url)
    end

    def info
      @info ||= get(info_url)
    end

    private

    def connection
      @connection ||= Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
        faraday.response :json, :content_type => /\bjson$/
      end
    end

    def get(url)
      connection.get(url).body
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end

    def info_url
      "/orgs/#{@organization}"
    end
  end
end

org = Github::Organization.new('github')

puts org.info['name']
org.repos.each do |repo|
  puts repo['clone_url']
end

Neat! It is indeed really easy to add support for new endpoints to our class. But I think it has too many responsibilities: it is also dealing with the connection to the API. Let's extract a new class that does that and refer to it through a client method.

require 'faraday'
require 'faraday_middleware'

module Github
  class Client
    def initialize
      @connection = Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
        faraday.response :json, :content_type => /\bjson$/
      end
    end

    def get(url)
      @connection.get(url).body
    end
  end

  class Organization
    def initialize(organization)
      @organization = organization
    end

    def repos
      @repos ||= client.get(repos_url)
    end

    def info
      @info ||= client.get(info_url)
    end

    private

    def client
      @client ||= Github::Client.new
    end

    def repos_url
      "/orgs/#{@organization}/repos"
    end

    def info_url
      "/orgs/#{@organization}"
    end
  end
end

org = Github::Organization.new('github')

puts org.info['name']
org.repos.each do |repo|
  puts repo['clone_url']
end

Now we have a pretty simple class, which I consider a final implementation: it moves the parsing responsibility somewhere else, and now I only have to specify the endpoints and get (or post/put/patch/delete) them. Another improvement might be to handle the case where an endpoint returns a 404.
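As a rough sketch of that last idea (the NotFound error class and its name are my own addition, not part of any library), Client#get could check the response status before returning the body:

require 'faraday'
require 'faraday_middleware'

module Github
  class NotFound < StandardError; end

  class Client
    def initialize
      @connection = Faraday.new(:url => 'https://api.github.com') do |faraday|
        faraday.adapter  Faraday.default_adapter
        faraday.response :json, :content_type => /\bjson$/
      end
    end

    def get(url)
      response = @connection.get(url)
      # Turn a 404 into a domain-specific error instead of returning the error body
      raise NotFound, "#{url} was not found" if response.status == 404
      response.body
    end
  end
end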

What about you? Would you recommend another improvement? Do you use something similar?

Octopress and Capistrano

To deploy your Octopress blog with Capistrano, follow these steps:

First, add the capistrano gem to your Gemfile. Then, after running bundle install, run bundle exec capify . (or bin/capify . if you use binstubs) to generate the Capistrano files.
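In command form (run from your blog's directory):

$ bundle install
$ bundle exec capify .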

After that, add content like this to your config/deploy.rb file.

# Set this forward agent option to not have to add your server's ssh public key to your repository's host authorized keys
ssh_options[:forward_agent] = true
require "bundler/capistrano"

set :keep_releases, 5
set :scm, :git
set :scm_verbose, false

# Set your repository URL
set :repository, 'YOUR REPO HERE'

# Set your application name
set :application, "YOUR APPLICATION NAME"
set :deploy_via, :remote_cache

# Set your machine user
set :user, 'YOUR SSH USER'

set :deploy_to, "/home/#{user}/apps/#{application}"
set :use_sudo, false

# Set your host, you can use the server IPs here if you don't have one yet
role :app, 'YOUR HOSTNAME', :primary => true

default_run_options[:pty] = true

namespace :octopress do
  task :generate, :roles => :app do
    run "cd #{release_path} && bundle exec rake generate"
  end
end

after 'deploy:update_code', 'deploy:cleanup'
after 'bundle:install', 'octopress:generate'

Now, add a :production group alongside the :development group in your Gemfile (that is, make those gems available to both groups). This way, Capistrano will be able to run octopress:generate on your server.

source "http://rubygems.org"

group :development, :production do
  gem 'rake', '~> 0.9'
  gem 'rack', '~> 1.4.1'
  gem 'jekyll', '~> 0.12'
  gem 'rdiscount', '~> 1.6.8'
  gem 'pygments.rb', '~> 0.3.4'
  gem 'RedCloth', '~> 4.2.9'
  gem 'haml', '~> 3.1.7'
  gem 'compass', '~> 0.12.2'
  gem 'sass-globbing', '~> 1.0.0'
  gem 'rubypants', '~> 0.2.0'
  gem 'rb-fsevent', '~> 0.9'
  gem 'stringex', '~> 1.4.0'
  gem 'liquid', '~> 2.3.0'
end

gem 'sinatra', '~> 1.3.5'
gem 'capistrano'

Finally, before doing the first cap deploy, do a git clone of your blog's repository on the server (or at least connect via SSH from your blog server to the repository host). You will need the -A option on the ssh command to forward your keys. This is necessary because SSH asks for fingerprint confirmation on the first connection to a host, and since Capistrano won't answer 'yes' to that prompt, you have to do it manually.
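For example (the user, hostname, and repository URL below are placeholders for your own):

# From your machine, forward your SSH agent to the server
$ ssh -A deploy@yourserver.example.com

# On the server, touch the repository host once so you can accept its fingerprint
$ git ls-remote git@github.com:you/your-blog.git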

Doing this, you will be able to deploy your blog through Capistrano. Do you have any tips on how to improve this Capistrano recipe? Please share them in the comments :).