Letting Git Bisect Help You

30 Nov 2014

Git has a wealth of tools that can improve our workflow. One of them is bisect, which is not as well known since it doesn’t get used too much, but can still prove itself very useful when you need to discover that exact point where a good branch became a bad branch. Which commit broke something? That’s where bisect comes into play.

Binary search

Bisect is based on the binary search algorithm. Given a sorted array of elements, it returns the index where an element can be found (or whether it doesn’t exist in the collection), based on the following invariant: once you have compared an element, you can discard all elements that come before or after based on the outcome of the comparison.

If you’ve ever been to a library, you may have noticed that books are sorted by author within a particular section. Say you’re looking for a book whose author’s last name is Martin, and you have just found out the author of the last book on a given shelf starts with an F, you can then be sure the right shelf is somewhere after this one. By following this method, you can narrow your search until you find the book you’re looking for.

Binary search works the same way. If you’re searching for an x element in an array with n elements, you pick the n/2 element and compare it with x. If it is greater, you do the same with the sub-array starting at (n/2)+1 and ending at n. If it is less, you do the same with the sub-array that starts at 1 and ends at n/2, and then proceed recursively. Here’s an interesting demonstration of this algorithm.

As for the number of comparisons, it is on average a lot smaller than you would get from the naive search algorithm (where you compare each element in the array). It is also faster on average and very useful to find elements on a sorted collection.

Bisect

Bisect uses binary search to find the commit (i.e. the diff) that introduced a particular change to your branch.

To be able to know where a bug appeared, first you need a way to detect it. You can write a test for it – or, if that’s not possible, try to have at least a list of steps that reproduce it.

With this test in hand, you can start bisecting your code. Remember you must have two commits: one where you’re sure the bug exists and one where you’re sure that the bug does not exist. You can use hashes and tags to point to these commits. When you bisect you mark a commit as good (the test passes) or bad (the test fails).

In the diagram above, you can see the steps that the bisect performs (green points are good commits, and red ones are bad commits): it moves to the middle of the commits after this (in case of a good commit) or to the middle of the commits before this (in case of a bad commit), ending when it finds the first bad commit (in our case the commit number 6) after 3 steps.

I have generated a git repository with 1024 commits appending integers (from 1 to 1024) on a text file, one integer per line. Our mission is to find which commit introduced the integer 1013 (suppose having this number in our file is a bug).

First, we find a way to quickly determine whether this number is present. Let’s call this the test. Doing this is very simple:

$ grep 1013 file.txt

With this command we can start doing our bisect:

$ git bisect start

Running this command on master returns the not-desired result, so we will mark it as bad.

$ git bisect bad

Now, let’s point out a good commit: suppose the first commit (7c0dcfa) is free of this bug. So we can either checkout to this commit and mark it as good or run git bisect bad with the hash commit.

$ git bisect good 7c0dcfa
Bisecting: 511 revisions left to test after this (roughly 9 steps)
[8950f7db7e7cad0b2dc394ff9b75fc3d38c9d72a] added 512

A short way to roll those three commands into one is this:

$ git bisect start master 7c0dcfa

The start command accepts the bad and the good (in this order) as arguments.

Nice, git bisect has put us in the middle of the collection. Running the test here doesn’t reveal anything, so we mark it as good.

$ git bisect good
Bisecting: 255 revisions left to test after this (roughly 8 steps)
[a01ba83f3500b48da97c5f5c33052623aaa4161a] added 768

Again, the test returns the desired state. Good it is.

$ git bisect good
Bisecting: 127 revisions left to test after this (roughly 7 steps)
[4a4a668bf3363d09af5fd1906bc4272aacdb4495] added 896

Good again.

$ git bisect good
Bisecting: 63 revisions left to test after this (roughly 6 steps)
[9059c5b8b898159e8d1d797bff3b1febd1fd6a1c] added 960

Take a look at the message: apart from the current commit and how many revisions are between this one and the first bad one, it tells you how many times you are expected to make this comparison: 6 times. As it is good again, we may mark it as good.

$ git bisect good
Bisecting: 31 revisions left to test after this (roughly 5 steps)
[0c844d0b33ef297b742206ebc293f4925705b083] added 992

Again, we’re in a good state.

$ git bisect good
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[0ee17eb17bd96b321a01c73eb13a8929a68b1239] added 1008

4 steps remaining, and we’re in a good state.

$ git bisect good
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[dfb1e71736dcfffa2a30aecd7299f45f757c057e] added 1016

But now, running the test shows us the non-desired result. So, let’s mark it as bad.

$ git bisect bad
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[6e6d08c374df5162fed65fed82859b69f86b936e] added 1012

We’re near the end and we have a desired result, so let’s mark it as good.

$ git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[1d23b7045a8accd254efa859d7fc66f1f58a59f0] added 1014

Almost at the end, we have a bad commit.

$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[458eab0eb8d808e16d98ec7039a7c53855dd9ed6] added 1013

Last step and bad it is.

$ git bisect bad
458eab0eb8d808e16d98ec7039a7c53855dd9ed6 is the first bad commit
commit 458eab0eb8d808e16d98ec7039a7c53855dd9ed6
Author: Rodrigo Flores <mail@rodrigoflores.org>
Date:   Tue Oct 21 22:31:05 2014 -0200

    added 1013

:100644 100644 7bc3db7f48a43ccf1a8cc7c26146912cc88c1009 b393a2138a96c1530f41f701ab43cca893226976 M  file.txt

So, now we finally have the commit that introduced the number 1013, as we wanted. You can review the steps taken by using git bisect log.

git bisect start
# bad: [740cdf012013dc41a39b41d4b09b57a970bfe38f] added 1024
git bisect bad 740cdf012013dc41a39b41d4b09b57a970bfe38f
# good: [7c0dcfa7514379151e0d83ffbf805850d2093538] added 1
git bisect good 7c0dcfa7514379151e0d83ffbf805850d2093538
# good: [8950f7db7e7cad0b2dc394ff9b75fc3d38c9d72a] added 512
git bisect good 8950f7db7e7cad0b2dc394ff9b75fc3d38c9d72a
# good: [a01ba83f3500b48da97c5f5c33052623aaa4161a] added 768
git bisect good a01ba83f3500b48da97c5f5c33052623aaa4161a
# good: [4a4a668bf3363d09af5fd1906bc4272aacdb4495] added 896
git bisect good 4a4a668bf3363d09af5fd1906bc4272aacdb4495
# good: [9059c5b8b898159e8d1d797bff3b1febd1fd6a1c] added 960
git bisect good 9059c5b8b898159e8d1d797bff3b1febd1fd6a1c
# good: [0c844d0b33ef297b742206ebc293f4925705b083] added 992
git bisect good 0c844d0b33ef297b742206ebc293f4925705b083
# good: [0ee17eb17bd96b321a01c73eb13a8929a68b1239] added 1008
git bisect good 0ee17eb17bd96b321a01c73eb13a8929a68b1239
# bad: [dfb1e71736dcfffa2a30aecd7299f45f757c057e] added 1016
git bisect bad dfb1e71736dcfffa2a30aecd7299f45f757c057e
# good: [6e6d08c374df5162fed65fed82859b69f86b936e] added 1012
git bisect good 6e6d08c374df5162fed65fed82859b69f86b936e
# bad: [1d23b7045a8accd254efa859d7fc66f1f58a59f0] added 1014
git bisect bad 1d23b7045a8accd254efa859d7fc66f1f58a59f0
# bad: [458eab0eb8d808e16d98ec7039a7c53855dd9ed6] added 1013
git bisect bad 458eab0eb8d808e16d98ec7039a7c53855dd9ed6
# first bad commit: [458eab0eb8d808e16d98ec7039a7c53855dd9ed6] added 1013

We have on this example 1024 commits, and traversing through all of them took 10 steps. If we had twice this number of commits, it would take us only one more step to find the desired commit, as the efficiency in the worst case scenario is O(log n).

Even so, it is a tedious task to run the same command 10 times. So, let’s think about a way to automate it.

Automating

Git bisect can use a script return code to determine whether a revision is good or bad. If it is 0, it is good; if it is not 0, it is considered bad. Luckily, grep has a return code of 0 if it finds a line with the desired pattern and 1 if it doesn’t find a line. You can check this by running echo $? after running the test.

This is the opposite of good and bad, so we need to write a shell script that negates the result (I’m not used to writing shell script, so please let me know if there’s a cleaner way to do so).

#!/bin/sh

if [[ `grep 1013 file.txt` ]]; then
  exit 1
else
  exit 0
fi

Let’s save it to a file called test.sh and give it permission to execute.

Then, after marking the good and the bad commits, you just need to run:

git bisect run ./test.sh
running ./ola.sh
Bisecting: 255 revisions left to test after this (roughly 8 steps)
[a01ba83f3500b48da97c5f5c33052623aaa4161a] added 768
running ./ola.sh
Bisecting: 127 revisions left to test after this (roughly 7 steps)
[4a4a668bf3363d09af5fd1906bc4272aacdb4495] added 896
running ./ola.sh
Bisecting: 63 revisions left to test after this (roughly 6 steps)
[9059c5b8b898159e8d1d797bff3b1febd1fd6a1c] added 960
running ./ola.sh
Bisecting: 31 revisions left to test after this (roughly 5 steps)
[0c844d0b33ef297b742206ebc293f4925705b083] added 992
running ./ola.sh
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[0ee17eb17bd96b321a01c73eb13a8929a68b1239] added 1008
running ./ola.sh
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[dfb1e71736dcfffa2a30aecd7299f45f757c057e] added 1016
running ./ola.sh
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[6e6d08c374df5162fed65fed82859b69f86b936e] added 1012
running ./ola.sh
Bisecting: 1 revision left to test after this (roughly 1 step)
[1d23b7045a8accd254efa859d7fc66f1f58a59f0] added 1014
running ./ola.sh
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[458eab0eb8d808e16d98ec7039a7c53855dd9ed6] added 1013
running ./ola.sh
458eab0eb8d808e16d98ec7039a7c53855dd9ed6 is the first bad commit
commit 458eab0eb8d808e16d98ec7039a7c53855dd9ed6
Author: Rodrigo Flores <mail@rodrigoflores.org>
Date:   Tue Oct 21 22:31:05 2014 -0200

    added 1013

:100644 100644 7bc3db7f48a43ccf1a8cc7c26146912cc88c1009 b393a2138a96c1530f41f701ab43cca893226976 M  file.txt
bisect run success

And then, you’ve found your broken commit without hassle.

Conclusion

You probably you won’t use it everyday. Maybe once a week or even once a month. But git bisect can give you a hand to find a broken commit. Keeping the practice of doing small commits, while not required, can help this process a lot. Finding out the culprit commit won’t be of much help if you’re forced to look through a gigantic diff. So, having small commits may help you in the long run, even if that means having too many commits.

What about you? Do you use bisect frequently?

Kudos for the friends @mdepolli and @p_balduino for the kind reviews!

Rodrigo Flores's Corner Code, Cats, Books, Coffee

Letting Git Bisect Help You

Binary search

Bisect

Automating

Conclusion