Vendor branches in git

Recently, I’ve been playing with git. It’s been pretty good so far. There are two things that have really impressed me:

  1. The subversion integration with subversion via git-svn is superb. I’m busy coding in git and I can just push all my changes back into subversion with nobody knowing I’m doing anything different.
  2. The visualisation provided by gitk is wonderful. One of our projects at work has about 10 open branches. gitk allowed me to see this quickly and easily. Most importantly, it immediately alerted me to changes on branches that had not been merged back into the trunk.

There are downsides of course. Support for Windows and Eclipse still isn’t wonderful (though it is progressing). But for my personal use, it’s damned handy.

There’s one thing I make use of in subversion that I didn’t immediately understand how to do in git: vendor branches. If you haven’t come across them before, they’re a way of tracking third party software and incorporating your changes (where you either don’t have access to the repository, or don’t want that level of granularity). At work, I have a wordpress blog, and I keep it up to date (including all my customizations) using a vendor branch. It’s worked really well.

It took a bit of searching, but eventually I found a mail message describing tracking changes when the upstream uses snapshots, which is the same thing as “vendor branches”. The process is really quite elegant and simple, and makes use of git’s superb branching.

To start with, you just unpack the software and set it up as a new git repository.

  $ unzip
  $ cd wordpress
  $ git init
  $ git add .
  $ git commit -m "Import wordpress 2.3"
  $ git tag v2.3
  $ git branch upstream

The clever bit is the last line: create an “upstream” branch, which you can use to store the newer releases.

With that, you can go away and start making changes: wp-config.php, add a few themes and plugins, that sort of thing.

Now, when a new release comes along, there’s a simple process (easily wrapped up in a small shell script) to incorporate it.

  $ cd wordpress
  $ git checkout upstream
  $ rm -r *
  $ (cd .. && unzip
  $ git add .
  $ git commit -a -m 'Import wordpress 2.3.1.'
  $ git tag v2.3.1
  $ git checkout master
  $ git merge upstream

There’s a few tricks in there.

  • I know that wordpress always unzips into a wordpress directory, so I can unzip it on top of the current code.
  • git add . catches all the modified and added files.
  • git commit -a catches the removed files.
  • The magic is in the last two lines—switch back to your own lineage, and pull in all the changes in the latest version of the code. If there are any conflicts, git will point them out to you and let you deal with them.

After a few tries, you end up with something like this.

Wordpress on a vendor branch in git

That is, you have a wordpress 2.3.3 install with all your customizations applied.

I confess, this is all probably overkill for something like wordpress. But it’s still a useful technique to know about.

Update: I’ve just realised that the above solution isn’t perfect, particularly for things like wordpress and mediawiki, where there’s a tendency to upload files into the repository area. Frequently these aren’t in version control so the rm -r * above could blow the uploads away. There are two choices:

  1. Configure the software to not modify anything in the repository (e.g. in wordpress, change Options → Miscellaneous → Store uploads in this folder).
  2. Have a cron job to regularly add & commit uploads into the repository. Depending on the size and frequency, this may or may not be preferrable.