Jabbering Giraffe

The Perforce Perspective

I’m a long-time user of subversion and, more recently, git. At Google, however, everything’s based around perforce. I’m still new enough to it that I don’t want to criticise it, merely contrast my experiences with it.

The first thing that I noticed with perforce (p4) is quite how server-based it is. Subversion (and CVS) is often criticised for leaving lots of “turds” around: .svn or CVS directories. They’re just clutter that you don’t want to be bothered with. With perforce however, everything lives on the server. There is almost no data stored on the client side (perhaps just a .p4config file). Everything you do has to talk to the server.

The next surprise was how things are checked out. In subversion, you usually check out the trunk of a project, or a branch. You can do that in perforce, but it’s a great deal more flexible. You supply a client spec, which is a small text file describing a mapping from the server’s directory structure to your own workspace. e.g.

Client: myproject-client

Root: /home/dom/myproject

View:
  //depot/myproject/...  //myproject-client/myproject/...
  -//depot/myproject/bigdata/... //myproject-client/myproject/bigdata/...

In this example, I’ve checked out all of myproject, except I’ve also removed some big data which I don’t need for my development. You can create a client workspace which is composed of any part (or parts) of your repository. Unsurprisingly, this is both a blessing and a curse. You can create very complicated setups using these somewhat ephemeral client specs. But they’re not (by default) versioned, so they’re really easy to lose. I’ve also found it very easy to make small mistakes which mean the wrong bits of projects are checked out (or no bits). If you’re new to a project, figuring out the correct client spec is one of the first hurdles you’ll come across.
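To make the mapping concrete, here is a minimal shell sketch of where the two view lines in the example would put a given depot path on disk. The function name map_depot_path is made up for illustration and the paths are the ones from the example spec. Perforce does the real resolution itself; this only mimics the outcome for this one spec.

```shell
# Illustrative only: mimic how the example client spec maps depot paths
# to local paths. map_depot_path is a made-up helper, not a p4 command.
map_depot_path() {
  root=/home/dom/myproject            # the Root: field of the spec
  case "$1" in
    //depot/myproject/bigdata/*)
      # Matches the "-" (exclusion) line, so nothing is synced locally.
      return 1 ;;
    //depot/myproject/*)
      # //myproject-client/ stands for the client root directory.
      printf '%s\n' "$root/myproject/${1#//depot/myproject/}" ;;
    *)
      # Not covered by any view line: not part of this workspace.
      return 1 ;;
  esac
}

map_depot_path //depot/myproject/src/main.c
map_depot_path //depot/myproject/bigdata/huge.bin || echo "not in workspace"
```

In a real spec, later view lines override earlier ones. The case statement gets the same effect by testing the more specific bigdata pattern first.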

Once you’ve got some code checked out, it’s not too dissimilar to other version control systems. The most irritating thing that I came across was p4’s inability to detect added files. So, if I create a file …/myproject/foo.txt and run p4 pending, it says “no change.” You have to explicitly run p4 add. This is terrible: it’s really easy to forget to add files. You can convince perforce to list these files, but it’s not trivial:

$ find * -type f | p4 -x - files 2>&1 | awk '/ - no such file/{print $1}'
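The awk filter splits on whitespace, so a file name containing a space would be cut short. A slightly more defensive sketch strips the message with sed instead, assuming the server reports unknown files with the same “ - no such file” text that the one-liner already matches. The function name filter_unadded and the sample lines are illustrative, not real output from any depot.

```shell
# Keep only the lines that p4 reports as unknown, and strip the message,
# leaving just the file name (spaces included).
filter_unadded() {
  sed -n 's/ - no such file.*$//p'
}

# Made-up sample of what `p4 -x - files 2>&1` might print.
printf '%s\n' \
  '//depot/myproject/bar.txt#3 - edit change 1234 (text)' \
  'foo.txt - no such file(s).' \
  'my notes.txt - no such file(s).' | filter_unadded
```

In practice the function would take the place of the awk stage, i.e. find * -type f | p4 -x - files 2>&1 | filter_unadded.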

One feature I quite like is the ability to have “pending changelists.” A changelist is perforce’s equivalent of a commit in subversion. You can create a pending changelist, which essentially allows you to build up a commit a little bit at a time, somewhat like git’s index. But even though you can have multiple pending changelists in a single client, you are still restricted in that a given file can only be in one of them. Personally, I find the git index more useful. Plus, when you submit a pending changelist, it gets assigned a new changelist number. This can make them difficult to track.

The critical feature for perforce is its integration (merging) support. Whilst I’ve done a few p4 integrates, I’ve not got the full hang of it yet. But it’s clearly far in advance of svn’s merging.

Internally, perforce is built upon two things:

  1. A collection of RCS *,v files.
  2. A few database files to coordinate metadata.

This architecture is noticeable: as soon as you look at the file log, you can see that each file has its own individual version number.

Over the years, Google have built up tools to work around many of these issues. There’s a nice discussion of perforce and how it’s used at Google in the comments of the LWN article KS2009: How Google uses Linux (which is a fascinating read in and of itself).

In case you’re interested in some of the challenges of running perforce at the scale of Google, it’s worth checking out some of the papers that have been presented at the perforce user conferences: