All Change!

So I got fed up with typo, for one minor reason and another (mostly my fault, I confess). Frankly, I’m just more used to wordpress, as I use it at work (blogs make great notepads). So, I decided to switch. It’s been an interesting exercise…

The bulk of the work is building a converter. Thankfully, typo is built on rails, so it was relatively easy to hack on. I looked through wordpress and reckoned that the easiest way to get data in was through the wordpress backup mechanism, WXR (WordPress eXtended RSS). There’s a dearth of information about this format, but after taking a quick look, I figured it wouldn’t be too hard to get typo to spit out something similar. It was basically RSS with a few extra elements.

The initial work of producing the WXR feed from typo went quite quickly. However, when I tried to import it into a test wordpress installation, the problems started. It turns out that wordpress doesn’t use an XML parser for reading in WXR. It uses regexes.

As any fule no, parsing XML with regexes is bound to lead to pain. And indeed it is. Getting builder.rb to emit non-indented tags (as wordpress desires) when it’s in the middle of emitting indented stuff is yucky.! << '<< 'CDATA['! << html(item)! << ']]'! << '>'

Anyway, I thought I’d cracked the problem, once I got the export working. I immediately saw that all the XML in several posts was coming out double encoded. Doh. One quick wordpress plugin fixed the issue quite neatly (wp-correct-syntax-xml.php).

PHP is very amenable to cut’n’paste.

Then I remembered to look through my access logs. And I realised quite how many RSS and Atom feeds I publish. Much time later, I now have 15 hairy RedirectMatch permanent beckoning feed readers into the correct place.

Then I had a final thought. The IDs of the articles are going to be different in the feeds. In order to correct that, I had to go back to the export and insert the correct guid elements. This led me to a slight wild goose chase in typo. If you look in app/views/xml/_atom10_item_article.atom.builder, you see: "urn:uuid:#{item.guid}"

And the IDs that I was seeing coming out in the feed were Tag URIs. It took me a while to track down that the default feed for typo takes an entirely different route going through articles_controller.rb and into app/views/articles/_atom_feed.atom.builder, which in turn makes use of Rails’ AtomFeedBuilder::entry. That’s what spits out the Tag URI."tag:#{},#{@feed_options[:schema_date]}:#{record.class}/#{}")

Then I realised that the RSS feed was spitting out UUIDs instead of Tag URIs. Sigh. But I looked at the stats and I’m serving more than twice as much Atom as RSS, so I decided to stick with Tag URIs.

Finally, I pulled the big switch, and hit refresh in NetNewsWire. And all of my posts are marked as “new”. Bloody hell, I have no idea what’s caused that. It’s too late and I give up for now. Apart from that, things mostly seem to be working.

In case anybody else is interested, here’s my patch against typo to produce WXR feeds. It’s very simplistic, has no tests and is strictly “works well enough for me”.

Update: I don’t know how, but I completely managed to miss the typo-wordpress converter.  This would have been much simpler.  Although as I was migrating off of textile at the time, it made sense for me to export HTML.  Ah well, it was an interesting learning experience.


Vendor branches in git

Recently, I’ve been playing with git. It’s been pretty good so far. There are two things that have really impressed me:

  1. The subversion integration with subversion via git-svn is superb. I’m busy coding in git and I can just push all my changes back into subversion with nobody knowing I’m doing anything different.
  2. The visualisation provided by gitk is wonderful. One of our projects at work has about 10 open branches. gitk allowed me to see this quickly and easily. Most importantly, it immediately alerted me to changes on branches that had not been merged back into the trunk.

There are downsides of course. Support for Windows and Eclipse still isn’t wonderful (though it is progressing). But for my personal use, it’s damned handy.

There’s one thing I make use of in subversion that I didn’t immediately understand how to do in git: vendor branches. If you haven’t come across them before, they’re a way of tracking third party software and incorporating your changes (where you either don’t have access to the repository, or don’t want that level of granularity). At work, I have a wordpress blog, and I keep it up to date (including all my customizations) using a vendor branch. It’s worked really well.

It took a bit of searching, but eventually I found a mail message describing tracking changes when the upstream uses snapshots, which is the same thing as “vendor branches”. The process is really quite elegant and simple, and makes use of git’s superb branching.

To start with, you just unpack the software and set it up as a new git repository.

  $ unzip
  $ cd wordpress
  $ git init
  $ git add .
  $ git commit -m "Import wordpress 2.3"
  $ git tag v2.3
  $ git branch upstream

The clever bit is the last line: create an “upstream” branch, which you can use to store the newer releases.

With that, you can go away and start making changes: wp-config.php, add a few themes and plugins, that sort of thing.

Now, when a new release comes along, there’s a simple process (easily wrapped up in a small shell script) to incorporate it.

  $ cd wordpress
  $ git checkout upstream
  $ rm -r *
  $ (cd .. && unzip
  $ git add .
  $ git commit -a -m 'Import wordpress 2.3.1.'
  $ git tag v2.3.1
  $ git checkout master
  $ git merge upstream

There’s a few tricks in there.

  • I know that wordpress always unzips into a wordpress directory, so I can unzip it on top of the current code.
  • git add . catches all the modified and added files.
  • git commit -a catches the removed files.
  • The magic is in the last two lines—switch back to your own lineage, and pull in all the changes in the latest version of the code. If there are any conflicts, git will point them out to you and let you deal with them.

After a few tries, you end up with something like this.

Wordpress on a vendor branch in git

That is, you have a wordpress 2.3.3 install with all your customizations applied.

I confess, this is all probably overkill for something like wordpress. But it’s still a useful technique to know about.

Update: I’ve just realised that the above solution isn’t perfect, particularly for things like wordpress and mediawiki, where there’s a tendency to upload files into the repository area. Frequently these aren’t in version control so the rm -r * above could blow the uploads away. There are two choices:

  1. Configure the software to not modify anything in the repository (e.g. in wordpress, change Options → Miscellaneous → Store uploads in this folder).
  2. Have a cron job to regularly add & commit uploads into the repository. Depending on the size and frequency, this may or may not be preferrable.