All Change!

So I got fed up with typo, for one minor reason and another (mostly my fault, I confess). Frankly, I’m just more used to wordpress, as I use it at work (blogs make great notepads). So, I decided to switch. It’s been an interesting exercise…

The bulk of the work is building a converter. Thankfully, typo is built on rails, so it was relatively easy to hack on. I looked through wordpress and reckoned that the easiest way to get data in was through the wordpress backup mechanism, WXR (WordPress eXtended RSS). There’s a dearth of information about this format, but after taking a quick look, I figured it wouldn’t be too hard to get typo to spit out something similar. It was basically RSS with a few extra elements.
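To give a feel for "RSS with a few extra elements", here is a minimal sketch of a single WXR-style item built with plain string interpolation. It is not the code from my patch: the `post` hash and its keys are hypothetical stand-ins for a typo article, and only a handful of the `wp:` elements are shown, so treat it as an illustration rather than a complete export.

```ruby
# Escape the characters that are special in XML text and attribute values.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;')
end

# Build one <item> for a WXR-style feed. `post` is a plain hash with
# hypothetical keys, standing in for a typo article record.
def wxr_item(post)
  [
    '<item>',
    "<title>#{xml_escape(post[:title])}</title>",
    "<link>#{xml_escape(post[:link])}</link>",
    "<guid isPermaLink=\"false\">#{xml_escape(post[:guid])}</guid>",
    "<wp:post_id>#{post[:id].to_i}</wp:post_id>",
    "<wp:post_date>#{post[:published_at].strftime('%Y-%m-%d %H:%M:%S')}</wp:post_date>",
    '<wp:status>publish</wp:status>',
    '</item>'
  ].join("\n") # one tag per line, no indentation
end
```

Everything ordinary RSS would carry (title, link, guid) sits next to the `wp:`-prefixed extras that carry the post metadata.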

The initial work of producing the WXR feed from typo went quite quickly. However, when I tried to import it into a test wordpress installation, the problems started. It turns out that wordpress doesn’t use an XML parser for reading in WXR. It uses regexes.

As any fule no, parsing XML with regexes is bound to lead to pain. And indeed it does. Getting builder.rb to emit non-indented tags (as wordpress desires) when it’s in the middle of emitting indented stuff is yucky:

! << '<< 'CDATA['! << html(item)! << ']]'! << '>'
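One way to sidestep the indentation fight is to build the CDATA wrapper as a plain string and hand it to the output in one piece. The sketch below assumes exactly that; the helper names `cdata` and `content_encoded` are made up for illustration and are not part of builder.rb or of my patch.

```ruby
# Wrap raw HTML in a CDATA section with no leading indentation and no
# trailing newline. A literal "]]>" inside the HTML would end the section
# early, so it is split across two adjacent CDATA sections.
def cdata(html)
  "<![CDATA[#{html.gsub(']]>', ']]]]><![CDATA[>')}]]>"
end

# Emit the whole element on a single line, which is friendlier to a
# regex-based reader than pretty-printed output.
def content_encoded(html)
  "<content:encoded>#{cdata(html)}</content:encoded>"
end
```

Splitting `]]>` is the usual trick for content that happens to contain the CDATA terminator; an XML parser reading the result joins the two sections back together.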

Anyway, I thought I’d cracked the problem once I got the export working. Then I immediately saw that all the XML in several posts was coming out double encoded. Doh. One quick wordpress plugin fixed the issue quite neatly (wp-correct-syntax-xml.php).
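For anyone unsure what "double encoded" looks like: `<p>` should be stored as `&lt;p&gt;`, but it was turning up as `&amp;lt;p&amp;gt;`. The sketch below shows one way to peel off exactly one level of encoding. It is an illustration in Ruby, not the contents of the PHP plugin, and `decode_once` is a hypothetical name.

```ruby
# Undo exactly one level of entity encoding. Doing it in a single gsub
# pass matters: decoding "&amp;" first and the others afterwards would
# strip two levels from double-encoded text.
def decode_once(text)
  entities = { 'amp' => '&', 'lt' => '<', 'gt' => '>', 'quot' => '"', '#39' => "'" }
  text.gsub(/&(amp|lt|gt|quot|#39);/) { entities[$1] }
end
```

Applied to `&amp;lt;p&amp;gt;` this gives `&lt;p&gt;`, which is the correctly encoded form.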

PHP is very amenable to cut’n’paste.

Then I remembered to look through my access logs. And I realised quite how many RSS and Atom feeds I publish. Much time later, I now have 15 hairy RedirectMatch permanent directives beckoning feed readers to the correct place.

Then I had a final thought. The IDs of the articles are going to be different in the feeds. In order to correct that, I had to go back to the export and insert the correct guid elements. This led me to a slight wild goose chase in typo. If you look in app/views/xml/_atom10_item_article.atom.builder, you see: "urn:uuid:#{item.guid}"

And the IDs that I was seeing coming out in the feed were Tag URIs. It took me a while to track down that the default feed for typo takes an entirely different route going through articles_controller.rb and into app/views/articles/_atom_feed.atom.builder, which in turn makes use of Rails’ AtomFeedBuilder::entry. That’s what spits out the Tag URI:

"tag:#{},#{@feed_options[:schema_date]}:#{record.class}/#{}")
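To make the difference between the two ID styles concrete, here is a small sketch that builds both. The general Tag URI shape (`tag:` authority, comma, date, colon, specific part) comes from RFC 4151. The assumption that the two empty interpolations in the snippet above are the host name and the record ID is mine, and the function names are hypothetical.

```ruby
# Build a Tag URI of the form "tag:<authority>,<date>:<specific>".
# Using the host as authority and "Class/id" as the specific part is an
# assumption made for this illustration.
def tag_uri(host, schema_date, record_class, record_id)
  "tag:#{host},#{schema_date}:#{record_class}/#{record_id}"
end

# The other ID style, as produced by the _atom10_item_article template.
def uuid_urn(guid)
  "urn:uuid:#{guid}"
end
```

Feed readers treat the ID as an opaque string, so an article exported with one style and re-published with the other looks like a brand new entry.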

Then I realised that the RSS feed was spitting out UUIDs instead of Tag URIs. Sigh. But I looked at the stats and I’m serving more than twice as much Atom as RSS, so I decided to stick with Tag URIs.

Finally, I pulled the big switch, and hit refresh in NetNewsWire. And all of my posts are marked as “new”. Bloody hell, I have no idea what’s caused that. It’s too late and I give up for now. Apart from that, things mostly seem to be working.

In case anybody else is interested, here’s my patch against typo to produce WXR feeds. It’s very simplistic, has no tests and is strictly “works well enough for me”.

Update: I don’t know how, but I completely managed to miss the typo-wordpress converter.  This would have been much simpler.  Although as I was migrating off of textile at the time, it made sense for me to export HTML.  Ah well, it was an interesting learning experience.

6 replies on “All Change!”

It’s really more of a “worked well enough for me” I think. 🙂

Btw, can you stick in that Ajax preview thingie for WordPress, rather than the (IMHO annoying) live preview? (It also has a bug in that it tries to reload my (missing, if that matters) avatar image every time I hit a key.)

Right, I’ve figured out why some of the articles showed up as new — the wp-syntax plugin means I’m spitting out different content to before.

Plus, all the comments are showing up as new because I didn’t preserve comment IDs (naughty me).

Glad I know why…

I just meant that the guarantee is even smaller, since you aren’t even going to keep using the code yourself. It didn’t have to do with grammar. :)

(If there’s anything I want to actually nag about, it’s the fact that your server is unreachable as much of the time as it’s up. What’s up with that?)

Good point. But my grammar was still quite wrong. 🙂

The reason the blog is down so often is that it’s hosted in our “office,” which happens to be the guest bedroom. So, when guests come the server gets switched off at night so that periodic daily doesn’t wake them with disk rattling at 0301. It’s my long term plan to move the server elsewhere, but it’s all a matter of time…
