So I got fed up with typo, for one minor reason and another (mostly my fault, I confess). Frankly, I’m just more used to wordpress, as I use it at work (blogs make great notepads). So, I decided to switch. It’s been an interesting exercise…
The bulk of the work is building a converter. Thankfully, typo is built on rails, so it was relatively easy to hack on. I looked through wordpress and reckoned that the easiest way to get data in was through the wordpress backup mechanism, WXR (WordPress eXtended RSS). There’s a dearth of information about this format, but after taking a quick look, I figured it wouldn’t be too hard to get typo to spit out something similar. It was basically RSS with a few extra elements.
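For the curious, here's a rough sketch of the shape of a single WXR item, in plain Ruby rather than builder.rb. It isn't the code from my patch: `wxr_item` and the sample post are made up, and the element names are ones commonly seen in WordPress export files, so check them against a real export before relying on them.

```ruby
require 'cgi'

# Hypothetical helper: renders one post as a WXR-style <item>.
# The element names (content:encoded, wp:post_id, wp:post_date, wp:status)
# are assumptions; verify them against a real WordPress export.
# Assumes post[:html] contains no literal "]]>".
def wxr_item(post)
  [
    '<item>',
    "<title>#{CGI.escapeHTML(post[:title])}</title>",
    "<link>#{CGI.escapeHTML(post[:link])}</link>",
    "<guid isPermaLink=\"false\">#{CGI.escapeHTML(post[:guid])}</guid>",
    "<content:encoded><![CDATA[#{post[:html]}]]></content:encoded>",
    "<wp:post_id>#{post[:id]}</wp:post_id>",
    "<wp:post_date>#{post[:date]}</wp:post_date>",
    '<wp:status>publish</wp:status>',
    '</item>'
  ].join("\n")
end

# Made-up sample post, purely for illustration.
sample = {
  title: 'Hello & welcome',
  link: 'http://example.com/articles/hello',
  guid: 'tag:example.com,2005:Article/1',
  html: '<p>First post.</p>',
  id: 1,
  date: '2006-08-10 12:00:00'
}

puts wxr_item(sample)
```

Everything else in the item is ordinary RSS; the `wp:` and `content:` elements are the "few extra" bits.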
The initial work of producing the WXR feed from typo went quite quickly. However, when I tried to import it into a test wordpress installation, the problems started. It turns out that wordpress doesn’t use an XML parser for reading in WXR. It uses regexes.
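To show why that matters, here's a small sketch of a line-oriented regex reader. To be clear, `TITLE_RE` and `titles` are invented for illustration and are not WordPress's actual code; the point is only that perfectly valid XML can slip past a pattern that expects one particular layout.

```ruby
# Hypothetical line-oriented extractor, NOT WordPress's real importer.
# It only recognises a <title> element that fills a whole line.
TITLE_RE = %r{^<title>(.*?)</title>$}

def titles(wxr)
  wxr.each_line.map { |line| line.chomp[TITLE_RE, 1] }.compact
end

flat     = "<item>\n<title>Hello</title>\n</item>\n"
indented = "<item>\n  <title>Hello</title>\n</item>\n"

p titles(flat)      # the unindented form is picked up
p titles(indented)  # the indented form is silently missed
```

An XML parser would treat both inputs identically, which is exactly the guarantee a regex reader gives up.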
As any fule no, parsing XML with regexes is bound to lead to pain. And indeed it does. Getting builder.rb to emit non-indented tags (as wordpress desires) when it’s in the middle of emitting indented stuff is yucky.
    xm.target! << '<![' << 'CDATA['
    xm.target! << html(item)
    xm.target! << ']]'
    xm.target! << '>'
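The same idea can be wrapped up in a tiny helper. This is only a sketch, and `cdata` is a name I've made up, but it also deals with the one thing that can break a CDATA section: a literal `]]>` inside the content.

```ruby
# Wraps text in a CDATA section with no surrounding whitespace.
# A literal "]]>" in the text would end the section early, so it is
# split across two adjacent CDATA sections instead.
def cdata(text)
  '<![CDATA[' + text.gsub(']]>', ']]]]><![CDATA[>') + ']]>'
end

puts cdata('<p>Hello</p>')
puts cdata('tricky ]]> content')
```

With that in place, the four appends collapse into something like `xm.target! << cdata(html(item))`, assuming `html(item)` returns a string.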
Anyway, I thought I’d cracked the problem once I got the export working. But I immediately saw that all the XML in several posts was coming out double encoded. Doh. One quick wordpress plugin fixed the issue quite neatly (wp-correct-syntax-xml.php).
PHP is very amenable to cut’n’paste.
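For what it's worth, the same kind of fix looks roughly like this in Ruby. This is not the plugin above, just an illustration of the idea; the `DOUBLE_ENCODED` pattern and `fix_double_encoding` are my own inventions, and the detection is a heuristic.

```ruby
require 'cgi'

# Heuristic: text looks double encoded if it contains an escaped ampersand
# immediately followed by the rest of another entity, e.g. "&amp;lt;".
DOUBLE_ENCODED = /&amp;(?:lt|gt|amp|quot|#\d+);/

# Strips one layer of HTML escaping, but only when the text looks
# double encoded. Note that this unescapes the whole string by one level.
def fix_double_encoding(text)
  text.match?(DOUBLE_ENCODED) ? CGI.unescapeHTML(text) : text
end

puts fix_double_encoding('&amp;lt;b&amp;gt;bold&amp;lt;/b&amp;gt;')
puts fix_double_encoding('&lt;b&gt;already fine&lt;/b&gt;')
```

Because it unescapes everything by one level, it's only safe on posts that were double encoded throughout.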
Then I remembered to look through my access logs. And I realised quite how many RSS and Atom feeds I publish. Much time later, I now have 15 hairy RedirectMatch permanent directives beckoning feed readers to the correct place.
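If I were doing it again, I'd probably generate those lines rather than write them by hand. A sketch, with the caveat that the paths and hostname in `FEED_MAP` are made up and not my real ones:

```ruby
# Hypothetical mapping of old feed paths to new feed URLs.
# Both sides are invented for illustration.
FEED_MAP = {
  '/xml/rss20/feed.xml'  => 'http://example.com/feed/',
  '/xml/atom10/feed.xml' => 'http://example.com/feed/atom/'
}

# Emits one Apache "RedirectMatch permanent <regex> <url>" line per entry.
# Regexp.escape keeps the dots in the old path from matching any character.
def redirect_rules(map)
  map.map do |old_path, new_url|
    "RedirectMatch permanent ^#{Regexp.escape(old_path)}$ #{new_url}"
  end
end

puts redirect_rules(FEED_MAP)
```

The output can be pasted straight into the Apache config, one directive per old feed.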
Then I had a final thought. The IDs of the articles are going to be different in the feeds. In order to correct that, I had to go back to the export and insert the correct guid elements. This led me on a slight wild goose chase in typo. If you look in app/views/xml/_atom10_item_article.atom.builder, you see:
And the IDs that I was seeing coming out in the feed were Tag URIs. It took me a while to track down that the default feed for typo takes an entirely different route going through articles_controller.rb and into app/views/articles/_atom_feed.atom.builder, which in turn makes use of Rails’ AtomFeedBuilder::entry. That’s what spits out the Tag URI.
Then I realised that the RSS feed was spitting out UUIDs instead of Tag URIs. Sigh. But I looked at the stats and I’m serving more than twice as much Atom as RSS, so I decided to stick with Tag URIs.
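For reference, a Tag URI has the general form `tag:<authority>,<date>:<specific>`. Here's a sketch of building one for the export's guid elements. The `Article/<id>` layout and the `2005` date are what the Rails helper appeared to be emitting for me, so treat both as assumptions; `tag_uri` and `guid_element` are names I've made up.

```ruby
# Builds a Tag URI of the form tag:<authority>,<date>:<specific>.
# The "<ClassName>/<id>" layout and the 2005 default are assumptions
# about what the feed was emitting; check against your own feed.
def tag_uri(host, class_name, id, schema_date = '2005')
  "tag:#{host},#{schema_date}:#{class_name}/#{id}"
end

# Wraps an ID in a guid element that is explicitly not a permalink.
def guid_element(uri)
  "<guid isPermaLink=\"false\">#{uri}</guid>"
end

puts guid_element(tag_uri('example.com', 'Article', 42))
```

The host has to match whatever the old feed used, otherwise every ID changes anyway.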
Finally, I pulled the big switch, and hit refresh in NetNewsWire. And all of my posts are marked as “new”. Bloody hell, I have no idea what’s caused that. It’s too late and I give up for now. Apart from that, things mostly seem to be working.
In case anybody else is interested, here’s my patch against typo to produce WXR feeds. It’s very simplistic, has no tests and is strictly “works well enough for me”.
Update: I don’t know how, but I managed to completely miss the typo-wordpress converter. This would have been much simpler. Although, as I was migrating off textile at the same time, it made sense for me to export HTML. Ah well, it was an interesting learning experience.
Rails Security Hole
I now have the fixed version of typo (soon to be 4.0.2), around an hour after it was committed.
As to the whole “full disclosure” thing by the rails team? They handled it pretty badly. As somebody else commented, it didn’t work for OpenBSD a while back and if anybody could do that, OpenBSD could.