Subtext to WordPress: Converting blog engines

I think my mom is my only constant reader, but if you read jasonkemp.ca back when I wrote more often than I have in the last year and a half, you’ll notice a few changes in appearance. They only scratch the surface of the changes I’ve made.

I moved hosts; I moved OSes; and I moved blog engines. I switched from Subtext on IIS to WordPress on LAMP, and I’m really happy with the move, which seems rare, if Google is any indication: everyone else is moving the other way. The only person I found going my direction was Aaron Lerch, and he wrote an import plugin, which I didn’t use because it required all these extra downloads. [Um, re-reading Aaron’s post, I see now that his BlogML Import plugin is probably the way to go if you’re in this situation. There’s only one extra download.]

To make matters worse, my copy of Subtext must have been a development build, because its BlogML export didn’t work at all. So I had to use all my hacking skills to read directly from the database with the current Subtext source (thank goodness it was open source!). I’ll spare you the details of that export; it wasn’t perfect or pretty, but I got my posts, comments, and categories, which was close enough for what I wanted. That left me with a BlogML XML file that I needed to convert to a WordPress export file, which is just some gussied-up RSS. Since I’d stopped blogging, .NET 3.5 had been released, so I thought this was the perfect time to play with LINQ and the new XML APIs.
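
The converter described above was written against .NET 3.5 with LINQ and isn’t shown here. As a rough illustration of the same idea, here is a minimal sketch in Python using the standard library’s ElementTree, not the code actually used. It assumes the BlogML 2.0 namespace URI and the WordPress WXR 1.0 namespace (later WordPress versions use 1.1 or 1.2), copies only titles, bodies, creation dates, and category names, and skips comments entirely; a file WordPress will actually import needs more channel and item fields than this.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs (BlogML 2.0, RSS content module, WXR 1.0).
# Check them against your own export files before relying on this sketch.
BLOGML_NS = "http://www.blogml.com/2006/09/BlogML"
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
WP_NS = "http://wordpress.org/export/1.0/"

# Use readable prefixes (content:, wp:) when serializing.
ET.register_namespace("content", CONTENT_NS)
ET.register_namespace("wp", WP_NS)


def blogml_to_wxr(blogml_xml: str) -> str:
    """Convert the posts in a BlogML document into a minimal WXR-style RSS string."""
    ns = {"b": BLOGML_NS}
    blog = ET.fromstring(blogml_xml)

    # Blog-level categories: map each category id to its title.
    titles = {
        cat.get("id"): cat.findtext("b:title", default="", namespaces=ns)
        for cat in blog.findall("b:categories/b:category", ns)
    }

    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")

    for post in blog.findall("b:posts/b:post", ns):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post.findtext("b:title", default="", namespaces=ns)
        # The post body goes into content:encoded; ElementTree escapes the HTML on output.
        ET.SubElement(item, f"{{{CONTENT_NS}}}encoded").text = post.findtext(
            "b:content", default="", namespaces=ns
        )
        # BlogML dates look like 2006-05-12T10:30:00; WordPress wants "YYYY-MM-DD HH:MM:SS".
        # Slicing to 19 characters drops any fractional seconds or timezone suffix.
        created = post.get("date-created", "")
        ET.SubElement(item, f"{{{WP_NS}}}post_date").text = created.replace("T", " ")[:19]
        ET.SubElement(item, f"{{{WP_NS}}}status").text = "publish"
        # Post-level categories reference the blog-level ones by id.
        for ref in post.findall("b:categories/b:category", ns):
            name = titles.get(ref.get("ref"))
            if name:
                ET.SubElement(item, "category").text = name

    return ET.tostring(rss, encoding="unicode")
```

Escaped text inside content:encoded is equivalent XML to the CDATA sections WordPress itself writes, so letting ElementTree handle the escaping keeps the post markup intact without building CDATA by hand.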

LINQ is seriously cool. Working with XML is seriously not, no matter what API you have. Those were my two conclusions. The new XML APIs are pretty sweet, though: they come closer than anything else I’ve used to letting me write code that handles XML the way I think about XML. But that’s not what this post is about.

The conversion got me about 90% of the way there, but two things still needed fixing: mapping the old URLs to the new ones (your URLs are probably the most important thing about people reaching your website) and handling image paths. Both are handled with mod_rewrite and the .htaccess file. Until IIS 7, I don’t think Windows had anything as cool as that, and even with IIS 7 it may not; certainly nothing as simple as a text file.

Using this helpful article, I generated all the rewrite rules from the BlogML export file, since it had all the old post URLs. I only had roughly one hundred articles, so I just hard-coded every one of them. They all take this form:

RewriteRule ^rss.aspx http://www.jasonkemp.ca/blog/feed [r=301,nc]

I didn’t want to mess with mod_rewrite too much, because regular expressions require a quiet room and lots of testing, so I kept it pretty simple. The rule above says: if my web server receives a request for rss.aspx, permanently redirect it to http://www.jasonkemp.ca/blog/feed. The r=301 is what makes it a permanent redirect, and nc (no case) means case doesn’t matter. Spaces matter here! Notice there are no spaces inside the square brackets; the rule won’t work otherwise.
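
Since the rules were hard-coded from the list of old post URLs, generating them is mostly string formatting. Here is a minimal Python sketch of that step, not the code actually used. It assumes you have already pulled a mapping of old relative paths to new URLs out of the export; make_rewrite_rules and escape_pattern are hypothetical helpers, and the example paths are made up. Unlike the rss.aspx rule above, the sketch escapes regex metacharacters and anchors both ends, so rss.aspx becomes ^rss\.aspx$: an unescaped dot matches any character, and without $ the pattern also matches longer paths that merely start with rss.aspx. In a per-directory .htaccess, Apache strips the leading slash before matching, so the sketch drops it too. mod_rewrite flags are case-insensitive; the sketch writes them in the more common uppercase.

```python
# Characters with special meaning in a mod_rewrite (PCRE) pattern.
_REGEX_SPECIALS = set(r".^$*+?()[]{}|\\")


def escape_pattern(path: str) -> str:
    """Backslash-escape regex metacharacters so the path matches literally."""
    return "".join("\\" + ch if ch in _REGEX_SPECIALS else ch for ch in path)


def make_rewrite_rules(url_map: dict[str, str]) -> list[str]:
    """Turn {old relative path: new absolute URL} into 301 RewriteRule lines."""
    rules = []
    for old_path, new_url in url_map.items():
        # .htaccess matching sees the path without its leading slash.
        pattern = "^" + escape_pattern(old_path.lstrip("/")) + "$"
        # No spaces inside the flag brackets, or Apache rejects the line.
        rules.append(f"RewriteRule {pattern} {new_url} [R=301,NC]")
    return rules


if __name__ == "__main__":
    # Hypothetical old Subtext path mapped to a hypothetical new WordPress URL.
    example = {
        "rss.aspx": "http://www.jasonkemp.ca/blog/feed",
        "/archive/2006/05/12/hello-world.aspx": "http://www.jasonkemp.ca/blog/2006/05/12/hello-world/",
    }
    for rule in make_rewrite_rules(example):
        print(rule)
```

Pasting the printed lines into .htaccess (after a RewriteEngine On line) gives one redirect per old post.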

The last thing I had to convert was images. My images were in two spots, the root folder and an ‘images’ folder, so my posts have both kinds of paths in them. Rather than go through all my posts and change the paths, I added another rewrite rule:

RewriteRule ^(images/)?(.+)\.(gif|jpg)$ blog/img/$2.$3 [nc]

That’s about as fancy as I get in mod_rewrite. This rule says: for any gif or jpg file, either in the root or under images/, rewrite the request to blog/img/ with the same name and extension. There’s no r flag this time, so it’s an internal rewrite rather than a redirect; the browser never sees the new path. So if a post has an image with the path images/1.jpg, mod_rewrite serves blog/img/1.jpg instead.
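
To check what the image rule does before trusting it on a live server, you can replay the same pattern with Python’s re module, which behaves like Apache’s regex engine for a pattern this simple. This is a minimal test harness under that assumption, not part of the site setup: rewrite_image_path is a hypothetical helper, re.IGNORECASE stands in for the nc flag, and the paths are made-up examples. One thing it makes visible: (.+) also matches slashes, so a jpg or gif in any other folder gets sent to blog/img/ with its folder path intact.

```python
import re
from typing import Optional

# Same pattern as the .htaccess rule; IGNORECASE plays the role of [nc].
IMAGE_RULE = re.compile(r"^(images/)?(.+)\.(gif|jpg)$", re.IGNORECASE)


def rewrite_image_path(path: str) -> Optional[str]:
    """Return the rewritten path, or None if the rule would not fire."""
    m = IMAGE_RULE.match(path)
    if m is None:
        return None
    # mod_rewrite's $2 and $3 correspond to groups 2 and 3 here.
    return f"blog/img/{m.group(2)}.{m.group(3)}"
```

For example, images/1.jpg and 1.gif come out as blog/img/1.jpg and blog/img/1.gif, while logo.png is left alone because the rule only covers gif and jpg.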

The conversion took a few weeks of on-and-off development. Changing blog engines is anything but trivial right now. If anyone is curious about the code, just ask.
