Exporting Blog Posts from Subtext in BlogML

I’ve recently been asked for more detail on how I extracted my blog posts from Subtext. I hacked my solution together over a few nights. Once I got it working, I completely flushed all memory of what I did. But I was asked a whole two times, dear readers! With that overwhelming curiosity, I have no choice but to respond. For background, please read the previous post on what I did generally to convert from Subtext to WordPress.

I’m told that the BlogML export always fails in Subtext, which is the same problem I had. If I recall, it would look like it succeeded, but the XML file you got back would be empty. I was using a version that ran on .NET 1.1 (Subtext 1.5 or so, I think, but I’m not sure), but the technique I outline below should work for Subtext 1.9.5. I’ll explain (with some code!) how I got around that, and hopefully it will help some people out.

A few caveats. I had a single-user blog running on Subtext; a multi-user setup may need far more attention. You will have to get your hands dirty in the code, and by the end of it you’ll be more familiar with Subtext than you ever wanted to be. I’m writing this from memory, so take whatever I say as a guide rather than gospel. I no longer have a Subtext blog set up, so I can’t rerun this to see why I made the changes I made.

If you’re really determined, you’ll find a way to get your data. I am willing to help anyone extract their posts from Subtext, though. Just contact me through this post, or through the contact form. Maybe we can make this a little more robust to help others.

When I first tried to export using the web interface, as I mentioned, I’d get an XML file, but it would be empty. Bummer! My first thought was that I had to figure out the right query for the database to extract all the posts. Then I would have to format all the posts into some kind of XML format to import into my new blog engine, WordPress. That’s a daunting task when you don’t deal with databases much. Luckily, I played some video games instead of bashing on and realized there was an easier way: using the BlogML exporter in Subtext!

I don’t remember exactly how I came to my solution — maybe I bombed around the code and concluded that the reason it was failing had little to do with the BlogML code? — but it worked, and this was the type of quick one-off problem to which one of my favourite sayings applied: If it’s stupid and it works, it isn’t stupid.

So here’s what I did. I created a ConsoleApplication1 project and referenced just enough of the Subtext assemblies (Subtext.BlogML, Subtext.Framework and Subtext.Extensibility) to instantiate their BlogMLWriter and write the BlogML out to disk using an XmlTextWriter. Then I hacked the Subtext code to make it compile and run without exceptions. Like I said last time, this solution is not pretty or perfect. To make matters worse, the Subtext code I used in my hacked solution was not the same code that my blog was running, so the database versions didn’t match up. Column names had been changed or added; stored procedures had been renamed or added to. This made me do some seriously weird things, like tweak stored procedures and DataReaders and, I think, install stored procedures that weren’t there. I did the types of things that shouldn’t really be recorded for posterity on a blog, you know?

Here’s the only code in my ConsoleApplication1 project:

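using System;
using System.Text;
using System.Xml;
// Subtext namespaces; these may differ between Subtext versions.
using Subtext.BlogML;
using Subtext.BlogML.Interfaces;

class Program
{
    static void Main(string[] args)
    {
        // Subtext's own BlogML provider, pointed straight at the blog database.
        IBlogMLProvider provider = new Subtext.ImportExport.SubtextBlogMLProvider();

        provider.ConnectionString = "yourConnectionString";
        if (provider.GetBlogMlContext() == null)
        {
            Console.WriteLine("ERror'd");
            return;
        }
        BlogMLWriter writer = BlogMLWriter.Create(provider);

        // Write the whole blog out as indented BlogML; leaving the using
        // block closes the file.
        using (XmlTextWriter xmlWriter = new XmlTextWriter("blog.xml", Encoding.UTF8))
        {
            xmlWriter.Formatting = Formatting.Indented;
            writer.Write(xmlWriter);
            xmlWriter.Flush();
        }
        Console.WriteLine("It worked?!");
    }
}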

Looks pretty simple, eh? Too bad all the changes are buried deep in completely unrelated classes. I copied the Subtext web.config and renamed it app.config. You have to set your connection string in two places in the config file as well as in the Program.cs file shown above (search for ‘yourConnectionString’). Also, since we’re running code written for ASP.NET in a console app, some things simply will not work, like HttpContext.Current. The exceptions will point them out quickly the first time you run it.

Below is an example of one of the changes I had to make:

private static SqlParameter BlogIdParam
{
    get
    {
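        int blogId;
        // The original check is commented out because it ends up touching
        // HttpContext.Current, which is null in a console app.
        //if (InstallationManager.IsInHostAdminDirectory)
        //    blogId = NullValue.NullInt32;
        //else
        blogId = 0;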

        return DataHelper.MakeInParam("@BlogId", SqlDbType.Int, 4, DataHelper.CheckNull(blogId));
    }
}

This is in Subtext.Framework\Data\SqlDataProvider.cs. See the commented code? InstallationManager, through a number of indirections (like, I don’t know, 57; Subtext is way over-architected), uses HttpContext.Current, so I commented that out. I may have even hardcoded blogId to 0; I don’t remember.
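If commenting code out feels too blunt, a gentler option might be to guard it. HttpContext.Current is null whenever no ASP.NET request is being processed, which is always the case in a console app, so a tiny helper can test for that first. This is only a sketch: HostCheck and HasHttpContext are made-up names, not part of Subtext, and the project needs a reference to System.Web.

```csharp
using System.Web;

// Hypothetical helper, not part of Subtext.
static class HostCheck
{
    // HttpContext.Current is null when the code is not running inside an
    // ASP.NET request, e.g. when called from a console application.
    public static bool HasHttpContext
    {
        get { return HttpContext.Current != null; }
    }
}
```

In BlogIdParam, the commented-out test could then read HostCheck.HasHttpContext && InstallationManager.IsInHostAdminDirectory, so the Subtext property is only evaluated when there is a request to inspect. This is untested against Subtext, so treat it as an idea rather than a fix.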

You can download the ConsoleApplication1 project below and try it out if you wish.
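Since the original failure was an export that looked fine but came back empty, it is probably worth checking blog.xml before trusting that cheerful "It worked?!". The sketch below is not part of the download; it is a separate little program (CheckExport is a made-up name) that loads the file and counts the post elements. It assumes the posts are written as unprefixed post elements, which is how BlogML files usually look; if your file uses a namespace prefix, the count will come back as zero even though the posts are there.

```csharp
using System;
using System.Xml;

// Hypothetical sanity check for the exported file.
class CheckExport
{
    static void Main(string[] args)
    {
        // Default to the file name used by the export program.
        string path = args.Length > 0 ? args[0] : "blog.xml";

        XmlDocument doc = new XmlDocument();
        // Load throws an XmlException if the file is empty or malformed.
        doc.Load(path);

        // Count every element whose qualified name is "post".
        int postCount = doc.GetElementsByTagName("post").Count;
        Console.WriteLine("Posts found: " + postCount);
    }
}
```

If the number is zero, or the load blows up, the export did not really work, no matter what the console said.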

Now, if you want to convert BlogML to the WordPress export format…