Consolidating Blogs

Consolidating Blogs
“Just as the constant increase of entropy is the basic law of the universe, so it is the basic law of life to be ever more highly structured and to struggle against entropy.”– Vaclav Havel

I love Vaclav Havel, but I don’t agree. The basic law of my life has been to be ever less structured, and embrace entropy. So when it comes to blogging, for example, I’ve randomly switched around between various platforms over the years:

my on-again/off-again relationship with WordPress

It’s even worse than that, actually: I dallied with BlogEngine.net for a while in 2010/2011 (hosted on Mahugh.com), but luckily I never backed up that data (stored in XML files), so those 3 or 4 posts are simply lost forever. Whew, one less set of blog posts to consolidate.

But the rest of these blog posts still exist, and until last week they were scattered around in several places:

  • The 631 post from my original WordPress blog were in a .SQL backup file that I exported in late 2008 before shutting down my Textdrive hosting account.
  • The 14 Blogger posts were still on my Google Blogger blog.
  • The 51 WordPress posts on Mahugh.com were still right here, on Mahugh.com.

I’ve wanted to get them all onto Mahugh.com and move forward with one consolidated all-inclusive blog, but how? Can you consolidate a WordPress blog with a backup of another WordPress blog? In short, the answer is “not without a lot of manual effort, and risks, and pain, and other bullshit.” If you do searches on the topic, you can find various warnings about those things.

Exacerbating the situation, my old WordPress blog was version 1.5.2, which I installed in October 2005 and never upgraded because I had foolishly hacked up some of the PHP files to customize various things. I know better now, but I didn’t then. So I had a .SQL backup of a version 1.5.2 WordPress blog, and a live version 3.5.1 WordPress blog, and as I learned by looking closely at the schemas, the WordPress team changed many details between those versions.

What’s a guy to do? The answer seemed obvious: roll up your sleeves and write some code, Doug!

I figured that if I wrote a program to migrate my  600+ old WordPress posts into my current blog, I could then find a way to deal with the small number of Blogger posts as a final step. So I imported my .SQL backup into a local MySQL database (using MySQL Workbench), and then started thinking about how to write a PHP program to migrate the data as shown by the red arrow here:

migrating old WordPress posts

But where should that code run? I tried running it locally on my laptop, but when I tried to open the MySQL database on my web site, I got this discouraging message:

“No connection could be made because the target machine actively refused it.”

Well, I guess that makes sense. Wouldn’t want just any old machine being able to access my database. But after spending a bunch of time on things like whitelists and rehashing of passwords (because of different MySQL versions), I still didn’t have a reliable way to make this work. Time to stop and think again.

Then I had a realization: there’s no particular reason my old data needs to be in a local database. I could just put it into a database on Mahugh.com, then run my data migration code on Mahugh.com, and it should have reliable access to both the old-data database and the live-data database powering my current WordPress blog. Something like this:

the revised plan: a temporary MySQL database in the cloud

The folks at MediaTemple (my ISP) have a helpful knowledgebase article on how to import and export MySQL databases, and after following their instructions (Putty and SSH are your friends), I had my old data on Mahugh.com, and verified that a program on Mahugh.com could reliably access both databases. Getting close!

Next, I tried writing code to pull the blog posts and comments out of the old database and store them in the live database. Back in the 80s and 90s, I spent many years of my life helping clients migrate data between undocumented relational databases running on various platforms, so this seemed a simple problem to solve. But after a bit of experimenting (with my live data well backed up, of course!), I realized that there’s a reason WordPress pros say you should never write directly to the database, but rather use the WordPress APIs. I managed to get my blog in various brain-damaged states, and eventually restored things and moved on to studying the documentation for the WordPress APIs.

I decided there was no reason to complicate things with multiple versions of the WordPress API — I didn’t need any help getting at my old posts and comments, I just needed help storing them correctly in the live database. So I wrote a loop to scan through my old posts and add each one to the live blog using the wp_insert_post() function, with a nested loop to store the comments for each post using the wp_insert_comment() function.

And here it is, the code that migrated 631 blog posts and associated comments from my old data into my current live blog:

// mergedata.php - merge old WordPress blog data with live data
// old data is from Doug's World 2005-2008
// live data is from Mahugh.com 2012-2013
 
echo "
Initializing ...";
 
require '../../wp-load.php';       // include WordPress API functionality
require 'db_connect-livedata.php'; // creates $mysqli_livedata
require 'db_connect-olddata.php';  // creates $mysqli_olddata
 
echo "Begin main loop (iterating through old data) ...";
 
$query = "SELECT ID,post_date,post_date_gmt,post_title,post_content FROM wp_posts";
$result = $mysqli_olddata->query( $query );
$num_results = $result->num_rows;
 
$posts_processed = 0;
 
if ( $num_results > 0) {
    while ( $row = $result->fetch_assoc() ) {
        extract($row);
        echo "
Migrating post: old ID={$ID} date={$post_date} title={$post_title}";
 
        // save the post ...
        $post_data = array(
            'post_title'    => $post_title,
            'post_content'  => $post_content,
            'post_status'   => 'publish',
            'post_author'   => 1,
            'post_date'     => $post_date,
            'post_date_gmt' => $post_date_gmt
            );
        // $postID = the auto-incremented ID assigned to this post 
        $postID = wp_insert_post($post_data);
        if ( is_wp_error($postID) )
            exit( $postID->get_error_message() );
 
        // save the comments ...
        $commentquery = "SELECT comment_author,comment_author_email,
            comment_author_url,comment_content,user_id,comment_author_IP,
            comment_agent,comment_date,comment_date_gmt,user_id FROM wp_comments
            WHERE comment_post_ID=$ID AND comment_approved='1'"; // note use of $ID, which is the old/original ID of this post (in the old database)
        $commentresult = $mysqli_olddata->query($commentquery);
        $num_comments = $commentresult->num_rows;
        if ( $num_comments > 0) {
            while ( $commentrow = $commentresult->fetch_assoc() ) {
                extract($commentrow);
                $comment_data = array(
                    'comment_post_ID' => $postID, // use the newly assigned ID, to associate comments with the new post
                    'comment_author' => $comment_author,
                    'comment_author_email' => $comment_author_email,
                    'comment_author_url' => $comment_author_url,
                    'comment_content' => $comment_content,
                    'comment_type' => '',
                    'comment_parent' => 0,
                    'user_id' => $user_id,
                    'comment_author_IP' => $comment_author_IP,
                    'comment_agent' => $comment_agent,
                    'comment_date' => $comment_date,
                    'comment_date_gmt' => $comment_date_gmt,
                    'comment_approved' => 1,
                );
                $commentID = wp_insert_comment($comment_data);
                if ( is_wp_error($commentID) )
                    exit( $commentID->get_error_message() );
            }
        }
 
        $posts_processed += 1;
        // if ($posts_processed >= 1) break; // for testing
    }
} else {
    echo "No records found.";
}
 
echo "
Total posts migrated: {$posts_processed}";
 
$result->free();
 
$mysqli_livedata->close();
$mysqli_olddata->close();

There’s one little detail worth noting, for anyone else who wants to try this sort of thing. The post ID is an auto-incrementing field, so when each post is added to the live blog it gets a new ID assigned — this is the value returned by wp_insert_post(). So when I migrated the comments, I used that ID for each comment, to make them appear on the correct post.  I didn’t do this the first time, which meant my old comments (over 2000 of them) were randomly distributed among the posts on my blog. (Now would be a good time to remind everyone once again of the importance of having backups!)

At this point, I had one thing left to do: migrate the 14 posts from my Google Blogger blog to the live blog. Blogger has an XML export option, but after looking at the messy and surprisingly complex XML it generates I decided that writing and verifying code to deal with that would take longer than just copying and pasting 14 posts and their comments into the WordPress admin interface and back-dating them to match the post datetimes on Blogger. After less than two hours of copy&paste, I finally had all my eggs in one basket:

one blog to rule them all

Going forward, I have plans to pretty things up with a new theme and some customizations. But for now, it’s just nice to finally have eight years of blog posts all back in one place together. And this time around, they’re going to stay that way!

UPDATE: I kept all of these posts on mahugh.com through 2018, but in 2019 I decided to start a new site here (dougmahugh.com), and only migrated the posts I felt are still useful or interesting. For anyone who needs to consolidate old WordPress blogs, you can find the code here: https://github.com/dmahugh/wordpress-migration