General

Fixing basic encoding issues migrating / upgrading a WordPress database

I recently joined Automattic as a Code Wrangler (more on that soon), and I decided now was the time to de-spam, combine and resuscitate my old WordPress blogs.

Along the way I noticed a few annoying encoding issues crept in. The Wordpress XML exporter/importer, for example, double-escaped all my quotes so that <a href=”this”> became <a href=”this”>, which as you can imagine broke… everything.

The SQL query to fix this is:

UPDATE `wp_posts`
  SET post_content_filtered = REPLACE(post_content_filtered, '\"', '"'),
      post_content = REPLACE(post_content, '\"', '"')
  WHERE post_content_filtered like '%"%' OR post_content LIKE '%"%'

Next I noticed a lot of double-quotes and ellipses had turned into gibberish like ““” and “—. Here’s the SQL queries to deal with that:

UPDATE wp_posts SET post_content = REPLACE(post_content, '’', '''), post_content_filtered = REPLACE(post_content_filtered, '’', ''');
UPDATE wp_posts SET post_content = REPLACE(post_content, '“', '“'), post_content_filtered = REPLACE(post_content_filtered, '“', '“');
UPDATE wp_posts SET post_content = REPLACE(post_content, 'â€', '”'), post_content_filtered = REPLACE(post_content_filtered, 'â€', '”');
UPDATE wp_posts SET post_content = REPLACE(post_content, '¦', '…'), post_content_filtered = REPLACE(post_content_filtered, '¦', '…');

Once I ran these few queries, I couldn’t see any more formatting errors. Others online have advised disabling the line “define(‘DB_CHARSET’, ‘utf8’)” in the wp-config.php. This seems inadvisable to me as it may allow your database to become even more inconsistent, by not enforcing a single clear and future-proof encoding on all content.

Now that everything’s been cleaned up my old content looks great and I can move on to customising the theme of my site.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.