December 28, 2011

Bulletproof your blog: a guide for surviving traffic spikes [old version]

I have written an updated version of this article.

It’s frustrating to see a story that looks really interesting on a site like Hacker News, but when you click on the link you find that the traffic surge has brought the blog down. This is probably even more frustrating for the blog’s author, who presumably has put a lot of time/effort into writing their post.

This happens more often than you might think – for example, today this great post about a tool to kill Mac .DS_Store files was down for at least several hours while it was on the front page of HN. It’s back now, but without any photos.

Fortunately, there are some pretty easy things you can do to prepare your website for a traffic spike. Here are some of the best ways you can build a bulletproof blog to avoid traffic-related downtime.

Your readers don't want to see this. (500 HTTP Status Cat, from girliemac on Flickr)

Easiest: Use a free blogging platform

Lots of websites today are powered by blogging platforms such as Tumblr and Posterous. These platforms are good because you can completely customize the template for your blog while letting the platform’s back end and servers do the heavy lifting. You can also use your own domain name with both for free (e.g. 8tracks uses Tumblr). Both are pretty mature, so they should have good uptime, and both have social features baked in (for better or worse). Thanks to Disqus, it’s easy to add comments to your Tumblr or Posterous blog.

You could also use Blogger or WordPress (both free). Blogger recently received a much-needed revamp and it integrates with your Google/Gmail account. WordPress.com has a free option, but charges for add-ons like using your own domain name. Neither offers the same level of customization as Tumblr or Posterous.

Advantages: Free, easy, posting is quick, hosting is done for you so you don’t have to worry about dealing with traffic.

Disadvantages: If the platform goes down, there’s nothing you can do. If you want to switch platforms, you have to rely on an import/export tool or wrangle the API yourself. When they change stuff around, there’s nothing you can do about it.

Recommendations: Both Tumblr and Posterous work well in my experience. I prefer Tumblr’s interface to Posterous’s, but Posterous has a reputation for being the easiest blogging site to use. Accounts are free, so try them both out and see which you like better. Blogger, WordPress.com, and other platforms generally aren’t as customizable, so I would skip them.

Harder: Self-host your blog

When people want greater customization than Tumblr or Posterous offer, they often move to self-hosting. WordPress is by far the most popular blog software you can self-host. However, this means you have to pay for a server and worry about bandwidth usage, traffic, etc. You’ll also need to know how to set up WordPress and install some plugins.

Don’t self-host unless you need more customizability than a blogging platform offers and you have experience with, or want to learn about, web server basics.

Choosing a host

There are a few different ways you can host your own site. Probably the most popular is a standard shared host, like Dreamhost. Shared hosts run lots of websites on the same physical server. They are cheap, but you won’t have the same level of control you have with other options – usually just (S)FTP access and a web-based control panel. This can be a good thing if you don’t want to deal with server stuff. Shared hosting will run you $5-$10/month.

If you want more control over your server than is possible with shared hosting, the usual choice is a VPS or a dedicated server. A VPS is a virtual server that runs on the same physical hardware as other VPSs, but acts like it’s a separate, independent computer. A dedicated server is an entire physical machine all for yourself, so it’s much more expensive than a VPS and is overkill unless you get millions of visitors per month. VPSs start at $10-$20/month and go up from there if you need more CPU, memory, or storage space.

With a VPS (or dedicated server), you will have to worry about properly setting it up, securing it, updating its software, and making sure it has enough available resources to handle hits to your website. There’s a good comparison of VPS providers that can help you choose a provider. It’s a few years old, but still generally matches my own experience.

There are other options that are somewhere in between shared hosting and VPS/dedicated servers. For example, if you are using WordPress there are dedicated WordPress hosts like BlogOnCloud9 and WP Engine (affiliate link) that can take care of some of the hard stuff for you.

Shared hosting recommendation: I would only recommend a shared host that charges for usage instead of a flat monthly fee. Most blogs use very little server resources and bandwidth, so you’re overpaying for a $10/month shared host. I use NearlyFreeSpeech.NET whenever I need a shared host. They cost about $3/month for your average WordPress site and traffic spikes are cheap too.

If you’re new to web hosting, NearlyFreeSpeech.NET can be intimidating. I’ve heard good things about a more traditional shared host called A Small Orange, but haven’t used them myself.

VPS recommendation: If you’re trying to keep it cheap, go with Rackspace Cloud, which starts at $11/month for a low-end Linux VPS. Linode is my favorite VPS provider, but servers there start at $20/month.

Withstanding traffic spikes on your server

Blogging software like WordPress generally stores your content in a database. Every time someone looks at a page on your site, your web server will ask the database for the content that appears on that page and will then dynamically generate the HTML for that page to include this content.

This process takes time (usually less than half a second), and your web server can only generate one page at a time (or a few pages at a time, depending on the configuration).

Let’s say it takes 0.5 seconds for your server to generate the home page of your blog. If two people visit your site at the same time, they are put in a queue: your site will appear in 0.5 seconds (the normal amount of time) for the first person, but the second person will have to wait 1 second, because the server has to generate the first person’s page before it can generate theirs.

This becomes a problem when you are receiving multiple visitors each second – the server does not have time to generate pages as requests come in, so the queue gets longer and longer, and your site eventually becomes inaccessible.
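To make the queueing arithmetic concrete, here is a rough sketch in Python. It assumes the simplest case described above: a server that generates exactly one page at a time, with a fixed generation time. The function name and the traffic numbers are made up for illustration; real servers usually handle a few requests in parallel.

```python
def response_times(arrivals, service_time):
    """Return how long each visitor waits on a one-page-at-a-time server.

    arrivals: request arrival times in seconds, sorted ascending.
    service_time: seconds needed to generate one page.
    """
    free_at = 0.0  # moment the server finishes the page it is working on
    waits = []
    for t in arrivals:
        start = max(t, free_at)        # queue up if the server is still busy
        free_at = start + service_time
        waits.append(free_at - t)      # time from request to finished page
    return waits


# Two visitors arriving at the same moment, as in the example above.
print(response_times([0.0, 0.0], 0.5))  # [0.5, 1.0]

# A spike: four visitors per second for ten seconds, 0.5 s per page.
spike = [i / 4 for i in range(40)]
print(response_times(spike, 0.5)[-1])   # 10.25
```

With four requests per second and half a second per page, the server falls further behind with every request: the last visitor in this ten-second spike waits more than ten seconds, and the wait keeps growing for as long as the spike lasts.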

The solution to this problem is caching. There’s often no reason to dynamically generate your blog’s home page for each visitor – you can generate it once, save (cache) it, and serve the page from your cache thereafter. Serving a cached page is significantly faster than dynamically generating it.
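The idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how WordPress or any particular caching plugin works: `PageCache` and `render_page` are hypothetical names, and `render_page` stands in for the slow database query and HTML generation.

```python
import time


class PageCache:
    """Keep generated pages in memory and reuse them until they expire."""

    def __init__(self, generate, ttl_seconds=60.0, clock=time.monotonic):
        self._generate = generate  # slow function: path -> HTML string
        self._ttl = ttl_seconds    # how long a cached page stays fresh
        self._clock = clock
        self._pages = {}           # path -> (expires_at, html)

    def get(self, path):
        now = self._clock()
        cached = self._pages.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]                    # cache hit: nothing to generate
        page = self._generate(path)             # cache miss: do the slow work once
        self._pages[path] = (now + self._ttl, page)
        return page


calls = []


def render_page(path):
    """Hypothetical stand-in for the database query plus HTML generation."""
    calls.append(path)
    return "<html><body>Content for {}</body></html>".format(path)


cache = PageCache(render_page, ttl_seconds=60.0)
for _ in range(1000):
    cache.get("/")
print(len(calls))  # 1: the home page was generated once for 1000 requests
```

The expiry time is the trade-off: a longer one means less work for the server, but a new post or comment takes longer to show up.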

Caching depends both on the server you are using (e.g. Apache or nginx) and the blogging software you are using (e.g. WordPress or Textpattern). There is good WordPress documentation on caching and there are lots of tricks you can use, such as microcaching with nginx. In any case, you’ll need to spend some time researching and implementing caching for a self-hosted blog.

Self-hosting summary

Advantages: Control, customizability.

Disadvantages: More expensive than a blogging platform in terms of initial setup time, maintenance, and recurring monthly costs.

Recommendation: Don’t do this unless you really need extra customization or like wrangling servers.

Geekier: Static website

Instead of storing the content of your blog in a database, you can store it as plaintext and generate a static website. This website uses this technique: I write posts in a text editor with a formatting syntax called Markdown and then Jekyll turns these files into the website you’re viewing now.

This has a number of advantages:

However, you have to be comfortable in the command line for a static site to be practical. You also have to rely on client-side JavaScript for any dynamic content (e.g. blog comments or displaying your Twitter feed).
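If you’re curious what a static site generator actually does, here is a toy one in Python. It is not Jekyll and doesn’t understand Markdown; it assumes each post is a plain .txt file whose first line is the title and whose paragraphs are separated by blank lines. The `build_site` function and the file layout are invented for this sketch.

```python
import html
import tempfile
from pathlib import Path

PAGE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{title}</title></head>
<body>
<h1>{title}</h1>
{body}
</body>
</html>
"""


def build_site(source_dir, output_dir):
    """Turn every .txt post in source_dir into an .html file in output_dir."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for post in sorted(source_dir.glob("*.txt")):
        lines = post.read_text(encoding="utf-8").strip().split("\n")
        title, text = lines[0], "\n".join(lines[1:])
        # Blank lines separate paragraphs; escape the text so it is safe in HTML.
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        body = "\n".join("<p>{}</p>".format(html.escape(p)) for p in paragraphs)
        out = output_dir / (post.stem + ".html")
        out.write_text(PAGE.format(title=html.escape(title), body=body),
                       encoding="utf-8")
        written.append(out.name)
    return written


# Demo in a throwaway directory: one post in, one HTML page out.
with tempfile.TemporaryDirectory() as tmp:
    posts = Path(tmp) / "posts"
    posts.mkdir()
    (posts / "hello.txt").write_text(
        "Hello\n\nFirst paragraph.\n\nSecond paragraph.\n", encoding="utf-8")
    print(build_site(posts, Path(tmp) / "site"))  # ['hello.html']
```

The output folder contains nothing but plain HTML files, so any web server (or a file host like S3) can serve it without running any code per request.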

Update: There is an easy-to-use, simple static blog platform that I should have mentioned here called Calepin.co. It pulls in your content from a folder in Dropbox and publishes it statically at a custom subdomain (see http://masnick.calepin.co/test-post.html for an example). Calepin.co uses nginx to statically serve content, so it should handle traffic spikes easily. You can’t use a custom domain or analytics yet, but Calepin.co’s developer says this is in the pipeline. As it stands, this is an easy-to-use alternative to Tumblr or Posterous for people who like Markdown but don’t like the command line.

Another update: Calepin is now defunct, but DropPages and scriptogr.am are similar options.

Using a static site generator is becoming more and more common among people in the tech industry. Amazon.com’s CTO, Werner Vogels, uses Jekyll and S3 (like this website). Marco Arment, the creator of Instapaper, uses a custom-written static site generator called Second Crack. Paul Stamatiou wrote a long post about switching his blog from WordPress to Jekyll.

If you’re just getting started with Jekyll, take a look at Octopress. It’s a framework on top of Jekyll with a bunch of useful features, such as theming, Octopress-specific plugins, syntax highlighting for code, and more.

Advantages: Hosting is virtually free, complete control over writing environment, easy to back up and search locally.

Disadvantages: Not for the non-geeky. You can only post from a computer where you have your static site generator set up. No ability to do dynamic page generation.

Recommendation: Unless you have some programming or web development experience, static site generation is probably not for you. If you are a hacker, this can work great.

Updates

  1. Add Octopress and WP Engine links. (25 Dec 2011)
  2. Add Calepin.co link. (28 Dec 2011)
  3. Add alternatives for Calepin.co. (30 Jun 2012)

Comments? Please send me a message.
