Don't fear the clouds on the horizon

A few months ago we were looking for a new/better/faster scalable hosting solution for a project. So it was finally time to start looking at cloud computing.

Traditionally, scaling a web application beyond a single machine meant architecting a nontrivial network configuration that provided both fault tolerance and the flexibility to add hardware as needed (for additional horsepower). It also meant shopping for rack space and negotiating rates for bandwidth. And, most costly of all, it meant paying someone, or some group of people, to manage the complexity.

To top it all off, the system could only scale as fast as you could add hardware to the equation. That usually meant buying too much hardware up front to 'build in' room to grow, so you wouldn't have to scale the hardware up again any time soon.

Wouldn't it be great if you could have as much bandwidth, processing power, memory and storage as you needed, when you needed it? Wouldn't that be better than having to guess how much power, memory and storage you might need? Wouldn't it be great if you could scale up your hardware in an instant rather than in the hours or days it might take to install additional hardware?

There has to be a better way. Well, there is... kind of.

Amazon.com had to learn how to scale years ago, and with billions of dollars at its disposal it was able to architect its own solution for scaling its applications. For the past year or more, Amazon has made that scalable network and hardware infrastructure available to anyone willing to pay for access to it. How much does it cost? Well, it all depends, but the point is that Amazon has abstracted bandwidth, processing power, memory and storage into the commodities that they are. They aren't sold in boxes or racks - they are sold in bulk, by the pound. You pay only for as much as you need and use, and only for as long as you are using it.

Enter EC2 (Elastic Compute Cloud). As mentioned, Amazon has abstracted the hardware. As an Amazon client (or a client of any cloud computing provider) you no longer buy physical hardware, but rather 'instances' of virtual machines. Instead of installing your software on a specific server, you install it on an 'instance'. If your application needs more power, you can either move to a more powerful instance or scale your application across multiple instances. In any case you pay by the pound... well, actually you pay by the hour - specifically, you pay by instance/hour. You decide when you need more power, and for how long you need it. Seeing a major spike in traffic and need to serve more people? Easy - add more instances to handle the traffic. Seeing that traffic is dropping down again? Simple - kill off unneeded instances. (Best of all, spinning up new instances and killing them off can be completely automated with scripts.)
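To make the instance/hour arithmetic concrete, here is a minimal sketch in Python. The hourly rate and the per-instance capacity are made-up numbers for illustration, not Amazon's pricing, and the function names are hypothetical. It only shows the idea of sizing the fleet to the load each hour and paying for exactly that.

```python
import math
from typing import Iterable

# Hypothetical numbers for illustration only; not Amazon's actual pricing.
HOURLY_RATE_USD = 0.10        # assumed price of one instance/hour
REQUESTS_PER_INSTANCE = 500   # assumed requests/second one instance can serve


def instances_needed(requests_per_second: float) -> int:
    """Return how many instances cover the load, never fewer than one."""
    return max(1, math.ceil(requests_per_second / REQUESTS_PER_INSTANCE))


def instance_hour_cost(hourly_load: Iterable[float]) -> float:
    """Sum the bill for a series of hourly load readings (requests/second)."""
    total_instance_hours = sum(instances_needed(load) for load in hourly_load)
    return total_instance_hours * HOURLY_RATE_USD


if __name__ == "__main__":
    # A quiet day with a three-hour traffic spike in the middle.
    load = [200] * 10 + [2400] * 3 + [200] * 11
    print("instances during the spike:", instances_needed(2400))
    print("cost for the day (USD):", round(instance_hour_cost(load), 2))
```

With this sample day the scaled setup uses 36 instance/hours, while keeping five instances running around the clock "just in case" would use 120.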

What's the catch?

Well, cloud computing is really amazing stuff for people who need a lot of computing power at their fingertips (think serious academic or scientific computing - using nontrivial computer science to solve problems). For these kinds of applications all you really care about is the output of the batch computing process, and you don't need to worry about keeping the computers alive (for the sake of persistent storage) after they have done their job. Nor do you really care which IP addresses your application uses. But for web hosting, you actually do care about persistent storage and static IP addresses. Sadly, EC2 did not originally provide these. Luckily, things are changing.

Up until very recently these were serious drawbacks of EC2. If you started an instance that was hosting a website, and that instance somehow stopped, you could restart the instance from an 'image', but all the data that was stored on the old instance would be gone. Not only that, when you fired up a new instance you would get a new IP address, meaning that you would have to change your DNS settings.

Enter S3 (Simple Storage Service). S3 is really a glorified file server. You put your files on S3 and it serves them up to whoever wants them (in the most geographically efficient way possible). You pay for the storage you use and for bandwidth. Storage on S3 is persistent, so if you want to persist data (e.g. database snapshots) you just have your EC2 instance send the data over there. If your EC2 instance fails and you have to start a new one, you can script a solution that makes restoring from backup part of the instance initialization.
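The backup-and-restore idea can be sketched as follows. This is not the S3 API: InMemoryStore is a hypothetical stand-in for any persistent object store, and the key name and function names are assumptions made for illustration. A real script would swap the stand-in for calls to an actual S3 client.

```python
class InMemoryStore:
    """Hypothetical stand-in for a persistent object store such as S3."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        # Returns None when no object has been stored under this key.
        return self._objects.get(key)


SNAPSHOT_KEY = "backups/db-latest"  # hypothetical key name


def backup(store, local_data):
    """Run periodically on the live instance: push a snapshot to the store."""
    store.put(SNAPSHOT_KEY, local_data)


def initialize_instance(store):
    """Run at boot: restore the latest snapshot, or start with empty data."""
    snapshot = store.get(SNAPSHOT_KEY)
    return snapshot if snapshot is not None else b""


if __name__ == "__main__":
    store = InMemoryStore()
    backup(store, b"database snapshot bytes")
    # The old instance dies here; a fresh one boots and restores itself.
    print(initialize_instance(store))
```

The point of the design is that the instance itself holds nothing you cannot afford to lose: anything written since the last snapshot is gone, so the backup interval decides how much data a failure can cost you.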

But what about IP addresses? Very recently EC2 has come up with 'Elastic IP', which is essentially a virtual 'static' IP, meaning that you can fire up a new instance and use the same IP address as before.
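As a toy model of why this helps, think of an elastic IP as an entry in a lookup table that you are allowed to repoint. The class below is not the EC2 API; the class name, method names, instance names and the example address are all hypothetical.

```python
class ElasticIpTable:
    """Toy model of an address-to-instance mapping; not the EC2 API."""

    def __init__(self):
        self._target = {}

    def associate(self, address, instance_id):
        """Point the address at an instance, replacing any earlier target."""
        self._target[address] = instance_id

    def resolve(self, address):
        """Return the instance currently behind the address, or None."""
        return self._target.get(address)


if __name__ == "__main__":
    table = ElasticIpTable()
    site_address = "203.0.113.10"  # example address from a documentation range

    table.associate(site_address, "instance-old")
    # The old instance fails; a replacement is started from the image
    # and the same address is repointed at it.
    table.associate(site_address, "instance-new")
    print(table.resolve(site_address))
```

Because DNS keeps pointing at the same address throughout, replacing the instance no longer requires a DNS change.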

A more robust way to avoid the pitfalls of not having persistent storage is to run multiple instances as failovers to provide fault tolerance, still using S3 as a backup space. (Regarding fault tolerance: Amazon has also recently introduced 'Availability Zones', which provide a means to try to keep all of your instances from failing at the same time. See: Setting up a fault-tolerant site using Amazon’s Availability Zones)
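The failover idea boils down to not putting all of your instances in one place. Here is a minimal sketch of spreading instances across zones round-robin and checking what survives if one zone goes down. The zone and instance names are hypothetical, and nothing here calls Amazon's API.

```python
from itertools import cycle


def spread_across_zones(instance_ids, zones):
    """Assign instances to zones round-robin; returns {instance_id: zone}."""
    if not zones:
        raise ValueError("at least one zone is required")
    return dict(zip(instance_ids, cycle(zones)))


def survivors(placement, failed_zone):
    """Return the instances that are not in the failed zone."""
    return [inst for inst, zone in placement.items() if zone != failed_zone]


if __name__ == "__main__":
    placement = spread_across_zones(
        ["web-1", "web-2", "web-3", "web-4"],  # hypothetical instance names
        ["zone-a", "zone-b"],                  # hypothetical zone names
    )
    print(placement)
    print(survivors(placement, "zone-a"))
```

With two zones, losing either one still leaves half of the instances serving traffic, which is the whole point of the exercise.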

However, the word on the street is that Amazon is working on providing persistent storage for EC2 soon. If they do, the last major barrier to entry will be removed. Working with EC2 instances will be just like working with physical machines, without any of the drawbacks highlighted at the beginning of this article and with all of the benefits: you can scale or shrink your processing capacity on demand and never pay for more than you are using.

I have obviously glossed over all the fine details of what it means to actually use EC2 for hosting web applications. But, in the very near future most of those fine technical details will be irrelevant. More accurately, the fine technical details will be transparent to end users.

If you want to start moving your applications to the cloud today, you will have to overcome some hurdles and flex some nerd muscle. In other words, you will have to sweat the fine details missing from this article. Luckily, there are plenty of resources online that discuss EC2 (and other cloud computing providers).

Additional resources:
Amazon EC2
Amazon S3
Right Scale - Blog
Right Scale - an Amazon EC2 solution provider

April 16th 2008 12PM
By: andre