Your site can have a lot of traffic, for many different reasons. Apart from that, your site can experience peaks of traffic.
To deal with this you can build your own infrastructures, but today there are other solutions available, such as the ones provided by Amazon and by Google.
Amazon web services
They are several platforms:
- s3 is used for storage
- ec2 is an on demand virtual server controlled with web service api (you can use your favourite linux distribution). It provides Acl for port control, you can choose datacenter (currently only in the US), and do a snapshot backup to s3
- simpledb is a hash-like database that store items with attribute/value pairs. It is meant for small items, organized into domains, redundant and distributed, has no schema, in it everything is a strin, it allows to use list values, you use sql-like queries to retrieve data
Google apps engine
With this solutions you run your application directly on the Google infrastructure. There is no concept of hardware – you just deploy an application. For the moment it’s limited to python and for sure it has not the same flexibility of the Amazon Solutions. As a compensation for not having access to low level sockets you can use memcache, image, email, url fetch, google auth and users. The platform is limiting but takes care of scaling problems.
Bigtable is Google solution for database. It is very similar to simple db (no schema, list values) but also very different (data type support, references and multiple tables, blob files (1mb)). What is very limiting is that results can only last for a couple of second, after that they are killed by the system. On the other hand it is very easy to use. In few words, you have to accept the limitations.
With Google Apps Engine you have no background jobs, no possibility to backup/snapshot data, emails can only be sent from google accounts and it’s restricted to pure-python libraries and given apis
Considerations and usage suggestions
The impression taken from this session is that we need to use a lot of tricks to proficiently use these tools, even Amazon. The speaker illustrated some case such as uploading users data with authentication.
If the application I developed needs extra capacity for an unknown period of time with Amazon ec2 is quite easy to start additional instances. It’s a matter of using a time base systems, such as cron (amazon)
If the need is for something that is load balanced a possible solution is to itegrate ec2 usage with some monitoring tool, such as Monit. With these tools I can monitor if the load is too high and eventually add new instances. Monitoring for these solutions is the hard part to do because there is no ready solution for it
Even if the site has its own infrastructure that works it’s possible, if neededn, add extra capacity connecting to ec2, so to combine the best of both worlds. However ec2 is not available in Europe at the moment and so there could be latency problems.
Real life use cases of these platforms:
- googbad.me
- dawanda.com
- g.ho.st
Final thoughts
- get accustomed to eventual consistency (not sure that queries of few milliseonds are updated in all instances)
- be prepared to leave relational database
- many miss strong SLAs – most of the time u can live fine without
- hardware is a commodity – only specialize in it if it really necessary
Jonathan Weiss
A Ruby consultant and partner at Peritor Wissensmanagement GmbH in Berlin, Germany. For the last years he has been developing and consulting large Ruby on Rails projects where he focused on Scalability and Security. He is an active member of the Ruby and Rails community and is the developer of the Open Source deployment tool Webistrano. In his spare time he maintains Rubygems and Rails in the FreeBSD Ports system.