The Zend Framework provides Zend_Cache, which can be plugged into various backends such as SQLite, Memcached, APC, and so on. Separately, it also provides Zend_Registry, which is “a container for storing objects and values in the application space”. Zend_Registry is not a cache, as its contents are created and used only by the currently executing script.
So, why would you want to use the Registry as a cache when it does not cache anything between page loads? The answer is to provide a transition point for caching additional data in other Zend_Cache backends.
For example, every time a Zend_Db_Table is instantiated it runs a DESCRIBE TABLE query, which is a surprisingly expensive query (or at least it was surprising to me). If you are using the MVC pattern, you can end up running this query dozens of times on one page. So to speed things up you should cache the results of the DESCRIBE TABLE query. You will improve performance whether you save the results in the Registry or (even better) in an appropriate Zend_Cache backend.
However, suppose that at the moment you have not configured your Memcached daemon, so you decide to use Zend_Registry instead. But Zend_Registry does not follow the same syntax as Zend_Cache. So when you do finally set up Memcached, you will have to go back, edit your code to follow the Zend_Cache syntax, and then test your cache. It’s better to use Zend_Registry as a backend to Zend_Cache instead, which will make it simple to change the cache backend to Memcached at a later date.
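To make the idea concrete, here is a minimal sketch in Python rather than PHP. It is not Zend code: the class names RegistryBackend and Cache and the describe_table function are hypothetical stand-ins. The point is only that the application talks to one cache frontend, so a per-request store can later be swapped for a shared backend without touching the calling code.

```python
from typing import Any, Callable, Dict, Optional


class RegistryBackend:
    """Hypothetical per-process store: contents vanish when the script ends."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def load(self, key: str) -> Optional[Any]:
        # Returns None on a miss.
        return self._data.get(key)

    def save(self, key: str, value: Any) -> None:
        self._data[key] = value


class Cache:
    """Frontend the application uses; any object with load/save can be the backend."""

    def __init__(self, backend: Any) -> None:
        self._backend = backend

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        value = self._backend.load(key)
        if value is None:
            # Miss: run the expensive operation once and remember the result.
            value = compute()
            self._backend.save(key, value)
        return value


describe_calls = 0


def describe_table() -> list:
    """Stand-in for the expensive DESCRIBE TABLE query."""
    global describe_calls
    describe_calls += 1
    return ["id", "name"]


cache = Cache(RegistryBackend())
for _ in range(12):
    # Twelve table objects on one page, but the query runs only once.
    columns = cache.get_or_compute("describe_users", describe_table)
```

Moving to a shared cache later would mean writing one more backend class with the same load and save methods and passing it to Cache; the loop above would not change.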
Now that I have covered how to load balance multiple web servers and how to keep their content synchronized, there is one more major problem to solve: sessions. You need sessions to identify a particular user from request to request (remember, HTTP is stateless). Usually session data is stored on the local filesystem. However, with multiple load balanced web servers, a user can be thrown from one web server to another, meaning that you cannot count on saving session data in the local filesystem.
Most load balancers, including nginx (through the ip_hash directive), do allow you to make your sessions “sticky”, which means that a particular user will be sent to the same web server for the duration of the session. This allows you to rely on the local filesystem again to save your sessions. However, sticky sessions have a greater likelihood of uneven load distribution. Plus, when a particular web server goes down, all of its users’ sessions will be lost.
It would be better if sessions could be stored in a location that all the web servers could access. If you have a SAN, that would be one option. But what most people already have is their database. So, let’s save our sessions in MySQL. The obvious downside to using your database for sessions is that the database is slower than the local filesystem. However, for most sites (even many large ones), the performance difference will be negligible.
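As a rough illustration of what a database-backed session store has to do, here is a small Python sketch. It uses an in-memory SQLite database as a stand-in for MySQL so that it runs anywhere, and the table layout (id, data, modified) and function names are my own assumptions, not a prescribed schema.

```python
import sqlite3
import time
from typing import Optional

# In-memory SQLite stands in for the shared MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    " id TEXT PRIMARY KEY,"
    " data TEXT NOT NULL,"
    " modified INTEGER NOT NULL)"
)


def write_session(session_id: str, data: str) -> None:
    """Create or overwrite the session row and refresh its timestamp."""
    conn.execute(
        "INSERT OR REPLACE INTO sessions (id, data, modified) VALUES (?, ?, ?)",
        (session_id, data, int(time.time())),
    )
    conn.commit()


def read_session(session_id: str) -> Optional[str]:
    """Return the stored session data, or None if the session is unknown."""
    row = conn.execute(
        "SELECT data FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return row[0] if row else None


def gc_sessions(max_lifetime: int) -> None:
    """Delete sessions not written to within max_lifetime seconds."""
    cutoff = int(time.time()) - max_lifetime
    conn.execute("DELETE FROM sessions WHERE modified < ?", (cutoff,))
    conn.commit()
```

Because every web server reads and writes the same table, it no longer matters which server a given request lands on.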
After learning how to load balance, you still need to keep your web files consistent between your web servers. My tool of choice for doing so is rsync which includes smart features such as delta uploads (if it notices a file has changed it will only upload the difference, not the whole file from scratch).
I am assuming that you will have a particular “main” web server which you always update with new content first. The new content can either be “pushed” by the main web server to dependent web servers running rsync daemons, or it can be “pulled” by dependent web servers from the main web server. I suggest running a “pull” environment because your main web server will not need any knowledge of the existence of the dependent web servers.
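Here is a minimal sketch of what a “pull” could look like when driven from a script on a dependent web server. The host name and directories are placeholders I made up, and the flag choice (-a for archive mode, -z for compression, --delete to remove local files that no longer exist on the main server) is one reasonable option rather than the only one.

```python
import subprocess
from typing import List


def build_pull_command(main_host: str, remote_dir: str, local_dir: str) -> List[str]:
    """Build the rsync argv that copies remote_dir on main_host into local_dir."""
    # Trailing slashes tell rsync to sync directory contents, not the directory itself.
    source = f"{main_host}:{remote_dir.rstrip('/')}/"
    target = f"{local_dir.rstrip('/')}/"
    return ["rsync", "-az", "--delete", source, target]


def pull(main_host: str, remote_dir: str, local_dir: str) -> None:
    """Run the pull; raises CalledProcessError if rsync exits non-zero."""
    subprocess.run(build_pull_command(main_host, remote_dir, local_dir), check=True)
```

A dependent server could call pull("www1.mysite.com", "/var/www", "/var/www") from a scheduled job. Be careful with --delete: it removes anything on the dependent server that is missing on the main one.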
In a previous post we saw how simple it is to set up nginx in front of apache, and in this post I’ll show you it’s just as easy to use nginx as a load balancer.
Load balancing can be left to either hardware or software. For most of us, the expensive hardware is out of the question, but cheap (free) software will solve our needs just fine. Here’s a look at how nginx does load balancing:
upstream mysite {
    server www1.mysite.com;
    server www2.mysite.com;
}
server {
    server_name www.mysite.com;
    location / {
        proxy_pass http://mysite;
    }
}
The above configuration will send 50% of requests for www.mysite.com to www1.mysite.com and the other 50% to www2.mysite.com. However, if you add a “weight” parameter to the end of a “server” line you can modify the percentages. Other useful options include max_fails and fail_timeout. For sticky sessions use ip_hash. Refer to the full documentation for further details.
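To show how weights change the split, here is a simplified Python sketch of a weighted rotation. This is not how nginx is implemented (its own scheduling interleaves servers more carefully); the sketch only illustrates the resulting proportions, and the weights 3 and 1 are example values I picked.

```python
from collections import Counter
from itertools import cycle, islice
from typing import Dict, Iterator


def weighted_rotation(servers: Dict[str, int]) -> Iterator[str]:
    """Yield server names forever, each appearing in proportion to its weight."""
    expanded = [name for name, weight in servers.items() for _ in range(weight)]
    return cycle(expanded)


# Example weights (assumed): www1 should receive three requests for every one sent to www2.
rotation = weighted_rotation({"www1.mysite.com": 3, "www2.mysite.com": 1})
picks = Counter(islice(rotation, 400))
```

With equal weights the same function gives the plain 50/50 rotation described above.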
Now that you know how to load balance, you will need to learn how to sync your files between multiple web servers.
I’ve been evaluating nginx, a lightweight web server, for the last week and I am coming away impressed. Over the last year or so nginx seems to have overtaken lighttpd for the crown of lightweight web servers.
In our case nginx is used to serve static files while apache is used to serve dynamic content (we also use nginx for simple load balancing). A request for http://www.mysite.com/file.extension will first be sent to nginx, which will determine whether to serve the file itself (if it’s static) or, if not, request the url http://localhost:8080/file.extension from apache and pass the result back seamlessly to the end user.
Sites that accept user uploads (photos, documents, music etc) will need to determine an appropriate directory structure to house the large number of files they will collect. At first glance you may decide to just prefix all filenames with a userid and stick them all into one directory. Maybe even broken up into something like:
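The routing decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not nginx configuration; the list of static extensions is an assumption of mine, and the backend address is the localhost:8080 apache instance mentioned above.

```python
import posixpath

# Extensions treated as static files (assumed list, adjust for your site).
STATIC_EXTENSIONS = {".css", ".js", ".jpg", ".png", ".gif", ".ico"}
APACHE_BACKEND = "http://localhost:8080"


def route(path: str) -> str:
    """Return 'static' if the front server should serve the file itself,
    otherwise the backend URL the request would be proxied to."""
    extension = posixpath.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return "static"
    return APACHE_BACKEND + path
```

For example, route("/images/logo.png") is handled by the front server, while route("/index.php") is passed on to apache.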
/uploads /photos /music /documents
If user 75474 uploads a photo, it will be named 75474_randomstring.jpg and put in directory “/uploads/photos”. However, over time the photos (and music and documents) directory will become huge. File systems of practically all kinds do poorly with large directories. Things run slower, become more error prone, and batch operations become difficult. You do not want huge directories.
On a recent multi-person project, we’ve used a subversion client to directly pull the latest project files into the web directory. We do so because it’s a complicated environment that we have not yet created individual sandboxes for. To test a change, the code must be committed to subversion and then we execute “svn export” on the webserver to pull the latest files from our subversion repository directly to the web directory. The only downside seemed to be that we were going to have a crazy number of revisions. But we’ve also run into one other problem: APC.
For those of us who took an intro database class, the topic of database normalization was covered ad nauseam as an absolute necessity in any database design. So, it takes a little bit of effort to pull yourself away from such a time tested concept. But sometimes it’s absolutely necessary. (For those who are not familiar with the term, database normalization basically means “don’t repeat any information ever”).
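One common way to keep directories small, shown here only as a hedged sketch and not necessarily the layout this post goes on to recommend, is to spread files across nested subdirectories derived from a hash of the user id. The two-level depth and the use of MD5 are assumptions chosen for illustration.

```python
import hashlib
import posixpath


def sharded_path(base: str, user_id: int, filename: str) -> str:
    """Place the file under two hash-derived subdirectories of base.

    With two hex characters per level there are 256 * 256 possible
    leaf directories, so each one stays comparatively small.
    """
    digest = hashlib.md5(str(user_id).encode("ascii")).hexdigest()
    return posixpath.join(base, digest[:2], digest[2:4], filename)


path = sharded_path("/uploads/photos", 75474, "75474_randomstring.jpg")
```

The same user id always maps to the same directory, so a user's files can be found again without storing the path anywhere.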
Let’s say you have a social network site with a huge table of private messages between users with each row representing an individual message between two users. Each time a user logs in you want to query the number of unread messages to the user with a query like:
select count(*) from privatemessages where unread = 1 and userid = $userid
However, you find that because your private messages table is 6 GB and some users have thousands of messages, the query is taking an unacceptably long time to run. So what do you do?
You could use a cache such as Memcached to store the number of unread messages for each user in memory, which would yield a significant improvement. However, the query still needs to run occasionally, whenever the cached value is missing or stale. To speed up the query, you break database normalization. Why not create another table called privatemessagescount which stores the number of unread messages per user (with columns “userid” and “numunread”)? Each time the user receives a new message or reads a message there will be an additional write to the DB, but it will be a simple write. And more importantly, your reads will be much faster because you will be looking up on a primary key (“userid”). Now the query simply becomes
select numunread from privatemessagescount where userid = $userid
So for performance’s sake you’ve broken database normalization and added additional overhead to your application. But I’d rather have a fast site than a normalized site that is too slow to be used.
Hey all, I’m Sameer Parwani, a 24 year old web developer from Massachusetts. I aim to use this blog to contribute to the tech community with my own thoughts, tips, insights, and so on. Hopefully, I’ll soon get to making the blog more customized, but for now I’m using this simple nice theme from the guys over at Blog2life.
I am the owner and creator of RateDesi.com and RateHispanic.com, which are 2003 era social networks from the time when the term “social network” did not exist. I’ve also lately been working on Funki.jp as well as some other projects I can’t mention here. I’ll try to add a portfolio or resume to the site soon.
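The extra bookkeeping is small. Here is a sketch in Python using an in-memory SQLite database as a stand-in for the real one; the table and column names come from the queries above, while the “id” column on privatemessages and the function names are assumptions of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE privatemessages (
        id INTEGER PRIMARY KEY,
        userid INTEGER NOT NULL,
        unread INTEGER NOT NULL DEFAULT 1
    );
    CREATE TABLE privatemessagescount (
        userid INTEGER PRIMARY KEY,
        numunread INTEGER NOT NULL DEFAULT 0
    );
    """
)


def deliver_message(userid: int) -> int:
    """Store a new unread message and bump the counter in one transaction."""
    with conn:
        cursor = conn.execute(
            "INSERT INTO privatemessages (userid, unread) VALUES (?, 1)", (userid,)
        )
        conn.execute(
            "INSERT OR IGNORE INTO privatemessagescount (userid, numunread) VALUES (?, 0)",
            (userid,),
        )
        conn.execute(
            "UPDATE privatemessagescount SET numunread = numunread + 1 WHERE userid = ?",
            (userid,),
        )
    return cursor.lastrowid


def mark_read(message_id: int) -> None:
    """Mark a message read and decrement the counter, only if it was unread."""
    with conn:
        row = conn.execute(
            "SELECT userid FROM privatemessages WHERE id = ? AND unread = 1",
            (message_id,),
        ).fetchone()
        if row is None:
            return  # Already read or unknown: leave the counter alone.
        conn.execute("UPDATE privatemessages SET unread = 0 WHERE id = ?", (message_id,))
        conn.execute(
            "UPDATE privatemessagescount SET numunread = numunread - 1 WHERE userid = ?",
            (row[0],),
        )


def unread_count(userid: int) -> int:
    """The fast read: a single primary key lookup."""
    row = conn.execute(
        "SELECT numunread FROM privatemessagescount WHERE userid = ?", (userid,)
    ).fetchone()
    return row[0] if row else 0
```

Doing both writes inside one transaction matters here: if the counter and the messages table ever drift apart, the fast query will return the wrong number.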

