
Posted on 4th May 2009 by Sameer

I recently had a conversation with Lalit Sarna of Oxylabs about scalability, and he introduced me to Tokyo Cabinet, a key/value store (database). This category of databases, referred to as DBM, differs from an RDBMS (such as MySQL) in that there are no tables and therefore no concept of rows. Instead, you solely provide keys and get, set, or delete the value for a particular key.
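To make the get/set/delete model concrete, here is a minimal sketch using Python's standard-library dbm.dumb module, which belongs to the same DBM family. It is a stand-in for illustration only and is not Tokyo Cabinet's own API; the key name is made up.

```python
import dbm.dumb
import os
import tempfile

# A throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example")

    # "c" opens the store for reading and writing, creating it if needed.
    with dbm.dumb.open(path, "c") as db:
        db[b"user:42:name"] = b"Sameer"          # set
        name = db[b"user:42:name"]               # get
        del db[b"user:42:name"]                  # delete
        still_there = b"user:42:name" in db      # membership check after delete

print(name, still_there)
```

There are no tables, columns, or joins here: everything you store has to be reachable through a key you already know.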

The advantage is raw speed, and Tokyo Cabinet is apparently the king of the category: we are talking about speeds 10-50x greater than MySQL. Tokyo Cabinet supports multiple underlying database engines, each with its own advantages.

  • Hash - Hash tables provide O(1) insert and lookup on average, which cannot be beaten, so this is your fastest option.
  • B+ Tree - The underlying data is sorted, allowing for prefix and range matching. Speed is not quite as great as Hash since B-trees are O(log_b n) for insert and lookup.
  • Fixed Length - Your values are stored in one large array, which is as fast as it gets since access is O(1) and the data is contiguous. However, your keys have to be natural numbers.
  • Table - Attempts to replicate a traditional table database, but no fixed data schema or data types are required. Built on top of the hash database for speed.
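The trade-off between the hash and B+ tree engines can be sketched in plain Python. This is only an analogy: a dict stands in for the hash engine, and a sorted key list with binary search stands in for the tree. The sample keys and helper names are invented for the example.

```python
import bisect

records = {"apple": 1, "apricot": 2, "banana": 3, "cherry": 4}

# Hash-style access: a dict finds an exact key in O(1) on average,
# but it has no notion of key order.
exact = records["banana"]

# Tree-style access: keeping keys sorted is what makes prefix and
# range queries possible.
sorted_keys = sorted(records)

def keys_with_prefix(prefix):
    """Return all keys starting with prefix, in sorted order."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order means no later key can match
        matches.append(key)
    return matches

def keys_in_range(low, high):
    """Return all keys k with low <= k < high."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_left(sorted_keys, high)
    return sorted_keys[start:end]

print(keys_with_prefix("ap"))    # ['apple', 'apricot']
print(keys_in_range("b", "d"))   # ['banana', 'cherry']
```

With only a hash table, answering either query would mean scanning every key.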

Tokyo Tyrant

Tokyo Tyrant is the network interface that sits on top of Tokyo Cabinet, allowing your software to communicate with it over the network. Tokyo Cabinet is often referred to as the “database library” while Tokyo Tyrant is the “database server”. Tokyo Tyrant supports the memcached and HTTP protocols.

I love that it supports memcached. That means you can plug and play with Tokyo Cabinet using the many existing memcached clients and libraries. Tokyo Cabinet is not meant to be a replacement for memcached, but you could theoretically use it as one with minimal setup time. You would get the benefits of persistent data and much cheaper storage, at some cost in performance.
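For a feel of what "speaking memcached" means on the wire, here is a rough sketch of the memcached text protocol's set and get requests in Python. The host, the port (1978, which I believe is Tokyo Tyrant's default), and the helper names are assumptions for illustration, and whether the server honours the flags and expiry fields is not something I have verified. In practice you would use an existing memcached client library rather than raw sockets.

```python
import socket

def build_set(key, value, flags=0, exptime=0):
    """Build a memcached text-protocol 'set' request for a bytes value."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"

def build_get(key):
    """Build a memcached text-protocol 'get' request."""
    return f"get {key}\r\n".encode("ascii")

def send_command(request, host="localhost", port=1978):
    """Send one request and return the raw reply (host and port are assumed)."""
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(request)
        return conn.recv(4096)

print(build_set("greeting", b"hello"))
print(build_get("greeting"))
```

With a server listening, send_command(build_set("greeting", b"hello")) would store the value and send_command(build_get("greeting")) would fetch it back.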

In conclusion, targeted and appropriate use of Tokyo Cabinet would allow load to be removed from your traditional RDBMS in cases where the full functionality of an RDBMS is not needed, resulting in major performance improvements.

Posted on 20th October 2008 by Sameer

I host all my servers with The Planet, and a few days back my MySQL databases all started hanging at the same time. The process list (“show processlist”) was showing a large number of unauthenticated user connections from 192.168.xxx.xxx. MySQL was trying to do a reverse DNS lookup on each connecting IP address and was either stalling or failing on the request. I assume something went wrong with the DNS server.

The workaround is to insert “skip-name-resolve” into your my.cnf file and restart the server, after which MySQL will no longer run reverse DNS lookups on connecting IP addresses. To avoid sudden downtime like mine, I would recommend adding that line to your my.cnf now, before you run into the same problem. Of course, if your mysql.user table authenticates any user based on a domain name, then you can’t skip resolution of IP addresses.
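If you manage several servers, a small script can confirm that the option is actually present. The sketch below uses Python's configparser on a made-up sample and assumes the option lives under the [mysqld] section, where server options conventionally go. A real my.cnf may contain lines such as !includedir outside any section that this simple parser will not accept, so treat it as a starting point only.

```python
import configparser

def has_skip_name_resolve(cnf_text):
    """Return True if skip-name-resolve appears under [mysqld]."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare options with no "= value"
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # treat "%" in values literally
    )
    parser.read_string(cnf_text)
    return parser.has_section("mysqld") and parser.has_option(
        "mysqld", "skip-name-resolve"
    )

# Hypothetical my.cnf content for the example.
sample = """
[mysqld]
port = 3306
skip-name-resolve
"""

print(has_skip_name_resolve(sample))
```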

Posted on 21st August 2008 by Sameer

Now that I have covered how to load balance multiple web servers and how to keep their content synchronized, there is one more major problem to solve: sessions. You need sessions to identify a particular user from request to request (remember, HTTP is stateless). Usually session data is stored on the local filesystem. However, with multiple load-balanced web servers, a user can be thrown from one web server to another, meaning that you cannot count on session data saved in the local filesystem being there on the next request.

Most load balancers, including nginx (through the ip_hash directive), do allow you to make your sessions “sticky”, which means that a particular user will be sent to the same web server for the duration of their session. This allows you to rely on the local filesystem to save your sessions again. However, sticky sessions are more likely to produce uneven load distribution. Plus, when a particular web server goes down, all of its users’ sessions will be lost.
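The idea behind IP-based stickiness can be sketched in a few lines of Python. This is not nginx's actual hash function: the CRC32 hash, the server names, and the pick_server helper are all assumptions for illustration. The one detail borrowed from nginx's documentation is that ip_hash keys on the first three octets of an IPv4 address.

```python
import zlib

def pick_server(client_ip, servers):
    """Map a client IPv4 address to one server, deterministically."""
    # Key on the first three octets, as nginx documents for ip_hash.
    key = ".".join(client_ip.split(".")[:3])
    index = zlib.crc32(key.encode("ascii")) % len(servers)
    return servers[index]

servers = ["web1", "web2", "web3"]  # hypothetical backend names

# The same client always lands on the same server...
first = pick_server("203.0.113.7", servers)
again = pick_server("203.0.113.7", servers)

# ...and so does every other client in the same /24.
neighbour = pick_server("203.0.113.200", servers)

print(first, again, neighbour)
```

Both drawbacks are visible here: a large office or ISP behind one address range piles onto a single server, and if that server is removed from the list, its users are remapped to servers that have none of their session files.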

It would be better if sessions could be stored in a location that all the web servers could access. If you have a SAN, that would be one option. But what most people already have is their database. So, let’s save our sessions in MySQL. The obvious downside to using your database for sessions is that it is slower than using the local filesystem. However, for most sites (even many large ones), the performance difference will be negligible.
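As a rough sketch of the idea, the snippet below keeps sessions in a single table keyed by session id. To keep it runnable anywhere it uses Python's built-in sqlite3 in place of MySQL, and the table layout and function names are my own assumptions rather than a prescribed schema. On MySQL the upsert would typically be written as REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE.

```python
import json
import sqlite3
import time

# In-memory SQLite stands in for the shared MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    " id TEXT PRIMARY KEY,"
    " data TEXT NOT NULL,"
    " updated_at INTEGER NOT NULL)"
)

def save_session(session_id, data):
    """Insert or overwrite the session row for this id."""
    conn.execute(
        "INSERT OR REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)",
        (session_id, json.dumps(data), int(time.time())),
    )
    conn.commit()

def load_session(session_id):
    """Return the stored session data, or an empty dict if none exists."""
    row = conn.execute(
        "SELECT data FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return json.loads(row[0]) if row else {}

save_session("abc123", {"user_id": 7})
print(load_session("abc123"))
```

Because every web server talks to the same database, it no longer matters which one the load balancer picks for a given request. The updated_at column is there so stale sessions can be purged periodically.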

Read the rest of this entry…

Posted on 21st August 2008 by Sameer

I’ve been evaluating nginx, a lightweight web server, for the last week and I am coming away impressed. Over the last year or so nginx seems to have overtaken lighttpd for the crown of lightweight web servers.

In our case nginx is used to serve static files while Apache is used to serve dynamic content (we also use nginx for simple load balancing). A request for http://www.mysite.com/file.extension will first be sent to nginx, which will determine whether to serve the file itself (if it’s static). If not, it will request the URL http://localhost:8080/file.extension from Apache and pass the result back seamlessly to the end user.
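The routing decision itself is simple enough to sketch in Python. This is not nginx configuration, just the logic described above; the list of static extensions and the route helper are assumptions made for the example.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed set of extensions treated as static; adjust to your site.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".ico"}
BACKEND = "http://localhost:8080"  # Apache, as in the setup described above

def route(url):
    """Decide whether the front end serves a URL itself or proxies it."""
    parts = urlsplit(url)
    extension = posixpath.splitext(parts.path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return ("static", parts.path)
    # Everything else goes to the backend with the same path and query.
    target = BACKEND + parts.path
    if parts.query:
        target += "?" + parts.query
    return ("proxy", target)

print(route("http://www.mysite.com/logo.png"))
print(route("http://www.mysite.com/index.php?id=3"))
```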

Read the rest of this entry…