
Posted on 4th May 2009 by Sameer

I recently had a conversation with Lalit Sarna of Oxylabs about scalability, and he introduced me to Tokyo Cabinet, a key/value store (database). This category of databases, referred to as DBM, differs from an RDBMS (such as MySQL) in that there are no tables and therefore no concept of rows. Instead, you solely provide keys and get, set, or delete the value for a particular key.

The advantage is lightning speed, and Tokyo Cabinet is apparently the king of the category: we are talking about speeds 10-50x greater than MySQL. Tokyo Cabinet supports multiple underlying database engines, each with its own advantages.

  • Hash – Hash tables provide O(1) insert and lookup on average, which cannot be beaten, so they are your fastest option.
  • B+ Tree – The underlying data is sorted, allowing for prefix and range matching. Speed is not quite as great as Hash, since B-trees are O(log_b n) for insert and lookup.
  • Fixed Length – Your values are stored in one large array, which is as fast as it gets since access is O(1) and the data is contiguous. However, your keys have to be natural numbers.
  • Table – Attempts to replicate a traditional table database, but no fixed data schema or data types are required. Built on top of the hash database for speed.
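To make the trade-off between the Hash and B+ Tree engines concrete, here is a minimal sketch in Python. It does not use Tokyo Cabinet at all: a plain dict stands in for a hash database and a sorted key list stands in for a B+ tree, and the sample keys and helper names are made up for illustration.

```python
import bisect

# Hypothetical sample data; a plain dict stands in for a hash database.
records = {
    "user:1": "alice",
    "user:2": "bob",
    "user:10": "carol",
    "video:7": "intro",
}

def hash_get(key):
    """Hash-style access: exact-key lookup only, O(1) on average."""
    return records.get(key)

# B+ tree-style access: keys are kept in sorted order, so a prefix scan is
# a binary search (O(log n)) followed by a sequential walk over neighbours.
sorted_keys = sorted(records)

def prefix_scan(prefix):
    """Return all (key, value) pairs whose key starts with prefix, in key order."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append((key, records[key]))
    return matches

print(hash_get("user:2"))
print(prefix_scan("user:"))
```

The hash lookup has no cheap way to answer "all keys starting with user:", which is the kind of query the sorted structure handles naturally.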

Tokyo Tyrant

Tokyo Tyrant is the network interface that sits on top of Tokyo Cabinet, allowing your software to communicate with Tokyo Cabinet over the network. Tokyo Cabinet is often referred to as the “database library” while Tokyo Tyrant is the “database server”. Tokyo Tyrant supports the memcached and HTTP protocols.

I love that it supports memcached. That means you can plug and play with Tokyo Cabinet using the many existing memcached clients/libraries. Tokyo Cabinet is not meant to be a replacement for memcached, but you could theoretically use it as such and with minimal setup time. You would get the benefits of persistent data and much cheaper storage with some loss in performance.
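As a rough illustration of what "speaking memcached" means on the wire, the sketch below builds the plain-text `set` and `get` commands of the memcached protocol. The helper names are hypothetical, and the host and port in `send_command` are assumptions (1978 is believed to be Tokyo Tyrant's default port; check your own setup). In practice you would use an existing memcached client library rather than raw sockets.

```python
import socket

def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build a memcached text-protocol 'set' command for the given key/value."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"

def build_get(key: str) -> bytes:
    """Build a memcached text-protocol 'get' command."""
    return f"get {key}\r\n".encode("ascii")

def send_command(command: bytes, host: str = "localhost", port: int = 1978) -> bytes:
    """Send one command and return the raw reply.

    Assumes a memcached-compatible server is listening on host:port;
    both defaults are assumptions, not values from the post.
    """
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(command)
        return conn.recv(4096)

# Building the commands needs no server:
print(build_set("greeting", b"hello"))
print(build_get("greeting"))
```

How a particular server treats the flags and expiry fields is not something this sketch assumes.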

In conclusion, targeted and appropriate use of Tokyo Cabinet would allow load to be removed from your traditional RDBMS in cases where the full functionality of an RDBMS is not needed, resulting in major performance improvements.

Posted on 18th December 2008 by Sameer

Facebook Scribe is a “server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.” At first glance it looks pretty cool and seems to have the potential to fill many needs.

But yesterday I was trying to research whether Scribe is appropriate for a task that I had in mind. Unfortunately, documentation and tutorials seem to be very limited when it comes to Scribe, and those that exist are hard to find. Of course, I could download Scribe and work with it hands-on to determine its suitability. But usually it's best to gather information about a product before undertaking the time-consuming task of installing, configuring, and testing it.

So for others out there who are also struggling to find information about Scribe, here are a few resources to turn to. Let me know if you find others and I will add them to this list.

Also keep in mind that the Scribe download package itself has a couple of example configurations that you can reference.

But, as you can see, there just is not much written about Scribe on the web (except news posts announcing its open source launch). I don’t think that reflects on the quality of the product, although it might reflect on the usefulness of the product for most websites (which don’t have millions of users). But I’m sure that if the Facebook developers enhanced the documentation and provided a few examples of end-to-end use cases, it would spur more developers to try the product. More developers trying (and writing about) the product would spur on even more developers to try it. And so on.

Posted on 12th November 2008 by Sameer

As I’ve posted about before, Elgg is a very nice open source social networking platform which unfortunately has scaling problems due to its one-size-fits-all architecture. I’d like to point out another area where Elgg is making a suboptimal performance decision.

As in most social networks, users (and groups) can upload a profile photo. In Elgg, this photo (as well as other user-uploaded data) is stored within a data directory that is not web accessible. Instead, a call is made directly to Elgg’s page handlers, which load all of Elgg’s libraries, then find the image in the data directory, and finally output it. Clearly, there is major overhead when even images, which one would think are static content, are routed through the PHP code.

Besides the obvious downsides, there are some hidden implications of not having standalone web-accessible images. You will not be able to use a lightweight web server, such as nginx, in front of Apache to speed up serving of static content and take load off Apache. Plus, the Elgg code assumes that the image is available on local disk, which will preclude you from storing your data directory on a separate server unless you use some sort of shared disk (like a SAN).

On the bright side, some of these problems can be corrected, and there's a good chance someone will have written a plugin to do so by the time you are reading this. Currently, the profile photo is an instance of an ElggFile which is stored on an ElggFilestore. As of now, the only file store available is the ElggDiskFilestore. However, implementing an ElggFTPFilestore would allow your web server and data server to be separate. You would still have two performance issues: a) There is still only one FTP location where your images would be stored, so you will not be able to load balance your images over several servers. b) Requests for images would still have to go through the Elgg PHP code.

To solve the second problem, you would need to override the profile photo plugin (called icon) to instead link to the user’s image with a normal image src tag. The user’s image would of course have to be made web accessible. Setting that up would involve more administrative overhead, but you would have the advantage of being able to use a lightweight web server to serve static content.

If Elgg itself were lightweight, the implications of turning static content into dynamic content would not be as severe. However, each page load of Elgg requires dozens (often hundreds) of database queries, so large installations of Elgg would be best served by making static content truly static.

Posted on 20th October 2008 by Sameer

Elgg is the cream of the crop of open source social network software, a group which includes other products such as Dolphin, PHPizabi, and LovdByLess. It’s also significantly superior to low-cost white label social network software such as Handshakes, phpFoX, SocialEngine, and so on.

Elgg stands out because
a) It looks beautiful and has a good feature set out of the box
b) It encourages the community to contribute to the project with plugins and themes. It aims to be to social networking what Drupal/Joomla are to content management systems.

However, if scalability is a top or immediate concern, be warned: Elgg may not be suited for you.

Posted on 21st August 2008 by Sameer

Now that I have covered how to load balance multiple web servers and how to keep their content synchronized, there is one more major problem to solve: sessions. You need sessions to identify a particular user from request to request (remember, HTTP is stateless). Usually session data is stored on the local filesystem. However, with multiple load-balanced web servers, a user can be thrown from one web server to another, meaning that you cannot count on saving session data in the local filesystem.

Most load balancers, including nginx (through the ip_hash directive), do allow you to make your sessions “sticky”, which means that a particular user will be sent to the same web server for the duration of the session. This allows you to again rely on the local filesystem to save your sessions. However, sticky sessions have a greater likelihood of uneven load distribution. Plus, when a particular web server goes down, all of its users’ sessions will be lost.

It would be better if sessions could be stored in a location that all the web servers could access. If you have a SAN, that would be one option. But what most people already have is their database. So, let’s save our sessions in MySQL. The obvious downside to using your database for sessions is that the database is slower than the local filesystem. However, for most sites (even many large ones), the performance difference will be negligible.
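To give a feel for the idea, here is a minimal sketch of a database-backed session store in Python. It is not the implementation from the rest of this post: SQLite (in memory) stands in for MySQL so the example runs on its own, and the table layout, class name, and 30-minute lifetime are all assumptions.

```python
import json
import sqlite3
import time
import uuid

class DbSessionStore:
    """Keeps session data in a shared database table instead of local files."""

    def __init__(self, conn, lifetime_seconds=1800):
        self.conn = conn
        self.lifetime = lifetime_seconds
        # Hypothetical schema: one row per session, keyed by session id.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            "id TEXT PRIMARY KEY, data TEXT NOT NULL, expires_at REAL NOT NULL)"
        )

    def save(self, data, session_id=None):
        """Insert or overwrite a session and return its id."""
        session_id = session_id or uuid.uuid4().hex
        self.conn.execute(
            "REPLACE INTO sessions (id, data, expires_at) VALUES (?, ?, ?)",
            (session_id, json.dumps(data), time.time() + self.lifetime),
        )
        self.conn.commit()
        return session_id

    def load(self, session_id):
        """Return the session data, or None if it is missing or expired."""
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ? AND expires_at > ?",
            (session_id, time.time()),
        ).fetchone()
        return json.loads(row[0]) if row else None

# Any web server that can reach the database sees the same session.
store = DbSessionStore(sqlite3.connect(":memory:"))
sid = store.save({"user_id": 42})
print(store.load(sid))
```

Because the lookup is by primary key, each request costs one small indexed read, which is why the overhead tends to be acceptable.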


Posted on 21st August 2008 by Sameer

In a previous post we saw how simple it is to set up nginx in front of Apache, and in this post I’ll show you it’s just as easy to use nginx as a load balancer.

Load balancing can be left to either hardware or software. For most of us, the expensive hardware is out of the question, but cheap (free) software will solve our needs just fine. Here’s a look at how nginx does load balancing:

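# Pool of backend web servers; requests are distributed between them
upstream  mysite  {
   server   www1.mysite.com;
   server   www2.mysite.com;
}

server {
   server_name www.mysite.com;
   location / {
      # Forward every request to the "mysite" pool defined above
      proxy_pass  http://mysite;
   }
}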

By default nginx distributes requests round-robin, so the above configuration will send 50% of requests for www.mysite.com to www1.mysite.com and the other 50% to www2.mysite.com. However, if you add a “weight” parameter to the end of a “server” definition, you can modify the percentages. Other useful options include max_fails and fail_timeout. For sticky sessions, use ip_hash. Refer to the full documentation for further details.
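For intuition only, the two selection strategies mentioned above can be sketched in a few lines of Python. This is a simplified illustration and not nginx's actual algorithm: the function names are made up, the weighted rotation is the naive version, and the sticky routing simply hashes the first three octets of an IPv4 address, which is the part of the address nginx's documentation says ip_hash uses as its key.

```python
import itertools
import zlib

def weighted_cycle(servers):
    """servers is a list of (hostname, weight); returns an endless iterator.

    Naive weighted round-robin: each host appears 'weight' times per cycle.
    """
    expanded = [host for host, weight in servers for _ in range(weight)]
    return itertools.cycle(expanded)

def pick_by_ip(client_ip, hosts):
    """Sticky routing: the same /24 network always maps to the same host."""
    prefix = ".".join(client_ip.split(".")[:3])
    return hosts[zlib.crc32(prefix.encode("ascii")) % len(hosts)]

hosts = ["www1.mysite.com", "www2.mysite.com"]

# With weights 2 and 1, www1 receives two of every three requests.
rotation = weighted_cycle([("www1.mysite.com", 2), ("www2.mysite.com", 1)])
print([next(rotation) for _ in range(6)])

# Two clients on the same /24 network land on the same backend.
print(pick_by_ip("10.0.0.5", hosts) == pick_by_ip("10.0.0.200", hosts))
```

The sticky variant also shows why load can become uneven: a large network behind one address range is pinned to a single backend.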

Now that you know how to load balance, you will need to learn how to sync your files between multiple web servers.

Posted on 21st August 2008 by Sameer

I’ve been evaluating nginx, a lightweight web server, for the last week and I am coming away impressed. Over the last year or so nginx seems to have overtaken lighttpd for the crown of lightweight web servers.

In our case nginx is used to serve static files while Apache is used to serve dynamic content (we also use nginx for simple load balancing). A request for http://www.mysite.com/file.extension will first be sent to nginx, which will determine whether to serve the file itself (if it's static). If not, it will request the URL http://localhost:8080/file.extension from Apache and pass the result back seamlessly to the end user.


Posted on 21st August 2008 by Sameer

Sites that accept user uploads (photos, documents, music, etc.) will need to determine an appropriate directory structure to house the large number of files they will collect. At first glance you may decide to just prefix all filenames with a user ID and stick them all into one directory, perhaps broken up into something like:

/uploads
   /photos
   /music
   /documents

If user 75474 uploads a photo, it will be named 75474_randomstring.jpg and put in directory “/uploads/photos”. However, over time the photos (and music and documents) directories will become huge. File systems of practically all kinds do poorly with large directories. Things run slower, become more error prone, and batch operations become difficult. You do not want huge directories.
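One common way to avoid huge directories, which is not necessarily the approach the rest of this post takes, is to spread files across nested subdirectories derived from the user ID. The sketch below is an assumption-laden illustration: the function name, the zero-padding to nine digits (so it assumes IDs below one billion), and the three-level split are all made up.

```python
def upload_path(user_id: int, filename: str, kind: str = "photos",
                root: str = "/uploads") -> str:
    """Map a user's file to a nested directory so no single directory grows huge.

    The user ID is zero-padded to nine digits and split into three levels,
    giving at most 1000 entries per directory level.
    """
    padded = f"{user_id:09d}"
    levels = [padded[0:3], padded[3:6], padded[6:9]]
    return "/".join([root, kind] + levels + [filename])

print(upload_path(75474, "75474_randomstring.jpg"))
```

With this layout, each user's files end up in their own leaf directory, and batch operations can work on one subtree at a time.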


Posted on 18th August 2008 by Sameer

For those of us who took an intro database class, the topic of database normalization was covered ad nauseam as an absolute necessity in any database design. So, it takes a little bit of effort to pull yourself away from such a time-tested concept. But sometimes it's absolutely necessary. (For those who are not familiar with the term, database normalization basically means “don’t repeat any information ever”.)

Let’s say you have a social network site with a huge table of private messages between users with each row representing an individual message between two users. Each time a user logs in you want to query the number of unread messages to the user with a query like:

select count(*) from privatemessages where unread = 1 and userid = $userid

However, you find that because your private messages table is 6 GB in size and some users have thousands of messages, the query is taking an unacceptably long time to run. So what do you do?

You could use a cache such as Memcached to store the number of unread messages for each user in memory, which would yield a significant improvement. However, the query still has to run whenever the cached value is missing or stale. To speed up the query itself, you break database normalization. Why not create another table called privatemessagescount that stores the number of unread messages per user (with columns “userid” and “numunread”)? Each time the user receives a new message or reads one, there will be an additional write to the DB, but it will be a simple write. More importantly, your reads will be much faster because you will be looking up on a primary key (“userid”). Now the query simply becomes:

select numunread from privatemessagescount where userid = $userid

So for performance's sake you’ve broken database normalization and added overhead to your application. But I’d rather have a fast site than a normalized site that is too slow to be used.
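The extra bookkeeping that the counter table requires can be sketched as follows. This is a minimal illustration rather than production code: SQLite (in memory) stands in for MySQL so it runs on its own, the table and column names follow the post, and the helper functions and the "id" and "body" columns are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE privatemessages (
    id INTEGER PRIMARY KEY,
    userid INTEGER NOT NULL,
    body TEXT,
    unread INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE privatemessagescount (
    userid INTEGER PRIMARY KEY,
    numunread INTEGER NOT NULL DEFAULT 0
);
""")

def deliver_message(userid, body):
    """Store a message and bump the recipient's counter in one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT INTO privatemessages (userid, body) VALUES (?, ?)",
            (userid, body),
        )
        conn.execute(
            "INSERT OR IGNORE INTO privatemessagescount (userid, numunread) VALUES (?, 0)",
            (userid,),
        )
        conn.execute(
            "UPDATE privatemessagescount SET numunread = numunread + 1 WHERE userid = ?",
            (userid,),
        )
    return cur.lastrowid

def mark_read(message_id):
    """Mark a message read; decrement the counter only if it was unread."""
    with conn:
        cur = conn.execute(
            "UPDATE privatemessages SET unread = 0 WHERE id = ? AND unread = 1",
            (message_id,),
        )
        if cur.rowcount == 1:
            owner = conn.execute(
                "SELECT userid FROM privatemessages WHERE id = ?", (message_id,)
            ).fetchone()[0]
            conn.execute(
                "UPDATE privatemessagescount SET numunread = numunread - 1 WHERE userid = ?",
                (owner,),
            )

def unread_count(userid):
    """Fast path: a single primary-key lookup instead of a count over messages."""
    row = conn.execute(
        "SELECT numunread FROM privatemessagescount WHERE userid = ?", (userid,)
    ).fetchone()
    return row[0] if row else 0
```

Wrapping each message write and its counter update in one transaction is what keeps the two tables from drifting apart, which is the main risk of denormalizing.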
