Posted on 12th October 2009 by Sameer

I recently launched a new website, ThatsMyDesiLife.com, which parodies the “funny” or “different” (depending on your perspective) elements of desi culture through user-submitted short stories. It is basically a spin-off of FML, similar to what others have done with the fmyscript software.

However, after purchasing that script I added an API on top of it and have now (in conjunction with another developer) created an iPhone application for the website. I’m still learning the Cocoa framework and Objective-C, but it’s an interesting change from years of PHP. It’s nice to have a stateful process that can keep track of asynchronous requests and events, unlike stateless HTTP. And the development tools (Xcode and Interface Builder) are very nice to use. It was a bit of a pain getting used to a Mac, though.


Posted on 21st May 2009 by Sameer

Xdebug is a wonderfully helpful debugger and profiler for PHP. For me, it serves two main purposes. First, it makes error messages much more helpful because they include a stack trace, which can even be configured to show the actual function arguments.

Second, it helps profile your code by creating an output file with a full listing of every function called and its execution time. With a tool like KCachegrind or WinCacheGrind, the output can be visualized in an easy-to-understand format. I am an avid user of Xdebug’s profiler, as it’s the best way to identify performance problems in PHP applications. With all that said:

DON’T RUN XDEBUG ON PRODUCTION.

I’m sure you can imagine the profiler’s output being very large for applications with thousands of function calls; I’ve seen 100MB+ output files. Imagine PHP having to write a 100MB file on every request. By turning on xdebug.profiler_enable_trigger (and leaving xdebug.profiler_enable off), you can at least run the profiler selectively: only requests that carry an XDEBUG_PROFILE GET/POST parameter or cookie get profiled.

Yet even when the profiler is off, Xdebug’s other primary function (detailed stack traces on errors) makes your PHP instance track a lot of extra information, which slows down execution. In a recent real-world case, I turned Xdebug on temporarily (with the profiler off) and then shut it off again: while it was on, average response time went from 400ms to 700ms.
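One way to act on that advice is to make the application refuse to run in production when Xdebug is loaded. This is a hypothetical sketch: it assumes production servers are marked with an APP_ENV environment variable, and the function name is made up; only extension_loaded() and getenv() are PHP’s own.

```php
<?php
// Hypothetical deploy-time guard: fail fast if Xdebug is loaded on a
// production server. Reading the environment name from APP_ENV is an
// assumption; adapt it to however you tell your servers apart.
function require_no_xdebug_in_production(string $environment): void
{
    if ($environment === 'production' && extension_loaded('xdebug')) {
        throw new RuntimeException(
            'Xdebug is loaded in production; remove it from php.ini before deploying.'
        );
    }
}

// For example, near the top of your bootstrap file:
require_no_xdebug_in_production(getenv('APP_ENV') ?: 'development');
```

Failing loudly at startup is cruder than a monitoring alert, but it makes a forgotten Xdebug install impossible to miss.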

Posted on 18th May 2009 by Sameer

In earlier posts I’ve introduced the benefits of Nginx as a lightweight web server that can be used as a reverse proxy and even as a load balancer. After using Nginx more extensively, I want to share one lesson that was key to understanding how to configure my reverse proxies to work as intended, without wasting hours debugging or needlessly over-configuring.

Let’s take a look at a very basic reverse proxy configuration:

#nginx configuration
server {
   listen       80;
   server_name www.mysite.com;
 
   # pass along header with reverse proxy requests
   proxy_set_header Host $host;
 
   # pass along all requests to apache waiting at localhost:8080
   location / {
      proxy_pass http://localhost:8080;
   }
}

Apache’s configuration would then look like:

#apache httpd configuration
Listen localhost:8080
<VirtualHost localhost:8080>
        DocumentRoot /path/to/webroot
        ServerName www.mysite.com
</VirtualHost>

I want to point out two important lines: “proxy_set_header Host $host” in the Nginx configuration and “ServerName www.mysite.com” in the Apache configuration. The first line instructs Nginx to pass the original Host header along to Apache, in essence tricking Apache into believing the request came directly from the end user.

And because we want Apache to believe it received the request directly, we repeat the server name that was defined in the Nginx configuration. This might seem obvious to some of you, but at first I was tripped up because I did not realize I could pass along the Host request header and match it against Apache’s ServerName.

Without doing so, it was very difficult to use the same Apache instance as the backend for multiple reverse proxied websites. That was because Apache had no way to differentiate a request for website A from a request for website B. However, once you pass along the requested host, Apache now has a piece of information that it can use to differentiate requests by matching the host to its defined server name.
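To see why the Host header is the piece of information Apache needs, here is a hypothetical PHP sketch of the same name-based matching Apache performs against ServerName; the site names and paths are made up. Without “proxy_set_header Host $host”, Nginx sends the host from the proxy_pass URL instead (its default is $proxy_host, here localhost:8080), so every request looks the same to the backend.

```php
<?php
// Hypothetical illustration of name-based virtual hosting: choose a document
// root by matching the request's Host header against configured server names,
// roughly what Apache does with ServerName. Names and paths are made up.
function select_document_root(string $hostHeader, array $sites, string $default): string
{
    // Host names are case-insensitive and may carry a port ("www.mysite.com:80").
    $host = strtolower(preg_replace('/:\d+$/', '', trim($hostHeader)));
    return $sites[$host] ?? $default;
}

$sites = [
    'www.mysite.com'    => '/path/to/webroot',
    'www.othersite.com' => '/path/to/other/webroot',
];

// Host forwarded by "proxy_set_header Host $host": the right site is found.
echo select_document_root('www.othersite.com', $sites, '/path/to/default'), "\n";
// Host left at the proxy_pass address: every request falls through to the default.
echo select_document_root('localhost:8080', $sites, '/path/to/default'), "\n";
```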

Posted on 17th May 2009 by Sameer

Web developers use Firefox as their browser of choice for many reasons, but perhaps the most significant is the excellent extensions available to make development quicker, easier, and more effective. The two extensions that pretty much every developer already knows are Firebug and the aptly named Web Developer extension. However, an extension I use almost as often as those two is Tamper Data.

In its most basic form, Tamper Data lets you view the headers of every request and response your browser handles. With that, you can examine the POST requests your browser sends to a server.

But, as the name suggests, Tamper Data lets you do more than just examine the data being passed. It allows you to trap a request and alter its headers and POST data before it is sent. Why might that be useful? Here are two of many possible use cases.

Use Cases

  • Form Tampering - Imagine you have a nifty registration form with fancy JavaScript that stops users from submitting invalid information and tells them what is wrong. You still need to make sure your backend validates the data without relying on that JavaScript. One way to test this is with Tamper Data.

    In your browser, begin by filling out the form correctly, but before you hit submit, open Tamper Data and press “Start Tamper”. Then return to your browser and submit the form. Tamper Data will pop up asking whether you would like to tamper with the request being sent. Select Tamper, modify the POST values to be invalid, and click OK. Tamper Data will submit the modified version of the form with the invalid data. You can then return to your browser window and verify that your backend handled the submitted data as intended.

  • Investigating Session Problems - Sessions are identified via cookies. A server sends a cookie to the user in its initial response, and the user sends that cookie back to the site on each subsequent request, allowing the site to recognize future requests from that same user. This is what lets a developer keep a user “logged in” between requests.

    Several times I’ve had issues where sessions did not seem to persist. The best first step in identifying the issue is to determine if cookies are being handled properly. Is the server sending a cookie with the proper domain and settings to the user? Is the user sending that cookie in subsequent requests? That’s where Tamper Data comes in. Use it to verify the cookie data being sent in the headers.
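For the form tampering case, the server side has to re-check every rule the JavaScript enforces, since a tampered request never runs that JavaScript. This is a hypothetical sketch: the field names (username, email, password) and the rules are assumptions rather than any particular form.

```php
<?php
// Hypothetical server-side validation for a registration form. The field
// names and rules are illustrative assumptions.
function post_string(array $post, string $name): string
{
    // A tampered request can omit a field or send an array (name[]=x).
    return isset($post[$name]) && is_string($post[$name]) ? $post[$name] : '';
}

function validate_registration(array $post): array
{
    $errors = [];

    // The D modifier stops "$" from accepting a trailing newline.
    if (!preg_match('/^[A-Za-z0-9_]{3,20}$/D', post_string($post, 'username'))) {
        $errors['username'] = 'Username must be 3-20 letters, digits or underscores.';
    }
    if (filter_var(post_string($post, 'email'), FILTER_VALIDATE_EMAIL) === false) {
        $errors['email'] = 'Please enter a valid email address.';
    }
    if (strlen(post_string($post, 'password')) < 8) {
        $errors['password'] = 'Password must be at least 8 characters.';
    }

    return $errors; // an empty array means the submission passed
}
```

In the form handler you would call validate_registration($_POST) and re-display the form with the returned messages whenever the array is not empty; Tamper Data is then a quick way to confirm that each rule really rejects bad input.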

Posted on 16th May 2009 by Sameer

By far the most important change in PHP 5 was its much improved support for object-oriented programming. It’s hard to imagine, but not so long ago PHP lacked basic object-oriented necessities such as __construct() constructors, destructors, interfaces, abstract classes, public and private visibility, most magic methods, and so on.

Yet one of the most frustrating problems with PHP’s implementation of object-oriented principles is the “self” keyword, which is intended to reference methods and variables within a class in a static context.
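The rest of this entry did not survive here. As an assumption about the problem it describes, the classic complaint is that self is resolved to the class where the method is written, not the class it was called on, so an inherited static factory can never build the subclass. PHP 5.3 added late static binding (the static keyword) to address this. The class names below are hypothetical:

```php
<?php
class Model
{
    public static function create()
    {
        // self is resolved to Model, the class this method is written in,
        // no matter which subclass the method is called through.
        return new self();
    }

    public static function createLate()
    {
        // static (late static binding, PHP 5.3+) is resolved to the class
        // the method was actually called on.
        return new static();
    }
}

class User extends Model
{
}

echo get_class(User::create()), "\n";     // prints "Model"
echo get_class(User::createLate()), "\n"; // prints "User"
```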

Posted on 15th May 2009 by Sameer

For a long time I worked under the assumption that PHP is not capable of running asynchronous tasks because it is single-threaded. However, you can work around the limitations of a single thread by forking a process with pcntl_fork. The new child process receives an exact copy of the parent process up to that point and continues execution at the same point in the script where it was forked. The one difference is the return value of pcntl_fork: the child gets 0 while the parent gets the child’s process ID, so each process knows which one it is.

No longer will you need to write strictly linear code in PHP. For example, if you wanted to write a script that collects analytical data from your database and creates 10 different reports, a typical PHP script would create one report at a time. Imagine that the first reports are time-consuming to generate while, by chance, some of the later reports take only a few seconds. You would still have to wait for the slow reports to finish before you could view the quick ones, solely because the quick ones happened to be at the end of the queue.

However, using pcntl_fork, you could fork your PHP process and generate all 10 reports in parallel. Within a few seconds, you would have access to the quick reports while the more time-consuming ones are still being generated.

The basics of how pcntl_fork works are illustrated by the following code, taken from the PHP manual:

$pid = pcntl_fork();
if ($pid == -1) {
     die('could not fork');
} else if ($pid) {
     // we are the parent
     pcntl_wait($status); //Protect against Zombie children
} else {
     // we are the child
}
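Applied to the report example above, the pattern becomes a loop that forks one child per report and has the parent wait for all of them. This is a sketch under assumptions: generate_report() is a hypothetical placeholder for a slow report, and the code falls back to sequential generation when pcntl is not available (it only exists in the CLI on POSIX systems).

```php
<?php
// Hypothetical stand-in for an expensive report; the file name and content
// are illustrative assumptions.
function generate_report(string $name, string $dir): void
{
    file_put_contents("$dir/$name.txt", "report $name\n");
}

// Fork one child per report so a slow report cannot hold up the fast ones.
function generate_reports(array $names, string $dir): void
{
    if (!function_exists('pcntl_fork')) {
        foreach ($names as $name) {
            generate_report($name, $dir); // no pcntl: one report at a time
        }
        return;
    }

    $children = [];
    foreach ($names as $name) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            generate_report($name, $dir); // fork failed: do it in the parent
        } elseif ($pid === 0) {
            generate_report($name, $dir); // child: one report, then stop
            exit(0);
        } else {
            $children[] = $pid;           // parent: keep forking
        }
    }

    // Wait for every child, so none is left behind as a zombie.
    foreach ($children as $pid) {
        pcntl_waitpid($pid, $status);
    }
}

$dir = sys_get_temp_dir() . '/reports_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
generate_reports(['daily', 'weekly', 'monthly'], $dir);
```

Each child must exit() after its report; otherwise it would carry on through the loop and fork children of its own. Also keep in mind that a forked child inherits the parent’s open database connections, so each child should open its own.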

Take a look at this great introduction to pcntl_fork by Frans-Jan van Steenbeek who explains the concept better than I could.

Posted on 14th May 2009 by Sameer

In previous posts I’ve introduced Tokyo Cabinet and shown how to install both Tokyo Cabinet and Tokyo Tyrant. In this post, I will show how simple it is to interact with Tokyo Cabinet using PHP.

To summarize, Tokyo Cabinet is an exceptionally fast key/value store. Tokyo Tyrant sits on top of Tokyo Cabinet, exposing the underlying data over the memcache and HTTP network protocols.

First, create your database (we will use a hash database) and start the Tyrant server on 127.0.0.1:1978, Tokyo Tyrant’s default port.

tchmgr create db.tch
ttserver -dmn -host 127.0.0.1 -port 1978 db.tch

And because Tokyo Tyrant implements the memcache protocol, we can use PHP’s existing memcache support to interact with our database:

<?php

// connect to Tokyo Tyrant via the memcache protocol
$memcache_obj = new Memcache;
$memcache_obj->connect('127.0.0.1', 1978);

// set value
$memcache_obj->set('key', 'Sameer Parwani');

// get value
$get = $memcache_obj->get('key');

// it works!
echo "My name is $get";

The above outputs

My name is Sameer Parwani

Keep in mind that Tokyo Tyrant will not support memcache’s automatic expiration option.
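If you need values to expire, one hypothetical workaround (not a Tokyo Tyrant feature) is to store the expiry time alongside the value and check it on every read. The JSON envelope and the function names below are assumptions:

```php
<?php
// Hypothetical workaround: Tokyo Tyrant ignores memcache expiration times,
// so keep the expiry timestamp inside the stored value and check it on read.
function wrap_with_expiry($value, int $ttlSeconds, int $now): string
{
    return json_encode(['value' => $value, 'expires' => $now + $ttlSeconds], JSON_THROW_ON_ERROR);
}

// Return the stored value, or null if it is missing, malformed, or expired.
function unwrap_with_expiry($stored, int $now)
{
    if (!is_string($stored)) {
        return null; // Memcache::get() returns false when the key is missing
    }
    $data = json_decode($stored, true);
    if (!is_array($data) || !isset($data['expires']) || $now >= $data['expires']) {
        return null;
    }
    return $data['value'] ?? null;
}
```

With the connection from above, you would store with $memcache_obj->set('key', wrap_with_expiry('Sameer Parwani', 3600, time())) and read with unwrap_with_expiry($memcache_obj->get('key'), time()). Expired entries still take up space until you delete or overwrite them.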

Posted on 14th May 2009 by Sameer

As mentioned in a recent post, Tokyo Cabinet is a high-performance key/value store. Its speed leaves MySQL and other RDBMSs in the dust because it does away with their overhead and stores data in highly optimized structures such as hash tables. See that post for more detail. If you are looking to get more performance out of your system, Tokyo Cabinet is worth trying out.

The process for installing Tokyo Cabinet is very simple. Here is what I did on CentOS 5, starting with a few required libraries:

# Tokyo Cabinet requires the zlib and bzip2 libraries (including headers)
yum install zlib zlib-devel bzip2 bzip2-devel

I then downloaded and installed Tokyo Cabinet itself:

wget http://voxel.dl.sourceforge.net/sourceforge/tokyocabinet/tokyocabinet-1.4.20.tar.gz
tar zxf tokyocabinet-1.4.20.tar.gz
cd tokyocabinet-1.4.20
./configure
make
make install

On top of Tokyo Cabinet lies Tokyo Tyrant:

wget http://voxel.dl.sourceforge.net/sourceforge/tokyocabinet/tokyotyrant-1.1.26.tar.gz
tar zxf tokyotyrant-1.1.26.tar.gz
cd tokyotyrant-1.1.26
./configure
make
make install

Tokyo Cabinet supports four types of databases: hash, B+ tree, fixed-length, and table. Each type has its own command-line utility for creating, updating, and reading a database: tchmgr, tcbmgr, tcfmgr, and tctmgr respectively.

The following shows how to create a hash database and manipulate it from the command line:

[root]# tchmgr create db.tch
[root]# tchmgr put db.tch key1 value1
[root]# tchmgr put db.tch key2 value2
[root]# tchmgr list db.tch
key1
key2
[root]# tchmgr get db.tch key1
value1

Keep in mind that Tokyo Tyrant (like Tokyo Cabinet’s abstract database API) picks the database type from the file extension, so the extension must match the type of the database:

  • .tch - Hash
  • .tcb - B+ tree
  • .tcf - Fixed-length
  • .tct - Table

In my next post, I will show how to connect to your Tokyo Cabinet database using PHP through Tokyo Tyrant.

Posted on 14th May 2009 by Sameer

Managing files over FTP with PHP can be a pain. One of the problems is dealing with long paths that may or may not exist yet. Fortunately, a couple of helper functions can ease the pain.

For example, with PHP’s built-in ftp_mkdir function, to create the directory “/hello/kitty” you must first create “/hello” and then “/hello/kitty”. Wouldn’t it be easier to have a helper function that takes the full path of the final directory you want (“/hello/kitty”) and takes care of creating each directory along the way?

Here is the code to do it:

	// recursive make directory function for ftp
	function make_directory($ftp_stream, $dir)
	{
		// if the directory already exists or can be created right away, we are done
		// (ftp_mkdir returns the new directory's name on success and false on failure)
		if (ftp_is_dir($ftp_stream, $dir) || @ftp_mkdir($ftp_stream, $dir) !== false) return true;
		// stop at the root: dirname("/") is "/" and dirname(".") is ".", which would recurse forever
		$parent = dirname($dir);
		if ($parent === $dir) return false;
		// otherwise recursively make the parent directory first
		if (!make_directory($ftp_stream, $parent)) return false;
		// final step to create the directory
		return ftp_mkdir($ftp_stream, $dir) !== false;
	}

	function ftp_is_dir($ftp_stream, $dir)
	{
		// get current directory
		$original_directory = ftp_pwd($ftp_stream);
		// test if you can change directory to $dir
		// suppress errors in case $dir is a file or does not exist
		if (@ftp_chdir($ftp_stream, $dir)) {
			// it is a directory, so change back to the original directory
			ftp_chdir($ftp_stream, $original_directory);
			return true;
		} else {
			return false;
		}
	}
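For comparison, here is a hypothetical iterative version of the same idea: compute every ancestor path up front and try to create each one in order. path_prefixes() and make_directory_iterative() are illustrative names; only the ftp_* calls are PHP’s own.

```php
<?php
// Every ancestor of $dir, shortest first, e.g. "/hello/kitty" gives
// ["/hello", "/hello/kitty"]. Empty pieces from doubled or trailing
// slashes are skipped.
function path_prefixes(string $dir): array
{
    $prefix = (substr($dir, 0, 1) === '/') ? '/' : '';
    $prefixes = [];
    foreach (explode('/', $dir) as $part) {
        if ($part === '') {
            continue;
        }
        $prefix .= ($prefix === '' || $prefix === '/') ? $part : '/' . $part;
        $prefixes[] = $prefix;
    }
    return $prefixes;
}

// Try to create each ancestor in order, ignoring failures (most will already
// exist), then confirm the final directory exists by changing into it and back.
function make_directory_iterative($ftp_stream, string $dir): bool
{
    foreach (path_prefixes($dir) as $path) {
        @ftp_mkdir($ftp_stream, $path);
    }
    $original = ftp_pwd($ftp_stream);
    if (!@ftp_chdir($ftp_stream, $dir)) {
        return false;
    }
    ftp_chdir($ftp_stream, $original);
    return true;
}
```

The trade-off is that this version always sends one MKD command per path component, while the recursive make_directory() only walks up as far as it needs to.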
Posted on 4th May 2009 by Sameer

I recently had a conversation with Lalit Sarna of Oxylabs about scalability, and he introduced me to Tokyo Cabinet, a key/value store (database). This category of databases, referred to as DBM, differs from an RDBMS (such as MySQL) in that there are no tables and therefore no concept of rows. Instead, you solely provide keys and get/set/delete the value for each key.

The advantage is lightning speed, and Tokyo Cabinet is apparently the king of the category: we are talking speeds 10-50x greater than MySQL. Tokyo Cabinet supports multiple underlying database engines, each with its own advantages.

  • Hash - Hash tables provide O(1) average-case insert and lookup, which cannot be beaten, so they are your fastest option
  • B+ Tree - The underlying data is sorted, allowing for prefix and range matching. Speed is not quite as great as Hash, since B+ trees are O(log_b n) for insert and lookup (b being the tree's branching factor)
  • Fixed Length - Your values are stored in one large array, which is as fast as it gets since lookup is O(1) and the data is contiguous. However, your keys have to be natural numbers.
  • Table - Attempts to replicate a traditional table database, but no fixed data schema or data types are required. Built on top of the hash database for speed.
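To make the Hash vs. B+ Tree difference concrete: a hash table can only answer “what is the value for exactly this key?”, while sorted keys also answer prefix and range queries. The sketch below uses a sorted PHP array with binary search as a simplified stand-in for a B+ tree; it is not how Tokyo Cabinet is implemented, and the key names are made up.

```php
<?php
// Return every key in $sortedKeys (sorted with sort($keys, SORT_STRING))
// that starts with $prefix.
function prefix_match(array $sortedKeys, string $prefix): array
{
    // Binary search for the first key >= $prefix: O(log n) comparisons.
    $lo = 0;
    $hi = count($sortedKeys);
    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if (strcmp($sortedKeys[$mid], $prefix) < 0) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }

    // Keys sharing the prefix sit next to each other, so scan forward.
    $matches = [];
    $n = count($sortedKeys);
    $len = strlen($prefix);
    for ($i = $lo; $i < $n && strncmp($sortedKeys[$i], $prefix, $len) === 0; $i++) {
        $matches[] = $sortedKeys[$i];
    }
    return $matches;
}

$keys = ['user:2', 'post:7', 'user:10', 'post:1', 'user:1'];
sort($keys, SORT_STRING);
print_r(prefix_match($keys, 'user:')); // user:1, user:10, user:2 (string order)
```

A hash table would have to inspect every key to answer the same question, which is why the B+ tree engine is the one to pick when you need prefix or range lookups.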

Tokyo Tyrant

Tokyo Tyrant is the network interface that sits on top of Tokyo Cabinet, allowing your software to communicate with Tokyo Cabinet. Tokyo Cabinet is often referred to as the “database library” while Tokyo Tyrant is the “database server”. Tokyo Tyrant supports the memcached and HTTP protocols.

I love that it supports memcached. That means you can plug and play with Tokyo Cabinet using the many existing memcached clients/libraries. Tokyo Cabinet is not meant to be a replacement for memcached, but you could theoretically use it as such and with minimal setup time. You would get the benefits of persistent data and much cheaper storage with some loss in performance.

In conclusion, targeted and appropriate use of Tokyo Cabinet would allow load to be removed from your traditional RDBMS in cases where the full functionality of a RDBMS is not needed, resulting in major performance improvements.