Forking PHP Processes with pcntl_fork

For a long time I worked under the assumption that PHP is not capable of running asynchronous tasks because it is single-threaded. However, you can work around the limitations of a single thread by forking a process with pcntl_fork. The new child process receives an exact copy of the parent process up to that point and continues execution from the same point in the script where it was forked, with one exception: pcntl_fork returns 0 in the child and the child's process ID in the parent, so each copy can tell which one it is.

No longer will you need to create logically linear code in PHP. For example, if you wanted to write a script to collect analytical data from your database and create 10 different reports, a typical PHP script would create one report at a time. Imagine that the first reports are time-consuming to generate while some of the later ones take only a few seconds. You would have to wait for the time-consuming reports to finish before being able to view the smaller ones, solely because the smaller reports happened to be at the end of the queue.

However, using pcntl_fork, you could fork your PHP code and begin generating all 10 reports concurrently. Within a few seconds, you would have access to the quickly generated reports while the more time-consuming reports would still be generating.

The basics of how pcntl_fork works are illustrated with the following code taken from the PHP manual:

[code lang="php"]
$pid = pcntl_fork();
if ($pid == -1) {
    die('could not fork');
} else if ($pid) {
    // we are the parent
    pcntl_wait($status); // Protect against zombie children
} else {
    // we are the child
}
[/code]
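
The report scenario described earlier could be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the function name run_reports, the report names, and the sleep() calls standing in for real report generation are all assumptions made for illustration. It needs the pcntl extension and is meant to be run from the command line.

[code lang="php"]
<?php
// Fork one child per report, then wait for all of them.
// $reports maps a (hypothetical) report name to the seconds it takes to build.
// Returns the number of children that exited cleanly.
function run_reports(array $reports)
{
    $children = array();
    foreach ($reports as $name => $seconds) {
        $pid = pcntl_fork();
        if ($pid == -1) {
            die('could not fork');
        } else if ($pid) {
            // we are the parent: remember the child and keep forking
            $children[$pid] = $name;
        } else {
            // we are the child: build one report, then exit so that
            // the child does not continue the parent's loop
            sleep($seconds); // stand-in for the real report generation
            echo "$name finished\n";
            exit(0);
        }
    }

    // Reap every child so that none is left behind as a zombie
    $succeeded = 0;
    while (count($children) > 0) {
        $pid = pcntl_wait($status);
        if ($pid == -1) {
            break; // no children left to wait for
        }
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0) {
            $succeeded++;
        }
        unset($children[$pid]);
    }
    return $succeeded;
}

run_reports(array('slow report' => 1, 'quick report' => 0));
[/code]

Because every report runs in its own process, the quick report should be available without waiting for the slow one. The exit(0) in the child matters: without it, each child would carry on with the loop and fork children of its own.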

Take a look at this great introduction to pcntl_fork by Frans-Jan van Steenbeek who explains the concept better than I could.

Tokyo Tyrant with PHP

In previous posts I’ve introduced Tokyo Cabinet and shown how to install both Tokyo Cabinet and Tokyo Tyrant. In this post, I will show how simple it is to interact with Tokyo Cabinet using PHP.

To summarize, Tokyo Cabinet is an exceptionally fast key/value store. Tokyo Tyrant sits on top of Tokyo Cabinet, exposing the underlying data over the memcached and HTTP network protocols.

First, create your database (we will use a hash database) and then start the Tyrant server on 127.0.0.1:1978 (1978 is Tokyo Tyrant's default port).

[code]
tchmgr create db.tch
ttserver -dmn -host 127.0.0.1 -port 1978 db.tch
[/code]

And because Tokyo Tyrant implements the memcached protocol, we can use PHP's existing Memcache support to interact with our database:

[code lang="php"]
// connect to tokyo tyrant via memcache protocol
$memcache_obj = new Memcache;
$memcache_obj->connect('127.0.0.1', 1978);

// set value
$memcache_obj->set('key', 'Sameer Parwani');

// get
$get = $memcache_obj->get('key');

// it works!
echo "My name is $get";
[/code]

The above outputs

[code]
My name is Sameer Parwani
[/code]

Keep in mind that Tokyo Tyrant will not support memcache’s automatic expiration option.
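
If you still need expiring entries, one possible workaround is to store the expiry time alongside the value and check it yourself on read. The following is a minimal sketch under that assumption; the helper names wrap_with_expiry and unwrap_if_fresh are hypothetical, and JSON is just one possible encoding.

[code lang="php"]
<?php
// Pack a value together with the Unix time at which it should expire.
function wrap_with_expiry($value, $ttl_seconds, $now)
{
    return json_encode(array('expires' => $now + $ttl_seconds, 'value' => $value));
}

// Return the stored value, or false if the entry is missing, malformed or expired.
function unwrap_if_fresh($stored, $now)
{
    if (!is_string($stored)) {
        return false; // Memcache::get returns false on a miss
    }
    $entry = json_decode($stored, true);
    if (!is_array($entry) || !isset($entry['expires']) || $entry['expires'] <= $now) {
        return false;
    }
    return $entry['value'];
}

// Possible usage with the $memcache_obj connection from the example above:
// $memcache_obj->set('key', wrap_with_expiry('Sameer Parwani', 60, time()));
// $name = unwrap_if_fresh($memcache_obj->get('key'), time());
[/code]

Expired entries are only ignored, not deleted, so a real setup would also need some kind of periodic cleanup.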

Installing Tokyo Cabinet and Tokyo Tyrant

As mentioned in a recent post, Tokyo Cabinet is a high-performance key/value store. Its speed leaves MySQL and other RDBMSs in the dust because it replaces their overhead with highly optimized data structures such as hash tables. Please check the link above for more detail. If you are looking to get more performance out of your system, Tokyo Cabinet is worth trying out.

The process for installing Tokyo Cabinet is very simple. Here is what I did on CentOS 5, starting with a few required libraries:

[code]
# Tokyo Cabinet requires gzip and bzip2
yum install gzip bzip2 bzip2-devel
[/code]

I then proceeded to download and install the underlying Tokyo Cabinet:

[code]
wget http://voxel.dl.sourceforge.net/sourceforge/tokyocabinet/tokyocabinet-1.4.20.tar.gz
tar zxf tokyocabinet-1.4.20.tar.gz
cd tokyocabinet-1.4.20
./configure
make
make install
[/code]

On top of Tokyo Cabinet lies Tokyo Tyrant:

[code]
wget http://voxel.dl.sourceforge.net/sourceforge/tokyocabinet/tokyotyrant-1.1.26.tar.gz
tar zxf tokyotyrant-1.1.26.tar.gz
cd tokyotyrant-1.1.26
./configure
make
make install
[/code]

Tokyo Cabinet supports four types of databases: hash, B+ tree, fixed-length, and table. Each type has its own management command for creating, updating, and reading a database: tchmgr, tcbmgr, tcfmgr, and tctmgr respectively.

The following shows how to create a hash database and manipulate it from the command line:

[code]
[root]# tchmgr create db.tch
[root]# tchmgr put db.tch key1 value1
[root]# tchmgr put db.tch key2 value2
[root]# tchmgr list db.tch
key1
key2
[root]# tchmgr get db.tch key1
value1
[/code]

Keep in mind Tokyo Cabinet’s database file extensions must match the type of the database.

  • .tch – Hash
  • .tcb – B+ tree
  • .tcf – Fixed-length
  • .tct – Table
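
If you script database creation from PHP, the pairing of extension and management command listed above can be captured in a small lookup. This is just an illustrative sketch; the function name tc_manager_for is hypothetical.

[code lang="php"]
<?php
// Return the management command that matches a database file's extension,
// or false if the extension is not one of the four listed above.
function tc_manager_for($filename)
{
    $map = array(
        'tch' => 'tchmgr', // hash
        'tcb' => 'tcbmgr', // B+ tree
        'tcf' => 'tcfmgr', // fixed-length
        'tct' => 'tctmgr', // table
    );
    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return isset($map[$ext]) ? $map[$ext] : false;
}

echo tc_manager_for('db.tch'), "\n"; // tchmgr
[/code]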

In my next post, I will show how to connect to your Tokyo Cabinet database using PHP through Tokyo Tyrant.

Recursive FTP Make Directory (mkdir)

Managing files over FTP using PHP can be a pain. One of the problems is dealing with long paths that may or may not exist. Fortunately, you can create functions to ease the problems.

For example, using PHP’s built-in ftp_mkdir function, to create the directory “/hello/kitty” you must first create the directory “/hello” and then “/hello/kitty”. Wouldn’t it be easier to have a helper function that could be passed the full path of the final directory that you want created (“/hello/kitty”) and would take care of creating each directory in the path by itself?

Here is the code to do it:

[code lang="php"]
// recursive make directory function for ftp
function make_directory($ftp_stream, $dir)
{
    // if directory already exists or can be immediately created return true
    if (ftp_is_dir($ftp_stream, $dir) || @ftp_mkdir($ftp_stream, $dir)) return true;
    // give up when there is no parent left to try (dirname('/') is '/')
    if (dirname($dir) == $dir) return false;
    // otherwise recursively try to make the parent directory
    if (!make_directory($ftp_stream, dirname($dir))) return false;
    // final step to create the directory (ftp_mkdir returns the new name or false)
    return ftp_mkdir($ftp_stream, $dir) !== false;
}

function ftp_is_dir($ftp_stream, $dir)
{
    // get current directory
    $original_directory = ftp_pwd($ftp_stream);
    // test if you can change directory to $dir
    // suppress errors in case $dir is a file or does not exist
    if (@ftp_chdir($ftp_stream, $dir)) {
        // If it is a directory, then change the directory back to the original directory
        ftp_chdir($ftp_stream, $original_directory);
        return true;
    } else {
        return false;
    }
}
[/code]
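
The same idea can also be written without recursion by first listing every directory in the path, shallowest first, and then creating whichever ones are missing. The following is a minimal sketch of that alternative; ftp_path_prefixes is a hypothetical helper name, not a built-in function.

[code lang="php"]
<?php
// List every directory that must exist for $dir to exist, shallowest first.
function ftp_path_prefixes($dir)
{
    // keep the leading slash for absolute paths
    $prefix = (substr($dir, 0, 1) === '/') ? '/' : '';
    $paths = array();
    foreach (explode('/', $dir) as $part) {
        if ($part === '') {
            continue; // skip empty segments from leading, trailing or doubled slashes
        }
        $prefix .= (($prefix === '' || $prefix === '/') ? '' : '/') . $part;
        $paths[] = $prefix;
    }
    return $paths;
}

print_r(ftp_path_prefixes('/hello/kitty')); // lists "/hello", then "/hello/kitty"

// Possible usage together with ftp_is_dir from above:
// foreach (ftp_path_prefixes($dir) as $path) {
//     if (!ftp_is_dir($ftp_stream, $path)) ftp_mkdir($ftp_stream, $path);
// }
[/code]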

Tokyo Cabinet and Tokyo Tyrant

I recently had a conversation with Lalit Sarna of Oxylabs about scalability and he introduced me to Tokyo Cabinet, a key/value store (database). This category of databases, referred to as DBM, differs from an RDBMS (such as MySQL) in that there are no tables and therefore no concept of rows. Instead you solely provide keys and get/set/delete values for that particular key.

Your advantage is lightning speed, and apparently Tokyo Cabinet is the king of the category. We are talking speeds 10-50x greater than MySQL. Tokyo Cabinet supports multiple underlying database engines, each with its own advantages.

  • Hash – Hash tables provide O(1) insert and lookup, which cannot be beaten, so they are your fastest option.
  • B+ Tree – The underlying data is sorted, allowing for prefix and range matching. Speed is not quite as great as Hash since B-trees are O(log n) for insert and lookup. See here for more details.
  • Fixed Length – Your values are stored in one large array, which is as fast as it gets since it’s O(1) and the data is contiguous. However, your keys have to be natural numbers.
  • Table – Attempts to replicate a traditional table database, however no fixed data schema or data types are required. Built on top of the hash db for speed.
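
To see why sorted keys make prefix matching possible, here is a plain PHP sketch that has nothing to do with Tokyo Cabinet's actual implementation. A hash can only answer "is this exact key present?", whereas a sorted list of keys lets you binary-search to the first candidate and then read the matches off in order. The function name prefix_match and the sample keys are made up for illustration.

[code lang="php"]
<?php
// Return all keys in a sorted list that start with $prefix.
function prefix_match(array $sorted_keys, $prefix)
{
    // Binary search for the first key >= $prefix: O(log n)
    $lo = 0;
    $hi = count($sorted_keys);
    while ($lo < $hi) {
        $mid = (int) (($lo + $hi) / 2);
        if (strcmp($sorted_keys[$mid], $prefix) < 0) {
            $lo = $mid + 1;
        } else {
            $hi = $mid;
        }
    }
    // Matching keys sit next to each other, so scan until the prefix stops matching
    $matches = array();
    $len = strlen($prefix);
    for ($i = $lo; $i < count($sorted_keys) && strncmp($sorted_keys[$i], $prefix, $len) === 0; $i++) {
        $matches[] = $sorted_keys[$i];
    }
    return $matches;
}

$keys = array('user:1', 'post:2', 'user:2', 'post:1', 'tag:php');
sort($keys, SORT_STRING); // a B+ tree keeps its keys in sorted order for you
print_r(prefix_match($keys, 'user:')); // lists "user:1" and "user:2"
[/code]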

Tokyo Tyrant

Tokyo Tyrant is the network interface that sits on top of Tokyo Cabinet, allowing your software to communicate with Tokyo Cabinet. Tokyo Cabinet is often referred to as the “database library” while Tokyo Tyrant is the “database server”. Tokyo Tyrant supports the memcached and HTTP protocols.

I love that it supports memcached. That means you can plug and play with Tokyo Cabinet using the many existing memcached clients/libraries. Tokyo Cabinet is not meant to be a replacement for memcached, but you could theoretically use it as such and with minimal setup time. You would get the benefits of persistent data and much cheaper storage with some loss in performance.

In conclusion, targeted and appropriate use of Tokyo Cabinet would allow load to be removed from your traditional RDBMS in cases where the full functionality of an RDBMS is not needed, resulting in major performance improvements.

Using Salts for Extra Security

Typically passwords are saved in databases using a one-way hash function such as md5. In other words, if my password is “hello”, the database stores my password as “5d41402abc4b2a76b9719d911017c592”. Each time a user attempts to log in, the md5 algorithm is applied to the provided password, and if the result matches the hash stored in the database then access is granted to the user, as in the following:

[code lang="php"]
if (md5($_POST['pwd']) === $saved_hash) {
    // user is logged in
} else {
    // user password was incorrect
}
[/code]

Saving this hashed password is more secure than saving plain text passwords because if a database is temporarily compromised, at least the attacker will not have direct access to users’ passwords. However, despite not being able to reverse the hash (remember this is a one-way function), the intruder might still be able to crack many of your users’ passwords through precomputation.

An attacker could go through the dictionary (or any set of possible passwords) precomputing the md5 hashes. So, if the attacker were to see that my hash was “5d41402abc4b2a76b9719d911017c592”, he would just look it up in his reverse database and see that this hash maps to “hello”. There are many such reverse lookup databases on the web. This one successfully cracks the mentioned password.

Adding Salts

The use of salts greatly decreases the effectiveness of a precomputation attack. A salt is a random string appended to the password before hashing. Typically each user would receive a unique salt.

[code lang="php"]
$saved_hash = md5($pwd . $salt);
[/code]
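
As a rough sketch of how a unique salt per user might be generated and checked: the function names below are hypothetical, random_bytes requires PHP 7 or later, and md5 is kept only to match the examples in this post.

[code lang="php"]
<?php
// Create a random salt and the salted hash for a new password.
function create_salted_hash($pwd)
{
    $salt = bin2hex(random_bytes(16)); // 16 random bytes as 32 hex characters
    return array('salt' => $salt, 'hash' => md5($pwd . $salt));
}

// Check a login attempt against the stored salt and hash.
function check_password($pwd, $salt, $saved_hash)
{
    // hash_equals compares the two strings in constant time
    return hash_equals($saved_hash, md5($pwd . $salt));
}

$alice = create_salted_hash('hello');
$bob   = create_salted_hash('hello');

// Same password, different salts: the stored hashes no longer match each other,
// and neither matches the unsalted md5 of "hello" from a lookup table.
var_dump($alice['hash'] === $bob['hash']);
var_dump(check_password('hello', $alice['salt'], $alice['hash']));
[/code]

Recent PHP versions also ship password_hash() and password_verify(), which generate a random salt for every password, store it inside the resulting hash string, and use a deliberately slow algorithm in place of md5.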

Let’s examine the implications when the salt is public (stored in the compromised database) as opposed to when the salt is private:

  • Public Salt – The attacker’s reverse lookup table (commonly known as a rainbow table) will no longer be useful. He will need to generate a new table for each specific user’s salt. While this is still very possible, the attacker will need to repeat the operation for every user, which makes cracking a large number of passwords very challenging.
  • Private Salt – For the attacker to actually compromise a password, he would need to compute the md5 of each possible password appended to each possible salt. If your salts were 32 bits long, there would be 2^32 (about 4.3 billion) possible salts, so covering an English dictionary of roughly 200,000 words would take 800 trillion hashes or so. This would be practically impossible.

Therefore, public salts are better than no salts, but private salts are much better than public salts. So, how does one keep their salt private? You can’t store it in your database because all this assumes your database was compromised. My suggestion is to create a salt based on the md5 of immutable data related to that user (and be very careful to not delete/modify that piece of data). For example, the user’s registration timestamp could be used. As long as your attacker was unable to also steal your application code the salt would be safe. This works out as the following:

[code lang="php"]
$salt = md5($registration_timestamp);
$saved_hash = md5($password . $salt);
[/code]

Creating Builds With Phing

Phing is a great little tool built on PHP for creating project builds. It is based on Apache Ant. For those who are developing in PHP, Phing is a natural choice as both project and build tool can share the same environment.

Reasons to use Phing

  • To automate creation of daily builds of your project. The usefulness of daily builds is an essay in itself, but essentially it boils down to the ability to constantly see the effects that different contributors are having on a product before it becomes too cumbersome to turn back (easier integration). See here and here for more detail.
  • Easier deployment. If your project requires several steps to create a build, the full process can be automated in the build.xml file. Prewritten tasks exist for the most common operations such as svn checkout/update/etc, file system changes (rm, cp, mv), tarring/untarring, and so on. And if a task does not exist, it’s simple to extend Phing with your own.
  • Database Version Control – One of the largest challenges groups of programmers face is maintaining changes to the database schema. Phing would allow you to create a task or set of tasks to download schema changes from either Subversion or a database and apply those changes automatically to the developer’s database. (You could of course customize this behavior to suit your needs – for example some people would prefer for Phing to create a .sql file that is manually applied.)
  • Simplicity – Phing really just boils down to two components. You have a set of variables (aka properties). And then you have a list of instructions (the build.xml file). The properties are used to help Phing complete the list of instructions.

Example

To get an idea of how simple the XML for Phing is, take a look at the following example:

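[code lang='xml']
<project name="example" default="dist">
    <target name="dist">
        <tar destfile="build.tar.gz" compression="gzip">
            <fileset dir="build" />
        </tar>
    </target>
</project>
[/code]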

The above takes all the files within the build directory and compresses them into a build.tar.gz file. For more examples like the above check out the User’s Guide.

BarCamp Boston 4

This past weekend I attended my first BarCamp Boston. I must say it was quite good. BarCamp is a series of “unConferences” which are organized on the fly by attendees, without any formal registration fee. So, of course, the quality of the talks is not quite up to the standard of formal conferences, but you still learn a lot, and you don’t have to fly around the country to attend (usually to Silicon Valley) or pay $1000+.

Some of my favorite sessions from the weekend included:

  • iPhone – Development, Marketing, Best Practices, & App Store Ideas
  • Twitter for Business
  • Web App Design for Developers

BarCamp Boston is only once a year, but there are some other similar quality groups/events you can participate in throughout the year in Boston.

Secure Communication Over An API With Request Signatures

It’s a very common task for a web application to uniquely identify a visitor by a combination of username and password. However, not as trivial is identifying a third party attempting to use an API to access your web service on behalf of end users of their third party service. You often don’t want to force the end user to create a relationship with your service (such as would be required with OpenID) but instead allow the third party to use your API transparently (such as with Amazon). So, the task at hand is how to uniquely identify the third party making use of your API while preventing forgery and without requiring any sort of login system.

The solution starts with providing each third party service with a unique public key. The public key is used to determine which third party service the request claims to be from. Each public key has an associated private key, which is a secret shared only between you and that third party. The private key is used to compute a signature of the request message (a keyed hash, or HMAC, rather than reversible encryption). The API user then sends that signature along with the request. If the signature sent by the third party service matches the expected signature, then it’s safe to allow the request.

This method works because only you (the owner of the API) and the third party service have access to the private key. The third party signs its message using the private key and then sends the signature along WITH the original message. The API owner then takes the message and signs it with the private key (which it looks up based on the public key provided in the request). If the signature generated by the API owner and the signature sent in the request match, it can be trusted that the request came from the owner of the public key.

Here is some PHP code for the third party side of things. Basically the message is the URL with an action of “friends.get”. The message is signed, and the signature is then appended to the URL along with the public key. A request is then made to that URL. The API owner will then process the request by verifying the identity of the requester (as mentioned above) and send back an appropriate response.

[php]
// your assigned public key which will be included in the api request
$public_key = "abcdefghijklmnopqrstuvxyz";
// your assigned private key which will always be hidden
$private_key = "zyxvutsrqponmlkjihgfedcba";

// url of the api request which is essentially the message
$url = "http://www.apisite.com/api.php?action=friends.get";

// create a signature based on the api request using the private key
$signature = hash_hmac("sha512", $url, $private_key);

// the final api url with the public key and signature appended
$api_url = $url . "&public_key=" . $public_key . "&signature=" . $signature;

// fetch the url
$api_request_data = file_get_contents($api_url);
[/php]
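
On the API owner's side, the check described above might look roughly like the following. This is a sketch under assumptions rather than a complete implementation: the $api_keys array stands in for wherever you actually store each third party's keys, the function name verify_signature is hypothetical, and the message passed in must be exactly the string the third party signed (here, the URL before the public_key and signature parameters were appended).

[php]
<?php
// Hypothetical key store: public key => private key.
$api_keys = array(
    'abcdefghijklmnopqrstuvxyz' => 'zyxvutsrqponmlkjihgfedcba',
);

// Return true only if $signature is the HMAC of $message under the
// private key that belongs to $public_key.
function verify_signature(array $api_keys, $message, $public_key, $signature)
{
    if (!isset($api_keys[$public_key])) {
        return false; // unknown third party
    }
    $expected = hash_hmac('sha512', $message, $api_keys[$public_key]);
    // hash_equals compares in constant time to avoid leaking timing information
    return hash_equals($expected, (string) $signature);
}
[/php]

Because the signature depends on every character of the message, changing the action or any other parameter after signing makes the check fail.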

Rails for PHP Developers

Over the last few years Ruby on Rails has been the “hip” thing in the web development world. For various reasons, I haven’t taken more than a cursory glance at the framework or language. Primarily, it’s because I’m very proficient in PHP and I’ve had the opportunity to use the language of my choice, which ended up being PHP. But it’s always good to keep up with trends and not limit oneself to a particular language. Increasingly people want Ruby experience. I would recommend a Web Developer have expert proficiency in one web scripting language (PHP, Ruby, Python, Perl, etc) and intermediate proficiency in at least one other.

I was at the bookshop today looking at Ruby on Rails books and one particular book caught my eye: Rails for PHP Developers. Example by example, the book shows how to achieve particular goals with PHP code and then with Ruby code. For those, like me, who want to quickly understand the differences between the two languages, the book seems like it will be very useful.