Category Archives: PHP

Xdebug is slow

Xdebug is a wonderfully helpful debugger and profiler for PHP. It serves two main purposes (at least for me). First, it makes error messages much more helpful because it adds a stack trace that can even be configured to show the actual function arguments.

Second, it helps profile your code by creating an output file with a full listing of all functions called and the execution time for your script. Using a tool like KCachegrind or WinCacheGrind, the output can be visualized in an easy-to-understand format. I am an avid user of Xdebug’s profiling tool as it’s the best way to identify performance problems in your PHP applications. With all that said:

DON’T RUN XDEBUG ON PRODUCTION.

I’m sure you can imagine the profiler’s output being very large for applications with thousands of function calls. I’ve seen 100MB+ output files. Imagine if PHP had to create a 100MB file for every request. By turning on xdebug.profiler_enable_trigger, you can at least run the profiler selectively, so that only requests carrying the XDEBUG_PROFILE parameter or cookie are profiled.

Yet, even when the profiler is off, the other primary function of Xdebug (detailed stack traces on errors) causes your PHP instance to track lots of extra information, slowing down execution. In a recent real-world case where I turned Xdebug on temporarily (with the profiler off) and then shut it off, average response time went from 400ms to 700ms while it was enabled.

Late Static Binding Coming in PHP 5.3

By far the most important change in PHP 5 was its much improved support for object oriented programming. It’s hard to imagine, but it was not so long ago when PHP did not offer support for basic object oriented necessities such as constructors, destructors, interfaces, abstract classes, public, private, magic methods, and so on.

Yet one of the most frustrating problems with PHP’s implementation of object oriented principles is the “self” keyword which is intended to reference methods and variables within a class in a static context.
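To make the problem concrete, here is a minimal sketch, assuming PHP 5.3 or later. The Model and User classes are hypothetical names invented for illustration. Inside a static method, self always refers to the class where the method was written, while the static keyword added in 5.3 refers to the class the call was actually made on.

```php
<?php
// Hypothetical classes, for illustration only.
class Model
{
    public static function create()
    {
        // self:: is bound to the class where this method is defined,
        // so this always builds a Model, even when called on a subclass.
        return new self();
    }

    public static function createLate()
    {
        // static:: (PHP 5.3+) is resolved at run time to the class
        // the method was called on.
        return new static();
    }
}

class User extends Model
{
}

echo get_class(User::create()), "\n";     // Model
echo get_class(User::createLate()), "\n"; // User
```

Before 5.3 the usual workaround was to redefine the static method in every subclass, which is exactly the duplication late static binding removes.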

Forking PHP Processes with pcntl_fork

For a long time I worked under the assumption that PHP is not capable of running asynchronous tasks because it is single-threaded. However, you can work around the limitations of a single thread by forking a process with pcntl_fork. The new child process receives an exact copy of the process up until that point and continues execution at the same point in the script where it was forked… with the exception that it is aware that it is a child process.

No longer will you need to create logically linear code in PHP. For example, if you wanted to write a script to collect analytical data from your database and create 10 different reports, a typical PHP script would create one report at a time. Imagine if the initial reports were time-consuming to generate but, by chance, some of the later reports took only a few seconds. You would still have to wait for the time-consuming reports to finish before being able to view the smaller reports (solely because the smaller reports happened to be at the end of the queue).

However, using pcntl_fork, you could fork your PHP code and begin generating all 10 reports at once. Within a few seconds, you would have access to the easily generated reports while the more time-consuming reports would still be generating.

The basics of how pcntl_fork works are illustrated with the following code taken from the PHP manual:

[code lang="php"]
$pid = pcntl_fork();
if ($pid == -1) {
    die('could not fork');
} else if ($pid) {
    // we are the parent
    pcntl_wait($status); // Protect against zombie children
} else {
    // we are the child
}
[/code]
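The report scenario described earlier could be sketched roughly as follows. This is only a sketch under assumptions: it needs the pcntl extension and the command line version of PHP, and generate_report() and run_reports() are hypothetical names, with a sleep standing in for the real report work. The parent forks one child per report and then reaps the children in whatever order they finish.

```php
<?php
// Sketch only: requires the pcntl extension and the CLI SAPI.
// generate_report() is a hypothetical stand-in for real report code.
function generate_report($name, $seconds)
{
    usleep((int) ($seconds * 1000000)); // pretend to do slow work
    return 0;                           // exit code: 0 means success
}

// Fork one child per report; return a map of report name => exit code.
function run_reports(array $reports)
{
    $children = array();
    foreach ($reports as $name => $seconds) {
        $pid = pcntl_fork();
        if ($pid == -1) {
            die('could not fork');
        } elseif ($pid === 0) {
            // Child: build exactly one report, then exit so it never
            // falls through into the parent's loop.
            exit(generate_report($name, $seconds));
        }
        // Parent: remember the child and keep forking.
        $children[$pid] = $name;
    }

    $results = array();
    while (count($children) > 0) {
        // Block until any child exits; quick reports are reaped first.
        $pid = pcntl_wait($status);
        if ($pid <= 0) {
            break; // no children left, or an error occurred
        }
        if (isset($children[$pid])) {
            $results[$children[$pid]] = pcntl_wexitstatus($status);
            unset($children[$pid]);
        }
    }
    return $results;
}

$results = run_reports(array('slow' => 0.5, 'fast' => 0.1));
print_r($results);
```

Calling exit in the child matters: without it, each child would continue the foreach loop and start forking children of its own.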

Take a look at this great introduction to pcntl_fork by Frans-Jan van Steenbeek who explains the concept better than I could.

Tokyo Tyrant with PHP

In previous posts I’ve introduced Tokyo Cabinet and shown how to install both Tokyo Cabinet and Tokyo Tyrant. In this post, I will show how simple it is to interact with Tokyo Cabinet using PHP.

To summarize, Tokyo Cabinet is an exceptionally fast key/value store. Tokyo Tyrant sits on top of Tokyo Cabinet, exposing access to the underlying data via the memcache and HTTP network protocols.

First, create your database (we will use a hash database) and then start the Tyrant server on 127.0.0.1:1978, Tokyo Tyrant’s default port.

[code]
tchmgr create db.tch
ttserver -dmn -host 127.0.0.1 -port 1978 db.tch
[/code]

And because Tokyo Tyrant implements the memcache protocol, we can use PHP’s existing support for memcache to interact with our database:

[code lang="php"]

// connect to tokyo tyrant via memcache protocol
$memcache_obj = new Memcache;
$memcache_obj->connect('127.0.0.1', 1978);

// set value
$memcache_obj->set('key','Sameer Parwani');

// get
$get = $memcache_obj->get('key');

// it works!
echo "My name is $get";
[/code]

The above outputs

[code]
My name is Sameer Parwani
[/code]

Keep in mind that Tokyo Tyrant will not support memcache’s automatic expiration option.

Recursive FTP Make Directory (mkdir)

Managing files over FTP using PHP can be a pain. One of the problems is dealing with long paths that may or may not exist. Fortunately, you can write helper functions to ease the pain.

For example, using PHP’s built-in ftp_mkdir function, to create the directory “/hello/kitty” you must first create the directory “/hello” and then “/hello/kitty”. Wouldn’t it be easier to have a helper function that could be passed the full path of the final directory that you want created (“/hello/kitty”) and that would take care of creating each directory in the path by itself?

Here is the code to do it:

[code lang="php"]
// recursive make directory function for ftp
function make_directory($ftp_stream, $dir)
{
    // if directory already exists or can be immediately created return true
    // (ftp_mkdir returns the new directory name on success, false on failure)
    if (ftp_is_dir($ftp_stream, $dir) || @ftp_mkdir($ftp_stream, $dir) !== false) {
        return true;
    }
    // stop at the top of the path so a failing connection cannot recurse forever
    $parent = dirname($dir);
    if ($parent === $dir) {
        return false;
    }
    // otherwise recursively try to make the parent directory
    if (!make_directory($ftp_stream, $parent)) {
        return false;
    }
    // final step to create the directory
    return ftp_mkdir($ftp_stream, $dir) !== false;
}

function ftp_is_dir($ftp_stream, $dir)
{
    // get current directory
    $original_directory = ftp_pwd($ftp_stream);
    // test if you can change directory to $dir
    // suppress errors in case $dir does not exist or is not a directory
    if (@ftp_chdir($ftp_stream, $dir)) {
        // if it is a directory, change back to the original directory
        ftp_chdir($ftp_stream, $original_directory);
        return true;
    } else {
        return false;
    }
}
[/code]
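If recursion feels like overkill, the same idea can be expressed by first listing every directory that has to exist, parent first, and then creating them in order. The helper below is a hypothetical sketch (ftp_path_prefixes is my own name, not part of PHP’s FTP extension) and only does the path arithmetic, so it runs without an FTP server.

```php
<?php
// Hypothetical helper (not part of the FTP extension): list every
// directory that must exist for $dir, parent first.
function ftp_path_prefixes($dir)
{
    $absolute = (substr($dir, 0, 1) === '/');
    $prefixes = array();
    $current = '';
    foreach (explode('/', $dir) as $part) {
        if ($part === '') {
            continue; // skip empty segments from leading or doubled slashes
        }
        $current = ($current === '') ? $part : $current . '/' . $part;
        $prefixes[] = ($absolute ? '/' : '') . $current;
    }
    return $prefixes;
}

print_r(ftp_path_prefixes('/hello/kitty'));
```

For “/hello/kitty” this yields “/hello” followed by “/hello/kitty”. Against a real connection you would loop over the list and call ftp_mkdir for each entry that does not exist yet.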

Creating Builds With Phing

Phing is a great little tool built on PHP for creating project builds. It is based on Apache Ant. For those who are developing in PHP, Phing is a natural choice as both project and build tool can share the same environment.

Reasons to use Phing

  • To automate creation of daily builds of your project. The usefulness of daily builds is an essay in itself but essentially it boils down to the ability to constantly see the effects that different contributors are having on a product before it becomes too cumbersome to turn back (easier integration). See here and here for more detail.
  • Easier deployment. If your project requires several steps to create a build, the full process can be automated in the build.xml file. Prewritten tasks exist for the most common jobs such as svn checkout/update/etc, file system changes (rm, cp, mv), tarring/untarring, and so on. And if a task does not exist, it’s simple to extend Phing with your own.
  • Database Version Control – One of the largest challenges groups of programmers face is keeping track of changes to the database schema. Phing would allow you to create a task or set of tasks to download schema changes from either Subversion or a database and apply those changes automatically to the developer’s database. (You could of course customize this behavior to suit your needs – for example, some people would prefer for Phing to create a .sql file that is manually applied.)
  • Simplicity – Phing really just boils down to two components. You have a set of variables (aka properties). And then you have a list of instructions (the build.xml file). The properties are used to help phing complete the list of instructions.

Example

To get an idea of how simple the XML for Phing is, take a look at the following example:

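[code lang='xml']
<?xml version="1.0" encoding="UTF-8"?>
<!-- minimal sketch; the project and target names are placeholders -->
<project name="example" default="dist" basedir=".">
    <target name="dist">
        <tar destfile="build.tar.gz" compression="gzip">
            <fileset dir="build">
                <include name="**/**" />
            </fileset>
        </tar>
    </target>
</project>
[/code]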

The above takes all the files within the build directory and compresses them into a build.tar.gz file. For more examples like the above check out the User’s Guide.

Rails for PHP Developers

Over the last few years Ruby on Rails has been the “hip” thing in the web development world. For various reasons, I haven’t taken more than a cursory glance at the framework or language. Primarily, it’s because I’m very proficient in PHP and I’ve had the opportunity to use the language of my choice, which ended up being PHP. But it’s always good to keep up with trends and not limit oneself to a particular language. Increasingly, people want Ruby experience. I would recommend that a web developer have expert proficiency in one web scripting language (PHP, Ruby, Python, Perl, etc.) and intermediate proficiency in at least one other.

I was at the bookshop today looking at Ruby on Rails books and one particular book caught my eye: Rails for PHP Developers. The book shows, example by example, how to achieve particular goals with PHP code and then with Ruby code. For those, like me, who want to quickly understand the differences between the two languages, the book seems like it will be very useful.

Collect Web Analytics Via Javascript or iFrame

Usually Google Analytics or similar tools cover so many metrics that creating your own web analytics tool is redundant. However, for certain custom metrics you might want to collect your own statistics. For example, on RateDesi Hungama I want to know how many videos are played each day.

At first, I put my counter into my web app source code (in PHP) within the same function that fetched and displayed the video. However, the number of videos played seemed outrageously high compared to the number of pageviews reported by Google Analytics. Looking into my server logs, I realized over 50% of the played videos at the time were due to bots or spiders and that Google Analytics must have been excluding those visitors. So what was Google Analytics doing that I wasn’t?

Most bots download the HTML source of the page they visit. A bot may or may not follow the hyperlinks on the page. However, bots generally will not execute the JavaScript or load the iframes that are included in the page. So, Google Analytics was only counting a visitor when its JavaScript was executed, which more accurately reflected the number of visitors. But my script was incrementing the videos played counter each time the source code was downloaded.

So, to exclude such visits, you could use JavaScript (like Google Analytics does) or iframes or even an image tag. Regardless of the method you choose, you will be calling back to a server side script that will increment your counter. Your statistics will now be much more accurate.
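The server side script can be very small. Here is a minimal sketch of one, assuming a plain file is good enough as the counter store; record_play() and the temporary file name are hypothetical choices of mine, and a real site would more likely increment a database row. The file lock keeps two simultaneous requests from overwriting each other’s count.

```php
<?php
// Sketch of a server-side counter; the function name and the file
// location are assumptions, not part of any analytics library.
function record_play($counterFile)
{
    // 'c+' opens for read/write and creates the file if needed,
    // without truncating an existing count.
    $handle = fopen($counterFile, 'c+');
    if ($handle === false) {
        return false;
    }
    flock($handle, LOCK_EX); // serialize concurrent requests
    $count = (int) stream_get_contents($handle) + 1;
    ftruncate($handle, 0);
    rewind($handle);
    fwrite($handle, (string) $count);
    fflush($handle);
    flock($handle, LOCK_UN);
    fclose($handle);
    return $count;
}

$file = sys_get_temp_dir() . '/plays_' . uniqid() . '.txt';
echo record_play($file), "\n"; // 1
echo record_play($file), "\n"; // 2
unlink($file);
```

The video page would then reference the script that calls record_play() from an image tag, an iframe, or a JavaScript request, so the counter only moves when a real browser renders the page.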

Problems with ElggFile and ElggDiskFilestore

As I’ve posted about before, Elgg is a very nice open source social networking platform which unfortunately has scaling problems due to its one-size-fits-all architecture. I’d like to point out another area where Elgg makes a suboptimal performance decision.

As in most social networks, users (and groups) can upload a profile photo. In Elgg, this photo (as well as other user uploaded data) is stored within a data directory that is not web accessible. Instead, every request for the photo goes to Elgg’s page handlers, which load all of Elgg’s libraries, find the image in the data directory, and finally output it. Clearly, there is major overhead when even images, which one would think are static content, are routed through the PHP code.

Besides the obvious downsides, there are some hidden implications of not having standalone web accessible images. You will not be able to use a lightweight web server, such as nginx, in front of Apache to speed up serving of static content and take load off Apache. Plus, the Elgg code assumes that the image is available on local disk, which will preclude you from storing your data directory on a separate server unless you use some sort of shared disk (like a SAN).

On the bright side, some of these problems can be corrected, and there’s a good chance someone will have written a plugin to do so by the time you are reading this. Currently, the profile photo is an instance of an ElggFile, which is stored on an ElggFilestore. As of now, the only file store available is the ElggDiskFilestore. However, implementing an ElggFTPFilestore would allow your web server and data server to be separate. You would still have two performance issues: a) There would still be only one FTP location where your images are stored, so you would not be able to load balance your images over several servers. b) Requests for images would still have to go through the Elgg PHP code.

To solve the second problem, you would need to override the profile photo plugin (called icon) to instead link to the user’s image with a normal image src tag. The user’s image would of course have to be made web accessible. Setting that up would involve more administrative overhead, but you would have the advantage of being able to use a lightweight web server to serve static content.

If Elgg itself were lightweight, the implications of turning static content into dynamic content would not be as severe. However, each page load of Elgg requires dozens (often hundreds) of database queries, so large installations of Elgg would be best served by making their static content truly static.

The Zend Gdata Client Library

Many of Google’s products use the Google Data protocol to power their APIs. To make interaction with the API simpler, most programmers will download a client library for their language of choice. Google recommends the Zend Gdata client library for PHP, which overall is a great client library but does have one major downside.

For RateDesi Hungama, I query the YouTube API (which uses Google Data) to retrieve video feeds and video entries. Each time a video page is loaded, the site needs to retrieve the feed from YouTube to display the video information. To avoid an API call on every video page, I have two options: either store the video information in my database and update it periodically, or cache the video information for a limited time.

I chose caching. The problem with the Zend Gdata client library is that any Gdata feed retrieved is gigantic. Each video entry object is a good 300KB because a ton of metadata is kept within the object. If you allocated 500MB to your cache, you would not be able to store even 1700 videos. In this case, you would probably want to look towards using a file based cache.

However, I usually prefer using memcached, which is an in-memory distributed cache. Thankfully, memcached clients do offer the option to automatically compress data. In PHP, when using memcache_set, set the flag to MEMCACHE_COMPRESSED and it will automatically serialize and compress the Gdata object. So, instead of a 300KB cache entry, you will be left with a 17KB or smaller cache entry.
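You can get a feel for why compression helps so much without a memcached server. The sketch below is an approximation under assumptions: the array is an invented stand-in for a Gdata entry (the field names are mine), and it calls serialize and gzcompress by hand to mimic what the flag does, since the memcache extension uses zlib for MEMCACHE_COMPRESSED. It needs PHP’s zlib extension.

```php
<?php
// Stand-in for a Gdata entry: a large, repetitive structure.
// The field names are invented for illustration.
$entry = array();
for ($i = 0; $i < 2000; $i++) {
    $entry[] = array(
        'namespace' => 'http://www.w3.org/2005/Atom',
        'element'   => 'entry',
        'text'      => 'metadata value ' . ($i % 10),
    );
}

// Serialize and compress by hand to approximate the cached payload.
$serialized = serialize($entry);
$compressed = gzcompress($serialized);

printf("serialized: %d bytes\n", strlen($serialized));
printf("compressed: %d bytes\n", strlen($compressed));

// Round trip: what comes back out of the cache is the same data.
$restored = unserialize(gzuncompress($compressed));
var_dump($restored === $entry);
```

With a real server, the equivalent is a single call along the lines of $memcache_obj->set($key, $entry, MEMCACHE_COMPRESSED, 3600), where the key and the one hour expiry are placeholders. Highly repetitive metadata is what makes the ratio so favorable; less repetitive data will shrink less.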

Lessons Learnt: Either roll your own Gdata client library for PHP, or use a file based cache with the Zend Gdata client library, or make sure to compress your cache entries.