Monthly Archives: November 2008

Problems with ElggFile and ElggDiskFilestore

As I’ve posted about before, Elgg is a very nice open source social networking platform which unfortunately has scaling problems due to it’s one size fits all architecture. I’d like to point out another area where Elgg is making a suboptimal performance decision.

As in most social networks, users (and groups) can upload a profile photo. In Elgg, this photo (as well as other user uploaded data) is stored within a data directory that is not web accessible. Instead, a call is directly made to Elgg’s page handlers which load all of Elgg’s libraries and then find the image in the data directory and finally output it. Clearly, there is major overhead when even images, which one would think are static content, are actually routed through the php code.

Besides the obvious downsides, there are some hidden implications of not having standalone web accessible images. You will not be able to use a lightweight web server, such as nginx, in front of Apache to speed up serving of static content and take load off Apache. Plus, the Elgg code assumes that the image is available over local disk, which will preclude you from storing your data directory on a seperate server unless you use some sort of shared disk (like a SAN).

On the bright side, some of these problems can be corrected and theres a good chance someone will have written a plugin to do so by the time you are reading this. Currently, the profile photo is an instance of an ElggFile which is stored on an ElggFilestore. As of now, the only file store available is the ElggDiskFilestore. However, implementing an ElggFTPFilestore would allow your web server and data server to be seperate. You would still have two performance issues: a) There is still only one ftp location where your images would be stored. You will not be able to load balance your images over several servers. b) Requests for images would still have to go through the Elgg php code.

To solve the second problem, you would need to overwrite the profile photo plugin (called icon) to instead link to the user’s image with a normal image src tag. The user’s image would of course have to be made web accessible. Setting that up would involve more administrative overhead, but you would have the advantage of being able to use a lightweight web server to serve static content.

If Elgg itself was lightweight, the implications of turning static content into dynamic content would not be as severe. However, each page load of Elgg requires dozens (often hundreds) of database queries, so large installations of Elgg would be best served to make your static content truly static.

The Zend Gdata Client Library

Many of Google’s products use the Google Data protocol to power their API’s. To make interactaction with the API simpler, most programmers will download a client library for their language of choice. Google recommends the Zend Gdata client library for PHP, which overall is a great client library but does have one major downside.

For RateDesi Hungama, I query the YouTube API (which uses Google Data) to retrieve video feeds and video entries. Each time a video page is loaded, the site needs to retrieve the feed from Youtube to display the video information. To prevent needing an API call on every video page, I have two options… either store the video information in my database and update it periodically or cache for a limited time the video information.

I chose caching. The problem with the Zend Gdata client library is that any Gdata feed retrieved is gigantic. Each video entry object is a good 300kb because a ton of metadata is kept within the object. If you allocated 500MB to your cache you would not be able to store even 1700 videos. In this case, you would probably want to look towards using a file based cache.

However, I usually prefer using memcached which is an in-memory distributed cache. Thankfully, memcached does offer the option to automatically compress data. In PHP, when using memcache_set set the flag to MEMCACHE_COMPRESSED and it will automatically serialize and compress the Gdata object. So, instead of a 300kb cache entry, you will be left with a 17kb or smaller cache entry.

Lessons Learnt: Either roll your own Gdata client library for PHP, or use a file based cache with the Zend Gdata client library, or make sure to compress your cache entries.