Posted on 18th December 2008 by Sameer

Facebook Scribe is a “server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.” At first glance it looks pretty cool, and it seems to have the potential to fill many needs.

But yesterday I was trying to research whether Scribe is appropriate for a task that I had in mind. Unfortunately, documentation and tutorials for Scribe seem to be very limited, and those that exist are hard to find. Of course, I could download Scribe and work with it hands-on to determine its suitability. But it's usually best to gather information about a product before undertaking the time-consuming task of installing, configuring, and testing it.

So for others out there who are also struggling to find information about Scribe, here are a few resources to turn to. Let me know if you find others and I will add them to this list.

Also keep in mind that the Scribe download package itself has a couple of example configurations that you can reference.

But, as you can see, there just is not much written about Scribe on the web (except news posts announcing its open source launch). I don’t think that reflects on the quality of the product, although it might reflect on the usefulness of the product for most websites (which don’t have millions of users). But I’m sure that if the Facebook developers enhanced the documentation and provided a few examples of end-to-end use cases, it would spur more developers to try the product. More developers trying (and writing about) the product would spur on even more developers to try it. And so on.

Posted on 2nd December 2008 by Sameer

Usually Google Analytics or similar tools cover so many metrics that creating your own web analytics tool is redundant. However, for certain custom metrics you might want to collect your own statistics. For example, on RateDesi Hungama I want to know how many videos are played each day.

At first, I put my counter into my web app source code (in PHP) within the same function that fetched and displayed the video. However, the number of videos played seemed outrageously high compared to the number of pageviews reported by Google Analytics. Looking into my server logs, I realized that over 50% of the video plays at the time came from bots or spiders, and that Google Analytics must have been excluding those visitors. So what was Google Analytics doing that I wasn’t?

Most bots download the HTML source of the page they visit, and may or may not follow the hyperlinks on it. However, they typically do not execute the JavaScript or load the iframes included in the page. So Google Analytics was only counting a visitor when its JavaScript was executed, which more accurately reflected the number of human visitors. My script, on the other hand, was incrementing the videos-played counter each time the page source was downloaded.
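A rough way to check the bot share in your own logs is to classify each request by its user-agent string. The sketch below is only an illustration of that idea: the hint list and the names `BOT_HINTS`, `isLikelyBot`, and `botShare` are assumptions made up for this example, and a substring match will miss bots that do not identify themselves.

```javascript
// Substrings that often appear in crawler user agents.
// This list is an assumption for illustration and is far from complete.
const BOT_HINTS = ["bot", "spider", "crawl", "slurp"];

// Returns true when the user-agent string contains one of the hints.
function isLikelyBot(userAgent) {
  const ua = (userAgent || "").toLowerCase();
  return BOT_HINTS.some((hint) => ua.includes(hint));
}

// Returns the fraction (0 to 1) of requests whose user agent looks like a bot.
function botShare(userAgents) {
  if (userAgents.length === 0) return 0;
  const bots = userAgents.filter(isLikelyBot).length;
  return bots / userAgents.length;
}
```

Feeding `botShare` the user-agent column from a day of access logs gives a quick estimate of how inflated a server-side counter might be.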

So, to exclude such visits, you could use JavaScript (like Google Analytics does), an iframe, or even an image tag. Regardless of the method you choose, the page will call back to a server-side script that increments your counter. Your statistics will now be much more accurate.
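As a minimal sketch of the JavaScript variant: the page's script requests a tiny tracking image, and the server-side handler behind that URL increments a per-day counter. Everything named here is hypothetical and not taken from my actual site: the `/track_play.php` endpoint, the `video` and `t` query parameters, and the in-memory `Map` that stands in for whatever storage you really use.

```javascript
// Hypothetical callback endpoint; substitute your own server-side script.
const TRACK_ENDPOINT = "/track_play.php";

// Client side: build the URL that the tracking image will request.
// The timestamp is a cache buster so every play reaches the server.
function buildBeaconUrl(videoId, now = Date.now()) {
  return `${TRACK_ENDPOINT}?video=${encodeURIComponent(videoId)}&t=${now}`;
}

// Client side (browser only): fire the callback when the page's script runs.
// A bot that only downloads the HTML never executes this function.
function reportPlay(videoId) {
  if (typeof Image === "undefined") return null; // not running in a browser
  const pixel = new Image(1, 1);
  pixel.src = buildBeaconUrl(videoId);
  return pixel;
}

// Server side: the logic the callback script would run for each request.
// dailyCounts is a Map keyed by "YYYY-MM-DD" (UTC); returns the new count.
function recordPlay(dailyCounts, date = new Date()) {
  const day = date.toISOString().slice(0, 10);
  dailyCounts.set(day, (dailyCounts.get(day) || 0) + 1);
  return dailyCounts.get(day);
}
```

Calling `reportPlay(videoId)` from the code that starts the player means the counter only moves when a real browser runs the script.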