Facebook Scribe Server Documentation And Tutorials
Posted on 18th December 2008 by Sameer
Facebook Scribe is a “server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine.” At first glance it looks pretty cool, and it seems to have the potential to fill many needs.
Yesterday I was trying to research whether Scribe is appropriate for a task that I had in mind. Unfortunately, documentation and tutorials for Scribe are very limited, and those that exist are hard to find. Of course, I could download Scribe and work with it hands-on to determine its suitability. But usually it's best to gather information about a product before undertaking the time-consuming task of installing, configuring, and testing it.
So for others out there who are also struggling to find information about Scribe, here are a few resources to turn to. Let me know if you find others and I will add them to this list.
- Scribe SourceForge Wiki - It’s all of 5 pages right now, but it’s the best documentation that exists
- Scribe SourceForge Mailing Lists - Activity is sparse but it does seem like the developers reply to the list
- Installing Scribe Tutorial - from Cloudera
- Configuring and Using Scribe for Hadoop Log Collection Tutorial - from Cloudera
- High Scalability Article on Scribe
- Facebook Engineering Blog Post - Explains the major design decisions made while building Scribe
Also keep in mind that the Scribe download package itself has a couple of example configurations that you can reference.
But as you can see, there just is not much written about Scribe on the web (except news posts announcing its open source launch). I don’t think that reflects on the quality of the product, although it might reflect on how useful the product is for most websites (which don’t have millions of users). But I’m sure that if the Facebook developers enhanced the documentation and provided a few examples of end-to-end use cases, it would spur more developers to try the product. More developers trying (and writing about) the product would spur on even more developers to try it. And so on.
Thanks for these useful links. Documentation is pretty sparse. I was also looking around to see if there are any alternatives to the Scribe way of doing it.
Does anyone know if there is anything other than Scribe and rsyslog?
“Chukwa is an open source data collection system for monitoring large distributed systems.”
http://hadoop.apache.org/chukwa/
My team is trying to decide between Scribe and Chukwa.
Another helpful link about Scribe's design would be the Facebook tech talk video
http://www.facebook.com/video/video.php?v=650882334523&ref=mf

