Facebook, Innovation

Presto goes open source: Facebook’s solution for handling its 300 petabytes of data


To the average user who adds friends, posts statuses and uses its other features, Facebook is just a social network, and that is exactly what it is meant to be. The real challenge for Facebook, though, is handling the data of the more than a billion users on its service. Facebook has one of the largest data warehouses in the world, holding more than 300 petabytes of data. No ordinary database management system stands a chance at that scale; it takes an efficient system built for what the IT world calls Big Data.

As the name says, Big Data is data that is big: a volume so large that you could hardly imagine processing it all at once. At Facebook, data analysts and engineers need a way to run more queries and get results faster in order to stay productive. The workload could be anything from batch processing and graph analytics to machine learning and real-time interactive analytics. Whatever the purpose, performance is what matters for a company handling data at this scale.

Facebook had been using Apache Hive, a Big Data solution from Apache. But once its warehouse reached the petabyte scale, Facebook needed something more and found that nothing available met its requirements. This is why the company began developing Presto in the fall of 2012. It was completed by the start of 2013 and rolled out within the company, where it became the major interactive query system for the warehouse. Presto is reported to be 10 times faster than Hive/MapReduce, which are commonly used for processing Big Data.

Presto is a distributed SQL query engine optimized for ad-hoc analysis at interactive speed. It supports a large subset of ANSI SQL, including complex queries, aggregations, joins, and window functions. The company continues to work on improving the system. At the WebScale conference in June 2013, the technology got a warm welcome from the external community, and Facebook released the Presto code and binaries to a select set of companies for testing and feedback. Taking this a step further, Facebook has now made Presto public for the open source development community. Anyone can view the source and contribute to the project.
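To give a feel for the kind of query mentioned above (a join, an aggregation and a window function in one statement), here is a minimal sketch. It does not talk to Presto at all: it runs the SQL against an in-memory SQLite database from Python's standard library, purely as a stand-in so the example is runnable anywhere. The `users` and `posts` tables and their contents are made up for illustration, and a real Presto deployment would have its own catalogs, schemas and client.

```python
import sqlite3

# Hypothetical query: total likes per user, ranked within each country.
# Written in plain ANSI-style SQL; SQLite is only a stand-in engine here.
QUERY = """
WITH totals AS (
    SELECT u.country, u.user_id, SUM(p.likes) AS total_likes   -- aggregation
    FROM users u
    JOIN posts p ON p.user_id = u.user_id                       -- join
    GROUP BY u.country, u.user_id
)
SELECT country,
       user_id,
       total_likes,
       RANK() OVER (PARTITION BY country
                    ORDER BY total_likes DESC) AS rank_in_country  -- window function
FROM totals
ORDER BY country, rank_in_country
"""


def run_example():
    """Build a tiny made-up dataset and run the query against it."""
    # Window functions need SQLite 3.25 or newer.
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript("""
            CREATE TABLE users (user_id INTEGER PRIMARY KEY, country TEXT);
            CREATE TABLE posts (post_id INTEGER PRIMARY KEY,
                                user_id INTEGER, likes INTEGER);
            INSERT INTO users VALUES (1, 'US'), (2, 'US'), (3, 'IN');
            INSERT INTO posts VALUES (10, 1, 5), (11, 1, 7), (12, 2, 3), (13, 3, 9);
        """)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_example():
        print(row)
```

On a warehouse the size of Facebook's, a query of this shape is exactly where an interactive engine matters: the analyst wants the ranked result back quickly rather than waiting on a batch job.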

You can visit the official Presto site for documentation or get your hands on the source code at GitHub.

About the Author

Tharun is a bit attracted to computers and stuff. He loves to blog, share and learn more about computers and technology. He shares whatever he feels is good on this site... Stay connected.