Ashish Thusoo, Co-creator of the Apache Hive project
Biography: Ashish Thusoo
Until very recently, Ashish Thusoo ran the teams responsible for building and operating the analytics platform at Facebook. These included systems and services for log collection, batch processing, ad hoc querying, and real-time reports. During this period, the data infrastructure at Facebook grew to handle close to 20PB of compressed data and became a core service relied upon heavily by various parts of the company, from engineers to business analysts.
He is also a co-creator of the Apache Hive project, which has become a very popular open source data warehousing framework for large-scale analysis of data stored in Hadoop. In the early days of Hive, he served as the lead for that project at Apache. Ashish has deep expertise in data processing and parallel processing technologies, infrastructure, and the applications built on that infrastructure. He previously worked at Oracle in the areas of Parallel Query Execution and XML Databases. At Oracle he built many core data warehousing and query processing features and was recognized as one of the leaders in the Parallel Execution team.
Presentation: Big Data Architectures at Facebook
In this talk, Ashish Thusoo will present use cases that motivate the collection of large datasets, discussing the infrastructure challenges they create and the types of solutions and technologies that are enabling organizations to surmount those challenges. These facets of big data will be highlighted through a case study of how these technologies enable Facebook to handle data sets at multi-petabyte scale. The talk will conclude with some current challenges that these technologies face and the evolution paths they may take in the future.