QCon is a practitioner-driven conference designed for technical team leads, architects, and project managers who influence software innovation in their teams.

Presentation: "Design Patterns for Large-Scale Real-Time Learning"

Track: Real Data Science / Time: Wednesday 14:30 - 15:20 / Location: Mountbatten Room

Having collected Big Data, organizations are now keen on data science and “Big Learning”. Much of the focus has been on data science as exploratory analytics: offline, in the lab. However, building from that a production-ready large-scale operational analytics system remains a difficult and ad-hoc endeavor, especially when real-time answers are required. Design patterns for effective implementations are emerging, which take advantage of relaxed assumptions, adopt a new tiered "lambda" architecture, and pick the right scale-friendly algorithms to succeed. Drawing on experience from customer problems and the open source Oryx project at Cloudera, this session will provide examples of operational analytics projects in the field, and present a reference architecture and algorithm design choices for a successful implementation.

Download slides

Sean Owen, Director of Data Science at Cloudera

Sean Owen

Biography: Sean Owen

Sean is Director of Data Science at Cloudera, based in London. Before Cloudera, he founded Myrrix Ltd, a company commercializing large-scale real-time recommender systems on Apache Hadoop. He has been a primary committer and VP for Apache Mahout, and co-author of Mahout in Action. Previously, Sean was a senior engineer at Google. He holds and MBA from the London Business School and a BA in Computer Science from Harvard.