QCon is a practitioner-driven conference designed for technical team leads, architects, and project managers who influence software innovation in their teams.

Presentation: "Approximate methods for scalable data mining"

Track: Big Data NoSQL / Time: Friday 15:40 - 16:30 / Location: Mountbatten Room

Certain operations on data sets, such as membership testing, distinct counts, and nearest-neighbour finding, become much more costly when the data is too large to fit in memory on a single machine. Approximate methods let you perform these operations far more efficiently at the cost of slightly reduced accuracy, and sometimes let you work on continuous streams of data where the raw data is never stored. This talk gives an overview of these methods and describes some use cases.
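To make the trade-off in the abstract concrete, here is a minimal sketch of one well-known approximate method for membership testing, the Bloom filter. The abstract does not say which techniques the talk covers, so the choice of a Bloom filter is an assumption, and the class name `BloomFilter` and its parameters are hypothetical names chosen for illustration. The structure never reports a false negative, but it may report a false positive at roughly the configured rate, in exchange for using a fixed number of bits no matter how large the items are.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: approximate set membership in fixed memory."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so steps vary
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter(expected_items=1000, false_positive_rate=0.01)
    for i in range(1000):
        seen.add(f"user-{i}")
    print("user-42" in seen)        # an added item is always reported as present
    print(len(seen.bits), "bytes")  # memory used, independent of item size
```

Because only bits are stored and never the items themselves, the same structure can be fed from a stream without retaining the raw data, which is the property the abstract highlights.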


Andrew Clegg, Data Analytics Technical Manager at Pearson


Biography: Andrew Clegg

Andrew has been a data scientist since long before they were called that. He has a PhD in computational linguistics and text mining, and has worked in life sciences, healthcare, social and online media, and publishing. Nowadays he heads up the Data Analytics & Visualization team at Pearson in London, helping companies from the Pearson, FT and Penguin groups make the most of their data.

Twitter: @andrew_clegg