QCon is a practitioner-driven conference designed for technical team leads, architects, and project managers who influence software innovation in their teams.
Andrew Clegg, Data Analytics Technical Manager at Pearson
Biography: Andrew Clegg
Andrew has been a data scientist since long before they were called that. He has a PhD in computational linguistics and text mining, and has worked in life sciences, healthcare, social and online media, and publishing. Nowadays he heads up the Data Analytics & Visualization team at Pearson in London, helping companies from the Pearson, FT and Penguin groups make the most of their data.
Twitter: @andrew_clegg
Presentation: Approximate methods for scalable data mining
Certain operations on data sets, such as membership testing, distinct counts, and nearest-neighbour finding, become much more costly if the data is too large to fit in memory on a single machine. Approximate methods allow you to perform these operations much more efficiently, at the cost of slightly reduced accuracy, and sometimes enable you to work on continuous streams of data where the raw data is never stored. This talk gives an overview of these methods and describes some use cases.
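As an illustration of the trade-off described above, approximate membership testing is commonly done with a Bloom filter. The abstract does not say which techniques the talk covers, so the following is only a minimal sketch under that assumption. The class name `BloomFilter` and the parameters `capacity` and `error_rate` are hypothetical, chosen for this example.

```python
import hashlib
import math


class BloomFilter:
    """Approximate set membership: no false negatives, rare false positives."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing formulas for n expected items and target error p:
        #   bits m = -n * ln(p) / (ln 2)^2,  hash count k = (m / n) * ln 2
        self.num_bits = max(
            8, math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        )
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest
        # (double hashing), instead of computing k independent hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "probably present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


if __name__ == "__main__":
    seen = BloomFilter(capacity=1000, error_rate=0.01)
    for i in range(1000):
        seen.add(f"user-{i}")
    print("user-42" in seen)           # an item that was added
    print(len(seen.bits), "bytes used")
```

The filter stores only a fixed-size bit array, never the items themselves, which is why this kind of structure can also be fed from a continuous stream. The price is that a lookup can occasionally report an item as present when it was never added.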