Presentation: "Approximate methods for scalable data mining"
Certain operations on data sets, such as membership testing, distinct counts, and nearest-neighbour search, become much more costly if the data is too large to fit in memory on a single machine. Approximate methods let you perform these operations much more efficiently, at the cost of slightly reduced accuracy. Sometimes they also let you work on continuous streams of data where the raw data is never stored. This talk gives an overview of these methods and describes some use cases.
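The abstract does not say which specific methods the talk covers. As one illustration of approximate membership testing, here is a minimal sketch of a Bloom filter, a standard technique chosen here as an assumption rather than taken from the slides. The class name `BloomFilter`, its `capacity` and `error_rate` parameters, and the use of SHA-256 with double hashing are all hypothetical choices made for this example. The filter never reports an inserted item as missing, but it may occasionally report an unseen item as present.

```python
import hashlib
import math


class BloomFilter:
    """Hypothetical illustration: approximate set membership with no false
    negatives and a tunable false-positive rate."""

    def __init__(self, capacity: int, error_rate: float = 0.01) -> None:
        # Standard sizing for n items and target false-positive rate p:
        #   m = -n * ln(p) / (ln 2)^2 bits,  k = (m / n) * ln 2 hash functions.
        self.num_bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        # One bit per position, packed into bytes.
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing (an assumption of this sketch): derive k bit positions
        # from two 64-bit values taken from a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set the k bits that correspond to this item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False is always correct; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter(capacity=1000, error_rate=0.01)
    for i in range(1000):
        seen.add(f"user-{i}")
    print("user-42" in seen)  # True: inserted items are never missed
    print(len(seen.bits), "bytes of filter state")
```

This mirrors the trade-off described above: roughly 1.2 KB of bits stands in for 1,000 stored strings, in exchange for about a 1% chance that a key never inserted is reported as present.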