SESSION + Live Q&A

Accuracy as a Failure

When you see a green light, will you cross the street? Or will you still check for cars?

When your machine learning model has demonstrated high accuracy, do you push it to production?

This talk contains cautionary tales of mistakes that can happen when you send your data scientists on a wild goose chase for accuracy. It may surprise you, but highly accurate models are more damaging than inaccurate ones. I will also share some of the work my team is doing to make sure that chatbots don't fall into this trap.

What is the work you're doing today?

I'm currently between jobs. Before this, I was a data scientist at a company that started out doing Hadoop work and is now the number one Big Data shop in the Netherlands. Essentially, I help companies do better things with data. In two weeks' time, I'm going to start as a Research Advocate at Rasa, a chatbot company, where I'll sit between the developer advocacy team and the research team. On one side, I'll talk to the users of the open source tools we build; on the other, I'll talk to our researchers and help explain to developers how the internals work. Theoretical things are great, but we have to focus on what people actually want to use, and then find a way to explain to developers why highly abstract mathematical tricks will make their tooling better. That's the space I'm in.

What are the goals you have for your talk?

I think there's a lot of optimism in artificial intelligence, but I'd also like to show how quickly that optimism can turn into pessimism. Here's a simple example. Suppose I convince myself that I have an algorithm that is really good at detecting fraud. I send a policeman to where it points, the policeman comes back with a criminal, and I convince myself that the algorithm works quite well. But if we had sent the policeman somewhere else, we might also have caught a criminal. It's really easy to end up with bias in your system, which in this case might even make you catch fewer criminals. All sorts of things can go wrong there, and demonstrating that this has gone wrong in the past is remarkably easy. I'd just like to demonstrate a couple of instances where the accuracy is actually an artificial thing.
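A minimal simulation can make the fraud example concrete. It assumes, purely for illustration, that every district has the same true crime rate, so the model's scores carry no signal at all; the function and parameter names (`simulate`, `n_patrols`, `crime_rate`) are hypothetical. Police sent to the model's flagged districts still come back with criminals at roughly the same rate as police sent at random, which is why a random-deployment baseline is worth measuring before trusting the model's hit rate.

```python
import random


def simulate(n_patrols: int, crime_rate: float, seed: int = 0) -> tuple[float, float]:
    """Compare patrols sent by a 'model' with patrols sent at random.

    Hypothetical setup: all 100 districts share the same true crime_rate,
    so the model's ranking carries no real information.
    """
    rng = random.Random(seed)
    n_districts = 100

    # Arbitrary scores: the model "flags" its top 10 districts.
    scores = [rng.random() for _ in range(n_districts)]
    flagged = sorted(range(n_districts), key=lambda d: scores[d], reverse=True)[:10]

    def patrol_finds_criminal(_district: int) -> bool:
        # Crime is equally likely everywhere, so the district does not matter.
        return rng.random() < crime_rate

    model_hits = sum(patrol_finds_criminal(rng.choice(flagged)) for _ in range(n_patrols))
    random_hits = sum(patrol_finds_criminal(rng.randrange(n_districts)) for _ in range(n_patrols))
    return model_hits / n_patrols, random_hits / n_patrols


model_rate, baseline_rate = simulate(n_patrols=10_000, crime_rate=0.3)
# Both rates land near 0.3: the model's "successes" are no better than chance.
```

Measuring only `model_rate` would make the model look useful; only the comparison with `baseline_rate` reveals that it adds nothing.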

Could you give me another example where you effectively have an overly accurate model?

It would help if I could draw this; I'm going to use my hands for now. Usually, with a machine learning model, you're trying to do something like separate the red dots from the blue dots, for example when classifying students. Let's say I've got a chunk of red dots here and a chunk of blue dots here. The algorithm will say that there is some sort of boundary line in between, and exactly on that boundary it is a little unsure, because it's between the red group and the blue group. If I move a bit further towards the blue part, it becomes more sure. But here's where the problem starts. If being a bit further from the middle makes the algorithm more certain, then it will also say that if you're way over here, you're definitely blue. And if you go miles in that direction, it's still certain, even though I've never seen any data in that region at all. My algorithm will still say, "Oh, I'm super certain these dots are supposed to be blue."
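The overconfidence problem shows up even in a one-feature logistic regression. The sketch below assumes made-up 1-D data (red dots labelled 0 on the left, blue dots labelled 1 on the right) and fits the model with plain gradient descent; the helper names are hypothetical. The predicted probability is about 0.5 on the boundary, but at x = 1000, far from any training data, the model is even more certain than it is inside the blue cluster itself.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Plain gradient-descent logistic regression on a single feature."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical 1-D data: "red" dots (label 0) on the left, "blue" dots (label 1) on the right.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)


def p_blue(x: float) -> float:
    # Model's predicted probability that a point at x is blue.
    return sigmoid(w * x + b)


p_boundary = p_blue(0.0)     # between the clusters: close to 0.5
p_in_data = p_blue(1.5)      # inside the blue cluster
p_far_away = p_blue(1000.0)  # far from any training data, yet the most "certain"
```

Nothing in the probability itself signals "I have never seen data like this"; catching that would need a separate check, for instance (as an assumption, not something from the talk) measuring how far a new point lies from the training data.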

Is there tooling available to look for these false accuracy problems?

Yes, definitely. But it's also a mental thing as you're designing, right? Another example is something called Simpson's paradox. With the right dataset, I can prove to you that smoking is good for your health. That is, unless I also take your age into account, because typically the people who smoke more are younger, and younger people are usually healthier. So yes, there are tools to solve this, but in the end it still requires a bit of common knowledge and some common sense. That's kind of the issue. There are also tools that let you say, "Hey model, the effect of smoking on your health should be negative; any model that does not come to that conclusion is wrong." These sorts of models do exist, but you as a designer still have to pick that model. I don't think the algorithm is going to figure this out on its own.

What do you want people to leave your talk with?

It would be nice if people left the talk just a little more skeptical when they see high accuracy statistics. Hopefully, by being a bit more aware, people will run that one extra test and prevent something disastrous in production.
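One cheap version of that extra test is comparing a model's accuracy against a baseline that always predicts the most common label. The fraud-style labels below are hypothetical: with 1% positives, a "model" that never flags anything scores 99% accuracy, exactly matching the baseline, while catching zero fraud cases.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true):
    # Accuracy of always predicting the most common label.
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Hypothetical labels: 990 legitimate transactions (0) and 10 fraudulent ones (1).
y_true = [0] * 990 + [1] * 10
# A "model" that never flags anything as fraud.
y_pred = [0] * 1000

model_acc = accuracy(y_true, y_pred)               # 0.99
baseline_acc = majority_baseline_accuracy(y_true)  # 0.99
fraud_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 0
```

A model whose accuracy does not clearly beat `baseline_acc` has not learned anything useful, however impressive the headline number looks.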

Speaker

Vincent Warmerdam

Research Advocate @Rasa

My name is Vincent, ask me anything. I have been evangelizing data and open source for the last 6 years. You might know me from tech talks where I attempt to defend common sense over hype in data science. I currently work at Rasa.

From the same track

SESSION + Live Q&A Machine Learning

BERT for Sentiment Analysis on Sustainability Reporting

Sentiment analysis is a commonly used technique to assess customer opinion around a product or brand. The data used for these purposes often consists of product reviews, which have (relatively) clear language and are even labeled (e.g. ratings). But when you look at what companies write about...

Susanne Groothuis

Sr. Data Scientist in the Advanced Analytics and Big Data team @KPMG

SESSION + Live Q&A Machine Learning

Visual Intro to Machine Learning and Deep Learning

Break into machine learning with this gentle and intuitive journey through central concepts in machine learning -- from the most basic models up to the latest cutting edge deep learning models. This highly visual presentation will give you the mental map of ML prediction models and how...

Jay Alammar

VC and Machine Learning Explainer @STVcapital

SESSION + Live Q&A Silicon Valley

Speeding Up ML Development with MLFlow

Machine Learning is more approachable than ever before and the number of companies applying Machine Learning to build AI powered applications and products has dramatically increased in recent years. On this journey of adopting Machine Learning, many companies learn successful Machine...

Hien Luu

Engineering Manager @LinkedIn focused on Big Data

UNCONFERENCE + Live Q&A Machine Learning

Machine Learning Open Space

Details to follow.

SESSION + Live Q&A Interview Available

The Fast Track to AI with Javascript and Serverless

Most people associate AI and Machine Learning with the Python language. This talk will explore how to get started building AI enabled platforms and services using full stack Javascript and Serverless technologies. With practical examples drawn from real world projects the talk will get you up and...

Peter Elger

Co-Founder & CEO @fourtheorem