You are viewing content from a past/completed QCon

Presentation: Accuracy as a Failure

Track: Machine Learning: The Latest Innovations

Location: Whittle, 3rd flr.

Duration: 10:35am - 11:25am

Day of week: Tuesday



What You’ll Learn

  1. Hear about machine learning models and stories of “accurate” models which made bad choices.
  2. Learn about the need to reverify models which seem very accurate, because they might hide flaws.


When you see a green light, will you cross the street? Or will you still check for cars?

When your machine learning model has demonstrated high accuracy, do you push it to production?

This talk contains cautionary tales of mistakes that might happen when you let your data scientists go on a goose chase for accuracy. It may surprise you, but highly accurate models can be more damaging than inaccurate ones. I will also share some work that my team is doing to make sure that chatbots don't fall into this trap.


What is the work you're doing today?


I'm currently in between jobs. Before I was a data scientist, I started with Hadoop stuff and now we're the number one Big Data shop in the Netherlands. And essentially, I just help companies do better stuff using data. In two weeks' time, I'm going to start as a Research Advocate at Rasa, a chatbot company, and I'll be sitting in between the developer advocacy team and the research team. What I'm trying to do is, on one side, talk to the users of the open source stuff that we do, but on the other side talk to our researchers and help them explain how the internals work to the developers. Theoretical things are great, but we've got to focus on stuff that people want to use and then find a way to explain to developers why highly abstract mathematical tricks are something that will make your tooling better. That's the space I'm in.


What are the goals you have for your talk?


I think there's a lot of optimism in artificial intelligence. But what I would also like to do is show you how quickly that optimism can turn into pessimism. A simple example. Suppose I convince myself that I have this algorithm which is really good at detecting fraud. Then I'll send a policeman to catch a criminal. The policeman will come back with a criminal and I will have convinced myself that the algorithm works quite well. But if we had sent the policeman somewhere else, we would have also gotten a criminal, maybe. It’s really easy to have bias in your system, which in this case might make you catch fewer criminals. All sorts of things can go wrong there. And demonstrating that this has gone wrong in the past is arbitrarily easy. I would just like to demonstrate a couple of instances where the accuracy is actually an artificial thing.
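The policeman example can be made concrete with a small sketch. Everything here is invented for illustration: two hypothetical districts with the same true fraud rate, and an "algorithm" that only ever flags cases in one of them. The hit rate on flagged cases looks like confirmation until it is compared with a control sample the algorithm did not pick.

```python
# Hypothetical population: two districts with the SAME true fraud rate (10%).
# Each entry is True if that case is actually fraudulent.
district_a = [i % 10 == 0 for i in range(1000)]
district_b = [i % 10 == 0 for i in range(1000)]


def hit_rate(cases):
    """Fraction of inspected cases that turn out to be fraud."""
    return sum(cases) / len(cases)


# Assumed behaviour: the algorithm only flags district A,
# so the police only ever inspect cases there.
flagged = district_a[:200]

# A control sample taken from where the algorithm did NOT send anyone.
control = district_b[:200]

# How much better is the algorithm than looking somewhere else?
lift = hit_rate(flagged) - hit_rate(control)

print("hit rate on flagged cases:", hit_rate(flagged))
print("hit rate on control cases:", hit_rate(control))
print("lift over control:", lift)
```

Every inspection of a flagged case that finds fraud feels like proof, but without the control sample there is no way to tell whether the algorithm adds anything at all.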


Could you give me another example of something where you've got an overly accurate model effectively?


It's gonna help if I'm able to draw things. I'm gonna use my hands for now. Usually, using a machine learning model, what you're trying to do is, for example, you're trying to split the red dots from the blue dots, let's say students classification. Let's say, here I've got a chunk of red dots and here I've got a chunk of blue dots. Now, the algorithm is going to say, there is some sort of boundary line in between here, and exactly at this boundary line the algorithm is gonna say, I'm a little bit unsure because it's in between the red thing and the blue thing. And if I move a little bit more to the blue part, then I'm more sure. But here's where the problem starts. If I've got my red dots here, my blue dots here, and it's saying, hey, if you're sort of here instead of in the middle, you're more certain, then the algorithm is also going to say, if you're over here, you're definitely certain. And if you go miles that way, you're still certain, even though I've never seen data in that region at all. My algorithm will still say, oh, I'm super certain that the dots are supposed to be blue.

And you can imagine: who does outlier detection in production before they pass the data to an algorithm? Well, nobody. And what can go wrong? Well, I've got a couple of examples that could have been solved by just doing stuff like this.
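The red-dots-and-blue-dots picture can be sketched in one dimension. This is a minimal illustration, not the speaker's code: the data points, the hand-rolled logistic regression, and the `is_outlier` rule with its `max_gap` threshold are all assumptions chosen to keep the example self-contained.

```python
import math

# Hypothetical 1-D training data: red dots near 0, blue dots near 4.
red = [-0.5, 0.0, 0.3, 0.8]
blue = [3.4, 3.9, 4.2, 4.6]
xs = red + blue
ys = [0] * len(red) + [1] * len(blue)  # 0 = red, 1 = blue


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.1, steps=5000):
    """Plain gradient descent on the logistic loss (illustrative settings)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train_logistic(xs, ys)


def p_blue(x):
    """Model's probability that a point at x is blue."""
    return sigmoid(w * x + b)


def is_outlier(x, max_gap=2.0):
    # Assumed rule: a point is in unseen territory if no training
    # point lies within max_gap of it.
    return min(abs(x - t) for t in xs) > max_gap


def guarded_predict(x):
    # Refuse to answer outside the region the model was trained on.
    return None if is_outlier(x) else p_blue(x)


for x in (4.0, 100.0):
    print(x, round(p_blue(x), 3), guarded_predict(x))
```

At `x = 100.0` the model has never seen anything remotely similar, yet `p_blue` reports near-total certainty. The guard returns `None` instead, which gives the calling code a chance to fall back to a human or a default.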


Is that tooling available to look for those false accuracy problems?


Yes, definitely. But it's kind of a mental thing as you're designing, right? Another example: there's this thing called Simpson's paradox. If I have the right dataset, I can prove to you that smoking is good for your health. That is, unless I also keep your age in mind, because typically people who smoke more are younger, and younger people are usually more healthy. So, yes, there are tools to solve this, but in the end, it still requires a bit of common knowledge and some common sense. That's kind of the issue. And there are some tools that allow you to say: hey, model, the effect of smoking on your health should be a negative effect. Any model that does not come to that conclusion is wrong. These sorts of models do exist, but you as a designer still have to pick that model. And I think the algorithm is not going to figure this out on its own.
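The smoking example can be reproduced with a handful of numbers. The counts below are invented purely to show the effect and are not real health data: smokers look healthier overall, yet within each age group they are less healthy.

```python
# Invented counts for illustration: (age group, smoker?) -> (healthy, total).
counts = {
    ("young", True): (72, 80),
    ("young", False): (19, 20),
    ("old", True): (8, 20),
    ("old", False): (40, 80),
}


def healthy_rate(smoker, group=None):
    """Share of healthy people, optionally restricted to one age group."""
    rows = [
        v for (g, s), v in counts.items()
        if s == smoker and group in (None, g)
    ]
    healthy = sum(h for h, _ in rows)
    total = sum(t for _, t in rows)
    return healthy / total


# Ignoring age: smokers appear healthier.
print("overall:", healthy_rate(True), "vs", healthy_rate(False))

# Keeping age in mind: smokers are less healthy in every group.
for group in ("young", "old"):
    print(group + ":", healthy_rate(True, group), "vs", healthy_rate(False, group))
```

The reversal comes entirely from the mix: in this made-up dataset most smokers are young, and young people are mostly healthy regardless of smoking.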


What do you want people to leave your talk with?


It would be nice if people come away from the talk just a little bit more skeptical when they see high accuracy statistics. Hopefully, by being just a little bit more aware, people will run that one extra test and prevent something disastrous in production.

Speaker: Vincent Warmerdam

Research Advocate @Rasa

My name is Vincent, ask me anything. I have been evangelizing data and open source for the last 6 years. You might know me from tech talks where I attempt to defend common sense over hype in data science. I currently work at Rasa.

