Presentation: Understanding Deep Learning

Track: AI/Machine Learning without a PhD

Location: Churchill, G flr.

Duration: 4:10pm - 5:00pm

What You’ll Learn

  1. Hear about machine learning, deep learning and neural networks.
  2. Find out how to build a model, how to debug it, how to tune it.
  3. Learn some of the tricks that make building a correct model easier.


No matter what your role is, it is really important to have some understanding of the models you’re working with. In last year's keynote, Rob Harrop talked about the importance of intuition in machine learning. This is a step towards that.

You might already be using neural networks. How can you go beyond just using deep learning and move towards understanding it so you can make your models better?


Deep learning is notoriously opaque, but there are principles behind how neural networks are constructed that can shed a lot of light on how they behave.


The goal of this talk is to help you understand foundational concepts about neural networks that are not often taught in online tutorials (and that even data scientists may not know), so you can better design and deploy neural networks.


We will go from

  1. Dissecting a single layer of a neural network to

  2. How to train (multi-layer) neural networks to

  3. Problems with training very deep networks and how you can tackle them.


At every stage, I will highlight key things to pay attention to, such as learning rates and how to initialise your network. These will all be related to how the networks are constructed and trained, so you can understand why these parameters are so important.
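One widely discussed reason initialisation matters in deep networks is that signals can shrink or blow up as they pass through many layers. The talk description does not say which problems or initialisation schemes are covered, so the following is only an illustrative sketch in plain Python. The function `final_rms`, the layer width, the depth and the weight scales are all assumptions chosen for the demonstration, and the layers are purely linear to keep the effect easy to see.

```python
import math
import random


def final_rms(weight_scale, width=64, depth=30, seed=0):
    """Push a random input through `depth` random linear layers.

    Weights are drawn from a Gaussian with standard deviation
    weight_scale / sqrt(width). Returns the root-mean-square size of the
    final activations. All names and defaults here are hypothetical.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    std = weight_scale / math.sqrt(width)
    for _ in range(depth):
        # One fresh weight matrix per layer, stored as a list of rows.
        w = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        x = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]
    return math.sqrt(sum(v * v for v in x) / width)


for scale in (0.5, 1.0, 2.0):
    print(f"weight scale {scale}: final RMS activation = {final_rms(scale):.3e}")
```

With a weight scale of 1.0 the signal keeps roughly its original size, while halving or doubling that scale makes it vanish or explode after 30 layers. Gradients flowing backwards through the same weights are affected in a similar way, which is one reason very deep networks can be hard to train.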


I will end the talk with practical takeaways used by state-of-the-art models to help you kickstart building powerful neural networks.


Tell me a bit about your experience with deep learning.


I frequently use deep learning for a range of different things, very often with time series. Previously I worked in finance, where we tried to predict different kinds of stock prices, bond prices and economic indicators. I also worked with self-driving cars. The inspiration for this talk came very much from what I was doing. When I first started working with deep learning, as opposed to other kinds of machine learning models, I had this spreadsheet and I would try to find the best model, for example, for predicting stock prices. I would try one model architecture and then hundreds of configurations: different numbers of layers, numbers of units in each layer, different kinds of parameters, different kinds of optimizers. I felt that at the time I was just doing trial and error, and I spent a lot of time running a lot of experiments without a clear sense of the direction in which I was going. I didn't know how the different parameters of the different components affected the output and how good the model was. When I did understand it many months later I thought, if I'd only known this then, I could have saved so much time running those hundreds of experiments, each of which took hours. So I thought it would be a really good thing to talk about.


Tell me what's the plan for your talk.


We're going to start from the really fundamental stuff, but we're going to talk about it in a way that I hope people who have been using neural networks for months can still find helpful. Neural networks have a lot of layers, so we're going to start off by talking about what happens in a single layer - a linear layer and a non-linearity - and we're going to talk about what kinds of things one or a few layers can model, why you need a non-linearity, and what different non-linearities do. Then we're going to move on to models with more and more layers, because the consensus seems to be that the deeper the models are, the better. But it's often very difficult to train very deep models. I'm going to go through the usual problems, when those problems might arise, and how people have been tackling them in state-of-the-art models. Then I'll finish with practical tips for training really good models.
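To make the point about needing a non-linearity concrete, here is a minimal sketch in plain Python. The weights `W1` and `W2`, the inputs `X`, and the choice of ReLU as the non-linearity are made up for illustration and are not taken from the talk. The sketch shows that two linear layers stacked without a non-linearity compute exactly what a single linear layer computes, whereas placing a ReLU between them changes the result.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def linear(x, w):
    """A bias-free linear layer; each row of x is one example."""
    return matmul(x, w)


def relu(x):
    """Element-wise ReLU: negative values become zero."""
    return [[max(0.0, v) for v in row] for row in x]


# Hypothetical weights and inputs, chosen only for the demonstration.
W1 = [[1.0, -1.0], [0.5, 2.0]]
W2 = [[2.0], [-1.0]]
X = [[1.0, 2.0], [-3.0, 1.0]]

# Two linear layers in a row...
stacked = linear(linear(X, W1), W2)
# ...equal one linear layer whose weight matrix is W1 times W2.
collapsed = linear(X, matmul(W1, W2))
# A ReLU between the layers breaks that equivalence.
with_relu = linear(relu(linear(X, W1)), W2)

print("stacked linear layers:", stacked)
print("single merged layer:  ", collapsed)
print("with ReLU in between: ", with_relu)
```

The first two results match for any input, so without a non-linearity extra depth adds no modelling power. The third differs for the second example, whose hidden activation is negative before the ReLU.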


Can you give me an example of some of the things that you might talk about for recommendations on training models?


One of the things I'm going to talk about is a really important parameter called the learning rate. If you set the learning rate too high, your model is going to jump from one solution to another. It will be very unstable and you really don't want that. If you set it too low, your model is not going to learn anything. There are other practical things too. For example, people in the ML community have found that a trick called batchnorm improves the performance of the model a lot. On the day I'll talk about more quick tips that the research community has figured out through years of experimentation.
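The effect of the learning rate can be seen even on a toy problem. The sketch below is an assumption-laden illustration, not the speaker's example: it runs plain gradient descent on the one-parameter loss f(w) = w^2, whose minimum is at w = 0, and the function name `gradient_descent` and the three learning rates are hypothetical choices.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimise f(w) = w**2 with plain gradient descent.

    Returns the list of parameter values visited, starting from w0.
    The loss and the defaults are illustrative assumptions.
    """
    w = w0
    history = [w]
    for _ in range(steps):
        grad = 2.0 * w      # derivative of w**2
        w = w - lr * grad   # gradient descent update
        history.append(w)
    return history


for lr in (1.1, 0.001, 0.3):
    final_w = gradient_descent(lr)[-1]
    print(f"learning rate {lr}: w after 20 steps = {final_w:.6g}")
```

With a learning rate of 1.1 each step overshoots the minimum and lands further away on the other side, so the parameter oscillates and grows. With 0.001 the parameter barely moves from its starting value in 20 steps. With 0.3 it settles very close to the minimum.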


Who is the main persona this talk addresses?


I wrote my talk to be informative enough for the person who has been working with deep learning for a while but hasn't really understood it, but I really want this talk to be understandable to someone who has not done any deep learning at all. So it will be simple enough for people who have not had any exposure to DL to understand. But I also hope it'll be insightful enough that, even if you've done machine learning for a while and don't have a huge understanding of deep learning, it will still be useful to you.


What do you want someone to leave the talk with?


I want them to leave the talk with an understanding of deep learning, but then the question is what does understanding of deep learning mean, right? In terms of the takeaways, my hope is that firstly they can understand the building blocks of neural networks, what they're made of, how they're trained. And secondly that they can understand why some hacks improve performance.

The overarching theme is the hope that understanding these two things can give them the intuition for how to build a model, how to improve it, and how to design and debug these models. As with a lot of software engineering, the most annoying thing, and the thing that takes a lot of time, is debugging these models, and it makes a big difference if you can use your intuition to figure out what might be going wrong. So I think the main takeaways would be having the confidence to do that, being able to build on this foundation, and being able to pick up new architectures quickly.

Speaker: Jessica Yung

Machine Learning blogger and entrepreneur, Self-Driving Car Engineer Scholar @nvidia

Jessica is a research masters student in machine learning at University College London supervised by Prof. John Shawe-Taylor and André Barreto (Google DeepMind). She was previously at the University of Cambridge and was an NVIDIA Self-Driving Car Engineer Scholar. She applied machine learning to finance at Jump Trading and consults on machine learning.


Jessica is keen on sharing knowledge and writes about machine learning and how to learn effectively on her blog.
