You are viewing content from a past/completed QCon -


Understanding Deep Learning

No matter what your role is, it is really important to have some understanding of the models you’re working with. In last year's keynote, Rob Harrop talked about the importance of intuition in machine learning. This is a step towards that.

You might already be using neural networks. How can you go beyond just using deep learning and move towards understanding it so you can make your models better?


Deep learning is notoriously opaque, but there are principles behind how neural networks are constructed that can shed a lot of light on how they behave.


The goal of this talk is to help you understand foundational concepts about neural networks that are not often taught in online tutorials (and that even data scientists may not know), so you can better design and deploy neural networks.


We will go from

  1. Dissecting a single layer of a neural network to

  2. How to train (multi-layer) neural networks to

  3. Problems with training very deep networks and how you can tackle them.


At every stage, I will highlight key things to pay attention to, such as learning rates and how to initialise your network. These will all be related to how the networks are constructed and trained, so you can understand why these parameters are so important.
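To make the point about initialisation concrete, here is a minimal sketch that is not taken from the talk itself. It assumes a stack of purely linear layers with Gaussian weights and no biases or non-linearities, and the function name `forward_rms`, the width of 32 and the depth of 20 are all illustrative choices. It pushes a random input through the stack and reports the size of the output for three different weight scales.

```python
import math
import random


def forward_rms(depth, width, weight_std, seed=0):
    """Push a random vector through `depth` linear layers; return the output RMS."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    for _ in range(depth):
        # One layer: y_i = sum_j W_ij * x_j, with W_ij drawn from Normal(0, weight_std^2).
        x = [
            sum(rng.gauss(0.0, weight_std) * x_j for x_j in x)
            for _ in range(width)
        ]
    return math.sqrt(sum(v * v for v in x) / width)


WIDTH, DEPTH = 32, 20  # illustrative sizes, not from the talk

rms_small = forward_rms(DEPTH, WIDTH, weight_std=0.05)
rms_scaled = forward_rms(DEPTH, WIDTH, weight_std=1.0 / math.sqrt(WIDTH))
rms_large = forward_rms(DEPTH, WIDTH, weight_std=0.5)

print(f"weight_std=0.05          -> output RMS {rms_small:.3e}")
print(f"weight_std=1/sqrt(width) -> output RMS {rms_scaled:.3e}")
print(f"weight_std=0.5           -> output RMS {rms_large:.3e}")
```

In this simplified setting each layer multiplies the signal's variance by roughly `width * weight_std**2`, so weights that are too small make the signal vanish with depth, weights that are too large make it explode, and a scale near `1/sqrt(width)` keeps it at a similar magnitude. Real networks add non-linearities, which change the ideal scale, but the same reasoning applies.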


I will end the talk with practical takeaways used by state-of-the-art models to help you kickstart building powerful neural networks.

Tell me a bit about your experience with deep learning.

I frequently use deep learning for a range of different things, very often with time series. Previously I worked in finance, where we tried to predict stock prices, bond prices and economic indicators. I also worked with self-driving cars. The inspiration for this talk came directly from that work. When I first started working with deep learning, as opposed to other kinds of machine learning models, I kept a spreadsheet and tried to find the best model for, say, predicting stock prices. I would pick one model architecture and then try hundreds of configurations: different numbers of layers, numbers of units in each layer, different parameters, different optimizers. At the time I was just doing trial and error, and I spent a lot of time running experiments without a clear sense of the direction in which I was going. I didn't know how the different parameters and components affected the output, that is, how good the model was. When I did understand it many months later, I thought: if I'd only known this earlier, I could have saved so much time running those hundreds of experiments, each of which took hours. So I thought it would be a really good thing to talk about.

Tell me what's the plan for your talk.

We're going to start from the really fundamental stuff, but we're going to talk about it in a way that I hope people who have been using neural networks for months will still find helpful. Neural networks have a lot of layers, so we're going to start off by talking about what happens in a single layer (a linear layer and a non-linearity). We're going to talk about what kinds of things one or a few layers can model, why you need a non-linearity, and what different non-linearities do. Then we're going to move on to models with more and more layers, because the consensus seems to be that the deeper the models are, the better. But it's often very difficult to train very deep models. I'm going to go through the usual problems, when those problems might arise, and how people have been tackling them in state-of-the-art models. Then I'll finish with practical tips for training really good models.
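One way to see why a non-linearity is needed is the following small sketch. It is an illustration under assumed values, not material from the talk: the weight matrices `W1` and `W2`, the input `x` and the helper names are all made up for the example. It shows that two linear layers stacked without a non-linearity give exactly the same result as a single linear layer, while putting a ReLU between them lets the network compute something a single linear layer cannot (here, the absolute difference of the two inputs).

```python
def matvec(W, x):
    """Linear layer without bias: W @ x for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matmul(A, B):
    """Matrix product A @ B for list-of-rows matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def relu(x):
    """Element-wise ReLU non-linearity: max(0, v)."""
    return [max(0.0, v) for v in x]


# Hypothetical weights chosen so that, with ReLU, the network computes |x1 - x2|.
W1 = [[1.0, -1.0], [-1.0, 1.0]]
W2 = [[1.0, 1.0]]
x = [2.0, 0.5]

two_linear = matvec(W2, matvec(W1, x))       # two linear layers, no non-linearity
collapsed = matvec(matmul(W2, W1), x)        # one linear layer with weights W2 @ W1
with_relu = matvec(W2, relu(matvec(W1, x)))  # linear -> ReLU -> linear

print(two_linear)  # [0.0]
print(collapsed)   # [0.0]
print(with_relu)   # [1.5]
```

Without the ReLU, the two layers collapse to the single matrix `W2 @ W1`, which in this example is all zeros, so extra depth adds nothing. With the ReLU, the same weights produce `|2.0 - 0.5| = 1.5`.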

Can you give me an example of some of the things that you might talk about for recommendations on training models?

One of the things I'm going to talk about is a really important parameter called the learning rate. If you set the learning rate too high, your model is going to jump from one solution to another. It will be very unstable, and you really don't want that. If you set it too low, your model will learn so slowly that it effectively doesn't learn anything. There are other practical things too. In the ML community, people have found that a trick called batchnorm (batch normalization) improves the performance of the model a lot. On the day I'll talk about more quick tips that the research community has figured out through years of experimentation.
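The effect of the learning rate described above can be sketched on a toy problem. This is an illustration under assumptions rather than the speaker's own example: the objective f(w) = w², the helper name `gradient_descent`, and the three learning rates are all chosen just for the demonstration.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; return every iterate."""
    w = start
    history = [w]
    for _ in range(steps):
        grad = 2.0 * w                # derivative of w**2
        w = w - learning_rate * grad  # standard gradient descent update
        history.append(w)
    return history


too_high = gradient_descent(learning_rate=1.1)
too_low = gradient_descent(learning_rate=0.001)
reasonable = gradient_descent(learning_rate=0.1)

print("too high  :", [round(w, 2) for w in too_high[:5]], "... final", round(too_high[-1], 2))
print("too low   :", "final", round(too_low[-1], 4))
print("reasonable:", "final", round(reasonable[-1], 4))
```

On this toy objective each update multiplies `w` by `1 - 2 * learning_rate`. With a rate of 1.1 the iterate flips sign every step and grows, which is the jumping between solutions described above. With 0.001 it has barely moved from its starting value after 20 steps. With 0.1 it shrinks steadily towards the minimum at 0.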

Who is the main persona this talk addresses?

I wrote my talk to be informative enough for the person who has been working with deep learning for a while but hasn't really understood it, but I also really want it to be understandable to someone who has not done any deep learning at all. So it will be simple enough for people who have had no exposure to DL. I also hope it will be insightful enough that, even if you've done machine learning for a while and don't have a deep understanding of deep learning, it will still be useful to you.

What do you want someone to leave the talk with?

I want them to leave the talk with an understanding of deep learning, but then the question is what does understanding of deep learning mean, right? In terms of the takeaways, my hope is that firstly they can understand the building blocks of neural networks, what they're made of, how they're trained. And secondly that they can understand why some hacks improve performance.


Jessica Yung

Machine Learning blogger and entrepreneur, Self-Driving Car Engineer Scholar @nvidia

Jessica is a research masters student in machine learning at University College London supervised by Prof. John Shawe-Taylor and André Barreto (Google DeepMind). She was previously at the University of Cambridge and was an NVIDIA Self-Driving Car Engineer Scholar. She applied machine learning to...



Churchill, G flr.


AI/Machine Learning without a PhD


Deep Learning, Machine Learning, London, Interview Available


From the same track


How to Prevent Catastrophic Failure in Production ML Systems

AI systems can fail catastrophically and without warning, a characteristic not welcomed in the corporate environment. Martin will describe the unpredictable nature of artificial intelligence systems and how mastering a handful of engineering principles can mitigate the risk of failure. You’ll...

Martin Goodson

Chief Scientist/CEO @EvolutionAI

SESSION + Live Q&A Machine Learning

Test Driven Machine Learning

Software engineers are familiar with test driven development, but are not familiar with the statistical testing required in machine learning. Machine learning specialists are familiar with testing during the model building phase when they withhold data for cross-validation or final testing, but...

Detlef Nauck

Chief Research Scientist for Data Science @BTGroup and Visiting Professor @bournemouthuni

SESSION + Live Q&A London

H2O's Driverless AI: An AI that creates AI

Through my kaggle journey to the top spot, I have noticed that many of the things I do as a data scientist can be automated. In fact automation is critical to achieve good scores and promote accountability, ensuring that common pitfalls in the modelling process are prevented. Through...

Marios Michailidis

Competitive Data Scientist @h2oai


Intuition & Use-Cases of Embeddings in NLP & Beyond

Machine Learning has achieved tremendous advancements in language tasks over the last few years (think of technologies like Google Duplex, Google Translate, Amazon Alexa). One of the fundamental concepts underpinning this progress is the concept of word embeddings (using something like the...

Jay Alammar

VC and Machine Learning Explainer @STVcapital

UNCONFERENCE + Live Q&A Open Space

AI/Machine Learning Open Space

Shane Hastie

Director of Agile Learning Programs @ICAgile
