

BERT for Sentiment Analysis on Sustainability Reporting

Sentiment analysis is a commonly used technique to assess customer opinion of a product or brand. The data used for this purpose often consists of product reviews, which use (relatively) clear language and even come labeled (e.g. with ratings). But when companies write about their own performance, they tend to use more subtle language. According to the Global Reporting Initiative (GRI) guidelines, a sustainability report should be balanced, reflecting both the positive and negative aspects of the company's performance.

Recent advances in the field of natural language processing (NLP) have brought forth new 'general language understanding' models which have achieved strong results on a wide range of NLP tasks. One of these models is Google's BERT. In this talk, I will discuss how, in collaboration with our colleagues from the Sustainability department, we created a custom sentiment analysis model capable of detecting these subtleties, and provided them with a metric indicating the balance of a report.
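The deliverable the abstract describes, a metric indicating the balance of a report, can be sketched in plain Python. This is an illustrative assumption of what such a metric could look like, given per-sentence sentiment labels from some upstream model; it is not the exact formula used in the talk:

```python
# Hypothetical sketch of a report "balance" metric, assuming each sentence
# has already been labeled "positive", "negative", or "neutral" by a
# sentiment model. The metric definition below is an illustrative
# assumption, not the approach from the talk.

def balance_score(labels):
    """Return a score in [0, 1]: 1.0 means positive and negative
    sentences occur in equal numbers, 0.0 means the report is one-sided."""
    pos = sum(1 for label in labels if label == "positive")
    neg = sum(1 for label in labels if label == "negative")
    if pos + neg == 0:
        return 1.0  # no sentiment-bearing sentences: trivially balanced
    return 1.0 - abs(pos - neg) / (pos + neg)

# Example: 3 positive, 1 negative, 1 neutral -> 1 - 2/4 = 0.5
labels = ["positive", "positive", "negative", "neutral", "positive"]
print(balance_score(labels))  # prints 0.5
```

Neutral sentences are ignored here by design; a real implementation would also have to decide how to weight sentence length and section boundaries.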

What is the work you are doing today?

I'm a data scientist at KPMG the Netherlands, which means I'm also a consultant, so I work on projects for clients. These can be external clients like other companies, but they can also be internal; for the case in this talk, for example, we worked together with an internal department. We usually build very customized analyses for people within the company. I work in the data analytics department, and within that department there's the advanced analytics team, to which I belong. If clients have a question for which there is no premade solution, they usually come to us and we try to build something custom for them.

What are the goals for your talk?

I'm hoping that I can provide a good use case for the language models and sophisticated deep learning models that are out there, and show how you can use them: what are the steps you need to think about? I also think it can be useful for people to realize that if you do this for someone else, there are different in-between steps you have to take. You have to really get the message clear. People think that you take data, you put it in a machine, and you get an answer, but there's a whole bunch of stuff that happens in between. Showing what those steps can be is very useful.

Can you give us an idea what makes BERT different?

BERT came out, I think, a year or two ago, and it was one of the first models that was really good at taking context into account when trying to understand different words. You have these so-called Word2Vec models, which basically make a numeric representation of a word. BERT took it a step further and tried to understand language in general, which can be used for transfer learning to do different tasks such as classification or question answering. Over the last year, and also quite recently, there have been so many new models coming out that do similar things. Recently Microsoft released a model with over 17 billion parameters. It's insanely big, and it's performing better than any other model out there. There's so much development still going on, and it's really only starting. I'm hoping that the trend will also start to go more towards making more efficient models. There's still a lot to do, but these are pretty interesting times.
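The distinction she draws, static word vectors versus context-aware ones, can be illustrated with a toy sketch. The vectors and the "contextual" rule below are made up for illustration; real Word2Vec and BERT vectors are learned and have hundreds of dimensions:

```python
# Toy illustration (made-up 2-D vectors): a static embedding table, as in
# Word2Vec, gives "bank" the same vector whatever the context, while a
# contextual model like BERT produces a different vector per occurrence.
# The "contextual" rule here is a crude stand-in, not how BERT works.

STATIC = {"river": [0.1, 0.9], "bank": [0.5, 0.5], "account": [0.8, 0.2]}

def static_embed(word, context):
    # The context argument is ignored entirely -- the Word2Vec limitation.
    return STATIC[word]

def contextual_embed(word, context):
    # Stand-in for context sensitivity: shift the word's vector toward
    # the average of its neighbours' static vectors.
    neighbours = [STATIC[w] for w in context if w != word]
    avg = [sum(dim) / len(neighbours) for dim in zip(*neighbours)]
    return [(a + b) / 2 for a, b in zip(STATIC[word], avg)]

v1 = static_embed("bank", ["river", "bank"])
v2 = static_embed("bank", ["bank", "account"])
print(v1 == v2)  # prints True: same vector in both contexts

c1 = contextual_embed("bank", ["river", "bank"])
c2 = contextual_embed("bank", ["bank", "account"])
print(c1 == c2)  # prints False: context changes the representation
```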

Are these models trained on a particular corpus of text?

They are already pretrained, but you can retrain them when necessary.
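The transfer-learning recipe behind this answer — reuse a pretrained encoder and train only a small task-specific head — can be sketched in plain Python. The "frozen encoder" below is a trivial keyword-counting stand-in, not BERT, and the cue words and labels are assumptions for illustration:

```python
import math

# Sketch of the transfer-learning recipe: a frozen "encoder" turns text
# into features, and only a small classification head is trained on the
# task data. The encoder here is a trivial stand-in, NOT a real
# pretrained model; cue words and examples are illustrative assumptions.

def frozen_encoder(text):
    # Stand-in features: counts of crude positive/negative cue words.
    pos_cues = {"improved", "strong", "growth"}
    neg_cues = {"declined", "weak", "losses"}
    words = text.lower().split()
    return [sum(w in pos_cues for w in words),
            sum(w in neg_cues for w in words)]

def train_head(examples, epochs=200, lr=0.5):
    # Logistic-regression head trained by gradient descent; the encoder
    # is never updated -- only these two weights and the bias are.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = frozen_encoder(text)
            p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            err = p - label
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(params, text):
    w, b = params
    x = frozen_encoder(text)
    p = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
    return "positive" if p >= 0.5 else "negative"

data = [("revenue improved with strong growth", 1),
        ("profits declined amid weak demand", 0)]
params = train_head(data)
print(predict(params, "strong growth expected"))  # prints "positive"
```

With BERT the same idea applies, except the encoder itself can also be unfrozen and fine-tuned end-to-end on the task data.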


Susanne Groothuis

Sr. Data Scientist in the Advanced Analytics and Big Data team @KPMG

Susanne is a Data Scientist working in the Advanced Analytics and Big Data team of KPMG for the past three years. During her time at KPMG she has worked for clients in all different sectors such as governmental institutions, healthcare, transportation and others. In the most recent year she has...



Whittle, 3rd flr.


Machine Learning: The Latest Innovations


Machine Learning, Natural Language Processing, Big Data, Interview Available


Slides are not available


From the same track

SESSION + Live Q&A Interview Available

Accuracy as a Failure

When you see a green light, will you cross the street? Or will you still check for cars? When your machine learning model has demonstrated high accuracy, do you push it to production? This talk contains cautionary tales of mistakes that might happen when you let your data scientists on a goose...

Vincent Warmerdam

Research Advocate @Rasa

SESSION + Live Q&A Machine Learning

Visual Intro to Machine Learning and Deep Learning

Break into machine learning with this gentle and intuitive journey through central concepts in machine learning -- from the most basic models up to the latest cutting edge deep learning models. This highly visual presentation will give you the mental map of ML prediction models and how...

Jay Alammar

VC and Machine Learning Explainer @STVcapital

SESSION + Live Q&A Silicon Valley

Speeding Up ML Development with MLFlow

Machine Learning is more approachable than ever before and the number of companies applying Machine Learning to build AI powered applications and products has dramatically increased in recent years. On this journey of adopting Machine Learning, many companies learn successful Machine...

Hien Luu

Engineering Manager @LinkedIn focused on Big Data

UNCONFERENCE + Live Q&A Machine Learning

Machine Learning Open Space

Details to follow.

SESSION + Live Q&A Interview Available

The Fast Track to AI with Javascript and Serverless

Most people associate AI and Machine Learning with the Python language. This talk will explore how to get started building AI enabled platforms and services using full stack Javascript and Serverless technologies. With practical examples drawn from real world projects the talk will get you up and...

Peter Elger

Co-Founder & CEO @fourtheorem
