You are viewing content from a past/completed QCon

Presentation: BERT for Sentiment Analysis on Sustainability Reporting

Track: Machine Learning: The Latest Innovations

Location: Whittle, 3rd flr.

Duration: 1:40pm - 2:30pm

Day of week: Tuesday

This presentation is now available to view on InfoQ.com

What You’ll Learn

  1. Hear about some of the newest sentiment analysis models.
  2. Learn about KPMG’s model used to take subtle variations of the language into account.

Abstract

Sentiment analysis is a commonly used technique to assess customer opinion around a product or brand. The data used for these purposes often consists of product reviews, which have (relatively) clear language and are even labeled (e.g. with ratings). But when you look at what companies write about their own performance, they tend to use more subtle language. According to the Global Reporting Initiative (GRI) guidelines, a sustainability report should be balanced, thus reflecting on both the positive and negative aspects of the company's performance.

Recent advances in the field of natural language processing (NLP) have brought forth new 'general language understanding' models that have obtained great results on a wide range of NLP tasks. One of these models is Google's BERT. In this talk, I will discuss how, in collaboration with our colleagues from the Sustainability department, we created a custom sentiment analysis model capable of detecting these subtleties, and provided them with a metric indicating the balance of a report.
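The abstract does not say how the balance metric is computed. As a rough illustration only, the sketch below assumes each sentence of a report has already been labeled "positive", "negative", or "neutral" by some classifier, and scores the report by how evenly positive and negative sentences are represented. The function name `balance_score`, the label strings, and the formula are all hypothetical and are not taken from the talk.

```python
from collections import Counter


def balance_score(labels):
    """Hypothetical balance metric, not the one used in the talk.

    Returns 1.0 when positive and negative sentences are equally frequent,
    0.0 when only one polarity appears, and None when the report contains
    no positive or negative sentences at all.
    """
    counts = Counter(labels)
    pos, neg = counts["positive"], counts["negative"]
    if pos + neg == 0:
        return None  # balance is undefined without any polar sentences
    return 1 - abs(pos - neg) / (pos + neg)


# Made-up sentence labels for a single report.
labels = ["positive", "positive", "neutral", "negative", "positive"]
print(balance_score(labels))
```

Neutral sentences are ignored in this sketch, so a report that is mostly neutral with one positive and one negative sentence would still count as fully balanced. Whether that is desirable depends on how the metric is meant to be read.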

Question: 

What is the work you are doing today?

Answer: 

I'm a data scientist with KPMG the Netherlands, which means that I'm also a consultant, so I work on projects for clients. These can be external clients like other companies, but they can also be internal. For example, for the case in this talk, we worked together with an internal department. We usually make very customized analyses for people within the company. I work in the data analytics department, and within that department there's the advanced analytics team, to which I belong. If clients have a question for which there is no premade solution, they usually come to us and we will try to make something custom for them.

Question: 

What are the goals for your talk?

Answer: 

I'm hoping that I can provide a good use case for the language models or sophisticated deep learning models that are out there, and show how you can use them and which steps you need to think about. I also think it can be useful for people to realize that if you do this for someone else, there are different in-between steps you have to take. You have to really get the message clear. People think that you take data, you put it in a machine, and you get an answer, but there's a whole bunch of stuff that happens in between. Showing what those steps are can be very useful.

Question: 

Can you give us an idea what makes BERT different?

Answer: 

BERT came out, I think, a year or two ago, and it was one of the first models that was really good at taking context into account when trying to understand different words. You have these so-called Word2Vec models, which basically make a numeric representation of a word. BERT took it a step further and tried to understand language in general, which can be used for transfer learning on different tasks such as classification or question answering. Over the last year, and also quite recently, there have been so many new models coming out that do similar things. Recently Microsoft released a model which has over 17 billion parameters. It's insanely big, and it is performing better than any other model out there. There's so much development still going on, and it's really only starting. I'm hoping that the trend will also start to go more towards making more efficient models. There's still a lot to do, but these are pretty interesting times.
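To make the difference between a static word representation and a context-dependent one concrete, here is a toy sketch. It is not BERT and not Word2Vec: the two-dimensional vectors are made-up numbers, and the "contextual" function simply averages a word's vector with those of its immediate neighbours, whereas BERT uses attention over the whole input. The names `STATIC`, `static_embedding`, and `contextual_embedding` are hypothetical.

```python
# Made-up 2-d vectors, for illustration only.
STATIC = {
    "river": (1.0, 0.0),
    "bank": (0.5, 0.5),
    "money": (0.0, 1.0),
}


def static_embedding(tokens, index):
    """Word2Vec-style lookup: the vector depends only on the word itself."""
    return STATIC[tokens[index]]


def contextual_embedding(tokens, index):
    """Toy stand-in for a contextual model: average the word's vector with
    the vectors of its immediate left and right neighbours."""
    window = tokens[max(0, index - 1): index + 2]
    vectors = [STATIC[token] for token in window]
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


sentence_a = ["river", "bank"]
sentence_b = ["money", "bank"]

# The static lookup gives "bank" the same vector in both sentences.
print(static_embedding(sentence_a, 1), static_embedding(sentence_b, 1))

# The toy contextual version gives "bank" a different vector in each sentence.
print(contextual_embedding(sentence_a, 1), contextual_embedding(sentence_b, 1))
```

The point is only that the second function's output for "bank" changes with the surrounding words, which is the property the answer above attributes to BERT.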

Question: 

Are these models trained on a particular corpus of text?

Answer: 

They are already pretrained, but you can retrain them when necessary.

Speaker: Susanne Groothuis

Sr. Data Scientist in the Advanced Analytics and Big Data team @KPMG

Susanne is a Data Scientist who has been working in the Advanced Analytics and Big Data team of KPMG for the past three years. During her time at KPMG she has worked for clients in many different sectors, such as governmental institutions, healthcare, transportation and others. In the most recent year she has developed a focus on Natural Language Processing and Deep Learning, creating solutions to automate and assist in the processing of various types of documents.

She has a background in Medical Physics, lives in Amsterdam, and loves to travel and create art as a hobby.
