You are viewing content from a past/completed QCon

Presentation: Beyond the Distributed Monolith: Rearchitecting the Big Data Platform

Track: Next Generation Microservices: Building Distributed Systems the Right Way

Location: Fleming, 3rd flr.

Duration: 1:40pm - 2:30pm

Day of week: Monday

This presentation is now available to view on InfoQ.com

What You’ll Learn

  1. Hear how the BBC re-architected a distributed monolith.
  2. Find out what lessons they learned along the way.
  3. Learn why it is important to have proper tooling and to forecast future growth and cost.

Abstract

The BBC’s Audience Platform Data team collects, transforms and delivers billions of events each day from audience interactions with mobile apps and websites such as BBC News, BBC Sport, iPlayer and Sounds.

Last year we migrated to a new analytics provider, and we took this as an opportunity to re-architect our distributed monolith. We will share the lessons learnt from operating it for nearly 3 years, and how we designed our new microservices architecture so that it is easier to test, scales to cater for increasing demand, keeps track of the message flow, and can replay errors without stopping the rest of the messages from being processed.

We will also discuss the ideas behind the tooling we have developed, which helps us operate our pipeline and has helped new members of the team build the understanding required to troubleshoot problems.

We have been in production for over a year, and as demand on our big data platform increases, we are beginning to discuss what our platform may look like in the future and the steps we will go through to achieve it.

Question: 

What are you doing now?

Answer: 

I am a Principal Systems Engineer at the BBC, and within the BBC I work in the area that deals with our personalized services. What this means is that when you sign up for a BBC account, you get access to all these personalization features: you can get show recommendations, follow shows, and stay up to date by receiving notifications when things you are interested in become available. For those personalizations a lot of data is involved, from the time you create your account and provide us with your personal information, to saving all the different activities that we track throughout the website. The team I'm in is called the data platform team. We aim to be the single point of reference within the BBC where we store all this data and make it available for internal usage. There are teams of data scientists in different areas who report on the data and use it to understand whether the features and products we provide are actually serving what people need.

Question: 

What is the goal for your talk?

Answer: 

The talk is about re-architecting a distributed monolith. A few years ago we had a different data platform. It was not intentional, but it ended up being a distributed monolith. We realized this when we started having operational issues: even though it was a bunch of microservices, at the end of the day it was just really, really hard to operate. It felt like one big thing that kept blowing up, and there was no way to cope with it. For a couple of years we had to operate this platform, and there wasn't really a business case to change and re-architect the whole thing. But things change at the BBC, and when we migrated to a new analytics provider, we saw the golden opportunity to re-architect. At the BBC, a single news item can sometimes cause a great increase in data, so we can have spikes in load. How can we cope with that so that users querying the data don't really see the impact of what's going on? The main goals of my talk are to share the lessons we learned, how microservices limit failure, how you can recover, and how we cope with different loads and with data that evolves over time.

Question: 

You also mentioned in your abstract that you developed your own tooling to operate your pipelines. Can you give us a little preview of what tooling you developed?

Answer: 

Because this microservices architecture can be quite complex, if you're operating it you have to know the names of the services, where they reside, how to deploy them: a lot of things you need to keep track of. Most companies have what are called runbooks. At the end of the day, runbooks are wiki documents that give you the links to all these things, and there's a lot of nitty-gritty detail you have to know. The interesting thing is that we had a lot of new people joining the team, and we thought they should only really need to understand things at a high level, such as how to connect to certain services. So we developed a command-line interface to which you can pose very simple questions. Developing this command-line interface has also allowed us to automate a lot of these tasks, so we don't have to intervene manually as we used to back in the days of the distributed monolith.

Question: 

What would be the key takeaways?

Answer: 

The big lesson is that microservices architectures are always evolving. Whenever you build something, always assume it should be easy to change. If you make the different components easily replaceable, it is going to make your life easier in the future. Another big lesson is that when you are designing, think ahead about how you would like to operate the new architecture. If you only think about it at the end, you probably haven't addressed those questions, don't have enough metrics, or will find the architecture really hard to expand. Also, invest in testing early: think about unit tests from the start and create a framework for testing. Another point is the need to discuss which technologies to use, so that people can choose the language for their microservice while taking full ownership and responsibility for their work. And the last one: how do you do all this in a cost-effective way? We spent quite a bit of time on cost forecasting, looking at how different technologies would impact the cost.

Speaker: Blanca Garcia-Gil

Principal Engineer on data platform @BBC

Blanca Garcia-Gil is a principal systems engineer at the BBC. She currently works on a team whose aim is to provide a reliable platform at petabyte scale for data engineering and machine learning. She provides leadership on ensuring that the development team has the correct infrastructure and tooling required for the entire delivery and support cycles of the project.

Prior to the BBC, she held a variety of roles, from developing web applications for an agency and mobile prototyping (before smartphones came about!) to developing a content management system and writing highly scalable APIs. She has always enjoyed working closer to the backend, and since she started developing in the cloud she has taken on the challenge of learning about infrastructure. This recently led her to move to an infrastructure automation and reliability role.
