Presentation: Patterns of reliable in-stream processing @ Scale

Duration: 10:35am - 11:25am
Key Takeaways

  • Hear the reasoning behind, and the capabilities of, deploying a Kappa architecture.
  • Understand how one company maintains data precision in a global, high-volume (1.5M msg/sec) streaming environment.
  • Understand implementation patterns and use cases for solving high-volume streaming problems with Apache Storm and Kafka.

Abstract

Modern data streaming systems process millions of messages per second. To extract value from their data, organizations employ horizontally scalable distributed event processors such as Apache Storm. Such architectures are frequently designed under the assumption that data loss and calculation errors are acceptable.

In other cases, a Kappa architecture is used to fulfill performance requirements without sacrificing consistency and reliability. Given typical data consistency and performance requirements, external state and reliance on the wall clock become a taxing and hard-to-maintain choice.

In this talk, we will discuss how we handled the challenges of building a 1.5M msg/sec global processing system with Apache Storm and Apache Kafka at Integral Ad Science. We will inspect technology-agnostic patterns that reemerge across multiple applications, including the stream rewind technique, volatile in-memory state, derived logical time and synchronization, the benefits of component collocation, and precision/performance trade-offs.

Interview

QCon: Can you tell me about the work you are doing today?
Alexey: I am leading development of real-time infrastructure for Integral Ad Science, plus a few projects in adjacent areas like internet crawling, data science, and data warehousing. I take care of the technical architecture and vision.
QCon: So what were some of the issues that were challenging with your architecture?
Alexey: When people talk about real time, they implicitly assume you can lose precision or lose data during processing. That is not the case for us.
Integral is a media quality measurement company, so data precision is very important to us. While we had to make a lot of trade-offs, we couldn’t afford to lose data. So we had to come up with something different from a Lambda architecture. Some people call it a Kappa architecture.
The idea behind a Kappa architecture is to keep your data in Kafka (a stream broker) and then process it in volatile memory. Kappa architectures avoid any sort of intermediate storage.
Frequently, when people use stream processing systems, the idea is to use Cassandra, HBase, or something like that, so the state of the system is maintained in persistent storage. With our workload that was not possible, so we had to maintain everything in memory. Because of that, we reused some patterns from financial data processing.
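As a rough illustration of the volatile in-memory state pattern (all names here are hypothetical, not Integral’s actual code): per-key state lives entirely in process memory, and durability comes from replaying the retained stream rather than from writing state to an external store like Cassandra or HBase.

```python
from collections import defaultdict

class InMemoryAggregator:
    """Hypothetical sketch: per-key counters kept in process memory only."""

    def __init__(self):
        self.counts = defaultdict(int)  # volatile state: lost on crash

    def process(self, message):
        key, value = message
        self.counts[key] += value

    def rebuild(self, log):
        """Recover after a crash by replaying the retained log."""
        self.counts.clear()
        for message in log:
            self.process(message)

# A plain list stands in for a retained Kafka topic.
log = [("ad-1", 1), ("ad-2", 1), ("ad-1", 1)]

agg = InMemoryAggregator()
for m in log:
    agg.process(m)

# Simulated crash: a fresh process rebuilds identical state from the log.
recovered = InMemoryAggregator()
recovered.rebuild(log)
assert recovered.counts == agg.counts
```

The design point is that the log, not the processor, is the source of durability: losing the in-memory state is acceptable because it is always reconstructible.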
Another challenge was that we cannot rely on wall clocks for our use cases. The idea is that you store all the data in Kafka or some persistent storage, and if you need to recover from a failure, you replay the data from Kafka. But in this case you cannot use the wall clock, so you have to rely on some sort of logical time.
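A hypothetical sketch of that idea (not the talk’s actual code): logical time is derived from timestamps carried inside the messages, never from the machine’s clock, so replaying the same messages reproduces the same timeline and the same window assignments.

```python
def process_stream(messages, window=60):
    """Advance a logical clock and bucket payloads into fixed time windows.

    Each message is a (event_time, payload) pair; event_time is the
    message's own timestamp, the only notion of time ever consulted.
    """
    logical_now = 0
    windows = {}
    for event_time, payload in messages:
        logical_now = max(logical_now, event_time)  # monotonic logical clock
        bucket = event_time // window * window      # window by event time
        windows.setdefault(bucket, []).append(payload)
    return logical_now, windows

messages = [(10, "a"), (75, "b"), (70, "c"), (130, "d")]
first_now, first = process_stream(messages)
replay_now, replayed = process_stream(messages)  # recovery replay

# Wall-clock-based windowing could not make this guarantee after a replay.
assert first == replayed and first_now == replay_now
```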
QCon: Can you explain a bit more about a Kappa architecture, and how it’s different from Lambda architectures?
Alexey: Lambda architecture is basically the idea that you have a real-time view and an offline view of the data.
At the end of the day, you take the same data, process it through your streaming system, and also run it through an offline job. You use the offline results as a more precise view than the real-time view. This means that if your real-time system goes down, you can always recover by running this offline batch processing.
With a Kappa architecture, you actually rely on your message broker as the system of record, so you don’t have an offline backup system. All your data is in Kafka, and if your real-time pipeline fails, you rewind your stream in Kafka and redo the processing.
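A minimal sketch of that rewind-and-reprocess recovery, with a plain Python list standing in for a retained Kafka partition (class and function names are illustrative, not a real Kafka API):

```python
class Log:
    """Stand-in for a Kafka topic partition with retained messages."""

    def __init__(self, messages):
        self.messages = list(messages)

    def read_from(self, offset):
        """Yield (offset, message) pairs starting at a given offset."""
        return enumerate(self.messages[offset:], start=offset)

def process(log, start_offset=0):
    """Consume the log from an offset, returning a total and last offset."""
    total, last_offset = 0, start_offset
    for offset, value in log.read_from(start_offset):
        total += value
        last_offset = offset
    return total, last_offset

log = Log([5, 3, 7, 2])
total, checkpoint = process(log)                 # initial run over the stream
rewound_total, _ = process(log, start_offset=0)  # pipeline "fails": rewind to
                                                 # offset 0 and reprocess
assert rewound_total == total
```

Because the broker retains the messages, recovery is just resetting the consumer offset and re-running the same processing, rather than restoring from a separate batch system.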
