Presentation: Resilient Predictive Data Pipelines

Duration: 1:40pm - 12:40pm

Abstract

Big Data companies (e.g., LinkedIn, Facebook, Google, and Twitter) have historically built custom data pipelines over bare metal in custom-designed data centers. To meet strict requirements on data security, fault tolerance, cost control, job scalability, and uptime, they need to closely manage their core technology. Like serving systems (e.g., web application servers and OLTP databases), which must be up 24x7 to display content to users, data pipelines must be up and running to pick the most engaging and up-to-date content to display. In other words, updated ranking models, new content recommendations, and the like are what make data pipelines an integral part of an end user’s web experience.

Agari, a leading email security company, is applying big data best practices to the problem of protecting its customers from email-borne threats. Like many start-ups before it, Agari runs a lean organization that leverages the cloud (AWS) for its infrastructure needs. However, to meet the needs of our business, we at Agari have designed and implemented a system that combines the best of both the cloud (AWS's SNS, SQS, Kinesis, Auto Scaling, S3, Lambda, API Gateway, etc.) and Big Data tooling (Spark, Airbnb's Airflow, etc.) in a maintainable way.
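The abstract does not include implementation details, but the resilience pattern it alludes to — at-least-once message processing with retries and a dead-letter queue, as SQS provides — can be sketched in plain Python. The names and parameters below are illustrative only, not Agari's actual code:

```python
import time
from typing import Callable, List


def process_with_retries(messages: List[str],
                         handler: Callable[[str], None],
                         max_attempts: int = 3,
                         base_delay: float = 0.01) -> List[str]:
    """Process each message, retrying transient failures with
    exponential backoff. Messages that still fail after all
    attempts are returned so the caller can route them to a
    dead-letter queue (as an SQS redrive policy would)."""
    dead_letter = []
    for msg in messages:
        for attempt in range(max_attempts):
            try:
                handler(msg)
                break  # processed successfully; move to next message
            except Exception:
                if attempt == max_attempts - 1:
                    # retries exhausted: park the message for later inspection
                    dead_letter.append(msg)
                else:
                    # back off exponentially before retrying
                    time.sleep(base_delay * (2 ** attempt))
    return dead_letter
```

In a production pipeline this loop would be an SQS consumer with a configured redrive policy rather than hand-rolled code; the sketch only shows the retry/backoff/dead-letter control flow that keeps one bad message from stalling the pipeline.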

Tracks

Covering innovative topics


Conference for Professional Software Developers