Track:

Data Engineering : Where the Rubber meets the Road in Data Science

Location:

Windsor, 5th flr.

Duration

Duration:

11:50am - 12:40pm

Day of week:

Monday

Level:

Intermediate

Persona:

Data Scientist

Abstract

Any data scientist who works with real data will tell you that the hardest part of any data science task is the data preparation. Everything from cleaning dirty data to understanding where your data is missing and how your data is shaped, the care and feeding of your data is a prime task for the working data scientist.

I will describe my experiences in the field and present some useful open source software to automate some of the necessary but insufficient things that I do every time I'm presented new data. In particular, we'll talk about discovering missing values, values with skewed distributions and discovering likely errors within your data, as well as a novel approach at finding data interconnectedness based on usage using unsupervised learning.

I will describe the impact of these lessons to team construction and how to avoid some of the most painful lessons.

Speaker: Casey Stella

Committer and PMC member on the Apache Metron project

I am a committer and PMC member on the Apache Metron project in the engineering team at Hortonworks. In the past, I've worked as an architect and senior engineer at a healthcare informatics startup spun out of the Cleveland Clinic, as a developer at Oracle and as a Research Geophysicist in the Oil & Gas industry. I specialize in writing software and solving problems where there are either scalability concerns due to large amounts of traffic or large amounts of data. I have a particular passion for data science problems or any thing mathematical.

Find Casey Stella at

Speaker page

@casey_stella

https://www.linkedin.com/in/casey-stella-84b9a11

Similar Talks

Tracks

Architecting for Failure

Building fault tolerate systems that are truly resilient
Architectures You've Always Wondered about

QCon classic track. You know the names. Hear their lessons and challenges.
Modern Distributed Architectures

Migrating, deploying, and realizing modern cloud architecture.
Fast & Furious: Ad Serving, Finance, & Performance

Learn some of the tips and technicals of high speed, low latency systems in Ad Serving and Finance
Java - Performance, Patterns and Predictions

Skills embracing the evolution of Java (multi-core, cloud, modularity) and reenforcing core platform fundamentals (performance, concurrency, ubiquity).
Performance Mythbusting

Performance myths that need busting and the tools & techniques to get there

Dark Code: The Legacy/Tech Debt Dilemma

How do you evolve your code and modernize your architecture when you're stuck with part legacy code and technical debt? Lessons from the trenches.
Modern Learning Systems

Real world use of the latest machine learning technologies in production environments
Practical Cryptography & Blockchains: Beyond the Hype

Looking past the hype of blockchain technologies, alternate title: Weaselfree Cryptography & Blockchain
Applied JavaScript - Atomic Applications and APIs

Angular, React, Electron, Node: The hottest trends and techniques in the JavaScript space
Containers - State Of The Art

What is the state of the art, what's next, & other interesting questions on containers.
Observability Done Right: Automating Insight & Software Telemetry

Tools, practices, and methods to know what your system is doing

Data Engineering : Where the Rubber meets the Road in Data Science

Science does not imply engineering. Engineering tools and techniques for Data Scientists
Modern CS in the Real World

Applied, practical, & real-world dive into industry adoption of modern CS ideas
Workhorse Languages, Not Called Java

Workhorse languages not called Java.
Security: Lessons Learned From Being Pwned

How Attackers Think. Penetration testing techniques, exploits, toolsets, and skills of software hackers
Engineering Culture @{{cool_company}}

Culture, Organization Structure, Modern Agile War Stories
Softskills: Essential Skills for Developers

Skills for the developer in the workplace

LAST YEAR'S SCHEDULE

Location:

Duration

Day of week:

Level:

Persona:

Abstract

Find Casey Stella at

Similar Talks

Tracks

Conference for Professional Software Developers

Follow QCon

Contact

Menu

QCons around the World

Presentation: Data Cleansing and Understanding Best Practices

Location:

Duration

Day of week:

Level:

Persona:

More talks on:

Abstract

Find Casey Stella at

Similar Talks

Tracks

Conference for Professional Software Developers

Follow QCon

Contact

Menu

QCons around the World