Learning Path: Training, Tracking, Testing and Deploying ML

Location: Albert, 2nd flr.

Duration: 9:00am - 4:00pm

Day of week: Thursday, Friday

Level: Beginner

Prerequisites

Attendee Prerequisites:

To attend this workshop you should:

  • Be very familiar with Python, which is used for all code in the workshop. You should also be familiar with scikit-learn and pandas, because the workshop focuses less on training a machine learning model and more on how to track and deploy it.
  • Have a basic understanding of sh, bash or zsh
  • Understand the concept of a Docker container
  • Ideally be familiar with Kubernetes and/or Terraform (a plus, not a requirement)

Attendee Tools/Equipment:

  • You need a laptop with WiFi capability
  • You need a GitLab account (a free one is enough)
  • We will provide virtual machines with the necessary tooling installed.
  • We will provide access to an AWS account for the time of the workshop.
  • We will provide all code required for the workshop via git

Documentation of the required tools, as well as a setup script for Mac and Linux computers, will be provided in advance. However, we will likely not be able to resolve errors related to local installations during the workshop, so if you don’t feel comfortable resolving them yourself, please choose the VM option.

Learning and implementing ML and AI methodologies is increasingly on many developers’ to-do lists. Getting started with these topics is relatively easy thanks to recent easy-to-use, ready-made libraries. However, it is more difficult to keep track of multiple experiments and to deploy them into production in a reproducible fashion using CI/CD. This workshop is aimed at developers who want to learn the tooling to manage these topics.

The value of this workshop is that it provides a guided walkthrough to a working setup that addresses many common problems when deploying machine learning (and other services) into production. In particular, the workshop provides templates for CI/CD pipelines, Kubernetes deployments, and machine learning training and tracking. Once these templates are well understood, you should be able to transfer them to other problems in your day job, which is a great advantage over learning on your own and resolving every bug yourself along the way. Additionally, this workshop offers a good overview of the possibilities of automation in DevOps.

This is a two-day workshop and you should join if any of the following points applies to you:

  • You are a developer and want to expand your knowledge in DevOps and traceable machine learning
  • You are a technical lead, you can code and you want to see DevOps in action
  • You need to find a way to reliably deploy machine learning to production

The objectives of this workshop are that you can:

  • Use the provided Terraform files to set up a Kubernetes cluster in AWS that runs an MLflow instance logging to a database located in the cluster
  • Train a model in scikit-learn that is tracked by MLflow
  • Write a CI/CD pipeline that tests and lints your code and automatically builds a Docker image
  • Deploy a model using MLflow and the Docker image you built
  • Test the deployed model to ensure it provides reasonable responses

It is explicitly not the goal of this workshop to introduce the tools in full detail. The focus is to provide attendees with a “starter kit” of working code and knowledge that can be expanded upon. The workshop provides guidance and starting points that you can use in your day-to-day job.

As stated in the objectives, we will start by setting up a Kubernetes cluster with the services we will need, namely MLflow and a PostgreSQL database. For this we will use a provided Terraform script, which we will explain in detail and which every attendee will deploy individually.

This will leave everyone with their individual EKS cluster and services.

Then, we will write code to train a machine learning model. This code will be tested, linted and dockerized using a GitLab CI/CD pipeline. Once the Docker image is available, we will run it as a Kubernetes Job, which stores the final model in MLflow.

On the second day, we will build a service that hosts the model we trained and exposes a REST API. This service will again be tested, linted and dockerized by a GitLab CI/CD pipeline. Then we will adapt our Terraform files to deploy the new service into our Kubernetes cluster.
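To give a rough idea of what such a model-hosting service does, here is a minimal sketch using only the Python standard library. It is not the workshop's code: the `/predict` route, the `{"rows": ...}` request shape, and the `predict` placeholder (which simply averages each row instead of calling a trained model) are all assumptions made for illustration.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(rows):
    """Placeholder for a trained model (assumption): mean of each feature row."""
    return [sum(row) / len(row) for row in rows]


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # hypothetical route name
            self.send_error(404, "unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            rows = json.loads(self.rfile.read(length))["rows"]
            body = json.dumps({"predictions": predict(rows)}).encode("utf-8")
        except (ValueError, KeyError, TypeError, ZeroDivisionError):
            self.send_error(400, "expected a JSON object with a 'rows' list")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging to keep the example output short


def start_server(port=0):
    """Start the service on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictionHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def query(server, rows):
    """POST feature rows to the running service and return the parsed reply."""
    conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        conn.request(
            "POST",
            "/predict",
            body=json.dumps({"rows": rows}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()


# Start the service, send one request, then shut it down again.
server = start_server()
result = query(server, [[1.0, 2.0, 3.0], [4.0, 6.0]])
server.shutdown()
server.server_close()
print(result)
```

In a real deployment the placeholder would be replaced by a model loaded from the tracking server, and the process would run inside the Docker image rather than on a background thread.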

Finally, we will test our deployment and query it for predictions.
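One way to check that a deployment returns reasonable responses is to validate the shape and range of its predictions. The sketch below is an illustration under stated assumptions, not the workshop's test suite: the `{"predictions": [...]}` response format, the function name `check_predictions`, and the expected value range are hypothetical.

```python
import json
import math


def check_predictions(response_body, n_rows, low, high):
    """Return a list of problems; an empty list means the response looks sane."""
    try:
        predictions = json.loads(response_body)["predictions"]
    except (ValueError, KeyError, TypeError):
        return ["response is not JSON with a 'predictions' field"]
    if not isinstance(predictions, list):
        return ["'predictions' is not a list"]

    problems = []
    if len(predictions) != n_rows:
        problems.append(f"expected {n_rows} predictions, got {len(predictions)}")
    for i, value in enumerate(predictions):
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not math.isfinite(value):
            problems.append(f"prediction {i} is not a finite number: {value!r}")
        elif not low <= value <= high:
            problems.append(f"prediction {i} = {value} is outside [{low}, {high}]")
    return problems


# Example response bodies (made up for illustration).
good = b'{"predictions": [0.2, 0.9]}'
bad = b'{"predictions": [0.2, 7.5]}'
print(check_predictions(good, n_rows=2, low=0.0, high=1.0))
print(check_predictions(bad, n_rows=2, low=0.0, high=1.0))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything that is wrong with a response in a single test run.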

Speaker: Jendrik Jördening

Data Scientist @Nooxit

Jendrik is Head of Data Science at Nooxit. He formerly worked at Aurubis and Akka Germany on Data Science and Deep Learning in the field of industry 4.0 and autonomous machines.
At the same time he took part in the Udacity Self-Driving Car Nanodegree, participating with a group of other Udacity students in the Self-Racing Cars event at the Thunderhill race track in California.
There, the group of students taught a car to drive around every turn of the race track autonomously.
