Hands-on AI for Vision: From Diffusion to Production

Dive into advanced image generation and computer vision techniques in this hands-on workshop. Participants will explore the evolution of image generation models, learn to perform inpainting, outpainting, and image-to-image transformations, and master several methods of fine-tuning image generation models. On the computer vision side, attendees will experiment with powerful zero-shot open-source tools for tagging, object detection, OCR, and segmentation, culminating in next-level use cases like visual question answering and image captioning. Finally, discover best practices for productionizing your models, from optimizing inference and hardware utilization to deploying scalable pipelines in Vertex AI. By the end, you’ll be equipped with the expertise needed to integrate next-generation Vision AI solutions into real-world applications.

Key Takeaways

1 Understand diffusion models and practical image generation techniques.

2 Learn to fine-tune image generation models with adapters or full-weight training.

3 Explore zero-shot computer vision frameworks for tagging, object detection, OCR, and segmentation.

4 Implement visual question answering and image captioning using multimodal LLMs.

5 Optimize and deploy AI solutions using best practices, hardware strategies, and scalable cloud infrastructure.


Speaker

Iaroslav Amerkhanov

Senior Data Scientist @Delivery Hero

Iaroslav pioneered projects in Food Science at Delivery Hero and is now focused on generative AI solutions. He previously founded an EdTech startup and co-founded a sentiment analysis platform.


Date

Friday Apr 11 / 09:00 AM BST (7 hours)

Level

Intermediate

Prerequisites

  • Familiarity with Python programming and basic command-line operations.  
  • Solid understanding of machine learning concepts and experience with PyTorch or TensorFlow.
  • A laptop with the following software installed: Python 3.8+, PyTorch, relevant libraries (Torchvision, diffusers, OpenCV), and Docker (recommended).
  • Access to Google Cloud Platform (GCP).
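
One way to check a laptop against the software prerequisites before the session is a short script like the sketch below. It is not part of the official workshop material: the function names `check_environment` and `docker_available` are hypothetical, and the module names (`torch`, `torchvision`, `diffusers`, `cv2`) are the usual import names for the libraries listed above. It only verifies that the packages can be found, not that their versions match what the workshop will use.

```python
import importlib.util
import shutil
import sys

# Usual import names for the listed libraries (OpenCV is imported as "cv2").
REQUIRED_MODULES = ["torch", "torchvision", "diffusers", "cv2"]
MIN_PYTHON = (3, 8)


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")
    for name in modules:
        try:
            # find_spec locates the module without importing it.
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is None:
            problems.append(f"missing Python module: {name}")
    return problems


def docker_available():
    """Return True if a `docker` executable is on PATH (Docker is only recommended)."""
    return shutil.which("docker") is not None


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("PROBLEM:", issue)
    if not issues:
        print("Python and required libraries look fine.")
    if not docker_available():
        print("NOTE: docker not found on PATH (recommended, not required).")
```

Running the script prints one line per missing prerequisite. GCP access is not checked here, since that depends on how each participant's account is set up.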