
Building and Scaling Inference Workloads on Amazon EKS (Workshop)
Steve Messenger | Senior Specialist Solution Architect at AWS
Chhavi Negi | Senior Accelerated Compute Sales Specialist at AWS
Chris Merrett | Principal Consultant at Steamhaus
Join us for an immersive hands-on workshop exploring how to build and scale production-ready inference deployments on Amazon EKS using NVIDIA GPUs. As organizations move beyond experimentation to production deployment of GenAI applications, Kubernetes has emerged as a preferred platform for managing inference workloads at scale, offering robust orchestration, cost optimization, and enterprise-grade reliability.
Whether you're looking to deploy your first language model or scale existing inference workloads, this workshop will provide you with best practices and hands-on experience using industry-leading tools and frameworks. Learn directly from AWS experts who have helped organizations successfully deploy and manage large-scale GenAI infrastructure.
Through hands-on labs and real-world examples, you'll learn to master:
- EKS cluster setup optimised for NVIDIA GPU workloads
- Efficient model serving and scaling using vLLM
- Distributed inference architecture implementation with Ray
- Comprehensive monitoring and observability using Prometheus and Grafana
- Best practices for production GenAI deployments on Kubernetes
- Agentic AI on Amazon EKS
Who should attend: DevOps/MLOps engineers, AI/ML developers, solution architects, and product owners (or those in similar roles).
Please bring your laptop, as this is a hands-on workshop.
Registration for this workshop will be available for ticket holders in the days prior to the event.



More workshops TBA!