Modern AI systems are no longer bottlenecked by models; they are bottlenecked by infrastructure. Training models, processing terabytes of data, deploying LLMs, and orchestrating GPU clusters all require tooling that simplifies distributed systems.
Ray is that tooling.
This repository contains everything used in my 60-minute workshop on AI Infrastructure with Ray. Each folder contains:
- A baseline implementation using traditional Python / PyTorch / multiprocessing
- A Ray-powered implementation showing how the same workflow becomes scalable, cleaner, and fault-tolerant
If you are new to Ray, this repo will help you understand not just the APIs, but how Ray changes the way ML engineers build systems.
Inside this repo you will learn how to:

- Turn ordinary Python functions and classes into distributed workloads with Ray Tasks & Actors
- Load, stream, preprocess, and batch terabytes of text or images with Ray Data's Pythonic interface
- Scale PyTorch/HuggingFace/PEFT training with Ray Train, with fault tolerance and automatic checkpointing
- Deploy, autoscale, route, batch, and version models with Ray Serve, including vLLM deployments
- Use vLLM with Ray Serve for fast, production-grade LLM inference
    ray_tutorials/
    ├── ray_core/
    │   ├── baseline       # Standard Python multiprocessing / threading examples
    │   ├── ray_tasks      # Same examples rewritten using Ray Tasks & Actors
    │   └── ray_actors     # How to run on real Ray clusters (VMs, LAN, K8s)
    │
    ├── ray_data/
    │   ├── baseline       # Pandas, plain Python data pipelines
    │   └── ray_version    # Ray Data: distributed loading, batching, streaming
    │
    ├── ray_train/
    │   ├── baseline       # Single-GPU PyTorch training (DDP optional)
    │   └── ray_version    # Ray Train distributed training, FT, checkpoints
    │
    ├── ray_serve/
    │   ├── baseline       # Simple Flask/FastAPI serving patterns
    │   └── ray_version    # Ray Serve deployments, autoscaling, routing, batching
    │
    ├── ray_tune/
    │   └── examples       # will be added soon
    │
    └── vllm_examples/
Every module includes:
- Baseline Python code
- Ray implementation
- Explanations + comments
- Cluster-ready examples
- Learn how to scale experiments without rewriting everything
- Build reproducible ML pipelines
- Run multi-GPU training in your lab or on the cloud
- Build data pipelines, training pipelines, and serving pipelines
- Turn your laptop code into distributed code
- Deploy LLMs with autoscaling and batching
- Build production ML infra without managing complex distributed systems
- Replace 5–6 tools with a unified Ray-based workflow
- Save engineering time and avoid infrastructure glue code
This workshop is built around a simple idea:

> "Distributed ML should not require learning distributed systems."
Ray lets you scale your code using:
- your existing Python functions
- your existing PyTorch models
- your existing HuggingFace workflows
- your existing serving patterns
No MPI. No Kubernetes YAML. No Spark jobs. No complicated Docker setups (unless you want them).
You get the power of a large-scale distributed system with the simplicity of standard Python.
Feel free to open issues or PRs if you have improvements, bug fixes, or new examples.