Workers are the containerized environments that run your code on Runpod Serverless. After creating and testing your handler function, you need to package it into a Docker image and deploy it to an endpoint. This page provides an overview of the worker deployment process.
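For reference, the code you package is just a Python file that registers a handler with the Runpod SDK. A minimal sketch (the file name and payload fields are illustrative):

```python
# rp_handler.py -- minimal Serverless worker entry point (illustrative).
import runpod

def handler(job):
    # Each job carries its request payload under the "input" key.
    name = job["input"].get("name", "world")
    return {"greeting": f"Hello, {name}!"}

# Hand the handler to the Runpod SDK, which runs the worker's job loop.
runpod.serverless.start({"handler": handler})
```

There are two ways to get this code onto an endpoint: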
Deploy from a container registry: Build your Docker image locally, push it to Docker Hub (or another container registry), and deploy it to Runpod. This gives you full control over the build process and lets you test the image locally before deployment.
Deploy from GitHub: Connect your GitHub repository to Runpod and deploy directly from your code. Runpod automatically builds the Docker image from your repository and deploys it to an endpoint, streamlining deployment and enabling continuous deployment workflows.
To deploy workers that use AI/ML models, follow this order of preference (the code sketch after this list illustrates each loading pattern):
Use cached models: If your model is available on Hugging Face (public or gated), this is the recommended approach. Cached models provide the fastest cold starts, eliminate download costs, and persist across worker restarts.
Bake the model into your Docker image: If your model is private and not available on Hugging Face, embed it directly in your worker’s container image using COPY or RUN wget. This ensures the model is always available, but it increases image size and build time.
Use network volumes: You can use network volumes to store models and other files that need to persist between workers. Models load more slowly from network storage than cached or baked models, so use this option only when the preceding approaches don’t fit your needs.
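The three approaches differ only in where the weights come from at load time. A minimal sketch, assuming a transformers-based model; the model IDs and paths below are placeholders, and /runpod-volume is where Runpod mounts network volumes on Serverless workers:

```python
from transformers import AutoModelForCausalLM

# 1. Cached model: reference the Hugging Face model ID. If the model is
#    cached on the host, the weights load without being re-downloaded.
model = AutoModelForCausalLM.from_pretrained("org/example-model")

# 2. Baked model: load from a path that was copied into the Docker image
#    at build time (e.g. with a COPY instruction in the Dockerfile).
model = AutoModelForCausalLM.from_pretrained("/app/models/example-model")

# 3. Network volume: load from the volume mount point. Slowest option,
#    since weights stream from network storage.
model = AutoModelForCausalLM.from_pretrained("/runpod-volume/models/example-model")
```

Whichever pattern you choose, load the model at module level (outside your handler) so the weights are read once per worker rather than once per request.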
Active workers: “Always on” workers that eliminate cold start delays. They never scale down, so you are charged for as long as they run, but at a discounted rate (up to 30% less than flex workers). (Default: 0).
Flex workers: “Sometimes on” workers that scale up during traffic surges and transition back to idle after completing their jobs. (Default: max_workers - active_workers = 3).
The system may also add extra workers during traffic spikes when your Docker image is already cached on host servers. (Default: 2).
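A quick arithmetic sketch of how these defaults combine (the variable names are illustrative, not SDK fields):

```python
# Default pool composition for a new endpoint (illustrative names).
max_workers = 3
active_workers = 0                            # "always on" workers
flex_workers = max_workers - active_workers   # 3 "sometimes on" workers
extra_workers = 2                             # may be added during spikes
```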
Workers move through different states as they handle requests and respond to changes in traffic patterns. Understanding these states helps you monitor and troubleshoot your workers effectively.
Initializing: The worker is starting up: the system downloads and prepares the Docker image, then the container starts and loads your code.
Idle: The worker is ready but not processing requests. No charges apply while idle.
Running: The worker actively processes requests. Billing occurs per second.
Throttled: The worker is ready but temporarily unable to run due to host machine resource constraints.
Outdated: The system has marked the worker for replacement following an endpoint update. It continues processing its current jobs while the rolling update replaces workers in batches (10% of max workers at a time).
Unhealthy: The worker has crashed due to Docker image issues, incorrect start commands, or machine problems. The system automatically retries with exponential backoff for up to 7 days.
You can view the state of your workers on the Workers tab of the endpoint details page in the Runpod console. The tab shows real-time information about each worker’s current state, resource utilization, and job processing history, so you can monitor performance and troubleshoot issues as they arise.
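You can also check worker states programmatically. A minimal sketch using the runpod Python SDK; the credentials are placeholders, and the exact shape of the health response may vary, so inspect it for your endpoint:

```python
import runpod

runpod.api_key = "YOUR_API_KEY"            # placeholder API key
endpoint = runpod.Endpoint("ENDPOINT_ID")  # placeholder endpoint ID

# health() summarizes the endpoint, including worker counts by state,
# e.g. {"workers": {"idle": 2, "running": 1}, "jobs": {...}}.
print(endpoint.health())
```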
By default, each Runpod account can allocate a maximum of 5 workers (flex and active combined) across all endpoints. If your account balance exceeds a certain threshold, you can increase this limit.