Programmatic Cluster Scaling

ray.autoscaler.sdk.request_resources
Within a Ray program, you can command the autoscaler to scale the cluster up to a desired size with the request_resources() call. The cluster will immediately attempt to scale to accommodate the requested resources, bypassing normal upscaling speed constraints.
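For instance, a driver that is about to submit a large batch of CPU-bound tasks can request capacity up front so the autoscaler begins adding nodes before the tasks queue up. The sketch below assumes a running autoscaling cluster reachable via `ray.init(address="auto")`; the `process_shard` task is purely illustrative:

```python
import ray
from ray.autoscaler.sdk import request_resources

ray.init(address="auto")  # connect to the running autoscaling cluster

# Ask the autoscaler for 100 CPUs before submitting the workload,
# so nodes start launching while the tasks are still being created.
request_resources(num_cpus=100)

@ray.remote(num_cpus=1)
def process_shard(i):
    # Hypothetical CPU-bound task used for illustration.
    return i * i

results = ray.get([process_shard.remote(i) for i in range(100)])
```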
- ray.autoscaler.sdk.request_resources(num_cpus: int | None = None, bundles: List[dict] | None = None) -> None
  Command the autoscaler to scale to accommodate the specified requests.

  The cluster will immediately attempt to scale to accommodate the requested resources, bypassing normal upscaling speed constraints. This takes into account existing resource usage.

  For example, suppose you call request_resources(num_cpus=100) and there are 45 currently running tasks, each requiring 1 CPU. Then enough nodes will be added so that up to 100 tasks can run concurrently. It does not add enough nodes so that 145 tasks can run.

  This call is only a hint to the autoscaler. The actual resulting cluster size may be slightly larger or smaller than expected depending on the internal bin packing algorithm and max worker count restrictions.

  Parameters:
- num_cpus – Scale the cluster to ensure this number of CPUs are available. This request is persistent until another call to request_resources() is made to override. 
- bundles (List[ResourceDict]) – Scale the cluster to ensure this set of resource shapes can fit. This request is persistent until another call to request_resources() is made to override. 
 
  Examples

  ```python
  >>> from ray.autoscaler.sdk import request_resources
  >>> # Request 1000 CPUs.
  >>> request_resources(num_cpus=1000)
  >>> # Request 64 CPUs and also fit a 1-GPU/4-CPU task.
  >>> request_resources(
  ...     num_cpus=64, bundles=[{"GPU": 1, "CPU": 4}])
  >>> # Same as requesting num_cpus=3.
  >>> request_resources(
  ...     bundles=[{"CPU": 1}, {"CPU": 1}, {"CPU": 1}])
  ```

  DeveloperAPI: This API may change across minor Ray releases.
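Because a request stays in effect until it is overridden, one possible pattern for letting the cluster scale back down after a burst is to issue a follow-up call with a smaller request. This is only a sketch based on the persistence semantics described above; the zero-CPU override used here is an assumption, not an API guarantee:

```python
from ray.autoscaler.sdk import request_resources

# Hold capacity for a 1000-CPU burst; this request persists until overridden.
request_resources(num_cpus=1000)

# ... submit and wait on the burst workload here ...

# Override the earlier request with a zero-CPU request (assumed pattern),
# leaving the autoscaler free to remove nodes once they become idle.
request_resources(num_cpus=0)
```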