Virtual Cluster Management
Create a Virtual Cluster
Description:
Creates a new Virtual Cluster with specified parameters, allowing you to define clusters with specific replica distributions and divisibility settings.
Command:
ray vcluster create [OPTIONS]
Options:
Option | Type | Default | Required | Description |
---|---|---|---|---|
--address TEXT | str | None | NO | Specifies the Ray cluster address. If not provided, the RAY_ADDRESS environment variable is used. |
--id TEXT | str | N/A | YES | Assigns a unique identifier to the Virtual Cluster being created. |
--divisible | bool | False | NO | Determines if the Virtual Cluster is divisible into smaller logical or job clusters. |
--replica-sets TEXT | dict | N/A | YES | JSON-serialized dictionary defining the replica sets for the cluster (e.g., |
Usage Examples
Example: Creating a Divisible Virtual Cluster
ray vcluster create --id logical1 --divisible --replica-sets '{"group2":1}'
Output:
Virtual cluster 'logical1' created successfully
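Because --replica-sets takes a JSON string, quoting it by hand in scripts is easy to get wrong. The following is a minimal Python sketch that assembles the argument list from a dictionary. It assumes only the flags documented above; the helper name build_create_command is hypothetical and not part of Ray, and the sketch only builds the command without running it.

```python
import json
import shlex


def build_create_command(cluster_id, replica_sets, divisible=False, address=None):
    """Build the argv for `ray vcluster create` (hypothetical helper, not a Ray API)."""
    argv = ["ray", "vcluster", "create", "--id", cluster_id]
    if address is not None:
        argv += ["--address", address]
    if divisible:
        argv.append("--divisible")
    # Compact separators reproduce the '{"group2":1}' style used in the example.
    argv += ["--replica-sets", json.dumps(replica_sets, separators=(",", ":"))]
    return argv


argv = build_create_command("logical1", {"group2": 1}, divisible=True)
# shlex.join renders the list as a shell-safe command line for logging.
print(shlex.join(argv))
```

On a machine where the ray CLI is installed, the list could be passed to subprocess.run(argv, check=True) so that no shell quoting is needed at all.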
Update a Virtual Cluster
Description:
Updates an existing Virtual Cluster with the specified parameters.
Command:
ray vcluster update [OPTIONS]
Options:
Option | Type | Default | Required | Description |
---|---|---|---|---|
--address TEXT | str | None | NO | Specifies the Ray cluster address. If not provided, the RAY_ADDRESS environment variable is used. |
--id TEXT | str | N/A | YES | Specifies the unique identifier of the Virtual Cluster to update. |
--divisible | bool | False | NO | Determines if the Virtual Cluster is divisible into smaller logical or job clusters. |
--replica-sets TEXT | dict | N/A | YES | JSON-serialized dictionary defining the replica sets for the cluster (e.g., |
--revision INTEGER | int | 0 | NO | Revision number of the Virtual Cluster that the update is based on. An expired revision causes the update to fail, as in Example 2. |
Usage Examples
Example 1: Updating a Divisible Virtual Cluster
ray vcluster update --id logical1 --divisible --replica-sets '{"group2":2}'
Output:
Virtual cluster 'logical1' updated successfully
Example 2: Handling Update Failure Due to Incorrect Revision
ray vcluster update --id logical1 --divisible --replica-sets '{"group1":2}' --revision 2
Output:
Failed to update virtual cluster 'logical1': The revision (2) is expired, the latest revision of the virtual cluster logical1 is 1736911613521214948
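The failure message reports the latest revision, which a script could read back before deciding whether to retry. Below is a minimal Python sketch that extracts it. The regular expression is modeled on the message shown above, so the exact wording is an assumption that may not hold for every Ray version, and latest_revision_from_error is a hypothetical helper.

```python
import re

# Pattern modeled on the sample failure message; the wording is an assumption.
_LATEST_REVISION = re.compile(
    r"latest revision of the virtual cluster (\S+) is (\d+)"
)


def latest_revision_from_error(message):
    """Return the latest revision named in an expired-revision error, or None."""
    match = _LATEST_REVISION.search(message)
    return int(match.group(2)) if match else None


error = (
    "Failed to update virtual cluster 'logical1': The revision (2) is expired, "
    "the latest revision of the virtual cluster logical1 is 1736911613521214948"
)
revision = latest_revision_from_error(error)
print(revision)
```

The returned value could presumably be passed back through --revision on a retry. Whether a retry is appropriate depends on why the revision changed, for example because another client updated the cluster in the meantime.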
Remove a Virtual Cluster
Description:
Removes an existing Virtual Cluster by its unique identifier from your Ray environment.
Command:
ray vcluster remove [OPTIONS] <virtual-cluster-id>
Options:
Option | Type | Default | Required | Description |
---|---|---|---|---|
--address TEXT | str | None | NO | Specifies the Ray cluster address. If not provided, the RAY_ADDRESS environment variable is used. |
<virtual-cluster-id> | str | N/A | YES | The unique identifier of the Virtual Cluster to be removed. |
Usage Examples
Example 1: Removing a Virtual Cluster by ID
ray vcluster remove logical1
Output:
Virtual cluster 'logical1' removed successfully
Example 2: Handling Removal Failure Due to Non-Existent ID
ray vcluster remove unknownCluster
Output:
Failed to remove virtual cluster 'unknownCluster': The logical cluster unknownCluster does not exist.
List Virtual Clusters
Description:
Displays a summary of all Virtual Clusters in your Ray environment. By default, it presents a table listing each cluster’s ID, divisibility status, and any subdivided clusters. The --detail flag enriches the output with comprehensive information, including replica distributions and node instance statuses. The --format option allows output customization in default, json, yaml, or table formats.
Command:
ray list vclusters [OPTIONS]
Options:
Option | Type | Description |
---|---|---|
--format <format> | str | Specify the output format: default, json, yaml, or table. |
-f, --filter TEXT | str | Apply filter expressions to narrow down the list based on specific criteria. Multiple filters are combined using logical AND. |
--limit INTEGER | int | Maximum number of entries to return (default: |
--detail | bool | Include detailed information in the output. |
--timeout INTEGER | int | Timeout in seconds for the API requests (default: |
--address TEXT | str | Address of the Ray API server. If not provided, it is configured automatically. |
Sample Output:
Brief outputs:
$ ray list vclusters
======== List: 2025-01-20 16:50:30.665928 ========
Stats:
------------------------------
Total: 4
Table:
------------------------------
    VIRTUAL_CLUSTER_ID       DIVISIBLE  DIVIDED_CLUSTERS                      REPLICA_SETS  UNDIVIDED_REPLICA_SETS  RESOURCES_USAGE
 0  kPrimaryClusterID        True       kPrimaryClusterID##job1: indivisible  group0: 2     group0: 1               CPU: 2.0 / 41.0
                                        logical1: divisble                    group1: 1     group1: 1               memory: 2.000 GiB / 68.931 GiB
                                                                              group2: 2                             object_store_memory: 0.000 B / 23.793 GiB
 1  kPrimaryClusterID##job1  False      {}                                    group0: 1     group0: 1               CPU: 1.0 / 9.0
                                                                                                                    memory: 1.000 GiB / 9.327 GiB
                                                                                                                    object_store_memory: 0.000 B / 4.663 GiB
 2  logical1                 True       logical1##job2: indivisible           group2: 2     group2: 1               CPU: 1.0 / 16.0
                                                                                                                    memory: 1.000 GiB / 29.802 GiB
                                                                                                                    object_store_memory: 0.000 B / 9.565 GiB
 3  logical1##job2           False      {}                                    group2: 1     group2: 1               CPU: 1.0 / 8.0
                                                                                                                    memory: 1.000 GiB / 14.901 GiB
                                                                                                                    object_store_memory: 0.000 B / 4.783 GiB
Detailed outputs:
$ ray list vclusters --detail
---
- virtual_cluster_id: kPrimaryClusterID
  divisible: true
  divided_clusters:
    logical1: divisble
    kPrimaryClusterID##job1: indivisible
  replica_sets:
    group0: 2
    group1: 1
    group2: 2
  undivided_replica_sets:
    group1: 1
    group0: 1
  resources_usage:
    CPU: 2.0 / 41.0
    object_store_memory: 0.000 B / 23.793 GiB
    memory: 2.000 GiB / 68.931 GiB
  visible_node_instances:
    fe8e2961e1d7f72c8f9da7bea38ebb650cbee685f541e8ceedb2a8e3:
      hostname: arconkube-40-100083029097
      template_id: group1
      is_dead: false
    740273507b09c082c33909e9134ce136d1743e0da1d5b68ec2574988:
      hostname: arconkube-40-100083029138
      template_id: group0
      is_dead: false
    3505335a78b9955a1c2ed1de0a0fa92449b8011afddb621b2bab23d5:
      hostname: arconkube-40-100083029093
      template_id: group0
      is_dead: false
  undivided_nodes:
    fe8e2961e1d7f72c8f9da7bea38ebb650cbee685f541e8ceedb2a8e3:
      hostname: arconkube-40-100083029097
      template_id: group1
      is_dead: false
    740273507b09c082c33909e9134ce136d1743e0da1d5b68ec2574988:
      hostname: arconkube-40-100083029138
      template_id: group0
      is_dead: false
Explanation:
Primary Cluster (kPrimaryClusterID):
Divisible: true - can create sub-clusters.
Divided Clusters: Includes kPrimaryClusterID##job1 and logical1.
Replica Sets: Distribution across group2, group1, and group0.
Visible Node Instances: Lists active nodes with their details.
Undivided Nodes: Lists the nodes not assigned to any sub-cluster (two in the sample above).
Logical Cluster (logical1):
Divisible: true - can be subdivided.
Replica Sets & Undivided Replica Sets: Reflects replica distribution.
Visible Node Instances & Undivided Nodes: Lists nodes associated exclusively with this logical cluster.
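The resources_usage values in the sample follow a "used / total" pattern, optionally with a unit such as B or GiB. A minimal Python sketch for turning such a value into numbers is shown here. The binary unit table and the parse_usage helper are assumptions made for illustration, inferred only from the sample output above.

```python
import re

# "2.0 / 41.0" or "0.000 B / 23.793 GiB": number, optional unit, slash, number, optional unit.
_USAGE = re.compile(
    r"^\s*([\d.]+)\s*([A-Za-z]*)\s*/\s*([\d.]+)\s*([A-Za-z]*)\s*$"
)
# Assumed binary multipliers; only B and GiB actually appear in the sample.
_UNITS = {"": 1, "B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def parse_usage(value):
    """Return (used, total) as floats, converting memory units to bytes."""
    match = _USAGE.match(value)
    if match is None:
        raise ValueError(f"unrecognized usage value: {value!r}")
    used = float(match.group(1)) * _UNITS[match.group(2)]
    total = float(match.group(3)) * _UNITS[match.group(4)]
    return used, total


used, total = parse_usage("2.0 / 41.0")
print(f"CPU utilization: {used / total:.1%}")
```

Dividing used by total gives a utilization figure that is easier to compare across clusters than the raw strings.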
Filtering Options:
The --filter flag enables you to narrow down the list of Virtual Clusters based on specific attributes. Multiple --filter options can be specified, and they are combined using logical AND. Filter expressions support predicates such as key=value or key!=value, and string filter values are case-insensitive.
Supported Filter Expressions
Divisibility:
divisible=true: Lists only divisible clusters.
divisible=false: Lists only indivisible clusters.
Virtual Cluster ID:
virtual_cluster_id=vid1: Retrieves information for the cluster with ID vid1.
Usage Guidelines
Single Filter:
ray list vclusters --detail --filter "divisible=true"
Multiple Filters:
ray list vclusters --detail --filter "divisible=true" --filter "virtual_cluster_id=kPrimaryClusterID"
Note: Combining multiple filters results in a logical AND operation, meaning only clusters that satisfy all filter conditions will be listed.
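To make the AND semantics concrete, the following Python sketch applies key=value and key!=value predicates to a small in-memory list that mirrors the sample clusters. It illustrates the documented behavior on the client side and is not Ray's implementation; parse_filter and matches are hypothetical names.

```python
def parse_filter(expression):
    """Split 'key=value' or 'key!=value' into (key, operator, value)."""
    # Check "!=" first, since it also contains "=".
    for operator in ("!=", "="):
        if operator in expression:
            key, value = expression.split(operator, 1)
            return key.strip(), operator, value.strip()
    raise ValueError(f"unsupported filter expression: {expression!r}")


def matches(cluster, filters):
    """Return True only if the cluster satisfies every filter (logical AND)."""
    for expression in filters:
        key, operator, expected = parse_filter(expression)
        # Compare case-insensitively, as the documentation describes.
        equal = str(cluster.get(key, "")).lower() == expected.lower()
        if equal != (operator == "="):
            return False
    return True


clusters = [
    {"virtual_cluster_id": "kPrimaryClusterID", "divisible": True},
    {"virtual_cluster_id": "kPrimaryClusterID##job1", "divisible": False},
    {"virtual_cluster_id": "logical1", "divisible": True},
    {"virtual_cluster_id": "logical1##job2", "divisible": False},
]
filters = ["divisible=TRUE", "virtual_cluster_id!=kPrimaryClusterID"]
selected = [c["virtual_cluster_id"] for c in clusters if matches(c, filters)]
print(selected)
```

Each additional filter can only shrink the result, which is why contradictory filters such as divisible=true together with divisible=false select nothing.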
Get Specific Virtual Cluster
Description:
Fetches detailed information about a single Virtual Cluster identified by its virtual_cluster_id.
Command:
ray get vclusters <virtual_cluster_id> [OPTIONS]
Options:
Option | Type | Description |
---|---|---|
--format <format> | str | Specify the output format: |
--timeout INTEGER | int | Timeout in seconds for the API requests (default: |
--address TEXT | str | Address of the Ray API server. If not provided, it is configured automatically. |
Understanding Command Outputs
Each Virtual Cluster’s information comprises several key fields:
Common Fields
virtual_cluster_id:
A unique identifier for the Virtual Cluster. IDs may include suffixes (e.g., ##job1, ##logical1) indicating Job Clusters with specific job IDs or Logical Clusters.
divisible:
Indicates whether the cluster is Divisible (true) or Indivisible (false).
Divisible Cluster (true): Can be subdivided into Logical Clusters or Job Clusters.
Indivisible Cluster (false): Cannot be subdivided and is used exclusively for hosting user-submitted jobs.
divided_clusters:
Lists sub-clusters that have been subdivided from the parent cluster. This field is empty for Indivisible Clusters.
replica_sets:
Details the distribution of replicas across different template groups within the cluster, excluding any inactive nodes.
undivided_replica_sets:
Similar to replica_sets but specifically for replicas not associated with any sub-cluster.
visible_node_instances:
A dictionary of visible node instances within the cluster, including:
Node ID: Unique identifier for each node.
hostname: Network name of the node.
template_id: Indicates the template group the node belongs to (e.g., group2).
is_dead: Boolean flag indicating node status (false for active, true for inactive or failed).
undivided_nodes:
Visible nodes that are part of the cluster but not associated with any divided sub-cluster.
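The relationship between replica_sets and undivided_replica_sets can be worked through with the primary cluster's sample values. The Python sketch below subtracts the two to estimate how many replicas per template group have been handed to sub-clusters. Treating the difference this way is an inference from the sample output and the field descriptions, not a documented guarantee, and divided_replicas is a hypothetical helper.

```python
def divided_replicas(replica_sets, undivided_replica_sets):
    """Per template group, count replicas already assigned to sub-clusters."""
    return {
        group: total - undivided_replica_sets.get(group, 0)
        for group, total in replica_sets.items()
    }


# Values copied from the kPrimaryClusterID entry in the sample output.
replica_sets = {"group0": 2, "group1": 1, "group2": 2}
undivided_replica_sets = {"group1": 1, "group0": 1}

divided = divided_replicas(replica_sets, undivided_replica_sets)
print(divided)
```

For the sample this yields one group0 replica and two group2 replicas, which lines up with the replica sets listed for kPrimaryClusterID##job1 (group0: 1) and logical1 (group2: 2).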