Virtual Cluster Management#
Create a Virtual Cluster#
Description:
Creates a new Virtual Cluster with specified parameters, allowing you to define clusters with specific replica distributions and divisibility settings.
Command:
ray vcluster create [OPTIONS]
Options:
| Option | Type | Default | Required | Description | 
|---|---|---|---|---|
| –address TEXT | str | None | NO | Specifies the Ray cluster address. If not provided, the RAY_ADDRESS environment variable is used. | 
| –id TEXT | str | N/A | YES | Assigns a unique identifier to the Virtual Cluster being created. | 
| –divisible | bool | False | NO | Determines if the Virtual Cluster is divisible into smaller logical or job clusters. | 
| –replica-sets TEXT | dict | N/A | YES | JSON-serialized dictionary defining the replica sets for the cluster (e.g.,  | 
Usage Examples#
Example: Creating a Divisible Virtual Cluster
ray vcluster create --id logical1 --divisible --replica-sets '{"group2":1}'
Output:
Virtual cluster 'logical1' created successfully
Update a Virtual Cluster#
Description:
Update an existing Virtual Cluster with specified parameters.
Command:
ray vcluster update [OPTIONS]
Options:
| Option | Type | Default | Required | Description | 
|---|---|---|---|---|
| –address TEXT | str | None | NO | Specifies the Ray cluster address. If not provided, the RAY_ADDRESS environment variable is used. | 
| –id TEXT | str | N/A | YES | Assigns a unique identifier to the Virtual Cluster being created. | 
| –divisible | bool | False | NO | Determines if the Virtual Cluster is divisible into smaller logical or job clusters. | 
| –replica-sets TEXT | dict | N/A | YES | JSON-serialized dictionary defining the replica sets for the cluster (e.g.,  | 
| –revision INTEGER | int | 0 | NO | Indicates the revision number for updating the Virtual Cluster. | 
Usage Examples#
Example 1: Updating a Divisible Virtual Cluster
ray vcluster update --id logical1 --divisible --replica-sets '{"group2":2}'
Output:
Virtual cluster 'logical1' updated successfully
Example 2: Handling Updating Failure Due to Incorrect Revision
ray vcluster update --id logical1 --divisible --replica-sets '{"group1":2}' --revision 2
Output:
Failed to update virtual cluster 'logical1': The revision (2) is expired, the latest revision of the virtual cluster logical1 is 1736911613521214948
Remove a Virtual Cluster#
Description:
Removes an existing Virtual Cluster by its unique identifier from your Ray environment.
Command:
ray vcluster remove [OPTIONS] <virtual-cluster-id>
Options:
| Option | Type | Default | Required | Description | 
|---|---|---|---|---|
| –address TEXT | str | None | NO | Specifies the Ray cluster address. If not provided, the RAY_ADDRESS environment variable is used. | 
| <virtual-cluster-id> | str | N/A | YES | The unique identifier of the Virtual Cluster to be removed. | 
Usage Example:
Example 1: Removing a Virtual Cluster by ID
ray vcluster remove logical1
Output:
Virtual cluster 'logical1' removed successfully
Example 2: Handling Removal Failure Due to Non-Existent ID
ray vcluster remove unknownCluster
Output:
Failed to remove virtual cluster 'unknownCluster': The logical cluster unknownCluster does not exist.
List Virtual Clusters#
Description:
Displays a summary of all Virtual Clusters in your Ray environment. By default, it presents a table listing each cluster’s ID, divisibility status, and any subdivided clusters. The --detail flag enriches the output with comprehensive information, including replica distributions and node instance statuses. The --format option allows output customization in default, json, yaml, or table formats.
Command:
ray list vclusters [OPTIONS]
Options:
| Option | Type | Description | 
|---|---|---|
| –format <format> | str | Specify the output format:  | 
| -f, –filter TEXT | str | Apply filter expressions to narrow down the list based on specific criteria. Multiple filters are combined using logical AND. | 
| –limit INTEGER | int | Maximum number of entries to return (default:  | 
| –detail | bool | Include detailed information in the output. | 
| –timeout INTEGER | int | Timeout in seconds for the API requests (default:  | 
| –address TEXT | str | Address of the Ray API server. If not provided, it is configured automatically. | 
Sample Output:
- Brief outputs: 
$ ray list vclusters
======== List: 2025-01-20 16:50:30.665928 ========
Stats:
------------------------------
Total: 4
Table:
------------------------------
    VIRTUAL_CLUSTER_ID       DIVISIBLE    DIVIDED_CLUSTERS                      REPLICA_SETS    UNDIVIDED_REPLICA_SETS    RESOURCES_USAGE
 0  kPrimaryClusterID        True         kPrimaryClusterID##job1: indivisible  group0: 2       group0: 1                 CPU: 2.0 / 41.0
                                          logical1: divisble                    group1: 1       group1: 1                 memory: 2.000 GiB / 68.931 GiB
                                                                                group2: 2                                 object_store_memory: 0.000 B / 23.793 GiB
 1  kPrimaryClusterID##job1  False        {}                                    group0: 1       group0: 1                 CPU: 1.0 / 9.0
                                                                                                                          memory: 1.000 GiB / 9.327 GiB
                                                                                                                          object_store_memory: 0.000 B / 4.663 GiB
 2  logical1                 True         logical1##job2: indivisible           group2: 2       group2: 1                 CPU: 1.0 / 16.0
                                                                                                                          memory: 1.000 GiB / 29.802 GiB
                                                                                                                          object_store_memory: 0.000 B / 9.565 GiB
 3  logical1##job2           False        {}                                    group2: 1       group2: 1                 CPU: 1.0 / 8.0
                                                                                                                          memory: 1.000 GiB / 14.901 GiB
                                                                                                                          object_store_memory: 0.000 B / 4.783 GiB
- Detailed outputs: 
$ ray list vclusters --detail
---
-   virtual_cluster_id: kPrimaryClusterID
    divisible: true
    divided_clusters:
        logical1: divisble
        kPrimaryClusterID##job1: indivisible
    replica_sets:
        group0: 2
        group1: 1
        group2: 2
    undivided_replica_sets:
        group1: 1
        group0: 1
    resources_usage:
        CPU: 2.0 / 41.0
        object_store_memory: 0.000 B / 23.793 GiB
        memory: 2.000 GiB / 68.931 GiB
    visible_node_instances:
        fe8e2961e1d7f72c8f9da7bea38ebb650cbee685f541e8ceedb2a8e3:
            hostname: arconkube-40-100083029097
            template_id: group1
            is_dead: false
        740273507b09c082c33909e9134ce136d1743e0da1d5b68ec2574988:
            hostname: arconkube-40-100083029138
            template_id: group0
            is_dead: false
        3505335a78b9955a1c2ed1de0a0fa92449b8011afddb621b2bab23d5:
            hostname: arconkube-40-100083029093
            template_id: group0
            is_dead: false
    undivided_nodes:
        fe8e2961e1d7f72c8f9da7bea38ebb650cbee685f541e8ceedb2a8e3:
            hostname: arconkube-40-100083029097
            template_id: group1
            is_dead: false
        740273507b09c082c33909e9134ce136d1743e0da1d5b68ec2574988:
            hostname: arconkube-40-100083029138
            template_id: group0
            is_dead: false
Explanation:
- Primary Cluster ( - kPrimaryClusterID):- Divisible: - true- can create sub-clusters.
- Divided Clusters: Includes - kPrimaryClusterID##job1and- logical1.
- Replica Sets: Distribution across - group2,- group1, and- group0.
- Visible Node Instances: Lists active nodes with their details. 
- Undivided Nodes: Empty, as all nodes are part of sub-clusters. 
 
- Logical Cluster ( - logical1):- Divisible: - true- can be subdivided.
- Replica Sets & Undivided Replica Sets: Reflects replica distribution. 
- Visible Node Instances & Undivided Nodes: Lists nodes associated exclusively with this logical cluster. 
 
Filtering Options:
The --filter flag enables you to narrow down the list of Virtual Clusters based on specific attributes. Multiple --filter options can be specified, and they are concatenated using logical AND. Filter expressions support predicates such as key=value or key!=value, and string filter values are case-insensitive.
- Supported Filter Expressions - Divisibility: - divisible=true: Lists only divisible clusters.
- divisible=false: Lists only indivisible clusters.
 
- Virtual Cluster ID: - virtual_cluster_id=vid1: Retrieves information for the cluster with ID- vid1.
 
 
- Usage Guidelines - Single Filter: - ray list vclusters --detail --filter "divisible=true"
- Multiple Filters: - ray list vclusters --detail --filter "divisible=true" --filter "virtual_cluster_id=kPrimaryClusterID"
 - Note: Combining multiple filters results in a logical AND operation, meaning only clusters that satisfy all filter conditions will be listed. 
Get Specific Virtual Cluster#
Description:
Fetches detailed information about a single Virtual Cluster identified by its virtual_cluster_id.
Command:
ray get vclusters <virtual_cluster_id> [OPTIONS]
Options:
| Option | Type | Description | 
|---|---|---|
| –format <format> | str | Specify the output format:  | 
| –timeout INTEGER | int | Timeout in seconds for the API requests (default:  | 
| –address TEXT | str | Address of the Ray API server. If not provided, it is configured automatically. | 
Understanding Command Outputs#
Each Virtual Cluster’s information comprises several key fields:
Common Fields
- virtual_cluster_id: - A unique identifier for the Virtual Cluster. IDs may include suffixes (e.g., ##job1, ##logical1) indicating Job Clusters with specific job IDs or Logical Clusters. 
 
- divisible: - Indicates whether the cluster is Divisible (true) or Indivisible (false). - Divisible Cluster (true): Can be subdivided into Logical Clusters or Job Clusters. 
- Indivisible Cluster (false): Cannot be subdivided and is used exclusively for hosting user-submitted jobs. 
 
 
- divided_clusters: - Lists sub-clusters that have been subdivided from the parent cluster. This field is empty for Indivisible Clusters. 
 
- replica_sets: - Details the distribution of replicas across different template groups within the cluster, excluding any inactive nodes. 
 
- undivided_replica_sets: - Similar to replica_sets but specifically for replicas not associated with any sub-cluster. 
 
- visible_node_instances: - A dictionary of visible node instances within the cluster, including: - Node ID: Unique identifier for each node. 
- hostname: Network name of the node. 
- template_id: Indicates the template group the node belongs to (e.g., group2). 
- is_dead: Boolean flag indicating node status (false for active, true for inactive or failed). 
 
 
- undivided_nodes: - Visible nodes that are part of the cluster but not associated with any divided sub-cluster.