RayService Quickstart#
Prerequisites#
This guide mainly focuses on the behavior of KubeRay v1.3.0 and Ray 2.41.0.
What’s a RayService?#
A RayService manages these components:
RayCluster: Manages resources in a Kubernetes cluster.
Ray Serve Applications: Manages users’ applications.
What does the RayService provide?#
Kubernetes-native support for Ray clusters and Ray Serve applications: After using a Kubernetes configuration to define a Ray cluster and its Ray Serve applications, you can use
kubectlto create the cluster and its applications.In-place updating for Ray Serve applications: See RayService for more details.
Zero downtime upgrading for Ray clusters: See RayService for more details.
High-availabilable services: See RayService high availability for more details.
Example: Serve two simple Ray Serve applications using RayService#
Step 1: Create a Kubernetes cluster with Kind#
kind create cluster --image=kindest/node:v1.26.0
Step 2: Install the KubeRay operator#
Follow this document to install the latest stable KubeRay operator from the Helm repository.
Note that the YAML file in this example uses serveConfigV2 to specify a multi-application Serve configuration, available starting from KubeRay v0.6.0.
Step 3: Install a RayService#
kubectl apply -f https://raw.githubusercontent.com/ray-project/kuberay/v1.3.0/ray-operator/config/samples/ray-service.sample.yaml
Step 4: Verify the Kubernetes cluster status#
# List all RayService custom resources in the `default` namespace.
kubectl get rayservice
NAME SERVICE STATUS NUM SERVE ENDPOINTS
rayservice-sample Running 2
# List all RayCluster custom resources in the `default` namespace.
kubectl get raycluster
NAME DESIRED WORKERS AVAILABLE WORKERS CPUS MEMORY GPUS STATUS AGE
rayservice-sample-raycluster-czjtm 1 1 2500m 4Gi 0 ready 4m21s
# List all Ray Pods in the `default` namespace.
kubectl get pods -l=ray.io/is-ray-node=yes
NAME READY STATUS RESTARTS AGE
rayservice-sample-raycluster-czjtm-head-ldxl7 1/1 Running 0 4m21s
rayservice-sample-raycluster-czjtm-small-group-worker-pk88k 1/1 Running 0 4m21s
# Check the `Ready` condition of the RayService.
# The RayService is ready to serve requests when the condition is `True`.
# Users can also use `kubectl describe rayservices.ray.io rayservice-sample` to check the condition section
kubectl get rayservice rayservice-sample -o json | jq -r '.status.conditions[] | select(.type=="Ready") | to_entries[] | "\(.key): \(.value)"'
lastTransitionTime: 2025-04-11T16:17:01Z
message: Number of serve endpoints is greater than 0
observedGeneration: 1
reason: NonZeroServeEndpoints
status: True
type: Ready
# List services in the `default` namespace.
kubectl get services -o json | jq -r '.items[].metadata.name'
kuberay-operator
kubernetes
rayservice-sample-head-svc
rayservice-sample-raycluster-czjtm-head-svc
rayservice-sample-serve-svc
When the Ray Serve applications are healthy and ready, KubeRay creates a head service and a Ray Serve service for the RayService custom resource. For example, rayservice-sample-head-svc and rayservice-sample-serve-svc.
What do these services do?
rayservice-sample-head-svc
This service points to the head pod of the active RayCluster and is typically used to view the Ray Dashboard (port8265).rayservice-sample-serve-svc
This service exposes the HTTP interface of Ray Serve, typically on port8000.
Use this service to send HTTP requests to your deployed Serve applications (e.g., REST API, ML inference, etc.).
Step 5: Verify the status of the Serve applications#
# (1) Forward the dashboard port to localhost.
# (2) Check the Serve page in the Ray dashboard at http://localhost:8265/#/serve.
kubectl port-forward svc/rayservice-sample-head-svc 8265:8265 > /dev/null &
Refer to rayservice-troubleshooting.md for more details on RayService observability. Below is a screenshot example of the Serve page in the Ray dashboard.

Step 6: Send requests to the Serve applications by the Kubernetes serve service#
# Step 6.1: Run a curl Pod.
# If you already have a curl Pod, you can use `kubectl exec -it <curl-pod> -- sh` to access the Pod.
kubectl run curl --image=radial/busyboxplus:curl --command -- tail -f /dev/null
# Step 6.3: Send a request to the calculator app.
kubectl exec curl -- curl -sS -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc:8000/calc/ -d '["MUL", 3]'
15 pizzas please!
# Step 6.2: Send a request to the fruit stand app.
kubectl exec curl -- curl -sS -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc:8000/fruit/ -d '["MANGO", 2]'
6
Step 7: Clean up the Kubernetes cluster#
# Kill the `kubectl port-forward` background job in the earlier step
killall kubectl
kind delete cluster
Next steps#
See RayService document for the full list of RayService features, including in-place update, zero downtime upgrade, and high-availability.
See RayService troubleshooting guide if you encounter any issues.
See Examples for more RayService examples. The MobileNet example is a good example to start with because it doesn’t require GPUs and is easy to run on a local machine.