ray.serve.llm.LLMServer.chat
- async LLMServer.chat(request: ChatCompletionRequest) → AsyncGenerator[ChatCompletionStreamResponse | ChatCompletionResponse | ErrorResponse, None]
 Runs a chat request against the LLM engine and returns the response.
- Parameters:
 request – A ChatCompletionRequest object.
- Returns:
 An LLMChatResponse object: an async generator that yields ChatCompletionStreamResponse chunks when streaming, a single ChatCompletionResponse otherwise, or an ErrorResponse if the request fails.
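- Example:
 A minimal sketch of driving chat() through a deployed LLMServer. The model id and source are placeholders, the ChatCompletionRequest import path is assumed to come from vLLM's OpenAI protocol module, and the streaming call uses Ray Serve's generic handle API for generator methods; verify each against your installed Ray and vLLM versions.

 ```python
 # Sketch only: model ids, the ChatCompletionRequest import path, and the
 # streaming-handle call are assumptions; check them against your versions.
 import asyncio

 from ray import serve
 from ray.serve.llm import LLMConfig, LLMServer
 from vllm.entrypoints.openai.protocol import ChatCompletionRequest  # assumed path

 llm_config = LLMConfig(
     model_loading_config=dict(
         model_id="qwen-0.5b",                       # placeholder model id
         model_source="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model source
     ),
 )

 # Deploy LLMServer as a Serve application and keep its handle.
 app = LLMServer.as_deployment(
     llm_config.get_serve_options(name_prefix="LLM:")
 ).bind(llm_config)
 handle = serve.run(app)


 async def main() -> None:
     request = ChatCompletionRequest(
         model="qwen-0.5b",
         messages=[{"role": "user", "content": "Say hello in one sentence."}],
         stream=True,
     )
     # chat() returns an async generator (LLMChatResponse), so invoke it as a
     # streaming handle method; each item is a ChatCompletionStreamResponse
     # chunk, or a ChatCompletionResponse / ErrorResponse when not streaming.
     async for chunk in handle.options(stream=True).chat.remote(request):
         print(chunk)


 asyncio.run(main())
 ```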