ray.serve.llm.LLMServer.chat
- async LLMServer.chat(request: ChatCompletionRequest) → AsyncGenerator[ChatCompletionStreamResponse | ChatCompletionResponse | ErrorResponse, None]
 Runs a chat request against the LLM engine and returns the response.
- Parameters:
 request – A ChatCompletionRequest object.
- Returns:
 An LLMChatResponse object: an async generator that yields ChatCompletionStreamResponse chunks when streaming, a single ChatCompletionResponse otherwise, or an ErrorResponse if the request fails.
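- Example:
 A minimal sketch of driving chat() through a deployed LLMServer. The model id and source are placeholders, the ChatCompletionRequest import path is assumed to come from vLLM's OpenAI protocol module, and the streaming call uses Ray Serve's generic handle API for generator methods; verify each against your installed Ray and vLLM versions.

 ```python
 # Sketch only: model ids, the ChatCompletionRequest import path, and the
 # streaming-handle call are assumptions; check them against your versions.
 import asyncio

 from ray import serve
 from ray.serve.llm import LLMConfig, LLMServer
 from vllm.entrypoints.openai.protocol import ChatCompletionRequest  # assumed path

 llm_config = LLMConfig(
     model_loading_config=dict(
         model_id="qwen-0.5b",                       # placeholder model id
         model_source="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model source
     ),
 )

 # Deploy LLMServer as a Serve application and keep its handle.
 app = LLMServer.as_deployment(
     llm_config.get_serve_options(name_prefix="LLM:")
 ).bind(llm_config)
 handle = serve.run(app)


 async def main() -> None:
     request = ChatCompletionRequest(
         model="qwen-0.5b",
         messages=[{"role": "user", "content": "Say hello in one sentence."}],
         stream=True,
     )
     # chat() returns an async generator (LLMChatResponse), so invoke it as a
     # streaming handle method; each item is a ChatCompletionStreamResponse
     # chunk, or a ChatCompletionResponse / ErrorResponse when not streaming.
     async for chunk in handle.options(stream=True).chat.remote(request):
         print(chunk)


 asyncio.run(main())
 ```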