ray.serve.llm.LLMServer.embeddings
- async LLMServer.embeddings(request: EmbeddingCompletionRequest) → AsyncGenerator[EmbeddingResponse | ErrorResponse, None]
 Runs an embeddings request on the vLLM engine and yields the response.
- Parameters:
 request – An EmbeddingCompletionRequest object.
- Returns:
 An async generator that yields EmbeddingResponse objects, or an ErrorResponse if the request fails.
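Because `embeddings` is an async generator whose items may be either an `EmbeddingResponse` or an `ErrorResponse`, a caller has to iterate it with `async for` and check the type of each item. The sketch below illustrates that consumption pattern only. `StubServer`, `collect_embeddings`, and the simplified dataclasses are hypothetical stand-ins (assumptions, not Ray's or vLLM's real classes, which carry more fields); the code does not start Ray or vLLM.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Union


# Hypothetical, simplified stand-ins for the real request/response types.
@dataclass
class EmbeddingCompletionRequest:
    model: str
    input: List[str]


@dataclass
class EmbeddingResponse:
    data: List[List[float]]


@dataclass
class ErrorResponse:
    message: str
    code: int


class StubServer:
    """Hypothetical stand-in that mimics the shape of LLMServer.embeddings()."""

    async def embeddings(
        self, request: EmbeddingCompletionRequest
    ) -> AsyncGenerator[Union[EmbeddingResponse, ErrorResponse], None]:
        if not request.input:
            # Errors are yielded as items, not raised.
            yield ErrorResponse(message="input must not be empty", code=400)
            return
        # Fake embedding: one 2-dimensional vector per input string.
        yield EmbeddingResponse(data=[[float(len(s)), 1.0] for s in request.input])


async def collect_embeddings(server, request) -> List[EmbeddingResponse]:
    """Drain the generator, raising if the server yields an ErrorResponse."""
    results: List[EmbeddingResponse] = []
    async for item in server.embeddings(request):
        if isinstance(item, ErrorResponse):
            raise RuntimeError(f"{item.code}: {item.message}")
        results.append(item)
    return results


if __name__ == "__main__":
    req = EmbeddingCompletionRequest(model="my-embedding-model", input=["hello", "hi"])
    responses = asyncio.run(collect_embeddings(StubServer(), req))
    print(responses[0].data)  # [[5.0, 1.0], [2.0, 1.0]]
```

Converting a yielded `ErrorResponse` into an exception is one design choice; a caller could instead return the error object to its own client unchanged.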