ray.serve.llm.LLMServer.embeddings

async LLMServer.embeddings(request: EmbeddingCompletionRequest) → AsyncGenerator[EmbeddingResponse | ErrorResponse, None]

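Runs an embeddings request against the vLLM engine and returns the response. Per the signature, the result is an async generator that yields either an EmbeddingResponse or an ErrorResponse.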
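Because the method returns an async generator rather than a single value, callers iterate over it with `async for` and check the type of each item. The sketch below illustrates only that consumption pattern. It does not import Ray: `FakeServer`, `FakeEmbeddingResponse`, `FakeErrorResponse`, and the dict-shaped request are hypothetical stand-ins, and their fields are assumptions rather than the real types' schema.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Union


# Hypothetical stand-ins for EmbeddingResponse / ErrorResponse.
# The fields below are assumptions made for illustration only.
@dataclass
class FakeEmbeddingResponse:
    vectors: List[List[float]]


@dataclass
class FakeErrorResponse:
    message: str


class FakeServer:
    """Hypothetical stand-in that mimics only the async-generator shape."""

    async def embeddings(
        self, request: dict
    ) -> AsyncGenerator[Union[FakeEmbeddingResponse, FakeErrorResponse], None]:
        texts = request.get("input", [])
        if not texts:
            # Errors are yielded as values, not raised.
            yield FakeErrorResponse(message="empty input")
            return
        # Dummy "embedding": a one-element vector holding each text's length.
        yield FakeEmbeddingResponse(vectors=[[float(len(t))] for t in texts])


async def collect(server, request):
    """Drain the generator and split items into successes and errors."""
    ok, errors = [], []
    async for item in server.embeddings(request):
        if isinstance(item, FakeErrorResponse):
            errors.append(item)
        else:
            ok.append(item)
    return ok, errors


def run_demo():
    server = FakeServer()
    good = asyncio.run(collect(server, {"input": ["hi", "hello"]}))
    bad = asyncio.run(collect(server, {"input": []}))
    return good, bad
```

With a real deployment the same `async for` loop would presumably apply, with `isinstance` checks against the actual `ErrorResponse` type in place of the stand-in.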