Why Ray Serve?

Speed and simplicity are just two of the many reasons to consider building your machine learning serving APIs with Ray Serve.

Pythonic API

Configure your model serving declaratively in pure Python, without needing YAML or JSON configs.
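
For example, here's a minimal sketch of a deployment configured entirely in Python; the Translator class and its greeting logic are placeholders, while num_replicas and ray_actor_options are standard deployment options:

from ray import serve

# All configuration lives in Python: replica count, per-replica
# resources, and the request-handling logic itself.
@serve.deployment(num_replicas=2, ray_actor_options={"num_cpus": 0.5})
class Translator:
    def __init__(self):
        self.greeting = "Hello"  # a real deployment would load a model here

    async def __call__(self, request):
        name = (await request.json())["name"]
        return {"message": f"{self.greeting}, {name}!"}

serve.run(Translator.bind())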

Low latency, high throughput

Horizontally scale across hundreds of processes or machines, while keeping the overhead in single-digit milliseconds.
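
As an illustrative sketch, scaling out is a matter of configuration; min_replicas and max_replicas are real autoscaling_config fields, while the Classifier body is a stub:

from ray import serve

# Serve spreads replicas across the processes and machines in the Ray
# cluster and load-balances requests among them.
@serve.deployment(autoscaling_config={"min_replicas": 1, "max_replicas": 100})
class Classifier:
    async def __call__(self, request):
        return {"ok": True}  # stub; a real replica would run inference here

serve.run(Classifier.bind())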

Multi-model composition

Easily compose multiple models, mix model serving with business logic, and independently scale components, without complex microservices.
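
Here's a rough sketch of the pattern, assuming a recent Ray Serve release where deployment handle calls can be awaited directly; the Preprocessor, SentimentModel, and Pipeline classes are illustrative stand-ins:

from ray import serve

@serve.deployment
class Preprocessor:
    def transform(self, text: str) -> str:
        return text.strip().lower()

@serve.deployment
class SentimentModel:
    def predict(self, text: str) -> str:
        # a real model would run inference here
        return "positive" if "good" in text else "negative"

@serve.deployment
class Pipeline:
    def __init__(self, preprocessor, model):
        # Handles to other deployments; each scales independently.
        self.preprocessor = preprocessor
        self.model = model

    async def __call__(self, request):
        text = (await request.json())["text"]
        cleaned = await self.preprocessor.transform.remote(text)
        return {"sentiment": await self.model.predict.remote(cleaned)}

serve.run(Pipeline.bind(Preprocessor.bind(), SentimentModel.bind()))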

Framework-agnostic

Use a single tool to serve all types of models, from PyTorch and TensorFlow to scikit-learn, along with arbitrary business logic.
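
Because a deployment is just a Python callable, even a plain function can be served; the stub below stands in for framework-specific inference code:

from ray import serve

@serve.deployment
async def predictor(request):
    payload = await request.json()
    # Swap in PyTorch, TensorFlow, or scikit-learn inference here;
    # this stub simply echoes the input back.
    return {"echo": payload}

serve.run(predictor.bind())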

FastAPI Integration

Scale an existing FastAPI server easily, or define an HTTP interface for your model using FastAPI's simple, elegant API.
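
The integration is a pair of decorators. A minimal sketch, where the APIServer class and its /hello route are illustrative:

from fastapi import FastAPI
from ray import serve

app = FastAPI()

@serve.deployment
@serve.ingress(app)  # serve the FastAPI app through Ray Serve
class APIServer:
    @app.get("/hello")
    def hello(self, name: str) -> dict:
        return {"greeting": f"Hello, {name}!"}

serve.run(APIServer.bind())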

Native GPU support

Using GPUs is as simple as adding one line of Python code. Maximize hardware utilization by sharing CPUs or GPUs between different models.
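
For example, the sketch below asks for half a GPU per replica; GpuModel is a placeholder, and running it requires a GPU in the cluster:

from ray import serve

# Fractional num_gpus values let several replicas share one device.
@serve.deployment(ray_actor_options={"num_gpus": 0.5})
class GpuModel:
    def __init__(self):
        # Ray sets CUDA_VISIBLE_DEVICES for this replica; a real model
        # would be moved to "cuda" here.
        self.device = "cuda"

    async def __call__(self, request):
        payload = await request.json()
        return {"device": self.device, "input": payload}

serve.run(GpuModel.bind())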

Try it yourself

Install Ray Serve with pip install "ray[serve]" scikit-learn requests and give this example a try.

import requests

from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier

from ray import serve

@serve.deployment
class BoostingModel:
    def __init__(self, model, label_list):
        self.model = model
        self.label_list = label_list

    async def __call__(self, request):
        payload = await request.json()
        print(f"Received flask request with data {payload}")

        prediction = self.model.predict([payload["vector"]])[0]
        human_name = self.label_list[prediction]
        return {"result": human_name}

if __name__ == "__main__":

    # Train model.
    iris_dataset = load_iris()
    model = GradientBoostingClassifier()
    model.fit(iris_dataset["data"], iris_dataset["target"])

    # Deploy the model behind an HTTP endpoint.
    serve.run(
        BoostingModel.bind(model, iris_dataset["target_names"].tolist()),
        route_prefix="/iris",
    )

    # Query model
    sample_request_input = {"vector": [1.2, 1.0, 1.1, 0.9]}
    response = requests.get("http://localhost:8000/iris", json=sample_request_input)
    print(response.text)
    
    # Prints:
    # {"result": "versicolor"}

See Ray Serve in action

See how companies are using Ray Serve to run their production model serving systems in a fast, reliable, and scalable way.

Wildlife Studios

With Ray Serve, Wildlife Studios serves in-game offers 3x faster while reducing infrastructure costs by 95%.

Read the case study

Ikigai Labs

See how a small team of data scientists built a dynamic, scalable data pipeline service for their users using Ray Serve.

Read the story

WidasConcepts

Learn how German tech services firm WidasConcepts built its next-generation identity management platform on top of Ray Serve, running on Kubernetes.

Watch the video

Scale more than just serving

Expand your Ray journey beyond model serving and scale other parts of your machine learning pipeline.

Ray Train

Scalable deep learning

Ray Tune

Scale hyperparameter search

Ray Datasets

Scale data loading and processing workloads

O'Reilly Learning Ray Book

Get your free copy of the early-release chapters of Learning Ray, the first and only comprehensive book on Ray and its ecosystem, authored by members of the Ray engineering team.
