Install Inference Server¶
Choose the installation method that matches your platform:
The preferred way to use Inference is via Docker (see Why Docker). This works on Linux, macOS, Jetson, and other Docker-capable devices.
Install Docker (and, if you have a CUDA-enabled GPU, the NVIDIA Container Toolkit for GPU acceleration). Then install and run inference-cli:
pip install inference-cli && inference server start
This automatically chooses and configures the optimal container for your machine.
The --dev flag starts a companion Jupyter notebook server with a quickstart guide on localhost:9002:
inference server start --dev
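To confirm the container came up, you can probe the server from Python. This is a minimal sketch, assuming the server listens on the default port 9001; it only checks that an HTTP response comes back, and makes no assumption about the response body.

```python
# Minimal sketch: check whether an Inference Server answers on the
# default port (9001). Only verifies that an HTTP response arrives.
from urllib.request import urlopen
from urllib.error import URLError


def server_is_up(base_url: str = "http://localhost:9001", timeout: float = 3.0) -> bool:
    """Return True if anything answers an HTTP GET at base_url."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            # Any non-server-error response means something is listening.
            return 200 <= resp.status < 500
    except (URLError, OSError):
        # Connection refused, DNS failure, or timeout: server not reachable.
        return False


if __name__ == "__main__":
    print("server up:", server_is_up())
```

If this prints `server up: False`, check that Docker is running and that nothing else is bound to port 9001.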
Download and run the installer to get an Inference Server on Windows — no Docker required.
- Download the latest installer and run it
- When the install finishes, it will offer to launch the Inference Server
- To stop the server, close the terminal window it opens
- To start it again later, find Roboflow Inference in your Start Menu
Download the native app to get an Inference Server on macOS — no Docker required.
- Download the DMG and open it
- Drag the Roboflow Inference app to your Applications folder
- Double-click the app in Applications to start the server
Device-specific documentation¶
Special installation notes and performance tips are available per device. Browse the navigation on the left for detailed install guides.
Using Your New Server¶
Once you have the Inference Server running, you can access it via its HTTP API or with the Python Inference SDK.
Install the Python Inference SDK:
pip install inference-sdk
Run an example model comparison Workflow on an Inference Server running on your local machine:
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # use local inference server
    # api_key="<YOUR API KEY>"  # optional to access your private data and models
)

result = client.run_workflow(
    workspace_name="roboflow-docs",
    workflow_id="model-comparison",
    images={
        "image": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
    },
    parameters={
        "model1": "rfdetr-small",
        "model2": "rfdetr-medium"
    }
)

print(result)
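run_workflow returns one entry per input image, keyed by the Workflow's output names. The exact keys depend on the Workflow you run, so the payload below is a made-up example of the general shape (the key and class names are invented for illustration), not the real model-comparison output:

```python
# Hypothetical response shape: a list with one dict per input image,
# keyed by the Workflow's declared outputs. The keys and values below
# are invented for illustration; your Workflow defines its own.
sample_result = [
    {
        "model1_predictions": [
            {"class": "person", "confidence": 0.91},
            {"class": "person", "confidence": 0.84},
        ],
        "model2_predictions": [
            {"class": "person", "confidence": 0.93},
        ],
    }
]

# Walk each image's outputs and summarize the detections.
for image_output in sample_result:
    for output_name, predictions in image_output.items():
        classes = [p["class"] for p in predictions]
        print(f"{output_name}: {len(predictions)} detections {classes}")
```

Inspect `print(result)` from the snippet above to see the actual keys your Workflow produces before writing parsing code against them.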
From a JavaScript app, hit your new server with an HTTP request.
const response = await fetch('http://localhost:9001/infer/workflows/roboflow-docs/model-comparison', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        // api_key: "<YOUR API KEY>" // optional to access your private data and models
        inputs: {
            "image": {
                "type": "url",
                "value": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
            },
            "model1": "rfdetr-small",
            "model2": "rfdetr-medium"
        }
    })
});

const result = await response.json();
console.log(result);
Warning
Be careful not to expose your API Key to external users (in other words: don't use this snippet in a public-facing front-end app).
The server's HTTP API is reachable from any client application. For example, from the command line using cURL:
curl -X POST "http://localhost:9001/infer/workflows/roboflow-docs/model-comparison" \
    -H "Content-Type: application/json" \
    -d '{
        "api_key": "<YOUR API KEY -- REMOVE THIS LINE IF NOT USING AN API KEY>",
        "inputs": {
            "image": {
                "type": "url",
                "value": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
            },
            "model1": "rfdetr-small",
            "model2": "rfdetr-medium"
        }
    }'
Tip: AI coding agents are very good at converting snippets like this into other languages.