Install on macOS

First, you'll need to install Docker Desktop. Then, use the CLI to start the container.

pip install inference-cli
inference server start
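
To confirm the server came up, you can hit it over HTTP. Here is a minimal sketch using only the Python standard library; it assumes the default port 9001 and that the server's root path responds once it is ready:

import urllib.request

# Poll the server's root path; a 200 response means it is serving.
# (Assumes the default port 9001; adjust if you mapped a different one.)
with urllib.request.urlopen("http://localhost:9001", timeout=5) as resp:
    print(resp.status)  # expect 200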

Manually Starting the Container

If you want more control over the container settings, you can also start it manually. The flags below run the container read-only and drop all Linux capabilities except NET_BIND_SERVICE as a security-hardening measure:

sudo docker run -d \
    --name inference-server \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    roboflow/roboflow-inference-server-cpu:latest

Apple does not yet support passing the Metal Performance Shaders (MPS) device through to Docker, so hardware acceleration is not possible inside a container on a Mac.

Tip

It's easiest to get started with the CPU Docker container, then switch to running outside of Docker with MPS acceleration later if you need more speed.

We recommend using pyenv and pyenv-virtualenv to manage your Python environments on a Mac (especially because, as of 2025, Homebrew defaults to Python 3.13, which is not yet compatible with several of the machine learning dependencies that Inference uses).

Once you have installed and set up pyenv and pyenv-virtualenv (be sure to follow the full instructions for setting up your shell), create and activate an inference virtual environment with Python 3.12:

pyenv install 3.12
pyenv virtualenv 3.12 inference
pyenv activate inference
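
As a quick sanity check (not part of the official setup steps), you can confirm that the active interpreter is the 3.12 virtual environment:

import sys

# The interpreter should be Python 3.12 from the pyenv "inference" environment.
assert sys.version_info[:2] == (3, 12), sys.version
print(sys.executable)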

To install and run the server outside of Docker, clone the repo, install the dependencies, copy cpu_http.py into the top level of the repo, and start the server with uvicorn:

git clone https://github.com/roboflow/inference.git
cd inference
pip install .
cp docker/config/cpu_http.py .
uvicorn cpu_http:app --port 9001 --host 0.0.0.0

Your server is now running at localhost:9001 with MPS acceleration.
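
If you want to verify that acceleration is available, you can check whether PyTorch (which Inference uses for many of its models) can see the MPS device. This is a quick sanity check, not an official Inference command:

import torch

# True on Apple Silicon when PyTorch was built with MPS support;
# if this prints False, models will fall back to CPU execution.
print(torch.backends.mps.is_available())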

Using Your New Server

Once you have a server running, you can access it via its HTTP API or using the Python SDK. You can also use it to build Workflows using the Roboflow Platform UI.

Install the SDK

pip install inference-sdk

Run a workflow

This code runs an example model comparison Workflow on an Inference Server running on your local machine:

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001", # use local inference server
    # api_key="<YOUR API KEY>" # optional to access your private data and models
)

result = client.run_workflow(
    workspace_name="roboflow-docs",
    workflow_id="model-comparison",
    images={
        "image": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
    },
    parameters={
        "model1": "yolov8n-640",
        "model2": "yolov11n-640"
    }
)

print(result)
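
run_workflow returns a list with one entry per input image, and each entry maps the Workflow's output names to their values. The exact keys depend on how the Workflow is configured, so a quick way to explore the response (a sketch, not part of the official example) is:

# Each element of `result` corresponds to one input image; the keys are the
# output names defined by the Workflow, so inspect them before digging deeper.
for output in result:
    print(list(output.keys()))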

From a JavaScript app, hit your new server with an HTTP request.

const response = await fetch('http://localhost:9001/infer/workflows/roboflow-docs/model-comparison', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        // api_key: "<YOUR API KEY>" // optional to access your private data and models
        inputs: {
            "image": {
                "type": "url",
                "value": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
            },
            "model1": "yolov8n-640",
            "model2": "yolov11n-640"
        }
    })
});

const result = await response.json();
console.log(result);

Warning

Be careful not to expose your API Key to external users (in other words: don't use this snippet in a public-facing front-end app).

Using the server's API, you can access it from any other client application; for example, from the command line using cURL:

curl -X POST "http://localhost:9001/infer/workflows/roboflow-docs/model-comparison" \
-H "Content-Type: application/json" \
-d '{
    "api_key": "<YOUR API KEY -- REMOVE THIS LINE IF NOT FILLING>",
    "inputs": {
        "image": {
            "type": "url",
            "value": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
        },
        "model1": "yolov8n-640",
        "model2": "yolov11n-640"
    }
}'

Tip

ChatGPT is really good at converting snippets like this into other languages. If you need help, try pasting in one of the examples above and asking it to translate it into your language of choice.