Install Inference Server¶
Choose the installation method that matches your platform:
The preferred way to use Inference is via Docker (see Why Docker). This works on Linux, macOS, Jetson, and other Docker-capable devices.
Install Docker (and, if you have a CUDA-enabled GPU, the NVIDIA Container Toolkit for GPU acceleration). Then install and run inference-cli:
pip install inference-cli && inference server start
This automatically chooses and configures the optimal container for your machine.
The --dev flag starts a companion Jupyter notebook server with a quickstart guide on localhost:9002:
inference server start --dev
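To confirm the container came up, you can probe the server from Python. This is a minimal sketch, assuming the server listens on the default port 9001; it only checks that an HTTP response comes back, and makes no assumption about the response body.

```python
# Minimal sketch: check whether an Inference Server answers on the
# default port (9001). Only verifies that an HTTP response arrives.
from urllib.request import urlopen
from urllib.error import URLError


def server_is_up(base_url: str = "http://localhost:9001", timeout: float = 3.0) -> bool:
    """Return True if anything answers an HTTP GET at base_url."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            # Any non-server-error response means something is listening.
            return 200 <= resp.status < 500
    except (URLError, OSError):
        # Connection refused, DNS failure, or timeout: server not reachable.
        return False


if __name__ == "__main__":
    print("server up:", server_is_up())
```

If this prints `server up: False`, check that Docker is running and that nothing else is bound to port 9001.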
Download and run the installer to get an Inference Server on Windows — no Docker required.
- Download the latest installer and run it
- When the install finishes, it will offer to launch the Inference Server
- To stop the server, close the terminal window it opens
- To start it again later, find Roboflow Inference in your Start Menu
Download the native app to get an Inference Server on macOS — no Docker required.
- Download the DMG and open it
- Drag the Roboflow Inference app to your Applications folder
- Double-click the app in Applications to start the server
Device-specific documentation¶
Special installation notes and performance tips are available per device. Browse the navigation on the left for detailed install guides.
Using Your New Server¶
Once you have the Inference Server running, you can access it via its HTTP API or with the Python Inference SDK.
Install the Python Inference SDK:
pip install inference-sdk
Run an example model comparison Workflow on an Inference Server running on your local machine:
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # use local inference server
    # api_key="<YOUR API KEY>"  # optional to access your private data and models
)

result = client.run_workflow(
    workspace_name="roboflow-docs",
    workflow_id="model-comparison",
    images={
        "image": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
    },
    parameters={
        "model1": "rfdetr-small",
        "model2": "rfdetr-medium"
    }
)

print(result)
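run_workflow returns one entry per input image, keyed by the Workflow's output names. The exact keys depend on the Workflow you run, so the payload below is a made-up example of the general shape (the key and class names are invented for illustration), not the real model-comparison output:

```python
# Hypothetical response shape: a list with one dict per input image,
# keyed by the Workflow's declared outputs. The keys and values below
# are invented for illustration; your Workflow defines its own.
sample_result = [
    {
        "model1_predictions": [
            {"class": "person", "confidence": 0.91},
            {"class": "person", "confidence": 0.84},
        ],
        "model2_predictions": [
            {"class": "person", "confidence": 0.93},
        ],
    }
]

# Walk each image's outputs and summarize the detections.
for image_output in sample_result:
    for output_name, predictions in image_output.items():
        classes = [p["class"] for p in predictions]
        print(f"{output_name}: {len(predictions)} detections {classes}")
```

Inspect `print(result)` from the snippet above to see the actual keys your Workflow produces before writing parsing code against them.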
From a JavaScript app, hit your new server with an HTTP request.
const response = await fetch('http://localhost:9001/infer/workflows/roboflow-docs/model-comparison', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        // api_key: "<YOUR API KEY>" // optional to access your private data and models
        inputs: {
            "image": {
                "type": "url",
                "value": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
            },
            "model1": "rfdetr-small",
            "model2": "rfdetr-medium"
        }
    })
});

const result = await response.json();
console.log(result);
Warning
Be careful not to expose your API Key to external users (in other words: don't use this snippet in a public-facing front-end app).
The server's HTTP API is reachable from any client application. For example, from the command line using cURL:
curl -X POST "http://localhost:9001/infer/workflows/roboflow-docs/model-comparison" \
    -H "Content-Type: application/json" \
    -d '{
        "api_key": "<YOUR API KEY -- REMOVE THIS LINE IF NOT USING AN API KEY>",
        "inputs": {
            "image": {
                "type": "url",
                "value": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
            },
            "model1": "rfdetr-small",
            "model2": "rfdetr-medium"
        }
    }'
Tip: AI coding agents are very good at converting snippets like this into other languages.