About Inference Server¶
The Inference Server is a standalone microservice that wraps the Inference Python package and exposes it over an HTTP API.
Deployment Options¶
You can self-host Inference Server, or use our hosted APIs:
- Serverless API — hosted by Roboflow, scales to zero, pay per inference.
- Dedicated Deployments — hosted by Roboflow, single-tenant VMs with optional GPU.
- Self-Hosted — run on your own edge hardware (Raspberry Pi, NVIDIA GPU, NVIDIA Jetson, ...) with Docker.
- Deploy in Your Own Cloud — run on your own cloud infrastructure (AWS, GCP, Azure) with Docker.
You can interact with an Inference Server using the Inference SDK.
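Under the hood, the SDK sends plain HTTP requests to the server. The sketch below builds such a request with only the Python standard library; note that the `/infer/object_detection` endpoint path, the payload shape, and the model ID are illustrative assumptions, not the documented contract. In practice, prefer the Inference SDK, which handles these details for you.

```python
# Minimal sketch of the HTTP round trip behind the Inference SDK.
# ASSUMPTIONS: the endpoint path, payload shape, and model ID are
# placeholders for illustration; check your server's API reference.
import base64
import json
import urllib.request

SERVER_URL = "http://localhost:9001"  # default port used by `inference server start`

def build_request(model_id: str, api_key: str, image_bytes: bytes) -> urllib.request.Request:
    """Assemble a POST request carrying a base64-encoded image."""
    payload = {
        "model_id": model_id,  # placeholder ID, e.g. "my-project/1"
        "api_key": api_key,
        "image": {
            "type": "base64",
            "value": base64.b64encode(image_bytes).decode("ascii"),
        },
    }
    return urllib.request.Request(
        url=f"{SERVER_URL}/infer/object_detection",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# With a running server, you would send the request like:
#   req = build_request("my-project/1", "YOUR_API_KEY", open("image.jpg", "rb").read())
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```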
Running with Docker¶
Before you begin, ensure that you have Docker installed on your machine. The easiest way to start the Inference Server is with the Inference CLI:
pip install inference-cli && inference server start
This pulls the appropriate Docker image for your machine (with pre-installed dependencies) and starts the Inference Server on port 9001. Check the server status with:
inference server status
Manually Set Up a Docker Container¶
inference server start runs docker run under the hood with recommended security settings, caching, and platform-specific options.
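On a CPU-only machine, the underlying command is roughly equivalent to the following; the image tag and flags here are assumptions for illustration, and the recommended options differ per platform:

```bash
# Illustrative only: image tag and flags vary by platform; see your
# platform's install guide for the recommended command.
docker run -d --name inference-server -p 9001:9001 \
    roboflow/roboflow-inference-server-cpu:latest
```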
If you want to start the inference server container manually, refer to the Manually Starting the Container section in your platform's install guide: