
About Inference Server

The Inference Server is a standalone microservice that wraps the Inference Python package and exposes it over an HTTP API.

Deployment Options

You can self-host the Inference Server, or use one of our hosted APIs:

  • Serverless API — hosted by Roboflow, scales to zero, pay per inference.
  • Dedicated Deployments — hosted by Roboflow, single-tenant VMs with optional GPU.
  • Self-Hosted — run on your own edge hardware (Raspberry Pi, NVIDIA GPU, NVIDIA Jetson, and more) with Docker.
  • Deploy in Your Own Cloud — run on your own cloud infrastructure (AWS, GCP, Azure) with Docker.

You can interact with an Inference Server using the Inference SDK.
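If you prefer not to install the SDK, you can also call the server's HTTP API directly. The sketch below builds a request body using only the Python standard library; the payload shape (a base64-encoded image plus an API key) is an assumption based on common usage, so confirm the exact schema on your server's interactive /docs page.

```python
# Sketch: building a JSON request body for a local Inference Server.
# The payload shape is an assumption; verify it against your server's /docs page.
import base64
import json


def build_payload(image_bytes: bytes, api_key: str) -> str:
    """Encode an image and API key into a JSON body for an inference request."""
    return json.dumps({
        "api_key": api_key,
        "image": {
            "type": "base64",
            "value": base64.b64encode(image_bytes).decode("ascii"),
        },
    })


payload = build_payload(b"fake-image-bytes", "YOUR_API_KEY")
print(json.loads(payload)["image"]["type"])  # -> base64
```

You would then POST this body to the appropriate model route on port 9001, using any HTTP client.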

Running with Docker

Before you begin, ensure that you have Docker installed on your machine. The easiest way to start the Inference Server is with the Inference CLI:

pip install inference-cli && inference server start

This pulls the appropriate Docker image for your machine (with dependencies pre-installed) and starts the Inference Server on port 9001. Check the server's status with:

inference server status
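You can also probe the server directly over HTTP. The sketch below uses only the Python standard library; port 9001 is the CLI default, but the assumption that the root route answers with a 2xx response should be checked against your server version's docs.

```python
# Sketch: probing a local Inference Server over HTTP with the standard library.
# Port 9001 is the CLI default; responding on the root route is an assumption.
from urllib.error import URLError
from urllib.request import urlopen


def server_is_up(url: str = "http://localhost:9001", timeout: float = 2.0) -> bool:
    """Return True if something answers HTTP with a 2xx status at the given URL."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (URLError, OSError):
        return False


print(server_is_up())  # False unless a server is listening locally
```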

Manually Set Up a Docker Container

The inference server start command runs docker run under the hood with recommended security settings, caching, and platform-specific options.

If you want to start the Inference Server container manually, refer to the Manually Starting the Container section in your platform's install guide:
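As a rough illustration, a manual CPU start typically looks like the command below. The flags shown here are a sketch rather than the recommended invocation; use the image tag and options from your platform's install guide.

```shell
# Sketch: manually starting a CPU Inference Server container on port 9001.
# Flags are illustrative; prefer the options in your platform's guide.
docker run -it --rm -p 9001:9001 roboflow/roboflow-inference-server-cpu
```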