Reference Overview

Inference has several components that work together to serve computer vision models. The diagram below shows how they fit together.

  • inference-sdk - Lightweight Python client for communicating with the Inference Server.
  • inference-cli - Command-line tool for managing the Inference Server and running common tasks.
  • Inference Server - HTTP server (Docker) that wraps the inference package as a REST API.
  • inference - Core Python package for model loading, inference, and Workflows execution.
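
The layering in the list above can be sketched as a small lookup table. This is an illustrative model only: the component names and roles come from the list, while the `talks_to` edges and the `request_path` helper are assumptions made for this sketch, not part of any Inference API.

```python
# Illustrative model of the component layering described above.
# Keys are the component names from the list; the "talks_to" edges are an
# interpretation of that list (hypothetical), not an official data structure.
COMPONENTS = {
    "inference-sdk": {"role": "Python client", "talks_to": "Inference Server"},
    "inference-cli": {"role": "command-line tool", "talks_to": "Inference Server"},
    "Inference Server": {"role": "HTTP server (Docker)", "talks_to": "inference"},
    "inference": {"role": "core Python package", "talks_to": None},
}


def request_path(start):
    """Follow the talks_to edges from a component down to the core package."""
    path = [start]
    while COMPONENTS[path[-1]]["talks_to"] is not None:
        path.append(COMPONENTS[path[-1]]["talks_to"])
    return path


print(" -> ".join(request_path("inference-sdk")))
# prints: inference-sdk -> Inference Server -> inference
```

Read this way, both client tools sit in front of the Inference Server, and the server in turn wraps the core inference package, which does the model loading and execution.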

[Figure: Inference Architecture]