inference_sdk API Reference

Top-level

Top-level SDK configuration: API URLs, timeouts, environment variable loading, and remote execution settings.

inference_sdk.config

Classes

InferenceSDKDeprecationWarning

Bases: Warning

Class used for warning of deprecated features in the Inference SDK

Source code in inference_sdk/config.py
class InferenceSDKDeprecationWarning(Warning):
    """Class used for warning of deprecated features in the Inference SDK"""

    pass

RemoteProcessingTimeCollector

Thread-safe collector for GPU processing times from remote execution responses.

A single instance is shared across all threads handling a single request. Each entry stores a model_id alongside the processing time.

Uses threading.Lock (not asyncio.Lock) because add() is only called from synchronous worker threads (ThreadPoolExecutor). The middleware reads via drain() after await call_next() returns, at which point all worker threads have completed — so there is no contention in the async context.
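
Example — a minimal usage sketch (the model ids and timings below are placeholders): worker threads call add(), and once they have all completed the middleware reads everything back with summarize().

import threading

from inference_sdk.config import RemoteProcessingTimeCollector

collector = RemoteProcessingTimeCollector()

def worker(model_id: str, elapsed: float) -> None:
    # In real use this runs inside ThreadPoolExecutor worker threads.
    collector.add(elapsed, model_id=model_id)

threads = [
    threading.Thread(target=worker, args=("project/1", 0.12)),
    threading.Thread(target=worker, args=("project/2", 0.34)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Safe to read here: all workers have joined.
total, detail = collector.summarize()
print(total)   # ~0.46
print(detail)  # JSON like '[{"m": "project/1", "t": 0.12}, ...]' (order depends on scheduling)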

Source code in inference_sdk/config.py
class RemoteProcessingTimeCollector:
    """Thread-safe collector for GPU processing times from remote execution responses.

    A single instance is shared across all threads handling a single request.
    Each entry stores a model_id alongside the processing time.

    Uses threading.Lock (not asyncio.Lock) because add() is only called from
    synchronous worker threads (ThreadPoolExecutor). The middleware reads via
    drain() after await call_next() returns, at which point all worker threads
    have completed — so there is no contention in the async context.
    """

    def __init__(self):
        self._entries: list = []  # list of (model_id, time) tuples
        self._lock = threading.Lock()

    def add(self, processing_time: float, model_id: str = "unknown") -> None:
        with self._lock:
            self._entries.append((model_id, processing_time))

    def drain(self) -> list:
        """Atomically return all entries and clear the internal list."""
        with self._lock:
            entries = self._entries
            self._entries = []
            return entries

    def has_data(self) -> bool:
        with self._lock:
            return len(self._entries) > 0

    def summarize(self, max_detail_bytes: int = 4096) -> Tuple[float, Optional[str]]:
        """Atomically drain entries and return (total_time, entries_json_or_none).

        Returns the total processing time and a JSON string of individual entries.
        If the JSON exceeds max_detail_bytes, the detail string is omitted (None).
        """
        entries = self.drain()
        total = sum(t for _, t in entries)
        detail = json.dumps([{"m": m, "t": t} for m, t in entries])
        if len(detail) > max_detail_bytes:
            detail = None
        return total, detail
Functions
drain
drain()

Atomically return all entries and clear the internal list.

Source code in inference_sdk/config.py
def drain(self) -> list:
    """Atomically return all entries and clear the internal list."""
    with self._lock:
        entries = self._entries
        self._entries = []
        return entries
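
Because drain() swaps out the list reference under the lock rather than copying it, callers can iterate the returned entries without holding the lock. A minimal sketch (the entry below is a placeholder):

collector = RemoteProcessingTimeCollector()
collector.add(0.05, model_id="project/1")  # hypothetical entry
entries = collector.drain()        # [("project/1", 0.05)]
assert collector.drain() == []     # collector is now empty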
summarize
summarize(max_detail_bytes=4096)

Atomically drain entries and return (total_time, entries_json_or_none).

Returns the total processing time and a JSON string of individual entries. If the JSON exceeds max_detail_bytes, the detail string is omitted (None).

Source code in inference_sdk/config.py
def summarize(self, max_detail_bytes: int = 4096) -> Tuple[float, Optional[str]]:
    """Atomically drain entries and return (total_time, entries_json_or_none).

    Returns the total processing time and a JSON string of individual entries.
    If the JSON exceeds max_detail_bytes, the detail string is omitted (None).
    """
    entries = self.drain()
    total = sum(t for _, t in entries)
    detail = json.dumps([{"m": m, "t": t} for m, t in entries])
    if len(detail) > max_detail_bytes:
        detail = None
    return total, detail
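
The total is always returned, while the per-entry detail is dropped once its JSON form exceeds max_detail_bytes; summarize() also empties the collector, so has_data() returns False afterwards. A minimal sketch (the entries are placeholders):

collector = RemoteProcessingTimeCollector()
for i in range(1000):
    collector.add(0.01, model_id=f"project/{i}")  # hypothetical entries

total, detail = collector.summarize(max_detail_bytes=4096)
# total is still the full sum (~10.0), but detail is None because the
# serialized entry list is far larger than 4096 bytes.
assert detail is None
assert not collector.has_data()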


http

Core HTTP client for making inference requests. InferenceHTTPClient supports object detection, classification, segmentation, keypoint detection, OCR, CLIP embeddings, and workflow execution.

inference_sdk.http.client

Classes

InferenceHTTPClient

HTTP client for making inference requests to Roboflow's API.

This client handles authentication, request formatting, and error handling for interacting with Roboflow's inference endpoints. It supports both synchronous and asynchronous requests.

Attributes:

inference_configuration (InferenceConfiguration): Configuration settings for inference requests.

client_mode (HTTPClientMode): The API version mode being used (V0 or V1).

selected_model (Optional[str]): Currently selected model identifier, if any.

Example
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001", # use local inference server
    # api_key="<YOUR API KEY>" # optional to access your private data and models
)

result = client.run_workflow(
    workspace_name="roboflow-docs",
    workflow_id="model-comparison",
    images={
        "image": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
    },
    parameters={
        "model1": "yolov8n-640",
        "model2": "yolov11n-640"
    }
)
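
Beyond workflows, the client exposes direct model inference via infer() and the use_model() context manager documented in the source below. A minimal sketch (the model id and image path are placeholders; model ids follow the project_id/model_version_id format):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="<YOUR API KEY>",  # required for model inference endpoints
)

# One-off call with an explicit model id
result = client.infer("path/to/image.jpg", model_id="<project_id>/<version>")

# Temporarily select a model for a block of calls
with client.use_model("<project_id>/<version>"):
    result = client.infer("path/to/image.jpg")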
Source code in inference_sdk/http/client.py
class InferenceHTTPClient:
    """HTTP client for making inference requests to Roboflow's API.

    This client handles authentication, request formatting, and error handling for
    interacting with Roboflow's inference endpoints. It supports both synchronous
    and asynchronous requests.

    Attributes:
        inference_configuration (InferenceConfiguration): Configuration settings for
            inference requests.
        client_mode (HTTPClientMode): The API version mode being used (V0 or V1).
        selected_model (Optional[str]): Currently selected model identifier, if any.

    Example:
        ```python
        from inference_sdk import InferenceHTTPClient

        client = InferenceHTTPClient(
            api_url="http://localhost:9001", # use local inference server
            # api_key="<YOUR API KEY>" # optional to access your private data and models
        )

        result = client.run_workflow(
            workspace_name="roboflow-docs",
            workflow_id="model-comparison",
            images={
                "image": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
            },
            parameters={
                "model1": "yolov8n-640",
                "model2": "yolov11n-640"
            }
        )
        ```
    """

    @classmethod
    def init(
        cls,
        api_url: str,
        api_key: Optional[str] = None,
    ) -> "InferenceHTTPClient":
        """Initialize a new InferenceHTTPClient instance.

        Args:
            api_url (str): The base URL for the inference API.
            api_key (Optional[str], optional): API key for authentication. Defaults to None.

        Returns:
            InferenceHTTPClient: A new instance of the InferenceHTTPClient.
        """
        return cls(api_url=api_url, api_key=api_key)

    def __init__(
        self,
        api_url: str,
        api_key: Optional[str] = None,
    ):
        """Initialize a new InferenceHTTPClient instance.

        Args:
            api_url (str): The base URL for the inference API.
            api_key (Optional[str], optional): API key for authentication. Defaults to None.
        """
        self.__api_url = api_url
        self.__api_key = api_key
        self.__inference_configuration = InferenceConfiguration.init_default()
        self.__client_mode = _determine_client_mode(api_url=api_url)
        self.__selected_model: Optional[str] = None
        self.__webrtc_client: Optional["WebRTCClient"] = None

    @property
    def inference_configuration(self) -> InferenceConfiguration:
        """Get the current inference configuration.

        Returns:
            InferenceConfiguration: The current inference configuration settings.
        """
        return self.__inference_configuration

    @property
    def client_mode(self) -> HTTPClientMode:
        """Get the current client mode.

        Returns:
            HTTPClientMode: The current API version mode (V0 or V1).
        """
        return self.__client_mode

    @property
    def selected_model(self) -> Optional[str]:
        """Get the currently selected model identifier.

        Returns:
            Optional[str]: The identifier of the currently selected model, if any.
        """
        return self.__selected_model

    @property
    def webrtc(self) -> "WebRTCClient":
        """Lazy accessor for the WebRTC client namespace.

        Returns:
            WebRTCClient: Namespaced WebRTC API bound to this HTTP client.
        """
        from inference_sdk.webrtc.client import WebRTCClient

        if self.__webrtc_client is None:
            self.__webrtc_client = WebRTCClient(self.__api_url, self.__api_key)
        return self.__webrtc_client

    @contextmanager
    def use_configuration(
        self, inference_configuration: InferenceConfiguration
    ) -> Generator["InferenceHTTPClient", None, None]:
        """Temporarily use a different inference configuration.

        Args:
            inference_configuration (InferenceConfiguration): The temporary configuration to use.

        Yields:
            Generator[InferenceHTTPClient, None, None]: The client instance with temporary configuration.
        """
        previous_configuration = self.__inference_configuration
        self.__inference_configuration = inference_configuration
        try:
            yield self
        finally:
            self.__inference_configuration = previous_configuration

    def configure(
        self, inference_configuration: InferenceConfiguration
    ) -> "InferenceHTTPClient":
        """Configure the client with new inference settings.

        Args:
            inference_configuration (InferenceConfiguration): The new configuration to apply.

        Returns:
            InferenceHTTPClient: The client instance with updated configuration.
        """
        self.__inference_configuration = inference_configuration
        return self

    def select_api_v0(self) -> "InferenceHTTPClient":
        """Select API version 0 for client operations.

        Returns:
            InferenceHTTPClient: The client instance with API v0 selected.
        """
        self.__client_mode = HTTPClientMode.V0
        return self

    def select_api_v1(self) -> "InferenceHTTPClient":
        """Select API version 1 for client operations.

        Returns:
            InferenceHTTPClient: The client instance with API v1 selected.
        """
        self.__client_mode = HTTPClientMode.V1
        return self

    @contextmanager
    def use_api_v0(self) -> Generator["InferenceHTTPClient", None, None]:
        """Temporarily use API version 0 for client operations.

        Yields:
            Generator[InferenceHTTPClient, None, None]: The client instance temporarily using API v0.
        """
        previous_client_mode = self.__client_mode
        self.__client_mode = HTTPClientMode.V0
        try:
            yield self
        finally:
            self.__client_mode = previous_client_mode

    @contextmanager
    def use_api_v1(self) -> Generator["InferenceHTTPClient", None, None]:
        """Temporarily use API version 1 for client operations.

        Yields:
            Generator[InferenceHTTPClient, None, None]: The client instance temporarily using API v1.
        """
        previous_client_mode = self.__client_mode
        self.__client_mode = HTTPClientMode.V1
        try:
            yield self
        finally:
            self.__client_mode = previous_client_mode

    def select_model(self, model_id: str) -> "InferenceHTTPClient":
        """Select a model for inference operations.

        Args:
            model_id (str): The identifier of the model to select.

        Returns:
            InferenceHTTPClient: The client instance with the selected model.
        """
        self.__selected_model = model_id
        return self

    @contextmanager
    def use_model(self, model_id: str) -> Generator["InferenceHTTPClient", None, None]:
        """Temporarily use a specific model for inference operations.

        Args:
            model_id (str): The identifier of the model to use.

        Yields:
            Generator[InferenceHTTPClient, None, None]: The client instance temporarily using the specified model.
        """
        previous_model = self.__selected_model
        self.__selected_model = model_id
        try:
            yield self
        finally:
            self.__selected_model = previous_model

    @wrap_errors
    def get_server_info(self) -> ServerInfo:
        """Get information about the inference server.

        Returns:
            ServerInfo: Information about the server configuration and status.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        response = requests.get(f"{self.__api_url}/info")
        response.raise_for_status()
        response_payload = response.json()
        return ServerInfo.from_dict(response_payload)

    def infer_on_stream(
        self,
        input_uri: str,
        model_id: Optional[str] = None,
    ) -> Generator[Tuple[Union[str, int], np.ndarray, dict], None, None]:
        """Run inference on a video stream or sequence of images.

        Args:
            input_uri (str): URI of the input stream or directory.
            model_id (Optional[str], optional): Model identifier to use for inference. Defaults to None.

        Yields:
            Generator[Tuple[Union[str, int], np.ndarray, dict], None, None]: Tuples of (frame reference, frame data, prediction).
        """
        for reference, frame in load_stream_inference_input(
            input_uri=input_uri,
            image_extensions=self.__inference_configuration.image_extensions_for_directory_scan,
        ):
            prediction = self.infer(
                inference_input=frame,
                model_id=model_id,
            )
            yield reference, frame, prediction

    @wrap_errors
    def infer(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Run inference on one or more images.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for inference.
            model_id (Optional[str], optional): Model identifier to use for inference. Defaults to None.

        Returns:
            Union[dict, List[dict]]: Inference results for the input image(s).

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        if self.__client_mode is HTTPClientMode.V0:
            return self.infer_from_api_v0(
                inference_input=inference_input,
                model_id=model_id,
            )
        return self.infer_from_api_v1(
            inference_input=inference_input,
            model_id=model_id,
        )

    @wrap_errors_async
    async def infer_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Run inference asynchronously on one or more images.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for inference.
            model_id (Optional[str], optional): Model identifier to use for inference. Defaults to None.

        Returns:
            Union[dict, List[dict]]: Inference results for the input image(s).

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        if self.__client_mode is HTTPClientMode.V0:
            return await self.infer_from_api_v0_async(
                inference_input=inference_input,
                model_id=model_id,
            )
        return await self.infer_from_api_v1_async(
            inference_input=inference_input,
            model_id=model_id,
        )

    def infer_from_api_v0(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Run inference using API v0.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for inference.
            model_id (Optional[str], optional): Model identifier to use for inference. Defaults to None.

        Returns:
            Union[dict, List[dict]]: Inference results for the input image(s).

        Raises:
            ModelNotSelectedError: If no model is selected.
            APIKeyNotProvided: If API key is required but not provided.
            InvalidModelIdentifier: If the model identifier format is invalid.
        """
        requests_data = self._prepare_infer_from_api_v0_request_data(
            inference_input=inference_input,
            model_id=model_id,
        )
        responses = self._execute_infer_from_api_request(
            requests_data=requests_data,
        )
        results = []
        for request_data, response in zip(requests_data, responses):
            if response_contains_jpeg_image(response=response):
                visualisation = transform_visualisation_bytes(
                    visualisation=response.content,
                    expected_format=self.__inference_configuration.output_visualisation_format,
                )
                parsed_response = {"visualization": visualisation}
            else:
                parsed_response = response.json()
                if parsed_response.get("visualization") is not None:
                    parsed_response["visualization"] = transform_base64_visualisation(
                        visualisation=parsed_response["visualization"],
                        expected_format=self.__inference_configuration.output_visualisation_format,
                    )
            parsed_response = adjust_prediction_to_client_scaling_factor(
                prediction=parsed_response,
                scaling_factor=request_data.image_scaling_factors[0],
            )
            results.append(parsed_response)
        return unwrap_single_element_list(sequence=results)

    def _execute_infer_from_api_request(
        self,
        requests_data: List[RequestData],
    ) -> List[Response]:
        responses = execute_requests_packages(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )
        return responses

    def _prepare_infer_from_api_v0_request_data(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: Optional[str] = None,
    ) -> List[RequestData]:
        model_id_to_be_used = model_id or self.__selected_model
        _ensure_model_is_selected(model_id=model_id_to_be_used)
        _ensure_api_key_provided(api_key=self.__api_key)
        model_id_to_be_used = resolve_roboflow_model_alias(model_id=model_id_to_be_used)
        model_id_chunks = model_id_to_be_used.split("/")
        if len(model_id_chunks) != 2:
            raise InvalidModelIdentifier(
                f"Invalid model id: {model_id}. Expected format: project_id/model_version_id."
            )
        max_height, max_width = _determine_client_downsizing_parameters(
            client_downsizing_disabled=self.__inference_configuration.client_downsizing_disabled,
            model_description=None,
            default_max_input_size=self.__inference_configuration.default_max_input_size,
        )
        encoded_inference_inputs = load_static_inference_input(
            inference_input=inference_input,
            max_height=max_height,
            max_width=max_width,
        )
        params = {
            "api_key": self.__api_key,
        }
        params.update(self.__inference_configuration.to_legacy_call_parameters())

        execution_id_value = execution_id.get()
        headers = DEFAULT_HEADERS
        if execution_id_value:
            headers = headers.copy()
            headers[EXECUTION_ID_HEADER] = execution_id_value

        requests_data = prepare_requests_data(
            url=f"{self.__api_url}/{model_id_chunks[0]}/{model_id_chunks[1]}",
            encoded_inference_inputs=encoded_inference_inputs,
            headers=headers,
            parameters=params,
            payload=None,
            max_batch_size=1,
            image_placement=ImagePlacement.DATA,
        )
        return requests_data

    async def infer_from_api_v0_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Run inference using API v0 asynchronously.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for inference.
            model_id (Optional[str], optional): Model identifier to use for inference. Defaults to None.

        Returns:
            Union[dict, List[dict]]: Inference results for the input image(s).

        Raises:
            ModelNotSelectedError: If no model is selected.
            APIKeyNotProvided: If API key is required but not provided.
            InvalidModelIdentifier: If the model identifier format is invalid.
        """
        model_id_to_be_used = model_id or self.__selected_model
        _ensure_model_is_selected(model_id=model_id_to_be_used)
        _ensure_api_key_provided(api_key=self.__api_key)
        model_id_to_be_used = resolve_roboflow_model_alias(model_id=model_id_to_be_used)
        model_id_chunks = model_id_to_be_used.split("/")
        if len(model_id_chunks) != 2:
            raise InvalidModelIdentifier(
                f"Invalid model id: {model_id}. Expected format: project_id/model_version_id."
            )
        max_height, max_width = _determine_client_downsizing_parameters(
            client_downsizing_disabled=self.__inference_configuration.client_downsizing_disabled,
            model_description=None,
            default_max_input_size=self.__inference_configuration.default_max_input_size,
        )
        encoded_inference_inputs = await load_static_inference_input_async(
            inference_input=inference_input,
            max_height=max_height,
            max_width=max_width,
        )
        params = {
            "api_key": self.__api_key,
        }
        params.update(self.__inference_configuration.to_legacy_call_parameters())

        execution_id_value = execution_id.get()
        headers = DEFAULT_HEADERS
        if execution_id_value:
            headers = headers.copy()
            headers[EXECUTION_ID_HEADER] = execution_id_value

        requests_data = prepare_requests_data(
            url=f"{self.__api_url}/{model_id_chunks[0]}/{model_id_chunks[1]}",
            encoded_inference_inputs=encoded_inference_inputs,
            headers=headers,
            parameters=params,
            payload=None,
            max_batch_size=1,
            image_placement=ImagePlacement.DATA,
        )
        responses = await execute_requests_packages_async(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )
        results = []
        for request_data, response in zip(requests_data, responses):
            if not issubclass(type(response), dict):
                visualisation = transform_visualisation_bytes(
                    visualisation=response,
                    expected_format=self.__inference_configuration.output_visualisation_format,
                )
                parsed_response = {"visualization": visualisation}
            else:
                parsed_response = response
                if parsed_response.get("visualization") is not None:
                    parsed_response["visualization"] = transform_base64_visualisation(
                        visualisation=parsed_response["visualization"],
                        expected_format=self.__inference_configuration.output_visualisation_format,
                    )
            parsed_response = adjust_prediction_to_client_scaling_factor(
                prediction=parsed_response,
                scaling_factor=request_data.image_scaling_factors[0],
            )
            results.append(parsed_response)
        return unwrap_single_element_list(sequence=results)

    def infer_from_api_v1(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        requests_data = self._prepare_infer_from_api_v1_request_data(
            inference_input=inference_input,
            model_id=model_id,
        )
        responses = self._execute_infer_from_api_request(
            requests_data=requests_data,
        )
        results = []
        for request_data, response in zip(requests_data, responses):
            parsed_response = response.json()
            if not issubclass(type(parsed_response), list):
                parsed_response = [parsed_response]
            for parsed_response_element, scaling_factor in zip(
                parsed_response, request_data.image_scaling_factors
            ):
                if parsed_response_element.get("visualization") is not None:
                    parsed_response_element["visualization"] = (
                        transform_base64_visualisation(
                            visualisation=parsed_response_element["visualization"],
                            expected_format=self.__inference_configuration.output_visualisation_format,
                        )
                    )
                parsed_response_element = adjust_prediction_to_client_scaling_factor(
                    prediction=parsed_response_element,
                    scaling_factor=scaling_factor,
                )
                results.append(parsed_response_element)
        return unwrap_single_element_list(sequence=results)

    def _prepare_infer_from_api_v1_request_data(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: Optional[str] = None,
    ) -> List[RequestData]:
        self.__ensure_v1_client_mode()
        model_id_to_be_used = model_id or self.__selected_model
        _ensure_model_is_selected(model_id=model_id_to_be_used)
        model_id_to_be_used = resolve_roboflow_model_alias(model_id=model_id_to_be_used)
        model_description = self.get_model_description(model_id=model_id_to_be_used)
        max_height, max_width = _determine_client_downsizing_parameters(
            client_downsizing_disabled=self.__inference_configuration.client_downsizing_disabled,
            model_description=model_description,
            default_max_input_size=self.__inference_configuration.default_max_input_size,
        )
        if model_description.task_type not in NEW_INFERENCE_ENDPOINTS:
            raise ModelTaskTypeNotSupportedError(
                f"Model task {model_description.task_type} is not supported by API v1 client."
            )
        encoded_inference_inputs = load_static_inference_input(
            inference_input=inference_input,
            max_height=max_height,
            max_width=max_width,
        )
        payload = {
            "api_key": self.__api_key,
            "model_id": model_id_to_be_used,
        }
        endpoint = NEW_INFERENCE_ENDPOINTS[model_description.task_type]
        payload.update(
            self.__inference_configuration.to_api_call_parameters(
                client_mode=self.__client_mode,
                task_type=model_description.task_type,
            )
        )
        requests_data = prepare_requests_data(
            url=f"{self.__api_url}{endpoint}",
            encoded_inference_inputs=encoded_inference_inputs,
            headers=DEFAULT_HEADERS,
            parameters=None,
            payload=payload,
            max_batch_size=self.__inference_configuration.max_batch_size,
            image_placement=ImagePlacement.JSON,
        )
        return requests_data

    async def infer_from_api_v1_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        self.__ensure_v1_client_mode()
        model_id_to_be_used = model_id or self.__selected_model
        _ensure_model_is_selected(model_id=model_id_to_be_used)
        model_id_to_be_used = resolve_roboflow_model_alias(model_id=model_id_to_be_used)
        model_description = await self.get_model_description_async(
            model_id=model_id_to_be_used
        )
        max_height, max_width = _determine_client_downsizing_parameters(
            client_downsizing_disabled=self.__inference_configuration.client_downsizing_disabled,
            model_description=model_description,
            default_max_input_size=self.__inference_configuration.default_max_input_size,
        )
        if model_description.task_type not in NEW_INFERENCE_ENDPOINTS:
            raise ModelTaskTypeNotSupportedError(
                f"Model task {model_description.task_type} is not supported by API v1 client."
            )
        encoded_inference_inputs = await load_static_inference_input_async(
            inference_input=inference_input,
            max_height=max_height,
            max_width=max_width,
        )
        payload = {
            "api_key": self.__api_key,
            "model_id": model_id_to_be_used,
        }
        endpoint = NEW_INFERENCE_ENDPOINTS[model_description.task_type]
        payload.update(
            self.__inference_configuration.to_api_call_parameters(
                client_mode=self.__client_mode,
                task_type=model_description.task_type,
            )
        )
        requests_data = prepare_requests_data(
            url=f"{self.__api_url}{endpoint}",
            encoded_inference_inputs=encoded_inference_inputs,
            headers=DEFAULT_HEADERS,
            parameters=None,
            payload=payload,
            max_batch_size=self.__inference_configuration.max_batch_size,
            image_placement=ImagePlacement.JSON,
        )
        responses = await execute_requests_packages_async(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )
        results = []
        for request_data, parsed_response in zip(requests_data, responses):
            if not issubclass(type(parsed_response), list):
                parsed_response = [parsed_response]
            for parsed_response_element, scaling_factor in zip(
                parsed_response, request_data.image_scaling_factors
            ):
                if parsed_response_element.get("visualization") is not None:
                    parsed_response_element["visualization"] = (
                        transform_base64_visualisation(
                            visualisation=parsed_response_element["visualization"],
                            expected_format=self.__inference_configuration.output_visualisation_format,
                        )
                    )
                parsed_response_element = adjust_prediction_to_client_scaling_factor(
                    prediction=parsed_response_element,
                    scaling_factor=scaling_factor,
                )
                results.append(parsed_response_element)
        return unwrap_single_element_list(sequence=results)

    def get_model_description(
        self, model_id: str, allow_loading: bool = True
    ) -> ModelDescription:
        """Get the description of a model.

        Args:
            model_id (str): The identifier of the model.
            allow_loading (bool, optional): Whether to load the model if not already loaded. Defaults to True.

        Returns:
            ModelDescription: Description of the model.

        Raises:
            WrongClientModeError: If not in API v1 mode.
            ModelNotInitializedError: If the model is not initialized and cannot be loaded.
        """
        self.__ensure_v1_client_mode()
        de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
        registered_models = self.list_loaded_models()
        matching_model = filter_model_descriptions(
            descriptions=registered_models.models,
            model_id=de_aliased_model_id,
        )
        if matching_model is None and allow_loading is True:
            registered_models = self.load_model(model_id=de_aliased_model_id)
            matching_model = filter_model_descriptions(
                descriptions=registered_models.models,
                model_id=de_aliased_model_id,
            )
        if matching_model is not None:
            return matching_model
        raise ModelNotInitializedError(
            f"Model {model_id} (de-aliased: {de_aliased_model_id}) is not initialised and cannot "
            f"retrieve its description."
        )

    async def get_model_description_async(
        self, model_id: str, allow_loading: bool = True
    ) -> ModelDescription:
        """Get the description of a model asynchronously.

        Args:
            model_id (str): The identifier of the model.
            allow_loading (bool, optional): Whether to load the model if not already loaded. Defaults to True.

        Returns:
            ModelDescription: Description of the model.

        Raises:
            WrongClientModeError: If not in API v1 mode.
            ModelNotInitializedError: If the model is not initialized and cannot be loaded.
        """
        self.__ensure_v1_client_mode()
        de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
        registered_models = await self.list_loaded_models_async()
        matching_model = filter_model_descriptions(
            descriptions=registered_models.models,
            model_id=de_aliased_model_id,
        )
        if matching_model is None and allow_loading is True:
            registered_models = await self.load_model_async(
                model_id=de_aliased_model_id
            )
            matching_model = filter_model_descriptions(
                descriptions=registered_models.models,
                model_id=de_aliased_model_id,
            )
        if matching_model is not None:
            return matching_model
        raise ModelNotInitializedError(
            f"Model {model_id} (de-aliased: {de_aliased_model_id}) is not initialised and cannot "
            f"retrieve its description."
        )

    @wrap_errors
    def list_loaded_models(self) -> RegisteredModels:
        """List all models currently loaded on the server.

        Returns:
            RegisteredModels: Information about registered models.

        Raises:
            WrongClientModeError: If not in API v1 mode.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        self.__ensure_v1_client_mode()
        response = requests.get(
            f"{self.__api_url}/model/registry?api_key={self.__api_key}"
        )
        response.raise_for_status()
        response_payload = response.json()
        return RegisteredModels.from_dict(response_payload)

    @wrap_errors_async
    async def list_loaded_models_async(self) -> RegisteredModels:
        """List all models currently loaded on the server asynchronously.

        Returns:
            RegisteredModels: Information about registered models.

        Raises:
            WrongClientModeError: If not in API v1 mode.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        self.__ensure_v1_client_mode()
        async with aiohttp.ClientSession() as session:
            async with session.get(
                f"{self.__api_url}/model/registry?api_key={self.__api_key}"
            ) as response:
                response.raise_for_status()
                response_payload = await response.json()
                return RegisteredModels.from_dict(response_payload)

    @wrap_errors
    def load_model(
        self, model_id: str, set_as_default: bool = False
    ) -> RegisteredModels:
        """Load a model onto the server.

        Args:
            model_id (str): The identifier of the model to load.
            set_as_default (bool, optional): Whether to set this model as the default. Defaults to False.

        Returns:
            RegisteredModels: Updated information about registered models.

        Raises:
            WrongClientModeError: If not in API v1 mode.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        self.__ensure_v1_client_mode()
        de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
        response = requests.post(
            f"{self.__api_url}/model/add",
            json={
                "model_id": de_aliased_model_id,
                "api_key": self.__api_key,
            },
            headers=DEFAULT_HEADERS,
        )
        response.raise_for_status()
        response_payload = response.json()
        if set_as_default:
            self.__selected_model = de_aliased_model_id
        return RegisteredModels.from_dict(response_payload)

    @wrap_errors_async
    async def load_model_async(
        self, model_id: str, set_as_default: bool = False
    ) -> RegisteredModels:
        """Load a model onto the server asynchronously.

        Args:
            model_id (str): The identifier of the model to load.
            set_as_default (bool, optional): Whether to set this model as the default. Defaults to False.

        Returns:
            RegisteredModels: Updated information about registered models.

        Raises:
            WrongClientModeError: If not in API v1 mode.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        self.__ensure_v1_client_mode()
        de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
        payload = {
            "model_id": de_aliased_model_id,
            "api_key": self.__api_key,
        }
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.__api_url}/model/add",
                json=payload,
                headers=DEFAULT_HEADERS,
            ) as response:
                response.raise_for_status()
                response_payload = await response.json()
        if set_as_default:
            self.__selected_model = de_aliased_model_id
        return RegisteredModels.from_dict(response_payload)

    @wrap_errors
    def unload_model(self, model_id: str) -> RegisteredModels:
        """Unload a model from the server.

        Args:
            model_id (str): The identifier of the model to unload.

        Returns:
            RegisteredModels: Updated information about registered models.

        Raises:
            WrongClientModeError: If not in API v1 mode.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        self.__ensure_v1_client_mode()
        de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
        response = requests.post(
            f"{self.__api_url}/model/remove",
            json={
                "model_id": de_aliased_model_id,
            },
            headers=DEFAULT_HEADERS,
        )
        response.raise_for_status()
        response_payload = response.json()
        if (
            de_aliased_model_id == self.__selected_model
            or model_id == self.__selected_model
        ):
            self.__selected_model = None
        return RegisteredModels.from_dict(response_payload)

    @wrap_errors_async
    async def unload_model_async(self, model_id: str) -> RegisteredModels:
        self.__ensure_v1_client_mode()
        de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
        async with aiohttp.ClientSession() as session:
            async with session.post(
                f"{self.__api_url}/model/remove",
                json={
                    "model_id": de_aliased_model_id,
                },
                headers=DEFAULT_HEADERS,
            ) as response:
                response.raise_for_status()
                response_payload = await response.json()
        if (
            de_aliased_model_id == self.__selected_model
            or model_id == self.__selected_model
        ):
            self.__selected_model = None
        return RegisteredModels.from_dict(response_payload)

    @wrap_errors
    def unload_all_models(self) -> RegisteredModels:
        self.__ensure_v1_client_mode()
        response = requests.post(f"{self.__api_url}/model/clear")
        response.raise_for_status()
        response_payload = response.json()
        self.__selected_model = None
        return RegisteredModels.from_dict(response_payload)

    @wrap_errors_async
    async def unload_all_models_async(self) -> RegisteredModels:
        self.__ensure_v1_client_mode()
        async with aiohttp.ClientSession() as session:
            async with session.post(f"{self.__api_url}/model/clear") as response:
                response.raise_for_status()
                response_payload = await response.json()
        self.__selected_model = None
        return RegisteredModels.from_dict(response_payload)

    @wrap_errors
    def ocr_image(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model: str = "doctr",
        version: Optional[str] = None,
        quantize: Optional[bool] = None,
        generate_bounding_boxes: Optional[bool] = None,
        language_codes: Optional[List[str]] = None,
    ) -> Union[dict, List[dict]]:
        """Run OCR on input image(s).

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for OCR.
            model (str, optional): OCR model to use ('doctr' or 'trocr'). Defaults to "doctr".
            version (Optional[str], optional): Model version to use. Defaults to None.
                For trocr, supported versions are: 'trocr-small-printed', 'trocr-base-printed', 'trocr-large-printed'.
            quantize (Optional[bool], optional): EasyOCR flag deciding which variant of the model to load.
                Defaults to None.
            generate_bounding_boxes (Optional[bool], optional): Flag of some models (like DocTR) deciding whether
                an output variant with sv.Detections(...)-compatible bounding boxes should be returned (for
                historical reasons, some older implementations flattened the detected OCR structure into text and
                returned only that). Defaults to None.
            language_codes (Optional[List[str]], optional): EasyOCR parameter dictating the codes of the languages
                the model should recognise (leave blank for the default of the given OCR model version).
                Defaults to None.

        Returns:
            Union[dict, List[dict]]: OCR results for the input image(s).

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        encoded_inference_inputs = load_static_inference_input(
            inference_input=inference_input,
        )
        payload = self.__initialise_payload()
        if version:
            key = f"{model.lower()}_version_id"
            payload[key] = version
        if quantize is not None:
            payload["quantize"] = quantize
        if generate_bounding_boxes is not None:
            payload["generate_bounding_boxes"] = generate_bounding_boxes
        if language_codes is not None:
            payload["language_codes"] = language_codes
        model_path = resolve_ocr_path(model_name=model)
        url = self.__wrap_url_with_api_key(f"{self.__api_url}{model_path}")
        requests_data = prepare_requests_data(
            url=url,
            encoded_inference_inputs=encoded_inference_inputs,
            headers=DEFAULT_HEADERS,
            parameters=None,
            payload=payload,
            max_batch_size=1,
            image_placement=ImagePlacement.JSON,
        )
        responses = execute_requests_packages(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )
        results = [r.json() for r in responses]
        return unwrap_single_element_list(sequence=results)
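
    # Usage sketch: the image path is a placeholder and the version is one of
    # the trocr options listed in the docstring above; the shape of the
    # returned dict depends on the OCR model selected.
    #
    #     result = client.ocr_image(
    #         "invoice.jpg", model="trocr", version="trocr-base-printed"
    #     )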

    @wrap_errors_async
    async def ocr_image_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model: str = "doctr",
        version: Optional[str] = None,
        quantize: Optional[bool] = None,
        generate_bounding_boxes: Optional[bool] = None,
        language_codes: Optional[List[str]] = None,
    ) -> Union[dict, List[dict]]:
        """Run OCR on input image(s) asynchronously.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for OCR.
            model (str, optional): OCR model to use ('doctr' or 'trocr'). Defaults to "doctr".
            version (Optional[str], optional): Model version to use. Defaults to None.
                For trocr, supported versions are: 'trocr-small-printed', 'trocr-base-printed', 'trocr-large-printed'.
            quantize (Optional[bool], optional): EasyOCR flag deciding which variant of the model to load.
                Defaults to None.
            generate_bounding_boxes (Optional[bool], optional): Flag of some models (like DocTR) deciding whether
                an output variant with sv.Detections(...)-compatible bounding boxes should be returned (for
                historical reasons, some older implementations flattened the detected OCR structure into text and
                returned only that). Defaults to None.
            language_codes (Optional[List[str]], optional): EasyOCR parameter dictating the codes of the languages
                the model should recognise (leave blank for the default of the given OCR model version).
                Defaults to None.

        Returns:
            Union[dict, List[dict]]: OCR results for the input image(s).

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        encoded_inference_inputs = await load_static_inference_input_async(
            inference_input=inference_input,
        )
        payload = self.__initialise_payload()
        if version:
            key = f"{model.lower()}_version_id"
            payload[key] = version
        if quantize is not None:
            payload["quantize"] = quantize
        if generate_bounding_boxes is not None:
            payload["generate_bounding_boxes"] = generate_bounding_boxes
        if language_codes is not None:
            payload["language_codes"] = language_codes
        model_path = resolve_ocr_path(model_name=model)
        url = self.__wrap_url_with_api_key(f"{self.__api_url}{model_path}")
        requests_data = prepare_requests_data(
            url=url,
            encoded_inference_inputs=encoded_inference_inputs,
            headers=DEFAULT_HEADERS,
            parameters=None,
            payload=payload,
            max_batch_size=1,
            image_placement=ImagePlacement.JSON,
        )
        responses = await execute_requests_packages_async(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )
        return unwrap_single_element_list(sequence=responses)

    @wrap_errors
    def detect_gazes(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
    ) -> Union[dict, List[dict]]:
        """Detect gazes in input image(s).

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for gaze detection.

        Returns:
            Union[dict, List[dict]]: Gaze detection results for the input image(s).

        Raises:
            WrongClientModeError: If not in API v1 mode.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        self.__ensure_v1_client_mode()  # Lambda does not support Gaze, so we require v1 mode of client
        result = self._post_images(
            inference_input=inference_input, endpoint="/gaze/gaze_detection"
        )
        return combine_gaze_detections(detections=result)
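
    # Usage sketch (placeholder image path): requires a self-hosted server,
    # since gaze detection is only available in v1 client mode.
    #
    #     gazes = client.detect_gazes("face.jpg")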

    @wrap_errors_async
    async def detect_gazes_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
    ) -> Union[dict, List[dict]]:
        """Detect gazes in input image(s) asynchronously.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for gaze detection.

        Returns:
            Union[dict, List[dict]]: Gaze detection results for the input image(s).

        Raises:
            WrongClientModeError: If not in API v1 mode.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        self.__ensure_v1_client_mode()  # Lambda does not support Gaze, so we require v1 mode of client
        result = await self._post_images_async(
            inference_input=inference_input, endpoint="/gaze/gaze_detection"
        )
        return combine_gaze_detections(detections=result)

    @wrap_errors
    def get_clip_image_embeddings(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        clip_version: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Get CLIP embeddings for input image(s).

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) to embed.
            clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

        Returns:
            Union[dict, List[dict]]: CLIP embeddings for the input image(s).

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        extra_payload = {}
        if clip_version is not None:
            extra_payload["clip_version_id"] = clip_version
        result = self._post_images(
            inference_input=inference_input,
            endpoint="/clip/embed_image",
            extra_payload=extra_payload,
        )
        result = combine_clip_embeddings(embeddings=result)
        return unwrap_single_element_list(result)
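
    # Usage sketch: a single image reference yields a single dict, a list of
    # references yields a list of dicts (placeholder path below).
    #
    #     embedding = client.get_clip_image_embeddings("query.jpg")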

    @wrap_errors_async
    async def get_clip_image_embeddings_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        clip_version: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Get CLIP embeddings for input image(s) asynchronously.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) to embed.
            clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

        Returns:
            Union[dict, List[dict]]: CLIP embeddings for the input image(s).

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        extra_payload = {}
        if clip_version is not None:
            extra_payload["clip_version_id"] = clip_version
        result = await self._post_images_async(
            inference_input=inference_input,
            endpoint="/clip/embed_image",
            extra_payload=extra_payload,
        )
        result = combine_clip_embeddings(embeddings=result)
        return unwrap_single_element_list(result)

    @wrap_errors
    def get_clip_text_embeddings(
        self,
        text: Union[str, List[str]],
        clip_version: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Get CLIP embeddings for input text(s).

        Args:
            text (Union[str, List[str]]): Input text(s) to embed.
            clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

        Returns:
            Union[dict, List[dict]]: CLIP embeddings for the input text(s).

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        payload = self.__initialise_payload()
        payload["text"] = text
        if clip_version is not None:
            payload["clip_version_id"] = clip_version
        headers = DEFAULT_HEADERS.copy()
        execution_id_value = execution_id.get()
        if execution_id_value is not None:
            headers[EXECUTION_ID_HEADER] = execution_id_value

        response = requests.post(
            self.__wrap_url_with_api_key(f"{self.__api_url}/clip/embed_text"),
            json=payload,
            headers=headers,
        )
        _collect_processing_time_from_response(
            response, model_id=clip_version or "clip"
        )
        api_key_safe_raise_for_status(response=response)
        return unwrap_single_element_list(sequence=response.json())
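
    # Usage sketch: `text` may be a single string or a list of strings; the
    # version id below is a placeholder.
    #
    #     embeddings = client.get_clip_text_embeddings(
    #         ["a cat", "a dog"], clip_version="ViT-B-16"
    #     )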

    @wrap_errors_async
    async def get_clip_text_embeddings_async(
        self,
        text: Union[str, List[str]],
        clip_version: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Get CLIP embeddings for input text(s) asynchronously.

        Args:
            text (Union[str, List[str]]): Input text(s) to embed.
            clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

        Returns:
            Union[dict, List[dict]]: CLIP embeddings for the input text(s).

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        payload = self.__initialise_payload()
        payload["text"] = text
        if clip_version is not None:
            payload["clip_version_id"] = clip_version
        async with aiohttp.ClientSession() as session:
            async with session.post(
                self.__wrap_url_with_api_key(f"{self.__api_url}/clip/embed_text"),
                json=payload,
                headers=DEFAULT_HEADERS,
            ) as response:
                response.raise_for_status()
                response_payload = await response.json()
        return unwrap_single_element_list(sequence=response_payload)

    @wrap_errors
    def clip_compare(
        self,
        subject: Union[str, ImagesReference],
        prompt: Union[str, List[str], ImagesReference, List[ImagesReference]],
        subject_type: str = "image",
        prompt_type: str = "text",
        clip_version: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Compare a subject against prompts using CLIP embeddings.

        Args:
            subject (Union[str, ImagesReference]): The subject to compare (image or text).
            prompt (Union[str, List[str], ImagesReference, List[ImagesReference]]): The prompt(s) to compare against.
            subject_type (str, optional): Type of subject ('image' or 'text'). Defaults to "image".
            prompt_type (str, optional): Type of prompt(s) ('image' or 'text'). Defaults to "text".
            clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

        Returns:
            Union[dict, List[dict]]: Comparison results between subject and prompt(s).

        Raises:
            InvalidParameterError: If subject_type or prompt_type is invalid.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        if (
            subject_type not in CLIP_ARGUMENT_TYPES
            or prompt_type not in CLIP_ARGUMENT_TYPES
        ):
            raise InvalidParameterError(
                f"Could not accept `subject_type` and `prompt_type` with values different than {CLIP_ARGUMENT_TYPES}"
            )
        payload = self.__initialise_payload()
        payload["subject_type"] = subject_type
        payload["prompt_type"] = prompt_type
        if clip_version is not None:
            payload["clip_version_id"] = clip_version
        if subject_type == "image":
            encoded_image = load_static_inference_input(
                inference_input=subject,
            )
            payload = inject_images_into_payload(
                payload=payload, encoded_images=encoded_image, key="subject"
            )
        else:
            payload["subject"] = subject
        if prompt_type == "image":
            encoded_inference_inputs = load_static_inference_input(
                inference_input=prompt,
            )
            payload = inject_images_into_payload(
                payload=payload, encoded_images=encoded_inference_inputs, key="prompt"
            )
        else:
            payload["prompt"] = prompt

        headers = DEFAULT_HEADERS.copy()
        execution_id_value = execution_id.get()
        if execution_id_value is not None:
            headers[EXECUTION_ID_HEADER] = execution_id_value

        response = requests.post(
            self.__wrap_url_with_api_key(f"{self.__api_url}/clip/compare"),
            json=payload,
            headers=headers,
        )
        _collect_processing_time_from_response(
            response, model_id=clip_version or "clip"
        )
        api_key_safe_raise_for_status(response=response)
        return response.json()
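
    # Usage sketch: compare an image subject against text prompts (all values
    # below are placeholders).
    #
    #     similarity = client.clip_compare(
    #         subject="scene.jpg",
    #         prompt=["a rainy street", "a sunny beach"],
    #         subject_type="image",
    #         prompt_type="text",
    #     )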

    @wrap_errors_async
    async def clip_compare_async(
        self,
        subject: Union[str, ImagesReference],
        prompt: Union[str, List[str], ImagesReference, List[ImagesReference]],
        subject_type: str = "image",
        prompt_type: str = "text",
        clip_version: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Compare a subject against prompts using CLIP embeddings asynchronously.

        Args:
            subject (Union[str, ImagesReference]): The subject to compare (image or text).
            prompt (Union[str, List[str], ImagesReference, List[ImagesReference]]): The prompt(s) to compare against.
            subject_type (str, optional): Type of subject ('image' or 'text'). Defaults to "image".
            prompt_type (str, optional): Type of prompt(s) ('image' or 'text'). Defaults to "text".
            clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

        Returns:
            Union[dict, List[dict]]: Comparison results between subject and prompt(s).

        Raises:
            InvalidParameterError: If subject_type or prompt_type is invalid.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        if (
            subject_type not in CLIP_ARGUMENT_TYPES
            or prompt_type not in CLIP_ARGUMENT_TYPES
        ):
            raise InvalidParameterError(
                f"Could not accept `subject_type` and `prompt_type` with values different than {CLIP_ARGUMENT_TYPES}"
            )
        payload = self.__initialise_payload()
        payload["subject_type"] = subject_type
        payload["prompt_type"] = prompt_type
        if clip_version is not None:
            payload["clip_version_id"] = clip_version
        if subject_type == "image":
            encoded_image = await load_static_inference_input_async(
                inference_input=subject,
            )
            payload = inject_images_into_payload(
                payload=payload, encoded_images=encoded_image, key="subject"
            )
        else:
            payload["subject"] = subject
        if prompt_type == "image":
            encoded_inference_inputs = await load_static_inference_input_async(
                inference_input=prompt,
            )
            payload = inject_images_into_payload(
                payload=payload, encoded_images=encoded_inference_inputs, key="prompt"
            )
        else:
            payload["prompt"] = prompt

        async with aiohttp.ClientSession() as session:
            async with session.post(
                self.__wrap_url_with_api_key(f"{self.__api_url}/clip/compare"),
                json=payload,
                headers=DEFAULT_HEADERS,
            ) as response:
                response.raise_for_status()
                return await response.json()

    @wrap_errors
    def get_perception_encoder_image_embeddings(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        perception_encoder_version: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Get Perception Encoder embeddings for input image(s)."""
        extra_payload = {}
        if perception_encoder_version is not None:
            extra_payload["perception_encoder_version_id"] = perception_encoder_version
        result = self._post_images(
            inference_input=inference_input,
            endpoint="/perception_encoder/embed_image",
            extra_payload=extra_payload,
        )
        return unwrap_single_element_list(result)

    @wrap_errors
    def get_perception_encoder_text_embeddings(
        self,
        text: Union[str, List[str]],
        perception_encoder_version: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Get Perception Encoder embeddings for input text(s)."""
        payload = self.__initialise_payload()
        payload["text"] = text
        if perception_encoder_version is not None:
            payload["perception_encoder_version_id"] = perception_encoder_version

        headers = DEFAULT_HEADERS.copy()
        execution_id_value = execution_id.get()
        if execution_id_value is not None:
            headers[EXECUTION_ID_HEADER] = execution_id_value

        response = requests.post(
            self.__wrap_url_with_api_key(
                f"{self.__api_url}/perception_encoder/embed_text"
            ),
            json=payload,
            headers=headers,
        )
        _collect_processing_time_from_response(
            response,
            model_id=perception_encoder_version or "perception_encoder",
        )
        api_key_safe_raise_for_status(response=response)
        return unwrap_single_element_list(sequence=response.json())
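
    # Usage sketch: the Perception Encoder methods mirror their CLIP
    # counterparts (placeholder inputs below).
    #
    #     image_embedding = client.get_perception_encoder_image_embeddings("query.jpg")
    #     text_embedding = client.get_perception_encoder_text_embeddings("a cat")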

    @wrap_errors
    def infer_lmm(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: str,
        prompt: Optional[str] = None,
        model_id_in_path: bool = False,
    ) -> Union[dict, List[dict]]:
        """Run inference using a Large Multimodal Model (LMM).

        This method supports various vision-language models including Florence-2,
        Moondream2, SmolVLM, Qwen2.5-VL, Qwen3-VL, and PaliGemma.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
                for inference. Can be file paths, URLs, base64 strings, numpy arrays, or PIL images.
            model_id (str): The identifier of the LMM model to use. Examples include:
                - "florence-2-base", "florence-2-large" for Florence-2
                - "moondream2/moondream2_2b_jul24" for Moondream2
                - "smolvlm2/smolvlm-2.2b-instruct" for SmolVLM
                - "qwen25-vl-7b" for Qwen2.5-VL
                - "qwen3vl-2b-instruct" for Qwen3-VL
            prompt (Optional[str], optional): Text prompt to guide the model. Defaults to None.
            model_id_in_path (bool, optional): If True, includes model_id in the URL path
                (e.g., /infer/lmm/florence-2-base) which enables path-based routing.
                If False (default), model_id is only sent in the request body.

        Returns:
            Union[dict, List[dict]]: Inference results containing the model response.
                The structure depends on the specific model used.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        extra_payload = {"model_id": model_id}
        if prompt is not None:
            extra_payload["prompt"] = prompt

        if model_id_in_path:
            endpoint = f"/infer/lmm/{model_id}"
        else:
            endpoint = "/infer/lmm"

        result = self._post_images(
            inference_input=inference_input,
            endpoint=endpoint,
            extra_payload=extra_payload,
        )
        return result
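
    # Usage sketch: captioning with Florence-2, using a model id listed in the
    # docstring above; the image path and the task-token prompt are
    # placeholders and model-specific.
    #
    #     result = client.infer_lmm(
    #         inference_input="street.jpg",
    #         model_id="florence-2-base",
    #         prompt="<CAPTION>",
    #     )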

    @wrap_errors_async
    async def infer_lmm_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: str,
        prompt: Optional[str] = None,
        model_id_in_path: bool = False,
    ) -> Union[dict, List[dict]]:
        """Run inference using a Large Multimodal Model (LMM) asynchronously.

        This method supports various vision-language models including Florence-2,
        Moondream2, SmolVLM, Qwen2.5-VL, Qwen3-VL, and PaliGemma.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
                for inference. Can be file paths, URLs, base64 strings, numpy arrays, or PIL images.
            model_id (str): The identifier of the LMM model to use.
            prompt (Optional[str], optional): Text prompt to guide the model. Defaults to None.
            model_id_in_path (bool, optional): If True, includes model_id in the URL path
                (e.g., /infer/lmm/florence-2-base) which enables path-based routing.
                If False (default), model_id is only sent in the request body.

        Returns:
            Union[dict, List[dict]]: Inference results containing the model response.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        extra_payload = {"model_id": model_id}
        if prompt is not None:
            extra_payload["prompt"] = prompt

        if model_id_in_path:
            endpoint = f"/infer/lmm/{model_id}"
        else:
            endpoint = "/infer/lmm"

        result = await self._post_images_async(
            inference_input=inference_input,
            endpoint=endpoint,
            extra_payload=extra_payload,
        )
        return result

    @wrap_errors
    def depth_estimation(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: str = "depth-anything-v3/small",
    ) -> Union[dict, List[dict]]:
        """Run depth estimation on input image(s).

        This method estimates depth maps from images using models like Depth Anything.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
                for depth estimation. Can be file paths, URLs, base64 strings, numpy arrays,
                or PIL images.
            model_id (str, optional): The depth estimation model to use. Defaults to
                "depth-anything-v3/small". Supported models include:
                - "depth-anything-v2/small"
                - "depth-anything-v3/small"
                - "depth-anything-v3/base"

        Returns:
            Union[dict, List[dict]]: Depth estimation results containing:
                - normalized_depth: The normalized depth map as a list
                - image: Hex-encoded visualization of the depth map

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        extra_payload = {"model_id": model_id}
        result = self._post_images(
            inference_input=inference_input,
            endpoint="/infer/depth-estimation",
            extra_payload=extra_payload,
        )
        return result
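
    # Usage sketch: omitting model_id falls back to the default listed in the
    # docstring above (placeholder image path).
    #
    #     depth = client.depth_estimation("room.jpg")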

    @wrap_errors_async
    async def depth_estimation_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        model_id: str = "depth-anything-v3/small",
    ) -> Union[dict, List[dict]]:
        """Run depth estimation on input image(s) asynchronously.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
                for depth estimation.
            model_id (str, optional): The depth estimation model to use. Defaults to
                "depth-anything-v3/small".

        Returns:
            Union[dict, List[dict]]: Depth estimation results.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        extra_payload = {"model_id": model_id}
        result = await self._post_images_async(
            inference_input=inference_input,
            endpoint="/infer/depth-estimation",
            extra_payload=extra_payload,
        )
        return result

    @wrap_errors
    def sam2_segment_image(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        prompts: Optional[List[dict]] = None,
        sam2_version_id: str = "hiera_tiny",
        multimask_output: bool = True,
        mask_input_format: str = "json",
    ) -> Union[dict, List[dict]]:
        """Run Segment Anything 2 (SAM2) segmentation on input image(s).

        This method performs instance segmentation using SAM2, which can segment
        objects based on point or box prompts.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
                for segmentation. Can be file paths, URLs, base64 strings, numpy arrays,
                or PIL images.
            prompts (Optional[List[dict]], optional): List of prompt dictionaries. Each prompt
                can contain:
                - "box": {"x": float, "y": float, "width": float, "height": float}
                - "points": [{"x": float, "y": float, "positive": bool}, ...]
                Defaults to None (automatic segmentation).
            sam2_version_id (str, optional): Version of SAM2 model to use. Options are
                "hiera_large", "hiera_small", "hiera_tiny", "hiera_b_plus".
                Defaults to "hiera_tiny".
            multimask_output (bool, optional): Whether to output multiple masks per prompt.
                Defaults to True.
            mask_input_format (str, optional): Format for mask output. Defaults to "json".

        Returns:
            Union[dict, List[dict]]: Segmentation results containing predictions with masks,
                confidence scores, and bounding boxes.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        extra_payload = {
            "sam2_version_id": sam2_version_id,
            "multimask_output": multimask_output,
            "format": mask_input_format,
        }
        if prompts is not None:
            extra_payload["prompts"] = {"prompts": prompts}
        result = self._post_images(
            inference_input=inference_input,
            endpoint="/sam2/segment_image",
            extra_payload=extra_payload,
        )
        return result
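
    # Usage sketch: box-prompted segmentation, following the prompt schema from
    # the docstring above (placeholder coordinates and path).
    #
    #     masks = client.sam2_segment_image(
    #         "street.jpg",
    #         prompts=[{"box": {"x": 320.0, "y": 240.0, "width": 100.0, "height": 80.0}}],
    #     )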

    @wrap_errors_async
    async def sam2_segment_image_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        prompts: Optional[List[dict]] = None,
        sam2_version_id: str = "hiera_tiny",
        multimask_output: bool = True,
        mask_input_format: str = "json",
    ) -> Union[dict, List[dict]]:
        """Run Segment Anything 2 (SAM2) segmentation on input image(s) asynchronously.

        Args:
            inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
                for segmentation.
            prompts (Optional[List[dict]], optional): List of prompt dictionaries.
                Defaults to None.
            sam2_version_id (str, optional): Version of SAM2 model. Defaults to "hiera_tiny".
            multimask_output (bool, optional): Whether to output multiple masks. Defaults to True.
            mask_input_format (str, optional): Format for mask output. Defaults to "json".

        Returns:
            Union[dict, List[dict]]: Segmentation results.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        extra_payload = {
            "sam2_version_id": sam2_version_id,
            "multimask_output": multimask_output,
            "format": mask_input_format,
        }
        if prompts is not None:
            extra_payload["prompts"] = {"prompts": prompts}
        result = await self._post_images_async(
            inference_input=inference_input,
            endpoint="/sam2/segment_image",
            extra_payload=extra_payload,
        )
        return result

    @wrap_errors
    def sam3_3d_infer(
        self,
        inference_input: ImagesReference,
        mask_input: Any,
        model_id: str = "sam3-3d-objects",
        *,
        output_meshes: bool = True,
        output_scene: bool = True,
        with_mesh_postprocess: bool = True,
        with_texture_baking: bool = True,
        use_distillations: bool = False,
    ) -> dict:
        """Generate 3D meshes and Gaussian splatting from a 2D image with mask prompts.

        This method uses SAM3 3D to generate 3D representations from 2D images
        with mask prompts.

        Args:
            inference_input (ImagesReference): Input image for 3D generation.
                Can be a file path, URL, base64 string, numpy array, or PIL image.
            mask_input (Any): Mask input in any supported format:
                - Polygon coordinates: [x1, y1, x2, y2, ...]
                - Binary mask (as numpy array or base64)
                - RLE dictionary
                - List of any of the above for multiple masks
            model_id (str, optional): The SAM3 3D model to use. Defaults to "sam3-3d-objects".
            output_meshes (bool, optional): Whether to additionally output object meshes; SAM3 3D always
                outputs object Gaussians. Defaults to True.
            output_scene (bool, optional): Output the combined scene reconstruction in
                addition to individual object reconstructions. Defaults to True.
            with_mesh_postprocess (bool, optional): Enable mesh postprocessing. Defaults to True.
            with_texture_baking (bool, optional): Enable texture baking for meshes. Defaults to True.
            use_distillations (bool, optional): Use the distilled versions of the model components.
                Defaults to False.

        Returns:
            dict: Response containing base64-encoded 3D outputs:
                - mesh_glb: Scene mesh in GLB format (base64 encoded) if output_meshes=True, otherwise None.
                - gaussian_ply: Combined Gaussian splatting in PLY format (base64 encoded)
                - objects: List of individual objects, each containing:
                    - mesh_glb: Object mesh (base64) if output_scene=True and output_meshes=True, otherwise None.
                    - gaussian_ply: Object Gaussian (base64) if output_scene=True, otherwise None.
                    - metadata: {"rotation": [...], "translation": [...], "scale": [...]}
                - time: Inference time in seconds

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        encoded_inference_inputs = load_static_inference_input(
            inference_input=inference_input,
        )
        payload = self.__initialise_payload()
        payload["model_id"] = model_id
        payload["mask_input"] = mask_input
        payload["output_meshes"] = output_meshes
        payload["output_scene"] = output_scene
        payload["with_mesh_postprocess"] = with_mesh_postprocess
        payload["with_texture_baking"] = with_texture_baking
        payload["use_distillations"] = use_distillations

        url = self.__wrap_url_with_api_key(f"{self.__api_url}/sam3_3d/infer")
        requests_data = prepare_requests_data(
            url=url,
            encoded_inference_inputs=encoded_inference_inputs,
            headers=DEFAULT_HEADERS,
            parameters=None,
            payload=payload,
            max_batch_size=1,
            image_placement=ImagePlacement.JSON,
        )
        responses = execute_requests_packages(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )
        return responses[0].json()
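
    # Usage sketch: a single polygon mask prompt in the [x1, y1, x2, y2, ...]
    # format described above (placeholder coordinates and path).
    #
    #     scene = client.sam3_3d_infer(
    #         inference_input="chair.jpg",
    #         mask_input=[120, 80, 400, 80, 400, 360, 120, 360],
    #     )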

    @wrap_errors_async
    async def sam3_3d_infer_async(
        self,
        inference_input: ImagesReference,
        mask_input: Any,
        model_id: str = "sam3-3d-objects",
        *,
        output_meshes: bool = True,
        output_scene: bool = True,
        with_mesh_postprocess: bool = True,
        with_texture_baking: bool = True,
        use_distillations: bool = False,
    ) -> dict:
        """Generate 3D meshes and Gaussian splatting from a 2D image asynchronously.

        Args:
            inference_input (ImagesReference): Input image for 3D generation.
            mask_input (Any): Mask input in any supported format.
            model_id (str, optional): The SAM3 3D model to use. Defaults to "sam3-3d-objects".
            output_meshes (bool, optional): Whether to additionally output object meshes; SAM3 3D always
                outputs object Gaussians. Defaults to True.
            output_scene (bool, optional): Output the combined scene reconstruction in
                addition to individual object reconstructions. Defaults to True.
            with_mesh_postprocess (bool, optional): Enable mesh postprocessing. Defaults to True.
            with_texture_baking (bool, optional): Enable texture baking for meshes. Defaults to True.
            use_distillations (bool, optional): Use the distilled versions of the model components.
                Defaults to False.

        Returns:
            dict: Response containing base64-encoded 3D outputs:
                - mesh_glb: Scene mesh in GLB format (base64 encoded) if output_meshes=True, otherwise None.
                - gaussian_ply: Combined Gaussian splatting in PLY format (base64 encoded)
                - objects: List of individual objects, each containing:
                    - mesh_glb: Object mesh (base64) if output_scene=True and output_meshes=True, otherwise None.
                    - gaussian_ply: Object Gaussian (base64) if output_scene=True, otherwise None.
                    - metadata: {"rotation": [...], "translation": [...], "scale": [...]}
                - time: Inference time in seconds

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        encoded_inference_inputs = await load_static_inference_input_async(
            inference_input=inference_input,
        )
        payload = self.__initialise_payload()
        payload["model_id"] = model_id
        payload["mask_input"] = mask_input
        payload["output_meshes"] = output_meshes
        payload["output_scene"] = output_scene
        payload["with_mesh_postprocess"] = with_mesh_postprocess
        payload["with_texture_baking"] = with_texture_baking
        payload["use_distillations"] = use_distillations

        url = self.__wrap_url_with_api_key(f"{self.__api_url}/sam3_3d/infer")
        requests_data = prepare_requests_data(
            url=url,
            encoded_inference_inputs=encoded_inference_inputs,
            headers=DEFAULT_HEADERS,
            parameters=None,
            payload=payload,
            max_batch_size=1,
            image_placement=ImagePlacement.JSON,
        )
        responses = await execute_requests_packages_async(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )
        return responses[0]

    @wrap_errors
    def sam3_concept_segment(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        prompts: List[dict],
        model_id: str = "sam3/sam3_final",
        output_prob_thresh: float = 0.5,
        nms_iou_threshold: Optional[float] = None,
        format: str = "polygon",
    ) -> Union[dict, List[dict]]:
        """Run SAM3 promptable concept segmentation (PCS) on input image(s).

        Performs zero-shot instance segmentation using text or visual prompts.

        Args:
            inference_input: Input image(s) for segmentation.
            prompts: List of prompt dicts, each with keys like "type", "text",
                "output_prob_thresh", "boxes", "box_labels".
            model_id: SAM3 model to use. Defaults to "sam3/sam3_final".
            output_prob_thresh: Global confidence threshold. Defaults to 0.5.
            nms_iou_threshold: IoU threshold for cross-prompt NMS. None disables NMS.
            format: Output mask format, "polygon" or "rle". Defaults to "polygon".

        Returns:
            Segmentation results with prompt_results containing predictions.
        """
        extra_payload = {
            "model_id": model_id,
            "prompts": prompts,
            "output_prob_thresh": output_prob_thresh,
            "format": format,
        }
        if nms_iou_threshold is not None:
            extra_payload["nms_iou_threshold"] = nms_iou_threshold
        return self._post_images(
            inference_input=inference_input,
            endpoint="/sam3/concept_segment",
            extra_payload=extra_payload,
        )
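
    # Usage sketch: zero-shot segmentation from a text prompt; prompt keys
    # beyond "type"/"text" are optional per the docstring (placeholder values).
    #
    #     result = client.sam3_concept_segment(
    #         "street.jpg",
    #         prompts=[{"type": "text", "text": "car"}],
    #     )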

    @wrap_errors_async
    async def sam3_concept_segment_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        prompts: List[dict],
        model_id: str = "sam3/sam3_final",
        output_prob_thresh: float = 0.5,
        nms_iou_threshold: Optional[float] = None,
        format: str = "polygon",
    ) -> Union[dict, List[dict]]:
        """Run SAM3 promptable concept segmentation (PCS) asynchronously.

        Args:
            inference_input: Input image(s) for segmentation.
            prompts: List of prompt dicts.
            model_id: SAM3 model to use. Defaults to "sam3/sam3_final".
            output_prob_thresh: Global confidence threshold. Defaults to 0.5.
            nms_iou_threshold: IoU threshold for cross-prompt NMS. None disables NMS.
            format: Output mask format, "polygon" or "rle". Defaults to "polygon".

        Returns:
            Segmentation results with prompt_results containing predictions.
        """
        extra_payload = {
            "model_id": model_id,
            "prompts": prompts,
            "output_prob_thresh": output_prob_thresh,
            "format": format,
        }
        if nms_iou_threshold is not None:
            extra_payload["nms_iou_threshold"] = nms_iou_threshold
        return await self._post_images_async(
            inference_input=inference_input,
            endpoint="/sam3/concept_segment",
            extra_payload=extra_payload,
        )

    @wrap_errors
    def sam3_visual_segment(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        prompts: Optional[List[dict]] = None,
        multimask_output: bool = True,
        mask_input_format: str = "json",
    ) -> Union[dict, List[dict]]:
        """Run SAM3 promptable visual segmentation (PVS) on input image(s).

        Performs instance segmentation using point or box prompts.

        Args:
            inference_input: Input image(s) for segmentation.
            prompts: List of prompt dicts with "box" and/or "points" keys.
                Defaults to None (automatic segmentation).
            multimask_output: Whether to output multiple masks per prompt.
                Defaults to True.
            mask_input_format: Format for mask output. Defaults to "json".

        Returns:
            Segmentation results containing predictions with masks.
        """
        extra_payload = {
            "multimask_output": multimask_output,
            "format": mask_input_format,
        }
        if prompts is not None:
            extra_payload["prompts"] = {"prompts": prompts}
        return self._post_images(
            inference_input=inference_input,
            endpoint="/sam3/visual_segment",
            extra_payload=extra_payload,
        )
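
    # Usage sketch: point-prompted segmentation (placeholder coordinates, using
    # the same prompt schema as sam2_segment_image).
    #
    #     result = client.sam3_visual_segment(
    #         "street.jpg",
    #         prompts=[{"points": [{"x": 100.0, "y": 100.0, "positive": True}]}],
    #     )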

    @wrap_errors_async
    async def sam3_visual_segment_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        prompts: Optional[List[dict]] = None,
        multimask_output: bool = True,
        mask_input_format: str = "json",
    ) -> Union[dict, List[dict]]:
        """Run SAM3 promptable visual segmentation (PVS) asynchronously.

        Args:
            inference_input: Input image(s) for segmentation.
            prompts: List of prompt dicts. Defaults to None.
            multimask_output: Whether to output multiple masks. Defaults to True.
            mask_input_format: Format for mask output. Defaults to "json".

        Returns:
            Segmentation results containing predictions with masks.
        """
        extra_payload = {
            "multimask_output": multimask_output,
            "format": mask_input_format,
        }
        if prompts is not None:
            extra_payload["prompts"] = {"prompts": prompts}
        return await self._post_images_async(
            inference_input=inference_input,
            endpoint="/sam3/visual_segment",
            extra_payload=extra_payload,
        )

    @wrap_errors
    def sam3_embed_image(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        image_id: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Generate SAM3 image embeddings.

        Args:
            inference_input: Input image(s) to embed.
            image_id: Optional cache ID for embeddings. Defaults to None.

        Returns:
            Embedding results with image_id and processing time.
        """
        extra_payload = {}
        if image_id is not None:
            extra_payload["image_id"] = image_id
        return self._post_images(
            inference_input=inference_input,
            endpoint="/sam3/embed_image",
            extra_payload=extra_payload if extra_payload else None,
        )

    @wrap_errors_async
    async def sam3_embed_image_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        image_id: Optional[str] = None,
    ) -> Union[dict, List[dict]]:
        """Generate SAM3 image embeddings asynchronously.

        Args:
            inference_input: Input image(s) to embed.
            image_id: Optional cache ID for embeddings. Defaults to None.

        Returns:
            Embedding results with image_id and processing time.
        """
        extra_payload = {}
        if image_id is not None:
            extra_payload["image_id"] = image_id
        return await self._post_images_async(
            inference_input=inference_input,
            endpoint="/sam3/embed_image",
            extra_payload=extra_payload if extra_payload else None,
        )

    @deprecated(
        reason="Please use the run_workflow(...) method. This method will be removed at the end of Q2 2024"
    )
    @wrap_errors
    def infer_from_workflow(
        self,
        workspace_name: Optional[str] = None,
        workflow_name: Optional[str] = None,
        specification: Optional[dict] = None,
        images: Optional[Dict[str, Any]] = None,
        parameters: Optional[Dict[str, Any]] = None,
        excluded_fields: Optional[List[str]] = None,
        use_cache: bool = True,
        enable_profiling: bool = False,
        workflow_version_id: Optional[str] = None,
    ) -> List[Dict[str, Any]]:
        """Run inference using a workflow specification.

        Triggers execution of a workflow on the inference HTTP server. Either
        (`workspace_name` and `workflow_name`) or `specification` must be
        provided: in the first case the workflow definition is fetched from the
        Roboflow API, in the latter the given `specification` is used directly.
        `images` and `parameters` are merged into the workflow inputs; the
        distinction exists so that the SDK can easily serialise images and
        prepare a proper payload. Supported image types are numpy arrays,
        PIL.Image objects, base64-encoded images, image URLs and local paths.
        `excluded_fields` is added to the request to filter out results of
        workflow execution on the server side.

        Args:
            workspace_name (Optional[str], optional): Name of the workspace containing the workflow. Defaults to None.
            workflow_name (Optional[str], optional): Name of the workflow. Defaults to None.
            specification (Optional[dict], optional): Direct workflow specification. Defaults to None.
            images (Optional[Dict[str, Any]], optional): Images to process. Defaults to None.
            parameters (Optional[Dict[str, Any]], optional): Additional parameters for the workflow. Defaults to None.
            excluded_fields (Optional[List[str]], optional): Fields to exclude from results. Defaults to None.
            use_cache (bool, optional): Whether to use cached results. Defaults to True.
            enable_profiling (bool, optional): Whether to enable profiling. Defaults to False.
            workflow_version_id (Optional[str], optional): Version of the workflow to run. Defaults to None.

        Returns:
            List[Dict[str, Any]]: Results of the workflow execution.

        Raises:
            InvalidParameterError: If neither workflow identifiers nor specification is provided.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        return self._run_workflow(
            workspace_name=workspace_name,
            workflow_id=workflow_name,
            specification=specification,
            images=images,
            parameters=parameters,
            excluded_fields=excluded_fields,
            legacy_endpoints=True,
            use_cache=use_cache,
            enable_profiling=enable_profiling,
            workflow_version_id=workflow_version_id,
        )

    @wrap_errors
    def run_workflow(
        self,
        workspace_name: Optional[str] = None,
        workflow_id: Optional[str] = None,
        specification: Optional[dict] = None,
        images: Optional[Dict[str, Any]] = None,
        parameters: Optional[Dict[str, Any]] = None,
        excluded_fields: Optional[List[str]] = None,
        use_cache: bool = True,
        enable_profiling: bool = False,
        workflow_version_id: Optional[str] = None,
    ) -> List[Dict[str, Any]]:
        """Run inference using a workflow specification.

        Triggers execution of a workflow on the inference HTTP server. Either
        (`workspace_name` and `workflow_id`) or `specification` must be
        provided: in the first case the workflow definition is fetched from the
        Roboflow API, in the latter the given `specification` is used directly.
        `images` and `parameters` are merged into the workflow inputs; the
        distinction exists so that the SDK can easily serialise images and
        prepare a proper payload. Supported image types are numpy arrays,
        PIL.Image objects, base64-encoded images, image URLs and local paths.
        `excluded_fields` is added to the request to filter out results of
        workflow execution on the server side.

        Note:
            This method is not compatible with inference server <=0.9.18. Please migrate to a newer
            version of the server before the end of Q2 2024; until then, use the old method:
            infer_from_workflow(...).

        Args:
            workspace_name (Optional[str], optional): Name of the workspace containing the workflow. Defaults to None.
            workflow_id (Optional[str], optional): ID of the workflow. Defaults to None.
            specification (Optional[dict], optional): Direct workflow specification. Defaults to None.
            images (Optional[Dict[str, Any]], optional): Images to process. Defaults to None.
            parameters (Optional[Dict[str, Any]], optional): Additional parameters for the workflow. Defaults to None.
            excluded_fields (Optional[List[str]], optional): Fields to exclude from results. Defaults to None.
            use_cache (bool, optional): Whether to use cached results. Defaults to True.
            enable_profiling (bool, optional): Whether to enable profiling. Defaults to False.
            workflow_version_id (Optional[str], optional): Version of the workflow to run. Defaults to None.

        Returns:
            List[Dict[str, Any]]: Results of the workflow execution.

        Raises:
            InvalidParameterError: If neither workflow identifiers nor specification is provided.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        return self._run_workflow(
            workspace_name=workspace_name,
            workflow_id=workflow_id,
            specification=specification,
            images=images,
            parameters=parameters,
            excluded_fields=excluded_fields,
            legacy_endpoints=False,
            use_cache=use_cache,
            enable_profiling=enable_profiling,
            workflow_version_id=workflow_version_id,
        )
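
    # Usage sketch: running a named workflow with one image input; the
    # workspace, workflow id and input names are placeholders.
    #
    #     results = client.run_workflow(
    #         workspace_name="my-workspace",
    #         workflow_id="my-workflow",
    #         images={"image": "street.jpg"},
    #         parameters={"confidence": 0.5},
    #     )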

    def _run_workflow(
        self,
        workspace_name: Optional[str] = None,
        workflow_id: Optional[str] = None,
        specification: Optional[dict] = None,
        images: Optional[Dict[str, Any]] = None,
        parameters: Optional[Dict[str, Any]] = None,
        excluded_fields: Optional[List[str]] = None,
        legacy_endpoints: bool = False,
        use_cache: bool = True,
        enable_profiling: bool = False,
        workflow_version_id: Optional[str] = None,
    ) -> List[Dict[str, Any]]:
        response = self._execute_workflow_request(
            workspace_name=workspace_name,
            workflow_id=workflow_id,
            specification=specification,
            images=images,
            parameters=parameters,
            excluded_fields=excluded_fields,
            legacy_endpoints=legacy_endpoints,
            use_cache=use_cache,
            enable_profiling=enable_profiling,
            workflow_version_id=workflow_version_id,
        )
        response_data = response.json()
        workflow_outputs = response_data["outputs"]
        profiler_trace = response_data.get("profiler_trace", [])
        if enable_profiling:
            save_workflows_profiler_trace(
                directory=self.__inference_configuration.profiling_directory,
                profiler_trace=profiler_trace,
            )
        return decode_workflow_outputs(
            workflow_outputs=workflow_outputs,
            expected_format=self.__inference_configuration.output_visualisation_format,
        )

    def _execute_workflow_request(
        self,
        workspace_name: Optional[str] = None,
        workflow_id: Optional[str] = None,
        specification: Optional[dict] = None,
        images: Optional[Dict[str, Any]] = None,
        parameters: Optional[Dict[str, Any]] = None,
        excluded_fields: Optional[List[str]] = None,
        legacy_endpoints: bool = False,
        use_cache: bool = True,
        enable_profiling: bool = False,
        workflow_version_id: Optional[str] = None,
    ) -> Response:
        named_workflow_specified = (workspace_name is not None) and (
            workflow_id is not None
        )
        if not (named_workflow_specified != (specification is not None)):
            raise InvalidParameterError(
                "Parameters (`workspace_name`, `workflow_id` / `workflow_name`) can be used mutually exclusive with "
                "`specification`, but at least one must be set."
            )
        if images is None:
            images = {}
        if parameters is None:
            parameters = {}
        payload = {
            "api_key": self.__api_key,
            "use_cache": use_cache,
            "enable_profiling": enable_profiling,
        }
        inputs = {}
        for image_name, image in images.items():
            loaded_image = load_nested_batches_of_inference_input(
                inference_input=image,
            )
            inject_nested_batches_of_images_into_payload(
                payload=inputs,
                encoded_images=loaded_image,
                key=image_name,
            )
        inputs.update(parameters)
        payload["inputs"] = inputs
        if excluded_fields is not None:
            payload["excluded_fields"] = excluded_fields
        if specification is not None:
            payload["specification"] = specification
            if legacy_endpoints:
                url = f"{self.__api_url}/infer/workflows"
            else:
                url = f"{self.__api_url}/workflows/run"
        else:
            if workflow_version_id is not None:
                payload["workflow_version_id"] = workflow_version_id
            if legacy_endpoints:
                url = f"{self.__api_url}/infer/workflows/{workspace_name}/{workflow_id}"
            else:
                url = f"{self.__api_url}/{workspace_name}/workflows/{workflow_id}"
        response = send_post_request(
            url=url,
            payload=payload,
            headers=DEFAULT_HEADERS,
            enable_retries=self.__inference_configuration.workflow_run_retries_enabled,
        )
        return response

    @wrap_errors
    def infer_from_yolo_world(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        class_names: List[str],
        model_version: Optional[str] = None,
        confidence: Optional[float] = None,
    ) -> List[dict]:
        """Run inference using YOLO-World model.

        Args:
            inference_input: Input image(s) to run inference on. Can be a single image
                reference or a list of image references.
            class_names: List of class names to detect in the image(s).
            model_version: Optional version of YOLO-World model to use. If not specified,
                uses the default version.
            confidence: Optional confidence threshold for detections. If not specified,
                uses the model's default threshold.

        Returns:
            List of dictionaries containing detection results for each input image.
            Each dictionary contains bounding boxes, class labels, and confidence scores
            for detected objects.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        encoded_inference_inputs = load_static_inference_input(
            inference_input=inference_input,
        )
        payload = self.__initialise_payload()
        payload["text"] = class_names
        if model_version is not None:
            payload["yolo_world_version_id"] = model_version
        if confidence is not None:
            payload["confidence"] = confidence
        url = self.__wrap_url_with_api_key(f"{self.__api_url}/yolo_world/infer")
        requests_data = prepare_requests_data(
            url=url,
            encoded_inference_inputs=encoded_inference_inputs,
            headers=DEFAULT_HEADERS,
            parameters=None,
            payload=payload,
            max_batch_size=1,
            image_placement=ImagePlacement.JSON,
        )
        responses = execute_requests_packages(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )
        return [r.json() for r in responses]

    @wrap_errors_async
    async def infer_from_yolo_world_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        class_names: List[str],
        model_version: Optional[str] = None,
        confidence: Optional[float] = None,
    ) -> List[dict]:
        """Run inference using YOLO-World model asynchronously.

        Args:
            inference_input: Input image(s) to run inference on. Can be a single image
                reference or a list of image references.
            class_names: List of class names to detect in the image(s).
            model_version: Optional version of YOLO-World model to use. If not specified,
                uses the default version.
            confidence: Optional confidence threshold for detections. If not specified,
                uses the model's default threshold.

        Returns:
            List of dictionaries containing detection results for each input image.
            Each dictionary contains bounding boxes, class labels, and confidence scores
            for detected objects.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        encoded_inference_inputs = await load_static_inference_input_async(
            inference_input=inference_input,
        )
        payload = self.__initialise_payload()
        payload["text"] = class_names
        if model_version is not None:
            payload["yolo_world_version_id"] = model_version
        if confidence is not None:
            payload["confidence"] = confidence
        url = self.__wrap_url_with_api_key(f"{self.__api_url}/yolo_world/infer")
        requests_data = prepare_requests_data(
            url=url,
            encoded_inference_inputs=encoded_inference_inputs,
            headers=DEFAULT_HEADERS,
            parameters=None,
            payload=payload,
            max_batch_size=1,
            image_placement=ImagePlacement.JSON,
        )
        return await execute_requests_packages_async(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )

    @experimental(
        info="Video processing in inference server is under development. Breaking changes are possible."
    )
    @wrap_errors
    def start_inference_pipeline_with_workflow(
        self,
        video_reference: Union[str, int, List[Union[str, int]]],
        workflow_specification: Optional[dict] = None,
        workspace_name: Optional[str] = None,
        workflow_id: Optional[str] = None,
        image_input_name: str = "image",
        workflows_parameters: Optional[Dict[str, Any]] = None,
        workflows_thread_pool_workers: int = 4,
        cancel_thread_pool_tasks_on_exit: bool = True,
        video_metadata_input_name: str = "video_metadata",
        max_fps: Optional[Union[float, int]] = None,
        source_buffer_filling_strategy: Optional[BufferFillingStrategy] = "DROP_OLDEST",
        source_buffer_consumption_strategy: Optional[
            BufferConsumptionStrategy
        ] = "EAGER",
        video_source_properties: Optional[Dict[str, float]] = None,
        batch_collection_timeout: Optional[float] = None,
        results_buffer_size: int = 64,
    ) -> dict:
        """Starts an inference pipeline using a workflow specification.

        Args:
            video_reference: Path to video file, camera index, or list of video sources.
                Can be a string path, integer camera index, or list of either.
            workflow_specification: Optional workflow specification dictionary. Mutually
                exclusive with workspace_name/workflow_id.
            workspace_name: Optional name of workspace containing workflow. Must be used
                with workflow_id.
            workflow_id: Optional ID of workflow to use. Must be used with workspace_name.
            image_input_name: Name of the image input node in workflow. Defaults to "image".
            workflows_parameters: Optional parameters to pass to workflow.
            workflows_thread_pool_workers: Number of worker threads for workflow execution.
                Defaults to 4.
            cancel_thread_pool_tasks_on_exit: Whether to cancel pending tasks when exiting.
                Defaults to True.
            video_metadata_input_name: Name of video metadata input in workflow.
                Defaults to "video_metadata".
            max_fps: Optional maximum FPS to process video at.
            source_buffer_filling_strategy: Strategy for filling source buffer when full.
                One of: "WAIT", "DROP_OLDEST", "ADAPTIVE_DROP_OLDEST", "DROP_LATEST",
                "ADAPTIVE_DROP_LATEST". Defaults to "DROP_OLDEST".
            source_buffer_consumption_strategy: Strategy for consuming from source buffer.
                One of: "LAZY", "EAGER". Defaults to "EAGER".
            video_source_properties: Optional dictionary of video source properties.
            batch_collection_timeout: Optional timeout for batch collection in seconds.
            results_buffer_size: Size of results buffer. Defaults to 64.

        Returns:
            dict: Response containing pipeline initialization details.

        Raises:
            InvalidParameterError: If workflow specification parameters are invalid.
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        named_workflow_specified = (workspace_name is not None) and (
            workflow_id is not None
        )
        if not (named_workflow_specified != (workflow_specification is not None)):
            raise InvalidParameterError(
                "Parameters (`workspace_name`, `workflow_id`) are mutually exclusive with "
                "`workflow_specification`; exactly one of the two must be set."
            )
        payload = {
            "api_key": self.__api_key,
            "video_configuration": {
                "type": "VideoConfiguration",
                "video_reference": video_reference,
                "max_fps": max_fps,
                "source_buffer_filling_strategy": source_buffer_filling_strategy,
                "source_buffer_consumption_strategy": source_buffer_consumption_strategy,
                "video_source_properties": video_source_properties,
                "batch_collection_timeout": batch_collection_timeout,
            },
            "processing_configuration": {
                "type": "WorkflowConfiguration",
                "workflow_specification": workflow_specification,
                "workspace_name": workspace_name,
                "workflow_id": workflow_id,
                "image_input_name": image_input_name,
                "workflows_parameters": workflows_parameters,
                "workflows_thread_pool_workers": workflows_thread_pool_workers,
                "cancel_thread_pool_tasks_on_exit": cancel_thread_pool_tasks_on_exit,
                "video_metadata_input_name": video_metadata_input_name,
            },
            "sink_configuration": {
                "type": "MemorySinkConfiguration",
                "results_buffer_size": results_buffer_size,
            },
        }
        response = requests.post(
            f"{self.__api_url}/inference_pipelines/initialise",
            json=payload,
        )
        response.raise_for_status()
        return response.json()

    @experimental(
        info="Video processing in inference server is under development. Breaking changes are possible."
    )
    @wrap_errors
    def list_inference_pipelines(self) -> List[dict]:
        """Lists all active inference pipelines on the server.

        This method retrieves information about all currently running inference pipelines
        on the server, including their IDs and status.

        Returns:
            List[dict]: A list of dictionaries containing information about each active
                inference pipeline.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
        """
        payload = {"api_key": self.__api_key}
        response = requests.get(
            f"{self.__api_url}/inference_pipelines/list",
            json=payload,
        )
        api_key_safe_raise_for_status(response=response)
        return response.json()

    @experimental(
        info="Video processing in inference server is under development. Breaking changes are possible."
    )
    @wrap_errors
    def get_inference_pipeline_status(self, pipeline_id: str) -> dict:
        """Gets the current status of a specific inference pipeline.

        Args:
            pipeline_id: The unique identifier of the inference pipeline to check.

        Returns:
            dict: A dictionary containing the current status and details of the pipeline.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
            ValueError: If pipeline_id is empty or None.
        """
        self._ensure_pipeline_id_not_empty(pipeline_id=pipeline_id)
        payload = {"api_key": self.__api_key}
        response = requests.get(
            f"{self.__api_url}/inference_pipelines/{pipeline_id}/status",
            json=payload,
        )
        api_key_safe_raise_for_status(response=response)
        return response.json()

    @experimental(
        info="Video processing in inference server is under development. Breaking changes are possible."
    )
    @wrap_errors
    def pause_inference_pipeline(self, pipeline_id: str) -> dict:
        """Pauses a running inference pipeline.

        Sends a request to pause the specified inference pipeline. The pipeline must be
        currently running for this operation to succeed.

        Args:
            pipeline_id: The unique identifier of the inference pipeline to pause.

        Returns:
            dict: A dictionary containing the response from the server about the pause operation.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
            ValueError: If pipeline_id is empty or None.
        """
        self._ensure_pipeline_id_not_empty(pipeline_id=pipeline_id)
        payload = {"api_key": self.__api_key}
        response = requests.post(
            f"{self.__api_url}/inference_pipelines/{pipeline_id}/pause",
            json=payload,
        )
        api_key_safe_raise_for_status(response=response)
        return response.json()

    @experimental(
        info="Video processing in inference server is under development. Breaking changes are possible."
    )
    @wrap_errors
    def resume_inference_pipeline(self, pipeline_id: str) -> dict:
        """Resumes a paused inference pipeline.

        Sends a request to resume the specified inference pipeline. The pipeline must be
        currently paused for this operation to succeed.

        Args:
            pipeline_id: The unique identifier of the inference pipeline to resume.

        Returns:
            dict: A dictionary containing the response from the server about the resume operation.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
            ValueError: If pipeline_id is empty or None.
        """
        self._ensure_pipeline_id_not_empty(pipeline_id=pipeline_id)
        payload = {"api_key": self.__api_key}
        response = requests.post(
            f"{self.__api_url}/inference_pipelines/{pipeline_id}/resume",
            json=payload,
        )
        api_key_safe_raise_for_status(response=response)
        return response.json()

    @experimental(
        info="Video processing in inference server is under development. Breaking changes are possible."
    )
    @wrap_errors
    def terminate_inference_pipeline(self, pipeline_id: str) -> dict:
        """Terminates a running inference pipeline.

        Sends a request to terminate the specified inference pipeline. This will stop all
        processing and free up associated resources.

        Args:
            pipeline_id: The unique identifier of the inference pipeline to terminate.

        Returns:
            dict: A dictionary containing the response from the server about the termination operation.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
            ValueError: If pipeline_id is empty or None.
        """
        self._ensure_pipeline_id_not_empty(pipeline_id=pipeline_id)
        payload = {"api_key": self.__api_key}
        response = requests.post(
            f"{self.__api_url}/inference_pipelines/{pipeline_id}/terminate",
            json=payload,
        )
        api_key_safe_raise_for_status(response=response)
        return response.json()

    @experimental(
        info="Video processing in inference server is under development. Breaking changes are possible."
    )
    @wrap_errors
    def consume_inference_pipeline_result(
        self,
        pipeline_id: str,
        excluded_fields: Optional[List[str]] = None,
    ) -> dict:
        """Consumes and returns the next available result from an inference pipeline.

        Args:
            pipeline_id: The unique identifier of the inference pipeline to consume results from.
            excluded_fields: Optional list of field names to exclude from the result. If None,
                no fields will be excluded.

        Returns:
            dict: A dictionary containing the next available result from the pipeline.

        Raises:
            HTTPCallErrorError: If there is an error in the HTTP call.
            HTTPClientError: If there is an error with the server connection.
            InvalidParameterError: If pipeline_id is empty or None.
        """
        self._ensure_pipeline_id_not_empty(pipeline_id=pipeline_id)
        if excluded_fields is None:
            excluded_fields = []
        payload = {"api_key": self.__api_key, "excluded_fields": excluded_fields}
        response = requests.get(
            f"{self.__api_url}/inference_pipelines/{pipeline_id}/consume",
            json=payload,
        )
        api_key_safe_raise_for_status(response=response)
        return response.json()

    def _ensure_pipeline_id_not_empty(self, pipeline_id: str) -> None:
        if not pipeline_id:
            raise InvalidParameterError("Empty `pipeline_id` parameter detected")

    def _post_images(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        endpoint: str,
        model_id: Optional[str] = None,
        extra_payload: Optional[Dict[str, Any]] = None,
    ) -> Union[dict, List[dict]]:
        encoded_inference_inputs = load_static_inference_input(
            inference_input=inference_input,
        )
        payload = self.__initialise_payload()
        if model_id is not None:
            payload["model_id"] = model_id
        url = self.__wrap_url_with_api_key(f"{self.__api_url}{endpoint}")
        if extra_payload is not None:
            payload.update(extra_payload)
        requests_data = prepare_requests_data(
            url=url,
            encoded_inference_inputs=encoded_inference_inputs,
            headers=DEFAULT_HEADERS,
            parameters=None,
            payload=payload,
            max_batch_size=self.__inference_configuration.max_batch_size,
            image_placement=ImagePlacement.JSON,
        )
        responses = execute_requests_packages(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )
        results = [r.json() for r in responses]
        return unwrap_single_element_list(sequence=results)

    async def _post_images_async(
        self,
        inference_input: Union[ImagesReference, List[ImagesReference]],
        endpoint: str,
        model_id: Optional[str] = None,
        extra_payload: Optional[Dict[str, Any]] = None,
    ) -> Union[dict, List[dict]]:
        encoded_inference_inputs = await load_static_inference_input_async(
            inference_input=inference_input,
        )
        payload = self.__initialise_payload()
        if model_id is not None:
            payload["model_id"] = model_id
        url = self.__wrap_url_with_api_key(f"{self.__api_url}{endpoint}")
        if extra_payload is not None:
            payload.update(extra_payload)
        requests_data = prepare_requests_data(
            url=url,
            encoded_inference_inputs=encoded_inference_inputs,
            headers=DEFAULT_HEADERS,
            parameters=None,
            payload=payload,
            max_batch_size=self.__inference_configuration.max_batch_size,
            image_placement=ImagePlacement.JSON,
        )
        responses = await execute_requests_packages_async(
            requests_data=requests_data,
            request_method=RequestMethod.POST,
            max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
        )
        return unwrap_single_element_list(sequence=responses)

    def __initialise_payload(self) -> dict:
        if self.__client_mode is not HTTPClientMode.V0:
            return {"api_key": self.__api_key}
        return {}

    def __wrap_url_with_api_key(self, url: str) -> str:
        if self.__client_mode is not HTTPClientMode.V0:
            return url
        return f"{url}?api_key={self.__api_key}"

    def __ensure_v1_client_mode(self) -> None:
        if self.__client_mode is not HTTPClientMode.V1:
            raise WrongClientModeError("Use client mode `v1` to run this operation.")
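
For reference, the zero-shot YOLO-World method defined above can be called as in the following minimal sketch. It is illustrative only: the api_url, api_key, image path, and class list are placeholder values.

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # placeholder: a locally hosted inference server
    api_key="<YOUR_API_KEY>",
)
# one result dict per input image, each holding bounding boxes, labels, and scores
results = client.infer_from_yolo_world(
    inference_input="./image.jpg",
    class_names=["person", "dog"],
    confidence=0.1,
)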
Attributes

client_mode property

Get the current client mode.

Returns:
    HTTPClientMode: The current API version mode (V0 or V1).

inference_configuration property

Get the current inference configuration.

Returns:
    InferenceConfiguration: The current inference configuration settings.

selected_model property

Get the currently selected model identifier.

Returns:
    Optional[str]: The identifier of the currently selected model, if any.

webrtc property

Lazy accessor for the WebRTC client namespace.

Returns:
    WebRTCClient: Namespaced WebRTC API bound to this HTTP client.

Functions
__init__
__init__(api_url, api_key=None)

Initialize a new InferenceHTTPClient instance.

Parameters:
    api_url (str): The base URL for the inference API. Required.
    api_key (Optional[str]): API key for authentication. Defaults to None.

Source code in inference_sdk/http/client.py
def __init__(
    self,
    api_url: str,
    api_key: Optional[str] = None,
):
    """Initialize a new InferenceHTTPClient instance.

    Args:
        api_url (str): The base URL for the inference API.
        api_key (Optional[str], optional): API key for authentication. Defaults to None.
    """
    self.__api_url = api_url
    self.__api_key = api_key
    self.__inference_configuration = InferenceConfiguration.init_default()
    self.__client_mode = _determine_client_mode(api_url=api_url)
    self.__selected_model: Optional[str] = None
    self.__webrtc_client: Optional["WebRTCClient"] = None
clip_compare
clip_compare(
    subject,
    prompt,
    subject_type="image",
    prompt_type="text",
    clip_version=None,
)

Compare a subject against prompts using CLIP embeddings.

Parameters:
    subject (Union[str, ImagesReference]): The subject to compare (image or text). Required.
    prompt (Union[str, List[str], ImagesReference, List[ImagesReference]]): The prompt(s) to compare against. Required.
    subject_type (str): Type of subject ('image' or 'text'). Defaults to "image".
    prompt_type (str): Type of prompt(s) ('image' or 'text'). Defaults to "text".
    clip_version (Optional[str]): Version of CLIP model to use. Defaults to None.

Returns:
    Union[dict, List[dict]]: Comparison results between subject and prompt(s).

Raises:
    InvalidParameterError: If subject_type or prompt_type is invalid.
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def clip_compare(
    self,
    subject: Union[str, ImagesReference],
    prompt: Union[str, List[str], ImagesReference, List[ImagesReference]],
    subject_type: str = "image",
    prompt_type: str = "text",
    clip_version: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Compare a subject against prompts using CLIP embeddings.

    Args:
        subject (Union[str, ImagesReference]): The subject to compare (image or text).
        prompt (Union[str, List[str], ImagesReference, List[ImagesReference]]): The prompt(s) to compare against.
        subject_type (str, optional): Type of subject ('image' or 'text'). Defaults to "image".
        prompt_type (str, optional): Type of prompt(s) ('image' or 'text'). Defaults to "text".
        clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

    Returns:
        Union[dict, List[dict]]: Comparison results between subject and prompt(s).

    Raises:
        InvalidParameterError: If subject_type or prompt_type is invalid.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    if (
        subject_type not in CLIP_ARGUMENT_TYPES
        or prompt_type not in CLIP_ARGUMENT_TYPES
    ):
        raise InvalidParameterError(
            f"Could not accept `subject_type` and `prompt_type` with values different than {CLIP_ARGUMENT_TYPES}"
        )
    payload = self.__initialise_payload()
    payload["subject_type"] = subject_type
    payload["prompt_type"] = prompt_type
    if clip_version is not None:
        payload["clip_version_id"] = clip_version
    if subject_type == "image":
        encoded_image = load_static_inference_input(
            inference_input=subject,
        )
        payload = inject_images_into_payload(
            payload=payload, encoded_images=encoded_image, key="subject"
        )
    else:
        payload["subject"] = subject
    if prompt_type == "image":
        encoded_inference_inputs = load_static_inference_input(
            inference_input=prompt,
        )
        payload = inject_images_into_payload(
            payload=payload, encoded_images=encoded_inference_inputs, key="prompt"
        )
    else:
        payload["prompt"] = prompt

    headers = DEFAULT_HEADERS.copy()
    execution_id_value = execution_id.get()
    if execution_id_value is not None:
        headers[EXECUTION_ID_HEADER] = execution_id_value

    response = requests.post(
        self.__wrap_url_with_api_key(f"{self.__api_url}/clip/compare"),
        json=payload,
        headers=headers,
    )
    _collect_processing_time_from_response(
        response, model_id=clip_version or "clip"
    )
    api_key_safe_raise_for_status(response=response)
    return response.json()
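
A minimal usage sketch of the method above, assuming a client constructed as shown earlier; the image path and prompts are placeholders:

# compare one image against two candidate captions
result = client.clip_compare(
    subject="./image.jpg",       # subject_type defaults to "image"
    prompt=["a cat", "a dog"],   # prompt_type defaults to "text"
)

The method returns the raw JSON body from /clip/compare, so the exact response shape depends on the server version.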
clip_compare_async async
clip_compare_async(
    subject,
    prompt,
    subject_type="image",
    prompt_type="text",
    clip_version=None,
)

Compare a subject against prompts using CLIP embeddings asynchronously.

Parameters:
    subject (Union[str, ImagesReference]): The subject to compare (image or text). Required.
    prompt (Union[str, List[str], ImagesReference, List[ImagesReference]]): The prompt(s) to compare against. Required.
    subject_type (str): Type of subject ('image' or 'text'). Defaults to "image".
    prompt_type (str): Type of prompt(s) ('image' or 'text'). Defaults to "text".
    clip_version (Optional[str]): Version of CLIP model to use. Defaults to None.

Returns:
    Union[dict, List[dict]]: Comparison results between subject and prompt(s).

Raises:
    InvalidParameterError: If subject_type or prompt_type is invalid.
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def clip_compare_async(
    self,
    subject: Union[str, ImagesReference],
    prompt: Union[str, List[str], ImagesReference, List[ImagesReference]],
    subject_type: str = "image",
    prompt_type: str = "text",
    clip_version: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Compare a subject against prompts using CLIP embeddings asynchronously.

    Args:
        subject (Union[str, ImagesReference]): The subject to compare (image or text).
        prompt (Union[str, List[str], ImagesReference, List[ImagesReference]]): The prompt(s) to compare against.
        subject_type (str, optional): Type of subject ('image' or 'text'). Defaults to "image".
        prompt_type (str, optional): Type of prompt(s) ('image' or 'text'). Defaults to "text".
        clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

    Returns:
        Union[dict, List[dict]]: Comparison results between subject and prompt(s).

    Raises:
        InvalidParameterError: If subject_type or prompt_type is invalid.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    if (
        subject_type not in CLIP_ARGUMENT_TYPES
        or prompt_type not in CLIP_ARGUMENT_TYPES
    ):
        raise InvalidParameterError(
            f"Could not accept `subject_type` and `prompt_type` with values different than {CLIP_ARGUMENT_TYPES}"
        )
    payload = self.__initialise_payload()
    payload["subject_type"] = subject_type
    payload["prompt_type"] = prompt_type
    if clip_version is not None:
        payload["clip_version_id"] = clip_version
    if subject_type == "image":
        encoded_image = await load_static_inference_input_async(
            inference_input=subject,
        )
        payload = inject_images_into_payload(
            payload=payload, encoded_images=encoded_image, key="subject"
        )
    else:
        payload["subject"] = subject
    if prompt_type == "image":
        encoded_inference_inputs = await load_static_inference_input_async(
            inference_input=prompt,
        )
        payload = inject_images_into_payload(
            payload=payload, encoded_images=encoded_inference_inputs, key="prompt"
        )
    else:
        payload["prompt"] = prompt

    async with aiohttp.ClientSession() as session:
        async with session.post(
            self.__wrap_url_with_api_key(f"{self.__api_url}/clip/compare"),
            json=payload,
            headers=DEFAULT_HEADERS,
        ) as response:
            response.raise_for_status()
            return await response.json()
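
The asynchronous variant takes the same arguments but must be awaited. A sketch, assuming a client constructed as shown earlier and placeholder inputs:

import asyncio

async def main():
    result = await client.clip_compare_async(
        subject="./image.jpg",     # placeholder image path
        prompt=["a cat", "a dog"],
    )
    print(result)

asyncio.run(main())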
configure
configure(inference_configuration)

Configure the client with new inference settings.

Parameters:
    inference_configuration (InferenceConfiguration): The new configuration to apply. Required.

Returns:
    InferenceHTTPClient: The client instance with updated configuration.

Source code in inference_sdk/http/client.py
def configure(
    self, inference_configuration: InferenceConfiguration
) -> "InferenceHTTPClient":
    """Configure the client with new inference settings.

    Args:
        inference_configuration (InferenceConfiguration): The new configuration to apply.

    Returns:
        InferenceHTTPClient: The client instance with updated configuration.
    """
    self.__inference_configuration = inference_configuration
    return self
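
Because configure returns the client itself, it chains directly off the constructor. A sketch; confidence_threshold is one commonly used InferenceConfiguration field, but treat the exact field set as version-dependent:

from inference_sdk import InferenceConfiguration, InferenceHTTPClient

config = InferenceConfiguration(confidence_threshold=0.5)
client = InferenceHTTPClient(
    api_url="http://localhost:9001",  # placeholder server URL
    api_key="<YOUR_API_KEY>",
).configure(config)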
consume_inference_pipeline_result
consume_inference_pipeline_result(
    pipeline_id, excluded_fields=None
)

Consumes and returns the next available result from an inference pipeline.

Parameters:
    pipeline_id (str): The unique identifier of the inference pipeline to consume results from. Required.
    excluded_fields (Optional[List[str]]): Optional list of field names to exclude from the result. If None, no fields will be excluded. Defaults to None.

Returns:
    dict: A dictionary containing the next available result from the pipeline.

Raises:
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.
    InvalidParameterError: If pipeline_id is empty or None.

Source code in inference_sdk/http/client.py
@experimental(
    info="Video processing in inference server is under development. Breaking changes are possible."
)
@wrap_errors
def consume_inference_pipeline_result(
    self,
    pipeline_id: str,
    excluded_fields: Optional[List[str]] = None,
) -> dict:
    """Consumes and returns the next available result from an inference pipeline.

    Args:
        pipeline_id: The unique identifier of the inference pipeline to consume results from.
        excluded_fields: Optional list of field names to exclude from the result. If None,
            no fields will be excluded.

    Returns:
        dict: A dictionary containing the next available result from the pipeline.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
        InvalidParameterError: If pipeline_id is empty or None.
    """
    self._ensure_pipeline_id_not_empty(pipeline_id=pipeline_id)
    if excluded_fields is None:
        excluded_fields = []
    payload = {"api_key": self.__api_key, "excluded_fields": excluded_fields}
    response = requests.get(
        f"{self.__api_url}/inference_pipelines/{pipeline_id}/consume",
        json=payload,
    )
    api_key_safe_raise_for_status(response=response)
    return response.json()
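
Together with start_inference_pipeline_with_workflow and terminate_inference_pipeline, this method forms a start / poll / terminate lifecycle. A hedged sketch (the workspace and workflow identifiers are placeholders, and because the API is experimental the response layout, including the pipeline id field, may change):

import time

pipeline = client.start_inference_pipeline_with_workflow(
    video_reference="./video.mp4",
    workspace_name="<your-workspace>",
    workflow_id="<your-workflow-id>",
)
pipeline_id = pipeline["context"]["pipeline_id"]  # assumed response layout
try:
    while True:
        result = client.consume_inference_pipeline_result(pipeline_id=pipeline_id)
        print(result)
        time.sleep(0.1)  # avoid hammering the endpoint between results
finally:
    client.terminate_inference_pipeline(pipeline_id=pipeline_id)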
depth_estimation
depth_estimation(
    inference_input, model_id="depth-anything-v3/small"
)

Run depth estimation on input image(s).

This method estimates depth maps from images using models like Depth Anything.

Parameters:
    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for depth estimation. Can be file paths, URLs, base64 strings, numpy arrays, or PIL images. Required.
    model_id (str): The depth estimation model to use. Defaults to "depth-anything-v3/small". Supported models include "depth-anything-v2/small", "depth-anything-v3/small", and "depth-anything-v3/base".

Returns:
    Union[dict, List[dict]]: Depth estimation results containing:
        - normalized_depth: The normalized depth map as a list
        - image: Hex-encoded visualization of the depth map

Raises:
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def depth_estimation(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    model_id: str = "depth-anything-v3/small",
) -> Union[dict, List[dict]]:
    """Run depth estimation on input image(s).

    This method estimates depth maps from images using models like Depth Anything.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
            for depth estimation. Can be file paths, URLs, base64 strings, numpy arrays,
            or PIL images.
        model_id (str, optional): The depth estimation model to use. Defaults to
            "depth-anything-v3/small". Supported models include:
            - "depth-anything-v2/small"
            - "depth-anything-v3/small"
            - "depth-anything-v3/base"

    Returns:
        Union[dict, List[dict]]: Depth estimation results containing:
            - normalized_depth: The normalized depth map as a list
            - image: Hex-encoded visualization of the depth map

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    extra_payload = {"model_id": model_id}
    result = self._post_images(
        inference_input=inference_input,
        endpoint="/infer/depth-estimation",
        extra_payload=extra_payload,
    )
    return result
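
A minimal sketch of the call; the image path is a placeholder and the result keys follow the docstring above:

depth = client.depth_estimation(
    inference_input="./image.jpg",
    model_id="depth-anything-v2/small",  # any of the supported model ids
)
# per the docstring: a normalized depth map plus a hex-encoded visualization
normalized = depth["normalized_depth"]
visualization = depth["image"]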
depth_estimation_async async
depth_estimation_async(
    inference_input, model_id="depth-anything-v3/small"
)

Run depth estimation on input image(s) asynchronously.

Parameters:
    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for depth estimation. Required.
    model_id (str): The depth estimation model to use. Defaults to "depth-anything-v3/small".

Returns:
    Union[dict, List[dict]]: Depth estimation results.

Raises:
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def depth_estimation_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    model_id: str = "depth-anything-v3/small",
) -> Union[dict, List[dict]]:
    """Run depth estimation on input image(s) asynchronously.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
            for depth estimation.
        model_id (str, optional): The depth estimation model to use. Defaults to
            "depth-anything-v3/small".

    Returns:
        Union[dict, List[dict]]: Depth estimation results.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    extra_payload = {"model_id": model_id}
    result = await self._post_images_async(
        inference_input=inference_input,
        endpoint="/infer/depth-estimation",
        extra_payload=extra_payload,
    )
    return result
detect_gazes
detect_gazes(inference_input)

Detect gazes in input image(s).

Parameters:
    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for gaze detection. Required.

Returns:
    Union[dict, List[dict]]: Gaze detection results for the input image(s).

Raises:
    WrongClientModeError: If not in API v1 mode.
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def detect_gazes(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
) -> Union[dict, List[dict]]:
    """Detect gazes in input image(s).

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for gaze detection.

    Returns:
        Union[dict, List[dict]]: Gaze detection results for the input image(s).

    Raises:
        WrongClientModeError: If not in API v1 mode.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    self.__ensure_v1_client_mode()  # Lambda does not support Gaze, so we require v1 mode of client
    result = self._post_images(
        inference_input=inference_input, endpoint="/gaze/gaze_detection"
    )
    return combine_gaze_detections(detections=result)
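
Since the implementation enforces v1 client mode, gaze detection only works against a self-hosted or dedicated inference server, not the hosted Lambda API. A minimal sketch with a placeholder image path:

gazes = client.detect_gazes(inference_input="./face.jpg")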
detect_gazes_async async
detect_gazes_async(inference_input)

Detect gazes in input image(s) asynchronously.

Parameters:
    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for gaze detection. Required.

Returns:
    Union[dict, List[dict]]: Gaze detection results for the input image(s).

Raises:
    WrongClientModeError: If not in API v1 mode.
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def detect_gazes_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
) -> Union[dict, List[dict]]:
    """Detect gazes in input image(s) asynchronously.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for gaze detection.

    Returns:
        Union[dict, List[dict]]: Gaze detection results for the input image(s).

    Raises:
        WrongClientModeError: If not in API v1 mode.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    self.__ensure_v1_client_mode()  # Lambda does not support Gaze, so we require v1 mode of client
    result = await self._post_images_async(
        inference_input=inference_input, endpoint="/gaze/gaze_detection"
    )
    return combine_gaze_detections(detections=result)
get_clip_image_embeddings
get_clip_image_embeddings(
    inference_input, clip_version=None
)

Get CLIP embeddings for input image(s).

Parameters:
    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) to embed. Required.
    clip_version (Optional[str]): Version of CLIP model to use. Defaults to None.

Returns:
    Union[dict, List[dict]]: CLIP embeddings for the input image(s).

Raises:
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def get_clip_image_embeddings(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    clip_version: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Get CLIP embeddings for input image(s).

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) to embed.
        clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

    Returns:
        Union[dict, List[dict]]: CLIP embeddings for the input image(s).

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    extra_payload = {}
    if clip_version is not None:
        extra_payload["clip_version_id"] = clip_version
    result = self._post_images(
        inference_input=inference_input,
        endpoint="/clip/embed_image",
        extra_payload=extra_payload,
    )
    result = combine_clip_embeddings(embeddings=result)
    return unwrap_single_element_list(result)
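
Image and text embeddings from the CLIP endpoints live in the same vector space, so they can be compared directly. A sketch using numpy; the "embeddings" response key is assumed from the inference server's CLIP schema, so verify it against your server version:

import numpy as np

image_response = client.get_clip_image_embeddings(inference_input="./image.jpg")
text_response = client.get_clip_text_embeddings(text="a photo of a dog")

image_vec = np.asarray(image_response["embeddings"][0])
text_vec = np.asarray(text_response["embeddings"][0])

# cosine similarity between the image and the caption
similarity = image_vec @ text_vec / (np.linalg.norm(image_vec) * np.linalg.norm(text_vec))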
get_clip_image_embeddings_async async
get_clip_image_embeddings_async(
    inference_input, clip_version=None
)

Get CLIP embeddings for input image(s) asynchronously.

Parameters:
    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) to embed. Required.
    clip_version (Optional[str]): Version of CLIP model to use. Defaults to None.

Returns:
    Union[dict, List[dict]]: CLIP embeddings for the input image(s).

Raises:
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def get_clip_image_embeddings_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    clip_version: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Get CLIP embeddings for input image(s) asynchronously.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) to embed.
        clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

    Returns:
        Union[dict, List[dict]]: CLIP embeddings for the input image(s).

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    extra_payload = {}
    if clip_version is not None:
        extra_payload["clip_version_id"] = clip_version
    result = await self._post_images_async(
        inference_input=inference_input,
        endpoint="/clip/embed_image",
        extra_payload=extra_payload,
    )
    result = combine_clip_embeddings(embeddings=result)
    return unwrap_single_element_list(result)
get_clip_text_embeddings
get_clip_text_embeddings(text, clip_version=None)

Get CLIP embeddings for input text(s).

Parameters:
    text (Union[str, List[str]]): Input text(s) to embed. Required.
    clip_version (Optional[str]): Version of CLIP model to use. Defaults to None.

Returns:
    Union[dict, List[dict]]: CLIP embeddings for the input text(s).

Raises:
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def get_clip_text_embeddings(
    self,
    text: Union[str, List[str]],
    clip_version: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Get CLIP embeddings for input text(s).

    Args:
        text (Union[str, List[str]]): Input text(s) to embed.
        clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

    Returns:
        Union[dict, List[dict]]: CLIP embeddings for the input text(s).

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    payload = self.__initialise_payload()
    payload["text"] = text
    if clip_version is not None:
        payload["clip_version_id"] = clip_version
    headers = DEFAULT_HEADERS.copy()
    execution_id_value = execution_id.get()
    if execution_id_value is not None:
        headers[EXECUTION_ID_HEADER] = execution_id_value

    response = requests.post(
        self.__wrap_url_with_api_key(f"{self.__api_url}/clip/embed_text"),
        json=payload,
        headers=headers,
    )
    _collect_processing_time_from_response(
        response, model_id=clip_version or "clip"
    )
    api_key_safe_raise_for_status(response=response)
    return unwrap_single_element_list(sequence=response.json())
get_clip_text_embeddings_async async
get_clip_text_embeddings_async(text, clip_version=None)

Get CLIP embeddings for input text(s) asynchronously.

Parameters:
    text (Union[str, List[str]]): Input text(s) to embed. Required.
    clip_version (Optional[str]): Version of CLIP model to use. Defaults to None.

Returns:
    Union[dict, List[dict]]: CLIP embeddings for the input text(s).

Raises:
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def get_clip_text_embeddings_async(
    self,
    text: Union[str, List[str]],
    clip_version: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Get CLIP embeddings for input text(s) asynchronously.

    Args:
        text (Union[str, List[str]]): Input text(s) to embed.
        clip_version (Optional[str], optional): Version of CLIP model to use. Defaults to None.

    Returns:
        Union[dict, List[dict]]: CLIP embeddings for the input text(s).

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    payload = self.__initialise_payload()
    payload["text"] = text
    if clip_version is not None:
        payload["clip_version_id"] = clip_version
    async with aiohttp.ClientSession() as session:
        async with session.post(
            self.__wrap_url_with_api_key(f"{self.__api_url}/clip/embed_text"),
            json=payload,
            headers=DEFAULT_HEADERS,
        ) as response:
            response.raise_for_status()
            response_payload = await response.json()
    return unwrap_single_element_list(sequence=response_payload)
get_inference_pipeline_status
get_inference_pipeline_status(pipeline_id)

Gets the current status of a specific inference pipeline.

Parameters:
    pipeline_id (str): The unique identifier of the inference pipeline to check. Required.

Returns:
    dict: A dictionary containing the current status and details of the pipeline.

Raises:
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.
    ValueError: If pipeline_id is empty or None.

Source code in inference_sdk/http/client.py
@experimental(
    info="Video processing in inference server is under development. Breaking changes are possible."
)
@wrap_errors
def get_inference_pipeline_status(self, pipeline_id: str) -> dict:
    """Gets the current status of a specific inference pipeline.

    Args:
        pipeline_id: The unique identifier of the inference pipeline to check.

    Returns:
        dict: A dictionary containing the current status and details of the pipeline.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
        ValueError: If pipeline_id is empty or None.
    """
    self._ensure_pipeline_id_not_empty(pipeline_id=pipeline_id)
    payload = {"api_key": self.__api_key}
    response = requests.get(
        f"{self.__api_url}/inference_pipelines/{pipeline_id}/status",
        json=payload,
    )
    api_key_safe_raise_for_status(response=response)
    return response.json()
get_model_description
get_model_description(model_id, allow_loading=True)

Get the description of a model.

Parameters:
    model_id (str): The identifier of the model. Required.
    allow_loading (bool): Whether to load the model if not already loaded. Defaults to True.

Returns:
    ModelDescription: Description of the model.

Raises:
    WrongClientModeError: If not in API v1 mode.
    ModelNotInitializedError: If the model is not initialized and cannot be loaded.

Source code in inference_sdk/http/client.py
def get_model_description(
    self, model_id: str, allow_loading: bool = True
) -> ModelDescription:
    """Get the description of a model.

    Args:
        model_id (str): The identifier of the model.
        allow_loading (bool, optional): Whether to load the model if not already loaded. Defaults to True.

    Returns:
        ModelDescription: Description of the model.

    Raises:
        WrongClientModeError: If not in API v1 mode.
        ModelNotInitializedError: If the model is not initialized and cannot be loaded.
    """
    self.__ensure_v1_client_mode()
    de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
    registered_models = self.list_loaded_models()
    matching_model = filter_model_descriptions(
        descriptions=registered_models.models,
        model_id=de_aliased_model_id,
    )
    if matching_model is None and allow_loading is True:
        registered_models = self.load_model(model_id=de_aliased_model_id)
        matching_model = filter_model_descriptions(
            descriptions=registered_models.models,
            model_id=de_aliased_model_id,
        )
    if matching_model is not None:
        return matching_model
    raise ModelNotInitializedError(
        f"Model {model_id} (de-aliased: {de_aliased_model_id}) is not initialised and cannot "
        f"retrieve its description."
    )
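
A short sketch; the model id is a placeholder, and the available fields follow inference_sdk.http.entities.ModelDescription (task_type is one of them):

description = client.get_model_description(model_id="<project>/<version>")
print(description.model_id, description.task_type)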
get_model_description_async async
get_model_description_async(model_id, allow_loading=True)

Get the description of a model asynchronously.

Parameters:
    model_id (str): The identifier of the model. Required.
    allow_loading (bool): Whether to load the model if not already loaded. Defaults to True.

Returns:
    ModelDescription: Description of the model.

Raises:
    WrongClientModeError: If not in API v1 mode.
    ModelNotInitializedError: If the model is not initialized and cannot be loaded.

Source code in inference_sdk/http/client.py
async def get_model_description_async(
    self, model_id: str, allow_loading: bool = True
) -> ModelDescription:
    """Get the description of a model asynchronously.

    Args:
        model_id (str): The identifier of the model.
        allow_loading (bool, optional): Whether to load the model if not already loaded. Defaults to True.

    Returns:
        ModelDescription: Description of the model.

    Raises:
        WrongClientModeError: If not in API v1 mode.
        ModelNotInitializedError: If the model is not initialized and cannot be loaded.
    """
    self.__ensure_v1_client_mode()
    de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
    registered_models = await self.list_loaded_models_async()
    matching_model = filter_model_descriptions(
        descriptions=registered_models.models,
        model_id=de_aliased_model_id,
    )
    if matching_model is None and allow_loading is True:
        registered_models = await self.load_model_async(
            model_id=de_aliased_model_id
        )
        matching_model = filter_model_descriptions(
            descriptions=registered_models.models,
            model_id=de_aliased_model_id,
        )
    if matching_model is not None:
        return matching_model
    raise ModelNotInitializedError(
        f"Model {model_id} (de-aliased: {de_aliased_model_id}) is not initialised and cannot "
        f"retrieve its description."
    )
get_perception_encoder_image_embeddings
get_perception_encoder_image_embeddings(
    inference_input, perception_encoder_version=None
)

Get Perception Encoder embeddings for input image(s).

Source code in inference_sdk/http/client.py
@wrap_errors
def get_perception_encoder_image_embeddings(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    perception_encoder_version: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Get Perception Encoder embeddings for input image(s)."""
    extra_payload = {}
    if perception_encoder_version is not None:
        extra_payload["perception_encoder_version_id"] = perception_encoder_version
    result = self._post_images(
        inference_input=inference_input,
        endpoint="/perception_encoder/embed_image",
        extra_payload=extra_payload,
    )
    return unwrap_single_element_list(result)
get_perception_encoder_text_embeddings
get_perception_encoder_text_embeddings(
    text, perception_encoder_version=None
)

Get Perception Encoder embeddings for input text(s).

Source code in inference_sdk/http/client.py
@wrap_errors
def get_perception_encoder_text_embeddings(
    self,
    text: Union[str, List[str]],
    perception_encoder_version: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Get Perception Encoder embeddings for input text(s)."""
    payload = self.__initialise_payload()
    payload["text"] = text
    if perception_encoder_version is not None:
        payload["perception_encoder_version_id"] = perception_encoder_version

    headers = DEFAULT_HEADERS.copy()
    execution_id_value = execution_id.get()
    if execution_id_value is not None:
        headers[EXECUTION_ID_HEADER] = execution_id_value

    response = requests.post(
        self.__wrap_url_with_api_key(
            f"{self.__api_url}/perception_encoder/embed_text"
        ),
        json=payload,
        headers=headers,
    )
    _collect_processing_time_from_response(
        response,
        model_id=perception_encoder_version or "perception_encoder",
    )
    api_key_safe_raise_for_status(response=response)
    return unwrap_single_element_list(sequence=response.json())
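
A companion sketch for text inputs (URL and key are placeholders):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")
# Passing a list of strings returns a list of embeddings.
embeddings = client.get_perception_encoder_text_embeddings(text=["a cat", "a dog"])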
get_server_info
get_server_info()

Get information about the inference server.

Returns:

Name Type Description
ServerInfo ServerInfo

Information about the server configuration and status.

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def get_server_info(self) -> ServerInfo:
    """Get information about the inference server.

    Returns:
        ServerInfo: Information about the server configuration and status.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    response = requests.get(f"{self.__api_url}/info")
    response.raise_for_status()
    response_payload = response.json()
    return ServerInfo.from_dict(response_payload)
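
A minimal sketch against a self-hosted server (the URL is a placeholder):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001")
info = client.get_server_info()
print(info)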
infer
infer(inference_input, model_id=None)

Run inference on one or more images.

Parameters:

Name Type Description Default
inference_input Union[ImagesReference, List[ImagesReference]]

Input image(s) for inference.

required
model_id Optional[str]

Model identifier to use for inference. Defaults to None.

None

Returns:

Type Description
Union[dict, List[dict]]

Union[dict, List[dict]]: Inference results for the input image(s).

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def infer(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    model_id: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Run inference on one or more images.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for inference.
        model_id (Optional[str], optional): Model identifier to use for inference. Defaults to None.

    Returns:
        Union[dict, List[dict]]: Inference results for the input image(s).

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    if self.__client_mode is HTTPClientMode.V0:
        return self.infer_from_api_v0(
            inference_input=inference_input,
            model_id=model_id,
        )
    return self.infer_from_api_v1(
        inference_input=inference_input,
        model_id=model_id,
    )
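
A minimal sketch of the main entry point (URL, key, image path and model ID are placeholders); the method dispatches to infer_from_api_v0 or infer_from_api_v1 depending on the client mode:

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")
# Single image in -> dict out; list of images in -> list of dicts out.
prediction = client.infer(inference_input="image.jpg", model_id="project-id/1")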
infer_async async
infer_async(inference_input, model_id=None)

Run inference asynchronously on one or more images.

Parameters:

Name Type Description Default
inference_input Union[ImagesReference, List[ImagesReference]]

Input image(s) for inference.

required
model_id Optional[str]

Model identifier to use for inference. Defaults to None.

None

Returns:

Type Description
Union[dict, List[dict]]

Union[dict, List[dict]]: Inference results for the input image(s).

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def infer_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    model_id: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Run inference asynchronously on one or more images.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for inference.
        model_id (Optional[str], optional): Model identifier to use for inference. Defaults to None.

    Returns:
        Union[dict, List[dict]]: Inference results for the input image(s).

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    if self.__client_mode is HTTPClientMode.V0:
        return await self.infer_from_api_v0_async(
            inference_input=inference_input,
            model_id=model_id,
        )
    return await self.infer_from_api_v1_async(
        inference_input=inference_input,
        model_id=model_id,
    )
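
The asynchronous counterpart, sketched under the same placeholder assumptions:

import asyncio

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")

async def main() -> None:
    prediction = await client.infer_async(inference_input="image.jpg", model_id="project-id/1")
    print(prediction)

asyncio.run(main())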
infer_from_api_v0
infer_from_api_v0(inference_input, model_id=None)

Run inference using API v0.

Parameters:

Name Type Description Default
inference_input Union[ImagesReference, List[ImagesReference]]

Input image(s) for inference.

required
model_id Optional[str]

Model identifier to use for inference. Defaults to None.

None

Returns:

Type Description
Union[dict, List[dict]]

Union[dict, List[dict]]: Inference results for the input image(s).

Raises:

Type Description
ModelNotSelectedError

If no model is selected.

APIKeyNotProvided

If API key is required but not provided.

InvalidModelIdentifier

If the model identifier format is invalid.

Source code in inference_sdk/http/client.py
def infer_from_api_v0(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    model_id: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Run inference using API v0.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for inference.
        model_id (Optional[str], optional): Model identifier to use for inference. Defaults to None.

    Returns:
        Union[dict, List[dict]]: Inference results for the input image(s).

    Raises:
        ModelNotSelectedError: If no model is selected.
        APIKeyNotProvided: If API key is required but not provided.
        InvalidModelIdentifier: If the model identifier format is invalid.
    """
    requests_data = self._prepare_infer_from_api_v0_request_data(
        inference_input=inference_input,
        model_id=model_id,
    )
    responses = self._execute_infer_from_api_request(
        requests_data=requests_data,
    )
    results = []
    for request_data, response in zip(requests_data, responses):
        if response_contains_jpeg_image(response=response):
            visualisation = transform_visualisation_bytes(
                visualisation=response.content,
                expected_format=self.__inference_configuration.output_visualisation_format,
            )
            parsed_response = {"visualization": visualisation}
        else:
            parsed_response = response.json()
            if parsed_response.get("visualization") is not None:
                parsed_response["visualization"] = transform_base64_visualisation(
                    visualisation=parsed_response["visualization"],
                    expected_format=self.__inference_configuration.output_visualisation_format,
                )
        parsed_response = adjust_prediction_to_client_scaling_factor(
            prediction=parsed_response,
            scaling_factor=request_data.image_scaling_factors[0],
        )
        results.append(parsed_response)
    return unwrap_single_element_list(sequence=results)
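
A brief sketch; pointing the client at the hosted platform is an assumption about where v0 mode typically applies, and the API key and model ID are placeholders:

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="https://detect.roboflow.com", api_key="YOUR_API_KEY")
prediction = client.infer_from_api_v0(inference_input="image.jpg", model_id="project-id/1")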
infer_from_api_v0_async async
infer_from_api_v0_async(inference_input, model_id=None)

Run inference using API v0 asynchronously.

Parameters:

Name Type Description Default
inference_input Union[ImagesReference, List[ImagesReference]]

Input image(s) for inference.

required
model_id Optional[str]

Model identifier to use for inference. Defaults to None.

None

Returns:

Type Description
Union[dict, List[dict]]

Union[dict, List[dict]]: Inference results for the input image(s).

Raises:

Type Description
ModelNotSelectedError

If no model is selected.

APIKeyNotProvided

If API key is required but not provided.

InvalidModelIdentifier

If the model identifier format is invalid.

Source code in inference_sdk/http/client.py
async def infer_from_api_v0_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    model_id: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Run inference using API v0 asynchronously.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for inference.
        model_id (Optional[str], optional): Model identifier to use for inference. Defaults to None.

    Returns:
        Union[dict, List[dict]]: Inference results for the input image(s).

    Raises:
        ModelNotSelectedError: If no model is selected.
        APIKeyNotProvided: If API key is required but not provided.
        InvalidModelIdentifier: If the model identifier format is invalid.
    """
    model_id_to_be_used = model_id or self.__selected_model
    _ensure_model_is_selected(model_id=model_id_to_be_used)
    _ensure_api_key_provided(api_key=self.__api_key)
    model_id_to_be_used = resolve_roboflow_model_alias(model_id=model_id_to_be_used)
    model_id_chunks = model_id_to_be_used.split("/")
    if len(model_id_chunks) != 2:
        raise InvalidModelIdentifier(
            f"Invalid model id: {model_id}. Expected format: project_id/model_version_id."
        )
    max_height, max_width = _determine_client_downsizing_parameters(
        client_downsizing_disabled=self.__inference_configuration.client_downsizing_disabled,
        model_description=None,
        default_max_input_size=self.__inference_configuration.default_max_input_size,
    )
    encoded_inference_inputs = await load_static_inference_input_async(
        inference_input=inference_input,
        max_height=max_height,
        max_width=max_width,
    )
    params = {
        "api_key": self.__api_key,
    }
    params.update(self.__inference_configuration.to_legacy_call_parameters())

    execution_id_value = execution_id.get()
    headers = DEFAULT_HEADERS
    if execution_id_value:
        headers = headers.copy()
        headers[EXECUTION_ID_HEADER] = execution_id_value

    requests_data = prepare_requests_data(
        url=f"{self.__api_url}/{model_id_chunks[0]}/{model_id_chunks[1]}",
        encoded_inference_inputs=encoded_inference_inputs,
        headers=headers,
        parameters=params,
        payload=None,
        max_batch_size=1,
        image_placement=ImagePlacement.DATA,
    )
    responses = await execute_requests_packages_async(
        requests_data=requests_data,
        request_method=RequestMethod.POST,
        max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
    )
    results = []
    for request_data, response in zip(requests_data, responses):
        if not issubclass(type(response), dict):
            visualisation = transform_visualisation_bytes(
                visualisation=response,
                expected_format=self.__inference_configuration.output_visualisation_format,
            )
            parsed_response = {"visualization": visualisation}
        else:
            parsed_response = response
            if parsed_response.get("visualization") is not None:
                parsed_response["visualization"] = transform_base64_visualisation(
                    visualisation=parsed_response["visualization"],
                    expected_format=self.__inference_configuration.output_visualisation_format,
                )
        parsed_response = adjust_prediction_to_client_scaling_factor(
            prediction=parsed_response,
            scaling_factor=request_data.image_scaling_factors[0],
        )
        results.append(parsed_response)
    return unwrap_single_element_list(sequence=results)
infer_from_workflow
infer_from_workflow(
    workspace_name=None,
    workflow_name=None,
    specification=None,
    images=None,
    parameters=None,
    excluded_fields=None,
    use_cache=True,
    enable_profiling=False,
    workflow_version_id=None,
)

Run inference using a workflow specification.

Triggers workflow execution on the inference HTTP server. Either (workspace_name and workflow_name) or specification must be provided: in the first case the workflow definition is fetched from the Roboflow API; in the latter the given specification is used. images and parameters are merged into the workflow inputs; the distinction exists so that the SDK can easily serialise images and prepare a proper payload. Supported image formats are numpy arrays, PIL.Image objects, base64-encoded images, image URLs and local paths. excluded_fields is added to the request to filter out results of the workflow execution on the server side.

Parameters:

Name Type Description Default
workspace_name Optional[str]

Name of the workspace containing the workflow. Defaults to None.

None
workflow_name Optional[str]

Name of the workflow. Defaults to None.

None
specification Optional[dict]

Direct workflow specification. Defaults to None.

None
images Optional[Dict[str, Any]]

Images to process. Defaults to None.

None
parameters Optional[Dict[str, Any]]

Additional parameters for the workflow. Defaults to None.

None
excluded_fields Optional[List[str]]

Fields to exclude from results. Defaults to None.

None
use_cache bool

Whether to use cached results. Defaults to True.

True
enable_profiling bool

Whether to enable profiling. Defaults to False.

False

workflow_version_id Optional[str]

Version of the workflow to use. Defaults to None.

None

Returns:

Type Description
List[Dict[str, Any]]

List[Dict[str, Any]]: Results of the workflow execution.

Raises:

Type Description
InvalidParameterError

If neither workflow identifiers nor specification is provided.

HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@deprecated(
    reason="Please use run_workflow(...) method. This method will be removed end of Q2 2024"
)
@wrap_errors
def infer_from_workflow(
    self,
    workspace_name: Optional[str] = None,
    workflow_name: Optional[str] = None,
    specification: Optional[dict] = None,
    images: Optional[Dict[str, Any]] = None,
    parameters: Optional[Dict[str, Any]] = None,
    excluded_fields: Optional[List[str]] = None,
    use_cache: bool = True,
    enable_profiling: bool = False,
    workflow_version_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Run inference using a workflow specification.

    Triggers workflow execution on the inference HTTP server. Either
    (`workspace_name` and `workflow_name`) or `specification` must be
    provided: in the first case the workflow definition is fetched from
    the Roboflow API; in the latter the given `specification` is used.
    `images` and `parameters` are merged into the workflow inputs; the
    distinction exists so that the SDK can easily serialise images and
    prepare a proper payload. Supported image formats are numpy arrays,
    PIL.Image objects, base64-encoded images, image URLs and local paths.
    `excluded_fields` is added to the request to filter out results of
    the workflow execution on the server side.

    Args:
        workspace_name (Optional[str], optional): Name of the workspace containing the workflow. Defaults to None.
        workflow_name (Optional[str], optional): Name of the workflow. Defaults to None.
        specification (Optional[dict], optional): Direct workflow specification. Defaults to None.
        images (Optional[Dict[str, Any]], optional): Images to process. Defaults to None.
        parameters (Optional[Dict[str, Any]], optional): Additional parameters for the workflow. Defaults to None.
        excluded_fields (Optional[List[str]], optional): Fields to exclude from results. Defaults to None.
        use_cache (bool, optional): Whether to use cached results. Defaults to True.
        enable_profiling (bool, optional): Whether to enable profiling. Defaults to False.
        workflow_version_id (Optional[str], optional): Version of the workflow to use. Defaults to None.

    Returns:
        List[Dict[str, Any]]: Results of the workflow execution.

    Raises:
        InvalidParameterError: If neither workflow identifiers nor specification is provided.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    return self._run_workflow(
        workspace_name=workspace_name,
        workflow_id=workflow_name,
        specification=specification,
        images=images,
        parameters=parameters,
        excluded_fields=excluded_fields,
        legacy_endpoints=True,
        use_cache=use_cache,
        enable_profiling=enable_profiling,
        workflow_version_id=workflow_version_id,
    )
infer_from_yolo_world
infer_from_yolo_world(
    inference_input,
    class_names,
    model_version=None,
    confidence=None,
)

Run inference using YOLO-World model.

Parameters:

Name Type Description Default
inference_input Union[ImagesReference, List[ImagesReference]]

Input image(s) to run inference on. Can be a single image reference or a list of image references.

required
class_names List[str]

List of class names to detect in the image(s).

required
model_version Optional[str]

Optional version of YOLO-World model to use. If not specified, uses the default version.

None
confidence Optional[float]

Optional confidence threshold for detections. If not specified, uses the model's default threshold.

None

Returns:

Type Description
List[dict]

List of dictionaries containing detection results for each input image. Each dictionary contains bounding boxes, class labels, and confidence scores for detected objects.

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def infer_from_yolo_world(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    class_names: List[str],
    model_version: Optional[str] = None,
    confidence: Optional[float] = None,
) -> List[dict]:
    """Run inference using YOLO-World model.

    Args:
        inference_input: Input image(s) to run inference on. Can be a single image
            reference or a list of image references.
        class_names: List of class names to detect in the image(s).
        model_version: Optional version of YOLO-World model to use. If not specified,
            uses the default version.
        confidence: Optional confidence threshold for detections. If not specified,
            uses the model's default threshold.

    Returns:
        List of dictionaries containing detection results for each input image.
        Each dictionary contains bounding boxes, class labels, and confidence scores
        for detected objects.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    encoded_inference_inputs = load_static_inference_input(
        inference_input=inference_input,
    )
    payload = self.__initialise_payload()
    payload["text"] = class_names
    if model_version is not None:
        payload["yolo_world_version_id"] = model_version
    if confidence is not None:
        payload["confidence"] = confidence
    url = self.__wrap_url_with_api_key(f"{self.__api_url}/yolo_world/infer")
    requests_data = prepare_requests_data(
        url=url,
        encoded_inference_inputs=encoded_inference_inputs,
        headers=DEFAULT_HEADERS,
        parameters=None,
        payload=payload,
        max_batch_size=1,
        image_placement=ImagePlacement.JSON,
    )
    responses = execute_requests_packages(
        requests_data=requests_data,
        request_method=RequestMethod.POST,
        max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
    )
    return [r.json() for r in responses]
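
A minimal sketch of zero-shot detection with YOLO-World (URL, key, image path and the class list are placeholders):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")
results = client.infer_from_yolo_world(
    inference_input="image.jpg",
    class_names=["person", "dog"],
    confidence=0.1,  # illustrative threshold
)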
infer_from_yolo_world_async async
infer_from_yolo_world_async(
    inference_input,
    class_names,
    model_version=None,
    confidence=None,
)

Run inference using YOLO-World model asynchronously.

Parameters:

Name Type Description Default
inference_input Union[ImagesReference, List[ImagesReference]]

Input image(s) to run inference on. Can be a single image reference or a list of image references.

required
class_names List[str]

List of class names to detect in the image(s).

required
model_version Optional[str]

Optional version of YOLO-World model to use. If not specified, uses the default version.

None
confidence Optional[float]

Optional confidence threshold for detections. If not specified, uses the model's default threshold.

None

Returns:

Type Description
List[dict]

List of dictionaries containing detection results for each input image. Each dictionary contains bounding boxes, class labels, and confidence scores for detected objects.

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def infer_from_yolo_world_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    class_names: List[str],
    model_version: Optional[str] = None,
    confidence: Optional[float] = None,
) -> List[dict]:
    """Run inference using YOLO-World model asynchronously.

    Args:
        inference_input: Input image(s) to run inference on. Can be a single image
            reference or a list of image references.
        class_names: List of class names to detect in the image(s).
        model_version: Optional version of YOLO-World model to use. If not specified,
            uses the default version.
        confidence: Optional confidence threshold for detections. If not specified,
            uses the model's default threshold.

    Returns:
        List of dictionaries containing detection results for each input image.
        Each dictionary contains bounding boxes, class labels, and confidence scores
        for detected objects.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    encoded_inference_inputs = await load_static_inference_input_async(
        inference_input=inference_input,
    )
    payload = self.__initialise_payload()
    payload["text"] = class_names
    if model_version is not None:
        payload["yolo_world_version_id"] = model_version
    if confidence is not None:
        payload["confidence"] = confidence
    url = self.__wrap_url_with_api_key(f"{self.__api_url}/yolo_world/infer")
    requests_data = prepare_requests_data(
        url=url,
        encoded_inference_inputs=encoded_inference_inputs,
        headers=DEFAULT_HEADERS,
        parameters=None,
        payload=payload,
        max_batch_size=1,
        image_placement=ImagePlacement.JSON,
    )
    return await execute_requests_packages_async(
        requests_data=requests_data,
        request_method=RequestMethod.POST,
        max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
    )
infer_lmm
infer_lmm(
    inference_input,
    model_id,
    prompt=None,
    model_id_in_path=False,
)

Run inference using a Large Multimodal Model (LMM).

This method supports various vision-language models including Florence-2, Moondream2, SmolVLM, Qwen2.5-VL, Qwen3-VL, and PaliGemma.

Parameters:

Name Type Description Default
inference_input Union[ImagesReference, List[ImagesReference]]

Input image(s) for inference. Can be file paths, URLs, base64 strings, numpy arrays, or PIL images.

required
model_id str

The identifier of the LMM model to use. Examples include:

- "florence-2-base", "florence-2-large" for Florence-2
- "moondream2/moondream2_2b_jul24" for Moondream2
- "smolvlm2/smolvlm-2.2b-instruct" for SmolVLM
- "qwen25-vl-7b" for Qwen2.5-VL
- "qwen3vl-2b-instruct" for Qwen3-VL

required
prompt Optional[str]

Text prompt to guide the model. Defaults to None.

None
model_id_in_path bool

If True, includes model_id in the URL path (e.g., /infer/lmm/florence-2-base) which enables path-based routing. If False (default), model_id is only sent in the request body.

False

Returns:

Type Description
Union[dict, List[dict]]

Union[dict, List[dict]]: Inference results containing the model response. The structure depends on the specific model used.

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def infer_lmm(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    model_id: str,
    prompt: Optional[str] = None,
    model_id_in_path: bool = False,
) -> Union[dict, List[dict]]:
    """Run inference using a Large Multimodal Model (LMM).

    This method supports various vision-language models including Florence-2,
    Moondream2, SmolVLM, Qwen2.5-VL, Qwen3-VL, and PaliGemma.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
            for inference. Can be file paths, URLs, base64 strings, numpy arrays, or PIL images.
        model_id (str): The identifier of the LMM model to use. Examples include:
            - "florence-2-base", "florence-2-large" for Florence-2
            - "moondream2/moondream2_2b_jul24" for Moondream2
            - "smolvlm2/smolvlm-2.2b-instruct" for SmolVLM
            - "qwen25-vl-7b" for Qwen2.5-VL
            - "qwen3vl-2b-instruct" for Qwen3-VL
        prompt (Optional[str], optional): Text prompt to guide the model. Defaults to None.
        model_id_in_path (bool, optional): If True, includes model_id in the URL path
            (e.g., /infer/lmm/florence-2-base) which enables path-based routing.
            If False (default), model_id is only sent in the request body.

    Returns:
        Union[dict, List[dict]]: Inference results containing the model response.
            The structure depends on the specific model used.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    extra_payload = {"model_id": model_id}
    if prompt is not None:
        extra_payload["prompt"] = prompt

    if model_id_in_path:
        endpoint = f"/infer/lmm/{model_id}"
    else:
        endpoint = "/infer/lmm"

    result = self._post_images(
        inference_input=inference_input,
        endpoint=endpoint,
        extra_payload=extra_payload,
    )
    return result
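
A short sketch using a model ID from the list above; the prompt value is illustrative, as prompt conventions vary per model, and the remaining identifiers are placeholders:

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")
result = client.infer_lmm(
    inference_input="document.jpg",
    model_id="florence-2-base",
    prompt="<CAPTION>",  # illustrative Florence-2 task prompt
)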
infer_lmm_async async
infer_lmm_async(
    inference_input,
    model_id,
    prompt=None,
    model_id_in_path=False,
)

Run inference using a Large Multimodal Model (LMM) asynchronously.

This method supports various vision-language models including Florence-2, Moondream2, SmolVLM, Qwen2.5-VL, Qwen3-VL, and PaliGemma.

Parameters:

Name Type Description Default
inference_input Union[ImagesReference, List[ImagesReference]]

Input image(s) for inference. Can be file paths, URLs, base64 strings, numpy arrays, or PIL images.

required
model_id str

The identifier of the LMM model to use.

required
prompt Optional[str]

Text prompt to guide the model. Defaults to None.

None
model_id_in_path bool

If True, includes model_id in the URL path (e.g., /infer/lmm/florence-2-base) which enables path-based routing. If False (default), model_id is only sent in the request body.

False

Returns:

Type Description
Union[dict, List[dict]]

Union[dict, List[dict]]: Inference results containing the model response.

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def infer_lmm_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    model_id: str,
    prompt: Optional[str] = None,
    model_id_in_path: bool = False,
) -> Union[dict, List[dict]]:
    """Run inference using a Large Multimodal Model (LMM) asynchronously.

    This method supports various vision-language models including Florence-2,
    Moondream2, SmolVLM, Qwen2.5-VL, Qwen3-VL, and PaliGemma.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
            for inference. Can be file paths, URLs, base64 strings, numpy arrays, or PIL images.
        model_id (str): The identifier of the LMM model to use.
        prompt (Optional[str], optional): Text prompt to guide the model. Defaults to None.
        model_id_in_path (bool, optional): If True, includes model_id in the URL path
            (e.g., /infer/lmm/florence-2-base) which enables path-based routing.
            If False (default), model_id is only sent in the request body.

    Returns:
        Union[dict, List[dict]]: Inference results containing the model response.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    extra_payload = {"model_id": model_id}
    if prompt is not None:
        extra_payload["prompt"] = prompt

    if model_id_in_path:
        endpoint = f"/infer/lmm/{model_id}"
    else:
        endpoint = "/infer/lmm"

    result = await self._post_images_async(
        inference_input=inference_input,
        endpoint=endpoint,
        extra_payload=extra_payload,
    )
    return result
infer_on_stream
infer_on_stream(input_uri, model_id=None)

Run inference on a video stream or sequence of images.

Parameters:

Name Type Description Default
input_uri str

URI of the input stream or directory.

required
model_id Optional[str]

Model identifier to use for inference. Defaults to None.

None

Yields:

Type Description
Tuple[Union[str, int], ndarray, dict]

Generator[Tuple[Union[str, int], np.ndarray, dict], None, None]: Tuples of (frame reference, frame data, prediction).

Source code in inference_sdk/http/client.py
def infer_on_stream(
    self,
    input_uri: str,
    model_id: Optional[str] = None,
) -> Generator[Tuple[Union[str, int], np.ndarray, dict], None, None]:
    """Run inference on a video stream or sequence of images.

    Args:
        input_uri (str): URI of the input stream or directory.
        model_id (Optional[str], optional): Model identifier to use for inference. Defaults to None.

    Yields:
        Generator[Tuple[Union[str, int], np.ndarray, dict], None, None]: Tuples of (frame reference, frame data, prediction).
    """
    for reference, frame in load_stream_inference_input(
        input_uri=input_uri,
        image_extensions=self.__inference_configuration.image_extensions_for_directory_scan,
    ):
        prediction = self.infer(
            inference_input=frame,
            model_id=model_id,
        )
        yield reference, frame, prediction
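
Sketch of consuming the generator (video path and model ID are placeholders):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")
for reference, frame, prediction in client.infer_on_stream(input_uri="video.mp4", model_id="project-id/1"):
    print(reference, prediction)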
init classmethod
init(api_url, api_key=None)

Initialize a new InferenceHTTPClient instance.

Parameters:

Name Type Description Default
api_url str

The base URL for the inference API.

required
api_key Optional[str]

API key for authentication. Defaults to None.

None

Returns:

Name Type Description
InferenceHTTPClient InferenceHTTPClient

A new instance of the InferenceHTTPClient.

Source code in inference_sdk/http/client.py
@classmethod
def init(
    cls,
    api_url: str,
    api_key: Optional[str] = None,
) -> "InferenceHTTPClient":
    """Initialize a new InferenceHTTPClient instance.

    Args:
        api_url (str): The base URL for the inference API.
        api_key (Optional[str], optional): API key for authentication. Defaults to None.

    Returns:
        InferenceHTTPClient: A new instance of the InferenceHTTPClient.
    """
    return cls(api_url=api_url, api_key=api_key)
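
A minimal construction sketch (both values are placeholders):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(
    api_url="http://localhost:9001",
    api_key="YOUR_API_KEY",  # may be omitted for endpoints that do not require authentication
)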
list_inference_pipelines
list_inference_pipelines()

Lists all active inference pipelines on the server.

This method retrieves information about all currently running inference pipelines on the server, including their IDs and status.

Returns:

Type Description
List[dict]

List[dict]: A list of dictionaries containing information about each active inference pipeline.

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@experimental(
    info="Video processing in inference server is under development. Breaking changes are possible."
)
@wrap_errors
def list_inference_pipelines(self) -> List[dict]:
    """Lists all active inference pipelines on the server.

    This method retrieves information about all currently running inference pipelines
    on the server, including their IDs and status.

    Returns:
        List[dict]: A list of dictionaries containing information about each active
            inference pipeline.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    payload = {"api_key": self.__api_key}
    response = requests.get(
        f"{self.__api_url}/inference_pipelines/list",
        json=payload,
    )
    api_key_safe_raise_for_status(response=response)
    return response.json()
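
A brief sketch (URL and key are placeholders; the server must have the video-processing API enabled):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")
for pipeline in client.list_inference_pipelines():
    print(pipeline)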
list_loaded_models
list_loaded_models()

List all models currently loaded on the server.

Returns:

Name Type Description
RegisteredModels RegisteredModels

Information about registered models.

Raises:

Type Description
WrongClientModeError

If not in API v1 mode.

HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def list_loaded_models(self) -> RegisteredModels:
    """List all models currently loaded on the server.

    Returns:
        RegisteredModels: Information about registered models.

    Raises:
        WrongClientModeError: If not in API v1 mode.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    self.__ensure_v1_client_mode()
    response = requests.get(
        f"{self.__api_url}/model/registry?api_key={self.__api_key}"
    )
    response.raise_for_status()
    response_payload = response.json()
    return RegisteredModels.from_dict(response_payload)
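
A minimal sketch against a self-hosted server, since the method requires API v1 mode (URL and key are placeholders):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")
registered = client.list_loaded_models()
print(registered.models)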
list_loaded_models_async async
list_loaded_models_async()

List all models currently loaded on the server asynchronously.

Returns:

Name Type Description
RegisteredModels RegisteredModels

Information about registered models.

Raises:

Type Description
WrongClientModeError

If not in API v1 mode.

HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def list_loaded_models_async(self) -> RegisteredModels:
    """List all models currently loaded on the server asynchronously.

    Returns:
        RegisteredModels: Information about registered models.

    Raises:
        WrongClientModeError: If not in API v1 mode.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    self.__ensure_v1_client_mode()
    async with aiohttp.ClientSession() as session:
        async with session.get(
            f"{self.__api_url}/model/registry?api_key={self.__api_key}"
        ) as response:
            response.raise_for_status()
            response_payload = await response.json()
            return RegisteredModels.from_dict(response_payload)
load_model
load_model(model_id, set_as_default=False)

Load a model onto the server.

Parameters:

Name Type Description Default
model_id str

The identifier of the model to load.

required
set_as_default bool

Whether to set this model as the default. Defaults to False.

False

Returns:

Name Type Description
RegisteredModels RegisteredModels

Updated information about registered models.

Raises:

Type Description
WrongClientModeError

If not in API v1 mode.

HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def load_model(
    self, model_id: str, set_as_default: bool = False
) -> RegisteredModels:
    """Load a model onto the server.

    Args:
        model_id (str): The identifier of the model to load.
        set_as_default (bool, optional): Whether to set this model as the default. Defaults to False.

    Returns:
        RegisteredModels: Updated information about registered models.

    Raises:
        WrongClientModeError: If not in API v1 mode.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    self.__ensure_v1_client_mode()
    de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
    response = requests.post(
        f"{self.__api_url}/model/add",
        json={
            "model_id": de_aliased_model_id,
            "api_key": self.__api_key,
        },
        headers=DEFAULT_HEADERS,
    )
    response.raise_for_status()
    response_payload = response.json()
    if set_as_default:
        self.__selected_model = de_aliased_model_id
    return RegisteredModels.from_dict(response_payload)
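
A short sketch (model ID is a placeholder; a self-hosted server in API v1 mode is assumed):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")
# Load the model and make it the default, so infer(...) can be called without model_id.
client.load_model(model_id="project-id/1", set_as_default=True)
prediction = client.infer(inference_input="image.jpg")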
load_model_async async
load_model_async(model_id, set_as_default=False)

Load a model onto the server asynchronously.

Parameters:

Name Type Description Default
model_id str

The identifier of the model to load.

required
set_as_default bool

Whether to set this model as the default. Defaults to False.

False

Returns:

Name Type Description
RegisteredModels RegisteredModels

Updated information about registered models.

Raises:

Type Description
WrongClientModeError

If not in API v1 mode.

HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def load_model_async(
    self, model_id: str, set_as_default: bool = False
) -> RegisteredModels:
    """Load a model onto the server asynchronously.

    Args:
        model_id (str): The identifier of the model to load.
        set_as_default (bool, optional): Whether to set this model as the default. Defaults to False.

    Returns:
        RegisteredModels: Updated information about registered models.

    Raises:
        WrongClientModeError: If not in API v1 mode.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    self.__ensure_v1_client_mode()
    de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
    payload = {
        "model_id": de_aliased_model_id,
        "api_key": self.__api_key,
    }
    async with aiohttp.ClientSession() as session:
        async with session.post(
            f"{self.__api_url}/model/add",
            json=payload,
            headers=DEFAULT_HEADERS,
        ) as response:
            response.raise_for_status()
            response_payload = await response.json()
    if set_as_default:
        self.__selected_model = de_aliased_model_id
    return RegisteredModels.from_dict(response_payload)
ocr_image
ocr_image(
    inference_input,
    model="doctr",
    version=None,
    quantize=None,
    generate_bounding_boxes=None,
    language_codes=None,
)

Run OCR on input image(s).

Parameters:

Name Type Description Default
inference_input Union[ImagesReference, List[ImagesReference]]

Input image(s) for OCR.

required
model str

OCR model to use ('doctr' or 'trocr'). Defaults to "doctr".

'doctr'
version Optional[str]

Model version to use. Defaults to None. For trocr, supported versions are: 'trocr-small-printed', 'trocr-base-printed', 'trocr-large-printed'.

None
quantize Optional[bool]

Flag used by EasyOCR to decide which variant of the model to load. Defaults to None.

None
generate_bounding_boxes Optional[bool]

Flag supported by some models (like DocTR) that decides whether the output should include sv.Detections(...) compatible bounding boxes. For historical reasons, some old implementations flattened the detected OCR structure into text and returned only that as results. Defaults to None.

None
language_codes Optional[List[str]]

EasyOCR parameter listing the language codes the model should recognise (leave blank to use the default for the given OCR model version). Defaults to None.

None

Returns:

Type Description
Union[dict, List[dict]]

OCR results for the input image(s).

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def ocr_image(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    model: str = "doctr",
    version: Optional[str] = None,
    quantize: Optional[bool] = None,
    generate_bounding_boxes: Optional[bool] = None,
    language_codes: Optional[List[str]] = None,
) -> Union[dict, List[dict]]:
    """Run OCR on input image(s).

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for OCR.
        model (str, optional): OCR model to use ('doctr' or 'trocr'). Defaults to "doctr".
        version (Optional[str], optional): Model version to use. Defaults to None.
            For trocr, supported versions are: 'trocr-small-printed', 'trocr-base-printed', 'trocr-large-printed'.
        quantize (Optional[bool], optional): Flag used by EasyOCR to decide which variant of the
            model to load. Defaults to None.
        generate_bounding_boxes (Optional[bool], optional): Flag supported by some models (like
            DocTR) that decides whether the output should include sv.Detections(...) compatible
            bounding boxes (for historical reasons, some old implementations flattened the
            detected OCR structure into text and returned only that as results). Defaults to None.
        language_codes (Optional[List[str]], optional): EasyOCR parameter listing the language
            codes the model should recognise (leave blank to use the default for the given OCR
            model version). Defaults to None.

    Returns:
        Union[dict, List[dict]]: OCR results for the input image(s).

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    encoded_inference_inputs = load_static_inference_input(
        inference_input=inference_input,
    )
    payload = self.__initialise_payload()
    if version:
        key = f"{model.lower()}_version_id"
        payload[key] = version
    if quantize is not None:
        payload["quantize"] = quantize
    if generate_bounding_boxes is not None:
        payload["generate_bounding_boxes"] = generate_bounding_boxes
    if language_codes is not None:
        payload["language_codes"] = language_codes
    model_path = resolve_ocr_path(model_name=model)
    url = self.__wrap_url_with_api_key(f"{self.__api_url}{model_path}")
    requests_data = prepare_requests_data(
        url=url,
        encoded_inference_inputs=encoded_inference_inputs,
        headers=DEFAULT_HEADERS,
        parameters=None,
        payload=payload,
        max_batch_size=1,
        image_placement=ImagePlacement.JSON,
    )
    responses = execute_requests_packages(
        requests_data=requests_data,
        request_method=RequestMethod.POST,
        max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
    )
    results = [r.json() for r in responses]
    return unwrap_single_element_list(sequence=results)
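
A minimal sketch (paths are placeholders; the trocr version is taken from the supported list above):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")
doctr_result = client.ocr_image(inference_input="receipt.jpg")  # DocTR is the default model
trocr_result = client.ocr_image(
    inference_input="receipt.jpg",
    model="trocr",
    version="trocr-base-printed",
)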
ocr_image_async async
ocr_image_async(
    inference_input,
    model="doctr",
    version=None,
    quantize=None,
    generate_bounding_boxes=None,
    language_codes=None,
)

Run OCR on input image(s) asynchronously.

Parameters:

Name Type Description Default
inference_input Union[ImagesReference, List[ImagesReference]]

Input image(s) for OCR.

required
model str

OCR model to use ('doctr' or 'trocr'). Defaults to "doctr".

'doctr'
version Optional[str]

Model version to use. Defaults to None. For trocr, supported versions are: 'trocr-small-printed', 'trocr-base-printed', 'trocr-large-printed'.

None
quantize Optional[bool]

Flag used by EasyOCR to decide which variant of the model to load. Defaults to None.

None
generate_bounding_boxes Optional[bool]

Flag supported by some models (like DocTR) that decides whether the output should include sv.Detections(...) compatible bounding boxes. For historical reasons, some old implementations flattened the detected OCR structure into text and returned only that as results. Defaults to None.

None
language_codes Optional[List[str]]

EasyOCR parameter listing the language codes the model should recognise (leave blank to use the default for the given OCR model version). Defaults to None.

None

Returns:

Type Description
Union[dict, List[dict]]

OCR results for the input image(s).

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def ocr_image_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    model: str = "doctr",
    version: Optional[str] = None,
    quantize: Optional[bool] = None,
    generate_bounding_boxes: Optional[bool] = None,
    language_codes: Optional[List[str]] = None,
) -> Union[dict, List[dict]]:
    """Run OCR on input image(s) asynchronously.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for OCR.
        model (str, optional): OCR model to use ('doctr' or 'trocr'). Defaults to "doctr".
        version (Optional[str], optional): Model version to use. Defaults to None.
            For trocr, supported versions are: 'trocr-small-printed', 'trocr-base-printed', 'trocr-large-printed'.
        quantize (Optional[bool], optional): Flag used by EasyOCR to decide which variant of the
            model to load. Defaults to None.
        generate_bounding_boxes (Optional[bool], optional): Flag supported by some models (like
            DocTR) that decides whether the output should include sv.Detections(...) compatible
            bounding boxes (for historical reasons, some old implementations flattened the
            detected OCR structure into text and returned only that as results). Defaults to None.
        language_codes (Optional[List[str]], optional): EasyOCR parameter listing the language
            codes the model should recognise (leave blank to use the default for the given OCR
            model version). Defaults to None.

    Returns:
        Union[dict, List[dict]]: OCR results for the input image(s).

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    encoded_inference_inputs = await load_static_inference_input_async(
        inference_input=inference_input,
    )
    payload = self.__initialise_payload()
    if version:
        key = f"{model.lower()}_version_id"
        payload[key] = version
    if quantize is not None:
        payload["quantize"] = quantize
    if generate_bounding_boxes is not None:
        payload["generate_bounding_boxes"] = generate_bounding_boxes
    if language_codes is not None:
        payload["language_codes"] = language_codes
    model_path = resolve_ocr_path(model_name=model)
    url = self.__wrap_url_with_api_key(f"{self.__api_url}{model_path}")
    requests_data = prepare_requests_data(
        url=url,
        encoded_inference_inputs=encoded_inference_inputs,
        headers=DEFAULT_HEADERS,
        parameters=None,
        payload=payload,
        max_batch_size=1,
        image_placement=ImagePlacement.JSON,
    )
    responses = await execute_requests_packages_async(
        requests_data=requests_data,
        request_method=RequestMethod.POST,
        max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
    )
    return unwrap_single_element_list(sequence=responses)
pause_inference_pipeline
pause_inference_pipeline(pipeline_id)

Pauses a running inference pipeline.

Sends a request to pause the specified inference pipeline. The pipeline must be currently running for this operation to succeed.

Parameters:

Name Type Description Default
pipeline_id str

The unique identifier of the inference pipeline to pause.

required

Returns:

Name Type Description
dict dict

A dictionary containing the response from the server about the pause operation.

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

ValueError

If pipeline_id is empty or None.

Source code in inference_sdk/http/client.py
@experimental(
    info="Video processing in inference server is under development. Breaking changes are possible."
)
@wrap_errors
def pause_inference_pipeline(self, pipeline_id: str) -> dict:
    """Pauses a running inference pipeline.

    Sends a request to pause the specified inference pipeline. The pipeline must be
    currently running for this operation to succeed.

    Args:
        pipeline_id: The unique identifier of the inference pipeline to pause.

    Returns:
        dict: A dictionary containing the response from the server about the pause operation.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
        ValueError: If pipeline_id is empty or None.
    """
    self._ensure_pipeline_id_not_empty(pipeline_id=pipeline_id)
    payload = {"api_key": self.__api_key}
    response = requests.post(
        f"{self.__api_url}/inference_pipelines/{pipeline_id}/pause",
        json=payload,
    )
    api_key_safe_raise_for_status(response=response)
    return response.json()
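
A short sketch covering both this method and resume_inference_pipeline documented below (the pipeline ID is a placeholder):

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient.init(api_url="http://localhost:9001", api_key="YOUR_API_KEY")
client.pause_inference_pipeline(pipeline_id="my-pipeline-id")
# ... later, resume processing:
client.resume_inference_pipeline(pipeline_id="my-pipeline-id")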
resume_inference_pipeline
resume_inference_pipeline(pipeline_id)

Resumes a paused inference pipeline.

Sends a request to resume the specified inference pipeline. The pipeline must be currently paused for this operation to succeed.

Parameters:

Name Type Description Default
pipeline_id str

The unique identifier of the inference pipeline to resume.

required

Returns:

Name Type Description
dict dict

A dictionary containing the response from the server about the resume operation.

Raises:

Type Description
HTTPCallErrorError

If there is an error in the HTTP call.

HTTPClientError

If there is an error with the server connection.

ValueError

If pipeline_id is empty or None.

Source code in inference_sdk/http/client.py
@experimental(
    info="Video processing in inference server is under development. Breaking changes are possible."
)
@wrap_errors
def resume_inference_pipeline(self, pipeline_id: str) -> dict:
    """Resumes a paused inference pipeline.

    Sends a request to resume the specified inference pipeline. The pipeline must be
    currently paused for this operation to succeed.

    Args:
        pipeline_id: The unique identifier of the inference pipeline to resume.

    Returns:
        dict: A dictionary containing the response from the server about the resume operation.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
        ValueError: If pipeline_id is empty or None.
    """
    self._ensure_pipeline_id_not_empty(pipeline_id=pipeline_id)
    payload = {"api_key": self.__api_key}
    response = requests.post(
        f"{self.__api_url}/inference_pipelines/{pipeline_id}/resume",
        json=payload,
    )
    api_key_safe_raise_for_status(response=response)
    return response.json()
run_workflow
run_workflow(
    workspace_name=None,
    workflow_id=None,
    specification=None,
    images=None,
    parameters=None,
    excluded_fields=None,
    use_cache=True,
    enable_profiling=False,
    workflow_version_id=None,
)

Run inference using a workflow specification.

Triggers workflow execution on the inference HTTP server. Either (workspace_name and workflow_id) or specification must be provided: in the first case the workflow definition is fetched from the Roboflow API; in the latter the given specification is used. images and parameters are merged into the workflow inputs; the distinction exists so that the SDK can easily serialise images and prepare a proper payload. Supported image formats are numpy arrays, PIL.Image objects, base64-encoded images, image URLs and local paths. excluded_fields is added to the request to filter out results of the workflow execution on the server side.

Important! Method is not compatible with inference server <=0.9.18. Please migrate to newer version of the server before end of Q2 2024. Until that is done - use old method: infer_from_workflow(...).

Note

Method is not compatible with inference server <=0.9.18. Please migrate to newer version of the server before end of Q2 2024. Until that is done - use old method: infer_from_workflow(...).

Parameters:

    workspace_name (Optional[str]): Name of the workspace containing the workflow. Defaults to None.
    workflow_id (Optional[str]): ID of the workflow. Defaults to None.
    specification (Optional[dict]): Direct workflow specification. Defaults to None.
    images (Optional[Dict[str, Any]]): Images to process. Defaults to None.
    parameters (Optional[Dict[str, Any]]): Additional parameters for the workflow. Defaults to None.
    excluded_fields (Optional[List[str]]): Fields to exclude from results. Defaults to None.
    use_cache (bool): Whether to use cached results. Defaults to True.
    enable_profiling (bool): Whether to enable profiling. Defaults to False.
    workflow_version_id (Optional[str]): Version identifier of the workflow to use. Defaults to None.

Returns:

    List[Dict[str, Any]]: Results of the workflow execution.

Raises:

    InvalidParameterError: If neither workflow identifiers nor specification is provided.
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def run_workflow(
    self,
    workspace_name: Optional[str] = None,
    workflow_id: Optional[str] = None,
    specification: Optional[dict] = None,
    images: Optional[Dict[str, Any]] = None,
    parameters: Optional[Dict[str, Any]] = None,
    excluded_fields: Optional[List[str]] = None,
    use_cache: bool = True,
    enable_profiling: bool = False,
    workflow_version_id: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Run inference using a workflow specification.

    Triggers inference from workflow specification at the inference HTTP
    side. Either (`workspace_name` and `workflow_id`) or `workflow_specification` must be
    provided. In the first case - definition of workflow will be fetched
    from Roboflow API, in the latter - `workflow_specification` will be
    used. `images` and `parameters` will be merged into workflow inputs,
    the distinction is made to make sure the SDK can easily serialise
    images and prepare a proper payload. Supported images are numpy arrays,
    PIL.Image and base64 images, links to images and local paths.
    `excluded_fields` will be added to request to filter out results
    of workflow execution at the server side.

    **Important!**
    Method is not compatible with inference server <=0.9.18. Please migrate to newer version of
    the server before end of Q2 2024. Until that is done - use old method: infer_from_workflow(...).

    Note:
        Method is not compatible with inference server <=0.9.18. Please migrate to newer version of
        the server before end of Q2 2024. Until that is done - use old method: infer_from_workflow(...).

    Args:
        workspace_name (Optional[str], optional): Name of the workspace containing the workflow. Defaults to None.
        workflow_id (Optional[str], optional): ID of the workflow. Defaults to None.
        specification (Optional[dict], optional): Direct workflow specification. Defaults to None.
        images (Optional[Dict[str, Any]], optional): Images to process. Defaults to None.
        parameters (Optional[Dict[str, Any]], optional): Additional parameters for the workflow. Defaults to None.
        excluded_fields (Optional[List[str]], optional): Fields to exclude from results. Defaults to None.
        use_cache (bool, optional): Whether to use cached results. Defaults to True.
        enable_profiling (bool, optional): Whether to enable profiling. Defaults to False.

    Returns:
        List[Dict[str, Any]]: Results of the workflow execution.

    Raises:
        InvalidParameterError: If neither workflow identifiers nor specification is provided.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    return self._run_workflow(
        workspace_name=workspace_name,
        workflow_id=workflow_id,
        specification=specification,
        images=images,
        parameters=parameters,
        excluded_fields=excluded_fields,
        legacy_endpoints=False,
        use_cache=use_cache,
        enable_profiling=enable_profiling,
        workflow_version_id=workflow_version_id,
    )
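A minimal sketch of calling this method against a hypothetical workspace and workflow; the ids, image key and image path are placeholders and must match the inputs declared by the workflow.

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",  # or a self-hosted server
    api_key="<ROBOFLOW_API_KEY>",
)

# "image" must match the name of an image input declared by the workflow
results = client.run_workflow(
    workspace_name="<WORKSPACE>",
    workflow_id="<WORKFLOW_ID>",
    images={"image": "path/to/image.jpg"},
    parameters={"confidence": 0.4},  # hypothetical workflow parameter
)
print(results[0].keys())  # one result dict per input image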
sam2_segment_image
sam2_segment_image(
    inference_input,
    prompts=None,
    sam2_version_id="hiera_tiny",
    multimask_output=True,
    mask_input_format="json",
)

Run Segment Anything 2 (SAM2) segmentation on input image(s).

This method performs instance segmentation using SAM2, which can segment objects based on point or box prompts.

Parameters:

    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for segmentation. Can be file paths, URLs, base64 strings, numpy arrays, or PIL images. Required.
    prompts (Optional[List[dict]]): List of prompt dictionaries. Each prompt can contain:
        - "box": {"x": float, "y": float, "width": float, "height": float}
        - "points": [{"x": float, "y": float, "positive": bool}, ...]
        Defaults to None (automatic segmentation).
    sam2_version_id (str): Version of SAM2 model to use. Options are "hiera_large", "hiera_small", "hiera_tiny", "hiera_b_plus". Defaults to "hiera_tiny".
    multimask_output (bool): Whether to output multiple masks per prompt. Defaults to True.
    mask_input_format (str): Format for mask output. Defaults to "json".

Returns:

    Union[dict, List[dict]]: Segmentation results containing predictions with masks, confidence scores, and bounding boxes.

Raises:

    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def sam2_segment_image(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    prompts: Optional[List[dict]] = None,
    sam2_version_id: str = "hiera_tiny",
    multimask_output: bool = True,
    mask_input_format: str = "json",
) -> Union[dict, List[dict]]:
    """Run Segment Anything 2 (SAM2) segmentation on input image(s).

    This method performs instance segmentation using SAM2, which can segment
    objects based on point or box prompts.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
            for segmentation. Can be file paths, URLs, base64 strings, numpy arrays,
            or PIL images.
        prompts (Optional[List[dict]], optional): List of prompt dictionaries. Each prompt
            can contain:
            - "box": {"x": float, "y": float, "width": float, "height": float}
            - "points": [{"x": float, "y": float, "positive": bool}, ...]
            Defaults to None (automatic segmentation).
        sam2_version_id (str, optional): Version of SAM2 model to use. Options are
            "hiera_large", "hiera_small", "hiera_tiny", "hiera_b_plus".
            Defaults to "hiera_tiny".
        multimask_output (bool, optional): Whether to output multiple masks per prompt.
            Defaults to True.
        mask_input_format (str, optional): Format for mask output. Defaults to "json".

    Returns:
        Union[dict, List[dict]]: Segmentation results containing predictions with masks,
            confidence scores, and bounding boxes.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    extra_payload = {
        "sam2_version_id": sam2_version_id,
        "multimask_output": multimask_output,
        "format": mask_input_format,
    }
    if prompts is not None:
        extra_payload["prompts"] = {"prompts": prompts}
    result = self._post_images(
        inference_input=inference_input,
        endpoint="/sam2/segment_image",
        extra_payload=extra_payload,
    )
    return result
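A sketch of prompting SAM2 with a single box plus a positive point, assuming a self-hosted server with SAM2 enabled; the coordinates and image path are illustrative only.

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<ROBOFLOW_API_KEY>")

prompts = [
    {
        "box": {"x": 320.0, "y": 240.0, "width": 100.0, "height": 80.0},
        "points": [{"x": 320.0, "y": 240.0, "positive": True}],
    }
]
result = client.sam2_segment_image(
    inference_input="path/to/image.jpg",
    prompts=prompts,
    sam2_version_id="hiera_small",
)
print(result)  # predictions with masks, confidences and boxes, per the Returns section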
sam2_segment_image_async async
sam2_segment_image_async(
    inference_input,
    prompts=None,
    sam2_version_id="hiera_tiny",
    multimask_output=True,
    mask_input_format="json",
)

Run Segment Anything 2 (SAM2) segmentation on input image(s) asynchronously.

Parameters:

    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for segmentation. Required.
    prompts (Optional[List[dict]]): List of prompt dictionaries. Defaults to None.
    sam2_version_id (str): Version of SAM2 model. Defaults to "hiera_tiny".
    multimask_output (bool): Whether to output multiple masks. Defaults to True.
    mask_input_format (str): Format for mask output. Defaults to "json".

Returns:

    Union[dict, List[dict]]: Segmentation results.

Raises:

    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def sam2_segment_image_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    prompts: Optional[List[dict]] = None,
    sam2_version_id: str = "hiera_tiny",
    multimask_output: bool = True,
    mask_input_format: str = "json",
) -> Union[dict, List[dict]]:
    """Run Segment Anything 2 (SAM2) segmentation on input image(s) asynchronously.

    Args:
        inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s)
            for segmentation.
        prompts (Optional[List[dict]], optional): List of prompt dictionaries.
            Defaults to None.
        sam2_version_id (str, optional): Version of SAM2 model. Defaults to "hiera_tiny".
        multimask_output (bool, optional): Whether to output multiple masks. Defaults to True.
        mask_input_format (str, optional): Format for mask output. Defaults to "json".

    Returns:
        Union[dict, List[dict]]: Segmentation results.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    extra_payload = {
        "sam2_version_id": sam2_version_id,
        "multimask_output": multimask_output,
        "format": mask_input_format,
    }
    if prompts is not None:
        extra_payload["prompts"] = {"prompts": prompts}
    result = await self._post_images_async(
        inference_input=inference_input,
        endpoint="/sam2/segment_image",
        extra_payload=extra_payload,
    )
    return result
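The async variant mirrors the synchronous call; a sketch driving it with asyncio, with placeholder paths and coordinates:

import asyncio

from inference_sdk import InferenceHTTPClient

async def main() -> None:
    client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<ROBOFLOW_API_KEY>")
    result = await client.sam2_segment_image_async(
        inference_input="path/to/image.jpg",
        prompts=[{"points": [{"x": 100.0, "y": 150.0, "positive": True}]}],
    )
    print(result)

asyncio.run(main())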
sam3_3d_infer
sam3_3d_infer(
    inference_input,
    mask_input,
    model_id="sam3-3d-objects",
    *,
    output_meshes=True,
    output_scene=True,
    with_mesh_postprocess=True,
    with_texture_baking=True,
    use_distillations=False
)

Generate 3D meshes and Gaussian splatting from a 2D image with mask prompts.

This method uses SAM3 3D to generate 3D representations from 2D images with mask prompts.

Parameters:

    inference_input (ImagesReference): Input image for 3D generation. Can be a file path, URL, base64 string, numpy array, or PIL image. Required.
    mask_input (Any): Mask input in any supported format:
        - Polygon coordinates: [x1, y1, x2, y2, ...]
        - Binary mask (as numpy array or base64)
        - RLE dictionary
        - List of any of the above for multiple masks
        Required.
    model_id (str): The SAM3 3D model to use. Defaults to "sam3-3d-objects".
    output_meshes (bool): SAM3 3D always outputs object gaussians, and can optionally output object meshes if output_meshes is True. Defaults to True.
    output_scene (bool): Output the combined scene reconstruction in addition to individual object reconstructions. Defaults to True.
    with_mesh_postprocess (bool): Enable mesh postprocessing. Defaults to True.
    with_texture_baking (bool): Enable texture baking for meshes. Defaults to True.
    use_distillations (bool): Use the distilled versions of the model components. Defaults to False.

Returns:

    dict: Response containing base64-encoded 3D outputs:
        - mesh_glb: Scene mesh in GLB format (base64 encoded) if output_meshes=True, otherwise None.
        - gaussian_ply: Combined Gaussian splatting in PLY format (base64 encoded)
        - objects: List of individual objects, each containing:
            - mesh_glb: Object mesh (base64) if output_scene=True and output_meshes=True, otherwise None.
            - gaussian_ply: Object Gaussian (base64) if output_scene=True, otherwise None.
            - metadata: {"rotation": [...], "translation": [...], "scale": [...]}
        - time: Inference time in seconds

Raises:

    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def sam3_3d_infer(
    self,
    inference_input: ImagesReference,
    mask_input: Any,
    model_id: str = "sam3-3d-objects",
    *,
    output_meshes: bool = True,
    output_scene: bool = True,
    with_mesh_postprocess: bool = True,
    with_texture_baking: bool = True,
    use_distillations: bool = False,
) -> dict:
    """Generate 3D meshes and Gaussian splatting from a 2D image with mask prompts.

    This method uses SAM3 3D to generate 3D representations from 2D images
    with mask prompts.

    Args:
        inference_input (ImagesReference): Input image for 3D generation.
            Can be a file path, URL, base64 string, numpy array, or PIL image.
        mask_input (Any): Mask input in any supported format:
            - Polygon coordinates: [x1, y1, x2, y2, ...]
            - Binary mask (as numpy array or base64)
            - RLE dictionary
            - List of any of the above for multiple masks
        model_id (str, optional): The SAM3 3D model to use. Defaults to "sam3-3d-objects".
        output_meshes (bool, optional): SAM3 3D always outputs object gaussians, and can
            optionally output object meshes if output_meshes is True. Defaults to True.
        output_scene (bool, optional): Output the combined scene reconstruction in
            addition to individual object reconstructions. Defaults to True.
        with_mesh_postprocess (bool, optional): Enable mesh postprocessing. Defaults to True.
        with_texture_baking (bool, optional): Enable texture baking for meshes. Defaults to True.
        use_distillations (bool, optional): Use the distilled versions of the model components.

    Returns:
        dict: Response containing base64-encoded 3D outputs:
            - mesh_glb: Scene mesh in GLB format (base64 encoded) if output_meshes=True, otherwise None.
            - gaussian_ply: Combined Gaussian splatting in PLY format (base64 encoded)
            - objects: List of individual objects, each containing:
                - mesh_glb: Object mesh (base64) if output_scene=True and output_meshes=True, otherwise None.
                - gaussian_ply: Object Gaussian (base64) if output_scene=True, otherwise None.
                - metadata: {"rotation": [...], "translation": [...], "scale": [...]}
            - time: Inference time in seconds

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    encoded_inference_inputs = load_static_inference_input(
        inference_input=inference_input,
    )
    payload = self.__initialise_payload()
    payload["model_id"] = model_id
    payload["mask_input"] = mask_input
    payload["output_meshes"] = output_meshes
    payload["output_scene"] = output_scene
    payload["with_mesh_postprocess"] = with_mesh_postprocess
    payload["with_texture_baking"] = with_texture_baking
    payload["use_distillations"] = use_distillations

    url = self.__wrap_url_with_api_key(f"{self.__api_url}/sam3_3d/infer")
    requests_data = prepare_requests_data(
        url=url,
        encoded_inference_inputs=encoded_inference_inputs,
        headers=DEFAULT_HEADERS,
        parameters=None,
        payload=payload,
        max_batch_size=1,
        image_placement=ImagePlacement.JSON,
    )
    responses = execute_requests_packages(
        requests_data=requests_data,
        request_method=RequestMethod.POST,
        max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
    )
    return responses[0].json()
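A sketch of running SAM3 3D with a single flat polygon mask prompt and saving the returned scene mesh; the coordinates, file names and server URL are placeholders, and the response keys follow the Returns description above.

import base64

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<ROBOFLOW_API_KEY>")

# Single object prompted with a flat polygon: [x1, y1, x2, y2, ...]
response = client.sam3_3d_infer(
    inference_input="path/to/image.jpg",
    mask_input=[120.0, 80.0, 260.0, 80.0, 260.0, 220.0, 120.0, 220.0],
)

if response.get("mesh_glb"):  # present when output_meshes=True
    with open("scene.glb", "wb") as f:
        f.write(base64.b64decode(response["mesh_glb"]))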
sam3_3d_infer_async async
sam3_3d_infer_async(
    inference_input,
    mask_input,
    model_id="sam3-3d-objects",
    *,
    output_meshes=True,
    output_scene=True,
    with_mesh_postprocess=True,
    with_texture_baking=True,
    use_distillations=False
)

Generate 3D meshes and Gaussian splatting from a 2D image asynchronously.

Parameters:

    inference_input (ImagesReference): Input image for 3D generation. Required.
    mask_input (Any): Mask input in any supported format. Required.
    model_id (str): The SAM3 3D model to use. Defaults to "sam3-3d-objects".
    output_meshes (bool): SAM3 3D always outputs object gaussians, and can optionally output object meshes if output_meshes is True. Defaults to True.
    output_scene (bool): Output the combined scene reconstruction in addition to individual object reconstructions. Defaults to True.
    with_mesh_postprocess (bool): Enable mesh postprocessing. Defaults to True.
    with_texture_baking (bool): Enable texture baking for meshes. Defaults to True.
    use_distillations (bool): Use the distilled versions of the model components. Defaults to False.

Returns:

    dict: Response containing base64-encoded 3D outputs:
        - mesh_glb: Scene mesh in GLB format (base64 encoded) if output_meshes=True, otherwise None.
        - gaussian_ply: Combined Gaussian splatting in PLY format (base64 encoded)
        - objects: List of individual objects, each containing:
            - mesh_glb: Object mesh (base64) if output_scene=True and output_meshes=True, otherwise None.
            - gaussian_ply: Object Gaussian (base64) if output_scene=True, otherwise None.
            - metadata: {"rotation": [...], "translation": [...], "scale": [...]}
        - time: Inference time in seconds

Raises:

    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def sam3_3d_infer_async(
    self,
    inference_input: ImagesReference,
    mask_input: Any,
    model_id: str = "sam3-3d-objects",
    *,
    output_meshes: bool = True,
    output_scene: bool = True,
    with_mesh_postprocess: bool = True,
    with_texture_baking: bool = True,
    use_distillations: bool = False,
) -> dict:
    """Generate 3D meshes and Gaussian splatting from a 2D image asynchronously.

    Args:
        inference_input (ImagesReference): Input image for 3D generation.
        mask_input (Any): Mask input in any supported format.
        model_id (str, optional): The SAM3 3D model to use. Defaults to "sam3-3d-objects".
        output_meshes (bool, optional): SAM3 3D always outputs object gaussians, and can
            optionally output object meshes if output_meshes is True. Defaults to True.
        output_scene (bool, optional): Output the combined scene reconstruction in
            addition to individual object reconstructions. Defaults to True.
        with_mesh_postprocess (bool, optional): Enable mesh postprocessing. Defaults to True.
        with_texture_baking (bool, optional): Enable texture baking for meshes. Defaults to True.
        use_distillations (bool, optional): Use the distilled versions of the model components.

    Returns:
        dict: Response containing base64-encoded 3D outputs:
            - mesh_glb: Scene mesh in GLB format (base64 encoded) if output_meshes=True, otherwise None.
            - gaussian_ply: Combined Gaussian splatting in PLY format (base64 encoded)
            - objects: List of individual objects, each containing:
                - mesh_glb: Object mesh (base64) if output_scene=True and output_meshes=True, otherwise None.
                - gaussian_ply: Object Gaussian (base64) if output_scene=True, otherwise None.
                - metadata: {"rotation": [...], "translation": [...], "scale": [...]}
            - time: Inference time in seconds

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    encoded_inference_inputs = await load_static_inference_input_async(
        inference_input=inference_input,
    )
    payload = self.__initialise_payload()
    payload["model_id"] = model_id
    payload["mask_input"] = mask_input
    payload["output_meshes"] = output_meshes
    payload["output_scene"] = output_scene
    payload["with_mesh_postprocess"] = with_mesh_postprocess
    payload["with_texture_baking"] = with_texture_baking
    payload["use_distillations"] = use_distillations

    url = self.__wrap_url_with_api_key(f"{self.__api_url}/sam3_3d/infer")
    requests_data = prepare_requests_data(
        url=url,
        encoded_inference_inputs=encoded_inference_inputs,
        headers=DEFAULT_HEADERS,
        parameters=None,
        payload=payload,
        max_batch_size=1,
        image_placement=ImagePlacement.JSON,
    )
    responses = await execute_requests_packages_async(
        requests_data=requests_data,
        request_method=RequestMethod.POST,
        max_concurrent_requests=self.__inference_configuration.max_concurrent_requests,
    )
    return responses[0]
sam3_concept_segment
sam3_concept_segment(
    inference_input,
    prompts,
    model_id="sam3/sam3_final",
    output_prob_thresh=0.5,
    nms_iou_threshold=None,
    format="polygon",
)

Run SAM3 promptable concept segmentation (PCS) on input image(s).

Performs zero-shot instance segmentation using text or visual prompts.

Parameters:

    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for segmentation. Required.
    prompts (List[dict]): List of prompt dicts, each with keys like "type", "text", "output_prob_thresh", "boxes", "box_labels". Required.
    model_id (str): SAM3 model to use. Defaults to "sam3/sam3_final".
    output_prob_thresh (float): Global confidence threshold. Defaults to 0.5.
    nms_iou_threshold (Optional[float]): IoU threshold for cross-prompt NMS. None disables NMS. Defaults to None.
    format (str): Output mask format, "polygon" or "rle". Defaults to "polygon".

Returns:

    Union[dict, List[dict]]: Segmentation results with prompt_results containing predictions.

Source code in inference_sdk/http/client.py
@wrap_errors
def sam3_concept_segment(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    prompts: List[dict],
    model_id: str = "sam3/sam3_final",
    output_prob_thresh: float = 0.5,
    nms_iou_threshold: Optional[float] = None,
    format: str = "polygon",
) -> Union[dict, List[dict]]:
    """Run SAM3 promptable concept segmentation (PCS) on input image(s).

    Performs zero-shot instance segmentation using text or visual prompts.

    Args:
        inference_input: Input image(s) for segmentation.
        prompts: List of prompt dicts, each with keys like "type", "text",
            "output_prob_thresh", "boxes", "box_labels".
        model_id: SAM3 model to use. Defaults to "sam3/sam3_final".
        output_prob_thresh: Global confidence threshold. Defaults to 0.5.
        nms_iou_threshold: IoU threshold for cross-prompt NMS. None disables NMS.
        format: Output mask format, "polygon" or "rle". Defaults to "polygon".

    Returns:
        Segmentation results with prompt_results containing predictions.
    """
    extra_payload = {
        "model_id": model_id,
        "prompts": prompts,
        "output_prob_thresh": output_prob_thresh,
        "format": format,
    }
    if nms_iou_threshold is not None:
        extra_payload["nms_iou_threshold"] = nms_iou_threshold
    return self._post_images(
        inference_input=inference_input,
        endpoint="/sam3/concept_segment",
        extra_payload=extra_payload,
    )
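A sketch of zero-shot segmentation with a single text prompt. The prompt schema follows the keys listed above; the prompt text, threshold and paths are illustrative assumptions.

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<ROBOFLOW_API_KEY>")

result = client.sam3_concept_segment(
    inference_input="path/to/image.jpg",
    prompts=[{"type": "text", "text": "red car"}],  # hypothetical text prompt
    output_prob_thresh=0.6,
    format="polygon",
)
print(result["prompt_results"] if isinstance(result, dict) else result)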
sam3_concept_segment_async async
sam3_concept_segment_async(
    inference_input,
    prompts,
    model_id="sam3/sam3_final",
    output_prob_thresh=0.5,
    nms_iou_threshold=None,
    format="polygon",
)

Run SAM3 promptable concept segmentation (PCS) asynchronously.

Parameters:

    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for segmentation. Required.
    prompts (List[dict]): List of prompt dicts. Required.
    model_id (str): SAM3 model to use. Defaults to "sam3/sam3_final".
    output_prob_thresh (float): Global confidence threshold. Defaults to 0.5.
    nms_iou_threshold (Optional[float]): IoU threshold for cross-prompt NMS. None disables NMS. Defaults to None.
    format (str): Output mask format, "polygon" or "rle". Defaults to "polygon".

Returns:

    Union[dict, List[dict]]: Segmentation results with prompt_results containing predictions.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def sam3_concept_segment_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    prompts: List[dict],
    model_id: str = "sam3/sam3_final",
    output_prob_thresh: float = 0.5,
    nms_iou_threshold: Optional[float] = None,
    format: str = "polygon",
) -> Union[dict, List[dict]]:
    """Run SAM3 promptable concept segmentation (PCS) asynchronously.

    Args:
        inference_input: Input image(s) for segmentation.
        prompts: List of prompt dicts.
        model_id: SAM3 model to use. Defaults to "sam3/sam3_final".
        output_prob_thresh: Global confidence threshold. Defaults to 0.5.
        nms_iou_threshold: IoU threshold for cross-prompt NMS. None disables NMS.
        format: Output mask format, "polygon" or "rle". Defaults to "polygon".

    Returns:
        Segmentation results with prompt_results containing predictions.
    """
    extra_payload = {
        "model_id": model_id,
        "prompts": prompts,
        "output_prob_thresh": output_prob_thresh,
        "format": format,
    }
    if nms_iou_threshold is not None:
        extra_payload["nms_iou_threshold"] = nms_iou_threshold
    return await self._post_images_async(
        inference_input=inference_input,
        endpoint="/sam3/concept_segment",
        extra_payload=extra_payload,
    )
sam3_embed_image
sam3_embed_image(inference_input, image_id=None)

Generate SAM3 image embeddings.

Parameters:

    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) to embed. Required.
    image_id (Optional[str]): Optional cache ID for embeddings. Defaults to None.

Returns:

    Union[dict, List[dict]]: Embedding results with image_id and processing time.

Source code in inference_sdk/http/client.py
@wrap_errors
def sam3_embed_image(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    image_id: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Generate SAM3 image embeddings.

    Args:
        inference_input: Input image(s) to embed.
        image_id: Optional cache ID for embeddings. Defaults to None.

    Returns:
        Embedding results with image_id and processing time.
    """
    extra_payload = {}
    if image_id is not None:
        extra_payload["image_id"] = image_id
    return self._post_images(
        inference_input=inference_input,
        endpoint="/sam3/embed_image",
        extra_payload=extra_payload if extra_payload else None,
    )
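A sketch of embedding an image under a caller-chosen cache id, presumably so later calls can reuse the server-side embedding; the id and path are placeholders.

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<ROBOFLOW_API_KEY>")

embedding = client.sam3_embed_image(
    inference_input="path/to/image.jpg",
    image_id="frame-0001",  # arbitrary cache key chosen by the caller
)
print(embedding)  # includes image_id and processing time, per the Returns section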
sam3_embed_image_async async
sam3_embed_image_async(inference_input, image_id=None)

Generate SAM3 image embeddings asynchronously.

Parameters:

    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) to embed. Required.
    image_id (Optional[str]): Optional cache ID for embeddings. Defaults to None.

Returns:

    Union[dict, List[dict]]: Embedding results with image_id and processing time.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def sam3_embed_image_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    image_id: Optional[str] = None,
) -> Union[dict, List[dict]]:
    """Generate SAM3 image embeddings asynchronously.

    Args:
        inference_input: Input image(s) to embed.
        image_id: Optional cache ID for embeddings. Defaults to None.

    Returns:
        Embedding results with image_id and processing time.
    """
    extra_payload = {}
    if image_id is not None:
        extra_payload["image_id"] = image_id
    return await self._post_images_async(
        inference_input=inference_input,
        endpoint="/sam3/embed_image",
        extra_payload=extra_payload if extra_payload else None,
    )
sam3_visual_segment
sam3_visual_segment(
    inference_input,
    prompts=None,
    multimask_output=True,
    mask_input_format="json",
)

Run SAM3 promptable visual segmentation (PVS) on input image(s).

Performs instance segmentation using point or box prompts.

Parameters:

    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for segmentation. Required.
    prompts (Optional[List[dict]]): List of prompt dicts with "box" and/or "points" keys. Defaults to None (automatic segmentation).
    multimask_output (bool): Whether to output multiple masks per prompt. Defaults to True.
    mask_input_format (str): Format for mask output. Defaults to "json".

Returns:

    Union[dict, List[dict]]: Segmentation results containing predictions with masks.

Source code in inference_sdk/http/client.py
@wrap_errors
def sam3_visual_segment(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    prompts: Optional[List[dict]] = None,
    multimask_output: bool = True,
    mask_input_format: str = "json",
) -> Union[dict, List[dict]]:
    """Run SAM3 promptable visual segmentation (PVS) on input image(s).

    Performs instance segmentation using point or box prompts.

    Args:
        inference_input: Input image(s) for segmentation.
        prompts: List of prompt dicts with "box" and/or "points" keys.
            Defaults to None (automatic segmentation).
        multimask_output: Whether to output multiple masks per prompt.
            Defaults to True.
        mask_input_format: Format for mask output. Defaults to "json".

    Returns:
        Segmentation results containing predictions with masks.
    """
    extra_payload = {
        "multimask_output": multimask_output,
        "format": mask_input_format,
    }
    if prompts is not None:
        extra_payload["prompts"] = {"prompts": prompts}
    return self._post_images(
        inference_input=inference_input,
        endpoint="/sam3/visual_segment",
        extra_payload=extra_payload,
    )
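A sketch of PVS with one positive point prompt; the coordinates and paths are placeholders.

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<ROBOFLOW_API_KEY>")

result = client.sam3_visual_segment(
    inference_input="path/to/image.jpg",
    prompts=[{"points": [{"x": 200.0, "y": 140.0, "positive": True}]}],
    multimask_output=False,  # a single mask per prompt
)
print(result)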
sam3_visual_segment_async async
sam3_visual_segment_async(
    inference_input,
    prompts=None,
    multimask_output=True,
    mask_input_format="json",
)

Run SAM3 promptable visual segmentation (PVS) asynchronously.

Parameters:

    inference_input (Union[ImagesReference, List[ImagesReference]]): Input image(s) for segmentation. Required.
    prompts (Optional[List[dict]]): List of prompt dicts. Defaults to None.
    multimask_output (bool): Whether to output multiple masks. Defaults to True.
    mask_input_format (str): Format for mask output. Defaults to "json".

Returns:

    Union[dict, List[dict]]: Segmentation results containing predictions with masks.

Source code in inference_sdk/http/client.py
@wrap_errors_async
async def sam3_visual_segment_async(
    self,
    inference_input: Union[ImagesReference, List[ImagesReference]],
    prompts: Optional[List[dict]] = None,
    multimask_output: bool = True,
    mask_input_format: str = "json",
) -> Union[dict, List[dict]]:
    """Run SAM3 promptable visual segmentation (PVS) asynchronously.

    Args:
        inference_input: Input image(s) for segmentation.
        prompts: List of prompt dicts. Defaults to None.
        multimask_output: Whether to output multiple masks. Defaults to True.
        mask_input_format: Format for mask output. Defaults to "json".

    Returns:
        Segmentation results containing predictions with masks.
    """
    extra_payload = {
        "multimask_output": multimask_output,
        "format": mask_input_format,
    }
    if prompts is not None:
        extra_payload["prompts"] = {"prompts": prompts}
    return await self._post_images_async(
        inference_input=inference_input,
        endpoint="/sam3/visual_segment",
        extra_payload=extra_payload,
    )
select_api_v0
select_api_v0()

Select API version 0 for client operations.

Returns:

    InferenceHTTPClient: The client instance with API v0 selected.

Source code in inference_sdk/http/client.py
def select_api_v0(self) -> "InferenceHTTPClient":
    """Select API version 0 for client operations.

    Returns:
        InferenceHTTPClient: The client instance with API v0 selected.
    """
    self.__client_mode = HTTPClientMode.V0
    return self
select_api_v1
select_api_v1()

Select API version 1 for client operations.

Returns:

    InferenceHTTPClient: The client instance with API v1 selected.

Source code in inference_sdk/http/client.py
def select_api_v1(self) -> "InferenceHTTPClient":
    """Select API version 1 for client operations.

    Returns:
        InferenceHTTPClient: The client instance with API v1 selected.
    """
    self.__client_mode = HTTPClientMode.V1
    return self
select_model
select_model(model_id)

Select a model for inference operations.

Parameters:

    model_id (str): The identifier of the model to select. Required.

Returns:

    InferenceHTTPClient: The client instance with the selected model.

Source code in inference_sdk/http/client.py
def select_model(self, model_id: str) -> "InferenceHTTPClient":
    """Select a model for inference operations.

    Args:
        model_id (str): The identifier of the model to select.

    Returns:
        InferenceHTTPClient: The client instance with the selected model.
    """
    self.__selected_model = model_id
    return self
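Because select_api_v0, select_api_v1 and select_model all return the client, they chain fluently. A sketch, assuming the client's generic infer entry point and a placeholder model id:

from inference_sdk import InferenceHTTPClient

client = (
    InferenceHTTPClient(api_url="https://detect.roboflow.com", api_key="<ROBOFLOW_API_KEY>")
    .select_api_v0()
    .select_model("<project>/<version>")  # placeholder model id
)
predictions = client.infer("path/to/image.jpg")  # uses the selected model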
start_inference_pipeline_with_workflow
start_inference_pipeline_with_workflow(
    video_reference,
    workflow_specification=None,
    workspace_name=None,
    workflow_id=None,
    image_input_name="image",
    workflows_parameters=None,
    workflows_thread_pool_workers=4,
    cancel_thread_pool_tasks_on_exit=True,
    video_metadata_input_name="video_metadata",
    max_fps=None,
    source_buffer_filling_strategy="DROP_OLDEST",
    source_buffer_consumption_strategy="EAGER",
    video_source_properties=None,
    batch_collection_timeout=None,
    results_buffer_size=64,
)

Starts an inference pipeline using a workflow specification.

Parameters:

    video_reference (Union[str, int, List[Union[str, int]]]): Path to video file, camera index, or list of video sources. Can be a string path, integer camera index, or list of either. Required.
    workflow_specification (Optional[dict]): Optional workflow specification dictionary. Mutually exclusive with workspace_name/workflow_id. Defaults to None.
    workspace_name (Optional[str]): Optional name of the workspace containing the workflow. Must be used with workflow_id. Defaults to None.
    workflow_id (Optional[str]): Optional ID of the workflow to use. Must be used with workspace_name. Defaults to None.
    image_input_name (str): Name of the image input node in the workflow. Defaults to "image".
    workflows_parameters (Optional[Dict[str, Any]]): Optional parameters to pass to the workflow. Defaults to None.
    workflows_thread_pool_workers (int): Number of worker threads for workflow execution. Defaults to 4.
    cancel_thread_pool_tasks_on_exit (bool): Whether to cancel pending tasks when exiting. Defaults to True.
    video_metadata_input_name (str): Name of the video metadata input in the workflow. Defaults to "video_metadata".
    max_fps (Optional[Union[float, int]]): Optional maximum FPS to process video at. Defaults to None.
    source_buffer_filling_strategy (Optional[BufferFillingStrategy]): Strategy for filling the source buffer when full. One of: "WAIT", "DROP_OLDEST", "ADAPTIVE_DROP_OLDEST", "DROP_LATEST", "ADAPTIVE_DROP_LATEST". Defaults to "DROP_OLDEST".
    source_buffer_consumption_strategy (Optional[BufferConsumptionStrategy]): Strategy for consuming from the source buffer. One of: "LAZY", "EAGER". Defaults to "EAGER".
    video_source_properties (Optional[Dict[str, float]]): Optional dictionary of video source properties. Defaults to None.
    batch_collection_timeout (Optional[float]): Optional timeout for batch collection in seconds. Defaults to None.
    results_buffer_size (int): Size of the results buffer. Defaults to 64.

Returns:

    dict: Response containing pipeline initialization details.

Raises:

    InvalidParameterError: If workflow specification parameters are invalid.
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@experimental(
    info="Video processing in inference server is under development. Breaking changes are possible."
)
@wrap_errors
def start_inference_pipeline_with_workflow(
    self,
    video_reference: Union[str, int, List[Union[str, int]]],
    workflow_specification: Optional[dict] = None,
    workspace_name: Optional[str] = None,
    workflow_id: Optional[str] = None,
    image_input_name: str = "image",
    workflows_parameters: Optional[Dict[str, Any]] = None,
    workflows_thread_pool_workers: int = 4,
    cancel_thread_pool_tasks_on_exit: bool = True,
    video_metadata_input_name: str = "video_metadata",
    max_fps: Optional[Union[float, int]] = None,
    source_buffer_filling_strategy: Optional[BufferFillingStrategy] = "DROP_OLDEST",
    source_buffer_consumption_strategy: Optional[
        BufferConsumptionStrategy
    ] = "EAGER",
    video_source_properties: Optional[Dict[str, float]] = None,
    batch_collection_timeout: Optional[float] = None,
    results_buffer_size: int = 64,
) -> dict:
    """Starts an inference pipeline using a workflow specification.

    Args:
        video_reference: Path to video file, camera index, or list of video sources.
            Can be a string path, integer camera index, or list of either.
        workflow_specification: Optional workflow specification dictionary. Mutually
            exclusive with workspace_name/workflow_id.
        workspace_name: Optional name of workspace containing workflow. Must be used
            with workflow_id.
        workflow_id: Optional ID of workflow to use. Must be used with workspace_name.
        image_input_name: Name of the image input node in workflow. Defaults to "image".
        workflows_parameters: Optional parameters to pass to workflow.
        workflows_thread_pool_workers: Number of worker threads for workflow execution.
            Defaults to 4.
        cancel_thread_pool_tasks_on_exit: Whether to cancel pending tasks when exiting.
            Defaults to True.
        video_metadata_input_name: Name of video metadata input in workflow.
            Defaults to "video_metadata".
        max_fps: Optional maximum FPS to process video at.
        source_buffer_filling_strategy: Strategy for filling source buffer when full.
            One of: "WAIT", "DROP_OLDEST", "ADAPTIVE_DROP_OLDEST", "DROP_LATEST",
            "ADAPTIVE_DROP_LATEST". Defaults to "DROP_OLDEST".
        source_buffer_consumption_strategy: Strategy for consuming from source buffer.
            One of: "LAZY", "EAGER". Defaults to "EAGER".
        video_source_properties: Optional dictionary of video source properties.
        batch_collection_timeout: Optional timeout for batch collection in seconds.
        results_buffer_size: Size of results buffer. Defaults to 64.

    Returns:
        dict: Response containing pipeline initialization details.

    Raises:
        InvalidParameterError: If workflow specification parameters are invalid.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    named_workflow_specified = (workspace_name is not None) and (
        workflow_id is not None
    )
    if not (named_workflow_specified != (workflow_specification is not None)):
        raise InvalidParameterError(
            "Parameters (`workspace_name`, `workflow_id`) can be used mutually exclusive with "
            "`workflow_specification`, but at least one must be set."
        )
    payload = {
        "api_key": self.__api_key,
        "video_configuration": {
            "type": "VideoConfiguration",
            "video_reference": video_reference,
            "max_fps": max_fps,
            "source_buffer_filling_strategy": source_buffer_filling_strategy,
            "source_buffer_consumption_strategy": source_buffer_consumption_strategy,
            "video_source_properties": video_source_properties,
            "batch_collection_timeout": batch_collection_timeout,
        },
        "processing_configuration": {
            "type": "WorkflowConfiguration",
            "workflow_specification": workflow_specification,
            "workspace_name": workspace_name,
            "workflow_id": workflow_id,
            "image_input_name": image_input_name,
            "workflows_parameters": workflows_parameters,
            "workflows_thread_pool_workers": workflows_thread_pool_workers,
            "cancel_thread_pool_tasks_on_exit": cancel_thread_pool_tasks_on_exit,
            "video_metadata_input_name": video_metadata_input_name,
        },
        "sink_configuration": {
            "type": "MemorySinkConfiguration",
            "results_buffer_size": results_buffer_size,
        },
    }
    response = requests.post(
        f"{self.__api_url}/inference_pipelines/initialise",
        json=payload,
    )
    response.raise_for_status()
    return response.json()
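A sketch of starting a pipeline from a named workflow and tearing it down with terminate_inference_pipeline (documented below). The video path and ids are placeholders, and the key lookup for the pipeline id is an assumption about the server's initialisation payload; adjust it to the response your server actually returns.

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<ROBOFLOW_API_KEY>")

response = client.start_inference_pipeline_with_workflow(
    video_reference="path/to/video.mp4",
    workspace_name="<WORKSPACE>",
    workflow_id="<WORKFLOW_ID>",
    max_fps=10,
)
# Assumption: the initialisation response exposes the new pipeline's id under "context".
pipeline_id = response.get("context", {}).get("pipeline_id")

# ... consume results, then clean up:
client.terminate_inference_pipeline(pipeline_id=pipeline_id)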
terminate_inference_pipeline
terminate_inference_pipeline(pipeline_id)

Terminates a running inference pipeline.

Sends a request to terminate the specified inference pipeline. This will stop all processing and free up associated resources.

Parameters:

    pipeline_id (str): The unique identifier of the inference pipeline to terminate. Required.

Returns:

    dict: A dictionary containing the response from the server about the termination operation.

Raises:

    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.
    ValueError: If pipeline_id is empty or None.

Source code in inference_sdk/http/client.py
@experimental(
    info="Video processing in inference server is under development. Breaking changes are possible."
)
@wrap_errors
def terminate_inference_pipeline(self, pipeline_id: str) -> dict:
    """Terminates a running inference pipeline.

    Sends a request to terminate the specified inference pipeline. This will stop all
    processing and free up associated resources.

    Args:
        pipeline_id: The unique identifier of the inference pipeline to terminate.

    Returns:
        dict: A dictionary containing the response from the server about the termination operation.

    Raises:
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
        ValueError: If pipeline_id is empty or None.
    """
    self._ensure_pipeline_id_not_empty(pipeline_id=pipeline_id)
    payload = {"api_key": self.__api_key}
    response = requests.post(
        f"{self.__api_url}/inference_pipelines/{pipeline_id}/terminate",
        json=payload,
    )
    api_key_safe_raise_for_status(response=response)
    return response.json()
unload_model
unload_model(model_id)

Unload a model from the server.

Parameters:

    model_id (str): The identifier of the model to unload. Required.

Returns:

    RegisteredModels: Updated information about registered models.

Raises:

    WrongClientModeError: If not in API v1 mode.
    HTTPCallErrorError: If there is an error in the HTTP call.
    HTTPClientError: If there is an error with the server connection.

Source code in inference_sdk/http/client.py
@wrap_errors
def unload_model(self, model_id: str) -> RegisteredModels:
    """Unload a model from the server.

    Args:
        model_id (str): The identifier of the model to unload.

    Returns:
        RegisteredModels: Updated information about registered models.

    Raises:
        WrongClientModeError: If not in API v1 mode.
        HTTPCallErrorError: If there is an error in the HTTP call.
        HTTPClientError: If there is an error with the server connection.
    """
    self.__ensure_v1_client_mode()
    de_aliased_model_id = resolve_roboflow_model_alias(model_id=model_id)
    response = requests.post(
        f"{self.__api_url}/model/remove",
        json={
            "model_id": de_aliased_model_id,
        },
        headers=DEFAULT_HEADERS,
    )
    response.raise_for_status()
    response_payload = response.json()
    if (
        de_aliased_model_id == self.__selected_model
        or model_id == self.__selected_model
    ):
        self.__selected_model = None
    return RegisteredModels.from_dict(response_payload)
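Unloading requires the v1 client mode; a sketch against a self-hosted server, with a placeholder model id:

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<ROBOFLOW_API_KEY>")

registered = client.select_api_v1().unload_model("<project>/<version>")  # placeholder id
print(registered)  # RegisteredModels snapshot after removal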
use_api_v0
use_api_v0()

Temporarily use API version 0 for client operations.

Yields:

    InferenceHTTPClient: The client instance temporarily using API v0.

Source code in inference_sdk/http/client.py
@contextmanager
def use_api_v0(self) -> Generator["InferenceHTTPClient", None, None]:
    """Temporarily use API version 0 for client operations.

    Yields:
        Generator[InferenceHTTPClient, None, None]: The client instance temporarily using API v0.
    """
    previous_client_mode = self.__client_mode
    self.__client_mode = HTTPClientMode.V0
    try:
        yield self
    finally:
        self.__client_mode = previous_client_mode
use_api_v1
use_api_v1()

Temporarily use API version 1 for client operations.

Yields:

    InferenceHTTPClient: The client instance temporarily using API v1.

Source code in inference_sdk/http/client.py
@contextmanager
def use_api_v1(self) -> Generator["InferenceHTTPClient", None, None]:
    """Temporarily use API version 1 for client operations.

    Yields:
        Generator[InferenceHTTPClient, None, None]: The client instance temporarily using API v1.
    """
    previous_client_mode = self.__client_mode
    self.__client_mode = HTTPClientMode.V1
    try:
        yield self
    finally:
        self.__client_mode = previous_client_mode
use_configuration
use_configuration(inference_configuration)

Temporarily use a different inference configuration.

Parameters:

    inference_configuration (InferenceConfiguration): The temporary configuration to use. Required.

Yields:

    InferenceHTTPClient: The client instance with the temporary configuration.

Source code in inference_sdk/http/client.py
@contextmanager
def use_configuration(
    self, inference_configuration: InferenceConfiguration
) -> Generator["InferenceHTTPClient", None, None]:
    """Temporarily use a different inference configuration.

    Args:
        inference_configuration (InferenceConfiguration): The temporary configuration to use.

    Yields:
        Generator[InferenceHTTPClient, None, None]: The client instance with temporary configuration.
    """
    previous_configuration = self.__inference_configuration
    self.__inference_configuration = inference_configuration
    try:
        yield self
    finally:
        self.__inference_configuration = previous_configuration
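A sketch of applying a temporary configuration; the previous settings are restored when the with-block exits. The model id and thresholds are placeholders.

from inference_sdk import InferenceHTTPClient, InferenceConfiguration

client = InferenceHTTPClient(api_url="https://detect.roboflow.com", api_key="<ROBOFLOW_API_KEY>")

low_confidence = InferenceConfiguration(confidence_threshold=0.2, iou_threshold=0.5)
with client.use_configuration(low_confidence) as configured_client:
    predictions = configured_client.infer("path/to/image.jpg", model_id="<project>/<version>")
# outside the block, the previous configuration is back in effect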
use_model
use_model(model_id)

Temporarily use a specific model for inference operations.

Parameters:

    model_id (str): The identifier of the model to use. Required.

Yields:

    InferenceHTTPClient: The client instance temporarily using the specified model.

Source code in inference_sdk/http/client.py
@contextmanager
def use_model(self, model_id: str) -> Generator["InferenceHTTPClient", None, None]:
    """Temporarily use a specific model for inference operations.

    Args:
        model_id (str): The identifier of the model to use.

    Yields:
        Generator[InferenceHTTPClient, None, None]: The client instance temporarily using the specified model.
    """
    previous_model = self.__selected_model
    self.__selected_model = model_id
    try:
        yield self
    finally:
        self.__selected_model = previous_model
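The context managers compose with plain with-statements; a sketch that temporarily pins both the API version and the model, with a placeholder model id:

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="https://detect.roboflow.com", api_key="<ROBOFLOW_API_KEY>")

with client.use_api_v0(), client.use_model("<project>/<version>"):
    predictions = client.infer("path/to/image.jpg")
# the previous client mode and model selection are restored here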


inference_sdk.http.entities

Classes

HTTPClientMode

Bases: str, Enum

Enum for the HTTP client mode.

Attributes:

    V0: The version 0 of the HTTP client.
    V1: The version 1 of the HTTP client.

Source code in inference_sdk/http/entities.py
class HTTPClientMode(str, Enum):
    """Enum for the HTTP client mode.

    Attributes:
        V0: The version 0 of the HTTP client.
        V1: The version 1 of the HTTP client.
    """

    V0 = "v0"
    V1 = "v1"

InferenceConfiguration dataclass

Dataclass for inference configuration.

Attributes:

    confidence_threshold (Optional[float]): The confidence threshold for the inference.
    keypoint_confidence_threshold (Optional[float]): The keypoint confidence threshold for the inference.
    format (Optional[str]): The format for the inference.
    mask_decode_mode (Optional[str]): The mask decode mode for the inference.
    tradeoff_factor (Optional[float]): The tradeoff factor for the inference.
    max_candidates (Optional[int]): The maximum number of candidates for the inference.
    max_detections (Optional[int]): The maximum number of detections for the inference.
    iou_threshold (Optional[float]): The intersection over union threshold for the inference.
    stroke_width (Optional[int]): The stroke width for the inference.
Source code in inference_sdk/http/entities.py
@dataclass(frozen=True)
class InferenceConfiguration:
    """Dataclass for inference configuration.

    Attributes:
        confidence_threshold: The confidence threshold for the inference.
        keypoint_confidence_threshold: The keypoint confidence threshold for the inference.
        format: The format for the inference.
        mask_decode_mode: The mask decode mode for the inference.
        tradeoff_factor: The tradeoff factor for the inference.
        max_candidates: The maximum number of candidates for the inference.
        max_detections: The maximum number of detections for the inference.
        iou_threshold: The intersection over union threshold for the inference.
        stroke_width: The stroke width for the inference.
    """

    confidence_threshold: Optional[float] = None
    keypoint_confidence_threshold: Optional[float] = None
    format: Optional[str] = None
    mask_decode_mode: Optional[str] = None
    tradeoff_factor: Optional[float] = None
    max_candidates: Optional[int] = None
    max_detections: Optional[int] = None
    iou_threshold: Optional[float] = None
    stroke_width: Optional[int] = None
    count_inference: Optional[bool] = None
    service_secret: Optional[str] = None
    disable_preproc_auto_orientation: Optional[bool] = None
    disable_preproc_contrast: Optional[bool] = None
    disable_preproc_grayscale: Optional[bool] = None
    disable_preproc_static_crop: Optional[bool] = None
    class_agnostic_nms: Optional[bool] = None
    class_filter: Optional[List[str]] = None
    fix_batch_size: Optional[bool] = None
    visualize_predictions: bool = False
    visualize_labels: Optional[bool] = None
    output_visualisation_format: VisualisationResponseFormat = (
        VisualisationResponseFormat.BASE64
    )
    image_extensions_for_directory_scan: Optional[List[str]] = field(
        default_factory=lambda: DEFAULT_IMAGE_EXTENSIONS,
    )
    client_downsizing_disabled: bool = True
    default_max_input_size: int = DEFAULT_MAX_INPUT_SIZE
    disable_active_learning: bool = False
    active_learning_target_dataset: Optional[str] = None
    max_concurrent_requests: int = 1
    max_batch_size: int = 1
    source: Optional[str] = None
    source_info: Optional[str] = None
    profiling_directory: str = "./inference_profiling"
    workflow_run_retries_enabled: bool = WORKFLOW_RUN_RETRIES_ENABLED

    @classmethod
    def init_default(cls) -> "InferenceConfiguration":
        return cls()

    def to_api_call_parameters(
        self, client_mode: HTTPClientMode, task_type: TaskType
    ) -> Dict[str, Any]:
        """Convert the current configuration to API call parameters.

        Args:
            client_mode: The HTTP client mode.
            task_type: The type of task the model is designed for.

        Returns:
            Dict[str, Any]: The API call parameters.
        """
        if client_mode is HTTPClientMode.V0:
            return self.to_legacy_call_parameters()
        if task_type == OBJECT_DETECTION_TASK:
            return self.to_object_detection_parameters()
        if task_type == INSTANCE_SEGMENTATION_TASK:
            return self.to_instance_segmentation_parameters()
        if task_type == CLASSIFICATION_TASK:
            return self.to_classification_parameters()
        if task_type == KEYPOINTS_DETECTION_TASK:
            return self.to_keypoints_detection_parameters()
        raise ModelTaskTypeNotSupportedError(
            f"Model task {task_type} is not supported by API v1 client."
        )

    def to_object_detection_parameters(self) -> Dict[str, Any]:
        """Convert the current configuration to object detection parameters.

        Returns:
            Dict[str, Any]: The object detection parameters.
        """
        parameters_specs = [
            ("disable_preproc_auto_orientation", "disable_preproc_auto_orient"),
            ("disable_preproc_contrast", "disable_preproc_contrast"),
            ("disable_preproc_grayscale", "disable_preproc_grayscale"),
            ("disable_preproc_static_crop", "disable_preproc_static_crop"),
            ("class_agnostic_nms", "class_agnostic_nms"),
            ("class_filter", "class_filter"),
            ("confidence_threshold", "confidence"),
            ("fix_batch_size", "fix_batch_size"),
            ("iou_threshold", "iou_threshold"),
            ("max_detections", "max_detections"),
            ("max_candidates", "max_candidates"),
            ("visualize_labels", "visualization_labels"),
            ("stroke_width", "visualization_stroke_width"),
            ("visualize_predictions", "visualize_predictions"),
            ("disable_active_learning", "disable_active_learning"),
            ("active_learning_target_dataset", "active_learning_target_dataset"),
            ("source", "source"),
            ("source_info", "source_info"),
        ]
        return get_non_empty_attributes(
            source_object=self,
            specification=parameters_specs,
        )

    def to_keypoints_detection_parameters(self) -> Dict[str, Any]:
        """Convert the current configuration to keypoints detection parameters.

        Returns:
            Dict[str, Any]: The keypoints detection parameters.
        """
        parameters = self.to_object_detection_parameters()
        parameters["keypoint_confidence"] = self.keypoint_confidence_threshold
        return remove_empty_values(dictionary=parameters)

    def to_instance_segmentation_parameters(self) -> Dict[str, Any]:
        """Convert the current configuration to instance segmentation parameters.

        Returns:
            Dict[str, Any]: The instance segmentation parameters.
        """
        parameters = self.to_object_detection_parameters()
        parameters_specs = [
            ("mask_decode_mode", "mask_decode_mode"),
            ("tradeoff_factor", "tradeoff_factor"),
        ]
        for internal_name, external_name in parameters_specs:
            parameters[external_name] = getattr(self, internal_name)
        return remove_empty_values(dictionary=parameters)

    def to_classification_parameters(self) -> Dict[str, Any]:
        """Convert the current configuration to classification parameters.

        Returns:
            Dict[str, Any]: The classification parameters.
        """
        parameters_specs = [
            ("disable_preproc_auto_orientation", "disable_preproc_auto_orient"),
            ("disable_preproc_contrast", "disable_preproc_contrast"),
            ("disable_preproc_grayscale", "disable_preproc_grayscale"),
            ("disable_preproc_static_crop", "disable_preproc_static_crop"),
            ("confidence_threshold", "confidence"),
            ("visualize_predictions", "visualize_predictions"),
            ("stroke_width", "visualization_stroke_width"),
            ("disable_active_learning", "disable_active_learning"),
            ("source", "source"),
            ("source_info", "source_info"),
            ("active_learning_target_dataset", "active_learning_target_dataset"),
        ]
        return get_non_empty_attributes(
            source_object=self,
            specification=parameters_specs,
        )

    def to_legacy_call_parameters(self) -> Dict[str, Any]:
        """Convert the current configuration to legacy call parameters.

        Returns:
            Dict[str, Any]: The legacy call parameters.
        """
        parameters_specs = [
            ("confidence_threshold", "confidence"),
            ("keypoint_confidence_threshold", "keypoint_confidence"),
            ("format", "format"),
            ("visualize_labels", "labels"),
            ("mask_decode_mode", "mask_decode_mode"),
            ("tradeoff_factor", "tradeoff_factor"),
            ("max_detections", "max_detections"),
            ("iou_threshold", "overlap"),
            ("stroke_width", "stroke"),
            ("count_inference", "countinference"),
            ("service_secret", "service_secret"),
            ("disable_preproc_auto_orientation", "disable_preproc_auto_orient"),
            ("disable_preproc_contrast", "disable_preproc_contrast"),
            ("disable_preproc_grayscale", "disable_preproc_grayscale"),
            ("disable_preproc_static_crop", "disable_preproc_static_crop"),
            ("disable_active_learning", "disable_active_learning"),
            ("active_learning_target_dataset", "active_learning_target_dataset"),
            ("source", "source"),
            ("source_info", "source_info"),
        ]
        return get_non_empty_attributes(
            source_object=self,
            specification=parameters_specs,
        )
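
A short usage sketch (not part of the source): build a configuration and render it to query parameters for each client mode. Only fields that are set survive, since empty values are stripped; the "object-detection" literal is assumed here to equal OBJECT_DETECTION_TASK.

from inference_sdk.http.entities import HTTPClientMode, InferenceConfiguration

config = InferenceConfiguration(
    confidence_threshold=0.5,
    iou_threshold=0.4,
    max_detections=100,
)

# v1 clients dispatch on the model's task type:
params_v1 = config.to_api_call_parameters(
    client_mode=HTTPClientMode.V1,
    task_type="object-detection",
)
# -> {'confidence': 0.5, 'iou_threshold': 0.4, 'max_detections': 100, ...}
# (plus a few defaulted flags such as visualize_predictions)

# v0 clients always use the legacy parameter names ('overlap', 'stroke', ...):
params_v0 = config.to_api_call_parameters(
    client_mode=HTTPClientMode.V0,
    task_type="object-detection",
)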
Functions
to_api_call_parameters
to_api_call_parameters(client_mode, task_type)

Convert the current configuration to API call parameters.

Parameters:

client_mode (HTTPClientMode): The HTTP client mode. Required.
task_type (TaskType): The type of task the model is designed for. Required.

Returns:

Dict[str, Any]: The API call parameters.

Source code in inference_sdk/http/entities.py
def to_api_call_parameters(
    self, client_mode: HTTPClientMode, task_type: TaskType
) -> Dict[str, Any]:
    """Convert the current configuration to API call parameters.

    Args:
        client_mode: The HTTP client mode.
        task_type: The type of task the model is designed for.

    Returns:
        Dict[str, Any]: The API call parameters.
    """
    if client_mode is HTTPClientMode.V0:
        return self.to_legacy_call_parameters()
    if task_type == OBJECT_DETECTION_TASK:
        return self.to_object_detection_parameters()
    if task_type == INSTANCE_SEGMENTATION_TASK:
        return self.to_instance_segmentation_parameters()
    if task_type == CLASSIFICATION_TASK:
        return self.to_classification_parameters()
    if task_type == KEYPOINTS_DETECTION_TASK:
        return self.to_keypoints_detection_parameters()
    raise ModelTaskTypeNotSupportedError(
        f"Model task {task_type} is not supported by API v1 client."
    )
to_classification_parameters
to_classification_parameters()

Convert the current configuration to classification parameters.

Returns:

Dict[str, Any]: The classification parameters.

Source code in inference_sdk/http/entities.py
def to_classification_parameters(self) -> Dict[str, Any]:
    """Convert the current configuration to classification parameters.

    Returns:
        Dict[str, Any]: The classification parameters.
    """
    parameters_specs = [
        ("disable_preproc_auto_orientation", "disable_preproc_auto_orient"),
        ("disable_preproc_contrast", "disable_preproc_contrast"),
        ("disable_preproc_grayscale", "disable_preproc_grayscale"),
        ("disable_preproc_static_crop", "disable_preproc_static_crop"),
        ("confidence_threshold", "confidence"),
        ("visualize_predictions", "visualize_predictions"),
        ("stroke_width", "visualization_stroke_width"),
        ("disable_active_learning", "disable_active_learning"),
        ("source", "source"),
        ("source_info", "source_info"),
        ("active_learning_target_dataset", "active_learning_target_dataset"),
    ]
    return get_non_empty_attributes(
        source_object=self,
        specification=parameters_specs,
    )
to_instance_segmentation_parameters
to_instance_segmentation_parameters()

Convert the current configuration to instance segmentation parameters.

Returns:

Dict[str, Any]: The instance segmentation parameters.

Source code in inference_sdk/http/entities.py
def to_instance_segmentation_parameters(self) -> Dict[str, Any]:
    """Convert the current configuration to instance segmentation parameters.

    Returns:
        Dict[str, Any]: The instance segmentation parameters.
    """
    parameters = self.to_object_detection_parameters()
    parameters_specs = [
        ("mask_decode_mode", "mask_decode_mode"),
        ("tradeoff_factor", "tradeoff_factor"),
    ]
    for internal_name, external_name in parameters_specs:
        parameters[external_name] = getattr(self, internal_name)
    return remove_empty_values(dictionary=parameters)
to_keypoints_detection_parameters
to_keypoints_detection_parameters()

Convert the current configuration to keypoints detection parameters.

Returns:

Dict[str, Any]: The keypoints detection parameters.

Source code in inference_sdk/http/entities.py
def to_keypoints_detection_parameters(self) -> Dict[str, Any]:
    """Convert the current configuration to keypoints detection parameters.

    Returns:
        Dict[str, Any]: The keypoints detection parameters.
    """
    parameters = self.to_object_detection_parameters()
    parameters["keypoint_confidence"] = self.keypoint_confidence_threshold
    return remove_empty_values(dictionary=parameters)
to_legacy_call_parameters
to_legacy_call_parameters()

Convert the current configuration to legacy call parameters.

Returns:

Dict[str, Any]: The legacy call parameters.

Source code in inference_sdk/http/entities.py
def to_legacy_call_parameters(self) -> Dict[str, Any]:
    """Convert the current configuration to legacy call parameters.

    Returns:
        Dict[str, Any]: The legacy call parameters.
    """
    parameters_specs = [
        ("confidence_threshold", "confidence"),
        ("keypoint_confidence_threshold", "keypoint_confidence"),
        ("format", "format"),
        ("visualize_labels", "labels"),
        ("mask_decode_mode", "mask_decode_mode"),
        ("tradeoff_factor", "tradeoff_factor"),
        ("max_detections", "max_detections"),
        ("iou_threshold", "overlap"),
        ("stroke_width", "stroke"),
        ("count_inference", "countinference"),
        ("service_secret", "service_secret"),
        ("disable_preproc_auto_orientation", "disable_preproc_auto_orient"),
        ("disable_preproc_contrast", "disable_preproc_contrast"),
        ("disable_preproc_grayscale", "disable_preproc_grayscale"),
        ("disable_preproc_static_crop", "disable_preproc_static_crop"),
        ("disable_active_learning", "disable_active_learning"),
        ("active_learning_target_dataset", "active_learning_target_dataset"),
        ("source", "source"),
        ("source_info", "source_info"),
    ]
    return get_non_empty_attributes(
        source_object=self,
        specification=parameters_specs,
    )
to_object_detection_parameters
to_object_detection_parameters()

Convert the current configuration to object detection parameters.

Returns:

Dict[str, Any]: The object detection parameters.

Source code in inference_sdk/http/entities.py
def to_object_detection_parameters(self) -> Dict[str, Any]:
    """Convert the current configuration to object detection parameters.

    Returns:
        Dict[str, Any]: The object detection parameters.
    """
    parameters_specs = [
        ("disable_preproc_auto_orientation", "disable_preproc_auto_orient"),
        ("disable_preproc_contrast", "disable_preproc_contrast"),
        ("disable_preproc_grayscale", "disable_preproc_grayscale"),
        ("disable_preproc_static_crop", "disable_preproc_static_crop"),
        ("class_agnostic_nms", "class_agnostic_nms"),
        ("class_filter", "class_filter"),
        ("confidence_threshold", "confidence"),
        ("fix_batch_size", "fix_batch_size"),
        ("iou_threshold", "iou_threshold"),
        ("max_detections", "max_detections"),
        ("max_candidates", "max_candidates"),
        ("visualize_labels", "visualization_labels"),
        ("stroke_width", "visualization_stroke_width"),
        ("visualize_predictions", "visualize_predictions"),
        ("disable_active_learning", "disable_active_learning"),
        ("active_learning_target_dataset", "active_learning_target_dataset"),
        ("source", "source"),
        ("source_info", "source_info"),
    ]
    return get_non_empty_attributes(
        source_object=self,
        specification=parameters_specs,
    )

ModelDescription dataclass

Bases: DataClassJsonMixin

Dataclass for model description.

Attributes:

model_id (str): The unique identifier of the model.
task_type (TaskType): The type of task the model is designed for.
batch_size (Optional[Union[int, str]]): The batch size for the model.
input_height (Optional[int]): The height of the input image.
input_width (Optional[int]): The width of the input image.

Source code in inference_sdk/http/entities.py
@dataclass(frozen=True)
class ModelDescription(DataClassJsonMixin):
    """Dataclass for model description.

    Attributes:
        model_id: The unique identifier of the model.
        task_type: The type of task the model is designed for.
        batch_size: The batch size for the model.
        input_height: The height of the input image.
        input_width: The width of the input image.
    """

    model_id: str
    task_type: TaskType
    batch_size: Optional[Union[int, str]] = None
    input_height: Optional[int] = None
    input_width: Optional[int] = None

RegisteredModels dataclass

Bases: DataClassJsonMixin

Dataclass for registered models.

Attributes:

models (List[ModelDescription]): A list of model descriptions.

Source code in inference_sdk/http/entities.py
@dataclass(frozen=True)
class RegisteredModels(DataClassJsonMixin):
    """Dataclass for registered models.

    Attributes:
        models: A list of model descriptions.
    """

    models: List[ModelDescription]
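
Because both dataclasses mix in DataClassJsonMixin, a registry payload can be parsed directly. A minimal sketch (the payload shape below is illustrative, mirroring the dataclass fields):

from inference_sdk.http.entities import RegisteredModels

payload = {
    "models": [
        {"model_id": "my-project/3", "task_type": "object-detection", "batch_size": 8},
    ]
}
registered = RegisteredModels.from_dict(payload)
print(registered.models[0].model_id)  # my-project/3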

ServerInfo dataclass

Bases: DataClassJsonMixin

Dataclass for information about the inference server.

Attributes:

name (str): The name of the inference server.
version (str): The version of the inference server.
uuid (str): The unique identifier of the inference server instance.

Source code in inference_sdk/http/entities.py
@dataclass(frozen=True)
class ServerInfo(DataClassJsonMixin):
    """Dataclass for Information about the inference server.

    Attributes:
        name: The name of the inference server.
        version: The version of the inference server.
        uuid: The unique identifier of the inference server instance.
    """

    name: str
    version: str
    uuid: str

VisualisationResponseFormat

Bases: str, Enum

Enum for the visualisation response format.

Attributes:

BASE64: The base64 format.
NUMPY: The numpy format.
PILLOW: The pillow format.

Source code in inference_sdk/http/entities.py
class VisualisationResponseFormat(str, Enum):
    """Enum for the visualisation response format.

    Attributes:
        BASE64: The base64 format.
        NUMPY: The numpy format.
        PILLOW: The pillow format.
    """

    BASE64 = "base64"
    NUMPY = "numpy"
    PILLOW = "pillow"

Functions

get_non_empty_attributes

get_non_empty_attributes(source_object, specification)

Get non-empty attributes from the source object.

Parameters:

source_object (object): The source object. Required.
specification (List[Tuple[str, str]]): The specification of the attributes. Required.

Returns:

Dict[str, Any]: The non-empty attributes.

Source code in inference_sdk/http/entities.py
def get_non_empty_attributes(
    source_object: object, specification: List[Tuple[str, str]]
) -> Dict[str, Any]:
    """Get non-empty attributes from the source object.

    Args:
        source_object: The source object.
        specification: The specification of the attributes.

    Returns:
        Dict[str, Any]: The non-empty attributes.
    """
    attributes = {
        external_name: getattr(source_object, internal_name)
        for internal_name, external_name in specification
    }
    return remove_empty_values(dictionary=attributes)
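
A minimal sketch of how the specification maps internal attribute names to external API names and drops unset values:

from dataclasses import dataclass

from inference_sdk.http.entities import get_non_empty_attributes

@dataclass
class Example:
    confidence_threshold: float = 0.5
    iou_threshold: float = None

get_non_empty_attributes(
    source_object=Example(),
    specification=[
        ("confidence_threshold", "confidence"),  # (internal name, external name)
        ("iou_threshold", "iou_threshold"),
    ],
)
# -> {'confidence': 0.5}  (iou_threshold is None, so it is dropped)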

inference_sdk.http.errors

Classes

APIKeyNotProvided

Bases: HTTPClientError

Error for API key not provided.

Source code in inference_sdk/http/errors.py
class APIKeyNotProvided(HTTPClientError):
    """Error for API key not provided."""

    pass

EncodingError

Bases: HTTPClientError

Error for encoding errors.

Source code in inference_sdk/http/errors.py
class EncodingError(HTTPClientError):
    """Error for encoding errors."""

    pass

HTTPCallErrorError

Bases: HTTPClientError

Error for HTTP call errors.

Attributes:

description (str): The description of the error.
status_code (int): The status code of the error.
api_message (str): The API message of the error.

Source code in inference_sdk/http/errors.py
class HTTPCallErrorError(HTTPClientError):
    """Error for HTTP call errors.

    Attributes:
        description: The description of the error.
        status_code: The status code of the error.
        api_message: The API message of the error.
    """

    def __init__(
        self,
        description: str,
        status_code: int,
        api_message: Optional[str],
    ):
        super().__init__(description)
        self.__description = description
        self.__api_message = api_message
        self.__status_code = status_code

    @property
    def description(self) -> str:
        """The description of the error."""
        return self.__description

    @property
    def api_message(self) -> str:
        """The API message of the error."""
        return self.__api_message

    @property
    def status_code(self) -> int:
        """The status code of the error."""
        return self.__status_code

    def __repr__(self) -> str:
        return (
            f"{self.__class__.__name__}("
            f"description='{self.description}', "
            f"api_message='{self.api_message}',"
            f"status_code={self.__status_code})"
        )

    def __str__(self) -> str:
        return self.__repr__()
Attributes
api_message property
api_message

The API message of the error.

description property
description

The description of the error.

status_code property
status_code

The status code of the error.
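
A typical handling pattern (a sketch; the client call is hypothetical):

from inference_sdk.http.errors import HTTPCallErrorError, HTTPClientError

try:
    result = client.infer("image.jpg", model_id="my-project/3")  # hypothetical call
except HTTPCallErrorError as error:
    # Structured details from the failed HTTP call:
    print(error.status_code, error.description, error.api_message)
except HTTPClientError:
    # Any other SDK-level failure (encoding, configuration, ...)
    raise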

HTTPClientError

Bases: Exception

Base class for HTTP client errors.

Source code in inference_sdk/http/errors.py
class HTTPClientError(Exception):
    """Base class for HTTP client errors."""

    pass

InvalidInputFormatError

Bases: HTTPClientError

Error for invalid input format.

Source code in inference_sdk/http/errors.py
class InvalidInputFormatError(HTTPClientError):
    """Error for invalid input format."""

    pass

InvalidModelIdentifier

Bases: HTTPClientError

Error for invalid model identifier.

Source code in inference_sdk/http/errors.py
class InvalidModelIdentifier(HTTPClientError):
    """Error for invalid model identifier."""

    pass

InvalidParameterError

Bases: HTTPClientError

Error for invalid parameter.

Source code in inference_sdk/http/errors.py
class InvalidParameterError(HTTPClientError):
    """Error for invalid parameter."""

    pass

ModelNotInitializedError

Bases: HTTPClientError

Error for model not initialized.

Source code in inference_sdk/http/errors.py
class ModelNotInitializedError(HTTPClientError):
    """Error for model not initialized."""

    pass

ModelNotSelectedError

Bases: HTTPClientError

Error for model not selected.

Source code in inference_sdk/http/errors.py
class ModelNotSelectedError(HTTPClientError):
    """Error for model not selected."""

    pass

ModelTaskTypeNotSupportedError

Bases: HTTPClientError

Error for model task type not supported.

Source code in inference_sdk/http/errors.py
class ModelTaskTypeNotSupportedError(HTTPClientError):
    """Error for model task type not supported."""

    pass

WrongClientModeError

Bases: HTTPClientError

Error for wrong client mode.

Source code in inference_sdk/http/errors.py
class WrongClientModeError(HTTPClientError):
    """Error for wrong client mode."""

    pass

http/utils

Internal utilities for request building, image encoding/decoding, response post-processing, retries, and API key handling.

inference_sdk.http.utils.aliases

Functions

resolve_ocr_path

resolve_ocr_path(model_name)

Resolve an OCR model name to its corresponding endpoint path.

Parameters:

model_name (str): The name of the OCR model. Required.

Returns:

str: The endpoint path for the OCR model.

Source code in inference_sdk/http/utils/aliases.py
def resolve_ocr_path(model_name: str) -> str:
    """Resolve an OCR model name to its corresponding endpoint path.

    Args:
        model_name: The name of the OCR model.

    Returns:
        The endpoint path for the OCR model.
    """
    model_name = model_name.lower()
    if model_name not in OCR_ENDPOINTS:
        raise ValueError(f"OCR not supported: {model_name}")
    return OCR_ENDPOINTS[model_name]
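
Usage sketch. Lookup is case-insensitive because the name is lower-cased first; "doctr" is assumed here to be a key of OCR_ENDPOINTS:

from inference_sdk.http.utils.aliases import resolve_ocr_path

path = resolve_ocr_path(model_name="DocTR")  # lower-cased to "doctr" before lookup
# Unsupported names raise ValueError("OCR not supported: ...").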

resolve_roboflow_model_alias

resolve_roboflow_model_alias(model_id)

Resolve a Roboflow model alias to a registered model ID.

Parameters:

model_id (str): The model alias to resolve. Required.

Returns:

str: The registered model ID.

Source code in inference_sdk/http/utils/aliases.py
def resolve_roboflow_model_alias(model_id: str) -> str:
    """Resolve a Roboflow model alias to a registered model ID.

    Args:
        model_id: The model alias to resolve.

    Returns:
        The registered model ID.
    """
    return REGISTERED_ALIASES.get(model_id, model_id)
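
Since the lookup falls back to the input, unknown identifiers pass through unchanged:

from inference_sdk.http.utils.aliases import resolve_roboflow_model_alias

resolve_roboflow_model_alias(model_id="my-project/1")  # -> "my-project/1" (no alias registered)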

inference_sdk.http.utils.encoding

Functions

bytes_to_opencv_image

bytes_to_opencv_image(payload, array_type=np.uint8)

Decode a bytes object to an OpenCV image.

Parameters:

payload (bytes): The bytes object to decode. Required.
array_type (np.number): The type of the array. Default: np.uint8.

Returns:

np.ndarray: The OpenCV image.

Source code in inference_sdk/http/utils/encoding.py
def bytes_to_opencv_image(
    payload: bytes, array_type: np.number = np.uint8
) -> np.ndarray:
    """Decode a bytes object to an OpenCV image.

    Args:
        payload: The bytes object to decode.
        array_type: The type of the array.

    Returns:
        The OpenCV image.
    """
    bytes_array = np.frombuffer(payload, dtype=array_type)
    decoding_result = cv2.imdecode(bytes_array, cv2.IMREAD_UNCHANGED)
    if decoding_result is None:
        raise EncodingError("Could not encode bytes to OpenCV image.")
    return decoding_result

bytes_to_pillow_image

bytes_to_pillow_image(payload)

Decode a bytes object to a PIL image.

Parameters:

payload (bytes): The bytes object to decode. Required.

Returns:

Image.Image: The PIL image.

Source code in inference_sdk/http/utils/encoding.py
def bytes_to_pillow_image(payload: bytes) -> Image.Image:
    """Decode a bytes object to a PIL image.

    Args:
        payload: The bytes object to decode.

    Returns:
        The PIL image.
    """
    buffer = BytesIO(payload)
    try:
        return Image.open(buffer)
    except UnidentifiedImageError as error:
        raise EncodingError("Could not encode bytes to PIL image.") from error

encode_base_64

encode_base_64(payload)

Encode a bytes object to a base64 string.

Parameters:

payload (bytes): The bytes object to encode. Required.

Returns:

str: The base64 string.

Source code in inference_sdk/http/utils/encoding.py
def encode_base_64(payload: bytes) -> str:
    """Encode a bytes object to a base64 string.

    Args:
        payload: The bytes object to encode.

    Returns:
        The base64 string.
    """
    return base64.b64encode(payload).decode("utf-8")

numpy_array_to_base64_jpeg

numpy_array_to_base64_jpeg(image)

Encode a numpy array to a base64 JPEG string.

Parameters:

image (np.ndarray): The numpy array to encode. Required.

Returns:

str: The base64 JPEG string.

Source code in inference_sdk/http/utils/encoding.py
def numpy_array_to_base64_jpeg(
    image: np.ndarray,
) -> Union[str]:
    """Encode a numpy array to a base64 JPEG string.

    Args:
        image: The numpy array to encode.

    Returns:
        The base64 JPEG string.
    """
    _, img_encoded = cv2.imencode(".jpg", image)
    image_bytes = np.array(img_encoded).tobytes()
    return encode_base_64(payload=image_bytes)

pillow_image_to_base64_jpeg

pillow_image_to_base64_jpeg(image)

Encode a PIL image to a base64 JPEG string.

Parameters:

image (Image.Image): The PIL image to encode. Required.

Returns:

str: The base64 JPEG string.

Source code in inference_sdk/http/utils/encoding.py
def pillow_image_to_base64_jpeg(image: Image.Image) -> str:
    """Encode a PIL image to a base64 JPEG string.

    Args:
        image: The PIL image to encode.

    Returns:
        The base64 JPEG string.
    """
    with BytesIO() as buffer:
        image.save(buffer, format="JPEG")
        return encode_base_64(payload=buffer.getvalue())
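
A round-trip sketch tying the encoders and decoders together (JPEG is lossy, so pixel values may shift slightly for non-trivial images):

import base64

import numpy as np

from inference_sdk.http.utils.encoding import (
    bytes_to_opencv_image,
    numpy_array_to_base64_jpeg,
)

image = np.zeros((64, 64, 3), dtype=np.uint8)      # black square, BGR order
encoded = numpy_array_to_base64_jpeg(image=image)  # base64 string of a JPEG payload
decoded = bytes_to_opencv_image(payload=base64.b64decode(encoded))
assert decoded.shape == (64, 64, 3)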

inference_sdk.http.utils.executors

Classes

RequestMethod

Bases: Enum

Enum for the request method.

Attributes:

GET: The GET method.
POST: The POST method.

Source code in inference_sdk/http/utils/executors.py
class RequestMethod(Enum):
    """Enum for the request method.

    Attributes:
        GET: The GET method.
        POST: The POST method.
    """

    GET = "get"
    POST = "post"

Functions

execute_requests_packages

execute_requests_packages(
    requests_data, request_method, max_concurrent_requests
)

Execute a list of requests in parallel.

Parameters:

requests_data (List[RequestData]): The list of requests to execute. Required.
request_method (RequestMethod): The method to use for the requests. Required.
max_concurrent_requests (int): The maximum number of concurrent requests. Required.

Returns:

List[Response]: The list of responses.

Source code in inference_sdk/http/utils/executors.py
def execute_requests_packages(
    requests_data: List[RequestData],
    request_method: RequestMethod,
    max_concurrent_requests: int,
) -> List[Response]:
    """Execute a list of requests in parallel.

    Args:
        requests_data: The list of requests to execute.
        request_method: The method to use for the requests.
        max_concurrent_requests: The maximum number of concurrent requests.

    Returns:
        The list of responses.
    """
    requests_data_packages = make_batches(
        iterable=requests_data,
        batch_size=max_concurrent_requests,
    )
    results = []
    all_request_data = []
    for requests_data_package in requests_data_packages:
        responses = make_parallel_requests(
            requests_data=requests_data_package,
            request_method=request_method,
        )
        results.extend(responses)
        all_request_data.extend(requests_data_package)
    _collect_remote_processing_times(results, all_request_data)
    for response in results:
        api_key_safe_raise_for_status(response=response)
    return results

execute_requests_packages_async async

execute_requests_packages_async(
    requests_data, request_method, max_concurrent_requests
)

Execute a list of requests in parallel asynchronously.

Parameters:

requests_data (List[RequestData]): The list of requests to execute. Required.
request_method (RequestMethod): The method to use for the requests. Required.
max_concurrent_requests (int): The maximum number of concurrent requests. Required.

Returns:

List[Union[dict, bytes]]: The list of responses.

Source code in inference_sdk/http/utils/executors.py
async def execute_requests_packages_async(
    requests_data: List[RequestData],
    request_method: RequestMethod,
    max_concurrent_requests: int,
) -> List[Union[dict, bytes]]:
    """Execute a list of requests in parallel asynchronously.

    Args:
        requests_data: The list of requests to execute.
        request_method: The method to use for the requests.
        max_concurrent_requests: The maximum number of concurrent requests.

    Returns:
        The list of responses.
    """
    requests_data_packages = make_batches(
        iterable=requests_data,
        batch_size=max_concurrent_requests,
    )
    results = []
    for requests_data_package in requests_data_packages:
        responses = await make_parallel_requests_async(
            requests_data=requests_data_package,
            request_method=request_method,
        )
        results.extend(responses)
    return results
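
A minimal driver sketch. Requests are issued in packages of at most max_concurrent_requests; each package completes before the next one starts. Building the RequestData objects themselves is out of scope here:

import asyncio

from inference_sdk.http.utils.executors import (
    RequestMethod,
    execute_requests_packages_async,
)

async def run_all(requests_data):  # requests_data: List[RequestData], built elsewhere
    return await execute_requests_packages_async(
        requests_data=requests_data,
        request_method=RequestMethod.POST,
        max_concurrent_requests=4,
    )

# responses = asyncio.run(run_all(requests_data))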

make_parallel_requests

make_parallel_requests(requests_data, request_method)

Execute a list of requests in parallel.

Parameters:

requests_data (List[RequestData]): The list of requests to execute. Required.
request_method (RequestMethod): The method to use for the requests. Required.

Returns:

List[Response]: The list of responses.

Source code in inference_sdk/http/utils/executors.py
def make_parallel_requests(
    requests_data: List[RequestData],
    request_method: RequestMethod,
) -> List[Response]:
    """Execute a list of requests in parallel.

    Args:
        requests_data: The list of requests to execute.
        request_method: The method to use for the requests.

    Returns:
        The list of responses.
    """
    workers = len(requests_data)
    make_request_closure = partial(make_request, request_method=request_method)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(make_request_closure, requests_data))

make_parallel_requests_async async

make_parallel_requests_async(requests_data, request_method)

Execute a list of requests in parallel asynchronously.

Parameters:

requests_data (List[RequestData]): The list of requests to execute. Required.
request_method (RequestMethod): The method to use for the requests. Required.

Returns:

List[Union[dict, bytes]]: The list of responses.

Source code in inference_sdk/http/utils/executors.py
async def make_parallel_requests_async(
    requests_data: List[RequestData],
    request_method: RequestMethod,
) -> List[Union[dict, bytes]]:
    """Execute a list of requests in parallel asynchronously.

    Args:
        requests_data: The list of requests to execute.
        request_method: The method to use for the requests.

    Returns:
        The list of responses.
    """
    async with aiohttp.ClientSession() as session:
        make_request_closure = partial(
            make_request_async,
            request_method=request_method,
            session=session,
        )
        coroutines = [make_request_closure(data) for data in requests_data]
        responses = list(await asyncio.gather(*coroutines))
        return [r[1] for r in responses]

make_request

make_request(request_data, request_method)

Make a request to the API.

Parameters:

request_data (RequestData): The request data. Required.
request_method (RequestMethod): The method to use for the request. Required.

Returns:

Response: The response from the API.

Source code in inference_sdk/http/utils/executors.py
@backoff.on_predicate(
    backoff.constant,
    predicate=lambda r: r.status_code in RETRYABLE_STATUS_CODES,
    max_tries=3,
    interval=1,
    backoff_log_level=logging.DEBUG,
    giveup_log_level=logging.DEBUG,
)
@backoff.on_exception(
    backoff.constant,
    exception=ConnectionError,
    max_tries=3,
    interval=1,
    backoff_log_level=logging.DEBUG,
    giveup_log_level=logging.DEBUG,
)
def make_request(request_data: RequestData, request_method: RequestMethod) -> Response:
    """Make a request to the API.

    Args:
        request_data: The request data.
        request_method: The method to use for the request.

    Returns:
        The response from the API.
    """
    method = requests.get if request_method is RequestMethod.GET else requests.post
    return method(
        request_data.url,
        headers=request_data.headers,
        params=request_data.parameters,
        data=request_data.data,
        json=request_data.payload,
    )

make_request_async async

make_request_async(request_data, request_method, session)

Make a request to the API asynchronously.

Parameters:

request_data (RequestData): The request data. Required.
request_method (RequestMethod): The method to use for the request. Required.
session (aiohttp.ClientSession): The session to use for the request. Required.

Returns:

Tuple[int, Union[bytes, dict]]: The response from the API.

Source code in inference_sdk/http/utils/executors.py
@backoff.on_predicate(
    backoff.constant,
    predicate=lambda r: r[0] in RETRYABLE_STATUS_CODES,
    max_tries=3,
    interval=1,
    on_giveup=raise_client_error,
    backoff_log_level=logging.DEBUG,
    giveup_log_level=logging.DEBUG,
)
@backoff.on_exception(
    backoff.constant,
    exception=ClientConnectionError,
    max_tries=3,
    interval=1,
    backoff_log_level=logging.DEBUG,
    giveup_log_level=logging.DEBUG,
)
async def make_request_async(
    request_data: RequestData,
    request_method: RequestMethod,
    session: aiohttp.ClientSession,
) -> Tuple[int, Union[bytes, dict]]:
    """Make a request to the API asynchronously.

    Args:
        request_data: The request data.
        request_method: The method to use for the request.
        session: The session to use for the request.

    Returns:
        The response from the API.
    """
    method = session.get if request_method is RequestMethod.GET else session.post
    parameters_serialised = None
    if request_data.parameters is not None:
        parameters_serialised = {
            name: (
                str(value)
                if not issubclass(type(value), list)
                else [str(e) for e in value]
            )
            for name, value in request_data.parameters.items()
        }
    async with method(
        request_data.url,
        headers=request_data.headers,
        params=parameters_serialised,
        data=request_data.data,
        json=request_data.payload,
    ) as response:
        try:
            response_data = await response.json()
        except:
            response_data = await response.read()
        if response_is_not_retryable_error(response=response):
            response.raise_for_status()
        return response.status, response_data

raise_client_error

raise_client_error(details)

Raise a client error.

Parameters:

details (dict): The details of the error. Required.

Source code in inference_sdk/http/utils/executors.py
def raise_client_error(details: dict) -> None:
    """Raise a client error.

    Args:
        details: The details of the error.
    """
    status_code = details["value"][0]
    request_data = details["kwargs"]["request_data"]
    raise ClientResponseError(
        request_info=RequestInfo(
            url=request_data.url,
            method="POST",
            headers={},
        ),
        history=(),
        status=status_code,
    )

response_is_not_retryable_error

response_is_not_retryable_error(response)

Check if the response is not a retryable error.

Parameters:

response (ClientResponse): The response to check. Required.

Returns:

bool: True if the response is not a retryable error, False otherwise.

Source code in inference_sdk/http/utils/executors.py
def response_is_not_retryable_error(response: ClientResponse) -> bool:
    """Check if the response is not a retryable error.

    Args:
        response: The response to check.

    Returns:
        True if the response is not a retryable error, False otherwise.
    """
    return response.status != 200 and response.status not in RETRYABLE_STATUS_CODES

inference_sdk.http.utils.iterables

Functions

make_batches

make_batches(iterable, batch_size)

Make batches from an iterable.

Parameters:

iterable (Iterable[T]): The iterable to make batches from. Required.
batch_size (int): The size of the batches. Required.

Returns:

Generator[List[T], None, None]: The batches.

Source code in inference_sdk/http/utils/iterables.py
def make_batches(
    iterable: Iterable[T], batch_size: int
) -> Generator[List[T], None, None]:
    """Make batches from an iterable.

    Args:
        iterable: The iterable to make batches from.
        batch_size: The size of the batches.

    Returns:
        The batches.
    """
    batch_size = max(batch_size, 1)
    batch = []
    for element in iterable:
        batch.append(element)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if len(batch) > 0:
        yield batch
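
For example:

from inference_sdk.http.utils.iterables import make_batches

list(make_batches(iterable=range(5), batch_size=2))
# -> [[0, 1], [2, 3], [4]]

list(make_batches(iterable="abc", batch_size=0))  # batch_size is clamped to 1
# -> [['a'], ['b'], ['c']]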

remove_empty_values

remove_empty_values(dictionary)

Remove empty values from a dictionary.

Parameters:

dictionary (dict): The dictionary to remove empty values from. Required.

Returns:

dict: The dictionary with empty values removed.

Source code in inference_sdk/http/utils/iterables.py
def remove_empty_values(dictionary: dict) -> dict:
    """Remove empty values from a dictionary.

    Args:
        dictionary: The dictionary to remove empty values from.

    Returns:
        The dictionary with empty values removed.
    """
    return {k: v for k, v in dictionary.items() if v is not None}

unwrap_single_element_list

unwrap_single_element_list(sequence)

Unwrap a single element list.

Parameters:

sequence (List[T]): The list to unwrap. Required.

Returns:

Union[T, List[T]]: The single element if the list has length one, otherwise the list unchanged.

Source code in inference_sdk/http/utils/iterables.py
def unwrap_single_element_list(sequence: List[T]) -> Union[T, List[T]]:
    """Unwrap a single element list.

    Args:
        sequence: The list to unwrap.

    Returns:
        The unwrapped list.
    """
    if len(sequence) == 1:
        return sequence[0]
    return sequence
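
Note that only None counts as empty; falsy values such as 0, "" or False are kept. For example:

from inference_sdk.http.utils.iterables import (
    remove_empty_values,
    unwrap_single_element_list,
)

remove_empty_values({"confidence": 0.5, "format": None, "stroke": 0})
# -> {'confidence': 0.5, 'stroke': 0}

unwrap_single_element_list([42])       # -> 42
unwrap_single_element_list([1, 2, 3])  # -> [1, 2, 3]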

inference_sdk.http.utils.loaders

Functions

load_directory_inference_input

load_directory_inference_input(
    directory_path, image_extensions
)

Load an inference input from a directory.

Parameters:

directory_path (str): The path to the directory. Required.
image_extensions (Optional[List[str]]): The extensions of the images. Required.

Returns:

Generator[Tuple[Union[str, int], np.ndarray], None, None]: The generator of the inference input.

Source code in inference_sdk/http/utils/loaders.py
def load_directory_inference_input(
    directory_path: str,
    image_extensions: Optional[List[str]],
) -> Generator[Tuple[Union[str, int], np.ndarray], None, None]:
    """Load an inference input from a directory.

    Args:
        directory_path: The path to the directory.
        image_extensions: The extensions of the images.

    Returns:
        The generator of the inference input.
    """
    paths = {
        path.as_posix().lower()
        for path in sv.list_files_with_extensions(
            directory=directory_path,
            extensions=image_extensions,
        )
    }
    # making a set due to case-insensitive behaviour of Windows
    # see: https://stackoverflow.com/questions/7199039/file-paths-in-windows-environment-not-case-sensitive
    for path in paths:
        yield path, cv2.imread(path)

load_image_from_string

load_image_from_string(
    reference, max_height=None, max_width=None
)

Load an image from a string.

Parameters:

reference (str): The reference to the image. Required.
max_height (Optional[int]): The maximum height of the image. Default: None.
max_width (Optional[int]): The maximum width of the image. Default: None.

Returns:

Tuple[str, Optional[float]]: The image and the scaling factor.

Source code in inference_sdk/http/utils/loaders.py
def load_image_from_string(
    reference: str,
    max_height: Optional[int] = None,
    max_width: Optional[int] = None,
) -> Tuple[str, Optional[float]]:
    """Load an image from a string.

    Args:
        reference: The reference to the image.
        max_height: The maximum height of the image.
        max_width: The maximum width of the image.

    Returns:
        The image and the scaling factor.
    """
    if uri_is_http_link(uri=reference):
        return load_image_from_url(
            url=reference, max_height=max_height, max_width=max_width
        )
    if os.path.exists(reference):
        if max_height is None or max_width is None:
            with open(reference, "rb") as f:
                img_bytes = f.read()
            img_base64_str = encode_base_64(payload=img_bytes)
            return img_base64_str, None
        local_image = cv2.imread(reference)
        if local_image is None:
            raise EncodingError(f"Could not load image from {reference}")
        local_image, scaling_factor = resize_opencv_image(
            image=local_image,
            max_height=max_height,
            max_width=max_width,
        )
        return numpy_array_to_base64_jpeg(image=local_image), scaling_factor
    if max_height is not None and max_width is not None:
        image_bytes = base64.b64decode(reference)
        image = bytes_to_opencv_image(payload=image_bytes)
        image, scaling_factor = resize_opencv_image(
            image=image,
            max_height=max_height,
            max_width=max_width,
        )
        return numpy_array_to_base64_jpeg(image=image), scaling_factor
    return reference, None
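
The dispatch order matters: HTTP(S) URLs are fetched, existing local paths are read from disk, and anything else is treated as an already-encoded payload. A sketch (the path is hypothetical):

from inference_sdk.http.utils.loaders import load_image_from_string

# Local file, no size limits: raw bytes are base64-encoded without decoding,
# and the scaling factor is None.
encoded, scaling_factor = load_image_from_string(reference="path/to/image.jpg")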

load_image_from_string_async async

load_image_from_string_async(
    reference, max_height=None, max_width=None
)

Load an image from a string asynchronously.

Parameters:

reference (str): The reference to the image. Required.
max_height (Optional[int]): The maximum height of the image. Default: None.
max_width (Optional[int]): The maximum width of the image. Default: None.

Returns:

Tuple[str, Optional[float]]: The image and the scaling factor.

Source code in inference_sdk/http/utils/loaders.py
async def load_image_from_string_async(
    reference: str,
    max_height: Optional[int] = None,
    max_width: Optional[int] = None,
) -> Tuple[str, Optional[float]]:
    """Load an image from a string asynchronously.

    Args:
        reference: The reference to the image.
        max_height: The maximum height of the image.
        max_width: The maximum width of the image.

    Returns:
        The image and the scaling factor.
    """
    if uri_is_http_link(uri=reference):
        return await load_image_from_url_async(
            url=reference, max_height=max_height, max_width=max_width
        )
    if os.path.exists(reference):
        local_image = cv2.imread(reference)
        if local_image is None:
            raise EncodingError(f"Could not load image from {reference}")
        local_image, scaling_factor = resize_opencv_image(
            image=local_image,
            max_height=max_height,
            max_width=max_width,
        )
        return numpy_array_to_base64_jpeg(image=local_image), scaling_factor
    if max_height is not None and max_width is not None:
        image_bytes = base64.b64decode(reference)
        image = bytes_to_opencv_image(payload=image_bytes)
        image, scaling_factor = resize_opencv_image(
            image=image,
            max_height=max_height,
            max_width=max_width,
        )
        return numpy_array_to_base64_jpeg(image=image), scaling_factor
    return reference, None

load_image_from_url

load_image_from_url(url, max_height=None, max_width=None)

Load an image from a URL.

Parameters:

url (str): The URL of the image. Required.
max_height (Optional[int]): The maximum height of the image. Default: None.
max_width (Optional[int]): The maximum width of the image. Default: None.

Returns:

Tuple[str, Optional[float]]: The image and the scaling factor.

Source code in inference_sdk/http/utils/loaders.py
def load_image_from_url(
    url: str,
    max_height: Optional[int] = None,
    max_width: Optional[int] = None,
) -> Tuple[str, Optional[float]]:
    """Load an image from a URL.

    Args:
        url: The URL of the image.
        max_height: The maximum height of the image.
        max_width: The maximum width of the image.

    Returns:
        The image and the scaling factor.
    """
    response = requests.get(url)
    response.raise_for_status()
    if max_height is None or max_width is None:
        return encode_base_64(response.content), None
    image = bytes_to_opencv_image(payload=response.content)
    resized_image, scaling_factor = resize_opencv_image(
        image=image,
        max_height=max_height,
        max_width=max_width,
    )
    serialised_image = numpy_array_to_base64_jpeg(image=resized_image)
    return serialised_image, scaling_factor

load_image_from_url_async async

load_image_from_url_async(
    url, max_height=None, max_width=None
)

Load an image from a URL asynchronously.

Parameters:

url (str): The URL of the image. Required.
max_height (Optional[int]): The maximum height of the image. Default: None.
max_width (Optional[int]): The maximum width of the image. Default: None.

Returns:

Tuple[str, Optional[float]]: The image and the scaling factor.

Source code in inference_sdk/http/utils/loaders.py
async def load_image_from_url_async(
    url: str,
    max_height: Optional[int] = None,
    max_width: Optional[int] = None,
) -> Tuple[str, Optional[float]]:
    """Load an image from a URL asynchronously.

    Args:
        url: The URL of the image.
        max_height: The maximum height of the image.
        max_width: The maximum width of the image.

    Returns:
        The image and the scaling factor.
    """
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            response.raise_for_status()
            response_payload = await response.read()
    if max_height is None or max_width is None:
        return encode_base_64(response_payload), None
    image = bytes_to_opencv_image(payload=response_payload)
    resized_image, scaling_factor = resize_opencv_image(
        image=image,
        max_height=max_height,
        max_width=max_width,
    )
    serialised_image = numpy_array_to_base64_jpeg(image=resized_image)
    return serialised_image, scaling_factor

load_nested_batches_of_inference_input

load_nested_batches_of_inference_input(
    inference_input, max_height=None, max_width=None
)

Load a nested batch of inference input.

Parameters:

inference_input (Union[list, ImagesReference]): The inference input. Required.
max_height (Optional[int]): The maximum height of the image. Default: None.
max_width (Optional[int]): The maximum width of the image. Default: None.

Returns:

Union[Tuple[str, Optional[float]], list]: The nested batch of inference input.

Source code in inference_sdk/http/utils/loaders.py
def load_nested_batches_of_inference_input(
    inference_input: Union[list, ImagesReference],
    max_height: Optional[int] = None,
    max_width: Optional[int] = None,
) -> Union[Tuple[str, Optional[float]], list]:
    """Load a nested batch of inference input.

    Args:
        inference_input: The inference input.
        max_height: The maximum height of the image.
        max_width: The maximum width of the image.

    Returns:
        The nested batch of inference input.
    """
    if not isinstance(inference_input, list):
        return load_static_inference_input(
            inference_input=inference_input,
            max_height=max_height,
            max_width=max_width,
        )[0]
    result = []
    for element in inference_input:
        result.append(
            load_nested_batches_of_inference_input(
                inference_input=element,
                max_height=max_height,
                max_width=max_width,
            )
        )
    return result

load_static_inference_input

load_static_inference_input(
    inference_input, max_height=None, max_width=None
)

Load a static inference input.

Parameters:

inference_input (Union[ImagesReference, List[ImagesReference]]): The inference input. Required.
max_height (Optional[int]): The maximum height of the image. Default: None.
max_width (Optional[int]): The maximum width of the image. Default: None.

Returns:

List[Tuple[str, Optional[float]]]: The list of the inference input.

Source code in inference_sdk/http/utils/loaders.py
def load_static_inference_input(
    inference_input: Union[ImagesReference, List[ImagesReference]],
    max_height: Optional[int] = None,
    max_width: Optional[int] = None,
) -> List[Tuple[str, Optional[float]]]:
    """Load a static inference input.

    Args:
        inference_input: The inference input.
        max_height: The maximum height of the image.
        max_width: The maximum width of the image.

    Returns:
        The list of the inference input.
    """
    if issubclass(type(inference_input), list):
        results = []
        for element in inference_input:
            results.extend(
                load_static_inference_input(
                    inference_input=element,
                    max_height=max_height,
                    max_width=max_width,
                )
            )
        return results
    if issubclass(type(inference_input), str):
        return [
            load_image_from_string(
                reference=inference_input, max_height=max_height, max_width=max_width
            )
        ]
    if issubclass(type(inference_input), np.ndarray):
        image, scaling_factor = resize_opencv_image(
            image=inference_input,
            max_height=max_height,
            max_width=max_width,
        )
        return [(numpy_array_to_base64_jpeg(image=image), scaling_factor)]
    if issubclass(type(inference_input), Image.Image):
        image, scaling_factor = resize_pillow_image(
            image=inference_input,
            max_height=max_height,
            max_width=max_width,
        )
        return [(pillow_image_to_base64_jpeg(image=image), scaling_factor)]
    raise InvalidInputFormatError(
        f"Unknown type of input ({inference_input.__class__.__name__}) submitted."
    )
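
A sketch mixing input types; each reference becomes a (base64_jpeg, scaling_factor) tuple, with the scaling factor set only when downsizing was applied:

import numpy as np
from PIL import Image

from inference_sdk.http.utils.loaders import load_static_inference_input

inputs = load_static_inference_input(
    inference_input=[
        np.zeros((480, 640, 3), dtype=np.uint8),  # OpenCV-style array
        Image.new("RGB", (640, 480)),             # Pillow image
    ],
    max_height=320,
    max_width=320,
)
len(inputs)  # -> 2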

load_static_inference_input_async async

load_static_inference_input_async(
    inference_input, max_height=None, max_width=None
)

Load a static inference input asynchronously.

Parameters:

inference_input (Union[ImagesReference, List[ImagesReference]]): The inference input. Required.
max_height (Optional[int]): The maximum height of the image. Default: None.
max_width (Optional[int]): The maximum width of the image. Default: None.

Returns:

List[Tuple[str, Optional[float]]]: The list of the inference input.

Source code in inference_sdk/http/utils/loaders.py
async def load_static_inference_input_async(
    inference_input: Union[ImagesReference, List[ImagesReference]],
    max_height: Optional[int] = None,
    max_width: Optional[int] = None,
) -> List[Tuple[str, Optional[float]]]:
    """Load a static inference input asynchronously.

    Args:
        inference_input: The inference input.
        max_height: The maximum height of the image.
        max_width: The maximum width of the image.

    Returns:
        The list of the inference input.
    """
    if issubclass(type(inference_input), list):
        results = []
        for element in inference_input:
            results.extend(
                await load_static_inference_input_async(
                    inference_input=element,
                    max_height=max_height,
                    max_width=max_width,
                )
            )
        return results
    if issubclass(type(inference_input), str):
        return [
            await load_image_from_string_async(
                reference=inference_input, max_height=max_height, max_width=max_width
            )
        ]
    if issubclass(type(inference_input), np.ndarray):
        image, scaling_factor = resize_opencv_image(
            image=inference_input,
            max_height=max_height,
            max_width=max_width,
        )
        return [(numpy_array_to_base64_jpeg(image=image), scaling_factor)]
    if issubclass(type(inference_input), Image.Image):
        image, scaling_factor = resize_pillow_image(
            image=inference_input,
            max_height=max_height,
            max_width=max_width,
        )
        return [(pillow_image_to_base64_jpeg(image=image), scaling_factor)]
    raise InvalidInputFormatError(
        f"Unknown type of input ({inference_input.__class__.__name__}) submitted."
    )
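
The async variant behaves like the synchronous one but awaits the string-based loaders; a minimal sketch:

import asyncio

import numpy as np

from inference_sdk.http.utils.loaders import load_static_inference_input_async

async def main() -> None:
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    encoded = await load_static_inference_input_async(
        inference_input=frame, max_height=240, max_width=240
    )
    print(encoded[0][1])  # 0.375 == min(240/480, 240/640)

asyncio.run(main())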

load_stream_inference_input

load_stream_inference_input(input_uri, image_extensions)

Load an inference input from a stream.

Parameters:

    input_uri (str): The URI of the input. Required.
    image_extensions (Optional[List[str]]): The extensions of the images. Required.

Returns:

    Generator[Tuple[Union[str, int], ndarray], None, None]: A generator of (reference, frame) pairs.

Source code in inference_sdk/http/utils/loaders.py
def load_stream_inference_input(
    input_uri: str,
    image_extensions: Optional[List[str]],
) -> Generator[Tuple[Union[str, int], np.ndarray], None, None]:
    """Load an inference input from a stream.

    Args:
        input_uri: The URI of the input.
        image_extensions: The extensions of the images.

    Returns:
        The generator of the inference input.
    """
    if os.path.isdir(input_uri):
        yield from load_directory_inference_input(
            directory_path=input_uri, image_extensions=image_extensions
        )
    else:
        yield from enumerate(sv.get_video_frames_generator(source_path=input_uri))
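
A short sketch iterating the generator; "./images" is a hypothetical directory of images (a video file path would yield (frame_index, frame) pairs instead):

from inference_sdk.http.utils.loaders import load_stream_inference_input

for reference, frame in load_stream_inference_input(
    input_uri="./images",
    image_extensions=["jpg", "png"],
):
    print(reference, frame.shape)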

uri_is_http_link

uri_is_http_link(uri)

Check if the URI is an HTTP link.

Parameters:

    uri (str): The URI to check. Required.

Returns:

    bool: True if the URI is an HTTP link, False otherwise.

Source code in inference_sdk/http/utils/loaders.py
def uri_is_http_link(uri: str) -> bool:
    """Check if the URI is an HTTP link.

    Args:
        uri: The URI to check.

    Returns:
        True if the URI is an HTTP link, False otherwise.
    """
    return uri.startswith("http://") or uri.startswith("https://")

inference_sdk.http.utils.post_processing

Functions

adjust_bbox_coordinates_to_client_scaling_factor

adjust_bbox_coordinates_to_client_scaling_factor(
    bbox, scaling_factor
)

Adjust bbox coordinates to the client scaling factor.

Parameters:

    bbox (dict): The bbox to adjust. Required.
    scaling_factor (float): The scaling factor. Required.

Returns:

    dict: The adjusted bbox.

Source code in inference_sdk/http/utils/post_processing.py
def adjust_bbox_coordinates_to_client_scaling_factor(
    bbox: dict,
    scaling_factor: float,
) -> dict:
    """Adjust a bbox coordinates to the client scaling factor.

    Args:
        bbox: The bbox to adjust.
        scaling_factor: The scaling factor.

    Returns:
        The adjusted bbox.
    """
    bbox["x"] = bbox["x"] / scaling_factor
    bbox["y"] = bbox["y"] / scaling_factor
    bbox["width"] = bbox["width"] / scaling_factor
    bbox["height"] = bbox["height"] / scaling_factor
    return bbox
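
A worked example (not from the source): the server saw an image scaled by 0.5, so dividing by the factor maps the coordinates back to the client's resolution:

from inference_sdk.http.utils.post_processing import (
    adjust_bbox_coordinates_to_client_scaling_factor,
)

bbox = {"x": 100.0, "y": 50.0, "width": 40.0, "height": 20.0}
adjusted = adjust_bbox_coordinates_to_client_scaling_factor(
    bbox=bbox, scaling_factor=0.5
)
print(adjusted)  # {'x': 200.0, 'y': 100.0, 'width': 80.0, 'height': 40.0}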

adjust_object_detection_predictions_to_client_scaling_factor

adjust_object_detection_predictions_to_client_scaling_factor(
    predictions, scaling_factor
)

Adjust a list of object detection predictions to the client scaling factor.

Parameters:

    predictions (List[dict]): The list of object detection predictions. Required.
    scaling_factor (float): The scaling factor. Required.

Returns:

    List[dict]: The adjusted list of object detection predictions.

Source code in inference_sdk/http/utils/post_processing.py
def adjust_object_detection_predictions_to_client_scaling_factor(
    predictions: List[dict],
    scaling_factor: float,
) -> List[dict]:
    """Adjust a list of object detection predictions to the client scaling factor.

    Args:
        predictions: The list of object detection predictions.
        scaling_factor: The scaling factor.

    Returns:
        The adjusted list of object detection predictions.
    """
    result = []
    for prediction in predictions:
        prediction = adjust_bbox_coordinates_to_client_scaling_factor(
            bbox=prediction,
            scaling_factor=scaling_factor,
        )
        result.append(prediction)
    return result

adjust_points_coordinates_to_client_scaling_factor

adjust_points_coordinates_to_client_scaling_factor(
    points, scaling_factor
)

Adjust a list of points coordinates to the client scaling factor.

Parameters:

    points (List[dict]): The list of points. Required.
    scaling_factor (float): The scaling factor. Required.

Returns:

    List[dict]: The adjusted list of points.

Source code in inference_sdk/http/utils/post_processing.py
def adjust_points_coordinates_to_client_scaling_factor(
    points: List[dict],
    scaling_factor: float,
) -> List[dict]:
    """Adjust a list of points coordinates to the client scaling factor.

    Args:
        points: The list of points.
        scaling_factor: The scaling factor.

    Returns:
        The adjusted list of points.
    """
    result = []
    for point in points:
        point["x"] = point["x"] / scaling_factor
        point["y"] = point["y"] / scaling_factor
        result.append(point)
    return result

adjust_prediction_to_client_scaling_factor

adjust_prediction_to_client_scaling_factor(
    prediction, scaling_factor
)

Adjust a prediction to the client scaling factor.

Parameters:

    prediction (dict): The prediction to adjust. Required.
    scaling_factor (Optional[float]): The scaling factor. Required.

Returns:

    dict: The adjusted prediction.

Source code in inference_sdk/http/utils/post_processing.py
def adjust_prediction_to_client_scaling_factor(
    prediction: dict,
    scaling_factor: Optional[float],
) -> dict:
    """Adjust a prediction to the client scaling factor.

    Args:
        prediction: The prediction to adjust.
        scaling_factor: The scaling factor.

    Returns:
        The adjusted prediction.
    """
    if scaling_factor is None or prediction.get("is_stub", False):
        return prediction
    if "image" in prediction:
        prediction["image"] = {
            "width": round(prediction["image"]["width"] / scaling_factor),
            "height": round(prediction["image"]["height"] / scaling_factor),
        }
    if predictions_should_not_be_post_processed(prediction=prediction):
        return prediction
    if "points" in prediction["predictions"][0]:
        prediction["predictions"] = (
            adjust_prediction_with_bbox_and_points_to_client_scaling_factor(
                predictions=prediction["predictions"],
                scaling_factor=scaling_factor,
                points_key="points",
            )
        )
    elif "keypoints" in prediction["predictions"][0]:
        prediction["predictions"] = (
            adjust_prediction_with_bbox_and_points_to_client_scaling_factor(
                predictions=prediction["predictions"],
                scaling_factor=scaling_factor,
                points_key="keypoints",
            )
        )
    elif "x" in prediction["predictions"][0] and "y" in prediction["predictions"][0]:
        prediction["predictions"] = (
            adjust_object_detection_predictions_to_client_scaling_factor(
                predictions=prediction["predictions"],
                scaling_factor=scaling_factor,
            )
        )
    return prediction
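
A sketch with a hand-built object-detection prediction (illustrative values); both the reported image size and the bbox are rescaled:

from inference_sdk.http.utils.post_processing import (
    adjust_prediction_to_client_scaling_factor,
)

prediction = {
    "image": {"width": 320, "height": 240},
    "predictions": [
        {"x": 160.0, "y": 120.0, "width": 64.0, "height": 48.0, "class": "dog"}
    ],
}
adjusted = adjust_prediction_to_client_scaling_factor(
    prediction=prediction, scaling_factor=0.5
)
print(adjusted["image"])           # {'width': 640, 'height': 480}
print(adjusted["predictions"][0])  # bbox coordinates doubled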

adjust_prediction_with_bbox_and_points_to_client_scaling_factor

adjust_prediction_with_bbox_and_points_to_client_scaling_factor(
    predictions, scaling_factor, points_key
)

Adjust a list of predictions with bbox and points to the client scaling factor.

Parameters:

    predictions (List[dict]): The list of predictions. Required.
    scaling_factor (float): The scaling factor. Required.
    points_key (str): The key of the points. Required.

Returns:

    List[dict]: The adjusted list of predictions.

Source code in inference_sdk/http/utils/post_processing.py
def adjust_prediction_with_bbox_and_points_to_client_scaling_factor(
    predictions: List[dict],
    scaling_factor: float,
    points_key: str,
) -> List[dict]:
    """Adjust a list of predictions with bbox and points to the client scaling factor.

    Args:
        predictions: The list of predictions.
        scaling_factor: The scaling factor.
        points_key: The key of the points.

    Returns:
        The adjusted list of predictions.
    """
    result = []
    for prediction in predictions:
        prediction = adjust_bbox_coordinates_to_client_scaling_factor(
            bbox=prediction,
            scaling_factor=scaling_factor,
        )
        prediction[points_key] = adjust_points_coordinates_to_client_scaling_factor(
            points=prediction[points_key],
            scaling_factor=scaling_factor,
        )
        result.append(prediction)
    return result

combine_clip_embeddings

combine_clip_embeddings(embeddings)

Combine CLIP embeddings, flattening nested lists and splitting multi-embedding entries into one entry per embedding.

Parameters:

    embeddings (Union[dict, List[dict]]): The embeddings to combine. Required.

Returns:

    List[dict]: The combined embeddings.

Source code in inference_sdk/http/utils/post_processing.py
def combine_clip_embeddings(embeddings: Union[dict, List[dict]]) -> List[dict]:
    """Combine clip embeddings.

    Args:
        embeddings: The embeddings to combine.

    Returns:
        The combined embeddings.
    """
    if issubclass(type(embeddings), list):
        result = []
        for e in embeddings:
            result.extend(combine_clip_embeddings(embeddings=e))
        return result
    frame_id = embeddings["frame_id"]
    time = embeddings["time"]
    if len(embeddings["embeddings"]) > 1:
        new_embeddings = [
            {"frame_id": frame_id, "time": time, "embeddings": [e]}
            for e in embeddings["embeddings"]
        ]
    else:
        new_embeddings = [embeddings]
    return new_embeddings
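
A sketch with an illustrative multi-embedding entry, which gets split into one entry per embedding:

from inference_sdk.http.utils.post_processing import combine_clip_embeddings

response = {"frame_id": 0, "time": 0.01, "embeddings": [[0.1, 0.2], [0.3, 0.4]]}
for entry in combine_clip_embeddings(embeddings=response):
    print(entry)  # each entry keeps frame_id/time and holds a single embedding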

combine_gaze_detections

combine_gaze_detections(detections)

Combine gaze detections.

Parameters:

    detections (Union[dict, List[Union[dict, List[dict]]]]): The detections to combine. Required.

Returns:

    Union[dict, List[Dict]]: The combined detections.

Source code in inference_sdk/http/utils/post_processing.py
def combine_gaze_detections(
    detections: Union[dict, List[Union[dict, List[dict]]]],
) -> Union[dict, List[Dict]]:
    """Combine gaze detections.

    Args:
        detections: The detections to combine.

    Returns:
        The combined detections.
    """
    if not issubclass(type(detections), list):
        return detections
    detections = [e if issubclass(type(e), list) else [e] for e in detections]
    return list(itertools.chain.from_iterable(detections))

decode_workflow_output

decode_workflow_output(workflow_output, expected_format)

Decode a workflow output.

Parameters:

    workflow_output (Dict[str, Any]): The workflow output to decode. Required.
    expected_format (VisualisationResponseFormat): The expected format of the workflow output. Required.

Returns:

    Dict[str, Any]: The decoded workflow output.

Source code in inference_sdk/http/utils/post_processing.py
def decode_workflow_output(
    workflow_output: Dict[str, Any],
    expected_format: VisualisationResponseFormat,
) -> Dict[str, Any]:
    """Decode a workflow output.

    Args:
        workflow_output: The workflow output to decode.
        expected_format: The expected format of the workflow output.

    Returns:
        The decoded workflow output.
    """
    result = {}
    for key, value in workflow_output.items():
        if is_workflow_image(value=value):
            value = decode_workflow_output_image(
                value=value,
                expected_format=expected_format,
            )
        elif issubclass(type(value), dict):
            value = decode_workflow_output(
                workflow_output=value, expected_format=expected_format
            )
        elif issubclass(type(value), list):
            value = decode_workflow_output_list(
                elements=value,
                expected_format=expected_format,
            )
        result[key] = value
    return result
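
A sketch decoding a hand-built output; VisualisationResponseFormat is assumed here to be importable from inference_sdk.http.entities (verify against your SDK version). With BASE64, image values are passed through untouched:

from inference_sdk.http.entities import VisualisationResponseFormat
from inference_sdk.http.utils.post_processing import decode_workflow_output

workflow_output = {
    "label": "dog",
    "visualisation": {"type": "base64", "value": "<base64-encoded JPEG>"},
}
decoded = decode_workflow_output(
    workflow_output=workflow_output,
    expected_format=VisualisationResponseFormat.BASE64,
)
print(decoded["visualisation"])  # the base64 string, unchanged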

decode_workflow_output_image

decode_workflow_output_image(value, expected_format)

Decode a workflow output image.

Parameters:

    value (Dict[str, Any]): The value to decode. Required.
    expected_format (VisualisationResponseFormat): The expected format of the value. Required.

Returns:

    Union[str, ndarray, Image]: The decoded value.

Source code in inference_sdk/http/utils/post_processing.py
def decode_workflow_output_image(
    value: Dict[str, Any],
    expected_format: VisualisationResponseFormat,
) -> Union[str, np.ndarray, Image.Image]:
    """Decode a workflow output image.

    Args:
        value: The value to decode.
        expected_format: The expected format of the value.

    Returns:
        The decoded value.
    """
    if expected_format is VisualisationResponseFormat.BASE64:
        return value["value"]
    return transform_base64_visualisation(
        visualisation=value["value"],
        expected_format=expected_format,
    )

decode_workflow_output_list

decode_workflow_output_list(elements, expected_format)

Decode a list of workflow outputs.

Parameters:

    elements (List[Any]): The list of elements to decode. Required.
    expected_format (VisualisationResponseFormat): The expected format of the elements. Required.

Returns:

    List[Any]: The decoded list of elements.

Source code in inference_sdk/http/utils/post_processing.py
def decode_workflow_output_list(
    elements: List[Any],
    expected_format: VisualisationResponseFormat,
) -> List[Any]:
    """Decode a list of workflow outputs.

    Args:
        elements: The list of elements to decode.
        expected_format: The expected format of the elements.

    Returns:
        The decoded list of elements.
    """
    result = []
    for element in elements:
        if is_workflow_image(value=element):
            element = decode_workflow_output_image(
                value=element,
                expected_format=expected_format,
            )
        elif issubclass(type(element), dict):
            element = decode_workflow_output(
                workflow_output=element, expected_format=expected_format
            )
        elif issubclass(type(element), list):
            element = decode_workflow_output_list(
                elements=element,
                expected_format=expected_format,
            )
        result.append(element)
    return result

decode_workflow_outputs

decode_workflow_outputs(workflow_outputs, expected_format)

Decode a list of workflow outputs.

Parameters:

    workflow_outputs (List[Dict[str, Any]]): The list of workflow outputs. Required.
    expected_format (VisualisationResponseFormat): The expected format of the workflow outputs. Required.

Returns:

    List[Dict[str, Any]]: The decoded list of workflow outputs.

Source code in inference_sdk/http/utils/post_processing.py
def decode_workflow_outputs(
    workflow_outputs: List[Dict[str, Any]],
    expected_format: VisualisationResponseFormat,
) -> List[Dict[str, Any]]:
    """Decode a list of workflow outputs.

    Args:
        workflow_outputs: The list of workflow outputs.
        expected_format: The expected format of the workflow outputs.

    Returns:
        The decoded list of workflow outputs.
    """
    return [
        decode_workflow_output(
            workflow_output=workflow_output,
            expected_format=expected_format,
        )
        for workflow_output in workflow_outputs
    ]

filter_model_descriptions

filter_model_descriptions(descriptions, model_id)

Filter model descriptions.

Parameters:

    descriptions (List[ModelDescription]): The list of model descriptions. Required.
    model_id (str): The model ID. Required.

Returns:

    Optional[ModelDescription]: The first matching model description, or None if there is no match.

Source code in inference_sdk/http/utils/post_processing.py
def filter_model_descriptions(
    descriptions: List[ModelDescription],
    model_id: str,
) -> Optional[ModelDescription]:
    """Filter model descriptions.

    Args:
        descriptions: The list of model descriptions.
        model_id: The model ID.

    Returns:
        The filtered model description.
    """
    matching_models = [d for d in descriptions if d.model_id == model_id]
    if len(matching_models) > 0:
        return matching_models[0]
    return None
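
A sketch with stand-in objects; only the model_id attribute is inspected, so SimpleNamespace substitutes for real ModelDescription instances here:

from types import SimpleNamespace

from inference_sdk.http.utils.post_processing import filter_model_descriptions

descriptions = [
    SimpleNamespace(model_id="coco/3"),
    SimpleNamespace(model_id="coco/5"),
]
print(filter_model_descriptions(descriptions=descriptions, model_id="coco/5"))
print(filter_model_descriptions(descriptions=descriptions, model_id="coco/9"))  # None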

is_workflow_image

is_workflow_image(value)

Check if the value is a workflow image.

Parameters:

    value (Any): The value to check. Required.

Returns:

    bool: True if the value is a workflow image, False otherwise.

Source code in inference_sdk/http/utils/post_processing.py
def is_workflow_image(value: Any) -> bool:
    """Check if the value is a workflow image.

    Args:
        value: The value to check.

    Returns:
        True if the value is a workflow image, False otherwise.
    """
    return issubclass(type(value), dict) and value.get("type") == "base64"

predictions_should_not_be_post_processed

predictions_should_not_be_post_processed(prediction)

Check if the predictions should not be post-processed.

Parameters:

    prediction (dict): The prediction to check. Required.

Returns:

    bool: True if the predictions should not be post-processed, False otherwise.

Source code in inference_sdk/http/utils/post_processing.py
def predictions_should_not_be_post_processed(prediction: dict) -> bool:
    """Check if the predictions should not be post-processed.

    Args:
        prediction: The prediction to check.

    Returns:
        True if the predictions should not be post-processed, False otherwise.
    """
    return (
        "predictions" not in prediction
        or not issubclass(type(prediction["predictions"]), list)
        or len(prediction["predictions"]) == 0
    )

response_contains_jpeg_image

response_contains_jpeg_image(response)

Check if the response contains a JPEG image.

Parameters:

    response (Response): The response to check. Required.

Returns:

    bool: True if the response contains a JPEG image, False otherwise.

Source code in inference_sdk/http/utils/post_processing.py
def response_contains_jpeg_image(response: Response) -> bool:
    """Check if the response contains a JPEG image.

    Args:
        response: The response to check.

    Returns:
        True if the response contains a JPEG image, False otherwise.
    """
    content_type = None
    for header_name in CONTENT_TYPE_HEADERS:
        if header_name in response.headers:
            content_type = response.headers[header_name]
            break
    if content_type is None:
        return False
    return "image/jpeg" in content_type

transform_base64_visualisation

transform_base64_visualisation(
    visualisation, expected_format
)

Transform a base64 visualisation.

Parameters:

    visualisation (str): The visualisation to transform. Required.
    expected_format (VisualisationResponseFormat): The expected format of the visualisation. Required.

Returns:

    Union[str, ndarray, Image]: The transformed visualisation.

Source code in inference_sdk/http/utils/post_processing.py
def transform_base64_visualisation(
    visualisation: str,
    expected_format: VisualisationResponseFormat,
) -> Union[str, np.ndarray, Image.Image]:
    """Transform a base64 visualisation.

    Args:
        visualisation: The visualisation to transform.
        expected_format: The expected format of the visualisation.

    Returns:
        The transformed visualisation.
    """
    visualisation_bytes = base64.b64decode(visualisation)
    return transform_visualisation_bytes(
        visualisation=visualisation_bytes, expected_format=expected_format
    )

transform_visualisation_bytes

transform_visualisation_bytes(
    visualisation, expected_format
)

Transform visualisation bytes into the expected format.

Parameters:

    visualisation (bytes): The visualisation to transform. Required.
    expected_format (VisualisationResponseFormat): The expected format of the visualisation. Required.

Returns:

    Union[str, ndarray, Image]: The transformed visualisation.

Source code in inference_sdk/http/utils/post_processing.py
def transform_visualisation_bytes(
    visualisation: bytes,
    expected_format: VisualisationResponseFormat,
) -> Union[str, np.ndarray, Image.Image]:
    """Transform a visualisation bytes.

    Args:
        visualisation: The visualisation to transform.
        expected_format: The expected format of the visualisation.

    Returns:
        The transformed visualisation.
    """
    if expected_format not in IMAGES_TRANSCODING_METHODS:
        raise NotImplementedError(
            f"Expected format: {expected_format} is not supported in terms of visualisations transcoding."
        )
    transcoding_method = IMAGES_TRANSCODING_METHODS[expected_format]
    return transcoding_method(visualisation)

inference_sdk.http.utils.pre_processing

Functions

determine_scaling_aspect_ratio

determine_scaling_aspect_ratio(
    image_height, image_width, max_height, max_width
)

Determine the scaling aspect ratio.

Parameters:

    image_height (int): The height of the image. Required.
    image_width (int): The width of the image. Required.
    max_height (int): The maximum height of the image. Required.
    max_width (int): The maximum width of the image. Required.

Returns:

    Optional[float]: The scaling ratio, or None if the image already fits within the limits.

Source code in inference_sdk/http/utils/pre_processing.py
def determine_scaling_aspect_ratio(
    image_height: int,
    image_width: int,
    max_height: int,
    max_width: int,
) -> Optional[float]:
    """Determine the scaling aspect ratio.

    Args:
        image_height: The height of the image.
        image_width: The width of the image.
        max_height: The maximum height of the image.
        max_width: The maximum width of the image.

    Returns:
        The scaling aspect ratio.
    """
    height_scaling_ratio = max_height / image_height
    width_scaling_ratio = max_width / image_width
    min_scaling_ratio = min(height_scaling_ratio, width_scaling_ratio)
    return min_scaling_ratio if min_scaling_ratio < 1.0 else None
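
A worked example: a 1920x1080 frame constrained to 640x640 scales by min(640/1080, 640/1920) = 1/3, while an image already within bounds returns None:

from inference_sdk.http.utils.pre_processing import determine_scaling_aspect_ratio

print(determine_scaling_aspect_ratio(
    image_height=1080, image_width=1920, max_height=640, max_width=640
))  # 0.333...
print(determine_scaling_aspect_ratio(
    image_height=480, image_width=640, max_height=1080, max_width=1920
))  # None (no downscaling needed)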

resize_opencv_image

resize_opencv_image(image, max_height, max_width)

Resize an OpenCV image.

Parameters:

    image (ndarray): The image to resize. Required.
    max_height (Optional[int]): The maximum height of the image. Required.
    max_width (Optional[int]): The maximum width of the image. Required.

Returns:

    Tuple[ndarray, Optional[float]]: The resized image and the scaling factor.

Source code in inference_sdk/http/utils/pre_processing.py
def resize_opencv_image(
    image: np.ndarray,
    max_height: Optional[int],
    max_width: Optional[int],
) -> Tuple[np.ndarray, Optional[float]]:
    """Resize an OpenCV image.

    Args:
        image: The image to resize.
        max_height: The maximum height of the image.
        max_width: The maximum width of the image.

    Returns:
        The resized image and the scaling factor.
    """
    if max_width is None or max_height is None:
        return image, None
    height, width = image.shape[:2]
    scaling_ratio = determine_scaling_aspect_ratio(
        image_height=height,
        image_width=width,
        max_height=max_height,
        max_width=max_width,
    )
    if scaling_ratio is None:
        return image, None
    resized_image = cv2.resize(
        src=image, dsize=None, fx=scaling_ratio, fy=scaling_ratio
    )
    return resized_image, scaling_ratio
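
A minimal sketch with a synthetic frame:

import numpy as np

from inference_sdk.http.utils.pre_processing import resize_opencv_image

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
resized, factor = resize_opencv_image(frame, max_height=640, max_width=640)
print(resized.shape, factor)  # (360, 640, 3) 0.333...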

resize_pillow_image

resize_pillow_image(image, max_height, max_width)

Resize a Pillow image.

Parameters:

    image (Image): The image to resize. Required.
    max_height (Optional[int]): The maximum height of the image. Required.
    max_width (Optional[int]): The maximum width of the image. Required.

Returns:

    Tuple[Image, Optional[float]]: The resized image and the scaling factor.

Source code in inference_sdk/http/utils/pre_processing.py
def resize_pillow_image(
    image: Image.Image,
    max_height: Optional[int],
    max_width: Optional[int],
) -> Tuple[Image.Image, Optional[float]]:
    """Resize a Pillow image.

    Args:
        image: The image to resize.
        max_height: The maximum height of the image.
        max_width: The maximum width of the image.

    Returns:
        The resized image and the scaling factor.
    """
    if max_width is None or max_height is None:
        return image, None
    width, height = image.size
    scaling_ratio = determine_scaling_aspect_ratio(
        image_height=height,
        image_width=width,
        max_height=max_height,
        max_width=max_width,
    )
    if scaling_ratio is None:
        return image, None
    new_width = round(scaling_ratio * width)
    new_height = round(scaling_ratio * height)
    return image.resize(size=(new_width, new_height)), scaling_ratio

inference_sdk.http.utils.profilling

Functions

save_workflows_profiler_trace

save_workflows_profiler_trace(directory, profiler_trace)

Save a workflow profiler trace.

Parameters:

    directory (str): The directory to save the profiler trace. Required.
    profiler_trace (List[dict]): The profiler trace. Required.

Source code in inference_sdk/http/utils/profilling.py
def save_workflows_profiler_trace(
    directory: str,
    profiler_trace: List[dict],
) -> None:
    """Save a workflow profiler trace.

    Args:
        directory: The directory to save the profiler trace.
        profiler_trace: The profiler trace.
    """
    directory = os.path.abspath(directory)
    os.makedirs(directory, exist_ok=True)
    formatted_time = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")
    track_path = os.path.join(
        directory, f"workflow_execution_tack_{formatted_time}.json"
    )
    with open(track_path, "w") as f:
        json.dump(profiler_trace, f)
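
A sketch with an illustrative trace payload; the JSON file lands under ./inference_profiling with a timestamped name:

from inference_sdk.http.utils.profilling import save_workflows_profiler_trace

trace = [{"event": "step_start", "step": "model_1"}]  # illustrative entries
save_workflows_profiler_trace(
    directory="./inference_profiling",
    profiler_trace=trace,
)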

inference_sdk.http.utils.request_building

Classes

RequestData dataclass

Data class for request data.

Attributes:

    url (str): The URL of the request.
    request_elements (int): The number of request elements.
    headers (Optional[Dict[str, str]]): The headers of the request.
    parameters (Optional[Dict[str, Union[str, List[str]]]]): The parameters of the request.
    data (Optional[Union[str, bytes]]): The data of the request.
    payload (Optional[Dict[str, Any]]): The payload of the request.
    image_scaling_factors (List[Optional[float]]): The scaling factors of the images.

Source code in inference_sdk/http/utils/request_building.py
@dataclass(frozen=True)
class RequestData:
    """Data class for request data.

    Attributes:
        url: The URL of the request.
        request_elements: The number of request elements.
        headers: The headers of the request.
        parameters: The parameters of the request.
        data: The data of the request.
        payload: The payload of the request.
        image_scaling_factors: The scaling factors of the images.
    """

    url: str
    request_elements: int
    headers: Optional[Dict[str, str]]
    parameters: Optional[Dict[str, Union[str, List[str]]]]
    data: Optional[Union[str, bytes]]
    payload: Optional[Dict[str, Any]]
    image_scaling_factors: List[Optional[float]]

Functions

assembly_request_data

assembly_request_data(
    url,
    batch_inference_inputs,
    headers,
    parameters,
    payload,
    image_placement,
)

Assemble request data.

Parameters:

    url (str): The URL of the request. Required.
    batch_inference_inputs (List[Tuple[str, Optional[float]]]): The batch inference inputs. Required.
    headers (Optional[Dict[str, str]]): The headers of the request. Required.
    parameters (Optional[Dict[str, Union[str, List[str]]]]): The parameters of the request. Required.
    payload (Optional[Dict[str, Any]]): The payload of the request. Required.
    image_placement (ImagePlacement): The image placement. Required.

Returns:

    RequestData: The request data.

Source code in inference_sdk/http/utils/request_building.py
def assembly_request_data(
    url: str,
    batch_inference_inputs: List[Tuple[str, Optional[float]]],
    headers: Optional[Dict[str, str]],
    parameters: Optional[Dict[str, Union[str, List[str]]]],
    payload: Optional[Dict[str, Any]],
    image_placement: ImagePlacement,
) -> RequestData:
    """Assemble request data.

    Args:
        url: The URL of the request.
        batch_inference_inputs: The batch inference inputs.
        headers: The headers of the request.
        parameters: The parameters of the request.
        payload: The payload of the request.
        image_placement: The image placement.

    Returns:
        The request data.
    """
    data = None
    if image_placement is ImagePlacement.DATA and len(batch_inference_inputs) != 1:
        raise ValueError("Only single image can be placed in request `data`")
    if image_placement is ImagePlacement.JSON and payload is None:
        payload = {}
    if image_placement is ImagePlacement.JSON:
        payload = deepcopy(payload)
        payload = inject_images_into_payload(
            payload=payload,
            encoded_images=batch_inference_inputs,
        )
    elif image_placement is ImagePlacement.DATA:
        data = batch_inference_inputs[0][0]
    else:
        raise NotImplementedError(
            f"Not implemented request building method for {image_placement}"
        )
    scaling_factors = [e[1] for e in batch_inference_inputs]

    execution_id_value = execution_id.get()
    if execution_id_value:
        headers = headers.copy()
        headers[EXECUTION_ID_HEADER] = execution_id_value
        if ENABLE_INTERNAL_REMOTE_EXEC_HEADER:
            _internal_secret = os.getenv("ROBOFLOW_INTERNAL_SERVICE_SECRET")
            if _internal_secret:
                headers[INTERNAL_REMOTE_EXEC_REQ_HEADER] = _internal_secret

    return RequestData(
        url=url,
        request_elements=len(batch_inference_inputs),
        headers=headers,
        parameters=parameters,
        data=data,
        payload=payload,
        image_scaling_factors=scaling_factors,
    )
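
A sketch building a single JSON-placement request; the endpoint URL and base64 payload are placeholders, and ImagePlacement is assumed to be importable from this module (verify against your SDK version):

from inference_sdk.http.utils.request_building import (
    ImagePlacement,
    assembly_request_data,
)

request_data = assembly_request_data(
    url="https://detect.roboflow.com/some-model/1",
    batch_inference_inputs=[("<base64 image>", 0.5)],
    headers={"Content-Type": "application/json"},
    parameters={"api_key": "***"},
    payload={},
    image_placement=ImagePlacement.JSON,
)
print(request_data.request_elements, request_data.image_scaling_factors)  # 1 [0.5]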

prepare_requests_data

prepare_requests_data(
    url,
    encoded_inference_inputs,
    headers,
    parameters,
    payload,
    max_batch_size,
    image_placement,
)

Prepare requests data.

Parameters:

    url (str): The URL of the request. Required.
    encoded_inference_inputs (List[Tuple[str, Optional[float]]]): The encoded inference inputs. Required.
    headers (Optional[Dict[str, str]]): The headers of the request. Required.
    parameters (Optional[Dict[str, Union[str, List[str]]]]): The parameters of the request. Required.
    payload (Optional[Dict[str, Any]]): The payload of the request. Required.
    max_batch_size (int): The maximum batch size. Required.
    image_placement (ImagePlacement): The image placement. Required.

Returns:

    List[RequestData]: The list of request data.

Source code in inference_sdk/http/utils/request_building.py
def prepare_requests_data(
    url: str,
    encoded_inference_inputs: List[Tuple[str, Optional[float]]],
    headers: Optional[Dict[str, str]],
    parameters: Optional[Dict[str, Union[str, List[str]]]],
    payload: Optional[Dict[str, Any]],
    max_batch_size: int,
    image_placement: ImagePlacement,
) -> List[RequestData]:
    """Prepare requests data.

    Args:
        url: The URL of the request.
        encoded_inference_inputs: The encoded inference inputs.
        headers: The headers of the request.
        parameters: The parameters of the request.
        payload: The payload of the request.
        max_batch_size: The maximum batch size.
        image_placement: The image placement.

    Returns:
        The list of request data.
    """
    batches = list(
        make_batches(
            iterable=encoded_inference_inputs,
            batch_size=max_batch_size,
        )
    )
    requests_data = []
    for batch_inference_inputs in batches:
        request_data = assembly_request_data(
            url=url,
            batch_inference_inputs=batch_inference_inputs,
            headers=headers,
            parameters=parameters,
            payload=payload,
            image_placement=image_placement,
        )
        requests_data.append(request_data)
    return requests_data
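
A sketch of the batching behavior: three encoded inputs with max_batch_size=2 yield two requests, with batches of two and one images (placeholder URL and image strings):

from inference_sdk.http.utils.request_building import (
    ImagePlacement,
    prepare_requests_data,
)

requests_data = prepare_requests_data(
    url="https://detect.roboflow.com/some-model/1",
    encoded_inference_inputs=[("<img-1>", None), ("<img-2>", None), ("<img-3>", None)],
    headers=None,
    parameters=None,
    payload=None,
    max_batch_size=2,
    image_placement=ImagePlacement.JSON,
)
print([r.request_elements for r in requests_data])  # [2, 1]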

inference_sdk.http.utils.requests

Functions

api_key_safe_raise_for_status

api_key_safe_raise_for_status(response)

Raise an exception if the request is not successful.

Parameters:

    response (Response): The response of the request. Required.

Source code in inference_sdk/http/utils/requests.py
def api_key_safe_raise_for_status(response: Response) -> None:
    """Raise an exception if the request is not successful.

    Args:
        response: The response of the request.
    """
    request_is_successful = response.status_code < 400
    if request_is_successful:
        return None
    response.url = deduct_api_key_from_string(value=response.url)
    response.raise_for_status()

deduct_api_key

deduct_api_key(match)

Mask the API key captured in a regex match.

Parameters:

    match (Match): The regex match containing the API key. Required.

Returns:

    str: The api_key fragment with the key masked (at most a two-character prefix and suffix revealed).

Source code in inference_sdk/http/utils/requests.py
def deduct_api_key(match: re.Match) -> str:
    """Deduct the API key from the string.

    Args:
        match: The match of the API key.

    Returns:
        The string with the API key deducted.
    """
    key_value = match.group(KEY_VALUE_GROUP)
    if len(key_value) < MIN_KEY_LENGTH_TO_REVEAL_PREFIX:
        return f"api_key=***"
    key_prefix = key_value[:2]
    key_postfix = key_value[-2:]
    return f"api_key={key_prefix}***{key_postfix}"

deduct_api_key_from_string

deduct_api_key_from_string(value)

Mask any API key embedded in the string (for example, in a URL query).

Parameters:

    value (str): The string to mask the API key in. Required.

Returns:

    str: The string with the API key masked.

Source code in inference_sdk/http/utils/requests.py
def deduct_api_key_from_string(value: str) -> str:
    """Deduct the API key from the string.

    Args:
        value: The string to deduct the API key from.

    Returns:
        The string with the API key deducted.
    """
    return API_KEY_PATTERN.sub(deduct_api_key, value)
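
A sketch of the masking; the exact output depends on MIN_KEY_LENGTH_TO_REVEAL_PREFIX, with sufficiently long keys keeping a two-character prefix and suffix:

from inference_sdk.http.utils.requests import deduct_api_key_from_string

url = "https://detect.roboflow.com/some-model/1?api_key=abcdefgh123"
print(deduct_api_key_from_string(value=url))
# e.g. https://detect.roboflow.com/some-model/1?api_key=ab***23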

inject_images_into_payload

inject_images_into_payload(
    payload, encoded_images, key="image"
)

Inject images into the payload.

Parameters:

    payload (dict): The payload to inject the images into. Required.
    encoded_images (List[Tuple[str, Optional[float]]]): The encoded images. Required.
    key (str): The key of the images. Default: 'image'.

Returns:

    dict: The payload with the images injected.

Source code in inference_sdk/http/utils/requests.py
def inject_images_into_payload(
    payload: dict,
    encoded_images: List[Tuple[str, Optional[float]]],
    key: str = "image",
) -> dict:
    """Inject images into the payload.

    Args:
        payload: The payload to inject the images into.
        encoded_images: The encoded images.
        key: The key of the images.

    Returns:
        The payload with the images injected.
    """
    if len(encoded_images) == 0:
        return payload
    if len(encoded_images) > 1:
        images_payload = [
            {"type": "base64", "value": image} for image, _ in encoded_images
        ]
        payload[key] = images_payload
    else:
        payload[key] = {"type": "base64", "value": encoded_images[0][0]}
    return payload
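
A sketch (placeholder strings): a single image is injected as a dict, multiple images as a list:

from inference_sdk.http.utils.requests import inject_images_into_payload

payload = inject_images_into_payload(
    payload={"model_id": "some-model/1"},
    encoded_images=[("<base64-image-1>", None), ("<base64-image-2>", 0.5)],
)
print(payload["image"])  # list of {"type": "base64", "value": ...} entries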

inject_nested_batches_of_images_into_payload

inject_nested_batches_of_images_into_payload(
    payload, encoded_images, key="image"
)

Inject nested batches of images into the payload.

Parameters:

    payload (dict): The payload to inject the images into. Required.
    encoded_images (Union[list, Tuple[str, Optional[float]]]): The encoded images. Required.
    key (str): The key of the images. Default: 'image'.

Returns:

    dict: The payload with the images injected.

Source code in inference_sdk/http/utils/requests.py
def inject_nested_batches_of_images_into_payload(
    payload: dict,
    encoded_images: Union[list, Tuple[str, Optional[float]]],
    key: str = "image",
) -> dict:
    """Inject nested batches of images into the payload.

    Args:
        payload: The payload to inject the images into.
        encoded_images: The encoded images.
        key: The key of the images.

    Returns:
        The payload with the images injected.
    """
    payload_value = _batch_of_images_into_inference_format(
        encoded_images=encoded_images,
    )
    payload[key] = payload_value
    return payload

utils

General-purpose helpers: lifecycle decorators (@deprecated, @experimental), environment variable parsing, and SDK logging.

inference_sdk.utils.decorators

Functions

deprecated

deprecated(reason)

Create a decorator that marks functions as deprecated.

This decorator will emit a warning when the decorated function is called, indicating that the function is deprecated and providing a reason.

Parameters:

    reason (str): The reason why the function is deprecated. Required.

Returns:

    callable: A decorator function that can be applied to mark functions as deprecated.

Source code in inference_sdk/utils/decorators.py
def deprecated(reason: str):
    """Create a decorator that marks functions as deprecated.

    This decorator will emit a warning when the decorated function is called,
    indicating that the function is deprecated and providing a reason.

    Args:
        reason (str): The reason why the function is deprecated.

    Returns:
        callable: A decorator function that can be applied to mark functions as deprecated.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                category=InferenceSDKDeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator
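
A sketch of decorating a function (hypothetical names):

from inference_sdk.utils.decorators import deprecated

@deprecated(reason="use new_function instead")
def old_function() -> int:
    return 42

old_function()  # emits InferenceSDKDeprecationWarning, then returns 42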

experimental

experimental(info)

Create a decorator that marks functions as experimental.

This decorator will emit a warning when the decorated function is called, indicating that the function is experimental and providing additional information.

Parameters:

    info (str): Information about the experimental status of the function. Required.

Returns:

    callable: A decorator function that can be applied to mark functions as experimental.

Source code in inference_sdk/utils/decorators.py
def experimental(info: str):
    """Create a decorator that marks functions as experimental.

    This decorator will emit a warning when the decorated function is called,
    indicating that the function is experimental and providing additional information.

    Args:
        info (str): Information about the experimental status of the function.

    Returns:
        callable: A decorator function that can be applied to mark functions as experimental.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is experimental: {info}",
                category=InferenceSDKDeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator

inference_sdk.utils.environment

Functions

str2bool

str2bool(value)

Convert a string or boolean value to a boolean.

Parameters:

    value (Union[str, bool]): The value to convert. Can be either a string ('true'/'false') or a boolean value. Required.

Returns:

    bool: The boolean value. Returns True for 'true' (case-insensitive) or True input, False for 'false' (case-insensitive) or False input.

Raises:

    ValueError: If the input string is not 'true' or 'false' (case-insensitive).

Source code in inference_sdk/utils/environment.py
def str2bool(value: Union[str, bool]) -> bool:
    """Convert a string or boolean value to a boolean.

    Args:
        value (Union[str, bool]): The value to convert. Can be either a string ('true'/'false')
            or a boolean value.

    Returns:
        bool: The boolean value. Returns True for 'true' (case-insensitive) or True input,
            False for 'false' (case-insensitive) or False input.

    Raises:
        ValueError: If the input string is not 'true' or 'false' (case-insensitive).
    """
    if isinstance(value, bool):
        return value
    if value.lower() == "true":
        return True
    elif value.lower() == "false":
        return False
    else:
        raise ValueError(
            f"Expected a boolean environment variable (true or false) but got '{value}'"
        )
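
A short sketch of the accepted inputs:

from inference_sdk.utils.environment import str2bool

print(str2bool("True"))   # True (case-insensitive)
print(str2bool(False))    # False (booleans pass through)
try:
    str2bool("yes")
except ValueError as error:
    print(error)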

inference_sdk.utils.logging

Centralized logging configuration for the Inference SDK.

Functions

get_logger

get_logger(module_name)

Get a logger for the specified module.

Automatically configures basic logging on first use if no handlers exist.

Parameters:

    module_name (str): Name of the module requesting the logger. Required.

Returns:

    logging.Logger: Configured logger for the module.

Source code in inference_sdk/utils/logging.py
def get_logger(module_name: str) -> logging.Logger:
    """Get a logger for the specified module.

    Automatically configures basic logging on first use if no handlers exist.

    Args:
        module_name: Name of the module requesting the logger.

    Returns:
        logging.Logger: Configured logger for the module.
    """
    global _configured

    sdk_logger = logging.getLogger(SDK_LOGGER_NAME)

    # Configure basic logging on first use if needed
    if not _configured and not sdk_logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter("%(levelname)s [%(name)s] %(message)s"))
        sdk_logger.addHandler(handler)
        sdk_logger.setLevel(logging.INFO)
        sdk_logger.propagate = False
        _configured = True

    return logging.getLogger(f"{SDK_LOGGER_NAME}.{module_name}")
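
A minimal sketch; the returned logger lives under the SDK logger namespace and writes to stderr by default:

from inference_sdk.utils.logging import get_logger

logger = get_logger(module_name="my_module")
logger.info("client initialized")  # e.g. INFO [inference_sdk.my_module] client initialized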

webrtc

WebRTC streaming client for real-time video inference over peer connections. Supports webcam, RTSP, MJPEG, and video file sources with configurable output routing.

inference_sdk.webrtc.client

WebRTC client for the Inference SDK.

Classes

WebRTCClient

Namespaced WebRTC API bound to an InferenceHTTPClient instance.

Provides a unified streaming interface for different video sources (webcam, RTSP, video files, manual frames).

Source code in inference_sdk/webrtc/client.py
class WebRTCClient:
    """Namespaced WebRTC API bound to an InferenceHTTPClient instance.

    Provides a unified streaming interface for different video sources
    (webcam, RTSP, video files, manual frames).
    """

    @experimental(
        info="WebRTC SDK is experimental and under active development. "
        "API may change in future releases. Please report issues at "
        "https://github.com/roboflow/inference/issues"
    )
    def __init__(self, api_url: str, api_key: Optional[str]) -> None:
        """Initialize WebRTC client.

        Args:
            api_url: Base URL for the inference API
            api_key: API key for authentication (optional)
        """
        self._api_url = api_url
        self._api_key = api_key

    def stream(
        self,
        source: StreamSource,
        *,
        workflow: Union[str, dict],
        image_input: str = "image",
        workspace: Optional[str] = None,
        config: Optional[StreamConfig] = None,
    ) -> WebRTCSession:
        """Create a WebRTC streaming session.

        Args:
            source: Stream source (WebcamSource, RTSPSource, VideoFileSource, or ManualSource)
            workflow: Either a workflow ID (str) or workflow specification (dict)
            image_input: Name of the image input in the workflow
            workspace: Workspace name (required if workflow is an ID string)
            config: Stream configuration (output routing, FPS, TURN server, etc.)

        Returns:
            WebRTCSession context manager

        Raises:
            InvalidParameterError: If workflow/workspace parameters are invalid

        Examples:
            # Pattern 1: Using run() with decorators (recommended, auto-cleanup)
            from inference_sdk.webrtc import WebcamSource

            session = client.webrtc.stream(
                source=WebcamSource(resolution=(1920, 1080)),
                workflow="object-detection",
                workspace="my-workspace"
            )

            @session.on_frame
            def process_frame(frame, metadata):
                cv2.imshow("Frame", frame)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    session.close()

            session.run()  # Auto-closes on exception or stream end

            # Pattern 2: Using video() iterator (requires context manager or explicit close)
            from inference_sdk.webrtc import RTSPSource

            # Option A: With context manager (recommended)
            with client.webrtc.stream(
                source=RTSPSource("rtsp://camera.local/stream"),
                workflow=workflow_spec_dict
            ) as session:
                for frame, metadata in session.video():
                    cv2.imshow("Frame", frame)
                    if cv2.waitKey(1) & 0xFF == ord('q'):
                        break
            # Auto-cleanup on exit

            # Option B: Manual cleanup (not recommended)
            session = client.webrtc.stream(source=RTSPSource("rtsp://..."), ...)
            for frame, metadata in session.video():
                process(frame)
            session.close()  # Must call close() explicitly!
        """
        # Validate workflow configuration
        workflow_config = self._parse_workflow_config(workflow, workspace)

        # Use default config if not provided
        if config is None:
            config = StreamConfig()

        # Create session
        return WebRTCSession(
            api_url=self._api_url,
            api_key=self._api_key,
            source=source,
            image_input_name=image_input,
            workflow_config=workflow_config,
            stream_config=config,
        )

    def _parse_workflow_config(
        self, workflow: Union[str, dict], workspace: Optional[str]
    ) -> dict:
        """Parse workflow configuration from inputs.

        Args:
            workflow: Either workflow ID (str) or specification (dict)
            workspace: Workspace name (required for ID mode)

        Returns:
            Dictionary with workflow configuration

        Raises:
            InvalidParameterError: If configuration is invalid
        """
        if isinstance(workflow, str):
            # Workflow ID mode - requires workspace
            if not workspace:
                raise InvalidParameterError(
                    "workspace parameter required when workflow is an ID string"
                )
            return {"workflow_id": workflow, "workspace_name": workspace}
        elif isinstance(workflow, dict):
            # Workflow specification mode
            return {"workflow_specification": workflow}
        else:
            raise InvalidParameterError(
                f"workflow must be a string (ID) or dict (specification), got {type(workflow)}"
            )
Functions
__init__
__init__(api_url, api_key)

Initialize WebRTC client.

Parameters:

    api_url (str): Base URL for the inference API. Required.
    api_key (Optional[str]): API key for authentication (optional). Required.

Source code in inference_sdk/webrtc/client.py
@experimental(
    info="WebRTC SDK is experimental and under active development. "
    "API may change in future releases. Please report issues at "
    "https://github.com/roboflow/inference/issues"
)
def __init__(self, api_url: str, api_key: Optional[str]) -> None:
    """Initialize WebRTC client.

    Args:
        api_url: Base URL for the inference API
        api_key: API key for authentication (optional)
    """
    self._api_url = api_url
    self._api_key = api_key
stream
stream(
    source,
    *,
    workflow,
    image_input="image",
    workspace=None,
    config=None
)

Create a WebRTC streaming session.

Parameters:

    source (StreamSource): Stream source (WebcamSource, RTSPSource, VideoFileSource, or ManualSource). Required.
    workflow (Union[str, dict]): Either a workflow ID (str) or workflow specification (dict). Required.
    image_input (str): Name of the image input in the workflow. Default: 'image'.
    workspace (Optional[str]): Workspace name (required if workflow is an ID string). Default: None.
    config (Optional[StreamConfig]): Stream configuration (output routing, FPS, TURN server, etc.). Default: None.

Returns:

    WebRTCSession: WebRTCSession context manager.

Raises:

    InvalidParameterError: If workflow/workspace parameters are invalid.

Examples:

    # Pattern 1: Using run() with decorators (recommended, auto-cleanup)
    from inference_sdk.webrtc import WebcamSource

    session = client.webrtc.stream(
        source=WebcamSource(resolution=(1920, 1080)),
        workflow="object-detection",
        workspace="my-workspace"
    )

    @session.on_frame
    def process_frame(frame, metadata):
        cv2.imshow("Frame", frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            session.close()

    session.run()  # Auto-closes on exception or stream end

    # Pattern 2: Using video() iterator (requires context manager or explicit close)
    from inference_sdk.webrtc import RTSPSource

    # Option A: With context manager (recommended)
    with client.webrtc.stream(
        source=RTSPSource("rtsp://camera.local/stream"),
        workflow=workflow_spec_dict
    ) as session:
        for frame, metadata in session.video():
            cv2.imshow("Frame", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    # Auto-cleanup on exit

    # Option B: Manual cleanup (not recommended)
    session = client.webrtc.stream(source=RTSPSource("rtsp://..."), ...)
    for frame, metadata in session.video():
        process(frame)
    session.close()  # Must call close() explicitly!

Source code in inference_sdk/webrtc/client.py
def stream(
    self,
    source: StreamSource,
    *,
    workflow: Union[str, dict],
    image_input: str = "image",
    workspace: Optional[str] = None,
    config: Optional[StreamConfig] = None,
) -> WebRTCSession:
    """Create a WebRTC streaming session.

    Args:
        source: Stream source (WebcamSource, RTSPSource, VideoFileSource, or ManualSource)
        workflow: Either a workflow ID (str) or workflow specification (dict)
        image_input: Name of the image input in the workflow
        workspace: Workspace name (required if workflow is an ID string)
        config: Stream configuration (output routing, FPS, TURN server, etc.)

    Returns:
        WebRTCSession context manager

    Raises:
        InvalidParameterError: If workflow/workspace parameters are invalid

    Examples:
        # Pattern 1: Using run() with decorators (recommended, auto-cleanup)
        from inference_sdk.webrtc import WebcamSource

        session = client.webrtc.stream(
            source=WebcamSource(resolution=(1920, 1080)),
            workflow="object-detection",
            workspace="my-workspace"
        )

        @session.on_frame
        def process_frame(frame, metadata):
            cv2.imshow("Frame", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                session.close()

        session.run()  # Auto-closes on exception or stream end

        # Pattern 2: Using video() iterator (requires context manager or explicit close)
        from inference_sdk.webrtc import RTSPSource

        # Option A: With context manager (recommended)
        with client.webrtc.stream(
            source=RTSPSource("rtsp://camera.local/stream"),
            workflow=workflow_spec_dict
        ) as session:
            for frame, metadata in session.video():
                cv2.imshow("Frame", frame)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
        # Auto-cleanup on exit

        # Option B: Manual cleanup (not recommended)
        session = client.webrtc.stream(source=RTSPSource("rtsp://..."), ...)
        for frame, metadata in session.video():
            process(frame)
        session.close()  # Must call close() explicitly!
    """
    # Validate workflow configuration
    workflow_config = self._parse_workflow_config(workflow, workspace)

    # Use default config if not provided
    if config is None:
        config = StreamConfig()

    # Create session
    return WebRTCSession(
        api_url=self._api_url,
        api_key=self._api_key,
        source=source,
        image_input_name=image_input,
        workflow_config=workflow_config,
        stream_config=config,
    )

Functions

inference_sdk.webrtc.config

Configuration for WebRTC streaming sessions.

Classes

StreamConfig dataclass

Unified configuration for all WebRTC stream types.

This configuration applies to all stream sources (webcam, RTSP, video file, manual) and controls output routing, processing behavior, and network settings.

Source code in inference_sdk/webrtc/config.py
@dataclass
class StreamConfig:
    """Unified configuration for all WebRTC stream types.

    This configuration applies to all stream sources (webcam, RTSP, video file, manual)
    and controls output routing, processing behavior, and network settings.
    """

    # Output configuration
    stream_output: List[str] = field(default_factory=list)
    """List of workflow output names to stream as video"""

    data_output: List[str] = field(default_factory=list)
    """List of workflow output names to receive via data channel"""

    # Processing configuration
    realtime_processing: bool = True
    """Whether to process frames in realtime (drop if can't keep up) or queue all frames"""

    declared_fps: Optional[float] = None
    """Optional FPS declaration for the stream.

    Note: Some sources (like WebcamSource) auto-detect FPS from the video device and will
    override this value. The source's detected FPS takes precedence over this configuration.
    For sources without auto-detection (like ManualSource), this value will be used if provided.
    """

    # Network configuration
    turn_server: Optional[Dict[str, str]] = None
    """TURN server configuration: {"urls": "turn:...", "username": "...", "credential": "..."}

    Provide this configuration when your network requires a TURN server for WebRTC connectivity.
    TURN is automatically skipped for localhost connections. If not provided, the connection
    will attempt to establish directly without TURN relay.
    """

    # Workflow parameters
    workflow_parameters: Dict[str, Any] = field(default_factory=dict)
    """Parameters to pass to the workflow execution"""

    # Serverless configuration
    requested_plan: Optional[str] = None
    """Requested compute plan for serverless processing (e.g., 'webrtc-gpu-small').

    Only applicable when connecting to Roboflow serverless endpoints.
    """

    requested_region: Optional[str] = None
    """Requested region for processing (e.g., 'us', 'eu').

    Must be a valid Modal region. Only applicable when connecting to Roboflow serverless endpoints.
    See: https://modal.com/docs/guide/region-selection#region-options
    """

    processing_timeout: Optional[int] = None
    """Timeout in seconds for the server-side processing session.

    Controls how long the serverless function or worker process is allowed to run.
    If not set, the server uses its default (WEBRTC_MODAL_FUNCTION_TIME_LIMIT).
    Only applicable when connecting to Roboflow serverless endpoints.
    """
Attributes
data_output class-attribute instance-attribute
data_output = field(default_factory=list)

List of workflow output names to receive via data channel

declared_fps class-attribute instance-attribute
declared_fps = None

Optional FPS declaration for the stream.

Note: Some sources (like WebcamSource) auto-detect FPS from the video device and will override this value. The source's detected FPS takes precedence over this configuration. For sources without auto-detection (like ManualSource), this value will be used if provided.

processing_timeout class-attribute instance-attribute
processing_timeout = None

Timeout in seconds for the server-side processing session.

Controls how long the serverless function or worker process is allowed to run. If not set, the server uses its default (WEBRTC_MODAL_FUNCTION_TIME_LIMIT). Only applicable when connecting to Roboflow serverless endpoints.

realtime_processing class-attribute instance-attribute
realtime_processing = True

Whether to process frames in realtime (drop if can't keep up) or queue all frames

requested_plan class-attribute instance-attribute
requested_plan = None

Requested compute plan for serverless processing (e.g., 'webrtc-gpu-small').

Only applicable when connecting to Roboflow serverless endpoints.

requested_region class-attribute instance-attribute
requested_region = None

Requested region for processing (e.g., 'us', 'eu').

Must be a valid Modal region. Only applicable when connecting to Roboflow serverless endpoints. See: https://modal.com/docs/guide/region-selection#region-options

stream_output class-attribute instance-attribute
stream_output = field(default_factory=list)

List of workflow output names to stream as video

turn_server class-attribute instance-attribute
turn_server = None

TURN server configuration: {"urls": "turn:...", "username": "...", "credential": "..."}

Provide this configuration when your network requires a TURN server for WebRTC connectivity. TURN is automatically skipped for localhost connections. If not provided, the connection will attempt to establish directly without TURN relay.

workflow_parameters class-attribute instance-attribute
workflow_parameters = field(default_factory=dict)

Parameters to pass to the workflow execution
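
To make these options concrete, here is a minimal configuration sketch. The output names and TURN credentials are placeholders, not values shipped with the SDK:

from inference_sdk.webrtc.config import StreamConfig

config = StreamConfig(
    # "annotated_image" and "predictions" are hypothetical workflow output names
    stream_output=["annotated_image"],  # rendered frames come back over the video track
    data_output=["predictions"],        # structured results arrive over the data channel
    realtime_processing=True,           # drop frames rather than queue when falling behind
    declared_fps=30.0,                  # ignored by sources that auto-detect FPS (e.g., WebcamSource)
    turn_server={                       # only needed when the network requires a TURN relay
        "urls": "turn:turn.example.com:3478",
        "username": "user",
        "credential": "secret",
    },
)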

inference_sdk.webrtc.datachannel

WebRTC data channel binary chunking utilities.

Classes

ChunkReassembler

Helper to reassemble chunked binary messages.

Source code in inference_sdk/webrtc/datachannel.py
class ChunkReassembler:
    """Helper to reassemble chunked binary messages."""

    def __init__(self):
        """Initialize the chunk reassembler."""
        self._chunks: Dict[int, Dict[int, bytes]] = (
            {}
        )  # {frame_id: {chunk_index: data}}
        self._total: Dict[int, int] = {}  # {frame_id: total_chunks}

    def add_chunk(self, message: bytes) -> Tuple[Optional[bytes], Optional[int]]:
        """Parse and add a chunk, returning complete payload and frame_id if all chunks received.

        Args:
            message: Raw binary message with 12-byte header

        Returns:
            Tuple of (payload, frame_id) if complete, (None, None) otherwise
        """
        # Parse the binary message
        frame_id, chunk_index, total_chunks, chunk_data = _parse_chunked_binary_message(
            message
        )

        # Initialize buffers for new frame
        if frame_id not in self._chunks:
            self._chunks[frame_id] = {}
            self._total[frame_id] = total_chunks

        # Store chunk
        self._chunks[frame_id][chunk_index] = chunk_data

        # Check if all chunks received
        if len(self._chunks[frame_id]) >= total_chunks:
            # Reassemble in order
            complete_payload = b"".join(
                self._chunks[frame_id][i] for i in range(total_chunks)
            )

            # Clean up buffers for completed frame - this is the key part!
            del self._chunks[frame_id]
            del self._total[frame_id]

            return complete_payload, frame_id

        return None, None
Functions
__init__
__init__()

Initialize the chunk reassembler.

Source code in inference_sdk/webrtc/datachannel.py
def __init__(self):
    """Initialize the chunk reassembler."""
    self._chunks: Dict[int, Dict[int, bytes]] = (
        {}
    )  # {frame_id: {chunk_index: data}}
    self._total: Dict[int, int] = {}  # {frame_id: total_chunks}
add_chunk
add_chunk(message)

Parse and add a chunk, returning complete payload and frame_id if all chunks received.

Parameters:

    message (bytes): Raw binary message with 12-byte header. Required.

Returns:

    Tuple[Optional[bytes], Optional[int]]: Tuple of (payload, frame_id) if complete, (None, None) otherwise.

Source code in inference_sdk/webrtc/datachannel.py
def add_chunk(self, message: bytes) -> Tuple[Optional[bytes], Optional[int]]:
    """Parse and add a chunk, returning complete payload and frame_id if all chunks received.

    Args:
        message: Raw binary message with 12-byte header

    Returns:
        Tuple of (payload, frame_id) if complete, (None, None) otherwise
    """
    # Parse the binary message
    frame_id, chunk_index, total_chunks, chunk_data = _parse_chunked_binary_message(
        message
    )

    # Initialize buffers for new frame
    if frame_id not in self._chunks:
        self._chunks[frame_id] = {}
        self._total[frame_id] = total_chunks

    # Store chunk
    self._chunks[frame_id][chunk_index] = chunk_data

    # Check if all chunks received
    if len(self._chunks[frame_id]) >= total_chunks:
        # Reassemble in order
        complete_payload = b"".join(
            self._chunks[frame_id][i] for i in range(total_chunks)
        )

        # Clean up buffers for completed frame - this is the key part!
        del self._chunks[frame_id]
        del self._total[frame_id]

        return complete_payload, frame_id

    return None, None
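
For orientation, a round-trip sketch of the reassembly flow. The 12-byte header layout below (frame_id, chunk_index, total_chunks as little-endian uint32s) is an assumption made for illustration, mirroring the uint32 little-endian convention documented for upload chunks; _parse_chunked_binary_message is not shown here, so the real layout may differ:

import struct

from inference_sdk.webrtc.datachannel import ChunkReassembler

def make_chunk(frame_id: int, chunk_index: int, total_chunks: int, data: bytes) -> bytes:
    # ASSUMED header layout: three little-endian uint32s, then the payload
    return struct.pack("<III", frame_id, chunk_index, total_chunks) + data

reassembler = ChunkReassembler()
payload, frame_id = reassembler.add_chunk(make_chunk(7, 0, 2, b"hello "))
assert payload is None  # incomplete until every chunk for frame 7 arrives
payload, frame_id = reassembler.add_chunk(make_chunk(7, 1, 2, b"world"))
assert (payload, frame_id) == (b"hello world", 7)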

VideoFileUploader

Uploads a video file through a WebRTC datachannel in chunks.

Protocol: [chunk_index:u32][total_chunks:u32][payload]
Server auto-completes when all chunks are received.

Source code in inference_sdk/webrtc/datachannel.py
class VideoFileUploader:
    """Uploads a video file through a WebRTC datachannel in chunks.

    Protocol: [chunk_index:u32][total_chunks:u32][payload]
    Server auto-completes when all chunks received.
    """

    def __init__(
        self,
        path: str,
        channel: "RTCDataChannel",
        chunk_size: int = WEBRTC_VIDEO_UPLOAD_CHUNK_SIZE,
        buffer_limit: int = WEBRTC_VIDEO_UPLOAD_BUFFER_LIMIT,
    ):
        self._path = path
        self._channel = channel
        self._chunk_size = chunk_size
        self._buffer_limit = buffer_limit
        self._file_size = os.path.getsize(path)
        self._total_chunks = (self._file_size + chunk_size - 1) // chunk_size
        self._uploaded_chunks = 0

    @property
    def total_chunks(self) -> int:
        """Total number of chunks to upload."""
        return self._total_chunks

    @property
    def uploaded_chunks(self) -> int:
        """Number of chunks uploaded so far."""
        return self._uploaded_chunks

    @property
    def file_size(self) -> int:
        """Size of the file in bytes."""
        return self._file_size

    async def upload(
        self, on_progress: Optional[Callable[[int, int], None]] = None
    ) -> None:
        """Upload the file in chunks with backpressure handling.

        Args:
            on_progress: Optional callback called after each chunk with
                (uploaded_chunks, total_chunks)

        Raises:
            RuntimeError: If channel closes during upload
        """
        with open(self._path, "rb") as f:
            for chunk_idx in range(self._total_chunks):
                if self._channel.readyState != "open":
                    raise RuntimeError("Upload channel closed during upload")

                chunk_data = f.read(self._chunk_size)
                message = create_video_upload_chunk(
                    chunk_idx, self._total_chunks, chunk_data
                )

                # Backpressure: wait for buffer to drain
                while self._channel.bufferedAmount > self._buffer_limit:
                    await asyncio.sleep(0.01)
                    if self._channel.readyState != "open":
                        raise RuntimeError(
                            "Upload channel closed during backpressure wait"
                        )

                self._channel.send(message)
                self._uploaded_chunks = chunk_idx + 1

                if on_progress:
                    on_progress(self._uploaded_chunks, self._total_chunks)

                if chunk_idx % 10 == 0:
                    await asyncio.sleep(0)
Attributes
file_size property
file_size

Size of the file in bytes.

total_chunks property
total_chunks

Total number of chunks to upload.

uploaded_chunks property
uploaded_chunks

Number of chunks uploaded so far.

Functions
upload async
upload(on_progress=None)

Upload the file in chunks with backpressure handling.

Parameters:

    on_progress (Optional[Callable[[int, int], None]]): Optional callback called after each chunk with (uploaded_chunks, total_chunks). Default: None.

Raises:

    RuntimeError: If channel closes during upload.

Source code in inference_sdk/webrtc/datachannel.py
async def upload(
    self, on_progress: Optional[Callable[[int, int], None]] = None
) -> None:
    """Upload the file in chunks with backpressure handling.

    Args:
        on_progress: Optional callback called after each chunk with
            (uploaded_chunks, total_chunks)

    Raises:
        RuntimeError: If channel closes during upload
    """
    with open(self._path, "rb") as f:
        for chunk_idx in range(self._total_chunks):
            if self._channel.readyState != "open":
                raise RuntimeError("Upload channel closed during upload")

            chunk_data = f.read(self._chunk_size)
            message = create_video_upload_chunk(
                chunk_idx, self._total_chunks, chunk_data
            )

            # Backpressure: wait for buffer to drain
            while self._channel.bufferedAmount > self._buffer_limit:
                await asyncio.sleep(0.01)
                if self._channel.readyState != "open":
                    raise RuntimeError(
                        "Upload channel closed during backpressure wait"
                    )

            self._channel.send(message)
            self._uploaded_chunks = chunk_idx + 1

            if on_progress:
                on_progress(self._uploaded_chunks, self._total_chunks)

            if chunk_idx % 10 == 0:
                await asyncio.sleep(0)
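
A minimal usage sketch, assuming `channel` is an already-open aiortc RTCDataChannel and the coroutine runs on the session's event loop; the file name is a placeholder:

from inference_sdk.webrtc.datachannel import VideoFileUploader

async def upload_clip(channel) -> None:
    # "clip.mp4" is a placeholder path; `channel` is assumed open
    uploader = VideoFileUploader("clip.mp4", channel)
    print(f"{uploader.file_size} bytes in {uploader.total_chunks} chunks")

    def report(done: int, total: int) -> None:
        print(f"uploaded {done}/{total} chunks")

    await uploader.upload(on_progress=report)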

Functions

create_video_upload_chunk

create_video_upload_chunk(chunk_index, total_chunks, data)

Create a video upload chunk message.

Format: [chunk_index:u32][total_chunks:u32][payload]
All integers are uint32 little-endian.

Parameters:

    chunk_index (int): Zero-based index of this chunk. Required.
    total_chunks (int): Total number of chunks in the file. Required.
    data (bytes): Chunk payload bytes. Required.

Returns:

    bytes: Binary message with 8-byte header + payload.

Source code in inference_sdk/webrtc/datachannel.py
def create_video_upload_chunk(
    chunk_index: int, total_chunks: int, data: bytes
) -> bytes:
    """Create a video upload chunk message.

    Format: [chunk_index:u32][total_chunks:u32][payload]
    All integers are uint32 little-endian.

    Args:
        chunk_index: Zero-based index of this chunk
        total_chunks: Total number of chunks in the file
        data: Chunk payload bytes

    Returns:
        Binary message with 8-byte header + payload
    """
    return struct.pack("<II", chunk_index, total_chunks) + data
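
The documented format is easy to verify with a round trip:

import struct

msg = create_video_upload_chunk(chunk_index=0, total_chunks=3, data=b"hello")
chunk_index, total_chunks = struct.unpack("<II", msg[:8])
assert (chunk_index, total_chunks, msg[8:]) == (0, 3, b"hello")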

inference_sdk.webrtc.session

WebRTC session management.

Classes

SessionState

Bases: Enum

WebRTC session lifecycle states.

Source code in inference_sdk/webrtc/session.py
class SessionState(Enum):
    """WebRTC session lifecycle states."""

    NOT_STARTED = "not_started"
    STARTED = "started"
    CLOSED = "closed"

VideoMetadata dataclass

Metadata about a video frame received from WebRTC stream.

This metadata is attached to each frame processed by the server and can be used to track frame timing, synchronization, and processing information.

Attributes:

    frame_id (int): Unique identifier for this frame in the stream.
    received_at (datetime): Timestamp when the server received the frame.
    pts (Optional[int]): Presentation timestamp from the video stream (optional).
    time_base (Optional[float]): Time base for interpreting pts values (optional).
    declared_fps (Optional[float]): Declared/expected frames per second (optional).
    measured_fps (Optional[float]): Measured actual frames per second (optional).

Source code in inference_sdk/webrtc/session.py
@dataclass
class VideoMetadata:
    """Metadata about a video frame received from WebRTC stream.

    This metadata is attached to each frame processed by the server
    and can be used to track frame timing, synchronization, and
    processing information.

    Attributes:
        frame_id: Unique identifier for this frame in the stream
        received_at: Timestamp when the server received the frame
        pts: Presentation timestamp from the video stream (optional)
        time_base: Time base for interpreting pts values (optional)
        declared_fps: Declared/expected frames per second (optional)
        measured_fps: Measured actual frames per second (optional)
    """

    frame_id: int
    received_at: datetime
    pts: Optional[int] = None
    time_base: Optional[float] = None
    declared_fps: Optional[float] = None
    measured_fps: Optional[float] = None
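
As a small illustration, a frame handler might use this metadata for timing diagnostics. Note the staleness estimate below is only meaningful if client and server clocks are synchronized (and both datetimes are naive or both aware):

from datetime import datetime

def log_timing(frame, metadata) -> None:
    # Rough staleness estimate; assumes client/server clock sync
    age = (datetime.now() - metadata.received_at).total_seconds()
    print(f"frame {metadata.frame_id}: pts={metadata.pts}, ~{age:.3f}s since server receipt")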

WebRTCSession

WebRTC session for streaming video and receiving inference results.

This class manages the WebRTC peer connection, video streaming, and data channel communication with the inference server.

The session automatically starts on first use (e.g., calling run() or video()). Call close() to clean up resources, or rely on __del__ for automatic cleanup.

Example

session = client.webrtc.stream(source=source, workflow=workflow)

@session.on_frame
def process_frame(frame, metadata):
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        session.close()

session.run()  # Auto-starts, auto-closes on exception
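
Putting the pieces together, a sketch that consumes both video frames and data-channel results; "predictions", the workflow ID, and the workspace name are placeholders:

from inference_sdk.webrtc import RTSPSource
from inference_sdk.webrtc.config import StreamConfig

session = client.webrtc.stream(
    source=RTSPSource("rtsp://camera.local/stream"),
    workflow="object-detection",      # placeholder workflow ID
    workspace="my-workspace",         # placeholder workspace
    config=StreamConfig(data_output=["predictions"]),
)

@session.on_frame
def show(frame, metadata):
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        session.close()

@session.on_data("predictions")
def handle(value, metadata):
    fid = metadata.frame_id if metadata else "?"  # metadata may be None
    print(f"frame {fid}: {value}")

session.run()  # blocks; auto-closes on stream end or exception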

Source code in inference_sdk/webrtc/session.py
class WebRTCSession:
    """WebRTC session for streaming video and receiving inference results.

    This class manages the WebRTC peer connection, video streaming,
    and data channel communication with the inference server.

    The session automatically starts on first use (e.g., calling run() or video()).
    Call close() to cleanup resources, or rely on __del__ for automatic cleanup.

    Example:
        session = client.webrtc.stream(source=source, workflow=workflow)

        @session.on_frame
        def process_frame(frame, metadata):
            cv2.imshow("Frame", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                session.close()

        session.run()  # Auto-starts, auto-closes on exception
    """

    def __init__(
        self,
        api_url: str,
        api_key: Optional[str],
        source: StreamSource,
        image_input_name: str,
        workflow_config: dict,
        stream_config: StreamConfig,
    ) -> None:
        """Initialize WebRTC session.

        Args:
            api_url: Inference server API URL
            api_key: API key for authentication
            source: Stream source instance
            image_input_name: Name of image input in workflow
            workflow_config: Workflow configuration dict
            stream_config: Stream configuration
        """

        self._state: SessionState = SessionState.NOT_STARTED
        self._state_lock: threading.Lock = threading.Lock()

        self._api_url = api_url.rstrip("/")
        self._api_key = api_key
        self._source = source
        self._image_input_name = image_input_name
        self._workflow_config = workflow_config
        self._config = stream_config

        # Internal state
        self._loop: Optional[asyncio.AbstractEventLoop] = None
        self._loop_thread: Optional[threading.Thread] = None
        self._pc: Optional["RTCPeerConnection"] = None
        self._video_queue: "Queue[Optional[tuple[np.ndarray, VideoMetadata]]]" = Queue(
            maxsize=WEBRTC_VIDEO_QUEUE_MAX_SIZE
        )
        self._video_through_datachannel = False

        # Callback handlers
        self._frame_handlers: List[Callable] = []
        self._data_field_handlers: Dict[str, List[Callable]] = {}
        self._data_global_handler: Optional[Callable] = None

        # Chunk reassembly for binary messages
        self._chunk_reassembler = ChunkReassembler()

        # Public APIs
        self.video = _VideoStream(self, self._video_queue)

    def _init_connection(self) -> None:
        """Initialize event loop, thread, and WebRTC connection."""
        # Start event loop in background thread
        self._loop = asyncio.new_event_loop()

        def _run(loop: asyncio.AbstractEventLoop) -> None:
            asyncio.set_event_loop(loop)
            loop.run_forever()

        self._loop_thread = threading.Thread(
            target=_run, args=(self._loop,), daemon=True
        )
        self._loop_thread.start()

        # Initialize WebRTC connection
        fut = asyncio.run_coroutine_threadsafe(self._init(), self._loop)
        try:
            fut.result()
        except requests.exceptions.HTTPError as e:
            if e.response.status_code == 404:
                raise RuntimeError(
                    f"WebRTC endpoint not found at {self._api_url}/initialise_webrtc_worker.\n"
                    f"This API URL may not support WebRTC streaming.\n"
                    f"Troubleshooting:\n"
                    f"  - For self-hosted inference, ensure the server is started with WebRTC enabled\n"
                    f"  - For Roboflow Cloud, use a dedicated inference server URL (not serverless.roboflow.com)\n"
                    f"  - Verify the --api-url parameter points to the correct server\n"
                    f"Response: {e.response.text}"
                ) from e
            else:
                raise RuntimeError(
                    f"Failed to initialize WebRTC session (HTTP {e.response.status_code}).\n"
                    f"API URL: {self._api_url}\n"
                    f"Error: {e}\n"
                    f"Response: {e.response.text}"
                ) from e
        except Exception as e:
            raise RuntimeError(
                f"Failed to initialize WebRTC session: {e.__class__.__name__}: {e}\n"
                f"API URL: {self._api_url}"
            ) from e

    def _ensure_started(self) -> None:
        """Ensure connection is started (thread-safe, idempotent)."""
        with self._state_lock:
            if self._state == SessionState.NOT_STARTED:
                self._state = SessionState.STARTED
                self._init_connection()
            elif self._state == SessionState.CLOSED:
                raise RuntimeError("Cannot use closed WebRTCSession")

    def _parse_video_metadata(
        self, video_metadata_dict: Optional[dict]
    ) -> Optional[VideoMetadata]:
        """Parse video metadata from message dict.

        Args:
            video_metadata_dict: Dictionary containing video metadata fields

        Returns:
            VideoMetadata instance or None if parsing fails or dict is None
        """
        if not video_metadata_dict:
            return None

        try:
            return VideoMetadata(
                frame_id=video_metadata_dict["frame_id"],
                received_at=datetime.fromisoformat(video_metadata_dict["received_at"]),
                pts=video_metadata_dict.get("pts"),
                time_base=video_metadata_dict.get("time_base"),
                declared_fps=video_metadata_dict.get("declared_fps"),
                measured_fps=video_metadata_dict.get("measured_fps"),
            )
        except (KeyError, ValueError, TypeError) as e:
            logger.warning(f"Failed to parse video_metadata: {e}")
            return None

    def close(self) -> None:
        """Close session and cleanup all resources. Idempotent - safe to call multiple times.

        This method closes the WebRTC peer connection, releases source resources
        (webcam, video files, etc.), stops the event loop, and joins the background thread.

        It's safe to call this multiple times - subsequent calls are no-ops.

        Example:
            session = client.webrtc.stream(source=source, workflow=workflow)
            session.run()  # Auto-starts and auto-closes on exception
            session.close()  # Explicit cleanup (or let __del__ handle it)
        """
        with self._state_lock:
            if self._state == SessionState.CLOSED:
                return  # Already closed, nothing to do
            self._state = SessionState.CLOSED

        # Signal video iterator to stop by putting None sentinel
        try:
            self._video_queue.put_nowait(None)
        except Exception:
            pass  # Queue might be full, but that's okay

        # Cleanup resources (nested finally ensures all cleanup steps execute)
        try:
            # Close peer connection
            if self._loop and self._pc:
                asyncio.run_coroutine_threadsafe(self._pc.close(), self._loop).result()
        finally:
            try:
                # Cleanup source (webcam, video file, etc.)
                if self._loop and self._source:
                    asyncio.run_coroutine_threadsafe(
                        self._source.cleanup(), self._loop
                    ).result()
            finally:
                # Stop event loop and join thread
                if self._loop:
                    self._loop.call_soon_threadsafe(self._loop.stop)
                if self._loop_thread:
                    self._loop_thread.join(timeout=WEBRTC_EVENT_LOOP_SHUTDOWN_TIMEOUT)

    def __enter__(self) -> "WebRTCSession":
        """Enter context manager - returns self.

        Returns:
            WebRTCSession: The session instance for use in with statement.
        """
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        """Exit context manager - automatically closes the session.

        Args:
            exc_type: Exception type if an exception occurred, None otherwise.
            exc_val: Exception value if an exception occurred, None otherwise.
            exc_tb: Exception traceback if an exception occurred, None otherwise.
        """
        self.close()

    def __del__(self) -> None:
        """Cleanup if user forgot to close. Not guaranteed to run immediately."""
        try:
            if self._state == SessionState.STARTED:
                logger.warning(
                    "WebRTCSession was not properly closed. "
                    "Consider calling session.close() explicitly for immediate cleanup."
                )
                self.close()
        except Exception:
            pass  # Never raise from __del__

    def wait(self, timeout: Optional[float] = None) -> None:
        """Wait for session to complete.

        Blocks until the video stream ends (None received) or timeout expires.
        Automatically starts the session if not already started.

        Args:
            timeout: Maximum time to wait in seconds (None for indefinite)

        Raises:
            TimeoutError: If timeout expires before stream ends
        """
        self._ensure_started()
        try:
            while True:
                frame_data = self._video_queue.get(timeout=timeout)
                if frame_data is None:
                    break
        except queue.Empty:
            if timeout is not None:
                raise TimeoutError(
                    f"WebRTC session wait() timed out after {timeout}s.\n"
                    "The video stream did not end within the timeout period."
                )

    def on_frame(self, callback: Callable) -> Callable:
        """Decorator to register frame callback handlers.

        The registered handlers will be called for each video frame received
        when using the run() method. Handlers must accept two parameters:
        - frame: BGR numpy array (np.ndarray)
        - metadata: Video metadata (VideoMetadata) extracted from the video frame

        Args:
            callback: Callback function that accepts (frame, metadata)

        Returns:
            The callback itself

        Examples:
            @session.on_frame
            def process_frame(frame: np.ndarray, metadata: VideoMetadata):
                print(f"Frame {metadata.frame_id} - PTS: {metadata.pts}")
                cv2.imshow("Frame", frame)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    session.close()
        """
        self._frame_handlers.append(callback)
        return callback

    def on_data(self, field_name: Optional[str] = None) -> Callable:
        """Decorator to register data channel callback handlers.

        Can be used with or without parentheses:
            @session.on_data          # without parentheses (global handler)
            @session.on_data()        # with parentheses (global handler)
            @session.on_data("field") # with field name (field-specific handler)

        Args:
            field_name: If provided, handler receives only that field's value.
                       If None, handler receives entire serialized_output_data dict.

        Returns:
            Decorator function or decorated function

        Examples:
            # Global handler without parentheses
            @session.on_data
            def handle_all(data: dict, metadata: VideoMetadata):
                print(f"All data: {data}")

            # Field-specific handler
            @session.on_data("predictions")
            def handle_predictions(data: dict, metadata: VideoMetadata):
                print(f"Frame {metadata.frame_id}: {data}")

            # Field-specific handler (no metadata)
            @session.on_data("predictions")
            def handle_predictions(data: dict):
                print(data)

            # Global handler with parentheses
            @session.on_data()
            def handle_all(data: dict, metadata: VideoMetadata):
                print(f"All data: {data}")
        """
        # Check if being used without parentheses: @session.on_data
        # In this case, field_name is actually the function being decorated
        if callable(field_name):
            fn = field_name
            self._data_global_handler = fn
            return fn

        # Being used with parentheses: @session.on_data() or @session.on_data("field")
        def decorator(fn: Callable) -> Callable:
            if field_name is None:
                self._data_global_handler = fn
            else:
                if field_name not in self._data_field_handlers:
                    self._data_field_handlers[field_name] = []
                self._data_field_handlers[field_name].append(fn)
            return fn

        return decorator

    def run(self) -> None:
        """Block and process frames until close() is called or stream ends.

        This method iterates over incoming video frames and invokes all
        registered frame handlers for each frame. Automatically starts
        the session if not already started.

        The session automatically closes when this method exits, whether
        normally or due to an exception, ensuring resources are always
        cleaned up.

        Blocks until either:
        - close() is called (e.g., from a callback)
        - The video stream ends naturally
        - An exception occurs (session auto-closes, exception re-raised)
        - KeyboardInterrupt (Ctrl+C) is received (session auto-closes)

        Data channel handlers are invoked automatically when data arrives,
        independent of this method.

        Example:
            session = client.webrtc.stream(source=source, workflow=workflow)

            @session.on_frame
            def process(frame, metadata):
                print(f"Frame {metadata.frame_id} - PTS: {metadata.pts}")
                cv2.imshow("Frame", frame)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    session.close()  # Exits run() and cleans up

            session.run()  # Auto-starts, auto-closes, blocks here
        """
        with self:
            for frame, metadata in self.video():
                # Invoke all registered frame handlers with both parameters
                for handler in self._frame_handlers:
                    try:
                        handler(frame, metadata)
                    except Exception:
                        logger.warning("Error in frame handler", exc_info=True)

    @staticmethod
    @functools.lru_cache(maxsize=100)
    def _data_handler_length(handler: Callable) -> int:
        """Get the number of parameters expected by a data handler.

        Args:
            handler: The handler callable to inspect

        Returns:
            The number of parameters expected by the handler
        """
        sig = inspect.signature(handler)
        return len(sig.parameters)

    def _invoke_data_handler(
        self, handler: Callable, value: Any, metadata: Optional[VideoMetadata]
    ) -> None:  # noqa: ANN401
        """Invoke data handler with appropriate signature (auto-detect via introspection).

        Supports two signatures:
        - handler(value, metadata) - receives both value and metadata
        - handler(value) - receives only value

        Args:
            handler: The handler callable to invoke
            value: The data value to pass
            metadata: Optional video metadata to pass
        """
        try:
            if WebRTCSession._data_handler_length(handler) >= 2:
                # Handler expects both value and metadata
                handler(value, metadata)
            else:
                # Handler expects only value
                handler(value)
        except Exception:
            logger.exception(
                f"Failed to invoke handler {handler}. The handler should have 2 parameters with signature: handler(value, metadata) or handler(value)."
            )
            raise

    @staticmethod
    def _to_list(value: Any) -> List[Any]:
        """Convert value to list if it is not already a list."""
        if isinstance(value, list):
            return value
        return [value]

    def _send_ack(self, frame_id: int, channel: "RTCDataChannel") -> None:
        """Send cumulative ACK for flow control (only when realtime_processing=False)."""
        if self._config.realtime_processing:
            return
        if channel.readyState == "open":
            channel.send(json.dumps({"ack": frame_id}))

    async def _get_turn_config(self) -> Optional[RTCConfiguration]:
        """Get TURN configuration from user-provided config or Roboflow API.

        Priority order:
        1. User-provided config via StreamConfig.turn_server (highest priority)
        2. Auto-fetch from Roboflow API for serverless connections
        3. Return None for non-serverless connections

        Returns:
            TURN configuration dict or None
        """
        turn_config = None
        # 1. Use user-provided config if available
        if self._config.turn_server:
            turn_config = self._config.turn_server
            logger.debug("Using user-provided TURN configuration")

        # 2. Auto-fetch from Roboflow API for Roboflow-hosted connections
        elif self._api_url in ALL_ROBOFLOW_API_URLS:
            try:
                logger.debug(
                    "Fetching TURN config from Roboflow API for serverless connection"
                )
                response = requests.get(
                    f"{RF_API_BASE_URL}/webrtc_turn_config",
                    params={"api_key": self._api_key},
                    timeout=5,
                )
                response.raise_for_status()
                turn_config = response.json()
                logger.debug("Successfully fetched TURN config from Roboflow API")
            except Exception as e:
                logger.warning(f"Failed to fetch TURN config from Roboflow API: {e}")
                return None
        # standardize the TURN config to the iceServers format
        if turn_config and "iceServers" in turn_config:
            turn_config = RTCConfiguration(
                iceServers=[
                    RTCIceServer(
                        urls=WebRTCSession._to_list(server.get("urls", [])),
                        username=server.get("username"),
                        credential=server.get("credential"),
                    )
                    for server in turn_config["iceServers"]
                ]
            )
            logger.debug("Successfully converted TURN config to iceServers format")
        elif turn_config and "urls" in turn_config:
            turn_config = RTCConfiguration(
                iceServers=[
                    RTCIceServer(
                        urls=[turn_config["urls"]],
                        username=turn_config["username"],
                        credential=turn_config["credential"],
                    )
                ]
            )
            logger.debug("Successfully converted TURN config to iceServers format")
        return turn_config

    def _handle_datachannel_video_frame(
        self, serialized_data: Any, metadata: Optional[VideoMetadata]
    ) -> None:
        """Handle video frame received through data channel.

        Args:
            serialized_data: The serialized output data containing base64 image
            metadata: Video metadata for the frame
        """
        for output_name in self._config.stream_output:
            if not output_name or output_name not in serialized_data:
                continue
            img_data = serialized_data[output_name]
            if isinstance(img_data, dict) and img_data.get("type") == "base64":
                try:
                    # Decode base64 image and queue it
                    frame = _decode_base64_image(img_data["value"])
                    # Backpressure: drop oldest frame if queue full
                    if self._video_queue.full():
                        try:
                            self._video_queue.get_nowait()
                        except Exception:
                            pass
                    self._video_queue.put_nowait((frame, metadata))
                except Exception:
                    logger.warning(
                        f"Failed to decode base64 image from {output_name}",
                        exc_info=True,
                    )
                break  # Only process first matching image

    async def _init(self) -> None:
        """Initialize WebRTC connection.

        Sets up peer connection, configures source, negotiates with server.
        """
        # Check dependencies and import them
        _check_webrtc_dependencies()
        from aiortc import (
            RTCConfiguration,
            RTCIceServer,
            RTCPeerConnection,
            RTCSessionDescription,
        )
        from aiortc.contrib.media import MediaRelay
        from av import VideoFrame

        # Fetch TURN configuration (auto-fetch or user-provided)
        turn_config = await self._get_turn_config()

        pc = RTCPeerConnection(configuration=turn_config)
        relay = MediaRelay()

        # Monitor ICE connection state for failures
        # ICE consent expires after ~30s if STUN Binding Indications aren't sent.
        # This happens when event loop is starved (e.g., tight send loops).
        @pc.on("iceconnectionstatechange")
        async def _on_ice_connection_state_change() -> None:
            state = pc.iceConnectionState
            logger.info(f"ICE connection state: {state}")

            if state == "failed":
                logger.error(
                    "ICE connection failed - likely consent expiry. "
                    "This happens when the event loop is blocked and aioice "
                    "cannot send STUN consent refresh packets. Ensure code "
                    "yields to event loop (asyncio.sleep(0)) during long operations."
                )
                # Signal session to close
                try:
                    self._video_queue.put_nowait(None)
                except Exception:
                    pass
            elif state == "closed":
                logger.info("ICE connection closed - signaling end of stream")
                try:
                    self._video_queue.put_nowait(None)
                except Exception:
                    pass
            elif state == "disconnected":
                logger.warning(
                    "ICE connection disconnected - may recover automatically. "
                    "If this persists, connection will transition to 'failed'."
                )

        @pc.on("connectionstatechange")
        async def _on_connection_state_change() -> None:
            state = pc.connectionState
            logger.info(f"Connection state: {state}")
            if state in ("failed", "closed"):
                if state == "failed":
                    logger.error("Connection failed - closing session")
                else:
                    logger.info("Connection closed - signaling end of stream")
                try:
                    self._video_queue.put_nowait(None)
                except Exception:
                    pass

        # Setup video receiver for frames from server
        @pc.on("track")
        def _on_track(track):  # noqa: ANN001
            subscribed = relay.subscribe(track)

            async def _reader():
                from aiortc.mediastreams import MediaStreamError

                while True:
                    try:
                        f: VideoFrame = await subscribed.recv()
                    except MediaStreamError:
                        # Remote stream finished normally
                        logger.info("Remote stream finished")
                        try:
                            self._video_queue.put_nowait(None)
                        except Exception:
                            pass
                        break
                    except Exception as e:
                        # Connection closed or track ended unexpectedly
                        logger.error(
                            f"WebRTC video track ended: {e.__class__.__name__}: {e}",
                            exc_info=True,
                        )
                        try:
                            self._video_queue.put_nowait(None)
                        except Exception:
                            pass
                        break
                    img = f.to_ndarray(format="bgr24")
                    current_metadata = VideoMetadata(
                        frame_id=f.pts,
                        received_at=datetime.now(),
                        pts=f.pts,
                        time_base=f.time_base,
                        declared_fps=None,
                        measured_fps=None,
                    )
                    # Backpressure: drop oldest frame if queue full
                    if self._video_queue.full():
                        try:
                            _ = self._video_queue.get_nowait()
                        except Exception:
                            pass
                    try:
                        self._video_queue.put_nowait((img, current_metadata))
                    except Exception:
                        pass

            asyncio.ensure_future(_reader())

        # Setup data channel
        ch = pc.createDataChannel("inference")

        # Setup data channel message handler
        @ch.on("message")
        def _on_data_message(message: Any) -> None:  # noqa: ANN401
            try:
                # Handle both bytes and str messages
                if isinstance(message, bytes):
                    # Check if it's a chunked binary message
                    if len(message) >= 12:
                        try:
                            # Try to reassemble chunks
                            complete_payload, _ = self._chunk_reassembler.add_chunk(
                                message
                            )
                            if complete_payload is None:
                                # Not all chunks received yet
                                return
                            # Server may send gzip-compressed JSON when data_output is set
                            # Gzip magic bytes: \x1f\x8b
                            if (
                                len(complete_payload) >= 2
                                and complete_payload[:2] == b"\x1f\x8b"
                            ):
                                complete_payload = gzip.decompress(complete_payload)
                            # Parse the complete JSON from reassembled payload
                            message = complete_payload.decode("utf-8")
                        except (struct.error, ValueError):
                            # Not a chunked message, try to decode as regular UTF-8
                            message = message.decode("utf-8")
                    else:
                        # Too short to be chunked, decode as regular UTF-8
                        message = message.decode("utf-8")

                parsed_message = json.loads(message)

                # Handle processing_complete signal (video file finished)
                if parsed_message.get("processing_complete"):
                    logger.info("Received processing_complete signal")
                    try:
                        self._video_queue.put_nowait(None)
                    except Exception:
                        pass
                    return

                # Extract video metadata if present (for data handlers)
                metadata = self._parse_video_metadata(
                    parsed_message.get("video_metadata")
                )

                # Get serialized output data
                serialized_data = parsed_message.get("serialized_output_data")

                # Check for base64 image in stream_output fields (for VideoFileSource)
                # This enables receiving frames via data channel instead of video track
                if serialized_data and self._video_through_datachannel:
                    self._handle_datachannel_video_frame(serialized_data, metadata)

                # Call global handler if registered
                if self._data_global_handler:
                    try:
                        # filter out video frames if video is sent through datachannel
                        filtered_data = serialized_data
                        if self._video_through_datachannel and serialized_data:
                            filtered_data = {
                                k: v
                                for k, v in serialized_data.items()
                                if k not in self._config.stream_output
                            }
                        self._invoke_data_handler(
                            self._data_global_handler, filtered_data, metadata
                        )
                    except Exception:
                        logger.warning(
                            "Error calling global data handler", exc_info=True
                        )

                # Route to field-specific handlers
                if isinstance(serialized_data, dict):
                    for field_name, field_value in serialized_data.items():
                        if field_name in self._data_field_handlers:
                            for handler in list(self._data_field_handlers[field_name]):
                                try:
                                    self._invoke_data_handler(
                                        handler, field_value, metadata
                                    )
                                except Exception:
                                    logger.warning(
                                        f"Error calling handler for field '{field_name}'",
                                        exc_info=True,
                                    )

                # Send ACK for flow control (only when realtime_processing=False)
                if metadata and metadata.frame_id is not None:
                    self._send_ack(metadata.frame_id, ch)
            except json.JSONDecodeError:
                logger.warning("Failed to parse data channel message as JSON")

        # Let source configure the peer connection
        # (adds tracks for webcam/video/manual, or recvonly transceiver for RTSP)
        await self._source.configure_peer_connection(pc)

        # Create offer and wait for ICE gathering
        offer = await pc.createOffer()
        await pc.setLocalDescription(offer)

        # Wait for ICE gathering to complete
        while pc.iceGatheringState != "complete":
            await asyncio.sleep(0.1)

        # Build server initialization payload
        wf_conf: Dict[str, Any] = {
            "type": "WorkflowConfiguration",
            "image_input_name": self._image_input_name,
            "workflows_parameters": self._config.workflow_parameters,
        }
        wf_conf.update(self._workflow_config)

        payload = {
            "api_key": self._api_key,
            "workflow_configuration": wf_conf,
            "webrtc_offer": {
                "type": pc.localDescription.type,
                "sdp": pc.localDescription.sdp,
            },
            "webrtc_realtime_processing": self._config.realtime_processing,
            "stream_output": self._config.stream_output,
            "data_output": self._config.data_output,
        }

        # Add WebRTC config if available (auto-fetched or user-provided)
        # Server accepts webrtc_config with iceServers array format
        if turn_config:
            payload["webrtc_config"] = {
                "iceServers": [
                    {
                        "urls": ice_server.urls,
                        "username": ice_server.username,
                        "credential": ice_server.credential,
                    }
                    for ice_server in turn_config.iceServers
                ]
            }

        # Add FPS if provided
        if self._config.declared_fps:
            payload["declared_fps"] = self._config.declared_fps

        # Add serverless-specific parameters
        if self._config.requested_plan is not None:
            payload["requested_plan"] = self._config.requested_plan

        if self._config.requested_region is not None:
            payload["requested_region"] = self._config.requested_region

        if self._config.processing_timeout is not None:
            payload["processing_timeout"] = self._config.processing_timeout

        # Merge source-specific parameters
        # (rtsp_url for RTSP, declared_fps for webcam, stream_output/data_output overrides for VideoFile)
        payload.update(self._source.get_initialization_params(self._config))
        # Check if video will be sent through the datachannel instead of the video track
        self._video_through_datachannel = bool(
            self._config.stream_output and not payload.get("stream_output")
        )

        # Call server to initialize worker
        url = f"{self._api_url}/initialise_webrtc_worker"
        headers = {"Content-Type": "application/json"}
        resp = requests.post(url, json=payload, headers=headers, timeout=90)
        resp.raise_for_status()
        ans: Dict[str, Any] = resp.json()

        # Set remote description
        answer = RTCSessionDescription(sdp=ans["sdp"], type=ans["type"])
        await pc.setRemoteDescription(answer)

        # Start video file upload if applicable
        if isinstance(self._source, VideoFileSource):
            asyncio.ensure_future(self._source.start_upload())

        self._pc = pc
Functions
__del__
__del__()

Cleanup if user forgot to close. Not guaranteed to run immediately.

Source code in inference_sdk/webrtc/session.py
345
346
347
348
349
350
351
352
353
354
355
def __del__(self) -> None:
    """Cleanup if user forgot to close. Not guaranteed to run immediately."""
    try:
        if self._state == SessionState.STARTED:
            logger.warning(
                "WebRTCSession was not properly closed. "
                "Consider calling session.close() explicitly for immediate cleanup."
            )
            self.close()
    except Exception:
        pass  # Never raise from __del__
__enter__
__enter__()

Enter context manager - returns self.

Returns:

Name Type Description
WebRTCSession WebRTCSession

The session instance for use in with statement.

Source code in inference_sdk/webrtc/session.py
327
328
329
330
331
332
333
def __enter__(self) -> "WebRTCSession":
    """Enter context manager - returns self.

    Returns:
        WebRTCSession: The session instance for use in with statement.
    """
    return self
__exit__
__exit__(exc_type, exc_val, exc_tb)

Exit context manager - automatically closes the session.

Parameters:

Name Type Description Default
exc_type

Exception type if an exception occurred, None otherwise.

required
exc_val

Exception value if an exception occurred, None otherwise.

required
exc_tb

Exception traceback if an exception occurred, None otherwise.

required
Source code in inference_sdk/webrtc/session.py
335
336
337
338
339
340
341
342
343
def __exit__(self, exc_type, exc_val, exc_tb) -> None:
    """Exit context manager - automatically closes the session.

    Args:
        exc_type: Exception type if an exception occurred, None otherwise.
        exc_val: Exception value if an exception occurred, None otherwise.
        exc_tb: Exception traceback if an exception occurred, None otherwise.
    """
    self.close()
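
As a usage sketch, __enter__ and __exit__ together let the session act as a context manager so close() always runs; here `client`, `source`, and `workflow` are assumed to be created elsewhere:

with client.webrtc.stream(source=source, workflow=workflow) as session:
    for frame, metadata in session.video():
        ...  # process frames; close() runs automatically on exit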
__init__
__init__(
    api_url,
    api_key,
    source,
    image_input_name,
    workflow_config,
    stream_config,
)

Initialize WebRTC session.

Parameters:

Name Type Description Default
api_url str

Inference server API URL

required
api_key Optional[str]

API key for authentication

required
source StreamSource

Stream source instance

required
image_input_name str

Name of image input in workflow

required
workflow_config dict

Workflow configuration dict

required
stream_config StreamConfig

Stream configuration

required
Source code in inference_sdk/webrtc/session.py
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
def __init__(
    self,
    api_url: str,
    api_key: Optional[str],
    source: StreamSource,
    image_input_name: str,
    workflow_config: dict,
    stream_config: StreamConfig,
) -> None:
    """Initialize WebRTC session.

    Args:
        api_url: Inference server API URL
        api_key: API key for authentication
        source: Stream source instance
        image_input_name: Name of image input in workflow
        workflow_config: Workflow configuration dict
        stream_config: Stream configuration
    """

    self._state: SessionState = SessionState.NOT_STARTED
    self._state_lock: threading.Lock = threading.Lock()

    self._api_url = api_url.rstrip("/")
    self._api_key = api_key
    self._source = source
    self._image_input_name = image_input_name
    self._workflow_config = workflow_config
    self._config = stream_config

    # Internal state
    self._loop: Optional[asyncio.AbstractEventLoop] = None
    self._loop_thread: Optional[threading.Thread] = None
    self._pc: Optional["RTCPeerConnection"] = None
    self._video_queue: "Queue[Optional[tuple[np.ndarray, VideoMetadata]]]" = Queue(
        maxsize=WEBRTC_VIDEO_QUEUE_MAX_SIZE
    )
    self._video_through_datachannel = False

    # Callback handlers
    self._frame_handlers: List[Callable] = []
    self._data_field_handlers: Dict[str, List[Callable]] = {}
    self._data_global_handler: Optional[Callable] = None

    # Chunk reassembly for binary messages
    self._chunk_reassembler = ChunkReassembler()

    # Public APIs
    self.video = _VideoStream(self, self._video_queue)
close
close()

Close session and cleanup all resources. Idempotent - safe to call multiple times.

This method closes the WebRTC peer connection, releases source resources (webcam, video files, etc.), stops the event loop, and joins the background thread.

It's safe to call this multiple times - subsequent calls are no-ops.

Example

session = client.webrtc.stream(source=source, workflow=workflow)
session.run()  # Auto-starts and auto-closes on exception
session.close()  # Explicit cleanup (or let __del__ handle it)

Source code in inference_sdk/webrtc/session.py
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
def close(self) -> None:
    """Close session and cleanup all resources. Idempotent - safe to call multiple times.

    This method closes the WebRTC peer connection, releases source resources
    (webcam, video files, etc.), stops the event loop, and joins the background thread.

    It's safe to call this multiple times - subsequent calls are no-ops.

    Example:
        session = client.webrtc.stream(source=source, workflow=workflow)
        session.run()  # Auto-starts and auto-closes on exception
        session.close()  # Explicit cleanup (or let __del__ handle it)
    """
    with self._state_lock:
        if self._state == SessionState.CLOSED:
            return  # Already closed, nothing to do
        self._state = SessionState.CLOSED

    # Signal video iterator to stop by putting None sentinel
    try:
        self._video_queue.put_nowait(None)
    except Exception:
        pass  # Queue might be full, but that's okay

    # Cleanup resources (nested finally ensures all cleanup steps execute)
    try:
        # Close peer connection
        if self._loop and self._pc:
            asyncio.run_coroutine_threadsafe(self._pc.close(), self._loop).result()
    finally:
        try:
            # Cleanup source (webcam, video file, etc.)
            if self._loop and self._source:
                asyncio.run_coroutine_threadsafe(
                    self._source.cleanup(), self._loop
                ).result()
        finally:
            # Stop event loop and join thread
            if self._loop:
                self._loop.call_soon_threadsafe(self._loop.stop)
            if self._loop_thread:
                self._loop_thread.join(timeout=WEBRTC_EVENT_LOOP_SHUTDOWN_TIMEOUT)
on_data
on_data(field_name=None)

Decorator to register data channel callback handlers.

Can be used with or without parentheses:

@session.on_data          # without parentheses (global handler)
@session.on_data()        # with parentheses (global handler)
@session.on_data("field") # with field name (field-specific handler)

Parameters:

Name Type Description Default
field_name Optional[str]

If provided, handler receives only that field's value. If None, handler receives entire serialized_output_data dict.

None

Returns:

Type Description
Callable

Decorator function or decorated function

Examples:

Global handler without parentheses

@session.on_data
def handle_all(data: dict, metadata: VideoMetadata):
    print(f"All data: {data}")

Field-specific handler

@session.on_data("predictions") def handle_predictions(data: dict, metadata: VideoMetadata): print(f"Frame {metadata.frame_id}: {data}")

Field-specific handler (no metadata)

@session.on_data("predictions") def handle_predictions(data: dict): print(data)

Global handler with parentheses

@session.on_data()
def handle_all(data: dict, metadata: VideoMetadata):
    print(f"All data: {data}")

Source code in inference_sdk/webrtc/session.py
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
def on_data(self, field_name: Optional[str] = None) -> Callable:
    """Decorator to register data channel callback handlers.

    Can be used with or without parentheses:
        @session.on_data          # without parentheses (global handler)
        @session.on_data()        # with parentheses (global handler)
        @session.on_data("field") # with field name (field-specific handler)

    Args:
        field_name: If provided, handler receives only that field's value.
                   If None, handler receives entire serialized_output_data dict.

    Returns:
        Decorator function or decorated function

    Examples:
        # Global handler without parentheses
        @session.on_data
        def handle_all(data: dict, metadata: VideoMetadata):
            print(f"All data: {data}")

        # Field-specific handler
        @session.on_data("predictions")
        def handle_predictions(data: dict, metadata: VideoMetadata):
            print(f"Frame {metadata.frame_id}: {data}")

        # Field-specific handler (no metadata)
        @session.on_data("predictions")
        def handle_predictions(data: dict):
            print(data)

        # Global handler with parentheses
        @session.on_data()
        def handle_all(data: dict, metadata: VideoMetadata):
            print(f"All data: {data}")
    """
    # Check if being used without parentheses: @session.on_data
    # In this case, field_name is actually the function being decorated
    if callable(field_name):
        fn = field_name
        self._data_global_handler = fn
        return fn

    # Being used with parentheses: @session.on_data() or @session.on_data("field")
    def decorator(fn: Callable) -> Callable:
        if field_name is None:
            self._data_global_handler = fn
        else:
            if field_name not in self._data_field_handlers:
                self._data_field_handlers[field_name] = []
            self._data_field_handlers[field_name].append(fn)
        return fn

    return decorator
on_frame
on_frame(callback)

Decorator to register frame callback handlers.

The registered handlers will be called for each video frame received when using the run() method. Handlers must accept two parameters:

- frame: BGR numpy array (np.ndarray)
- metadata: Video metadata (VideoMetadata) extracted from the video frame

Parameters:

Name Type Description Default
callback Callable

Callback function that accepts (frame, metadata)

required

Returns:

Type Description
Callable

The callback itself

Examples:

@session.on_frame
def process_frame(frame: np.ndarray, metadata: VideoMetadata):
    print(f"Frame {metadata.frame_id} - PTS: {metadata.pts}")
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        session.stop()

Source code in inference_sdk/webrtc/session.py
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
def on_frame(self, callback: Callable) -> Callable:
    """Decorator to register frame callback handlers.

    The registered handlers will be called for each video frame received
    when using the run() method. Handlers must accept two parameters:
    - frame: BGR numpy array (np.ndarray)
    - metadata: Video metadata (VideoMetadata) extracted from the video frame

    Args:
        callback: Callback function that accepts (frame, metadata)

    Returns:
        The callback itself

    Examples:
        @session.on_frame
        def process_frame(frame: np.ndarray, metadata: VideoMetadata):
            print(f"Frame {metadata.frame_id} - PTS: {metadata.pts}")
            cv2.imshow("Frame", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                session.stop()
    """
    self._frame_handlers.append(callback)
    return callback
run
run()

Block and process frames until close() is called or stream ends.

This method iterates over incoming video frames and invokes all registered frame handlers for each frame. Automatically starts the session if not already started.

The session automatically closes when this method exits, whether normally or due to an exception, ensuring resources are always cleaned up.

Blocks until either:

- close() is called (e.g., from a callback)
- The video stream ends naturally
- An exception occurs (session auto-closes, exception re-raised)
- KeyboardInterrupt (Ctrl+C) is received (session auto-closes)

Data channel handlers are invoked automatically when data arrives, independent of this method.

Example

session = client.webrtc.stream(source=source, workflow=workflow)

@session.on_frame
def process(frame, metadata):
    print(f"Frame {metadata.frame_id} - PTS: {metadata.pts}")
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        session.close()  # Exits run() and cleans up

session.run() # Auto-starts, auto-closes, blocks here

Source code in inference_sdk/webrtc/session.py
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
def run(self) -> None:
    """Block and process frames until close() is called or stream ends.

    This method iterates over incoming video frames and invokes all
    registered frame handlers for each frame. Automatically starts
    the session if not already started.

    The session automatically closes when this method exits, whether
    normally or due to an exception, ensuring resources are always
    cleaned up.

    Blocks until either:
    - close() is called (e.g., from a callback)
    - The video stream ends naturally
    - An exception occurs (session auto-closes, exception re-raised)
    - KeyboardInterrupt (Ctrl+C) is received (session auto-closes)

    Data channel handlers are invoked automatically when data arrives,
    independent of this method.

    Example:
        session = client.webrtc.stream(source=source, workflow=workflow)

        @session.on_frame
        def process(frame, metadata):
            print(f"Frame {metadata.frame_id} - PTS: {metadata.pts}")
            cv2.imshow("Frame", frame)
            if cv2.waitKey(1) & 0xFF == ord('q'):
                session.close()  # Exits run() and cleans up

        session.run()  # Auto-starts, auto-closes, blocks here
    """
    with self:
        for frame, metadata in self.video():
            # Invoke all registered frame handlers with both parameters
            for handler in self._frame_handlers:
                try:
                    handler(frame, metadata)
                except Exception:
                    logger.warning("Error in frame handler", exc_info=True)
wait
wait(timeout=None)

Wait for session to complete.

Blocks until the video stream ends (None received) or timeout expires. Automatically starts the session if not already started.

Parameters:

Name Type Description Default
timeout Optional[float]

Maximum time to wait in seconds (None for indefinite)

None

Raises:

Type Description
TimeoutError

If timeout expires before stream ends

Source code in inference_sdk/webrtc/session.py
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
def wait(self, timeout: Optional[float] = None) -> None:
    """Wait for session to complete.

    Blocks until the video stream ends (None received) or timeout expires.
    Automatically starts the session if not already started.

    Args:
        timeout: Maximum time to wait in seconds (None for indefinite)

    Raises:
        TimeoutError: If timeout expires before stream ends
    """
    self._ensure_started()
    try:
        while True:
            frame_data = self._video_queue.get(timeout=timeout)
            if frame_data is None:
                break
    except queue.Empty:
        if timeout is not None:
            raise TimeoutError(
                f"WebRTC session wait() timed out after {timeout}s.\n"
                "The video stream did not end within the timeout period."
            )
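
For example, a sketch that lets a session run to completion with a bounded wait (`session` is assumed to come from client.webrtc.stream()):

try:
    session.wait(timeout=300)  # block until the stream ends, at most 5 minutes
except TimeoutError:
    print("stream did not end within the timeout")
finally:
    session.close()  # idempotent cleanup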

Functions

inference_sdk.webrtc.sources

Stream source abstractions for WebRTC SDK.

This module defines the StreamSource interface and concrete implementations for different video streaming sources (webcam, RTSP, video files, manual frames).

Classes

MJPEGSource

Bases: StreamSource

Stream source for MJPEG streams.

Source code in inference_sdk/webrtc/sources.py
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
class MJPEGSource(StreamSource):
    """Stream source for MJPEG streams."""

    def __init__(self, url: str):
        if not url.startswith(("http://", "https://")):
            raise InvalidParameterError(
                f"Invalid MJPEG URL: {url}. Must start with http:// or https://"
            )
        self.url = url

    async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
        pc.addTransceiver("video", direction="recvonly")

    def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
        return {"mjpeg_url": self.url}

ManualSource

Bases: StreamSource

Stream source for manually sent frames.

This source allows the user to programmatically send frames to be processed by the workflow using the send() method.

Source code in inference_sdk/webrtc/sources.py
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
class ManualSource(StreamSource):
    """Stream source for manually sent frames.

    This source allows the user to programmatically send frames
    to be processed by the workflow using the send() method.
    """

    def __init__(self):
        """Initialize manual source."""
        self._track: Optional[_ManualTrack] = None

    async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
        """Create manual track and add it to the peer connection."""
        # Create special track that accepts programmatic frames
        self._track = _ManualTrack()
        pc.addTrack(self._track)

    def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
        """Return manual mode flag."""
        return {"manual_mode": True}

    def send(self, frame: np.ndarray) -> None:
        """Send a frame to be processed by the workflow.

        Args:
            frame: BGR numpy array (H, W, 3) uint8

        Raises:
            RuntimeError: If session not started
        """
        if not self._track:
            raise RuntimeError("Session not started. Use within 'with' context.")
        self._track.queue_frame(frame)
Functions
__init__
__init__()

Initialize manual source.

Source code in inference_sdk/webrtc/sources.py
359
360
361
def __init__(self):
    """Initialize manual source."""
    self._track: Optional[_ManualTrack] = None
configure_peer_connection async
configure_peer_connection(pc)

Create manual track and add it to the peer connection.

Source code in inference_sdk/webrtc/sources.py
363
364
365
366
367
async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
    """Create manual track and add it to the peer connection."""
    # Create special track that accepts programmatic frames
    self._track = _ManualTrack()
    pc.addTrack(self._track)
get_initialization_params
get_initialization_params(config)

Return manual mode flag.

Source code in inference_sdk/webrtc/sources.py
369
370
371
def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
    """Return manual mode flag."""
    return {"manual_mode": True}
send
send(frame)

Send a frame to be processed by the workflow.

Parameters:

Name Type Description Default
frame ndarray

BGR numpy array (H, W, 3) uint8

required

Raises:

Type Description
RuntimeError

If session not started

Source code in inference_sdk/webrtc/sources.py
373
374
375
376
377
378
379
380
381
382
383
384
def send(self, frame: np.ndarray) -> None:
    """Send a frame to be processed by the workflow.

    Args:
        frame: BGR numpy array (H, W, 3) uint8

    Raises:
        RuntimeError: If session not started
    """
    if not self._track:
        raise RuntimeError("Session not started. Use within 'with' context.")
    self._track.queue_frame(frame)
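
A sketch of manual frame submission (`client` and `workflow` are assumed; per send(), frames can only be queued once the session has started, so the feeder below waits crudely before sending):

import threading
import time
import numpy as np

source = ManualSource()
session = client.webrtc.stream(source=source, workflow=workflow)

def feed_frames():
    time.sleep(1.0)  # crude wait for the session to start (sketch only)
    for _ in range(10):
        # Placeholder frames; any BGR (H, W, 3) uint8 array works.
        source.send(np.zeros((480, 640, 3), dtype=np.uint8))
        time.sleep(0.1)

threading.Thread(target=feed_frames, daemon=True).start()
session.run()  # auto-starts; blocks while frames are processed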

RTSPSource

Bases: StreamSource

Stream source for RTSP camera streams.

This source doesn't create a local track - instead, the server captures the RTSP stream and sends processed video back to the client.

Source code in inference_sdk/webrtc/sources.py
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
class RTSPSource(StreamSource):
    """Stream source for RTSP camera streams.

    This source doesn't create a local track - instead, the server
    captures the RTSP stream and sends processed video back to the client.
    """

    def __init__(self, url: str):
        """Initialize RTSP source.

        Args:
            url: RTSP URL (e.g., "rtsp://camera.local/stream")
                Credentials can be included: "rtsp://user:pass@host/stream"
        """
        if not url.startswith(("rtsp://", "rtsps://")):
            raise InvalidParameterError(
                f"Invalid RTSP URL: {url}. Must start with rtsp:// or rtsps://"
            )
        self.url = url

    async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
        """Add receive-only video transceiver (server sends video to us)."""
        # Don't create a local track - we're receiving video from server
        # Add receive-only transceiver
        pc.addTransceiver("video", direction="recvonly")

    def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
        """Return RTSP URL for server to capture."""
        # Server needs to know the RTSP URL to capture
        return {"rtsp_url": self.url}
Functions
__init__
__init__(url)

Initialize RTSP source.

Parameters:

Name Type Description Default
url str

RTSP URL (e.g., "rtsp://camera.local/stream") Credentials can be included: "rtsp://user:pass@host/stream"

required
Source code in inference_sdk/webrtc/sources.py
191
192
193
194
195
196
197
198
199
200
201
202
def __init__(self, url: str):
    """Initialize RTSP source.

    Args:
        url: RTSP URL (e.g., "rtsp://camera.local/stream")
            Credentials can be included: "rtsp://user:pass@host/stream"
    """
    if not url.startswith(("rtsp://", "rtsps://")):
        raise InvalidParameterError(
            f"Invalid RTSP URL: {url}. Must start with rtsp:// or rtsps://"
        )
    self.url = url
configure_peer_connection async
configure_peer_connection(pc)

Add receive-only video transceiver (server sends video to us).

Source code in inference_sdk/webrtc/sources.py
204
205
206
207
208
async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
    """Add receive-only video transceiver (server sends video to us)."""
    # Don't create a local track - we're receiving video from server
    # Add receive-only transceiver
    pc.addTransceiver("video", direction="recvonly")
get_initialization_params
get_initialization_params(config)

Return RTSP URL for server to capture.

Source code in inference_sdk/webrtc/sources.py
210
211
212
213
def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
    """Return RTSP URL for server to capture."""
    # Server needs to know the RTSP URL to capture
    return {"rtsp_url": self.url}

StreamSource

Bases: ABC

Base interface for all stream sources.

A StreamSource is responsible for:

1. Configuring the RTCPeerConnection (adding tracks or transceivers)
2. Providing initialization parameters for the server
3. Cleaning up resources when done

Source code in inference_sdk/webrtc/sources.py
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
class StreamSource(ABC):
    """Base interface for all stream sources.

    A StreamSource is responsible for:
    1. Configuring the RTCPeerConnection (adding tracks or transceivers)
    2. Providing initialization parameters for the server
    3. Cleaning up resources when done
    """

    @abstractmethod
    async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
        """Configure the peer connection for this source type.

        This is where the source decides:
        - Whether to add a local track (webcam, video file, manual)
        - Whether to add a receive-only transceiver (RTSP)
        - Any other peer connection configuration

        Args:
            pc: The RTCPeerConnection to configure
        """
        pass

    @abstractmethod
    def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
        """Get parameters to send to server in /initialise_webrtc_worker payload.

        Args:
            config: Stream configuration with stream_output, data_output, etc.

        Returns:
            Dictionary of parameters specific to this source type.
            Examples:
            - RTSP: {"rtsp_url": "rtsp://..."}
            - Video file: {"stream_output": [], "data_output": [...]}
            - Webcam/Manual: {} (empty, no server-side source)
        """
        pass

    async def cleanup(self) -> None:
        """Cleanup resources when session ends.

        Default implementation does nothing. Override if cleanup is needed.
        """
        pass
Functions
cleanup async
cleanup()

Cleanup resources when session ends.

Default implementation does nothing. Override if cleanup is needed.

Source code in inference_sdk/webrtc/sources.py
67
68
69
70
71
72
async def cleanup(self) -> None:
    """Cleanup resources when session ends.

    Default implementation does nothing. Override if cleanup is needed.
    """
    pass
configure_peer_connection abstractmethod async
configure_peer_connection(pc)

Configure the peer connection for this source type.

This is where the source decides:

- Whether to add a local track (webcam, video file, manual)
- Whether to add a receive-only transceiver (RTSP)
- Any other peer connection configuration

Parameters:

Name Type Description Default
pc RTCPeerConnection

The RTCPeerConnection to configure

required
Source code in inference_sdk/webrtc/sources.py
37
38
39
40
41
42
43
44
45
46
47
48
49
@abstractmethod
async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
    """Configure the peer connection for this source type.

    This is where the source decides:
    - Whether to add a local track (webcam, video file, manual)
    - Whether to add a receive-only transceiver (RTSP)
    - Any other peer connection configuration

    Args:
        pc: The RTCPeerConnection to configure
    """
    pass
get_initialization_params abstractmethod
get_initialization_params(config)

Get parameters to send to server in /initialise_webrtc_worker payload.

Parameters:

Name Type Description Default
config StreamConfig

Stream configuration with stream_output, data_output, etc.

required

Returns:

Name Type Description
Dict[str, Any]

Dictionary of parameters specific to this source type.

Examples:

- RTSP: {"rtsp_url": "rtsp://..."}
- Video file: {"stream_output": [], "data_output": [...]}
- Webcam/Manual: {} (empty, no server-side source)
Source code in inference_sdk/webrtc/sources.py
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
@abstractmethod
def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
    """Get parameters to send to server in /initialise_webrtc_worker payload.

    Args:
        config: Stream configuration with stream_output, data_output, etc.

    Returns:
        Dictionary of parameters specific to this source type.
        Examples:
        - RTSP: {"rtsp_url": "rtsp://..."}
        - Video file: {"stream_output": [], "data_output": [...]}
        - Webcam/Manual: {} (empty, no server-side source)
    """
    pass
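
As a sketch of the contract, a hypothetical receive-only source needs just the two abstract methods; the class name and the `screenshare_url` key below are invented purely for illustration:

class ScreenShareSource(StreamSource):
    """Hypothetical source: the server captures a remote stream."""

    def __init__(self, url: str):
        self.url = url

    async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
        # No local track; receive processed video from the server.
        pc.addTransceiver("video", direction="recvonly")

    def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
        # Invented parameter name, for illustration only.
        return {"screenshare_url": self.url}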

VideoFileSource

Bases: StreamSource

Stream source for video files.

Uploads video file via datachannel to the server, which processes it and streams results back. This is more efficient than frame-by-frame streaming for pre-recorded video files.

Supports two output modes:

- Datachannel mode (default): Frames received as base64 JSON via datachannel. Higher bandwidth but includes all workflow output data inline.
- Video track mode: Frames received via WebRTC video track with hardware-accelerated codec (H.264/VP8). Lower bandwidth, workflow data sent separately.

Source code in inference_sdk/webrtc/sources.py
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
class VideoFileSource(StreamSource):
    """Stream source for video files.

    Uploads video file via datachannel to the server, which processes it
    and streams results back. This is more efficient than frame-by-frame
    streaming for pre-recorded video files.

    Supports two output modes:
    - Datachannel mode (default): Frames received as base64 JSON via datachannel.
      Higher bandwidth but includes all workflow output data inline.
    - Video track mode: Frames received via WebRTC video track with hardware-
      accelerated codec (H.264/VP8). Lower bandwidth, workflow data sent separately.
    """

    def __init__(
        self,
        path: str,
        on_upload_progress: Optional[UploadProgressCallback] = None,
        use_datachannel_frames: bool = True,
        realtime_processing: bool = False,
    ):
        """Initialize video file source.

        Args:
            path: Path to video file (any format supported by FFmpeg)
            on_upload_progress: Optional callback called during upload with
                (uploaded_chunks, total_chunks). Use to track upload progress.
            use_datachannel_frames: If enabled, frames are received through the
                datachannel. It consumes much more network bandwidth, but it
                provides guaranteed in-order and high quality delivery of the
                frames. If False, frames are received via WebRTC video track
                with hardware-accelerated codec (lower bandwidth).
            realtime_processing: If True, process frames at original video FPS
                (throttled playback for live preview). If False (default),
                process all frames as fast as possible (batch mode).
        """
        self.path = path
        self.on_upload_progress = on_upload_progress
        self.use_datachannel_frames = use_datachannel_frames
        self.realtime_processing = realtime_processing
        self._upload_channel: Optional["RTCDataChannel"] = None
        self._uploader: Optional[VideoFileUploader] = None
        # Note: _upload_started is created lazily in configure_peer_connection()
        # to avoid Python 3.9 issue where asyncio.Event binds to wrong event loop
        self._upload_started: Optional[asyncio.Event] = None

    async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
        """Configure peer connection for video file upload.

        Creates video_upload datachannel for file transfer. In video track mode,
        also adds a receive-only transceiver for processed video output.
        """
        # Create event in the async context to bind to correct event loop (Python 3.9 compat)
        self._upload_started = asyncio.Event()

        # Create upload channel - server will create VideoFileUploadHandler
        self._upload_channel = pc.createDataChannel("video_upload")

        # Add receive-only transceiver for video track output mode (when not using datachannel)
        if not self.use_datachannel_frames:
            pc.addTransceiver("video", direction="recvonly")

        # Setup channel open handler to signal upload can start
        @self._upload_channel.on("open")
        def on_open() -> None:
            self._upload_started.set()

    def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
        """Return params for video file processing mode.

        In datachannel mode (default), merges stream_output into data_output
        so frames are received as base64 via the inference datachannel.
        In video track mode, preserves stream_output for video track rendering.
        """
        params: Dict[str, Any] = {
            "webrtc_realtime_processing": self.realtime_processing,
            "video_file_upload": True,  # Signal to server that video will be uploaded
        }

        if not self.use_datachannel_frames:
            # Video track mode: keep stream_output for video track rendering
            return params

        # Datachannel mode (default): merge stream_output into data_output
        data_output = list(config.data_output or [])
        if config.stream_output:
            for field in config.stream_output:
                if field and field not in data_output:
                    data_output.append(field)

        params["stream_output"] = []  # No video track
        params["data_output"] = data_output  # Receive frames via data channel
        return params

    async def start_upload(self) -> None:
        """Start uploading the video file.

        Called by session after connection is established.
        Uses self.on_upload_progress if provided.
        """
        # Wait for channel to open
        await self._upload_started.wait()

        if not self._upload_channel:
            raise RuntimeError("Upload channel not configured")

        self._uploader = VideoFileUploader(self.path, self._upload_channel)
        await self._uploader.upload(on_progress=self.on_upload_progress)
        # self._upload_complete.set()

    async def cleanup(self) -> None:
        """No cleanup needed - upload channel is managed by peer connection."""
        pass
Functions
__init__
__init__(
    path,
    on_upload_progress=None,
    use_datachannel_frames=True,
    realtime_processing=False,
)

Initialize video file source.

Parameters:

Name Type Description Default
path str

Path to video file (any format supported by FFmpeg)

required
on_upload_progress Optional[UploadProgressCallback]

Optional callback called during upload with (uploaded_chunks, total_chunks). Use to track upload progress.

None
use_datachannel_frames bool

If True, frames are received through the datachannel. This consumes much more network bandwidth, but provides guaranteed in-order, high-quality delivery of frames. If False, frames are received via WebRTC video track with hardware-accelerated codec (lower bandwidth).

True
realtime_processing bool

If True, process frames at original video FPS (throttled playback for live preview). If False (default), process all frames as fast as possible (batch mode).

False
Source code in inference_sdk/webrtc/sources.py
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
def __init__(
    self,
    path: str,
    on_upload_progress: Optional[UploadProgressCallback] = None,
    use_datachannel_frames: bool = True,
    realtime_processing: bool = False,
):
    """Initialize video file source.

    Args:
        path: Path to video file (any format supported by FFmpeg)
        on_upload_progress: Optional callback called during upload with
            (uploaded_chunks, total_chunks). Use to track upload progress.
        use_datachannel_frames: If enabled, frames are received through the
            datachannel. It consumes much more network bandwidth, but it
            provides guaranteed in-order and high quality delivery of the
            frames. If False, frames are received via WebRTC video track
            with hardware-accelerated codec (lower bandwidth).
        realtime_processing: If True, process frames at original video FPS
            (throttled playback for live preview). If False (default),
            process all frames as fast as possible (batch mode).
    """
    self.path = path
    self.on_upload_progress = on_upload_progress
    self.use_datachannel_frames = use_datachannel_frames
    self.realtime_processing = realtime_processing
    self._upload_channel: Optional["RTCDataChannel"] = None
    self._uploader: Optional[VideoFileUploader] = None
    # Note: _upload_started is created lazily in configure_peer_connection()
    # to avoid Python 3.9 issue where asyncio.Event binds to wrong event loop
    self._upload_started: Optional[asyncio.Event] = None
cleanup async
cleanup()

No cleanup needed - upload channel is managed by peer connection.

Source code in inference_sdk/webrtc/sources.py
343
344
345
async def cleanup(self) -> None:
    """No cleanup needed - upload channel is managed by peer connection."""
    pass
configure_peer_connection async
configure_peer_connection(pc)

Configure peer connection for video file upload.

Creates video_upload datachannel for file transfer. In video track mode, also adds a receive-only transceiver for processed video output.

Source code in inference_sdk/webrtc/sources.py
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
    """Configure peer connection for video file upload.

    Creates video_upload datachannel for file transfer. In video track mode,
    also adds a receive-only transceiver for processed video output.
    """
    # Create event in the async context to bind to correct event loop (Python 3.9 compat)
    self._upload_started = asyncio.Event()

    # Create upload channel - server will create VideoFileUploadHandler
    self._upload_channel = pc.createDataChannel("video_upload")

    # Add receive-only transceiver for video track output mode (when not using datachannel)
    if not self.use_datachannel_frames:
        pc.addTransceiver("video", direction="recvonly")

    # Setup channel open handler to signal upload can start
    @self._upload_channel.on("open")
    def on_open() -> None:
        self._upload_started.set()
get_initialization_params
get_initialization_params(config)

Return params for video file processing mode.

In datachannel mode (default), merges stream_output into data_output so frames are received as base64 via the inference datachannel. In video track mode, preserves stream_output for video track rendering.

Source code in inference_sdk/webrtc/sources.py
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
    """Return params for video file processing mode.

    In datachannel mode (default), merges stream_output into data_output
    so frames are received as base64 via the inference datachannel.
    In video track mode, preserves stream_output for video track rendering.
    """
    params: Dict[str, Any] = {
        "webrtc_realtime_processing": self.realtime_processing,
        "video_file_upload": True,  # Signal to server that video will be uploaded
    }

    if not self.use_datachannel_frames:
        # Video track mode: keep stream_output for video track rendering
        return params

    # Datachannel mode (default): merge stream_output into data_output
    data_output = list(config.data_output or [])
    if config.stream_output:
        for field in config.stream_output:
            if field and field not in data_output:
                data_output.append(field)

    params["stream_output"] = []  # No video track
    params["data_output"] = data_output  # Receive frames via data channel
    return params
start_upload async
start_upload()

Start uploading the video file.

Called by session after connection is established. Uses self.on_upload_progress if provided.

Source code in inference_sdk/webrtc/sources.py
327
328
329
330
331
332
333
334
335
336
337
338
339
340
async def start_upload(self) -> None:
    """Start uploading the video file.

    Called by session after connection is established.
    Uses self.on_upload_progress if provided.
    """
    # Wait for channel to open
    await self._upload_started.wait()

    if not self._upload_channel:
        raise RuntimeError("Upload channel not configured")

    self._uploader = VideoFileUploader(self.path, self._upload_channel)
    await self._uploader.upload(on_progress=self.on_upload_progress)
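
A usage sketch covering both output modes; the file path and the `client`/`workflow` objects are assumptions:

# Datachannel mode (default): in-order, full-quality frames as data.
source = VideoFileSource(
    "input.mp4",  # placeholder path
    on_upload_progress=lambda done, total: print(f"uploaded {done}/{total} chunks"),
)

# Video track mode instead: lower bandwidth, hardware-accelerated codec.
# source = VideoFileSource("input.mp4", use_datachannel_frames=False)

session = client.webrtc.stream(source=source, workflow=workflow)
session.run()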

WebcamSource

Bases: StreamSource

Stream source for local webcam/USB camera.

This source creates a local video track that captures frames from a webcam device using OpenCV and sends them to the server.

Source code in inference_sdk/webrtc/sources.py
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
class WebcamSource(StreamSource):
    """Stream source for local webcam/USB camera.

    This source creates a local video track that captures frames from
    a webcam device using OpenCV and sends them to the server.
    """

    def __init__(
        self, device_id: int = 0, resolution: Optional[Tuple[int, int]] = None
    ):
        """Initialize webcam source.

        Args:
            device_id: Camera device index (0 for default camera)
            resolution: Optional (width, height) tuple to set camera resolution
        """
        self.device_id = device_id
        self.resolution = resolution
        self._track: Optional[_WebcamVideoTrack] = None
        self._declared_fps: Optional[float] = None

    async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
        """Create webcam video track and add it to the peer connection."""
        # Create local video track that reads from OpenCV
        self._track = _WebcamVideoTrack(self.device_id, self.resolution)

        # Capture FPS for server
        self._declared_fps = self._track.get_declared_fps()

        # Add track to send video
        pc.addTrack(self._track)

    def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
        """Return FPS if available."""
        params: Dict[str, Any] = {}
        if self._declared_fps:
            params["declared_fps"] = self._declared_fps
        return params

    async def cleanup(self) -> None:
        """Release webcam resources."""
        if self._track:
            self._track.release()
Functions
__init__
__init__(device_id=0, resolution=None)

Initialize webcam source.

Parameters:

Name Type Description Default
device_id int

Camera device index (0 for default camera)

0
resolution Optional[Tuple[int, int]]

Optional (width, height) tuple to set camera resolution

None
Source code in inference_sdk/webrtc/sources.py
146
147
148
149
150
151
152
153
154
155
156
157
158
def __init__(
    self, device_id: int = 0, resolution: Optional[Tuple[int, int]] = None
):
    """Initialize webcam source.

    Args:
        device_id: Camera device index (0 for default camera)
        resolution: Optional (width, height) tuple to set camera resolution
    """
    self.device_id = device_id
    self.resolution = resolution
    self._track: Optional[_WebcamVideoTrack] = None
    self._declared_fps: Optional[float] = None
cleanup async
cleanup()

Release webcam resources.

Source code in inference_sdk/webrtc/sources.py
178
179
180
181
async def cleanup(self) -> None:
    """Release webcam resources."""
    if self._track:
        self._track.release()
configure_peer_connection async
configure_peer_connection(pc)

Create webcam video track and add it to the peer connection.

Source code in inference_sdk/webrtc/sources.py
160
161
162
163
164
165
166
167
168
169
async def configure_peer_connection(self, pc: RTCPeerConnection) -> None:
    """Create webcam video track and add it to the peer connection."""
    # Create local video track that reads from OpenCV
    self._track = _WebcamVideoTrack(self.device_id, self.resolution)

    # Capture FPS for server
    self._declared_fps = self._track.get_declared_fps()

    # Add track to send video
    pc.addTrack(self._track)
get_initialization_params
get_initialization_params(config)

Return FPS if available.

Source code in inference_sdk/webrtc/sources.py
171
172
173
174
175
176
def get_initialization_params(self, config: "StreamConfig") -> Dict[str, Any]:
    """Return FPS if available."""
    params: Dict[str, Any] = {}
    if self._declared_fps:
        params["declared_fps"] = self._declared_fps
    return params
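
A usage sketch (`client` and `workflow` are assumed; the resolution is illustrative):

source = WebcamSource(device_id=0, resolution=(1280, 720))
session = client.webrtc.stream(source=source, workflow=workflow)

@session.on_frame
def show(frame, metadata):
    print(f"processed frame {metadata.frame_id}")

session.run()  # blocks; the webcam is released via cleanup() when the session closes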