InferenceHTTPClient¶
InferenceHTTPClient was created to make it easy for users to consume the HTTP API exposed by the inference server. You can think of it as a friendly wrapper over requests that you can use instead of writing the calling logic on your own.
🔥 quickstart¶
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
print(predictions)
What are the client capabilities?¶
- Executing inference for models hosted on the Roboflow platform (use client version v0)
- Executing inference for models hosted in local (or on-prem) Docker images with the inference HTTP API
- Works against a single image (given as a local path, URL, np.ndarray or PIL.Image)
- Minimalistic batch inference implemented (you can pass multiple images)
- Inference from a video file and from a directory of images
Why does the client have two modes - v0 and v1?¶
We are constantly improving our inference package. The initial version (v0) is compatible with models deployed on the Roboflow platform (task types: classification, object-detection, instance-segmentation and keypoints-detection are supported). Version v1 is available in locally hosted Docker images with the HTTP API.
A locally hosted inference server exposes endpoints for model manipulation, but those endpoints are not available at the moment for models deployed on the Roboflow platform.
The api_url parameter passed to InferenceHTTPClient decides the default client mode - URLs with *.roboflow.com default to version v0.
Usage of model registry control methods with v0 clients will raise WrongClientModeError.
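For illustration, a minimal sketch of how api_url drives the default mode; the hosted detect.roboflow.com address is used here as an example of a *.roboflow.com URL:
from inference_sdk import InferenceHTTPClient

# URL matching *.roboflow.com - the client defaults to v0 mode
hosted_client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="ROBOFLOW_API_KEY",
)

# locally hosted inference server - the client defaults to v1 mode,
# so the model registry control methods described later are available
local_client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY",
)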
How can I adjust InferenceHTTPClient to work in my use case?¶
There are a few ways the configuration can be altered:
configuring with context managers¶
Methods use_configuration(...), use_api_v0(...), use_api_v1(...) and use_model(...) are designed to work as context managers. Once the context manager is left, the old config values are restored.
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

with CLIENT.use_api_v0():
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

with CLIENT.use_configuration(custom_configuration):
    _ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")

with CLIENT.use_model("soccer-players-5fuqs/1"):
    _ = CLIENT.infer(image_url)

# after leaving the context manager, changes are reverted and `model_id` is still required
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
As you can see, model_id needs to be given to the prediction method only when a default model is not configured.
Setting the configuration once and using it until the next change¶
Methods configure(...), select_api_v0(...), select_api_v1(...) and select_model(...) are designed to alter the client state, and the change is preserved until the next one.
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
custom_configuration = InferenceConfiguration(confidence_threshold=0.8)
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.select_api_v0()
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# API v0 still holds
CLIENT.configure(custom_configuration)
CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# API v0 and custom configuration still holds
CLIENT.select_model(model_id="soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url)
# API v0, custom configuration and selected model - still holds
_ = CLIENT.infer(image_url)
One may also initialise the client in a chained mode:
from inference_sdk import InferenceHTTPClient, InferenceConfiguration
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_api_v0() \
    .select_model("soccer-players-5fuqs/1")
Overriding model_id for a specific call¶
model_id can be overridden for a specific call:
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(api_url="http://localhost:9001", api_key="ROBOFLOW_API_KEY") \
    .select_model("soccer-players-5fuqs/1")
_ = CLIENT.infer(image_url, model_id="another-model/1")
Batch inference¶
You may want to predict against multiple images in a single call. It is possible, but so far client-side batching is implemented in a naive way (sequential requests to the API) - stay tuned for future improvements.
from inference_sdk import InferenceHTTPClient
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
predictions = CLIENT.infer([image_url] * 5, model_id="soccer-players-5fuqs/1")
print(predictions)
Inference against stream¶
One may want to infer against a video or a directory of images - those modes are supported in inference-client:
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)

for frame_id, frame, prediction in CLIENT.infer_on_stream("video.mp4", model_id="soccer-players-5fuqs/1"):
    # frame_id - the index of the frame
    # frame - np.ndarray with the video frame
    # prediction - prediction from the model
    pass

for file_path, image, prediction in CLIENT.infer_on_stream("local/dir/", model_id="soccer-players-5fuqs/1"):
    # file_path - path to the image
    # image - np.ndarray with the image
    # prediction - prediction from the model
    pass
What is actually returned as prediction?¶
The client returns plain Python dictionaries that are the responses from the model-serving API. The only modifications are made in the context of the visualization key, which keeps the server-generated prediction visualisation (it can be transcoded to the format of choice), and in terms of client-side re-scaling.
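For example, a hedged sketch of consuming an object-detection response (reusing CLIENT and image_url from the quickstart); the exact keys depend on the task type, so treat the names below as illustrative:
predictions = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")
# a typical object-detection response holds a list under the "predictions" key;
# each element is a plain dict with fields such as "class" and "confidence"
for detection in predictions.get("predictions", []):
    print(detection.get("class"), detection.get("confidence"))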
Methods to control the inference server (in v1 mode only)¶
Getting server info¶
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_server_info()
Listing loaded models¶
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.list_loaded_models()
Getting specific model description¶
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_model_description(model_id="some/1", allow_loading=True)
If allow_loading is set to True, the model will be loaded as a side effect if it is not already loaded. Default: True.
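If that side effect is undesirable, the flag can be flipped - a small sketch based on the description above:
# only describe the model; do not load it as a side effect if it is absent
description = CLIENT.get_model_description(model_id="some/1", allow_loading=False)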
Loading model¶
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.load_model(model_id="some/1", set_as_default=True)
The pointed model will be loaded. If set_as_default is set to True, after a successful load the model will be used as the default model for the client. Default value: False.
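Since the model above was set as the default, subsequent calls may omit model_id - a sketch reusing an image_url like the one from the quickstart:
# the default model configured via load_model(..., set_as_default=True) is used here
image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"
_ = CLIENT.infer(image_url)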
Unloading model¶
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_model(model_id="some/1")
Sometimes (to avoid OOM on the server side) unloading a model will be required.
Unloading all models¶
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_all_models()
Details about client configuration¶
inference-client provides the InferenceConfiguration dataclass to hold the whole configuration.
from inference_sdk import InferenceConfiguration
Overriding fields in this config changes the behaviour of the client (and of the API serving the model). Specific fields are used in specific contexts, as described in the subsections below.
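For example, a minimal sketch of overriding a few of the fields listed below and applying them with configure(...); the values are illustrative:
from inference_sdk import InferenceHTTPClient, InferenceConfiguration

# override a handful of the fields described in the subsections below
config = InferenceConfiguration(
    confidence_threshold=0.5,
    iou_threshold=0.5,
    max_detections=100,
)
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY",
)
CLIENT.configure(config)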
Inference in v0 mode¶
The following fields are passed to the API:
* confidence_threshold (as confidence) - to alter model thresholding
* keypoint_confidence_threshold (as keypoint_confidence) - to filter out detected keypoints based on model confidence
* format - to visualise on the server side - use image (but then you lose prediction details from the response)
* visualize_labels (as labels) - used in visualisation to show / hide labels for classes
* mask_decode_mode
* tradeoff_factor
* max_detections - max detections to return from the model
* iou_threshold (as overlap) - to dictate the NMS IoU threshold
* stroke_width - width of stroke in visualisation
* count_inference as countinference
* service_secret
* disable_preproc_auto_orientation, disable_preproc_contrast, disable_preproc_grayscale, disable_preproc_static_crop - to alter server-side pre-processing
Classification model in v1 mode:¶
* visualize_predictions - flag to enable / disable visualisation
* confidence_threshold as confidence
* stroke_width - width of stroke in visualisation
* disable_preproc_auto_orientation, disable_preproc_contrast, disable_preproc_grayscale, disable_preproc_static_crop - to alter server-side pre-processing
Object detection model in v1 mode:¶
* visualize_predictions - flag to enable / disable visualisation
* visualize_labels - flag to enable / disable labels visualisation if visualisation is enabled
* confidence_threshold as confidence
* class_filter - to filter out a list of classes
* class_agnostic_nms - flag to control whether NMS is class-agnostic
* fix_batch_size
* iou_threshold - to dictate the NMS IoU threshold
* stroke_width - width of stroke in visualisation
* max_detections - max detections to return from the model
* max_candidates - max candidates to post-process from the model
* disable_preproc_auto_orientation, disable_preproc_contrast, disable_preproc_grayscale, disable_preproc_static_crop - to alter server-side pre-processing
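A sketch of putting a few of the object-detection fields above into an InferenceConfiguration; the values and the model are the illustrative ones used throughout this page:
from inference_sdk import InferenceHTTPClient, InferenceConfiguration

image_url = "https://source.roboflow.com/pwYAXv9BTpqLyFfgQoPZ/u48G0UpWfk8giSw7wrU8/original.jpg"

# object-detection oriented overrides, using field names from the list above
config = InferenceConfiguration(
    confidence_threshold=0.4,
    iou_threshold=0.5,
    class_agnostic_nms=True,
    max_detections=50,
    visualize_predictions=True,
)
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY",
)
CLIENT.configure(config)
_ = CLIENT.infer(image_url, model_id="soccer-players-5fuqs/1")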
Keypoints detection model in v1 mode:¶
* visualize_predictions - flag to enable / disable visualisation
* visualize_labels - flag to enable / disable labels visualisation if visualisation is enabled
* confidence_threshold as confidence
* keypoint_confidence_threshold (as keypoint_confidence) - to filter out detected keypoints based on model confidence
* class_filter - to filter out a list of object classes
* class_agnostic_nms - flag to control whether NMS is class-agnostic
* fix_batch_size
* iou_threshold - to dictate the NMS IoU threshold
* stroke_width - width of stroke in visualisation
* max_detections - max detections to return from the model
* max_candidates - max candidates to post-process from the model
* disable_preproc_auto_orientation, disable_preproc_contrast, disable_preproc_grayscale, disable_preproc_static_crop - to alter server-side pre-processing
Instance segmentation model in v1 mode:¶
* visualize_predictions - flag to enable / disable visualisation
* visualize_labels - flag to enable / disable labels visualisation if visualisation is enabled
* confidence_threshold as confidence
* class_filter - to filter out a list of classes
* class_agnostic_nms - flag to control whether NMS is class-agnostic
* fix_batch_size
* iou_threshold - to dictate the NMS IoU threshold
* stroke_width - width of stroke in visualisation
* max_detections - max detections to return from the model
* max_candidates - max candidates to post-process from the model
* disable_preproc_auto_orientation, disable_preproc_contrast, disable_preproc_grayscale, disable_preproc_static_crop - to alter server-side pre-processing
* mask_decode_mode
* tradeoff_factor
Configuration of client¶
* output_visualisation_format - one of VisualisationResponseFormat.BASE64, VisualisationResponseFormat.NUMPY, VisualisationResponseFormat.PILLOW - given that server-side visualisation is enabled, one may choose which format should be used in the output
* image_extensions_for_directory_scan - while using CLIENT.infer_on_stream(...) with a local directory, this parameter controls the types of files (extensions) allowed to be processed - default: ["jpg", "jpeg", "JPG", "JPEG", "png", "PNG"]
* client_downsizing_disabled - set to True if you want to avoid client-side downsizing - default False. Client-side scaling is only supposed to down-scale (keeping aspect ratio) the input for inference, to utilise the internet connection more efficiently (but at the price of image manipulation / transcoding). If the model registry endpoint is available (mode v1), model input size information will be used; if not, default_max_input_size will be used.
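To illustrate, a sketch of the client-side options above; it assumes VisualisationResponseFormat is importable from inference_sdk alongside InferenceConfiguration:
from inference_sdk import (
    InferenceConfiguration,
    InferenceHTTPClient,
    VisualisationResponseFormat,
)

# ask for NumPy visualisations and skip client-side downsizing
config = InferenceConfiguration(
    output_visualisation_format=VisualisationResponseFormat.NUMPY,
    client_downsizing_disabled=True,
)
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY",
)
CLIENT.configure(config)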