Model Management

Model Weights Download

When using a self-hosted Inference server, you can pre-load models to download and cache weights before running inference:

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="YOUR_ROBOFLOW_API_KEY"
)

# Pre-load the model (downloads weights to server cache)
client.load_model(model_id="rfdetr-base")

Alternatively, the first inference request for a model triggers the download automatically.

For workflows, you should also pre-load all models used in the workflow and run the workflow once to cache its definition.
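
When a workflow depends on several models, the pre-loading step can be wrapped in a small helper. The sketch below is illustrative only: preload_models is a hypothetical helper, not part of inference_sdk, and it relies solely on the load_model(model_id=...) call shown earlier. It accepts any client object that exposes that method and collects failures instead of stopping at the first one.

```python
def preload_models(client, model_ids):
    """Load each model once; return a dict mapping failed model IDs to errors.

    `client` is assumed to expose load_model(model_id=...), as the
    InferenceHTTPClient in the example above does.
    """
    failures = {}
    for model_id in model_ids:
        try:
            client.load_model(model_id=model_id)
        except Exception as error:
            # Keep going so one bad ID does not block the remaining models.
            failures[model_id] = error
    return failures
```

With the client from the first example, preload_models(client, ["rfdetr-base"]) returns an empty dict when every load succeeds.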

You can verify which models are loaded on the server:

loaded_models = client.list_loaded_models()
print(f"Loaded models: {loaded_models}")

Read more about weights caching, persistent storage, and Docker configuration.

Methods to control the inference server

Getting server info

from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_server_info()
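
A call to get_server_info() can double as a readiness check, for example when a script starts right after the server container does. The sketch below assumes the call raises an exception while the server is unreachable. The helper wait_for_server, its parameters, and its retry policy are hypothetical and not part of inference_sdk.

```python
import time


def wait_for_server(client, attempts=10, delay_seconds=1.0, sleep=time.sleep):
    """Poll client.get_server_info() until it succeeds or attempts run out.

    Returns the server info on success and re-raises the last error otherwise.
    Assumes get_server_info() raises while the server is unreachable.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return client.get_server_info()
        except Exception as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(delay_seconds)  # pause before the next try
    raise last_error
```

Calling wait_for_server(CLIENT) before the first request keeps start-up races out of the rest of the script. The sleep parameter exists only so the delay can be replaced in tests.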

Listing loaded models

from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.list_loaded_models()

Async equivalent: list_loaded_models_async()

Getting specific model description

from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_model_description(model_id="some/1", allow_loading=True)

If allow_loading is True, the model is loaded as a side effect when it is not already loaded. Default: True.

Async equivalent: get_model_description_async()

Loading model

from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.load_model(model_id="some/1", set_as_default=True)

The specified model is loaded on the server. If set_as_default is True, the model becomes the client's default model after a successful load. Default: False.

Async equivalent: load_model_async()
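
The async variants can be awaited from an asyncio program. The sketch below assumes load_model_async() accepts the same model_id argument as load_model(). The helper load_models_concurrently is hypothetical, and whether the server benefits from concurrent load requests is not something this sketch establishes.

```python
import asyncio


async def load_models_concurrently(client, model_ids):
    """Start one load_model_async() call per ID and wait for all of them.

    Returns a dict mapping each failed model ID to the exception it raised.
    """
    model_ids = list(model_ids)
    results = await asyncio.gather(
        *(client.load_model_async(model_id=model_id) for model_id in model_ids),
        return_exceptions=True,  # collect errors instead of cancelling siblings
    )
    return {
        model_id: result
        for model_id, result in zip(model_ids, results)
        if isinstance(result, BaseException)
    }
```

From synchronous code, the helper can be run with asyncio.run(load_models_concurrently(CLIENT, ["some/1"])).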

Unloading model

from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_model(model_id="some/1")

Unloading a model is sometimes necessary to avoid running out of memory (OOM) on the server.

Async equivalent: unload_model_async()
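
One way to keep server memory in check is to pair every load with an unload. The sketch below wraps the two calls in a context manager. The helper loaded_model is hypothetical and not part of inference_sdk, and it uses only the load_model() and unload_model() calls shown in this section.

```python
from contextlib import contextmanager


@contextmanager
def loaded_model(client, model_id):
    """Load model_id on entry and unload it on exit, even if the body raises."""
    client.load_model(model_id=model_id)
    try:
        yield client
    finally:
        # Runs on normal exit and on exceptions, so the model is always released.
        client.unload_model(model_id=model_id)
```

Inside a block such as with loaded_model(CLIENT, "some/1"): the model is available for requests, and it is unloaded as soon as the block ends.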

Unloading all models

from inference_sdk import InferenceHTTPClient

# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_all_models()

Async equivalent: unload_all_models_async()