Model Management
Model Weights Download
When using a self-hosted Inference server, you can pre-load models to download and cache weights before running inference:
from inference_sdk import InferenceHTTPClient
client = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="YOUR_ROBOFLOW_API_KEY"
)
# Pre-load the model (downloads weights to server cache)
client.load_model(model_id="rfdetr-base")
Alternatively, running a first inference will trigger the download automatically.
For workflows, you should also pre-load all models used in the workflow and run the workflow once to cache its definition.
You can verify which models are loaded on the server:
loaded_models = client.list_loaded_models()
print(f"Loaded models: {loaded_models}")
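Pre-loading every model that a workflow uses can be sketched as a small helper. This is a minimal sketch, not part of the SDK: `preload_models` is a hypothetical name, and the helper relies only on the `load_model` method shown above. It makes no assumption about which exception types the client raises, so it catches `Exception` broadly and records which IDs failed.

```python
from typing import Dict, Iterable, List


def preload_models(client, model_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Call client.load_model() for each model ID and report the outcome.

    `client` is expected to expose load_model(model_id=...), as
    InferenceHTTPClient does. This helper is illustrative only.
    """
    loaded: List[str] = []
    failed: List[str] = []
    for model_id in model_ids:
        try:
            client.load_model(model_id=model_id)
            loaded.append(model_id)
        except Exception as error:  # exception types are not assumed here
            print(f"Could not load {model_id}: {error}")
            failed.append(model_id)
    return {"loaded": loaded, "failed": failed}
```

With the client created above, a call could look like `preload_models(client, ["rfdetr-base"])`, followed by `client.list_loaded_models()` to confirm the result.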
Read more about weights caching, persistent storage, and Docker configuration.
Methods to control the inference server
Getting server info
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_server_info()
Listing loaded models
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.list_loaded_models()
Async equivalent: list_loaded_models_async()
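A common use of the listing call is to check whether one particular model is already on the server. The sketch below is hedged: `is_model_loaded` is a hypothetical helper, and it assumes that the object returned by `list_loaded_models()` exposes a `models` list whose entries have a `model_id` attribute. Verify that shape against your installed SDK version before relying on it.

```python
def is_model_loaded(client, model_id: str) -> bool:
    """Return True if model_id appears among the models loaded on the server.

    Assumption: list_loaded_models() returns an object with a `models` list
    whose items carry a `model_id` attribute.
    """
    registered = client.list_loaded_models()
    return any(model.model_id == model_id for model in registered.models)
```

With the client above, this would be called as `is_model_loaded(CLIENT, "some/1")`.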
Getting a specific model description
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.get_model_description(model_id="some/1", allow_loading=True)
If allow_loading is set to True, the model is loaded as a side effect when it is not already loaded.
Default: True.
Async equivalent: get_model_description_async()
Loading model
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.load_model(model_id="some/1", set_as_default=True)
The specified model is loaded on the server. If set_as_default is set to True, the model becomes
the client's default model after a successful load. Default value: False.
Async equivalent: load_model_async()
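The async variant can be combined with `asyncio.gather` to request several loads at once. This is a minimal sketch under stated assumptions: `load_models_concurrently` is a hypothetical helper, and whether the server actually benefits from concurrent load requests depends on your deployment.

```python
import asyncio
from typing import List


async def load_models_concurrently(client, model_ids: List[str]) -> list:
    """Issue load_model_async() for every model ID and wait for all of them."""
    tasks = [client.load_model_async(model_id=model_id) for model_id in model_ids]
    # return_exceptions=True returns failures as exception objects in the
    # result list instead of raising the first one
    return await asyncio.gather(*tasks, return_exceptions=True)
```

From synchronous code this could be run as `asyncio.run(load_models_concurrently(CLIENT, ["some/1"]))`. Each entry in the returned list is either the result of the load call or the exception it raised, in the same order as `model_ids`.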
Unloading model
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_model(model_id="some/1")
Unloading a model is sometimes required to avoid out-of-memory (OOM) errors on the server side.
Async equivalent: unload_model_async()
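One way to keep server memory in check is to pair every load with an unload. The sketch below wraps the two calls in a context manager. `temporarily_loaded` is a hypothetical helper built only on the `load_model` and `unload_model` methods shown in this section, not something the SDK provides.

```python
from contextlib import contextmanager


@contextmanager
def temporarily_loaded(client, model_id: str):
    """Load a model for the duration of a `with` block, then unload it."""
    client.load_model(model_id=model_id)
    try:
        yield client
    finally:
        # Runs even if the code inside the `with` block raises
        client.unload_model(model_id=model_id)
```

Used as `with temporarily_loaded(CLIENT, "some/1") as c: ...`, the model is released when the block exits, which frees memory for the next model at the cost of reloading it later.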
Unloading all models
from inference_sdk import InferenceHTTPClient
# Replace ROBOFLOW_API_KEY with your Roboflow API Key
CLIENT = InferenceHTTPClient(
    api_url="http://localhost:9001",
    api_key="ROBOFLOW_API_KEY"
)
CLIENT.unload_all_models()
Async equivalent: unload_all_models_async()