Environment Variables

Inference behavior can be controlled by a set of environment variables. All environment variables are listed in `inference/core/env.py`.
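
Because the values in `inference/core/env.py` are read when the package is imported, variables should be set before `inference` is imported (or exported in the shell before the server process starts). Below is a minimal sketch assuming the standard `get_model` entry point; the model alias is used purely for illustration and may require a Roboflow API key:

```python
import os

# Configure inference via environment variables *before* importing the
# package, since inference/core/env.py reads them at import time.
os.environ["DISABLE_VERSION_CHECK"] = "True"
os.environ["ROBOFLOW_API_REQUEST_TIMEOUT"] = "5"

from inference import get_model

# "yolov8n-640" is a public model alias, used here purely for illustration.
model = get_model(model_id="yolov8n-640")
```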

Below is a list of some environment variables that require more in-depth explanation.

| Environment variable | Description | Default |
|---|---|---|
| `ONNXRUNTIME_EXECUTION_PROVIDERS` | List of execution providers in priority order; a warning message is displayed if a provider is not supported on the user's platform | See here |
| `SAM2_MAX_EMBEDDING_CACHE_SIZE` | The number of SAM2 embeddings that will be held in memory. The embeddings are held in GPU memory; each embedding takes 16777216 bytes (16 MiB) | `100` |
| `SAM2_MAX_LOGITS_CACHE_SIZE` | The number of SAM2 logits that will be held in memory. The logits are held in CPU memory; each entry takes 262144 bytes (256 KiB) | `1000` |
| `DISABLE_SAM2_LOGITS_CACHE` | If set to True, disables caching of SAM2 logits. Useful for debugging or when memory usage must be minimized, but may result in slower performance for repeated similar requests | `False` |
| `ENABLE_WORKFLOWS_PROFILING` | If set to True, allows the inference server to return Workflows profiler traces to the client; when running `InferencePipeline` from the Python package, enables profiling | `False` |
| `WORKFLOWS_PROFILER_BUFFER_SIZE` | Size of the profiler buffer (number of consecutive Workflows Execution Engine `run(...)` invocations to keep in the buffer) | `64` |
| `RUNS_ON_JETSON` | Boolean flag indicating whether inference runs on a Jetson device; set to True in all Docker builds for the Jetson architecture | `False` |
| `WORKFLOWS_DEFINITION_CACHE_EXPIRY` | Number of seconds to cache Workflows definitions retrieved by the `get_workflow_specification(...)` function | `900` (15 minutes) |
| `DOCKER_SOCKET_PATH` | Path to the local Docker socket mounted into the container. Empty by default; if provided, enables polling of container stats from the Docker daemon socket. See more here | Not set |
| `ENABLE_PROMETHEUS` | Boolean flag to enable the Prometheus `/metrics` endpoint | `True` for Docker images on Docker Hub |
| `ENABLE_STREAM_API` | Flag to enable the Stream Management API in the inference server - see more | `False` |
| `STREAM_API_PRELOADED_PROCESSES` | In the context of the Stream API, controls how many idle processes are warmed up and ready to become workers for `InferencePipeline`; helps speed up worker process start-up on GPU | `0` |
| `TRANSIENT_ROBOFLOW_API_ERRORS` | Comma-separated list of HTTP status codes from the Roboflow API that should be retried (only applicable to GET endpoints) | `None` |
| `RETRY_CONNECTION_ERRORS_TO_ROBOFLOW_API` | Flag deciding whether connection errors to the Roboflow API should be retried (only applicable to GET endpoints) | `False` |
| `ROBOFLOW_API_REQUEST_TIMEOUT` | Timeout (in seconds, given as an integer) for requests to the Roboflow API | `None` |
| `TRANSIENT_ROBOFLOW_API_ERRORS_RETRIES` | Number of times transient errors (connection errors and transient HTTP codes) from the Roboflow API will be retried (only applicable to GET endpoints) | `3` |
| `TRANSIENT_ROBOFLOW_API_ERRORS_RETRY_INTERVAL` | Delay interval between retries (for connection errors and transient HTTP codes) of Roboflow API requests (only applicable to GET endpoints) | `3` |
| `METRICS_ENABLED` | Flag to control Roboflow Model Monitoring | `True` |
| `MODEL_VALIDATION_DISABLED` | Flag that can make model loading faster by skipping trial inference | `False` |
| `DISABLE_VERSION_CHECK` | Flag to disable the inference version check running in a background thread | `False` |
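
To illustrate how a priority list like `ONNXRUNTIME_EXECUTION_PROVIDERS` interacts with the current platform, the sketch below filters a requested list against the providers onnxruntime reports as available. The bracketed, comma-separated format is an assumption about how the variable is written, not a guaranteed contract; the exact parsing inside inference may differ:

```python
import os

import onnxruntime as ort

# Sketch of provider selection: keep only the requested providers that this
# platform supports, in the requested priority order, warning on the rest.
raw = os.getenv(
    "ONNXRUNTIME_EXECUTION_PROVIDERS",
    "[CUDAExecutionProvider,CPUExecutionProvider]",
)
requested = [p.strip() for p in raw.strip("[]").split(",") if p.strip()]

available = set(ort.get_available_providers())
for provider in requested:
    if provider not in available:
        print(f"warning: {provider} is not supported on this platform")

selected = [p for p in requested if p in available]
print("Execution providers in priority order:", selected)
```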
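The SAM2 cache sizes translate directly into memory budgets. A quick sanity check using the per-item sizes from the table above (ignoring any bookkeeping overhead the cache itself may add):

```python
# Rough memory footprint of the SAM2 caches at their default sizes,
# based on the per-item figures listed in the table above.
EMBEDDING_BYTES = 16_777_216  # 16 MiB per embedding, held in GPU memory
LOGITS_BYTES = 262_144        # 256 KiB per logits entry, held in CPU memory

embedding_cache_gib = 100 * EMBEDDING_BYTES / 2**30   # SAM2_MAX_EMBEDDING_CACHE_SIZE=100
logits_cache_mib = 1000 * LOGITS_BYTES / 2**20        # SAM2_MAX_LOGITS_CACHE_SIZE=1000

print(f"Embedding cache: ~{embedding_cache_gib:.2f} GiB of GPU memory")  # ~1.56 GiB
print(f"Logits cache:    ~{logits_cache_mib:.0f} MiB of CPU memory")     # ~250 MiB
```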