Environment Variables¶

Inference behavior can be controlled by set of environmental variables. All environmental variables are listed in inference/core/env.py

Below is a list of some environmental values that require more in-depth explanation.

Environmental variable	Description	Default
`ONNXRUNTIME_EXECUTION_PROVIDERS`	List of execution providers in priority order, warning message will be displayed if provider is not supported on user platform	See here
`SAM2_MAX_EMBEDDING_CACHE_SIZE`	The number of sam2 embeddings that will be held in memory. The embeddings will be held in gpu memory. Each embedding takes 16777216 bytes.	100
`SAM2_MAX_LOGITS_CACHE_SIZE`	The number of sam2 logits that will be held in memory. The the logits will be in cpu memory. Each logit takes 262144 bytes.	1000
`DISABLE_SAM2_LOGITS_CACHE`	If set to True, disables the caching of SAM2 logits. This can be useful for debugging or in scenarios where memory usage needs to be minimized, but may result in slower performance for repeated similar requests.	False
`ENABLE_WORKFLOWS_PROFILING`	If set to True, in `inference` server allows the server to output Workflows profiler traces the client, running in Python package with `InferencePipeline` it enables profiling.	False
`WORKFLOWS_PROFILER_BUFFER_SIZE`	Size of profiler buffer (number of consecutive Wrofklows Execution Engine `run(...)` invocations to trace in buffer.	64
`RUNS_ON_JETSON`	Boolean flag to tell if `inference` runs on Jetson device - set to `True` in all docker builds for Jetson architecture.	False
`WORKFLOWS_DEFINITION_CACHE_EXPIRY`	Number of seconds to cache Workflows definitions as a result of `get_workflow_specification(...)` function call	`15 * 60` - 15 minutes
`DOCKER_SOCKET_PATH`	Path to the local socket mounted to the container - by default empty, if provided - enables pooling docker container stats from the docker deamon socket. See more here	Not Set
`ENABLE_PROMETHEUS`	Boolean flag to enable Prometheus `/metrics` enpoint.	True for docker images in dockerhub
`ENABLE_STREAM_API`	Flag to enable Stream Management API in `inference` server - see more.	False
`STREAM_API_PRELOADED_PROCESSES`	In context of Stream API - this environment variable controlls how many idle processes are warmed-up ready to be a worker for `InferencePipeline` - helps speeding up workers processes start on GPU	0
`TRANSIENT_ROBOFLOW_API_ERRORS`	List of (comma separated) HTTP codes from RF API that should be retried (only applicable to GET endpoints)	`None`
`RETRY_CONNECTION_ERRORS_TO_ROBOFLOW_API`	Fleg to decide if connection errors for RF API should be retried (only applicable to GET endpoints)	`False`
`ROBOFLOW_API_REQUEST_TIMEOUT`	Timeout (in seconds given as integer) for requests to RF API	`None`
`TRANSIENT_ROBOFLOW_API_ERRORS_RETRIES`	Number of times transient errors (connection errors and HTTP transient codes) to RF API will be retried (only applicable to GET endpoints)	`3`
`TRANSIENT_ROBOFLOW_API_ERRORS_RETRY_INTERVAL`	Delay interval of retries (for connection errors and HTTP transient codes) of RF API requests (only applicable to GET endpoints)	`3`
`METRICS_ENABLED`	Flag to control Roboflow Model Monitoring	`True`
`MODEL_MONITORING_CACHE_BACKEND`	Cache backend for inference model-monitoring pingback data. Use `default` to follow the normal cache selection (`REDIS_HOST` when configured, otherwise memory), or `memory` to force process-local buffering and keep Redis off the inference hot path.	`default`
`MODEL_VALIDATION_DISABLED`	Flag that can make model loading faster by skipping trial inference	`False`
`DISABLE_VERSION_CHECK`	Flag to disable `inference` version chack in background thread	`False`
`USE_INFERENCE_MODELS`	Flag to select `inference-models` backend	`False`
`MAX_INFERENCE_MODELS_CACHE_SIZE_MB`	Variable to enable `inference-models` cache watchdog to run scheduled cycles of verification for disk space occupied by models - when set with value `>0` - watchdog prunes model artefacts from oldest and biggest - to prevent system running out of space over time. Only applicable when `USE_INFERENCE_MODELS=True`	`-1`
`INFERENCE_MODELS_CACHE_WATCHDOG_INTERVAL_MINUTES`	Variable to controll frequency of `inference-models` cache watchdog cycles - min is 15 mins.	`60`
`ENABLE_HTTPS`	Toggles HTTPS for the inference server. When `True` the server reads `SSL_CERTFILE` / `SSL_KEYFILE` and serves traffic over TLS. See HTTPS configuration.	`False`
`SSL_CERTFILE`	Path to a PEM-encoded TLS certificate served when `ENABLE_HTTPS=True`. Defaults to a sane mount point so customers only need to bind their cert to `/etc/inference/certs/server.crt`.	`/etc/inference/certs/server.crt`
`SSL_KEYFILE`	Path to the PEM-encoded TLS private key paired with `SSL_CERTFILE`. Defaults to `/etc/inference/certs/server.key` so a single bind mount is enough to enable HTTPS.	`/etc/inference/certs/server.key`
`SSL_KEYFILE_PASSWORD`	Optional passphrase used to decrypt `SSL_KEYFILE` when the private key is encrypted.	Not set
`SSL_CA_CERTS`	Optional path to a CA bundle used when client certificate verification (mTLS) is required.	Not set