Install on NVIDIA Jetson

We provide specialized containers built with support for hardware acceleration on JetPack L4T. To automatically detect your JetPack version and start the right container with good default settings, run:

pip install inference-cli
inference server start
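
Once the command completes, it is worth confirming that the server container actually came up. A quick sanity check (the exact container name and landing page can vary by version):

# List running containers; you should see a Roboflow inference server image
sudo docker ps

# The server listens on port 9001 by default; recent versions respond with a landing page
curl http://localhost:9001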

Manually Starting the Container

If you want more control over the container settings, you can also start it manually. Jetson devices running NVIDIA JetPack come pre-configured with the NVIDIA Container Runtime and will be hardware accelerated out of the box. Pick the command that matches your JetPack version:

Jetpack 6.0.0

sudo docker run -d \
    --name inference-server \
    --runtime nvidia \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    roboflow/roboflow-inference-server-jetson-6.0.0:latest

Jetpack 5.1.1

sudo docker run -d \
    --name inference-server \
    --runtime nvidia \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    roboflow/roboflow-inference-server-jetson-5.1.1:latest

Jetpack 4.6.1

Warning

Jetpack 4 is deprecated and will not receive future updates. Please migrate to Jetpack 6.

sudo docker run -d \
    --name inference-server \
    --runtime nvidia \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    -e ONNXRUNTIME_EXECUTION_PROVIDERS="[CUDAExecutionProvider,CPUExecutionProvider]" \
    roboflow/roboflow-inference-server-jetson-4.6.1:latest

Jetpack 4.5.0

Warning

Jetpack 4 is deprecated and will not receive future updates. Please migrate to Jetpack 6.

sudo docker run -d \
    --name inference-server \
    --runtime nvidia \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    -e ONNXRUNTIME_EXECUTION_PROVIDERS="[CUDAExecutionProvider,CPUExecutionProvider]" \
    roboflow/roboflow-inference-server-jetson-4.5.0:latest
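
Whichever image you start, you can check that the container came up cleanly and is using the GPU. A minimal sketch, assuming the inference-server container name used above (tegrastats ships with JetPack on most devices):

# Follow the server logs to confirm the runtime initialized without errors
sudo docker logs -f inference-server

# Watch GPU/CPU utilization on the Jetson while sending requests
sudo tegrastats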

TensorRT

You can optionally enable TensorRT, NVIDIA's model optimization runtime, which greatly increases your models' speed at the expense of a heavy compilation and optimization step (sometimes 15+ minutes) the first time you load each model.

Enable TensorRT by adding TensorrtExecutionProvider to the ONNXRUNTIME_EXECUTION_PROVIDERS environment variable.

Jetpack 6.0.0

sudo docker run -d \
    --name inference-server \
    --runtime nvidia \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    -e ONNXRUNTIME_EXECUTION_PROVIDERS="[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]" \
    roboflow/roboflow-inference-server-jetson-6.0.0:latest

Jetpack 5.1.1

sudo docker run -d \
    --name inference-server \
    --runtime nvidia \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    -e ONNXRUNTIME_EXECUTION_PROVIDERS="[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]" \
    roboflow/roboflow-inference-server-jetson-5.1.1:latest

Jetpack 4.6.1

Warning

Jetpack 4 is deprecated and will not receive future updates. Please migrate to Jetpack 6.

sudo docker run -d \
    --name inference-server \
    --runtime nvidia \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    -e ONNXRUNTIME_EXECUTION_PROVIDERS="[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]" \
    roboflow/roboflow-inference-server-jetson-4.6.1:latest

Jetpack 4.5.0

Warning

Jetpack 4 is deprecated and will not receive future updates. Please migrate to Jetpack 6.

sudo docker run -d \
    --name inference-server \
    --runtime nvidia \
    --read-only \
    -p 9001:9001 \
    --volume ~/.inference/cache:/tmp:rw \
    --security-opt="no-new-privileges" \
    --cap-drop="ALL" \
    --cap-add="NET_BIND_SERVICE" \
    -e ONNXRUNTIME_EXECUTION_PROVIDERS="[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]" \
    roboflow/roboflow-inference-server-jetson-4.5.0:latest
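
To double-check that the execution providers were picked up inside the container, you can inspect its environment (assuming the inference-server container name used above):

# Print the execution provider list as seen inside the running container
sudo docker exec inference-server printenv ONNXRUNTIME_EXECUTION_PROVIDERS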

Docker Compose

If you are using Docker Compose for your application, the equivalent YAML for each JetPack version is:

Jetpack 6.0.0

version: "3.9"

services:
  inference-server:
    container_name: inference-server
    image: roboflow/roboflow-inference-server-jetson-6.0.0:latest

    read_only: true
    ports:
      - "9001:9001"

    volumes:
      - "${HOME}/.inference/cache:/tmp:rw"

    runtime: nvidia

    # Optionally: uncomment the following lines to enable TensorRT:
    # environment:
    #   ONNXRUNTIME_EXECUTION_PROVIDERS: "[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]"

    security_opt:
      - no-new-privileges
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE

Jetpack 5.1.1

version: "3.9"

services:
  inference-server:
    container_name: inference-server
    image: roboflow/roboflow-inference-server-jetson-5.1.1:latest

    read_only: true
    ports:
      - "9001:9001"

    volumes:
      - "${HOME}/.inference/cache:/tmp:rw"

    runtime: nvidia

    # Optionally: uncomment the following lines to enable TensorRT:
    # environment:
    #   ONNXRUNTIME_EXECUTION_PROVIDERS: "[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]"

    security_opt:
      - no-new-privileges
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE

Jetpack 4.6.1

Warning

Jetpack 4 is deprecated and will not receive future updates. Please migrate to Jetpack 6.

version: "3.9"

services:
  inference-server:
    container_name: inference-server
    image: roboflow/roboflow-inference-server-jetson-4.6.1:latest

    read_only: true
    ports:
      - "9001:9001"

    volumes:
      - "${HOME}/.inference/cache:/tmp:rw"

    runtime: nvidia

    # Optionally: uncomment the following lines to enable TensorRT:
    # environment:
    #   ONNXRUNTIME_EXECUTION_PROVIDERS: "[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]"

    security_opt:
      - no-new-privileges
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE

Jetpack 4.5.0

Warning

Jetpack 4 is deprecated and will not receive future updates. Please migrate to Jetpack 6.

version: "3.9"

services:
  inference-server:
    container_name: inference-server
    image: roboflow/roboflow-inference-server-jetson-4.5.0:latest

    read_only: true
    ports:
      - "9001:9001"

    volumes:
      - "${HOME}/.inference/cache:/tmp:rw"

    runtime: nvidia

    # Optionally: uncomment the following lines to enable TensorRT:
    # environment:
    #   ONNXRUNTIME_EXECUTION_PROVIDERS: "[TensorrtExecutionProvider,CUDAExecutionProvider,CPUExecutionProvider]"

    security_opt:
      - no-new-privileges
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
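
Save the configuration as docker-compose.yml (filename assumed here) and bring the service up with Docker Compose:

# Compose v2, bundled with recent Docker releases
sudo docker compose up -d

# Older standalone installs use the hyphenated command instead:
# sudo docker-compose up -d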

Using Your New Server

Once you have a server running, you can access it via its API or with the Python SDK. You can also use it to build Workflows using the Roboflow Platform UI.

Install the SDK

pip install inference-sdk

Run a workflow

This code runs an example model comparison Workflow on an Inference Server running on your local machine:

from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001", # use local inference server
    # api_key="<YOUR API KEY>" # optional to access your private data and models
)

result = client.run_workflow(
    workspace_name="roboflow-docs",
    workflow_id="model-comparison",
    images={
        "image": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
    },
    parameters={
        "model1": "yolov8n-640",
        "model2": "yolov11n-640"
    }
)

print(result)

From a JavaScript app, hit your new server with an HTTP request:

const response = await fetch('http://localhost:9001/infer/workflows/roboflow-docs/model-comparison', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json'
    },
    body: JSON.stringify({
        // api_key: "<YOUR API KEY>" // optional to access your private data and models
        inputs: {
            "image": {
                "type": "url",
                "value": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
            },
            "model1": "yolov8n-640",
            "model2": "yolov11n-640"
        }
    })
});

const result = await response.json();
console.log(result);

Warning

Be careful not to expose your API Key to external users (in other words: don't use this snippet in a public-facing front-end app).

Using the server's API, you can access it from any other client application. For example, from the command line using cURL:

curl -X POST "http://localhost:9001/infer/workflows/roboflow-docs/model-comparison" \
-H "Content-Type: application/json" \
-d '{
    "api_key": "<YOUR API KEY -- REMOVE THIS LINE IF NOT FILLING>",
    "inputs": {
        "image": {
            "type": "url",
            "value": "https://media.roboflow.com/workflows/examples/bleachers.jpg"
        },
        "model1": "yolov8n-640",
        "model2": "yolov11n-640"
    }
}'
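
The response is a JSON document. If you have jq installed (an optional tool, not required by the server), piping the output through it makes the result easier to read:

# Pretty-print the workflow response (requires jq)
curl -s -X POST "http://localhost:9001/infer/workflows/roboflow-docs/model-comparison" \
-H "Content-Type: application/json" \
-d '{"inputs": {"image": {"type": "url", "value": "https://media.roboflow.com/workflows/examples/bleachers.jpg"}, "model1": "yolov8n-640", "model2": "yolov11n-640"}}' | jq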

Tip

ChatGPT is really good at converting snippets like this into other languages. If you need help, try pasting it in and asking it to translate it to your language of choice.