# Inference Benchmarks
This page contains performance benchmarks for various models running with Inference on different hardware platforms.
## NVIDIA L4 GPU

### Object Detection
Performance benchmarks for object detection models, comparing standard ONNX runtime with TensorRT-optimized adapters:
| Model Type | Input Size | Inference/sec (ONNX) | Inference/sec (TRT) | Speedup (TRT / ONNX) |
|---|---|---|---|---|
| rfdetr-nano | 384x384 | 103.5 | 299.5 | 2.9 |
| rfdetr-small | 512x512 | 57.4 | 253.4 | 4.4 |
| rfdetr-medium | 576x576 | 62.8 | 201.8 | 3.2 |
| rfdetr-large | 704x704 | 36.9 | 160.5 | 4.3 |
| rfdetr-xlarge | 700x700 | 18.1 | 96.1 | 5.3 |
| rfdetr-2xlarge | 880x880 | 17.4 | 74.1 | 4.3 |
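To reproduce these numbers, you can sweep the models above with the CLI invocation described under Benchmark Methodology at the end of this page. A minimal sketch, assuming the model aliases accepted by the CLI match the names in the Model Type column (adjust them to the IDs your installation uses):

```bash
# Benchmark each RF-DETR detection model twice: first on the standard ONNX
# runtime, then via the TensorRT-optimized adapters (single image, 500
# iterations; see Benchmark Methodology for the flag definitions).
for model in rfdetr-nano rfdetr-small rfdetr-medium rfdetr-large rfdetr-xlarge rfdetr-2xlarge; do
    inference benchmark python-package-speed -m "$model" -bs 1 -bi 500
    USE_INFERENCE_MODELS=TRUE inference benchmark python-package-speed -m "$model" -bs 1 -bi 500
done
```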
### Segmentation
Performance benchmarks for instance segmentation models:
| Model Type | Input Size | Inference/sec (ONNX) | Inference/sec (TRT) | Speedup (TRT / ONNX) |
|---|---|---|---|---|
| rfdetr-seg-nano | 312x312 | 51.9 | 105.3 | 2.0 |
| rfdetr-seg-small | 384x384 | 57.5 | 126.7 | 2.2 |
| rfdetr-seg-medium | 432x432 | 39.7 | 99.7 | 2.5 |
| rfdetr-seg-large | 504x504 | 32.8 | 93.2 | 2.8 |
| rfdetr-seg-xlarge | 624x624 | 17.0 | 68.8 | 4.0 |
| rfdetr-seg-2xlarge | 768x768 | 10.7 | 59.0 | 5.5 |
### Classification
Performance benchmarks for classification models:
| Model Type | Input Size | Inference/sec (ONNX) | Inference/sec (TRT) | Speedup (TRT / ONNX) |
|---|---|---|---|---|
| ResNet50 | 224x224 | 358.6 | 600.8 | 1.7 |
| ViT | 224x224 | 238.0 | 306.5 | 1.3 |
## Jetson Orin NX

### Object Detection
Performance benchmarks for object detection models on Jetson Orin NX:
| Model Type | Input Size | Inference/sec (ONNX) | Inference/sec (TRT) | Speedup (TRT / ONNX) |
|---|---|---|---|---|
| rfdetr-nano | 384x384 | 21.2 | 78.5 | 3.7 |
| rfdetr-small | 512x512 | 13.9 | 52.5 | 3.8 |
| rfdetr-medium | 576x576 | 11.0 | 44.0 | 4.0 |
| yolov8n-640 | 640x640 | 35.6 | 89.0 | 2.5 |
| yolov8s-640 | 640x640 | 26.3 | 69.5 | 2.6 |
| yolov8m-640 | 640x640 | 13.5 | 44.5 | 3.3 |
| yolov8l-640 | 640x640 | 9.0 | 32.5 | 3.6 |
| yolov8x-640 | 640x640 | 6.4 | 22.0 | 3.4 |
## Benchmark Methodology
All benchmarks were conducted using the Roboflow Inference CLI. Every run used a single image (batch size 1, `-bs 1`) over 500 iterations (`-bi 500`).
- **Inference/sec (ONNX)**: standard ONNX runtime, measured with `inference benchmark python-package-speed -m [model]`
- **Inference/sec (TRT)**: TensorRT-optimized adapters (supported in `inference` 1.0 and later), measured with `USE_INFERENCE_MODELS=TRUE inference benchmark python-package-speed -m [model]`
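Putting the flags and commands together, a single ONNX-versus-TRT comparison looks like the following sketch (`yolov8n-640` is one of the aliases benchmarked above; substitute the model you want to measure):

```bash
# Baseline: standard ONNX runtime, batch size 1, 500 iterations
inference benchmark python-package-speed -m yolov8n-640 -bs 1 -bi 500

# Same benchmark through the TensorRT-optimized adapters
# (requires inference 1.0 or later)
USE_INFERENCE_MODELS=TRUE inference benchmark python-package-speed -m yolov8n-640 -bs 1 -bi 500
```

Dividing the TRT rate by the ONNX rate gives the speedup reported in the tables, e.g. 89.0 / 35.6 ≈ 2.5 for yolov8n-640 on Jetson Orin NX.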