SAM2 Video Tracker
Class: SegmentAnything2VideoBlockV1
Run Segment Anything 2 on a live video stream frame by frame, keeping per-video temporal memory so object identities are preserved across frames.
Feed box detections from an upstream detector (e.g. a YOLO block) as
prompts. The block multiplexes a single SAM2 camera predictor across
many video streams by keying state on video_metadata.video_identifier;
depending on prompt_mode, it either re-seeds the prompts periodically
or simply propagates existing tracks.
Intended for use with InferencePipeline, which delivers one frame at
a time and tags each frame with video metadata.
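The multiplexing described above can be pictured as a dictionary of per-stream state keyed by the video identifier. The following is a minimal sketch of that idea only; `VideoState` and `StateRegistry` are hypothetical names used for illustration and are not the block's actual classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VideoState:
    """Hypothetical per-video tracking state (illustration only)."""
    frames_seen: int = 0
    track_ids: List[int] = field(default_factory=list)


class StateRegistry:
    """Keeps one VideoState per video identifier, created on first use."""

    def __init__(self) -> None:
        self._states: Dict[str, VideoState] = {}

    def get(self, video_identifier: str) -> VideoState:
        # Create a fresh state the first time a stream is seen,
        # otherwise return the state that stream already owns.
        if video_identifier not in self._states:
            self._states[video_identifier] = VideoState()
        return self._states[video_identifier]


registry = StateRegistry()

# Interleaved frames from two streams, as a pipeline might deliver them.
for video_id in ["cam_a", "cam_b", "cam_a", "cam_a", "cam_b"]:
    registry.get(video_id).frames_seen += 1

print(registry.get("cam_a").frames_seen)
print(registry.get("cam_b").frames_seen)
```

Because the state lives in the memory of a single process, each stream keeps its own frame count and tracks only as long as every frame of that stream reaches the same process.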
Type identifier
Use the identifier roboflow_core/segment_anything_2_video@v1 in the step "type" field to add the block as a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| model_id | str | Streaming video tracker model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality. sam3trackervideo is SAM3's visually prompted tracker: the same prompt contract with a larger backbone, markedly better at identity retention on long videos and crowded scenes, at higher compute cost. | ✅ |
| prompt_mode | str | When to consume boxes as SAM2 prompts. first_frame prompts once per session and then tracks; every_n_frames re-seeds every prompt_interval frames; every_frame re-seeds every frame. On frames where re-seeding does not happen, boxes is ignored and the block simply propagates. | ❌ |
| prompt_interval | int | For prompt_mode=every_n_frames: re-prompt every N frames. | ✅ |
| threshold | float | Minimum confidence for emitted masks. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
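The three prompt_mode values boil down to a per-frame decision about whether boxes is consumed. Here is a minimal sketch of that decision, assuming a 0-based frame counter per session and that re-seeding under every_n_frames happens on frames whose index is a multiple of prompt_interval; the function name `should_reprompt` and the exact phase of the interval are assumptions, not the block's confirmed behaviour.

```python
def should_reprompt(prompt_mode: str, frame_index: int, prompt_interval: int = 30) -> bool:
    """Return True when boxes would be used as prompts on this frame.

    frame_index is a hypothetical 0-based count of frames seen in the session.
    """
    if prompt_mode == "first_frame":
        # Prompt once at the start of the session, then only propagate.
        return frame_index == 0
    if prompt_mode == "every_n_frames":
        if prompt_interval < 1:
            raise ValueError("prompt_interval must be a positive integer")
        # Assumed phase: re-seed on frames 0, N, 2N, ...
        return frame_index % prompt_interval == 0
    if prompt_mode == "every_frame":
        return True
    raise ValueError(f"unknown prompt_mode: {prompt_mode!r}")


# Frames on which each mode would read boxes, over the first 8 frames.
for mode in ("first_frame", "every_n_frames", "every_frame"):
    frames = [i for i in range(8) if should_reprompt(mode, i, prompt_interval=3)]
    print(mode, frames)
```

On every frame where this returns False, boxes is ignored and the existing tracks are propagated.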
Runtime compatibility
- soft (runtime: hosted_serverless, dedicated_deployment; execution: remote; input: video): The block keeps per-video state in process memory (keyed by video_metadata.video_identifier). With remote step execution on stateless or multi-replica HTTP runtimes, successive requests may be served by different worker processes, so the state resets between calls and the output is meaningless for tracking, counting, or aggregation. Use local step execution in an InferencePipeline for stable cross-frame results.
- hard (runtime: self_hosted_cpu; execution: local): Requires a GPU; the streaming SAM2 video model needs CUDA.
- soft (input: image): The block depends on temporal context from video or repeated-frame workflows. With a still image or photo, there is no meaningful history to track, compare, aggregate, or visualize, so the block provides little or no benefit.
Available Connections
Compatible Blocks
Check which blocks you can connect to SAM2 Video Tracker in version v1.
- inputs:
Image Slicer,Polygon Zone Visualization,Line Counter,Contrast Enhancement,Time in Zone,Semantic Segmentation Model,Stability AI Image Generation,Image Threshold,Line Counter Visualization,Trace Visualization,Path Deviation,Semantic Segmentation Model,Multi-Label Classification Model,Distance Measurement,Image Stack,Camera Calibration,QR Code Generator,Detection Offset,ByteTrack Tracker,Detection Event Log,Per-Class Confidence Filter,Icon Visualization,SIFT Comparison,Detections Transformation,Morphological Transformation,Color Visualization,Single-Label Classification Model,Perspective Correction,Corner Visualization,Mask Area Measurement,Google Vision OCR,Detections Merge,Halo Visualization,Image Blur,Dynamic Zone,Detections Combine,Morphological Transformation,Keypoint Detection Model,Camera Focus,Halo Visualization,Stability AI Inpainting,PTZ Tracking (ONVIF),Object Detection Model,Classification Label Visualization,Bounding Rectangle,SAM2 Video Tracker,Grid Visualization,Background Color Visualization,Mask Visualization,Byte Tracker,Ellipse Visualization,Reference Path Visualization,Image Slicer,Label Visualization,Velocity,Text Display,Byte Tracker,SIFT Comparison,Dot Visualization,Polygon Visualization,Identify Changes,Crop Visualization,Dynamic Crop,Absolute Static Crop,Circle Visualization,Image Preprocessing,Detections Stitch,Path Deviation,Relative Static Crop,Camera Focus,Template Matching,BoT-SORT Tracker,SAM3 Video Tracker,Gaze Detection,Segment Anything 2 Model,VLM As Detector,OCR Model,Heatmap Visualization,Motion Detection,Detections Filter,Blur Visualization,Object Detection Model,Depth Estimation,Instance Segmentation Model,Stability AI Outpainting,SAM 3 Interactive,YOLO-World Model,Background Subtraction,Keypoint Visualization,Detections Consensus,Byte Tracker,Bounding Box Visualization,SAM 3,Stitch Images,Image Convert Grayscale,Instance Segmentation Model,Detections List Roll-Up,Contrast Equalization,Mask Edge Snap,Moondream2,VLM As 
Detector,Line Counter,Roboflow Visual Search,Triangle Visualization,EasyOCR,Overlap Filter,SAM 3,Time in Zone,Detections Classes Replacement,Instance Segmentation Model,Pixelate Visualization,Keypoint Detection Model,Detections Stabilizer,SORT Tracker,Instance Segmentation Model,SIFT,Track Class Lock,Object Detection Model,Multi-Label Classification Model,Time in Zone,Cosine Similarity,Image Contours,Polygon Visualization,Keypoint Detection Model,Pixel Color Count,OC-SORT Tracker,SAM 3,Model Comparison Visualization,Single-Label Classification Model,Seg Preview
- outputs:
Line Counter,Time in Zone,Path Deviation,Trace Visualization,Distance Measurement,Detection Offset,ByteTrack Tracker,Detection Event Log,Per-Class Confidence Filter,Icon Visualization,Detections Transformation,Color Visualization,Perspective Correction,Corner Visualization,Mask Area Measurement,Roboflow Custom Metadata,Detections Merge,Halo Visualization,Dynamic Zone,Detections Combine,Roboflow Vision Events,Size Measurement,Halo Visualization,Stability AI Inpainting,PTZ Tracking (ONVIF),Bounding Rectangle,SAM2 Video Tracker,Event Writer,Mask Visualization,Byte Tracker,Background Color Visualization,Ellipse Visualization,Velocity,Label Visualization,Byte Tracker,Dot Visualization,Polygon Visualization,Crop Visualization,Dynamic Crop,Path Deviation,Circle Visualization,Detections Stitch,BoT-SORT Tracker,Model Monitoring Inference Aggregator,Camera Focus,Segment Anything 2 Model,Florence-2 Model,Heatmap Visualization,Detections Filter,Overlap Analysis,Blur Visualization,SAM 3 Interactive,Detections Consensus,Byte Tracker,Bounding Box Visualization,Florence-2 Model,Detections List Roll-Up,Mask Edge Snap,Line Counter,Triangle Visualization,Overlap Filter,Roboflow Dataset Upload,Time in Zone,Detections Classes Replacement,Pixelate Visualization,Roboflow Dataset Upload,Detections Stabilizer,SORT Tracker,Track Class Lock,Time in Zone,Polygon Visualization,OC-SORT Tracker,Model Comparison Visualization
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds
SAM2 Video Tracker in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - boxes (Union[keypoint_detection_prediction, object_detection_prediction, instance_segmentation_prediction]): Bounding boxes to use as SAM2 prompts. Only read on frames where the block re-prompts (see prompt_mode).
  - model_id (roboflow_model_id): Streaming video tracker model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality. sam3trackervideo is SAM3's visually prompted tracker: the same prompt contract with a larger backbone, markedly better at identity retention on long videos and crowded scenes, at higher compute cost.
  - prompt_interval (integer): For prompt_mode=every_n_frames: re-prompt every N frames.
  - threshold (float): Minimum confidence for emitted masks.
- output
  - predictions (instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in the form of an sv.Detections(...) object.
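The threshold input acts as a confidence floor on what the block emits. As a rough illustration only, the sketch below filters plain dictionaries rather than a real sv.Detections object; the helper name `filter_by_confidence`, the record layout, and the inclusive comparison are all assumptions.

```python
from typing import Dict, List


def filter_by_confidence(masks: List[Dict], threshold: float) -> List[Dict]:
    """Keep entries whose confidence is at or above the threshold.

    Whether the real block compares inclusively is an assumption.
    """
    return [m for m in masks if m["confidence"] >= threshold]


# Hypothetical per-track records standing in for emitted masks.
candidates = [
    {"track_id": 1, "confidence": 0.92},
    {"track_id": 2, "confidence": 0.41},
    {"track_id": 3, "confidence": 0.50},
]

kept = filter_by_confidence(candidates, threshold=0.5)
print([m["track_id"] for m in kept])
```

With threshold set to 0.0, as in the example definition, nothing is filtered out under this reading.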
Example JSON definition of step SAM2 Video Tracker in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/segment_anything_2_video@v1",
    "images": "$inputs.image",
    "boxes": "$steps.object_detection_model.predictions",
    "model_id": "sam2video/tiny",
    "prompt_mode": "<block_does_not_provide_example>",
    "prompt_interval": 30,
    "threshold": 0.0
}
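To show how the step might sit next to an upstream detector, the sketch below assembles a complete workflow definition as a Python dictionary and checks that every step selector points at a declared step. The surrounding envelope (version, inputs, steps, outputs), the detector block type, the detector model id, the step names, and the choice of every_n_frames for prompt_mode are assumptions made for illustration; only the tracker step's fields come from the example above.

```python
import json

# Assumed upstream detector step; the block type and model id are illustrative.
DETECTOR_STEP = {
    "type": "roboflow_core/roboflow_object_detection_model@v2",
    "name": "object_detection_model",
    "images": "$inputs.image",
    "model_id": "yolov8n-640",
}

# Tracker step based on the example definition, with prompt_mode filled in
# using one of the documented values (the choice is an assumption).
TRACKER_STEP = {
    "type": "roboflow_core/segment_anything_2_video@v1",
    "name": "sam2_video_tracker",  # hypothetical step name
    "images": "$inputs.image",
    "boxes": "$steps.object_detection_model.predictions",
    "model_id": "sam2video/tiny",
    "prompt_mode": "every_n_frames",
    "prompt_interval": 30,
    "threshold": 0.0,
}

# Assumed workflow envelope around the two steps.
WORKFLOW = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [DETECTOR_STEP, TRACKER_STEP],
    "outputs": [
        {
            "type": "JsonField",
            "name": "tracked_masks",
            "selector": "$steps.sam2_video_tracker.predictions",
        }
    ],
}


def step_selectors_resolve(workflow: dict) -> bool:
    """Check that each $steps.<name>.<field> selector names a declared step."""
    names = {step["name"] for step in workflow["steps"]}
    refs = [
        value
        for step in workflow["steps"]
        for value in step.values()
        if isinstance(value, str) and value.startswith("$steps.")
    ]
    refs += [
        out["selector"]
        for out in workflow["outputs"]
        if out["selector"].startswith("$steps.")
    ]
    return all(ref.split(".")[1] in names for ref in refs)


print(step_selectors_resolve(WORKFLOW))
print(json.dumps(WORKFLOW, indent=2))
```

A definition like this would typically be run through InferencePipeline with local step execution, so that consecutive frames of a stream reach the same process and the per-video state survives between frames.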