SAM2 Video Tracker
Class: SegmentAnything2VideoBlockV1
Run Segment Anything 2 on a live video stream frame by frame, keeping per-video temporal memory so object identities are preserved across frames.
Feed box detections from an upstream detector (e.g. a YOLO block) as
prompts. The block multiplexes a single SAM2 camera predictor across
many video streams by keying state on video_metadata.video_identifier;
depending on prompt_mode, it either re-seeds the prompts periodically
or simply propagates existing tracks.
Intended for use with InferencePipeline, which delivers one frame at
a time and tags each frame with video metadata.
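The re-seeding schedule and the per-video state keying described above can be sketched in plain Python. This is a minimal illustration and not the block's actual implementation: the `should_reprompt` helper, the `frame_counters` dictionary, and the choice to fire the interval on frame indices 0, N, 2N, ... are assumptions made for the example. Only the three `prompt_mode` values and the idea of keying state on the video identifier come from this page.

```python
from collections import defaultdict

# Hypothetical per-video frame counters, keyed by video identifier.
frame_counters = defaultdict(int)


def should_reprompt(video_id, prompt_mode, prompt_interval=30):
    """Return True when boxes would be consumed as prompts on this frame (assumed logic)."""
    index = frame_counters[video_id]  # 0 for the first frame of a session
    frame_counters[video_id] += 1
    if prompt_mode == "first_frame":
        return index == 0
    if prompt_mode == "every_n_frames":
        return index % prompt_interval == 0
    if prompt_mode == "every_frame":
        return True
    raise ValueError(f"unknown prompt_mode: {prompt_mode!r}")


# Two interleaved streams keep independent schedules.
decisions = [
    (vid, should_reprompt(vid, "every_n_frames", prompt_interval=3))
    for vid in ["cam_a", "cam_b", "cam_a", "cam_a", "cam_a", "cam_b"]
]
print(decisions)
```

Because each stream has its own counter, the first frame of `cam_b` is still treated as a prompt frame even though `cam_a` has already advanced.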
Type identifier
Use the following identifier in the step "type" field: roboflow_core/segment_anything_2_video@v1 to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| model_id | str | Streaming SAM2 model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality. | ✅ |
| prompt_mode | str | When to consume boxes as SAM2 prompts. first_frame prompts once per session and then tracks; every_n_frames re-seeds every prompt_interval frames; every_frame re-seeds every frame. On frames where re-seeding does not happen, boxes is ignored and the block simply propagates. | ❌ |
| prompt_interval | int | For prompt_mode=every_n_frames: re-prompt every N frames. | ✅ |
| threshold | float | Minimum confidence for emitted masks. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
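As a rough illustration of what the Refs column implies, the sketch below checks a step definition for selectors on properties marked ❌. The `is_selector` and `check_step` helpers, the step name, and the `$inputs.prompt_interval` / `$inputs.threshold` workflow inputs are hypothetical. The only parts taken from this page are the property names, their Refs flags, and the `$inputs.` / `$steps.` selector prefixes seen in the example definition.

```python
# Properties marked ❌ in the Refs column: they take literal values only.
STATIC_ONLY = {"name", "prompt_mode"}


def is_selector(value):
    """Treat strings starting with '$inputs.' or '$steps.' as selectors (assumed convention)."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def check_step(step):
    """Return the names of static-only properties that were given a selector."""
    return sorted(k for k, v in step.items() if k in STATIC_ONLY and is_selector(v))


good_step = {
    "name": "sam2_tracker",
    "type": "roboflow_core/segment_anything_2_video@v1",
    "images": "$inputs.image",
    "boxes": "$steps.object_detection_model.predictions",
    "prompt_mode": "every_n_frames",
    "prompt_interval": "$inputs.prompt_interval",  # hypothetical workflow input
    "threshold": "$inputs.threshold",              # hypothetical workflow input
}
bad_step = dict(good_step, prompt_mode="$inputs.mode")  # selector on a ❌ property

print(check_step(good_step))
print(check_step(bad_step))
```

In practice the workflow compiler performs this kind of validation itself; the sketch only makes the rule behind the table explicit.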
Runtime compatibility
- soft: runtime hosted_serverless, dedicated_deployment; execution remote; input video. Block keeps per-video state in process memory (keyed by video_metadata.video_identifier). With remote step execution on stateless or multi-replica HTTP runtimes, successive requests may be served by different worker processes, so the state resets between calls and the output is meaningless for tracking / counting / aggregation. Use local step execution in an InferencePipeline for stable cross-frame results.
- hard: runtime self_hosted_cpu; execution local. Requires a GPU; the streaming SAM2 video model needs CUDA.
- soft: input image. Block depends on temporal context from video or repeated-frame workflows. With a still image/photo, there is no meaningful history to track, compare, aggregate, or visualize, so the block provides little or no benefit.
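The first soft incompatibility can be made concrete with a toy simulation. The `Worker` class and the round-robin dispatch below are assumptions chosen for illustration and do not model any specific Roboflow runtime. The point is only that state held in process memory is invisible to other processes.

```python
from itertools import cycle


class Worker:
    """Stand-in for one server process holding its own in-memory state."""

    def __init__(self):
        self.frames_seen = {}  # video_identifier -> frames processed by THIS worker

    def process(self, video_id):
        self.frames_seen[video_id] = self.frames_seen.get(video_id, 0) + 1
        return self.frames_seen[video_id]


def run(num_workers, num_frames):
    """Send num_frames frames of one stream to the workers in turn."""
    workers = [Worker() for _ in range(num_workers)]
    # Round-robin dispatch is an assumption; real load balancers vary.
    return [w.process("cam_a") for w, _ in zip(cycle(workers), range(num_frames))]


single = run(1, 6)    # one local process: history grows frame by frame
replicas = run(3, 6)  # three replicas: each sees only a fragment of the stream
print(single)
print(replicas)
```

With a single process the per-video history reaches six frames; with three replicas no worker ever sees more than two, which is why tracking output from such a deployment cannot be trusted.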
Available Connections
Compatible Blocks
Check what blocks you can connect to SAM2 Video Tracker in version v1.
- inputs:
Detections Classes Replacement,Morphological Transformation,OC-SORT Tracker,Moondream2,Image Preprocessing,Overlap Filter,Seg Preview,Halo Visualization,Morphological Transformation,Detections Transformation,EasyOCR,YOLO-World Model,Multi-Label Classification Model,Segment Anything 2 Model,Object Detection Model,Time in Zone,Text Display,BoT-SORT Tracker,Image Threshold,Template Matching,Single-Label Classification Model,Pixel Color Count,Icon Visualization,Triangle Visualization,Mask Area Measurement,Pixelate Visualization,Path Deviation,Keypoint Detection Model,Time in Zone,Crop Visualization,SAM 3,Cosine Similarity,Dot Visualization,Detections Merge,Google Vision OCR,Detections List Roll-Up,Instance Segmentation Model,Mask Edge Snap,Distance Measurement,Polygon Zone Visualization,Instance Segmentation Model,Polygon Visualization,SIFT Comparison,Background Color Visualization,Absolute Static Crop,QR Code Generator,SIFT Comparison,Contrast Enhancement,Byte Tracker,Per-Class Confidence Filter,Grid Visualization,Semantic Segmentation Model,Corner Visualization,Reference Path Visualization,Image Slicer,Single-Label Classification Model,Line Counter,Dynamic Zone,Detections Filter,Halo Visualization,Dynamic Crop,OCR Model,Byte Tracker,Color Visualization,Stability AI Outpainting,Instance Segmentation Model,Detection Event Log,Gaze Detection,Bounding Rectangle,VLM As Detector,Detections Stabilizer,Relative Static Crop,Image Blur,Line Counter Visualization,Stability AI Inpainting,Blur Visualization,Object Detection Model,SORT Tracker,Path Deviation,SAM 3,Perspective Correction,Keypoint Visualization,ByteTrack Tracker,Byte Tracker,Detection Offset,Velocity,Motion Detection,Image Slicer,Camera Calibration,Model Comparison Visualization,Depth Estimation,Trace Visualization,Ellipse Visualization,Detections Combine,PTZ Tracking (ONVIF),SAM 3,Detections Consensus,Detections Stitch,Line Counter,Object Detection Model,Identify Changes,Circle Visualization,Time in Zone,Image 
Stack,Instance Segmentation Model,Contrast Equalization,Camera Focus,Heatmap Visualization,Background Subtraction,Image Contours,SAM2 Video Tracker,VLM As Detector,Classification Label Visualization,Bounding Box Visualization,Label Visualization,Keypoint Detection Model,Camera Focus,Keypoint Detection Model,Stitch Images,Mask Visualization,Multi-Label Classification Model,SIFT,Stability AI Image Generation,Semantic Segmentation Model,Polygon Visualization,Image Convert Grayscale
- outputs:
Detections Classes Replacement,OC-SORT Tracker,Florence-2 Model,Overlap Filter,Halo Visualization,Detections Transformation,Segment Anything 2 Model,Time in Zone,BoT-SORT Tracker,Triangle Visualization,Icon Visualization,Mask Area Measurement,Model Monitoring Inference Aggregator,Pixelate Visualization,Path Deviation,Time in Zone,Crop Visualization,Dot Visualization,Detections Merge,Detections List Roll-Up,Florence-2 Model,Roboflow Dataset Upload,Mask Edge Snap,Distance Measurement,Roboflow Vision Events,Polygon Visualization,Background Color Visualization,Per-Class Confidence Filter,Byte Tracker,Corner Visualization,Line Counter,Dynamic Zone,Detections Filter,Halo Visualization,Dynamic Crop,Byte Tracker,Color Visualization,Detection Event Log,Roboflow Dataset Upload,Roboflow Custom Metadata,Bounding Rectangle,Detections Stabilizer,Stability AI Inpainting,Blur Visualization,SORT Tracker,Path Deviation,Perspective Correction,ByteTrack Tracker,Byte Tracker,Detection Offset,Velocity,Detections Combine,PTZ Tracking (ONVIF),Model Comparison Visualization,Trace Visualization,Detections Consensus,Ellipse Visualization,Detections Stitch,Line Counter,Circle Visualization,Overlap Analysis,Time in Zone,Event Writer,Heatmap Visualization,Camera Focus,SAM2 Video Tracker,Bounding Box Visualization,Label Visualization,Size Measurement,Mask Visualization,Polygon Visualization
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds
SAM2 Video Tracker in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - boxes (Union[instance_segmentation_prediction, keypoint_detection_prediction, object_detection_prediction]): Bounding boxes to use as SAM2 prompts. Only read on frames where the block re-prompts (see prompt_mode).
  - model_id (roboflow_model_id): Streaming SAM2 model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality.
  - prompt_interval (integer): For prompt_mode=every_n_frames: re-prompt every N frames.
  - threshold (float): Minimum confidence for emitted masks.
- output
  - predictions (instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in the form of an sv.Detections(...) object.
Example JSON definition of step SAM2 Video Tracker in version v1
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/segment_anything_2_video@v1",
  "images": "$inputs.image",
  "boxes": "$steps.object_detection_model.predictions",
  "model_id": "sam2video/tiny",
  "prompt_mode": "every_n_frames",
  "prompt_interval": 30,
  "threshold": 0.0
}
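The step definition above references `$steps.object_detection_model.predictions`, so it only resolves inside a workflow that also contains a step with that name. The sketch below assembles such a workflow as a Python dictionary. The detector step's type identifier and `model_id`, the `sam2_tracker` step name, the `tracked_masks` output name, and the `every_n_frames` choice are assumptions for illustration, and the surrounding `version` / `inputs` / `outputs` layout follows the usual workflow specification shape rather than anything stated on this page.

```python
import json

# Assumed upstream detector step; type identifier and model id are illustrative.
detector_step = {
    "name": "object_detection_model",
    "type": "roboflow_core/roboflow_object_detection_model@v2",
    "images": "$inputs.image",
    "model_id": "yolov8n-640",
}

# Tracker step, prompted by the detector's boxes.
tracker_step = {
    "name": "sam2_tracker",  # hypothetical step name
    "type": "roboflow_core/segment_anything_2_video@v1",
    "images": "$inputs.image",
    "boxes": f"$steps.{detector_step['name']}.predictions",
    "model_id": "sam2video/tiny",
    "prompt_mode": "every_n_frames",
    "prompt_interval": 30,
    "threshold": 0.0,
}

workflow = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [detector_step, tracker_step],
    "outputs": [
        {
            "type": "JsonField",
            "name": "tracked_masks",  # hypothetical output name
            "selector": f"$steps.{tracker_step['name']}.predictions",
        }
    ],
}

print(json.dumps(workflow, indent=2))
```

A specification like this would then be passed to InferencePipeline for local execution on a GPU machine; the exact call is omitted here because it depends on the installed inference version.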