SAM2 Video Tracker
Class: SegmentAnything2VideoBlockV1
Run Segment Anything 2 on a live video stream frame by frame, keeping per-video temporal memory so object identities are preserved across frames.
Feed box detections from an upstream detector (e.g. a YOLO block) as
prompts. The block multiplexes a single SAM2 camera predictor across
many video streams by keying state on video_metadata.video_identifier;
depending on prompt_mode, it either re-seeds the prompts periodically
or simply propagates existing tracks.
Intended for use with InferencePipeline, which delivers one frame at
a time and tags each frame with video metadata.
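The re-seeding schedule described above can be sketched in a few lines. This is only an illustration of the documented behaviour, not the block's actual implementation: `StreamState`, `_states`, and `should_reprompt` are hypothetical names, and it assumes that `every_n_frames` also prompts on the very first frame of a stream.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StreamState:
    """Hypothetical per-video bookkeeping (the real block also keeps SAM2 memory)."""
    frames_seen: int = 0


# One entry per video_identifier, so several streams can share one predictor.
_states: Dict[str, StreamState] = {}


def should_reprompt(video_identifier: str, prompt_mode: str, prompt_interval: int = 30) -> bool:
    """Return True when boxes would be consumed as prompts on this frame."""
    state = _states.setdefault(video_identifier, StreamState())
    index = state.frames_seen
    state.frames_seen += 1
    if prompt_mode == "first_frame":
        return index == 0
    if prompt_mode == "every_n_frames":
        if prompt_interval < 1:
            raise ValueError("prompt_interval must be >= 1")
        # Assumption: frame 0 counts as a prompting frame.
        return index % prompt_interval == 0
    if prompt_mode == "every_frame":
        return True
    raise ValueError(f"unknown prompt_mode: {prompt_mode}")


# Seven frames from one stream, re-seeding every 3 frames.
decisions = [should_reprompt("cam-a", "every_n_frames", 3) for _ in range(7)]
print(decisions)  # [True, False, False, True, False, False, True]
```

Because the state is keyed on the video identifier, a second stream starts its own schedule from frame 0 regardless of how far the first stream has advanced.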
Type identifier
Use the following identifier in the step "type" field: roboflow_core/segment_anything_2_video@v1 to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| model_id | str | Streaming SAM2 model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality. | ✅ |
| prompt_mode | str | When to consume boxes as SAM2 prompts. first_frame prompts once per session and then tracks; every_n_frames re-seeds every prompt_interval frames; every_frame re-seeds every frame. On frames where re-seeding does not happen, boxes is ignored and the block simply propagates. | ❌ |
| prompt_interval | int | For prompt_mode=every_n_frames: re-prompt every N frames. | ✅ |
| threshold | float | Minimum confidence for emitted masks. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
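As a rough illustration of what the Refs column implies, the sketch below checks a step definition for selectors placed on properties that only accept literal values. The helper names and the `$inputs.mask_threshold` input are hypothetical; the selector prefixes `$inputs.` and `$steps.` follow the example definition at the end of this page.

```python
from typing import Any, Dict, List

# Properties marked with a cross in the Refs column: literal values only.
LITERAL_ONLY = {"name", "prompt_mode"}


def is_selector(value: Any) -> bool:
    """True for strings that look like workflow selectors."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def misplaced_selectors(step: Dict[str, Any]) -> List[str]:
    """List literal-only properties that were given a selector."""
    return sorted(key for key in LITERAL_ONLY if is_selector(step.get(key)))


step = {
    "name": "sam2_tracker",
    "type": "roboflow_core/segment_anything_2_video@v1",
    "prompt_mode": "first_frame",
    # threshold is bindable, so a selector is acceptable here.
    "threshold": "$inputs.mask_threshold",
}

print(misplaced_selectors(step))  # []
print(misplaced_selectors({**step, "prompt_mode": "$inputs.mode"}))  # ['prompt_mode']
```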
Runtime compatibility
- soft (runtime: hosted_serverless, dedicated_deployment; execution: remote; input: video): Block keeps per-video state in process memory (keyed by video_metadata.video_identifier). With remote step execution on stateless or multi-replica HTTP runtimes, successive requests may be served by different worker processes, so the state resets between calls and the output is meaningless for tracking / counting / aggregation. Use local step execution in an InferencePipeline for stable cross-frame results.
- hard (runtime: self_hosted_cpu; execution: local): Requires a GPU; the streaming SAM2 video model needs CUDA.
- soft (input: image): Block depends on temporal context from video or repeated-frame workflows. With a still image/photo, there is no meaningful history to track, compare, aggregate, or visualize, so the block provides little or no benefit.
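The first caveat can be made concrete with a toy simulation. It is not Roboflow code: `Worker` stands in for one server process holding in-memory state, and round-robin routing is an assumed load-balancing policy chosen only for illustration.

```python
from itertools import cycle
from typing import Dict, List


class Worker:
    """Stand-in for one server process with per-video state in memory."""

    def __init__(self) -> None:
        self.frames_seen: Dict[str, int] = {}

    def handle(self, video_identifier: str) -> int:
        """Return how many earlier frames of this video the worker remembers."""
        seen = self.frames_seen.get(video_identifier, 0)
        self.frames_seen[video_identifier] = seen + 1
        return seen


def history_per_frame(num_workers: int, num_frames: int) -> List[int]:
    """Route consecutive frames of one video across workers and record remembered history."""
    workers = [Worker() for _ in range(num_workers)]
    router = cycle(workers)  # assumed round-robin load balancing
    return [next(router).handle("camera-1") for _ in range(num_frames)]


print(history_per_frame(1, 6))  # [0, 1, 2, 3, 4, 5]
print(history_per_frame(3, 6))  # [0, 0, 0, 1, 1, 1]
```

With a single local process every frame builds on all previous ones; with three replicas each worker sees only a fraction of the stream, which is why tracks cannot be kept consistent.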
Available Connections
Compatible Blocks
Check what blocks you can connect to SAM2 Video Tracker in version v1.
- inputs:
Stability AI Outpainting,Camera Focus,Halo Visualization,QR Code Generator,Ellipse Visualization,ByteTrack Tracker,Camera Focus,Classification Label Visualization,Bounding Box Visualization,Image Contours,Image Preprocessing,Background Subtraction,SAM 3,Object Detection Model,Multi-Label Classification Model,Pixelate Visualization,Color Visualization,Crop Visualization,Mask Visualization,YOLO-World Model,Path Deviation,Segment Anything 2 Model,Image Slicer,Template Matching,Depth Estimation,Seg Preview,Text Display,SAM2 Video Tracker,Byte Tracker,Relative Static Crop,Google Vision OCR,Object Detection Model,Icon Visualization,Time in Zone,Distance Measurement,Motion Detection,Blur Visualization,SIFT Comparison,Single-Label Classification Model,Grid Visualization,Cosine Similarity,Object Detection Model,Detections List Roll-Up,Semantic Segmentation Model,Instance Segmentation Model,EasyOCR,VLM As Detector,Stability AI Inpainting,SAM 3,Contrast Enhancement,Overlap Filter,Image Threshold,Image Convert Grayscale,Trace Visualization,Circle Visualization,Instance Segmentation Model,Label Visualization,Pixel Color Count,Morphological Transformation,Morphological Transformation,Keypoint Detection Model,Polygon Zone Visualization,Image Blur,Keypoint Visualization,Identify Changes,OC-SORT Tracker,Keypoint Detection Model,Dynamic Crop,Camera Calibration,Polygon Visualization,VLM As Detector,PTZ Tracking (ONVIF),Detections Transformation,SIFT Comparison,Stability AI Image Generation,Single-Label Classification Model,Detections Stitch,Gaze Detection,Semantic Segmentation Model,Byte Tracker,Perspective Correction,Absolute Static Crop,Mask Area Measurement,Multi-Label Classification Model,Stitch Images,Detections Classes Replacement,Time in Zone,SAM 3,Line Counter,Detection Offset,Mask Edge Snap,Detections Merge,Time in Zone,Byte Tracker,Contrast Equalization,SORT Tracker,Image Stack,Triangle Visualization,Background Color Visualization,Detections Consensus,Keypoint Detection 
Model,Corner Visualization,Model Comparison Visualization,Detections Combine,Dot Visualization,Line Counter Visualization,Dynamic Zone,BoT-SORT Tracker,Moondream2,Reference Path Visualization,Polygon Visualization,Halo Visualization,Line Counter,OCR Model,Detections Stabilizer,Path Deviation,Detection Event Log,SIFT,Instance Segmentation Model,Detections Filter,Heatmap Visualization,Bounding Rectangle,Instance Segmentation Model,Per-Class Confidence Filter,Velocity,Image Slicer
- outputs:
Event Writer,Halo Visualization,Ellipse Visualization,ByteTrack Tracker,Camera Focus,Bounding Box Visualization,Pixelate Visualization,Color Visualization,Mask Visualization,Segment Anything 2 Model,Path Deviation,Crop Visualization,Roboflow Vision Events,SAM2 Video Tracker,Byte Tracker,Time in Zone,Distance Measurement,Icon Visualization,Blur Visualization,Detections List Roll-Up,Stability AI Inpainting,Roboflow Dataset Upload,Model Monitoring Inference Aggregator,Overlap Filter,Trace Visualization,Circle Visualization,Label Visualization,OC-SORT Tracker,Dynamic Crop,Polygon Visualization,Florence-2 Model,PTZ Tracking (ONVIF),Detections Transformation,Overlap Analysis,Detections Stitch,Byte Tracker,Perspective Correction,Mask Area Measurement,Roboflow Dataset Upload,Florence-2 Model,Detections Classes Replacement,Time in Zone,Line Counter,Detection Offset,Mask Edge Snap,Time in Zone,Detections Merge,Byte Tracker,SORT Tracker,Triangle Visualization,Detections Consensus,Background Color Visualization,Corner Visualization,Model Comparison Visualization,Detections Combine,Dot Visualization,Dynamic Zone,BoT-SORT Tracker,Polygon Visualization,Halo Visualization,Size Measurement,Roboflow Custom Metadata,Line Counter,Detections Stabilizer,Path Deviation,Detection Event Log,Detections Filter,Heatmap Visualization,Bounding Rectangle,Per-Class Confidence Filter,Velocity
Input and Output Bindings
The available connections depend on the block's binding kinds. The binding kinds of
SAM2 Video Tracker in version v1 are listed below.
Bindings
- input
  - images (image): The image to infer on.
  - boxes (Union[keypoint_detection_prediction, object_detection_prediction, instance_segmentation_prediction]): Bounding boxes to use as SAM2 prompts. Only read on frames where the block re-prompts (see prompt_mode).
  - model_id (roboflow_model_id): Streaming SAM2 model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality.
  - prompt_interval (integer): For prompt_mode=every_n_frames: re-prompt every N frames.
  - threshold (float): Minimum confidence for emitted masks.
- output
  - predictions (instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in the form of an sv.Detections(...) object.
Example JSON definition of step SAM2 Video Tracker in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/segment_anything_2_video@v1",
    "images": "$inputs.image",
    "boxes": "$steps.object_detection_model.predictions",
    "model_id": "sam2video/tiny",
    "prompt_mode": "<block_does_not_provide_example>",
    "prompt_interval": 30,
    "threshold": 0.0
}
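To show how the step definition above might sit inside a complete workflow, here is a minimal sketch built as a Python dictionary. Several details are assumptions rather than facts from this page: the detector step's type and model_id, the step and output names, the surrounding "version" / "inputs" / "steps" / "outputs" envelope, and the choice of every_n_frames for prompt_mode (picked only because the example sets prompt_interval).

```python
import json

# Hypothetical upstream detector; its type and model_id are assumptions.
detector_step = {
    "name": "object_detection_model",
    "type": "roboflow_core/roboflow_object_detection_model@v2",
    "images": "$inputs.image",
    "model_id": "yolov8n-640",
}

# Tracker step, mirroring the example definition above.
tracker_step = {
    "name": "sam2_tracker",  # hypothetical step name
    "type": "roboflow_core/segment_anything_2_video@v1",
    "images": "$inputs.image",
    "boxes": "$steps.object_detection_model.predictions",
    "model_id": "sam2video/tiny",
    "prompt_mode": "every_n_frames",  # illustrative choice, see lead-in
    "prompt_interval": 30,
    "threshold": 0.0,
}

# Assumed workflow envelope; check the Workflows documentation for the exact schema.
workflow = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [detector_step, tracker_step],
    "outputs": [
        {
            "type": "JsonField",
            "name": "tracked_masks",  # hypothetical output name
            "selector": "$steps.sam2_tracker.predictions",
        }
    ],
}

print(json.dumps(workflow, indent=2))
```

A specification like this would then be run frame by frame with InferencePipeline using local step execution, as recommended in the runtime compatibility notes.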