SAM2 Video Tracker¶
Class: SegmentAnything2VideoBlockV1
Run Segment Anything 2 on a live video stream frame by frame, keeping per-video temporal memory so object identities are preserved across frames.
Feed box detections from an upstream detector (e.g. a YOLO block) as
prompts. The block multiplexes a single SAM2 camera predictor across
many video streams by keying state on video_metadata.video_identifier;
depending on prompt_mode, it either re-seeds the prompts periodically
or simply propagates existing tracks.
Intended for use with InferencePipeline, which delivers one frame at
a time and tags each frame with video metadata.
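The per-video scheduling described above can be sketched in a few lines. This is an illustration only, not the block's implementation: `VideoSession`, `PromptScheduler` and `should_reprompt` are hypothetical names, and the assumption that re-seeding happens on frame 0 and on every `prompt_interval`-th frame after it is a guess at the phase, which the description does not state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VideoSession:
    """Hypothetical per-video bookkeeping (the real block also keeps SAM2 memory)."""
    frames_seen: int = 0


@dataclass
class PromptScheduler:
    """Decides, per video stream, whether boxes should be consumed as prompts."""
    prompt_mode: str = "every_n_frames"
    prompt_interval: int = 30
    sessions: Dict[str, VideoSession] = field(default_factory=dict)

    def should_reprompt(self, video_identifier: str) -> bool:
        if self.prompt_interval < 1:
            raise ValueError("prompt_interval must be >= 1")
        # State is keyed on the video identifier, so streams do not interfere.
        session = self.sessions.setdefault(video_identifier, VideoSession())
        index = session.frames_seen
        session.frames_seen += 1
        if self.prompt_mode == "first_frame":
            return index == 0
        if self.prompt_mode == "every_n_frames":
            # Assumed phase: frame 0, then every prompt_interval frames.
            return index % self.prompt_interval == 0
        if self.prompt_mode == "every_frame":
            return True
        raise ValueError(f"unknown prompt_mode: {self.prompt_mode!r}")


scheduler = PromptScheduler(prompt_mode="every_n_frames", prompt_interval=3)
decisions = [scheduler.should_reprompt("camera-a") for _ in range(5)]
print(decisions)
```

On frames where `should_reprompt` returns `False`, the boxes would be ignored and existing tracks propagated, matching the behaviour described for `prompt_mode`.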
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/segment_anything_2_video@v1 to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| model_id | str | Streaming SAM2 model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality. | ✅ |
| prompt_mode | str | When to consume boxes as SAM2 prompts. first_frame prompts once per session and then tracks; every_n_frames re-seeds every prompt_interval frames; every_frame re-seeds every frame. On frames where re-seeding does not happen, boxes is ignored and the block simply propagates. | ❌ |
| prompt_interval | int | For prompt_mode=every_n_frames: re-prompt every N frames. | ✅ |
| threshold | float | Minimum confidence for emitted masks. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
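As an illustration of the Refs column, the sketch below builds a step definition in which the properties marked ✅ take runtime selectors while the ones marked ❌ stay literal. The helper names (`is_selector`, `misplaced_selectors`) and the input names (`sam2_model`, `reprompt_every`, `mask_threshold`) are hypothetical; the check is a reading aid, not the validation the workflow engine performs.

```python
from typing import Any, Dict, List

# Properties marked ❌ in the Refs column: these must be literal values.
STATIC_ONLY = {"name", "prompt_mode"}


def is_selector(value: Any) -> bool:
    """True for strings that look like workflow selectors."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def misplaced_selectors(step: Dict[str, Any]) -> List[str]:
    """Return static-only properties that were given a selector."""
    return sorted(key for key in STATIC_ONLY if is_selector(step.get(key)))


step = {
    "name": "sam2_video",
    "type": "roboflow_core/segment_anything_2_video@v1",
    "images": "$inputs.image",
    "boxes": "$steps.object_detection_model.predictions",
    # Properties marked ✅ may be bound to runtime inputs (names are assumptions).
    "model_id": "$inputs.sam2_model",
    "prompt_interval": "$inputs.reprompt_every",
    "threshold": "$inputs.mask_threshold",
    # prompt_mode is marked ❌, so it is written as a literal.
    "prompt_mode": "every_n_frames",
}

print(misplaced_selectors(step))
```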
Available Connections¶
Compatible Blocks
Check what blocks you can connect to SAM2 Video Tracker in version v1.
- inputs:
Object Detection Model,Perspective Correction,SAM 3,BoT-SORT Tracker,Stability AI Inpainting,Image Convert Grayscale,Keypoint Detection Model,Morphological Transformation,Path Deviation,SAM 3,VLM As Detector,Line Counter,QR Code Generator,Object Detection Model,YOLO-World Model,Line Counter,Time in Zone,Polygon Zone Visualization,Image Threshold,OC-SORT Tracker,VLM As Detector,Dynamic Crop,Detections Consensus,Heatmap Visualization,Keypoint Visualization,Seg Preview,Stability AI Image Generation,Google Vision OCR,Camera Focus,Label Visualization,SAM 3,Instance Segmentation Model,Path Deviation,Contrast Enhancement,Bounding Box Visualization,Overlap Filter,Depth Estimation,Detection Offset,Multi-Label Classification Model,Keypoint Detection Model,Image Contours,EasyOCR,Relative Static Crop,Motion Detection,Multi-Label Classification Model,Polygon Visualization,Byte Tracker,Background Color Visualization,Template Matching,Mask Edge Snap,Instance Segmentation Model,Single-Label Classification Model,Image Blur,Polygon Visualization,Moondream2,Velocity,SIFT Comparison,Grid Visualization,Detection Event Log,Per-Class Confidence Filter,Triangle Visualization,Object Detection Model,Time in Zone,OCR Model,Single-Label Classification Model,SIFT Comparison,Detections Filter,Image Stack,Detections Merge,Pixelate Visualization,Stitch Images,Instance Segmentation Model,Detections Stabilizer,Image Slicer,Keypoint Detection Model,Image Preprocessing,SIFT,Line Counter Visualization,Image Slicer,Cosine Similarity,Detections Classes Replacement,Dynamic Zone,Semantic Segmentation Model,Corner Visualization,Stability AI Outpainting,Segment Anything 2 Model,Halo Visualization,Detections Transformation,Color Visualization,Time in Zone,Blur Visualization,Detections List Roll-Up,Semantic Segmentation Model,Classification Label Visualization,Camera Focus,Camera Calibration,Morphological Transformation,Trace Visualization,Detections Stitch,Distance Measurement,Gaze Detection,Reference Path Visualization,Halo Visualization,Byte Tracker,Ellipse Visualization,Model Comparison Visualization,Dot Visualization,PTZ Tracking (ONVIF),SORT Tracker,Identify Changes,Mask Visualization,Pixel Color Count,Crop Visualization,Background Subtraction,Circle Visualization,Text Display,Detections Combine,Bounding Rectangle,ByteTrack Tracker,Absolute Static Crop,SAM2 Video Tracker,Contrast Equalization,Byte Tracker,Icon Visualization,Mask Area Measurement
- outputs:
Perspective Correction,BoT-SORT Tracker,Stability AI Inpainting,Path Deviation,Line Counter,Model Monitoring Inference Aggregator,Line Counter,Time in Zone,OC-SORT Tracker,Dynamic Crop,Size Measurement,Detections Consensus,Heatmap Visualization,Label Visualization,Path Deviation,Bounding Box Visualization,Overlap Filter,Detection Offset,Polygon Visualization,Byte Tracker,Background Color Visualization,Mask Edge Snap,Polygon Visualization,Velocity,Per-Class Confidence Filter,Detection Event Log,Florence-2 Model,Triangle Visualization,Time in Zone,Roboflow Custom Metadata,Detections Filter,Detections Merge,Pixelate Visualization,Detections Stabilizer,Roboflow Dataset Upload,Detections Classes Replacement,Dynamic Zone,Segment Anything 2 Model,Corner Visualization,Halo Visualization,Roboflow Dataset Upload,Detections Transformation,Time in Zone,Color Visualization,Blur Visualization,Detections List Roll-Up,Camera Focus,Distance Measurement,Trace Visualization,Detections Stitch,Halo Visualization,Byte Tracker,Ellipse Visualization,Model Comparison Visualization,Dot Visualization,PTZ Tracking (ONVIF),SORT Tracker,Mask Visualization,Crop Visualization,Circle Visualization,Detections Combine,Bounding Rectangle,ByteTrack Tracker,SAM2 Video Tracker,Florence-2 Model,Roboflow Vision Events,Byte Tracker,Icon Visualization,Mask Area Measurement
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
SAM2 Video Tracker in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - boxes (Union[instance_segmentation_prediction, object_detection_prediction, keypoint_detection_prediction]): Bounding boxes to use as SAM2 prompts. Only read on frames where the block re-prompts (see prompt_mode).
    - model_id (roboflow_model_id): Streaming SAM2 model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality.
    - prompt_interval (integer): For prompt_mode=every_n_frames: re-prompt every N frames.
    - threshold (float): Minimum confidence for emitted masks.
- output
    - predictions (instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in the form of an sv.Detections(...) object.
Example JSON definition of step SAM2 Video Tracker in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/segment_anything_2_video@v1",
    "images": "$inputs.image",
    "boxes": "$steps.object_detection_model.predictions",
    "model_id": "sam2video/tiny",
    "prompt_mode": "<block_does_not_provide_example>",
    "prompt_interval": 30,
    "threshold": 0.0
}
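To show how the step might sit inside a complete workflow, here is a minimal sketch of a specification built as a Python dictionary, with an upstream detector feeding `boxes`. Several values are assumptions rather than facts from this page: the detector block type `roboflow_core/roboflow_object_detection_model@v2`, the detector model id `yolov8n-640`, the step name `sam2_video`, the output name `tracked_masks`, and the choice of `every_n_frames` for `prompt_mode` (picked from the three documented modes because the example above sets `prompt_interval`).

```python
import json

workflow = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
    ],
    "steps": [
        {
            # Assumed upstream detector block and model id.
            "type": "roboflow_core/roboflow_object_detection_model@v2",
            "name": "object_detection_model",
            "images": "$inputs.image",
            "model_id": "yolov8n-640",
        },
        {
            "type": "roboflow_core/segment_anything_2_video@v1",
            "name": "sam2_video",
            "images": "$inputs.image",
            # Detector boxes are used as SAM2 prompts on re-seeding frames.
            "boxes": "$steps.object_detection_model.predictions",
            "model_id": "sam2video/tiny",
            "prompt_mode": "every_n_frames",
            "prompt_interval": 30,
            "threshold": 0.0,
        },
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "tracked_masks",
            "selector": "$steps.sam2_video.predictions",
        },
    ],
}

# Serialise to the JSON form used in workflow definitions.
print(json.dumps(workflow, indent=2))
```

Because the block relies on video metadata attached to each frame, a specification like this is meant to be run through InferencePipeline rather than against isolated images.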