SAM2 Video Tracker¶
Class: SegmentAnything2VideoBlockV1
Run Segment Anything 2 on a live video stream frame by frame, keeping per-video temporal memory so object identities are preserved across frames.
Feed box detections from an upstream detector (e.g. a YOLO block) as
prompts. The block multiplexes a single SAM2 camera predictor across
many video streams by keying state on video_metadata.video_identifier;
depending on prompt_mode, it either re-seeds the prompts periodically
or simply propagates existing tracks.
Intended for use with InferencePipeline, which delivers one frame at
a time and tags each frame with video metadata.
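A minimal sketch of wiring this block behind an object detector inside an InferencePipeline-driven workflow. The specification below follows the JSON layout shown later on this page; the detector model id, input names, and video source are placeholder assumptions, not values mandated by the block.

```python
# Sketch: chain an object detector into the SAM2 video tracker inside a
# workflow specification. Detector model id and video source are assumptions.
workflow_specification = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "type": "ObjectDetectionModel",
            "name": "detector",
            "image": "$inputs.image",
            "model_id": "yolov8n-640",  # placeholder detector
        },
        {
            "type": "roboflow_core/segment_anything_2_video@v1",
            "name": "tracker",
            "images": "$inputs.image",
            "boxes": "$steps.detector.predictions",  # upstream boxes as prompts
            "model_id": "sam2video/small",
            "prompt_mode": "every_n_frames",
            "prompt_interval": 30,
        },
    ],
    "outputs": [
        {"type": "JsonField", "name": "masks", "selector": "$steps.tracker.predictions"}
    ],
}

# Hypothetical pipeline start (requires the inference package and a video source);
# InferencePipeline delivers frames one at a time and tags them with video metadata:
# from inference import InferencePipeline
# pipeline = InferencePipeline.init_with_workflow(
#     video_reference=0,  # e.g. local webcam
#     workflow_specification=workflow_specification,
#     on_prediction=lambda result, frame: print(result["masks"]),
# )
# pipeline.start()
# pipeline.join()
```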
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/segment_anything_2_video@v1 to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| model_id | str | Streaming SAM2 model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality. | ✅ |
| prompt_mode | str | When to consume boxes as SAM2 prompts. first_frame prompts once per session and then tracks; every_n_frames re-seeds every prompt_interval frames; every_frame re-seeds every frame. On frames where re-seeding does not happen, boxes is ignored and the block simply propagates. | ❌ |
| prompt_interval | int | For prompt_mode=every_n_frames: re-prompt every N frames. | ✅ |
| threshold | float | Minimum confidence for emitted masks. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
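For the properties marked ✅ above, a selector can replace the literal value so the setting is supplied at runtime. A hedged sketch, assuming illustrative workflow input names (sam_model, reseed_every, sam_threshold) that are not part of the block itself:

```python
# Sketch: parametrise the bindable (✅) properties with dynamic selectors
# instead of literals. Input names below are illustrative assumptions.
step = {
    "type": "roboflow_core/segment_anything_2_video@v1",
    "name": "tracker",
    "images": "$inputs.image",
    "boxes": "$steps.detector.predictions",
    "model_id": "$inputs.sam_model",            # ✅ bindable
    "prompt_interval": "$inputs.reseed_every",  # ✅ bindable
    "threshold": "$inputs.sam_threshold",       # ✅ bindable
    "prompt_mode": "every_n_frames",            # ❌ must stay a literal
}

workflow_inputs = [
    {"type": "WorkflowImage", "name": "image"},
    {"type": "WorkflowParameter", "name": "sam_model", "default_value": "sam2video/small"},
    {"type": "WorkflowParameter", "name": "reseed_every", "default_value": 30},
    {"type": "WorkflowParameter", "name": "sam_threshold", "default_value": 0.5},
]
```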
Available Connections¶
Compatible Blocks
Check what blocks you can connect to SAM2 Video Tracker in version v1.
- inputs:
Icon Visualization,Moondream2,Label Visualization,Instance Segmentation Model,Multi-Label Classification Model,Object Detection Model,Dot Visualization,Camera Calibration,Polygon Zone Visualization,SIFT,Trace Visualization,Detections Stitch,SAM 3,Morphological Transformation,Perspective Correction,Time in Zone,Dynamic Zone,Instance Segmentation Model,Detections Combine,Semantic Segmentation Model,Relative Static Crop,Image Slicer,Image Threshold,Keypoint Visualization,Overlap Filter,PTZ Tracking (ONVIF),Template Matching,Absolute Static Crop,Single-Label Classification Model,Path Deviation,Blur Visualization,Circle Visualization,Keypoint Detection Model,Camera Focus,Crop Visualization,EasyOCR,Classification Label Visualization,Detections Merge,SAM 3,Gaze Detection,Polygon Visualization,Color Visualization,Line Counter,Byte Tracker,YOLO-World Model,Image Contours,Mask Area Measurement,Keypoint Detection Model,Path Deviation,Image Convert Grayscale,Single-Label Classification Model,Detections Filter,Time in Zone,Distance Measurement,VLM As Detector,Instance Segmentation Model,Cosine Similarity,Time in Zone,Triangle Visualization,Stability AI Inpainting,Pixel Color Count,Background Color Visualization,Identify Changes,Depth Estimation,Line Counter,Model Comparison Visualization,Image Blur,Stitch Images,SAM2 Video Tracker,Byte Tracker,Motion Detection,Contrast Equalization,Corner Visualization,Detections Classes Replacement,Velocity,Detections List Roll-Up,Halo Visualization,Stability AI Image Generation,Detections Consensus,Reference Path Visualization,Dynamic Crop,Line Counter Visualization,Detection Offset,SAM 3,SIFT Comparison,ByteTrack Tracker,Multi-Label Classification Model,Keypoint Detection Model,Detections Stabilizer,Heatmap Visualization,Text Display,Byte Tracker,VLM As Detector,Segment Anything 2 Model,Grid Visualization,Polygon Visualization,Camera Focus,Semantic Segmentation Model,Seg Preview,Detections Transformation,Image Slicer,Image Preprocessing,SIFT 
Comparison,Bounding Rectangle,SORT Tracker,Bounding Box Visualization,Stability AI Outpainting,Halo Visualization,OC-SORT Tracker,Detection Event Log,Background Subtraction,OCR Model,Object Detection Model,Object Detection Model,QR Code Generator,Pixelate Visualization,Ellipse Visualization,Google Vision OCR,Mask Visualization
- outputs:
Icon Visualization,Roboflow Dataset Upload,Label Visualization,Dot Visualization,Detections Stitch,Trace Visualization,Time in Zone,Roboflow Custom Metadata,Perspective Correction,Dynamic Zone,Detections Combine,Overlap Filter,PTZ Tracking (ONVIF),Path Deviation,Blur Visualization,Circle Visualization,Crop Visualization,Detections Merge,Polygon Visualization,Color Visualization,Line Counter,Byte Tracker,Mask Area Measurement,Model Monitoring Inference Aggregator,Roboflow Vision Events,Path Deviation,Detections Filter,Time in Zone,Distance Measurement,Time in Zone,Triangle Visualization,Stability AI Inpainting,Background Color Visualization,Line Counter,Model Comparison Visualization,Byte Tracker,SAM2 Video Tracker,Corner Visualization,Detections Classes Replacement,Velocity,Detections List Roll-Up,Halo Visualization,Florence-2 Model,Detections Consensus,Size Measurement,Detection Offset,Dynamic Crop,ByteTrack Tracker,Detections Stabilizer,Roboflow Dataset Upload,Heatmap Visualization,Byte Tracker,Segment Anything 2 Model,Polygon Visualization,Mask Visualization,Camera Focus,Detections Transformation,Bounding Rectangle,SORT Tracker,Bounding Box Visualization,Halo Visualization,OC-SORT Tracker,Detection Event Log,Pixelate Visualization,Ellipse Visualization,Florence-2 Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
SAM2 Video Tracker has in version v1.
Bindings
- input
  - images (image): The image to infer on.
  - boxes (Union[keypoint_detection_prediction, object_detection_prediction, instance_segmentation_prediction]): Bounding boxes to use as SAM2 prompts. Only read on frames where the block re-prompts (see prompt_mode).
  - model_id (roboflow_model_id): Streaming SAM2 model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality.
  - prompt_interval (integer): For prompt_mode=every_n_frames: re-prompt every N frames.
  - threshold (float): Minimum confidence for emitted masks.
- output
  - predictions (instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in the form of an sv.Detections(...) object.
Example JSON definition of step SAM2 Video Tracker in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/segment_anything_2_video@v1",
"images": "$inputs.image",
"boxes": "$steps.object_detection_model.predictions",
"model_id": "sam2video/tiny",
"prompt_mode": "every_n_frames",
"prompt_interval": 30,
"threshold": 0.0
}
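Downstream, the predictions output behaves like any instance segmentation result: a detections object whose tracked masks can be filtered by confidence. A minimal, hypothetical helper for an on_prediction callback; it assumes only that the object exposes a supervision-style confidence sequence:

```python
# Sketch: filter the tracker's emitted masks by confidence inside an
# on_prediction callback. `detections` is assumed to expose a
# supervision-style `confidence` sequence (as sv.Detections does).
def count_confident_masks(detections, threshold=0.5):
    """Count tracked masks at or above the confidence threshold."""
    if detections is None or detections.confidence is None:
        return 0
    return sum(1 for score in detections.confidence if score >= threshold)
```

Because the block already applies its own threshold property before emitting masks, a callback-side filter like this is only needed when downstream logic wants a stricter cut than the one configured in the step.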