SAM2 Video Tracker

Class: SegmentAnything2VideoBlockV1

Source: inference.core.workflows.core_steps.models.foundation.segment_anything2_video.v1.SegmentAnything2VideoBlockV1

Run Segment Anything 2 on a live video stream frame by frame, keeping per-video temporal memory so object identities are preserved across frames.

Feed box detections from an upstream detector (e.g. a YOLO block) to the block as prompts. The block shares a single SAM2 camera predictor across many video streams and keeps separate state for each stream, keyed on video_metadata.video_identifier. On each frame, depending on prompt_mode, it either re-seeds the tracker from the incoming boxes or simply propagates the existing tracks.

Intended for use with InferencePipeline, which delivers one frame at a time and tags each frame with video metadata.
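The sketch below shows what per-video bookkeeping looks like when frames from several streams arrive one at a time. It is plain Python, and the state is keyed on the video identifier. `VideoSession`, `SessionStore`, and `on_frame` are hypothetical names and do not describe the block's internals. A real session would also hold SAM2's temporal memory, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VideoSession:
    """Hypothetical per-video state; a real session would also hold SAM2 memory."""
    frames_seen: int = 0


class SessionStore:
    """Hypothetical map from video_identifier to its own VideoSession.

    One shared model is assumed; only per-video state is kept separately,
    so frames from different streams never touch each other's memory.
    """

    def __init__(self) -> None:
        self._sessions: Dict[str, VideoSession] = {}

    def get(self, video_identifier: str) -> VideoSession:
        # Create a fresh session the first time a stream is seen.
        return self._sessions.setdefault(video_identifier, VideoSession())

    def on_frame(self, video_identifier: str) -> int:
        """Advance this stream's counter and return the frame's index within the stream."""
        session = self.get(video_identifier)
        index = session.frames_seen
        session.frames_seen += 1
        return index


# Frames from two cameras arrive interleaved, one at a time, tagged with their video id.
store = SessionStore()
stream: List[Tuple[str, str]] = [
    ("cam_a", "frame_0"), ("cam_b", "frame_0"), ("cam_a", "frame_1"),
    ("cam_a", "frame_2"), ("cam_b", "frame_1"),
]
indices = [(video_id, store.on_frame(video_id)) for video_id, _frame in stream]
print(indices)  # [('cam_a', 0), ('cam_b', 0), ('cam_a', 1), ('cam_a', 2), ('cam_b', 1)]
```

Each stream's counter advances independently, even though all frames pass through one shared store.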

Type identifier

Use the identifier `roboflow_core/segment_anything_2_video@v1` in the step "type" field to add the block as a step in your workflow.

Properties

Name	Type	Description	Refs
`name`	`str`	Enter a unique identifier for this step.	❌
`model_id`	`str`	Streaming SAM2 model id resolved by `inference_models`. The `sam2video` family ships four Hiera backbone sizes; `small` is the default trade-off between speed and quality.	✅
`prompt_mode`	`str`	When to consume `boxes` as SAM2 prompts. `first_frame` prompts once per session and then tracks; `every_n_frames` re-seeds every `prompt_interval` frames; `every_frame` re-seeds on every frame. On frames where re-seeding does not happen, `boxes` is ignored and the block simply propagates the existing tracks.	❌
`prompt_interval`	`int`	For `prompt_mode=every_n_frames`: the number of frames between re-prompts.	✅
`threshold`	`float`	Minimum confidence for emitted masks.	✅
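The three `prompt_mode` values amount to a per-frame decision. The sketch below makes two assumptions: frames are counted from 0 within each video session, and `every_n_frames` re-seeds whenever that index is a multiple of `prompt_interval`. The block's exact counting is not documented here, so treat `should_prompt` as a hypothetical illustration.

```python
def should_prompt(prompt_mode: str, frame_index: int, prompt_interval: int) -> bool:
    """Hypothetical rule: does this frame re-seed SAM2 from `boxes`?

    Assumes frame_index counts frames within one video session, starting at 0.
    """
    if prompt_mode == "first_frame":
        return frame_index == 0  # seed once, then only propagate
    if prompt_mode == "every_n_frames":
        if prompt_interval < 1:
            raise ValueError("prompt_interval must be at least 1")
        return frame_index % prompt_interval == 0
    if prompt_mode == "every_frame":
        return True
    raise ValueError(f"unknown prompt_mode: {prompt_mode!r}")


modes = ("first_frame", "every_n_frames", "every_frame")
schedule = {m: [i for i in range(7) if should_prompt(m, i, prompt_interval=3)] for m in modes}
print(schedule)
# {'first_frame': [0], 'every_n_frames': [0, 3, 6], 'every_frame': [0, 1, 2, 3, 4, 5, 6]}
```

On every frame that is not listed, `boxes` would be ignored and the existing tracks propagated.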

The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
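The sketch below illustrates the Refs column. It binds `threshold` (✅) to a runtime `WorkflowParameter` input, keeps `prompt_mode` (❌) as a literal, and flags a step that tries to bind a ❌ property. `misbound_properties` is a hypothetical helper and is not part of the Workflows compiler, which performs its own validation. The step and input names are illustrative.

```python
# Properties marked ✅ in the Refs column; `name` and `prompt_mode` are ❌.
REF_CAPABLE = {"model_id", "prompt_interval", "threshold"}
# Data inputs that are always wired with selectors (see Bindings).
DATA_INPUTS = {"images", "boxes"}


def is_selector(value: object) -> bool:
    # Workflow selectors are strings such as "$inputs.threshold" or "$steps.x.predictions".
    return isinstance(value, str) and value.startswith("$")


def misbound_properties(step: dict) -> list:
    """Hypothetical check: properties given a selector although they accept only literals."""
    return sorted(
        key for key, value in step.items()
        if key not in REF_CAPABLE | DATA_INPUTS and is_selector(value)
    )


workflow_inputs = [
    {"type": "WorkflowImage", "name": "image"},
    {"type": "WorkflowParameter", "name": "threshold", "default_value": 0.5},
]
good_step = {
    "type": "roboflow_core/segment_anything_2_video@v1",
    "name": "sam2_tracker",
    "images": "$inputs.image",
    "boxes": "$steps.object_detection_model.predictions",
    "prompt_mode": "every_n_frames",   # literal: not ref-capable
    "threshold": "$inputs.threshold",  # resolved at runtime from the WorkflowParameter
}
bad_step = dict(good_step, prompt_mode="$inputs.prompt_mode")
print(misbound_properties(good_step), misbound_properties(bad_step))  # [] ['prompt_mode']
```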

Available Connections

Compatible Blocks

Check what blocks you can connect to SAM2 Video Tracker in version v1.

Input and Output Bindings

The available connections depend on the block's binding kinds. Check which binding kinds SAM2 Video Tracker in version v1 has.
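One rough way to picture kind-based compatibility is that an upstream output can feed an input when the two share at least one kind. That overlap rule and `can_connect` are simplifying assumptions made for illustration. The kinds themselves are copied from this block's Bindings list.

```python
# Input and output kinds copied from this block's Bindings list.
INPUT_KINDS = {
    "images": {"image"},
    "boxes": {
        "instance_segmentation_prediction",
        "object_detection_prediction",
        "keypoint_detection_prediction",
    },
    "model_id": {"roboflow_model_id"},
    "prompt_interval": {"integer"},
    "threshold": {"float"},
}
OUTPUT_KINDS = {"predictions": {"instance_segmentation_prediction"}}


def can_connect(upstream_kinds: set, input_name: str) -> bool:
    """Simplified rule (assumption): connectable if the kinds overlap."""
    return bool(upstream_kinds & INPUT_KINDS.get(input_name, set()))


print(can_connect({"object_detection_prediction"}, "boxes"))  # True
print(can_connect({"classification_prediction"}, "boxes"))   # False
print(can_connect(OUTPUT_KINDS["predictions"], "boxes"))      # True
```

Under this rule, the block's own `predictions` output could itself serve as `boxes` for another step that accepts instance segmentation predictions.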

Bindings

input
- images (image): The image to infer on.
- boxes (Union[instance_segmentation_prediction, object_detection_prediction, keypoint_detection_prediction]): Bounding boxes to use as SAM2 prompts. Only read on frames where the block re-prompts (see prompt_mode).
- model_id (roboflow_model_id): Streaming SAM2 model id resolved by inference_models. The sam2video family ships four Hiera backbone sizes; small is the default trade-off between speed and quality.
- prompt_interval (integer): For prompt_mode=every_n_frames: the number of frames between re-prompts.
- threshold (float): Minimum confidence for emitted masks.
output
- predictions (instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks, in the form of an sv.Detections(...) object.
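Each object in the output pairs a segmentation mask with a bounding box. The NumPy sketch below shows how a tight box can be derived from a boolean mask and how a minimum-confidence `threshold` might drop low-scoring objects. The inclusive `>=` comparison and the helper names `masks_to_xyxy` and `filter_by_threshold` are assumptions. The block itself returns an `sv.Detections` object, not raw arrays.

```python
import numpy as np


def masks_to_xyxy(masks: np.ndarray) -> np.ndarray:
    """Tight [x_min, y_min, x_max, y_max] box for each boolean mask of shape (N, H, W)."""
    boxes = np.zeros((len(masks), 4), dtype=float)
    for i, mask in enumerate(masks):
        ys, xs = np.nonzero(mask)
        if len(xs):  # an empty mask keeps an all-zero box
            boxes[i] = [xs.min(), ys.min(), xs.max(), ys.max()]
    return boxes


def filter_by_threshold(masks: np.ndarray, confidence: np.ndarray, threshold: float):
    """Keep objects whose confidence reaches the threshold (inclusive, by assumption)."""
    keep = confidence >= threshold
    return masks[keep], confidence[keep]


masks = np.zeros((2, 4, 6), dtype=bool)
masks[0, 1:3, 2:5] = True  # rows 1-2, columns 2-4
masks[1, 3, 0] = True      # a single pixel
confidence = np.array([0.9, 0.3])

kept_masks, kept_conf = filter_by_threshold(masks, confidence, threshold=0.5)
print(masks_to_xyxy(kept_masks).tolist(), kept_conf.tolist())  # [[2.0, 1.0, 4.0, 2.0]] [0.9]
```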

Example JSON definition of step SAM2 Video Tracker in version v1

```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/segment_anything_2_video@v1",
    "images": "$inputs.image",
    "boxes": "$steps.object_detection_model.predictions",
    "model_id": "sam2video/tiny",
    "prompt_mode": "every_n_frames",
    "prompt_interval": 30,
    "threshold": 0.0
}
```
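A step definition only runs inside a complete workflow. The sketch below wraps the SAM2 step with an image input, an upstream detector step, and a JSON output. Several choices are illustrative only: the detector type `roboflow_core/roboflow_object_detection_model@v1`, the `yolov8n-640` model id, and the names `sam2_tracker` and `tracked_masks`. `dangling_step_refs` is a hypothetical sanity check. The resulting dict could then be passed to `InferencePipeline` through its workflow-based initializer; check the inference documentation for the exact call.

```python
import json

workflow_definition = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "type": "roboflow_core/roboflow_object_detection_model@v1",
            "name": "object_detection_model",
            "images": "$inputs.image",
            "model_id": "yolov8n-640",
        },
        {
            "type": "roboflow_core/segment_anything_2_video@v1",
            "name": "sam2_tracker",
            "images": "$inputs.image",
            "boxes": "$steps.object_detection_model.predictions",
            "model_id": "sam2video/tiny",
            "prompt_mode": "every_n_frames",
            "prompt_interval": 30,
            "threshold": 0.0,
        },
    ],
    "outputs": [
        {"type": "JsonField", "name": "tracked_masks", "selector": "$steps.sam2_tracker.predictions"}
    ],
}


def dangling_step_refs(definition: dict) -> list:
    """Hypothetical check: $steps.<name>... selectors whose <name> is not a defined step."""
    step_names = {step["name"] for step in definition["steps"]}
    dangling = []
    for item in definition["steps"] + definition["outputs"]:
        for value in item.values():
            if isinstance(value, str) and value.startswith("$steps."):
                if value.split(".")[1] not in step_names:
                    dangling.append(value)
    return dangling


print(dangling_step_refs(workflow_definition))  # []
print(json.dumps(workflow_definition, indent=2))
```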