SAM3 Video Tracker

Class: SegmentAnything3VideoBlockV1

Source: inference.core.workflows.core_steps.models.foundation.segment_anything3_video.v1.SegmentAnything3VideoBlockV1

Run Segment Anything 3 on a live video stream frame by frame, keeping per-video temporal memory so object identities are preserved across frames.

Provide the concepts to track as text in class_names (e.g. ["person", "forklift"]) — no upstream detector is needed. SAM3 runs fused detection and tracking on every frame, so objects matching a concept that enter the scene mid-stream are picked up automatically and assigned fresh tracker_ids. Each emitted mask carries the prompt it matched as its class name and the model's detection score as its confidence.
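Because class_names is documented as accepting either a list of phrases or a single comma-separated string, it can help to normalise the value before building a workflow definition. The helper below is a minimal sketch: `normalize_class_names` is a hypothetical name, and the split-on-comma and whitespace-trimming rules are assumptions, not the block's confirmed parsing logic.

```python
from typing import List, Union


def normalize_class_names(class_names: Union[List[str], str]) -> List[str]:
    """Return a clean list of concept phrases from a list or a comma-separated string."""
    if isinstance(class_names, str):
        # Assumed rule: a single string is split on commas.
        class_names = class_names.split(",")
    # Trim surrounding whitespace and drop empty entries, preserving order.
    cleaned = [name.strip() for name in class_names]
    return [name for name in cleaned if name]


print(normalize_class_names("person, forklift"))  # ['person', 'forklift']
```

Normalising up front keeps the prompt list stable between frames, which matters because a change in class_names causes the tracking session to be re-seeded.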

The block multiplexes a single SAM3 streaming model across many video streams by keying state on video_metadata.video_identifier; a session is re-seeded only when the source stream restarts or class_names changes. For detector-driven (box-prompted) video tracking, use the SAM2 Video Tracker block instead.
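The keying and re-seeding behaviour described above can be pictured with a small registry. This is an illustrative sketch only, not the block's implementation: `Session` and `SessionRegistry` are hypothetical names, and detecting a stream restart by a non-increasing frame number is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Session:
    """Hypothetical per-video tracking state."""
    class_names: Tuple[str, ...]
    last_frame_number: int
    seed_count: int = 1  # how many times this video's session has been (re-)seeded


class SessionRegistry:
    """Keeps one session per video_identifier, sharing a single model across streams."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def get(self, video_identifier: str, frame_number: int, class_names: List[str]) -> Session:
        prompts = tuple(class_names)
        session = self._sessions.get(video_identifier)
        if session is None:
            # First frame seen for this stream: seed a new session.
            session = Session(prompts, frame_number)
            self._sessions[video_identifier] = session
            return session
        # Assumed heuristic: a frame number that does not increase means a restart.
        restarted = frame_number <= session.last_frame_number
        if restarted or prompts != session.class_names:
            # Re-seed: discard the old temporal memory and start again.
            session = Session(prompts, frame_number, seed_count=session.seed_count + 1)
            self._sessions[video_identifier] = session
        else:
            session.last_frame_number = frame_number
        return session
```

In this sketch, two cameras with different identifiers never share state, and consecutive frames from the same camera with unchanged prompts reuse the same session object.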

Intended for use with InferencePipeline, which delivers one frame at a time and tags each frame with video metadata.
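A minimal sketch of wiring the block into an InferencePipeline might look like the following. It assumes that `InferencePipeline.init_with_workflow` accepts `workflow_specification`, `video_reference`, `api_key`, and `on_prediction` as in recent `inference` releases, and that the sink receives a result dictionary plus a video frame exposing `frame_id`; verify both against your installed version. The step name `tracker`, the output name `predictions`, and the `run` helper are illustrative choices.

```python
from typing import Any, Dict, Union

# Workflow definition with a single SAM3 Video Tracker step.
WORKFLOW_SPECIFICATION: Dict[str, Any] = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "name": "tracker",  # illustrative step name
            "type": "roboflow_core/sam3_video@v1",
            "images": "$inputs.image",
            "class_names": ["person", "forklift"],
            "model_id": "sam3video",
            "threshold": 0.5,
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "predictions",
            "selector": "$steps.tracker.predictions",
        }
    ],
}


def on_prediction(result: Dict[str, Any], video_frame: Any) -> None:
    """Sink called once per frame; prints the frame id and the number of tracked objects."""
    detections = result["predictions"]
    print(video_frame.frame_id, len(detections))


def run(video_reference: Union[str, int], api_key: str) -> None:
    """Start the pipeline on a file path, stream URL, or camera index."""
    # Imported lazily so the definition above can be inspected without `inference` installed.
    from inference import InferencePipeline

    pipeline = InferencePipeline.init_with_workflow(
        api_key=api_key,
        video_reference=video_reference,
        workflow_specification=WORKFLOW_SPECIFICATION,
        on_prediction=on_prediction,
    )
    pipeline.start()
    pipeline.join()
```

Running the step locally inside the pipeline process keeps the per-video state in one place, which is what the block relies on for stable tracker_ids.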

Type identifier

Use the following identifier in the step "type" field: roboflow_core/sam3_video@v1 to add the block as a step in your workflow.

Properties

Name	Type	Description	Refs
`name`	`str`	Enter a unique identifier for this step.	❌
`class_names`	`Union[List[str], str]`	Concepts to segment and track, as a list of phrases (or a single comma-separated string). Each emitted mask carries the concept it matched as its class name.	✅
`model_id`	`str`	Streaming SAM3 model id resolved by `inference_models`.	✅
`threshold`	`float`	Minimum detection score for emitted masks. Scores come from SAM3's per-object concept detection head.	✅

The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.

Runtime compatibility

soft — runtime hosted_serverless, dedicated_deployment; execution remote; input video: Block keeps per-video state in process memory (keyed by video_metadata.video_identifier). With remote step execution on stateless or multi-replica HTTP runtimes, successive requests may be served by different worker processes, so the state resets between calls and the output is meaningless for tracking / counting / aggregation. Use local step execution in an InferencePipeline for stable cross-frame results.
hard — runtime self_hosted_cpu; execution local: Requires a GPU; the streaming SAM3 video model needs CUDA.
soft — input image: Block depends on temporal context from video or repeated-frame workflows. With a still image/photo, there is no meaningful history to track, compare, aggregate, or visualize, so the block provides little or no benefit.

Available Connections

Compatible Blocks

Check what blocks you can connect to SAM3 Video Tracker in version v1.

Input and Output Bindings

The available connections depend on the block's binding kinds. Check which binding kinds SAM3 Video Tracker in version v1 has.

Bindings

input
- images (image): The image to infer on.
- class_names (Union[string, list_of_values]): Concepts to segment and track, as a list of phrases (or a single comma-separated string). Each emitted mask carries the concept it matched as its class name.
- model_id (roboflow_model_id): Streaming SAM3 model id resolved by inference_models.
- threshold (float): Minimum detection score for emitted masks. Scores come from SAM3's per-object concept detection head.
output
- predictions (instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks, in the form of an sv.Detections(...) object.
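Since tracker_ids are preserved across frames, a common downstream use of the predictions output is counting distinct objects per concept. The sketch below works on plain sequences so it runs without extra dependencies; it assumes that, for each frame, you have pulled the ids from `detections.tracker_id` and the class names from `detections.data["class_name"]`. That data key is an assumption based on common Workflows conventions, and `count_unique_tracks` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Set, Tuple

# One entry per frame: (tracker ids, class names), aligned by index.
FrameTracks = Tuple[Sequence[int], Sequence[str]]


def count_unique_tracks(frames: Iterable[FrameTracks]) -> Dict[str, int]:
    """Count distinct tracker ids seen for each class name across all frames."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for tracker_ids, class_names in frames:
        for tracker_id, class_name in zip(tracker_ids, class_names):
            seen[class_name].add(int(tracker_id))
    return {name: len(ids) for name, ids in seen.items()}


frames = [
    ([1, 2], ["person", "forklift"]),
    ([1, 2, 3], ["person", "forklift", "person"]),  # a second person enters mid-stream
]
print(count_unique_tracks(frames))  # {'person': 2, 'forklift': 1}
```

Counts like these are only meaningful when the block's state persists between frames, so they should be computed with local step execution rather than against a stateless remote runtime.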

Example JSON definition of step SAM3 Video Tracker in version v1

{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/sam3_video@v1",
    "images": "$inputs.image",
    "class_names": [
        "person",
        "forklift"
    ],
    "model_id": "sam3video",
    "threshold": 0.5
}