SAM3 Video Tracker

Class: SegmentAnything3VideoBlockV1

Source: inference.core.workflows.core_steps.models.foundation.segment_anything3_video.v1.SegmentAnything3VideoBlockV1

Run Segment Anything 3 on a live video stream frame by frame, keeping per-video temporal memory so object identities are preserved across frames.

Provide the concepts to track as text in class_names (e.g. ["person", "forklift"]) — no upstream detector is needed. SAM3 runs fused detection and tracking on every frame, so objects matching a concept that enter the scene mid-stream are picked up automatically and assigned fresh tracker_ids. Each emitted mask carries the prompt it matched as its class name and the model's detection score as its confidence.

The block multiplexes a single SAM3 streaming model across many video streams by keying state on video_metadata.video_identifier; a session is re-seeded only when the source stream restarts or class_names changes. For detector-driven (box-prompted) video tracking, use the SAM2 Video Tracker block instead.
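The keying and re-seeding behaviour described above can be pictured with a small registry. This is only a sketch, not the block's actual implementation: `Session`, `SessionRegistry`, and the `seed_count` field are hypothetical names, and treating a non-increasing frame number as a "stream restart" is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Session:
    """Hypothetical per-video state; the real block's internals may differ."""
    class_names: Tuple[str, ...]
    last_frame_number: int = -1
    seed_count: int = 1  # how many times this stream has been (re-)seeded


class SessionRegistry:
    """Keeps one Session per video_identifier (illustrative only)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def get(self, video_identifier: str, frame_number: int,
            class_names: List[str]) -> Session:
        prompts = tuple(class_names)
        session = self._sessions.get(video_identifier)
        if session is None:
            # First frame seen for this stream: create fresh state.
            session = Session(class_names=prompts)
        elif (frame_number <= session.last_frame_number
              or prompts != session.class_names):
            # Assumed restart signal (frame counter went backwards) or the
            # prompts changed: discard the old state and re-seed.
            session = Session(class_names=prompts,
                              seed_count=session.seed_count + 1)
        self._sessions[video_identifier] = session
        session.last_frame_number = frame_number
        return session


registry = SessionRegistry()
cam_a = registry.get("cam-a", 0, ["person"])
cam_b = registry.get("cam-b", 0, ["person"])  # independent state per stream
```

Calling `registry.get("cam-a", 1, ["person"])` next would return the same `cam_a` session, while changing the prompt list or passing an earlier frame number would produce a re-seeded one.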

Intended for use with InferencePipeline, which delivers one frame at a time and tags each frame with video metadata.

Type identifier

Use the following identifier in the step "type" field: roboflow_core/sam3_video@v1 to add the block as a step in your workflow.

Properties

Name Type Description Refs
name str Enter a unique identifier for this step.
class_names Union[List[str], str] Concepts to segment and track, as a list of phrases (or a single comma-separated string). Each emitted mask carries the concept it matched as its class name.
model_id str Streaming SAM3 model id resolved by inference_models.
threshold float Minimum detection score for emitted masks. Scores come from SAM3's per-object concept detection head.

The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
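To make the two accepted shapes of class_names and the role of threshold concrete, here is a minimal sketch. Both helper functions are hypothetical and are not part of the block's API. Trimming whitespace around comma-separated entries and keeping scores equal to the threshold are assumptions; the page only says the string is comma-separated and that threshold is a minimum score.

```python
from typing import Dict, List, Union


def normalize_class_names(class_names: Union[List[str], str]) -> List[str]:
    """Turn either accepted form into a clean list of concept phrases."""
    if isinstance(class_names, str):
        class_names = class_names.split(",")
    # Assumption: surrounding whitespace and empty entries are ignored.
    return [name.strip() for name in class_names if name.strip()]


def keep_confident(detections: List[Dict], threshold: float) -> List[Dict]:
    """Drop detections whose score falls below the minimum (inclusive, assumed)."""
    return [d for d in detections if d["confidence"] >= threshold]


prompts = normalize_class_names("person, forklift")
kept = keep_confident(
    [{"class": "person", "confidence": 0.82},
     {"class": "forklift", "confidence": 0.31}],
    threshold=0.5,
)
```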

Runtime compatibility

soft — runtime hosted_serverless, dedicated_deployment; execution remote; input video
Block keeps per-video state in process memory (keyed by video_metadata.video_identifier). With remote step execution on stateless or multi-replica HTTP runtimes, successive requests may be served by different worker processes, so the state resets between calls and the output is meaningless for tracking / counting / aggregation. Use local step execution in an InferencePipeline for stable cross-frame results.
hard — runtime self_hosted_cpu; execution local
Requires a GPU; the streaming SAM3 video model needs CUDA.
soft — input image
Block depends on temporal context from video or repeated-frame workflows. With a still image/photo, there is no meaningful history to track, compare, aggregate, or visualize, so the block provides little or no benefit.

Available Connections

Compatible Blocks

Check what blocks you can connect to SAM3 Video Tracker in version v1.

Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds SAM3 Video Tracker in version v1 has.

Bindings
  • input

    • images (image): The image to infer on.
    • class_names (Union[string, list_of_values]): Concepts to segment and track, as a list of phrases (or a single comma-separated string). Each emitted mask carries the concept it matched as its class name.
    • model_id (roboflow_model_id): Streaming SAM3 model id resolved by inference_models.
    • threshold (float): Minimum detection score for emitted masks. Scores come from SAM3's per-object concept detection head.
  • output

Example JSON definition of step SAM3 Video Tracker in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/sam3_video@v1",
    "images": "$inputs.image",
    "class_names": [
        "person",
        "forklift"
    ],
    "model_id": "sam3video",
    "threshold": 0.5
}
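For context, the step above might sit inside a complete workflow definition roughly as follows. This is a sketch under assumptions: the step name `tracker` and output name `tracker_outputs` are made up, the surrounding `version`/`inputs`/`outputs` layout follows the general Workflows definition format as I understand it, and the wildcard selector is used only because this page does not list the block's output names.

```python
import json

# Hypothetical workflow wrapping the SAM3 Video Tracker step.
workflow_definition = {
    "version": "1.0",
    "inputs": [
        # Matches the "$inputs.image" selector used by the step.
        {"type": "WorkflowImage", "name": "image"},
    ],
    "steps": [
        {
            "name": "tracker",  # assumed step name
            "type": "roboflow_core/sam3_video@v1",
            "images": "$inputs.image",
            "class_names": ["person", "forklift"],
            "model_id": "sam3video",
            "threshold": 0.5,
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "tracker_outputs",  # assumed output name
            # Wildcard: expose every output the step produces.
            "selector": "$steps.tracker.*",
        }
    ],
}

print(json.dumps(workflow_definition, indent=4))
```

A definition like this would typically be handed to an InferencePipeline running with local step execution, in line with the runtime compatibility notes above; check the InferencePipeline documentation for the exact entry point in your version.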