SAM3 Video Tracker
Class: SegmentAnything3VideoBlockV1
Run Segment Anything 3 on a live video stream frame by frame, keeping per-video temporal memory so object identities are preserved across frames.
Provide the concepts to track as text in class_names (e.g.
["person", "forklift"]) — no upstream detector is needed. SAM3 runs
fused detection and tracking on every frame, so objects matching a
concept that enter the scene mid-stream are picked up automatically and
assigned fresh tracker_ids. Each emitted mask carries the prompt it
matched as its class name and the model's detection score as its
confidence.
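The output contract described above can be illustrated with a toy model. This is a minimal sketch, not SAM3's actual tracking logic: the `IdAssigner` class, the opaque object keys, and the `(object_key, prompt_index, score)` tuples are hypothetical stand-ins for the model's internal identity memory. It shows an object that is already known keeping its tracker_id, a newly entering object receiving a fresh one, the matched prompt becoming the class name, the score becoming the confidence, and objects below the score threshold being dropped (the threshold is assumed to be inclusive).

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Sequence, Tuple


@dataclass
class EmittedMask:
    tracker_id: int
    class_name: str  # the prompt (concept) the mask matched
    confidence: float  # the model's detection score


class IdAssigner:
    """Hypothetical stand-in for SAM3's internal identity memory (not the real model)."""

    def __init__(self) -> None:
        self._ids: Dict[Hashable, int] = {}
        self._next_id = 1

    def label_frame(
        self,
        raw_objects: Sequence[Tuple[Hashable, int, float]],
        class_names: List[str],
        threshold: float = 0.5,
    ) -> List[EmittedMask]:
        """raw_objects holds hypothetical (object_key, prompt_index, score) tuples for one frame."""
        emitted = []
        for key, prompt_index, score in raw_objects:
            if score < threshold:
                continue  # below the minimum detection score: not emitted
            if key not in self._ids:
                # First time this object is seen: it entered mid-stream, give it a fresh id.
                self._ids[key] = self._next_id
                self._next_id += 1
            emitted.append(EmittedMask(self._ids[key], class_names[prompt_index], score))
        return emitted
```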
The block multiplexes a single SAM3 streaming model across many video
streams by keying state on video_metadata.video_identifier; a session
is re-seeded only when the source stream restarts or class_names
changes. For detector-driven (box-prompted) video tracking, use the
SAM2 Video Tracker block instead.
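Keying state per stream and re-seeding only on a restart or a prompt change can be sketched as a small registry. This is a hypothetical illustration, not the block's implementation: `SessionRegistry`, `Session`, and the rule that a non-increasing frame number signals a stream restart are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class Session:
    class_names: Tuple[str, ...]
    last_frame_number: int
    seed_count: int = 1  # how many times this stream's session has been (re-)seeded


class SessionRegistry:
    """Hypothetical sketch: one tracking session per video_identifier."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Session] = {}

    def get(self, video_identifier: str, frame_number: int, class_names: Sequence[str]) -> Session:
        names = tuple(class_names)
        session = self._sessions.get(video_identifier)
        if session is None:
            # First frame from this stream: seed a new session.
            session = Session(names, frame_number)
        elif frame_number <= session.last_frame_number or session.class_names != names:
            # Assumed restart signal (frame counter went back) or prompts changed: re-seed.
            session = Session(names, frame_number, session.seed_count + 1)
        else:
            # Same stream, same prompts: keep the existing memory.
            session.last_frame_number = frame_number
        self._sessions[video_identifier] = session
        return session
```

Because sessions are looked up by identifier, frames from different streams never share memory, while consecutive frames from one stream reuse the same session.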
Intended for use with InferencePipeline, which delivers one frame at
a time and tags each frame with video metadata.
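A minimal sketch of running the block inside InferencePipeline, assuming the `inference` package's `InferencePipeline.init_with_workflow` accepts an inline `workflow_specification` and that any API key the model download needs is picked up from the environment. The step name `sam3`, the output name `predictions`, and the `build_workflow_spec` and `run_pipeline` helpers are illustrative choices, not part of the block.

```python
def build_workflow_spec(class_names, threshold=0.5, model_id="sam3video"):
    """Wrap one SAM3 Video Tracker step in a minimal workflow definition."""
    return {
        "version": "1.0",
        "inputs": [{"type": "WorkflowImage", "name": "image"}],
        "steps": [
            {
                "type": "roboflow_core/sam3_video@v1",
                "name": "sam3",  # illustrative step name
                "images": "$inputs.image",
                "class_names": list(class_names),
                "model_id": model_id,
                "threshold": threshold,
            }
        ],
        "outputs": [
            {"type": "JsonField", "name": "predictions", "selector": "$steps.sam3.predictions"}
        ],
    }


def on_prediction(result, video_frame):
    # result["predictions"] is assumed to hold the block's sv.Detections for this frame.
    detections = result["predictions"]
    print(f"frame {video_frame.frame_id}: {len(detections)} tracked masks")


def run_pipeline(video_reference):
    """Hypothetical helper; needs the `inference` package and a CUDA GPU."""
    from inference import InferencePipeline

    pipeline = InferencePipeline.init_with_workflow(
        video_reference=video_reference,  # file path, webcam index, or stream URL
        workflow_specification=build_workflow_spec(["person", "forklift"]),
        on_prediction=on_prediction,
    )
    pipeline.start()
    pipeline.join()
```

Calling `run_pipeline("video.mp4")` (a placeholder path) would start streaming; the pipeline feeds one frame at a time and tags each frame with the video metadata the block uses to key its state.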
Type identifier
Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/sam3_video@v1
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| class_names | Union[List[str], str] | Concepts to segment and track, as a list of phrases (or a single comma-separated string). Each emitted mask carries the concept it matched as its class name. | ✅ |
| model_id | str | Streaming SAM3 model id resolved by inference_models. | ✅ |
| threshold | float | Minimum detection score for emitted masks. Scores come from SAM3's per-object concept detection head. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
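class_names accepts either form listed in the table above. The helper below is a hypothetical sketch of how the two forms could be normalized into one list of phrases; stripping surrounding whitespace and dropping empty entries is an assumption about the block's parsing, not documented behavior.

```python
from typing import List, Union


def normalize_class_names(class_names: Union[List[str], str]) -> List[str]:
    """Hypothetical helper: accept a list of phrases or one comma-separated string."""
    if isinstance(class_names, str):
        class_names = class_names.split(",")
    # Strip surrounding whitespace and drop empty entries such as a trailing comma.
    return [name.strip() for name in class_names if name.strip()]
```

Under this assumption, `"person, forklift"` and `["person", "forklift"]` produce the same prompt list, and multi-word phrases such as `"pallet jack"` survive intact.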
Runtime compatibility
- soft (runtime: hosted_serverless, dedicated_deployment; execution: remote; input: video): The block keeps per-video state in process memory, keyed by video_metadata.video_identifier. With remote step execution on stateless or multi-replica HTTP runtimes, successive requests may be served by different worker processes, so the state resets between calls and the output is meaningless for tracking, counting, or aggregation. Use local step execution in an InferencePipeline for stable cross-frame results.
- hard (runtime: self_hosted_cpu; execution: local): Requires a GPU; the streaming SAM3 video model needs CUDA.
- soft (input: image): The block depends on temporal context from video or repeated-frame workflows. With a still image or photo there is no meaningful history to track, compare, aggregate, or visualize, so the block provides little or no benefit.
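Why in-process state breaks on stateless or multi-replica runtimes can be shown with a small simulation. Everything here is hypothetical: each `Worker` stands in for a separate process with its own memory, and a round-robin router stands in for the load balancer. The same four frames are fed to one worker and then to two; with two, each worker treats objects as new that the other has already seen, so an "objects entering the scene" count is inflated.

```python
import itertools
from typing import Dict, List, Sequence, Set


class Worker:
    """Hypothetical worker process holding its own in-memory tracker state."""

    def __init__(self) -> None:
        self.seen: Dict[str, Set[str]] = {}  # video_identifier -> object keys seen so far

    def handle(self, video_identifier: str, object_keys: Sequence[str]) -> List[str]:
        state = self.seen.setdefault(video_identifier, set())
        new = [key for key in object_keys if key not in state]
        state.update(object_keys)
        return new  # objects this worker believes just entered the scene


def simulate(num_workers: int, frames: Sequence[Sequence[str]]) -> int:
    workers = [Worker() for _ in range(num_workers)]
    router = itertools.cycle(workers)  # round-robin load balancing across replicas
    total_new = 0
    for object_keys in frames:
        total_new += len(next(router).handle("cam-1", object_keys))
    return total_new
```

With a single worker the four frames below yield two entering objects, which is correct; with two workers the same frames yield four, because neither worker has the full history.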
Available Connections
Compatible Blocks
Check what blocks you can connect to SAM3 Video Tracker in version v1.
- inputs:
Halo Visualization, Stitch OCR Detections, GLM-OCR, Image Threshold, Stitch Images, Morphological Transformation, Classification Label Visualization, Twilio SMS/MMS Notification, Crop Visualization, Icon Visualization, Stability AI Outpainting, Blur Visualization, VLM As Classifier, Reference Path Visualization, MoonshotAI Kimi, OpenAI, Google Gemini, Single-Label Classification Model, Single-Label Classification Model, Anthropic Claude, Webhook Sink, Camera Focus, Instance Segmentation Model, QR Code Generator, Size Measurement, Instance Segmentation Model, Model Comparison Visualization, Florence-2 Model, MQTT Writer, Trace Visualization, Ellipse Visualization, Anthropic Claude, Object Detection Model, Keypoint Detection Model, Dot Visualization, Perspective Correction, Label Visualization, Image Convert Grayscale, Florence-2 Model, Instance Segmentation Model, Text Display, Qwen-VL, Llama 3.2 Vision, Roboflow Dataset Upload, PLC ModbusTCP, Image Blur, Keypoint Detection Model, Absolute Static Crop, Gaze Detection, SIFT, CSV Formatter, Keypoint Detection Model, LMM, Google Gemini, Dimension Collapse, EasyOCR, Qwen 3.5 API, Qwen 3.6 API, Local File Sink, Google Gemma, Triangle Visualization, Camera Focus, Contrast Equalization, Polygon Visualization, OpenAI, Heatmap Visualization, Multi-Label Classification Model, Clip Comparison, Google Gemma API, Detections List Roll-Up, Contrast Enhancement, Google Gemini, PLC EthernetIP, Halo Visualization, Color Visualization, Morphological Transformation, MoonshotAI Kimi, Stitch OCR Detections, LMM For Classification, Event Writer, VLM As Detector, Llama 3.2 Vision, Buffer, Polygon Visualization, Image Stack, Email Notification, Mask Visualization, Anthropic Claude, Multi-Label Classification Model, Identify Changes, Stability AI Inpainting, Roboflow Asset Library Attributes, Microsoft SQL Server Sink, Keypoint Visualization, OpenAI, Background Subtraction, Multi-Label Classification Model, Roboflow Vision Events, Twilio SMS Notification, Email Notification, Semantic Segmentation Model, Image Slicer, Image Contours, Line Counter Visualization, CogVLM, Object Detection Model, Image Preprocessing, OPC UA Writer Sink, Semantic Segmentation Model, Dynamic Crop, Depth Estimation, Bounding Box Visualization, Motion Detection, Qwen3.5-VL, Current Time, Cosine Similarity, Clip Comparison, Corner Visualization, Polygon Zone Visualization, Camera Calibration, Roboflow Dataset Upload, Grid Visualization, Stability AI Image Generation, OpenAI, S3 Sink, Circle Visualization, Image Slicer, OCR Model, Single-Label Classification Model, Relative Static Crop, Roboflow Custom Metadata, Instance Segmentation Model, Model Monitoring Inference Aggregator, OpenAI-Compatible LLM, Slack Notification, OpenRouter, Object Detection Model, SIFT Comparison, Pixelate Visualization, Google Vision OCR, Background Color Visualization, Dynamic Zone
- outputs:
Halo Visualization, Overlap Analysis, SAM 3 Interactive, Crop Visualization, Icon Visualization, Detections Transformation, Blur Visualization, ByteTrack Tracker, Detections Classes Replacement, Byte Tracker, Track Class Lock, Size Measurement, Mask Edge Snap, Model Comparison Visualization, Path Deviation, Florence-2 Model, Trace Visualization, Ellipse Visualization, BoT-SORT Tracker, Dot Visualization, Perspective Correction, Label Visualization, Florence-2 Model, Per-Class Confidence Filter, Roboflow Dataset Upload, Detections Stabilizer, Detections Merge, Velocity, OC-SORT Tracker, Triangle Visualization, Camera Focus, Time in Zone, Line Counter, SORT Tracker, SAM2 Video Tracker, Polygon Visualization, Heatmap Visualization, Detections Stitch, Detections List Roll-Up, Halo Visualization, Color Visualization, Event Writer, Polygon Visualization, Mask Visualization, Detections Filter, Distance Measurement, Stability AI Inpainting, Bounding Rectangle, PTZ Tracking (ONVIF), Time in Zone, Overlap Filter, Roboflow Vision Events, Mask Area Measurement, Detection Offset, Detections Consensus, Byte Tracker, Dynamic Crop, Path Deviation, Byte Tracker, Bounding Box Visualization, Detections Combine, Roboflow Dataset Upload, Corner Visualization, Segment Anything 2 Model, Circle Visualization, Time in Zone, Roboflow Custom Metadata, Model Monitoring Inference Aggregator, Detection Event Log, Pixelate Visualization, Background Color Visualization, Line Counter, Dynamic Zone
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds SAM3 Video Tracker in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - class_names (Union[string, list_of_values]): Concepts to segment and track, as a list of phrases (or a single comma-separated string). Each emitted mask carries the concept it matched as its class name.
  - model_id (roboflow_model_id): Streaming SAM3 model id resolved by inference_models.
  - threshold (float): Minimum detection score for emitted masks. Scores come from SAM3's per-object concept detection head.
- output
  - predictions (instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in the form of an sv.Detections(...) object.
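Because each mask carries a tracker_id and its matched prompt as class name, a downstream consumer can count distinct objects per concept. The sketch assumes predictions arrive as sv.Detections with class names stored under `detections.data["class_name"]` and integer-like tracker_id values; `update_unique_counts` is a hypothetical helper that relies only on those two attributes, so any object exposing them works.

```python
from collections import defaultdict
from typing import DefaultDict, Set


def update_unique_counts(detections, seen: DefaultDict[str, Set[int]]) -> DefaultDict[str, Set[int]]:
    """Hypothetical helper: accumulate distinct tracker_ids per class name across frames."""
    if detections.tracker_id is None:
        return seen  # nothing to count without identities
    for class_name, tracker_id in zip(detections.data["class_name"], detections.tracker_id):
        seen[str(class_name)].add(int(tracker_id))
    return seen


# Keep one accumulator for the whole stream and update it on every frame.
unique_objects: DefaultDict[str, Set[int]] = defaultdict(set)
```

Since identities persist across frames, a person visible in many frames adds a single entry, and `len(unique_objects["person"])` approximates how many distinct people have appeared so far.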
Example JSON definition of step SAM3 Video Tracker in version v1
```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/sam3_video@v1",
  "images": "$inputs.image",
  "class_names": [
    "person",
    "forklift"
  ],
  "model_id": "sam3video",
  "threshold": 0.5
}
```