SAM3 Video Tracker¶
Class: SegmentAnything3VideoBlockV1
Run Segment Anything 3 on a live video stream frame by frame, keeping per-video temporal memory so object identities are preserved across frames.
Provide the concepts to track as text in class_names (e.g.
["person", "forklift"]) — no upstream detector is needed. SAM3 runs
fused detection and tracking on every frame, so objects matching a
concept that enter the scene mid-stream are picked up automatically and
assigned fresh tracker_ids. Each emitted mask carries the prompt it
matched as its class name and the model's detection score as its
confidence.
The block multiplexes a single SAM3 streaming model across many video
streams by keying state on video_metadata.video_identifier; a session
is re-seeded only when the source stream restarts or class_names
changes. For detector-driven (box-prompted) video tracking, use the
SAM2 Video Tracker block instead.
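The keying and re-seeding rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the block's actual implementation: `TrackingSession`, `SessionRegistry`, and `seed_count` are hypothetical names, and detecting a stream restart by a non-increasing frame number is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass
class TrackingSession:
    """Hypothetical stand-in for the temporal memory kept for one video."""
    class_names: Tuple[str, ...]
    seed_count: int = 1          # how many times this stream has been (re)seeded
    last_frame_number: int = -1  # last frame number seen on this stream


class SessionRegistry:
    """One shared model, one session per video_identifier (illustrative only)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, TrackingSession] = {}

    def get_session(
        self, video_identifier: str, frame_number: int, class_names: Sequence[str]
    ) -> TrackingSession:
        prompts = tuple(class_names)
        session = self._sessions.get(video_identifier)
        if session is None:
            # First frame seen for this stream: seed a new session.
            session = TrackingSession(class_names=prompts)
            self._sessions[video_identifier] = session
        else:
            # Assumption: a non-increasing frame number means the source restarted.
            restarted = frame_number <= session.last_frame_number
            prompts_changed = prompts != session.class_names
            if restarted or prompts_changed:
                session = TrackingSession(
                    class_names=prompts, seed_count=session.seed_count + 1
                )
                self._sessions[video_identifier] = session
        session.last_frame_number = frame_number
        return session


registry = SessionRegistry()
first = registry.get_session("cam-1", 0, ["person"])
second = registry.get_session("cam-1", 1, ["person"])               # same session reused
other = registry.get_session("cam-2", 0, ["person"])                # independent stream
changed = registry.get_session("cam-1", 2, ["person", "forklift"])  # prompts changed
```

In this sketch, `first` and `second` are the same object, `other` is a separate session for a different stream, and `changed` is a freshly seeded session because the prompt list differs.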
Intended for use with InferencePipeline, which delivers one frame at
a time and tags each frame with video metadata.
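A minimal sketch of driving the block from InferencePipeline follows. The step fields mirror the example JSON definition on this page; the step name `tracker`, the output name `predictions`, and the helper names are choices made for this example. It assumes a single video source, in which case the callback receives one result dictionary keyed by workflow output name.

```python
from typing import Any, Dict, List


def build_workflow_specification(
    class_names: List[str], threshold: float = 0.5
) -> Dict[str, Any]:
    """Build a one-step workflow around the SAM3 Video Tracker block."""
    return {
        "version": "1.0",
        "inputs": [{"type": "WorkflowImage", "name": "image"}],
        "steps": [
            {
                "name": "tracker",  # step name chosen for this example
                "type": "roboflow_core/sam3_video@v1",
                "images": "$inputs.image",
                "class_names": class_names,
                "model_id": "sam3video",
                "threshold": threshold,
            }
        ],
        "outputs": [
            {
                "type": "JsonField",
                "name": "predictions",
                "selector": "$steps.tracker.predictions",
            }
        ],
    }


def on_prediction(result: Dict[str, Any], video_frame: Any) -> None:
    """Called once per frame; assumes a single video source."""
    detections = result["predictions"]  # an sv.Detections object
    print(f"frame {video_frame.frame_id}: {len(detections)} tracked objects")


def run(video_reference: Any, api_key: str) -> None:
    # Imported lazily so the specification can be built without inference installed.
    from inference import InferencePipeline

    pipeline = InferencePipeline.init_with_workflow(
        api_key=api_key,
        video_reference=video_reference,
        workflow_specification=build_workflow_specification(["person", "forklift"]),
        on_prediction=on_prediction,
    )
    pipeline.start()
    pipeline.join()
```

Calling `run("rtsp://...", api_key="...")` on a CUDA-capable machine would start the stream; the video reference and API key are placeholders to replace with your own.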
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/sam3_video@v1 to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| class_names | Union[List[str], str] | Concepts to segment and track, as a list of phrases (or a single comma-separated string). Each emitted mask carries the concept it matched as its class name. | ✅ |
| model_id | str | Streaming SAM3 model id resolved by inference_models. | ✅ |
| threshold | float | Minimum detection score for emitted masks. Scores come from SAM3's per-object concept detection head. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
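As a hedged illustration of what a ✅ in the Refs column allows, the step below takes `class_names` and `threshold` from workflow inputs instead of hard-coding them. The input names and the `dynamic_properties` helper are hypothetical; the selector syntax follows the `$inputs.image` pattern used in the example JSON definition on this page.

```python
from typing import Any, Dict, List

# Workflow inputs that supply values at runtime (names chosen for this example).
workflow_inputs: List[Dict[str, Any]] = [
    {"type": "WorkflowImage", "name": "image"},
    {
        "type": "WorkflowParameter",
        "name": "class_names",
        "default_value": ["person", "forklift"],
    },
    {"type": "WorkflowParameter", "name": "threshold", "default_value": 0.5},
]

step: Dict[str, Any] = {
    "name": "tracker",                      # Refs ❌: must be a literal value
    "type": "roboflow_core/sam3_video@v1",
    "images": "$inputs.image",
    "class_names": "$inputs.class_names",   # Refs ✅: bound to a workflow input
    "model_id": "sam3video",                # Refs ✅, but kept static here
    "threshold": "$inputs.threshold",       # Refs ✅: bound to a workflow input
}


def dynamic_properties(step_definition: Dict[str, Any]) -> List[str]:
    """Return the step fields whose values are selectors rather than literals."""
    return sorted(
        key
        for key, value in step_definition.items()
        if isinstance(value, str) and value.startswith("$")
    )


bound = dynamic_properties(step)
```

Since a session is re-seeded when `class_names` changes, binding it to a runtime parameter means a new value restarts tracking for that stream.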
Runtime compatibility¶
- soft (runtime hosted_serverless, dedicated_deployment; execution remote; input video): The block keeps per-video state in process memory (keyed by video_metadata.video_identifier). With remote step execution on stateless or multi-replica HTTP runtimes, successive requests may be served by different worker processes, so the state resets between calls and the output is meaningless for tracking, counting, or aggregation. Use local step execution in an InferencePipeline for stable cross-frame results.
- hard (runtime self_hosted_cpu; execution local): Requires a GPU; the streaming SAM3 video model needs CUDA.
- soft (input image): The block depends on temporal context from video or repeated-frame workflows. With a still image or photo, there is no meaningful history to track, compare, aggregate, or visualize, so the block provides little or no benefit.
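The first soft constraint above can be illustrated with a toy simulation. It is a sketch only: `Worker` and `history_lengths` are hypothetical, and round-robin routing is an assumed load-balancing policy. Each worker keeps its own in-process state, so the history available for a given frame depends on which worker served it.

```python
from itertools import cycle
from typing import Dict, List


class Worker:
    """Toy worker process holding per-video state in its own memory."""

    def __init__(self) -> None:
        self.frames_seen: Dict[str, int] = {}

    def process(self, video_identifier: str) -> int:
        # Returns how many frames of this video this worker has seen so far.
        count = self.frames_seen.get(video_identifier, 0) + 1
        self.frames_seen[video_identifier] = count
        return count


def history_lengths(
    workers: List[Worker], n_frames: int, video_identifier: str = "cam-1"
) -> List[int]:
    """Route frames round-robin and record the history length seen per frame."""
    routing = cycle(workers)
    return [next(routing).process(video_identifier) for _ in range(n_frames)]


local = history_lengths([Worker()], 6)                        # one local process
remote = history_lengths([Worker(), Worker(), Worker()], 6)   # three replicas
print(local)   # [1, 2, 3, 4, 5, 6]
print(remote)  # [1, 1, 1, 2, 2, 2]
```

With a single local process the history grows with every frame; with three replicas each worker sees only every third frame, which is why tracker identities are not stable in that setup.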
Available Connections¶
Compatible Blocks
Check what blocks you can connect to SAM3 Video Tracker in version v1.
- inputs: VLM As Classifier, MoonshotAI Kimi, Stability AI Image Generation, Trace Visualization, Image Stack, Anthropic Claude, Icon Visualization, SIFT Comparison, Morphological Transformation, Color Visualization, LMM For Classification, Single-Label Classification Model, Perspective Correction, Corner Visualization, Clip Comparison, Roboflow Custom Metadata, Halo Visualization, Dynamic Zone, Qwen-VL, Keypoint Detection Model, Email Notification, Halo Visualization, Object Detection Model, Google Gemma, Background Color Visualization, Ellipse Visualization, Email Notification, Twilio SMS/MMS Notification, Text Display, Polygon Visualization, Crop Visualization, Absolute Static Crop, Image Preprocessing, Model Monitoring Inference Aggregator, Relative Static Crop, OpenRouter, OpenAI, PLC ModbusTCP, Florence-2 Model, OpenAI, Heatmap Visualization, Motion Detection, OCR Model, Blur Visualization, Dimension Collapse, Depth Estimation, Instance Segmentation Model, Stability AI Outpainting, Anthropic Claude, Google Gemini, Clip Comparison, Google Gemini, PLC EthernetIP, Background Subtraction, Keypoint Visualization, Buffer, CSV Formatter, Webhook Sink, Stitch Images, Florence-2 Model, Current Time, Detections List Roll-Up, Contrast Equalization, OpenAI, VLM As Detector, Google Gemini, Triangle Visualization, Slack Notification, SIFT, Local File Sink, Multi-Label Classification Model, Cosine Similarity, Image Contours, Keypoint Detection Model, GLM-OCR, Roboflow Asset Library Attributes, Image Slicer, Polygon Zone Visualization, Contrast Enhancement, Google Gemma API, Semantic Segmentation Model, Stitch OCR Detections, Image Threshold, Line Counter Visualization, Semantic Segmentation Model, Multi-Label Classification Model, Camera Calibration, QR Code Generator, S3 Sink, Microsoft SQL Server Sink, Google Vision OCR, Twilio SMS Notification, Image Blur, Morphological Transformation, Camera Focus, Size Measurement, Roboflow Vision Events, Stability AI Inpainting, Classification Label Visualization, Stitch OCR Detections, Event Writer, Grid Visualization, Qwen3.5-VL, Mask Visualization, Llama 3.2 Vision, Reference Path Visualization, Image Slicer, Label Visualization, OPC UA Writer Sink, Dot Visualization, Identify Changes, Dynamic Crop, Circle Visualization, Llama 3.2 Vision, Camera Focus, Gaze Detection, MoonshotAI Kimi, OpenAI-Compatible LLM, Single-Label Classification Model, CogVLM, Object Detection Model, Qwen 3.6 API, Bounding Box Visualization, Multi-Label Classification Model, LMM, OpenAI, Image Convert Grayscale, Instance Segmentation Model, Roboflow Visual Search, EasyOCR, Roboflow Dataset Upload, Instance Segmentation Model, Pixelate Visualization, Keypoint Detection Model, Roboflow Dataset Upload, PLC Writer, Instance Segmentation Model, Qwen 3.5 API, Anthropic Claude, Object Detection Model, MQTT Writer, Polygon Visualization, Model Comparison Visualization, Single-Label Classification Model
- outputs: Line Counter, Time in Zone, Path Deviation, Trace Visualization, Distance Measurement, Detection Offset, ByteTrack Tracker, Detection Event Log, Per-Class Confidence Filter, Icon Visualization, Detections Transformation, Color Visualization, Perspective Correction, Corner Visualization, Mask Area Measurement, Roboflow Custom Metadata, Detections Merge, Halo Visualization, Dynamic Zone, Detections Combine, Roboflow Vision Events, Size Measurement, Halo Visualization, Stability AI Inpainting, PTZ Tracking (ONVIF), Bounding Rectangle, SAM2 Video Tracker, Event Writer, Mask Visualization, Byte Tracker, Background Color Visualization, Ellipse Visualization, Velocity, Label Visualization, Byte Tracker, Dot Visualization, Polygon Visualization, Crop Visualization, Dynamic Crop, Path Deviation, Circle Visualization, Detections Stitch, BoT-SORT Tracker, Model Monitoring Inference Aggregator, Camera Focus, Segment Anything 2 Model, Florence-2 Model, Heatmap Visualization, Detections Filter, Overlap Analysis, Blur Visualization, SAM 3 Interactive, Detections Consensus, Byte Tracker, Bounding Box Visualization, Florence-2 Model, Detections List Roll-Up, Mask Edge Snap, Line Counter, Triangle Visualization, Overlap Filter, Roboflow Dataset Upload, Time in Zone, Detections Classes Replacement, Pixelate Visualization, Roboflow Dataset Upload, Detections Stabilizer, SORT Tracker, Track Class Lock, Time in Zone, Polygon Visualization, OC-SORT Tracker, Model Comparison Visualization
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
SAM3 Video Tracker in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - class_names (Union[list_of_values, string]): Concepts to segment and track, as a list of phrases (or a single comma-separated string). Each emitted mask carries the concept it matched as its class name.
    - model_id (roboflow_model_id): Streaming SAM3 model id resolved by inference_models.
    - threshold (float): Minimum detection score for emitted masks. Scores come from SAM3's per-object concept detection head.
- output
    - predictions (instance_segmentation_prediction): Prediction with detected bounding boxes and segmentation masks in the form of an sv.Detections(...) object.
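Because `class_names` accepts either a list of phrases or a single comma-separated string, it can help to see how the two forms might be reconciled. The helper below is a hypothetical sketch; the block's own parsing (for example, how it treats whitespace or empty entries) is not documented here and may differ.

```python
from typing import List, Union


def normalize_class_names(class_names: Union[List[str], str]) -> List[str]:
    """Turn either accepted form of class_names into a clean list of phrases."""
    if isinstance(class_names, str):
        # Assumption: the string form is split on commas.
        class_names = class_names.split(",")
    # Assumption: surrounding whitespace and empty entries are dropped.
    return [name.strip() for name in class_names if name.strip()]


from_list = normalize_class_names(["person", "forklift"])
from_string = normalize_class_names("person, forklift")
```

Under these assumptions both calls yield the same two prompts, so either form would describe the same tracking request.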
Example JSON definition of step SAM3 Video Tracker in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/sam3_video@v1",
    "images": "$inputs.image",
    "class_names": [
        "person",
        "forklift"
    ],
    "model_id": "sam3video",
    "threshold": 0.5
}