VLM as Detector
v2
Class: VLMAsDetectorBlockV2
(there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v2.VLMAsDetectorBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:
- valid JSON strings
- JSON documents wrapped with Markdown tags
Example:
{"my": "json"}
Details regarding block behavior:
- error_status is set to True whenever parsing cannot be completed
- in case of multiple markdown blocks with raw JSON content, only the first will be parsed
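The parsing rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the block's actual implementation: the function name `parse_vlm_json` and the regular expression are hypothetical and were chosen only for this example.

```python
import json
import re
from typing import Any, Optional, Tuple

# Built from "`" * 3 so the example itself can sit inside a Markdown fence.
FENCE = "`" * 3
# Hypothetical pattern: body of a Markdown code fence, with or without a "json" tag.
JSON_BLOCK = re.compile(
    FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL | re.IGNORECASE
)


def parse_vlm_json(raw: str) -> Tuple[bool, Optional[Any]]:
    """Return (error_status, parsed_document) for a raw model response."""
    match = JSON_BLOCK.search(raw)  # search() stops at the first fenced block
    candidate = match.group(1) if match else raw
    try:
        return False, json.loads(candidate)
    except json.JSONDecodeError:
        # Parsing could not be completed, so the error flag is raised.
        return True, None
```

With this sketch, a plain JSON string and a Markdown-wrapped document both parse, a response with two fenced blocks yields only the first, and unparseable text returns the error flag set to True.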
Type identifier
Use the following identifier in the step "type" field: roboflow_core/vlm_as_detector@v2 to add the block as a step in your workflow.
Properties
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
classes | List[str] | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
model_type | str | Type of the model that generated prediction. | ❌ |
task_type | str | Task type to be performed by the model. | ❌ |
The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
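To illustrate why `classes` is needed: model responses carry class names, while a detection prediction needs numeric class ids. A minimal sketch, assuming that a class id is the position of the name in the list and that unknown names are marked with -1 (both are assumptions made for this example, not behaviour confirmed by this page; the helper names are hypothetical):

```python
from typing import Dict, List


def build_class_mapping(classes: List[str]) -> Dict[str, int]:
    # Assumption: class id is the position of the name in the `classes` list.
    return {name: idx for idx, name in enumerate(classes)}


def class_ids_for(names: List[str], mapping: Dict[str, int]) -> List[int]:
    # Assumption: names missing from the mapping are marked with -1.
    return [mapping.get(name, -1) for name in names]


mapping = build_class_mapping(["class_a", "class_b"])
ids = class_ids_for(["class_b", "class_a", "unknown"], mapping)
```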
Available Connections
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v2.
- inputs: Bounding Box Visualization, Stitch Images, SIFT, Color Visualization, Stability AI Outpainting, Llama 3.2 Vision, Blur Visualization, Image Threshold, Model Comparison Visualization, SIFT Comparison, Anthropic Claude, Image Slicer, Corner Visualization, Background Color Visualization, Dimension Collapse, Image Contours, Camera Focus, Absolute Static Crop, Label Visualization, Florence-2 Model, Line Counter Visualization, Mask Visualization, QR Code Generator, Classification Label Visualization, Buffer, Trace Visualization, Reference Path Visualization, Polygon Visualization, Perspective Correction, Florence-2 Model, Camera Calibration, Image Blur, Dynamic Crop, OpenAI, Circle Visualization, Grid Visualization, Clip Comparison, Image Convert Grayscale, Image Slicer, Depth Estimation, Dot Visualization, Image Preprocessing, OpenAI, Google Gemini, Relative Static Crop, Ellipse Visualization, Stability AI Inpainting, Keypoint Visualization, Dynamic Zone, Halo Visualization, Polygon Zone Visualization, Icon Visualization, Triangle Visualization, Clip Comparison, Crop Visualization, Stability AI Image Generation, Size Measurement, Pixelate Visualization
- outputs: Model Monitoring Inference Aggregator, Bounding Box Visualization, Twilio SMS Notification, Keypoint Detection Model, Detections Merge, Overlap Filter, Detections Stabilizer, Model Comparison Visualization, SIFT Comparison, Gaze Detection, Corner Visualization, Background Color Visualization, Distance Measurement, Time in Zone, Mask Visualization, Classification Label Visualization, Detections Transformation, Trace Visualization, Polygon Visualization, Perspective Correction, Instance Segmentation Model, Florence-2 Model, Path Deviation, Instance Segmentation Model, Byte Tracker, PTZ Tracking (ONVIF), Line Counter, Dot Visualization, Ellipse Visualization, Object Detection Model, Keypoint Detection Model, Dynamic Zone, Time in Zone, Polygon Zone Visualization, Halo Visualization, Icon Visualization, Triangle Visualization, Crop Visualization, Size Measurement, Slack Notification, Pixelate Visualization, Single-Label Classification Model, Path Deviation, Stitch OCR Detections, Single-Label Classification Model, Color Visualization, Email Notification, Blur Visualization, Florence-2 Model, Label Visualization, Line Counter Visualization, Detections Consensus, Detection Offset, Multi-Label Classification Model, Reference Path Visualization, Velocity, Dynamic Crop, Byte Tracker, Roboflow Dataset Upload, Circle Visualization, Segment Anything 2 Model, Template Matching, Webhook Sink, Byte Tracker, Stability AI Inpainting, Line Counter, Keypoint Visualization, Multi-Label Classification Model, Detections Classes Replacement, Roboflow Dataset Upload, Object Detection Model, Detections Filter, Detections Stitch, Roboflow Custom Metadata
Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds VLM as Detector in version v2 has.
Bindings
- input
  - image (image): The image which was the base to generate VLM prediction.
  - vlm_output (language_model_output): The string with raw classification prediction to parse.
  - classes (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id.
- output
  - error_status (boolean): Boolean flag.
  - predictions (object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object.
  - inference_id (inference_id): Inference identifier.
Example JSON definition of step VLM as Detector in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_detector@v2",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ],
    "model_type": [
        "google-gemini",
        "anthropic-claude",
        "florence-2"
    ],
    "task_type": "<block_does_not_provide_example>"
}
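In the generated example, list values appear to enumerate alternative example values for a field rather than a literal value to paste. A sketch of one concrete step definition, built in Python and serialised to JSON, might look like the following. The step name `parser`, the upstream step name `lmm`, and the `task_type` value `object-detection` are assumptions chosen for illustration; the page itself gives no example for `task_type`.

```python
import json

# One concrete choice per field; names below are illustrative assumptions.
step = {
    "name": "parser",
    "type": "roboflow_core/vlm_as_detector@v2",
    "image": "$inputs.image",
    "vlm_output": "$steps.lmm.output",
    "classes": "$steps.lmm.classes",
    "model_type": "google-gemini",
    "task_type": "object-detection",  # assumed value; the source gives no example
}

step_json = json.dumps(step, indent=2)
```

Choosing a single string for `model_type` and `task_type` matches the `str` type listed in the Properties table.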
v1
Class: VLMAsDetectorBlockV1
(there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v1.VLMAsDetectorBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:
- valid JSON strings
- JSON documents wrapped with Markdown tags
Example:
{"my": "json"}
Details regarding block behavior:
- error_status is set to True whenever parsing cannot be completed
- in case of multiple markdown blocks with raw JSON content, only the first will be parsed
Type identifier
Use the following identifier in the step "type" field: roboflow_core/vlm_as_detector@v1 to add the block as a step in your workflow.
Properties
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
classes | List[str] | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
model_type | str | Type of the model that generated prediction. | ❌ |
task_type | str | Task type to be performed by the model. | ❌ |
The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v1.
- inputs: Bounding Box Visualization, Stitch Images, SIFT, Color Visualization, Stability AI Outpainting, Llama 3.2 Vision, Blur Visualization, Image Threshold, Model Comparison Visualization, SIFT Comparison, Anthropic Claude, Image Slicer, Corner Visualization, Background Color Visualization, Dimension Collapse, Image Contours, Camera Focus, Absolute Static Crop, Label Visualization, Florence-2 Model, Line Counter Visualization, Mask Visualization, QR Code Generator, Classification Label Visualization, Buffer, Trace Visualization, Reference Path Visualization, Polygon Visualization, Perspective Correction, Florence-2 Model, Camera Calibration, Image Blur, Dynamic Crop, OpenAI, Circle Visualization, Grid Visualization, Clip Comparison, Image Convert Grayscale, Image Slicer, Depth Estimation, Dot Visualization, Image Preprocessing, OpenAI, Google Gemini, Relative Static Crop, Ellipse Visualization, Stability AI Inpainting, Keypoint Visualization, Dynamic Zone, Halo Visualization, Polygon Zone Visualization, Icon Visualization, Triangle Visualization, Clip Comparison, Crop Visualization, Stability AI Image Generation, Size Measurement, Pixelate Visualization
- outputs: Model Monitoring Inference Aggregator, Bounding Box Visualization, Twilio SMS Notification, Llama 3.2 Vision, Keypoint Detection Model, Stability AI Outpainting, Detections Merge, Overlap Filter, Moondream2, Image Threshold, Detections Stabilizer, Model Comparison Visualization, SIFT Comparison, LMM, Gaze Detection, Corner Visualization, Background Color Visualization, Distance Measurement, Time in Zone, CogVLM, Mask Visualization, QR Code Generator, Classification Label Visualization, Detections Transformation, Trace Visualization, Polygon Visualization, Perspective Correction, Instance Segmentation Model, Florence-2 Model, Path Deviation, Local File Sink, Clip Comparison, Instance Segmentation Model, Byte Tracker, PTZ Tracking (ONVIF), LMM For Classification, Line Counter, Dot Visualization, Google Gemini, Ellipse Visualization, Object Detection Model, Keypoint Detection Model, Pixel Color Count, Dynamic Zone, Time in Zone, Polygon Zone Visualization, Halo Visualization, Icon Visualization, Triangle Visualization, Crop Visualization, Size Measurement, Slack Notification, Pixelate Visualization, Single-Label Classification Model, Path Deviation, CLIP Embedding Model, Stitch OCR Detections, Single-Label Classification Model, Color Visualization, Email Notification, Blur Visualization, Anthropic Claude, Florence-2 Model, Label Visualization, Line Counter Visualization, Detections Consensus, Detection Offset, Cache Get, Multi-Label Classification Model, Reference Path Visualization, Velocity, Image Blur, Dynamic Crop, Byte Tracker, OpenAI, Roboflow Dataset Upload, Circle Visualization, Segment Anything 2 Model, Template Matching, Cache Set, Webhook Sink, YOLO-World Model, Byte Tracker, OpenAI, OpenAI, Image Preprocessing, Stability AI Inpainting, Line Counter, Keypoint Visualization, Multi-Label Classification Model, Detections Classes Replacement, Roboflow Dataset Upload, Object Detection Model, Stability AI Image Generation, Detections Filter, Detections Stitch, Perception Encoder Embedding Model, Roboflow Custom Metadata, Google Vision OCR
Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds VLM as Detector in version v1 has.
Bindings
- input
  - image (image): The image which was the base to generate VLM prediction.
  - vlm_output (language_model_output): The string with raw classification prediction to parse.
  - classes (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id.
- output
  - error_status (boolean): Boolean flag.
  - predictions (object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object.
  - inference_id (string): String value.
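Because `error_status` signals that parsing failed, code consuming the block's outputs may want to check the flag before using `predictions`. A minimal sketch, assuming the step outputs are available as a dictionary keyed by the output names listed above (the dictionary shape and the helper name `usable_predictions` are assumptions for this example):

```python
from typing import Any, Dict, Optional


def usable_predictions(step_result: Dict[str, Any]) -> Optional[Any]:
    """Return predictions only when parsing succeeded, otherwise None."""
    # A missing flag is treated as an error to stay on the safe side.
    if step_result.get("error_status", True):
        return None
    return step_result.get("predictions")
```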
Example JSON definition of step VLM as Detector in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_detector@v1",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ],
    "model_type": [
        "google-gemini",
        "anthropic-claude",
        "florence-2"
    ],
    "task_type": "<block_does_not_provide_example>"
}
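To show how this step might sit inside a complete workflow, here is a hedged sketch that wires an upstream VLM step into the v1 parser. Everything not shown on this page is an assumption: the overall workflow layout (`version`, `inputs`, `steps`, `outputs`), the upstream block identifier `roboflow_core/google_gemini@v1` and its field names, the step names `lmm` and `parser`, and the `task_type` value. The small helper only checks that every `$steps.<name>` selector points at a step defined in the same workflow.

```python
import json

CLASSES = ["class_a", "class_b"]

workflow = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "api_key"},
    ],
    "steps": [
        {
            # Assumed upstream VLM step; its field names are not taken from this page.
            "type": "roboflow_core/google_gemini@v1",
            "name": "lmm",
            "images": "$inputs.image",
            "task_type": "object-detection",
            "classes": CLASSES,
            "api_key": "$inputs.api_key",
        },
        {
            "type": "roboflow_core/vlm_as_detector@v1",
            "name": "parser",
            "image": "$inputs.image",
            "vlm_output": "$steps.lmm.output",
            "classes": "$steps.lmm.classes",
            "model_type": "google-gemini",
            "task_type": "object-detection",  # assumed value
        },
    ],
    "outputs": [
        {"type": "JsonField", "name": "predictions", "selector": "$steps.parser.predictions"},
        {"type": "JsonField", "name": "error_status", "selector": "$steps.parser.error_status"},
    ],
}


def referenced_steps(definition: dict) -> set:
    """Collect step names referenced by "$steps.<name>.<field>" selectors."""
    names = set()
    for value in definition.values():
        if isinstance(value, str) and value.startswith("$steps."):
            names.add(value.split(".")[1])
    return names


defined = {step["name"] for step in workflow["steps"]}
dangling = set().union(*(referenced_steps(s) for s in workflow["steps"])) - defined
workflow_json = json.dumps(workflow, indent=2)
```

An empty `dangling` set means every step selector resolves to a step in the definition; it says nothing about whether the referenced outputs actually exist on those blocks.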