VLM as Detector¶
v2¶
Class: VLMAsDetectorBlockV2
(there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v2.VLMAsDetectorBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:

- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:

{"my": "json"}

Details regarding block behavior:

- `error_status` is set to `True` whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
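The parsing behavior described above can be sketched as follows (a simplified illustration only; the function name and exact regex are assumptions, not the block's actual implementation):

```python
import json
import re

def parse_vlm_output(raw: str):
    """Sketch of the documented behavior: returns (error_status, parsed)."""
    # If the output contains Markdown code fences, only the FIRST fenced
    # block is considered, matching the block's documented behavior.
    fenced = re.findall(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    candidate = fenced[0] if fenced else raw
    try:
        return False, json.loads(candidate)
    except json.JSONDecodeError:
        # error_status is set to True whenever parsing cannot be completed
        return True, None
```

For instance, `parse_vlm_output('{"my": "json"}')` returns `(False, {"my": "json"})`, while an unparseable string yields `(True, None)`.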
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/vlm_as_detector@v2` to add the block as a step in your workflow.
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
`name` | `str` | Enter a unique identifier for this step. | ❌ |
`classes` | `List[str]` | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
`model_type` | `str` | Type of the model that generated the prediction. | ❌ |
`task_type` | `str` | Task type to be performed by the model. | ❌ |

The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
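For example, since `classes` is the only property marked ✅ in the Refs column, it can be bound to a workflow input rather than hard-coded (a sketch assuming the workflow defines an input named `classes`; the `task_type` placeholder is left as-is because the block provides no example value):

```json
{
  "name": "vlm_detector",
  "type": "roboflow_core/vlm_as_detector@v2",
  "image": "$inputs.image",
  "vlm_output": "$steps.lmm.output",
  "classes": "$inputs.classes",
  "model_type": "google-gemini",
  "task_type": "<block_does_not_provide_example>"
}
```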
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v2.
- inputs: Stability AI Inpainting, Florence-2 Model, Label Visualization, Depth Estimation, Corner Visualization, Triangle Visualization, Florence-2 Model, Background Color Visualization, Image Blur, Polygon Zone Visualization, Model Comparison Visualization, Line Counter Visualization, Camera Focus, Circle Visualization, Perspective Correction, Relative Static Crop, Grid Visualization, Stability AI Image Generation, Trace Visualization, Image Slicer, Clip Comparison, Blur Visualization, Dynamic Zone, Classification Label Visualization, Image Convert Grayscale, Image Preprocessing, Clip Comparison, Dimension Collapse, SIFT Comparison, OpenAI, Stitch Images, Reference Path Visualization, Stability AI Outpainting, Llama 3.2 Vision, Anthropic Claude, Size Measurement, Polygon Visualization, Camera Calibration, Buffer, Mask Visualization, SIFT, Bounding Box Visualization, Image Threshold, Keypoint Visualization, Ellipse Visualization, Crop Visualization, Color Visualization, Pixelate Visualization, Image Slicer, Google Gemini, Dynamic Crop, Image Contours, Absolute Static Crop, OpenAI, Halo Visualization, Dot Visualization
- outputs: Florence-2 Model, Model Monitoring Inference Aggregator, Label Visualization, Florence-2 Model, Triangle Visualization, Model Comparison Visualization, Detections Transformation, Line Counter Visualization, Circle Visualization, Detections Stitch, Trace Visualization, Byte Tracker, Multi-Label Classification Model, Object Detection Model, Path Deviation, Detections Consensus, Velocity, Gaze Detection, Detection Offset, Reference Path Visualization, Detections Filter, Polygon Visualization, Roboflow Dataset Upload, Segment Anything 2 Model, Time in Zone, Detections Merge, Roboflow Custom Metadata, Single-Label Classification Model, Path Deviation, Keypoint Visualization, Ellipse Visualization, Crop Visualization, Color Visualization, Dynamic Crop, Instance Segmentation Model, Multi-Label Classification Model, Dot Visualization, Instance Segmentation Model, Roboflow Dataset Upload, Keypoint Detection Model, Keypoint Detection Model, Stability AI Inpainting, Line Counter, Single-Label Classification Model, Template Matching, Corner Visualization, Overlap Filter, Background Color Visualization, Polygon Zone Visualization, Byte Tracker, Perspective Correction, Line Counter, Blur Visualization, Dynamic Zone, Classification Label Visualization, Time in Zone, Slack Notification, SIFT Comparison, Byte Tracker, Detections Stabilizer, Size Measurement, Webhook Sink, Mask Visualization, Bounding Box Visualization, Distance Measurement, Pixelate Visualization, Twilio SMS Notification, Email Notification, PTZ Tracking (ONVIF), Object Detection Model, Stitch OCR Detections, Detections Classes Replacement, Halo Visualization
Input and Output Bindings¶
The available connections depend on this block's binding kinds. Check what binding kinds VLM as Detector in version v2 has.

Bindings

- input:
  - `image` (`image`): The image which was the base to generate VLM prediction.
  - `vlm_output` (`language_model_output`): The string with raw classification prediction to parse.
  - `classes` (`list_of_values`): List of all classes used by the model, required to generate mapping between class name and class id.
- output:
  - `error_status` (`boolean`): Boolean flag.
  - `predictions` (`object_detection_prediction`): Prediction with detected bounding boxes in form of sv.Detections(...) object.
  - `inference_id` (`inference_id`): Inference identifier.
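To illustrate how a parsed response and the `classes` input combine into a detections output (a hedged sketch; the field names in the sample response are hypothetical and not the exact schema the block expects for any given `model_type`):

```python
# Hypothetical parsed VLM response -- field names are illustrative only;
# the real schema depends on model_type and task_type.
parsed = {
    "detections": [
        {"class_name": "car", "x_min": 10, "y_min": 20, "x_max": 110, "y_max": 220},
        {"class_name": "dog", "x_min": 5, "y_min": 5, "x_max": 50, "y_max": 60},
    ]
}

# The `classes` input provides the class-name -> class-id mapping.
classes = ["car", "person", "dog"]
name_to_id = {name: idx for idx, name in enumerate(classes)}

xyxy = [[d["x_min"], d["y_min"], d["x_max"], d["y_max"]] for d in parsed["detections"]]
class_ids = [name_to_id[d["class_name"]] for d in parsed["detections"]]
# The block wraps arrays like these into an sv.Detections(...) object.
```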
Example JSON definition of step VLM as Detector
in version v2
{
"name": "<your_step_name_here>",
"type": "roboflow_core/vlm_as_detector@v2",
"image": "$inputs.image",
"vlm_output": [
"$steps.lmm.output"
],
"classes": [
"$steps.lmm.classes",
"$inputs.classes",
[
"class_a",
"class_b"
]
],
"model_type": [
"google-gemini",
"anthropic-claude",
"florence-2"
],
"task_type": "<block_does_not_provide_example>"
}
v1¶
Class: VLMAsDetectorBlockV1
(there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v1.VLMAsDetectorBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:

- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:

{"my": "json"}

Details regarding block behavior:

- `error_status` is set to `True` whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/vlm_as_detector@v1` to add the block as a step in your workflow.
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
`name` | `str` | Enter a unique identifier for this step. | ❌ |
`classes` | `List[str]` | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
`model_type` | `str` | Type of the model that generated the prediction. | ❌ |
`task_type` | `str` | Task type to be performed by the model. | ❌ |

The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v1.
- inputs: Stability AI Inpainting, Florence-2 Model, Label Visualization, Depth Estimation, Corner Visualization, Triangle Visualization, Florence-2 Model, Background Color Visualization, Image Blur, Polygon Zone Visualization, Model Comparison Visualization, Line Counter Visualization, Camera Focus, Circle Visualization, Perspective Correction, Relative Static Crop, Grid Visualization, Stability AI Image Generation, Trace Visualization, Image Slicer, Clip Comparison, Blur Visualization, Dynamic Zone, Classification Label Visualization, Image Convert Grayscale, Image Preprocessing, Clip Comparison, Dimension Collapse, SIFT Comparison, OpenAI, Stitch Images, Reference Path Visualization, Stability AI Outpainting, Llama 3.2 Vision, Anthropic Claude, Size Measurement, Polygon Visualization, Camera Calibration, Buffer, Mask Visualization, SIFT, Bounding Box Visualization, Image Threshold, Keypoint Visualization, Ellipse Visualization, Crop Visualization, Color Visualization, Pixelate Visualization, Image Slicer, Google Gemini, Dynamic Crop, Image Contours, Absolute Static Crop, OpenAI, Halo Visualization, Dot Visualization
- outputs: Florence-2 Model, Model Monitoring Inference Aggregator, Label Visualization, Florence-2 Model, Triangle Visualization, CogVLM, Image Blur, Model Comparison Visualization, Cache Set, Detections Transformation, Line Counter Visualization, Circle Visualization, Detections Stitch, Trace Visualization, Byte Tracker, Multi-Label Classification Model, Object Detection Model, Path Deviation, Detections Consensus, Velocity, Gaze Detection, Detection Offset, Reference Path Visualization, Llama 3.2 Vision, Detections Filter, Polygon Visualization, Roboflow Dataset Upload, Segment Anything 2 Model, Time in Zone, Detections Merge, Roboflow Custom Metadata, Single-Label Classification Model, Path Deviation, Image Threshold, CLIP Embedding Model, Keypoint Visualization, Ellipse Visualization, Crop Visualization, Color Visualization, Local File Sink, Google Gemini, Dynamic Crop, OpenAI, Instance Segmentation Model, Multi-Label Classification Model, Dot Visualization, Instance Segmentation Model, Roboflow Dataset Upload, Keypoint Detection Model, Keypoint Detection Model, Stability AI Inpainting, Line Counter, Google Vision OCR, Single-Label Classification Model, Template Matching, Corner Visualization, Overlap Filter, Background Color Visualization, Polygon Zone Visualization, Byte Tracker, Stability AI Image Generation, Perspective Correction, Cache Get, Line Counter, OpenAI, Clip Comparison, Blur Visualization, Dynamic Zone, Classification Label Visualization, Time in Zone, Image Preprocessing, Slack Notification, SIFT Comparison, Byte Tracker, Detections Stabilizer, OpenAI, Pixel Color Count, YOLO-World Model, Stability AI Outpainting, Perception Encoder Embedding Model, Size Measurement, Anthropic Claude, Webhook Sink, Mask Visualization, Bounding Box Visualization, Distance Measurement, Pixelate Visualization, Twilio SMS Notification, Email Notification, PTZ Tracking (ONVIF), Object Detection Model, Stitch OCR Detections, Detections Classes Replacement, Halo Visualization, LMM For Classification, LMM
Input and Output Bindings¶
The available connections depend on this block's binding kinds. Check what binding kinds VLM as Detector in version v1 has.

Bindings

- input:
  - `image` (`image`): The image which was the base to generate VLM prediction.
  - `vlm_output` (`language_model_output`): The string with raw classification prediction to parse.
  - `classes` (`list_of_values`): List of all classes used by the model, required to generate mapping between class name and class id.
- output:
  - `error_status` (`boolean`): Boolean flag.
  - `predictions` (`object_detection_prediction`): Prediction with detected bounding boxes in form of sv.Detections(...) object.
  - `inference_id` (`string`): String value.
Example JSON definition of step VLM as Detector
in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/vlm_as_detector@v1",
"image": "$inputs.image",
"vlm_output": [
"$steps.lmm.output"
],
"classes": [
"$steps.lmm.classes",
"$inputs.classes",
[
"class_a",
"class_b"
]
],
"model_type": [
"google-gemini",
"anthropic-claude",
"florence-2"
],
"task_type": "<block_does_not_provide_example>"
}