VLM as Detector¶
v2¶
Class: VLMAsDetectorBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v2.VLMAsDetectorBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects a string input produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:

- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:

{"my": "json"}
Details regarding block behavior:

- `error_status` is set to `True` whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
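The parsing rules above can be sketched in a few lines of Python. This is an illustrative approximation of the documented behavior, not the block's actual implementation:

```python
import json
import re


def parse_vlm_output(raw: str):
    """Sketch of the documented parsing rules (not the block's actual code):
    accept a plain JSON string or JSON wrapped in Markdown code fences;
    when several fenced blocks are present, only the first is parsed;
    error_status is True whenever parsing cannot be completed.

    Returns (error_status, parsed_document).
    """
    # Prefer the first Markdown-fenced block, if any.
    fenced = re.findall(r"```(?:json)?\s*(.*?)```", raw, flags=re.DOTALL)
    candidate = fenced[0] if fenced else raw
    try:
        return False, json.loads(candidate)
    except json.JSONDecodeError:
        return True, None


print(parse_vlm_output('{"my": "json"}'))       # (False, {'my': 'json'})
print(parse_vlm_output("not a JSON document"))  # (True, None)
```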
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/vlm_as_detector@v2` to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `classes` | `List[str]` | List of all classes used by the model, required to generate the mapping between class name and class id. | ✅ |
| `model_type` | `str` | Type of the model that generated the prediction. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v2.
- inputs:
Keypoint Visualization,Blur Visualization,Camera Focus,Background Subtraction,Dot Visualization,Line Counter Visualization,Size Measurement,Crop Visualization,Stitch Images,Dynamic Crop,Circle Visualization,Florence-2 Model,Camera Calibration,Absolute Static Crop,Mask Visualization,Google Gemini,Ellipse Visualization,Image Contours,Anthropic Claude,Color Visualization,Image Preprocessing,Halo Visualization,Model Comparison Visualization,SIFT Comparison,Llama 3.2 Vision,OpenAI,OpenAI,Motion Detection,Florence-2 Model,Trace Visualization,Depth Estimation,Perspective Correction,Clip Comparison,Dynamic Zone,Clip Comparison,Grid Visualization,Stability AI Outpainting,Contrast Equalization,Morphological Transformation,Google Gemini,QR Code Generator,Polygon Zone Visualization,Reference Path Visualization,Buffer,Stability AI Image Generation,Stability AI Inpainting,Bounding Box Visualization,Image Slicer,Corner Visualization,Polygon Visualization,Image Blur,Image Threshold,Relative Static Crop,Image Slicer,Pixelate Visualization,Triangle Visualization,OpenAI,Anthropic Claude,Dimension Collapse,Label Visualization,Image Convert Grayscale,SIFT,Icon Visualization,Background Color Visualization,Classification Label Visualization
- outputs:
Blur Visualization,Line Counter Visualization,Detections Stitch,Velocity,Instance Segmentation Model,Line Counter,Time in Zone,Multi-Label Classification Model,Instance Segmentation Model,Multi-Label Classification Model,Dynamic Crop,Circle Visualization,Webhook Sink,Mask Visualization,Single-Label Classification Model,Ellipse Visualization,Email Notification,Color Visualization,Keypoint Detection Model,Detections Classes Replacement,Detections Consensus,SAM 3,SAM 3,Roboflow Custom Metadata,Twilio SMS Notification,Florence-2 Model,Trace Visualization,Line Counter,Perspective Correction,Dynamic Zone,Detections Transformation,Polygon Zone Visualization,Reference Path Visualization,PTZ Tracking (ONVIF),Roboflow Dataset Upload,Stability AI Inpainting,Stitch OCR Detections,Bounding Box Visualization,Corner Visualization,Polygon Visualization,Detections Merge,Slack Notification,Path Deviation,Detections Combine,Triangle Visualization,Model Monitoring Inference Aggregator,Pixelate Visualization,Icon Visualization,Time in Zone,Background Color Visualization,Classification Label Visualization,Overlap Filter,Keypoint Visualization,Size Measurement,Dot Visualization,Object Detection Model,Email Notification,Crop Visualization,Object Detection Model,Time in Zone,Template Matching,Florence-2 Model,Keypoint Detection Model,Single-Label Classification Model,Roboflow Dataset Upload,Halo Visualization,Detections Filter,Model Comparison Visualization,SIFT Comparison,Distance Measurement,Path Deviation,Motion Detection,Byte Tracker,Segment Anything 2 Model,Byte Tracker,Byte Tracker,Detections Stabilizer,Detection Offset,Label Visualization,Gaze Detection
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
VLM as Detector in version v2 has.
Bindings

- input
  - `image` (image): The image which was the base to generate VLM prediction.
  - `vlm_output` (language_model_output): The string with raw classification prediction to parse.
  - `classes` (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id.
- output
  - `error_status` (boolean): Boolean flag.
  - `predictions` (object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object.
  - `inference_id` (inference_id): Inference identifier.
Example JSON definition of step VLM as Detector in version v2
{
"name": "<your_step_name_here>",
"type": "roboflow_core/vlm_as_detector@v2",
"image": "$inputs.image",
"vlm_output": [
"$steps.lmm.output"
],
"classes": [
"$steps.lmm.classes",
"$inputs.classes",
[
"class_a",
"class_b"
]
],
"model_type": [
"google-gemini",
"anthropic-claude",
"florence-2"
],
"task_type": "<block_does_not_provide_example>"
}
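The example above lists alternative example values for several fields. As a usage sketch, a concrete v2 step definition can be assembled programmatically from the documented fields; the upstream step name `lmm` and the class list are assumptions, and the class-name-to-id mapping shown is an assumed positional mapping, not necessarily the block's actual one:

```python
# Illustrative sketch only: assembling a v2 step definition from the fields
# documented above. The upstream step name "lmm" and the class list are
# assumptions; "task_type" is omitted because the doc provides no example value.
step = {
    "name": "vlm_detector",
    "type": "roboflow_core/vlm_as_detector@v2",
    "image": "$inputs.image",
    "vlm_output": "$steps.lmm.output",
    "classes": ["class_a", "class_b"],
    "model_type": "google-gemini",
}

# The block needs `classes` to map class names to class ids in the resulting
# prediction; a positional mapping (assumed here) looks like this:
class_name_to_id = {name: idx for idx, name in enumerate(step["classes"])}
print(class_name_to_id)  # {'class_a': 0, 'class_b': 1}
```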
v1¶
Class: VLMAsDetectorBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v1.VLMAsDetectorBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects a string input produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:

- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:

{"my": "json"}
Details regarding block behavior:

- `error_status` is set to `True` whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/vlm_as_detector@v1` to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `classes` | `List[str]` | List of all classes used by the model, required to generate the mapping between class name and class id. | ✅ |
| `model_type` | `str` | Type of the model that generated the prediction. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v1.
- inputs:
Keypoint Visualization,Blur Visualization,Camera Focus,Background Subtraction,Dot Visualization,Line Counter Visualization,Size Measurement,Crop Visualization,Stitch Images,Dynamic Crop,Circle Visualization,Florence-2 Model,Camera Calibration,Absolute Static Crop,Mask Visualization,Google Gemini,Ellipse Visualization,Image Contours,Anthropic Claude,Color Visualization,Image Preprocessing,Halo Visualization,Model Comparison Visualization,SIFT Comparison,Llama 3.2 Vision,OpenAI,OpenAI,Motion Detection,Florence-2 Model,Trace Visualization,Depth Estimation,Perspective Correction,Clip Comparison,Dynamic Zone,Clip Comparison,Grid Visualization,Stability AI Outpainting,Contrast Equalization,Morphological Transformation,Google Gemini,QR Code Generator,Polygon Zone Visualization,Reference Path Visualization,Buffer,Stability AI Image Generation,Stability AI Inpainting,Bounding Box Visualization,Image Slicer,Corner Visualization,Polygon Visualization,Image Blur,Image Threshold,Relative Static Crop,Image Slicer,Pixelate Visualization,Triangle Visualization,OpenAI,Anthropic Claude,Dimension Collapse,Label Visualization,Image Convert Grayscale,SIFT,Icon Visualization,Background Color Visualization,Classification Label Visualization
- outputs:
Velocity,Line Counter,Multi-Label Classification Model,Instance Segmentation Model,Multi-Label Classification Model,Dynamic Crop,Single-Label Classification Model,Anthropic Claude,Image Preprocessing,Detections Classes Replacement,OpenAI,Line Counter,Twilio SMS Notification,Florence-2 Model,Dynamic Zone,Contrast Equalization,Polygon Zone Visualization,Reference Path Visualization,Roboflow Dataset Upload,Stitch OCR Detections,Polygon Visualization,CogVLM,Path Deviation,Detections Combine,Background Color Visualization,Keypoint Visualization,Size Measurement,Dot Visualization,Pixel Color Count,Object Detection Model,Object Detection Model,Time in Zone,Florence-2 Model,LMM For Classification,Keypoint Detection Model,Single-Label Classification Model,Roboflow Dataset Upload,YOLO-World Model,Halo Visualization,Detections Filter,Model Comparison Visualization,CLIP Embedding Model,SIFT Comparison,Distance Measurement,Path Deviation,Motion Detection,Byte Tracker,Seg Preview,Segment Anything 2 Model,Byte Tracker,Detections Stabilizer,Image Blur,Cache Get,Anthropic Claude,Blur Visualization,Line Counter Visualization,Moondream2,Detections Stitch,Instance Segmentation Model,LMM,Time in Zone,Circle Visualization,Webhook Sink,Mask Visualization,Google Gemini,Ellipse Visualization,Email Notification,Color Visualization,Keypoint Detection Model,Detections Consensus,Llama 3.2 Vision,SAM 3,OpenAI,SAM 3,Roboflow Custom Metadata,Trace Visualization,Perspective Correction,Perception Encoder Embedding Model,Stability AI Outpainting,Google Gemini,Detections Transformation,QR Code Generator,PTZ Tracking (ONVIF),Stability AI Inpainting,Bounding Box Visualization,Corner Visualization,Local File Sink,Detections Merge,Slack Notification,Triangle Visualization,Model Monitoring Inference Aggregator,Pixelate Visualization,Icon Visualization,Time in Zone,Overlap Filter,Classification Label Visualization,Email Notification,Crop Visualization,Template Matching,OpenAI,SAM 3,Clip Comparison,Cache Set,Morphological Transformation,Google Vision OCR,Stability AI Image Generation,Byte Tracker,Image Threshold,Detection Offset,OpenAI,Label Visualization,Gaze Detection
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
VLM as Detector in version v1 has.
Bindings

- input
  - `image` (image): The image which was the base to generate VLM prediction.
  - `vlm_output` (language_model_output): The string with raw classification prediction to parse.
  - `classes` (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id.
- output
  - `error_status` (boolean): Boolean flag.
  - `predictions` (object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object.
  - `inference_id` (string): String value.
Example JSON definition of step VLM as Detector in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/vlm_as_detector@v1",
"image": "$inputs.image",
"vlm_output": [
"$steps.lmm.output"
],
"classes": [
"$steps.lmm.classes",
"$inputs.classes",
[
"class_a",
"class_b"
]
],
"model_type": [
"google-gemini",
"anthropic-claude",
"florence-2"
],
"task_type": "<block_does_not_provide_example>"
}