VLM as Detector¶
v2¶
Class: VLMAsDetectorBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v2.VLMAsDetectorBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects string input produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:

- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:

{"my": "json"}
Details regarding block behavior:

- error_status is set to True whenever parsing cannot be completed (see the sketch after this list)
- in case of multiple Markdown blocks with raw JSON content, only the first one will be parsed
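To make that contract concrete, here is a minimal sketch of an equivalent parser. It is an illustration only, not the block's actual implementation, and the names parse_vlm_output and FENCE are hypothetical: it pulls the first Markdown-fenced JSON block out of the raw VLM text, falls back to treating the whole string as JSON, and reports an error status when nothing can be decoded.

```python
import json
import re
from typing import Optional, Tuple

# Illustration only - mirrors the documented contract, not the block's real code.
FENCE = "`" * 3  # a Markdown code fence
JSON_MARKDOWN_BLOCK = re.compile(rf"{FENCE}(?:json)?\s*(.*?){FENCE}", re.DOTALL)


def parse_vlm_output(raw: str) -> Tuple[bool, Optional[dict]]:
    """Return (error_status, parsed_document) for a raw VLM/LLM response."""
    blocks = JSON_MARKDOWN_BLOCK.findall(raw)
    # In case of multiple Markdown blocks with raw JSON content,
    # only the first one is considered.
    payload = blocks[0] if blocks else raw
    try:
        return False, json.loads(payload)
    except json.JSONDecodeError:
        # error_status is True whenever parsing cannot be completed.
        return True, None


# Both accepted formats decode to the same document:
print(parse_vlm_output('{"my": "json"}'))                           # (False, {'my': 'json'})
print(parse_vlm_output(f'{FENCE}json\n{{"my": "json"}}\n{FENCE}'))  # (False, {'my': 'json'})
print(parse_vlm_output("not json at all"))                          # (True, None)
```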
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/vlm_as_detector@v2 to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| classes | List[str] | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
| model_type | str | Type of the model that generated the prediction. | ❌ |
| task_type | str | Task type to be performed by the model. | ❌ |
The Refs column marks whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v2.
- inputs: QR Code Generator, Image Convert Grayscale, Google Gemini, Dynamic Crop, Blur Visualization, SIFT, Bounding Box Visualization, Stability AI Outpainting, Camera Focus, Keypoint Visualization, Trace Visualization, Polygon Visualization, Ellipse Visualization, Model Comparison Visualization, OpenAI, Anthropic Claude, Dimension Collapse, Triangle Visualization, Polygon Zone Visualization, Halo Visualization, Stability AI Image Generation, Florence-2 Model, Circle Visualization, Google Gemini, Motion Detection, Clip Comparison, Camera Focus, Anthropic Claude, Perspective Correction, Reference Path Visualization, Corner Visualization, Color Visualization, Image Slicer, OpenAI, Camera Calibration, Image Blur, Buffer, Dot Visualization, Image Threshold, Morphological Transformation, Label Visualization, Background Color Visualization, Classification Label Visualization, Mask Visualization, Dynamic Zone, Detections List Roll-Up, Pixelate Visualization, Absolute Static Crop, Size Measurement, Grid Visualization, Contrast Equalization, Image Preprocessing, Google Gemini, Relative Static Crop, Stability AI Inpainting, Image Contours, Line Counter Visualization, Stitch Images, Crop Visualization, OpenAI, Llama 3.2 Vision, Icon Visualization, Clip Comparison, SIFT Comparison, Depth Estimation, Florence-2 Model, Background Subtraction, Image Slicer
- outputs: Detections Combine, Detections Stitch, Byte Tracker, Velocity, Dynamic Crop, Blur Visualization, Detection Offset, Bounding Box Visualization, Slack Notification, Keypoint Visualization, Trace Visualization, Instance Segmentation Model, Polygon Visualization, Ellipse Visualization, Model Comparison Visualization, Triangle Visualization, Polygon Zone Visualization, SAM 3, Halo Visualization, Distance Measurement, Byte Tracker, Time in Zone, Path Deviation, Florence-2 Model, Single-Label Classification Model, Circle Visualization, Email Notification, Detections Stabilizer, Motion Detection, Camera Focus, Detections Classes Replacement, Detections Filter, Object Detection Model, Instance Segmentation Model, Perspective Correction, Reference Path Visualization, Twilio SMS/MMS Notification, Color Visualization, Corner Visualization, Multi-Label Classification Model, Stitch OCR Detections, Line Counter, Detections Transformation, Roboflow Custom Metadata, Dot Visualization, Model Monitoring Inference Aggregator, Label Visualization, Background Color Visualization, Time in Zone, Classification Label Visualization, Byte Tracker, Keypoint Detection Model, Roboflow Dataset Upload, Path Deviation, Mask Visualization, Dynamic Zone, Detections List Roll-Up, Pixelate Visualization, PTZ Tracking (ONVIF), Line Counter, Keypoint Detection Model, Detections Consensus, Size Measurement, Webhook Sink, Stability AI Inpainting, Template Matching, Line Counter Visualization, Crop Visualization, Icon Visualization, SIFT Comparison, Gaze Detection, Twilio SMS Notification, Single-Label Classification Model, Time in Zone, Florence-2 Model, SAM 3, Object Detection Model, Multi-Label Classification Model, Segment Anything 2 Model, Detections Merge, Roboflow Dataset Upload, Overlap Filter, Email Notification
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
VLM as Detector in version v2 has.
Bindings
- input
  - image (image): The image which was the base to generate the VLM prediction.
  - vlm_output (language_model_output): The string with the raw VLM prediction to parse.
  - classes (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id.
- output
  - error_status (boolean): Boolean flag.
  - predictions (object_detection_prediction): Prediction with detected bounding boxes in the form of an sv.Detections(...) object.
  - inference_id (inference_id): Inference identifier.
Example JSON definition of step VLM as Detector in version v2
{
"name": "<your_step_name_here>",
"type": "roboflow_core/vlm_as_detector@v2",
"image": "$inputs.image",
"vlm_output": [
"$steps.lmm.output"
],
"classes": [
"$steps.lmm.classes",
"$inputs.classes",
[
"class_a",
"class_b"
]
],
"model_type": [
"google-gemini",
"anthropic-claude",
"florence-2"
],
"task_type": "<block_does_not_provide_example>"
}
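For orientation, the raw vlm_output string is expected to be a JSON document describing bounding boxes, and the classes property is what lets the block translate class names returned by the VLM into numeric class ids. The exact JSON schema is dictated by the prompt used for the given model_type and task_type, so the document below is only an assumed illustration, not the required format:

```python
import json

# Assumed shape of a VLM response - treat this purely as a placeholder,
# the real schema depends on model_type / task_type.
raw_vlm_output = """
{"detections": [
    {"class_name": "class_a", "confidence": 0.91,
     "x_min": 0.12, "y_min": 0.20, "x_max": 0.45, "y_max": 0.60}
]}
"""

# The classes property provides the class-name -> class-id mapping that the
# parsed detections are resolved against (here: index within the list).
classes = ["class_a", "class_b"]
class_name_to_id = {name: idx for idx, name in enumerate(classes)}

for det in json.loads(raw_vlm_output)["detections"]:
    print(class_name_to_id[det["class_name"]], det["confidence"])
```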
v1¶
Class: VLMAsDetectorBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v1.VLMAsDetectorBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects string input produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:

- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:

{"my": "json"}
Details regarding block behavior:

- error_status is set to True whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first one will be parsed
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/vlm_as_detector@v1 to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| classes | List[str] | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
| model_type | str | Type of the model that generated the prediction. | ❌ |
| task_type | str | Task type to be performed by the model. | ❌ |
The Refs column marks whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v1.
- inputs: QR Code Generator, Image Convert Grayscale, Google Gemini, Dynamic Crop, Blur Visualization, SIFT, Bounding Box Visualization, Stability AI Outpainting, Camera Focus, Keypoint Visualization, Trace Visualization, Polygon Visualization, Ellipse Visualization, Model Comparison Visualization, OpenAI, Anthropic Claude, Dimension Collapse, Triangle Visualization, Polygon Zone Visualization, Halo Visualization, Stability AI Image Generation, Florence-2 Model, Circle Visualization, Google Gemini, Motion Detection, Clip Comparison, Camera Focus, Anthropic Claude, Perspective Correction, Reference Path Visualization, Corner Visualization, Color Visualization, Image Slicer, OpenAI, Camera Calibration, Image Blur, Buffer, Dot Visualization, Image Threshold, Morphological Transformation, Label Visualization, Background Color Visualization, Classification Label Visualization, Mask Visualization, Dynamic Zone, Detections List Roll-Up, Pixelate Visualization, Absolute Static Crop, Size Measurement, Grid Visualization, Contrast Equalization, Image Preprocessing, Google Gemini, Relative Static Crop, Stability AI Inpainting, Image Contours, Line Counter Visualization, Stitch Images, Crop Visualization, OpenAI, Llama 3.2 Vision, Icon Visualization, Clip Comparison, SIFT Comparison, Depth Estimation, Florence-2 Model, Background Subtraction, Image Slicer
- outputs: QR Code Generator, Google Gemini, Velocity, Blur Visualization, Detection Offset, Bounding Box Visualization, Stability AI Outpainting, Trace Visualization, Instance Segmentation Model, Pixel Color Count, Ellipse Visualization, Model Comparison Visualization, OpenAI, Triangle Visualization, SAM 3, Distance Measurement, Byte Tracker, Stability AI Image Generation, Path Deviation, Florence-2 Model, CLIP Embedding Model, Single-Label Classification Model, Email Notification, Google Gemini, Camera Focus, Twilio SMS/MMS Notification, Color Visualization, Corner Visualization, Multi-Label Classification Model, OpenAI, Line Counter, Detections Transformation, SAM 3, Roboflow Custom Metadata, Dot Visualization, Image Threshold, Model Monitoring Inference Aggregator, Time in Zone, Classification Label Visualization, Byte Tracker, Roboflow Dataset Upload, Mask Visualization, Detections List Roll-Up, Pixelate Visualization, Line Counter, Keypoint Detection Model, Detections Consensus, Size Measurement, Webhook Sink, Stability AI Inpainting, Template Matching, Line Counter Visualization, Crop Visualization, OpenAI, Llama 3.2 Vision, Icon Visualization, Time in Zone, LMM For Classification, SAM 3, Object Detection Model, CogVLM, Roboflow Dataset Upload, Overlap Filter, Cache Get, Detections Combine, Detections Stitch, Seg Preview, Byte Tracker, YOLO-World Model, Dynamic Crop, Slack Notification, Keypoint Visualization, Polygon Visualization, Cache Set, Anthropic Claude, Local File Sink, Polygon Zone Visualization, Halo Visualization, LMM, Time in Zone, Circle Visualization, Detections Stabilizer, Google Vision OCR, Motion Detection, Clip Comparison, Detections Classes Replacement, Anthropic Claude, Detections Filter, Object Detection Model, Instance Segmentation Model, Perspective Correction, Perception Encoder Embedding Model, Reference Path Visualization, Stitch OCR Detections, Image Blur, Morphological Transformation, Label Visualization, Background Color Visualization, Keypoint Detection Model, Path Deviation, Dynamic Zone, PTZ Tracking (ONVIF), Moondream2, Image Preprocessing, Contrast Equalization, Google Gemini, OpenAI, SIFT Comparison, Gaze Detection, Depth Estimation, Twilio SMS Notification, Single-Label Classification Model, Florence-2 Model, Multi-Label Classification Model, Detections Merge, Segment Anything 2 Model, Email Notification
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
VLM as Detector in version v1 has.
Bindings
- input
  - image (image): The image which was the base to generate the VLM prediction.
  - vlm_output (language_model_output): The string with the raw VLM prediction to parse.
  - classes (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id.
- output
  - error_status (boolean): Boolean flag.
  - predictions (object_detection_prediction): Prediction with detected bounding boxes in the form of an sv.Detections(...) object.
  - inference_id (string): String value.
Example JSON definition of step VLM as Detector in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/vlm_as_detector@v1",
"image": "$inputs.image",
"vlm_output": [
"$steps.lmm.output"
],
"classes": [
"$steps.lmm.classes",
"$inputs.classes",
[
"class_a",
"class_b"
]
],
"model_type": [
"google-gemini",
"anthropic-claude",
"florence-2"
],
"task_type": "<block_does_not_provide_example>"
}
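Whichever version is used, the predictions output is an sv.Detections(...) object, so inside a workflow it can feed any of the object-detection consumers listed above (visualizers, trackers, filters, sinks). For readers unfamiliar with that type, the snippet below only sketches what such an object holds, using hypothetical values rather than the block's real output:

```python
import numpy as np
import supervision as sv

# Hypothetical content of the `predictions` output: xyxy boxes in pixel
# coordinates, confidences, and class ids resolved from the classes property.
detections = sv.Detections(
    xyxy=np.array([[64.0, 128.0, 288.0, 384.0], [300.0, 40.0, 420.0, 200.0]]),
    confidence=np.array([0.91, 0.78]),
    class_id=np.array([0, 1]),  # indices into ["class_a", "class_b"]
)

classes = ["class_a", "class_b"]
for box, confidence, class_id in zip(
    detections.xyxy, detections.confidence, detections.class_id
):
    print(classes[class_id], round(float(confidence), 2), box)
```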