# VLM as Detector
## v2
Class: `VLMAsDetectorBlockV2` (there are multiple versions of this block)
Source: `inference.core.workflows.core_steps.formatters.vlm_as_detector.v2.VLMAsDetectorBlockV2`
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:

- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:

```json
{"my": "json"}
```
Details regarding block behavior:

- `error_status` is set to `True` whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
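For illustration, a raw VLM response shaped like the one below would be accepted, because the JSON document is wrapped in a Markdown code fence. The payload is only a plausible sketch — the exact JSON schema the parser expects depends on the configured `model_type` and `task_type`:

````text
Here are the objects I found:

```json
{"detections": [{"x_min": 0.1, "y_min": 0.2, "x_max": 0.45, "y_max": 0.8, "class_name": "dog", "confidence": 0.92}]}
```
````

Field names such as `detections`, `x_min`, and `class_name` above are assumptions used for illustration; consult the VLM block you pair this formatter with for the prompt and schema it actually produces.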
### Type identifier
Use the following identifier in the step `"type"` field: `roboflow_core/vlm_as_detector@v2` to add the block as a step in your workflow.
### Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `classes` | `List[str]` | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
| `model_type` | `str` | Type of the model that generated the prediction. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
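For example, `classes` (marked ✅) may be bound to a workflow input or another step's output, or passed as a literal list, while `model_type` (marked ❌) must be given as a literal value. A hedged fragment of a step definition illustrating both styles (the input name `classes` is an assumption):

```json
{
  "classes": "$inputs.classes",
  "model_type": "google-gemini"
}
```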
### Available Connections
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v2.
- inputs: Size Measurement, Absolute Static Crop, Relative Static Crop, Polygon Visualization, Keypoint Visualization, Clip Comparison, Llama 3.2 Vision, Icon Visualization, Dimension Collapse, Blur Visualization, Trace Visualization, Color Visualization, Image Contours, Clip Comparison, Polygon Zone Visualization, Camera Focus, Halo Visualization, Bounding Box Visualization, SIFT, Camera Calibration, Dynamic Zone, Triangle Visualization, Classification Label Visualization, Background Color Visualization, Dynamic Crop, Dot Visualization, Pixelate Visualization, Stability AI Inpainting, Image Threshold, Reference Path Visualization, Corner Visualization, Buffer, Ellipse Visualization, OpenAI, Image Slicer, Stitch Images, Crop Visualization, Morphological Transformation, Grid Visualization, Image Preprocessing, Mask Visualization, Line Counter Visualization, SIFT Comparison, OpenAI, Florence-2 Model, QR Code Generator, Depth Estimation, Image Slicer, Google Gemini, Perspective Correction, Image Convert Grayscale, Stability AI Image Generation, Contrast Equalization, Label Visualization, Anthropic Claude, Image Blur, Model Comparison Visualization, Circle Visualization, Stability AI Outpainting, Florence-2 Model
- outputs: Size Measurement, Keypoint Visualization, Byte Tracker, Object Detection Model, Slack Notification, Trace Visualization, Color Visualization, Instance Segmentation Model, Polygon Zone Visualization, Halo Visualization, Dynamic Zone, Triangle Visualization, Single-Label Classification Model, Segment Anything 2 Model, Stability AI Inpainting, Reference Path Visualization, Corner Visualization, Detections Classes Replacement, Ellipse Visualization, Gaze Detection, Single-Label Classification Model, Time in Zone, Detection Offset, Roboflow Custom Metadata, Line Counter Visualization, Florence-2 Model, Roboflow Dataset Upload, Stitch OCR Detections, Path Deviation, Label Visualization, PTZ Tracking (ONVIF), Model Comparison Visualization, Multi-Label Classification Model, Multi-Label Classification Model, Roboflow Dataset Upload, Template Matching, Line Counter, Line Counter, Model Monitoring Inference Aggregator, Polygon Visualization, Time in Zone, Instance Segmentation Model, Path Deviation, Velocity, Detections Stabilizer, Icon Visualization, Time in Zone, Blur Visualization, Object Detection Model, Overlap Filter, Twilio SMS Notification, Bounding Box Visualization, Classification Label Visualization, Background Color Visualization, Webhook Sink, Dynamic Crop, Dot Visualization, Pixelate Visualization, Detections Consensus, Byte Tracker, Email Notification, Detections Combine, Crop Visualization, Detections Filter, Keypoint Detection Model, SIFT Comparison, Mask Visualization, Detections Merge, Detections Transformation, Perspective Correction, Keypoint Detection Model, Distance Measurement, Circle Visualization, Detections Stitch, Florence-2 Model, Byte Tracker
### Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds VLM as Detector in version v2 has.
Bindings
- input
    - `image` (`image`): The image which was the base to generate VLM prediction.
    - `vlm_output` (`language_model_output`): The string with the raw prediction to parse.
    - `classes` (`list_of_values`): List of all classes used by the model, required to generate mapping between class name and class id.
- output
    - `error_status` (`boolean`): Boolean flag.
    - `predictions` (`object_detection_prediction`): Prediction with detected bounding boxes in form of sv.Detections(...) object.
    - `inference_id` (`inference_id`): Inference identifier.
Example JSON definition of step VLM as Detector in version v2
```json
{
"name": "<your_step_name_here>",
"type": "roboflow_core/vlm_as_detector@v2",
"image": "$inputs.image",
"vlm_output": [
"$steps.lmm.output"
],
"classes": [
"$steps.lmm.classes",
"$inputs.classes",
[
"class_a",
"class_b"
]
],
"model_type": [
"google-gemini",
"anthropic-claude",
"florence-2"
],
"task_type": "<block_does_not_provide_example>"
}
```
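Note that the arrays shown above for `vlm_output`, `classes`, and `model_type` enumerate alternative example values; in an actual workflow each of these fields takes a single value. Below is a hedged sketch of a complete workflow in which the step consumes the output of a Google Gemini block — the Gemini block's identifier, its parameters, and the input names are assumptions taken from its own documentation and may need adjusting:

```json
{
  "version": "1.0",
  "inputs": [
    {"type": "WorkflowImage", "name": "image"},
    {"type": "WorkflowParameter", "name": "classes"},
    {"type": "WorkflowParameter", "name": "api_key"}
  ],
  "steps": [
    {
      "type": "roboflow_core/google_gemini@v2",
      "name": "gemini",
      "images": "$inputs.image",
      "task_type": "object-detection",
      "classes": "$inputs.classes",
      "api_key": "$inputs.api_key"
    },
    {
      "type": "roboflow_core/vlm_as_detector@v2",
      "name": "vlm_as_detector",
      "image": "$inputs.image",
      "vlm_output": "$steps.gemini.output",
      "classes": "$inputs.classes",
      "model_type": "google-gemini",
      "task_type": "object-detection"
    }
  ],
  "outputs": [
    {
      "type": "JsonField",
      "name": "predictions",
      "selector": "$steps.vlm_as_detector.predictions"
    }
  ]
}
```

With this wiring, `error_status` could also be exposed as a workflow output to surface parsing failures at runtime.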
## v1
Class: `VLMAsDetectorBlockV1` (there are multiple versions of this block)
Source: `inference.core.workflows.core_steps.formatters.vlm_as_detector.v1.VLMAsDetectorBlockV1`
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:

- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:

```json
{"my": "json"}
```
Details regarding block behavior:

- `error_status` is set to `True` whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
### Type identifier
Use the following identifier in the step `"type"` field: `roboflow_core/vlm_as_detector@v1` to add the block as a step in your workflow.
### Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `classes` | `List[str]` | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
| `model_type` | `str` | Type of the model that generated the prediction. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v1.
- inputs: Size Measurement, Absolute Static Crop, Relative Static Crop, Polygon Visualization, Keypoint Visualization, Clip Comparison, Llama 3.2 Vision, Icon Visualization, Dimension Collapse, Blur Visualization, Trace Visualization, Color Visualization, Image Contours, Clip Comparison, Polygon Zone Visualization, Camera Focus, Halo Visualization, Bounding Box Visualization, SIFT, Camera Calibration, Dynamic Zone, Triangle Visualization, Classification Label Visualization, Background Color Visualization, Dynamic Crop, Dot Visualization, Pixelate Visualization, Stability AI Inpainting, Image Threshold, Reference Path Visualization, Corner Visualization, Buffer, Ellipse Visualization, OpenAI, Image Slicer, Stitch Images, Crop Visualization, Morphological Transformation, Grid Visualization, Image Preprocessing, Mask Visualization, Line Counter Visualization, SIFT Comparison, OpenAI, Florence-2 Model, QR Code Generator, Depth Estimation, Image Slicer, Google Gemini, Perspective Correction, Image Convert Grayscale, Stability AI Image Generation, Contrast Equalization, Label Visualization, Anthropic Claude, Image Blur, Model Comparison Visualization, Circle Visualization, Stability AI Outpainting, Florence-2 Model
- outputs: LMM For Classification, Trace Visualization, Color Visualization, Instance Segmentation Model, Polygon Zone Visualization, Halo Visualization, Triangle Visualization, Single-Label Classification Model, Image Threshold, Detections Classes Replacement, Gaze Detection, Single-Label Classification Model, Detection Offset, Morphological Transformation, Roboflow Custom Metadata, Image Preprocessing, Cache Set, Line Counter Visualization, Stitch OCR Detections, Path Deviation, PTZ Tracking (ONVIF), Model Comparison Visualization, Multi-Label Classification Model, Multi-Label Classification Model, Roboflow Dataset Upload, Template Matching, Line Counter, Path Deviation, Polygon Visualization, Detections Stabilizer, Llama 3.2 Vision, Icon Visualization, Local File Sink, Twilio SMS Notification, Classification Label Visualization, Background Color Visualization, Webhook Sink, Dynamic Crop, Pixelate Visualization, Detections Consensus, Email Notification, Crop Visualization, Keypoint Detection Model, Mask Visualization, Detections Merge, Detections Transformation, Perspective Correction, Keypoint Detection Model, Distance Measurement, Circle Visualization, Stability AI Outpainting, Byte Tracker, Size Measurement, Keypoint Visualization, Byte Tracker, Object Detection Model, Google Vision OCR, Slack Notification, Dynamic Zone, Segment Anything 2 Model, Stability AI Inpainting, Reference Path Visualization, Corner Visualization, Ellipse Visualization, OpenAI, Time in Zone, CogVLM, OpenAI, YOLO-World Model, Florence-2 Model, Roboflow Dataset Upload, Label Visualization, OpenAI, Line Counter, Model Monitoring Inference Aggregator, Time in Zone, Velocity, Instance Segmentation Model, Time in Zone, Blur Visualization, Clip Comparison, Object Detection Model, Overlap Filter, Moondream2, Bounding Box Visualization, Dot Visualization, Byte Tracker, CLIP Embedding Model, Detections Combine, Detections Filter, Perception Encoder Embedding Model, SIFT Comparison, Pixel Color Count, QR Code Generator, Google Gemini, Stability AI Image Generation, Contrast Equalization, Anthropic Claude, Image Blur, Cache Get, LMM, Detections Stitch, Florence-2 Model
### Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds VLM as Detector in version v1 has.
Bindings
- input
    - `image` (`image`): The image which was the base to generate VLM prediction.
    - `vlm_output` (`language_model_output`): The string with the raw prediction to parse.
    - `classes` (`list_of_values`): List of all classes used by the model, required to generate mapping between class name and class id.
- output
    - `error_status` (`boolean`): Boolean flag.
    - `predictions` (`object_detection_prediction`): Prediction with detected bounding boxes in form of sv.Detections(...) object.
    - `inference_id` (`string`): String value.
Example JSON definition of step VLM as Detector in version v1
```json
{
"name": "<your_step_name_here>",
"type": "roboflow_core/vlm_as_detector@v1",
"image": "$inputs.image",
"vlm_output": [
"$steps.lmm.output"
],
"classes": [
"$steps.lmm.classes",
"$inputs.classes",
[
"class_a",
"class_b"
]
],
"model_type": [
"google-gemini",
"anthropic-claude",
"florence-2"
],
"task_type": "<block_does_not_provide_example>"
}
```
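Because `predictions` is produced as a standard object-detection prediction, it can be routed into any of the detections-consuming blocks listed above. A minimal hedged sketch that overlays the parsed detections with a Bounding Box Visualization step (that block's type identifier and field names are assumptions; consult its own documentation):

```json
{
  "type": "roboflow_core/bounding_box_visualization@v1",
  "name": "bbox_visualization",
  "image": "$inputs.image",
  "predictions": "$steps.vlm_as_detector.predictions"
}
```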