VLM as Detector¶
v2¶
Class: VLMAsDetectorBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v2.VLMAsDetectorBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects a string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:
- valid JSON strings
- JSON documents wrapped with Markdown tags
Example:
{"my": "json"}
Details regarding block behavior:
- error_status is set to True whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/vlm_as_detector@v2 to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| classes | List[str] | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
| model_type | str | Type of the model that generated the prediction. | ❌ |
| task_type | str | Task type to be performed by the model. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
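The classes property exists so the block can translate class names returned by the VLM into numeric class ids. A minimal sketch of how such a mapping could be derived, assuming ids follow list position (a hypothetical helper, not the block's actual code):

```python
# Hypothetical sketch: map class names to ids by list position,
# as implied by the `classes` property description.
def build_class_mapping(classes: list[str]) -> dict[str, int]:
    return {name: class_id for class_id, name in enumerate(classes)}

mapping = build_class_mapping(["car", "person", "dog"])
print(mapping)  # {'car': 0, 'person': 1, 'dog': 2}
```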
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v2.
- inputs: Grid Visualization, Clip Comparison, Line Counter Visualization, SIFT, Icon Visualization, Stability AI Inpainting, Circle Visualization, Polygon Zone Visualization, OpenAI, QR Code Generator, Image Slicer, Dot Visualization, Blur Visualization, Camera Calibration, Perspective Correction, Clip Comparison, Anthropic Claude, Background Color Visualization, OpenAI, Mask Visualization, Camera Focus, Florence-2 Model, Llama 3.2 Vision, Dynamic Crop, Crop Visualization, Classification Label Visualization, Stability AI Outpainting, Image Preprocessing, Trace Visualization, Color Visualization, Morphological Transformation, Image Threshold, Depth Estimation, Relative Static Crop, Triangle Visualization, Reference Path Visualization, Ellipse Visualization, Stability AI Image Generation, Model Comparison Visualization, Dimension Collapse, Polygon Visualization, Corner Visualization, Halo Visualization, Image Slicer, Stitch Images, Image Blur, Absolute Static Crop, SIFT Comparison, Florence-2 Model, Bounding Box Visualization, Google Gemini, Size Measurement, Image Contours, Pixelate Visualization, Dynamic Zone, Buffer, Label Visualization, Keypoint Visualization, Contrast Equalization, Image Convert Grayscale
- outputs: Detections Filter, Detections Stitch, Detections Classes Replacement, Circle Visualization, Time in Zone, Template Matching, Roboflow Dataset Upload, Model Monitoring Inference Aggregator, Path Deviation, Dot Visualization, Gaze Detection, Overlap Filter, Single-Label Classification Model, Slack Notification, Blur Visualization, Perspective Correction, Roboflow Dataset Upload, Background Color Visualization, Florence-2 Model, Object Detection Model, Keypoint Detection Model, Dynamic Crop, Crop Visualization, Trace Visualization, Velocity, Keypoint Detection Model, Triangle Visualization, Reference Path Visualization, Model Comparison Visualization, Detection Offset, Polygon Visualization, Corner Visualization, Florence-2 Model, Size Measurement, SIFT Comparison, Bounding Box Visualization, Line Counter, Byte Tracker, Dynamic Zone, Stitch OCR Detections, Keypoint Visualization, Detections Transformation, Multi-Label Classification Model, Byte Tracker, Detections Combine, Line Counter Visualization, Distance Measurement, Icon Visualization, Stability AI Inpainting, Polygon Zone Visualization, Line Counter, Webhook Sink, Instance Segmentation Model, Time in Zone, Mask Visualization, Twilio SMS Notification, Detections Consensus, Classification Label Visualization, Multi-Label Classification Model, Time in Zone, Color Visualization, Detections Stabilizer, Instance Segmentation Model, Ellipse Visualization, Segment Anything 2 Model, Email Notification, Halo Visualization, Roboflow Custom Metadata, Object Detection Model, PTZ Tracking (ONVIF), Pixelate Visualization, Path Deviation, Label Visualization, Detections Merge, Byte Tracker, Single-Label Classification Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
VLM as Detector in version v2 has.
Bindings
- input
  - image (image): The image which was the base to generate VLM prediction.
  - vlm_output (language_model_output): The string with raw classification prediction to parse.
  - classes (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id.
- output
  - error_status (boolean): Boolean flag.
  - predictions (object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object.
  - inference_id (inference_id): Inference identifier.
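Downstream, predictions arrives as an sv.Detections(...) object from the supervision library. The sketch below shows how parsed boxes could be assembled into its core fields (an (N, 4) xyxy array plus parallel class ids), assuming a hypothetical response schema with x_min/y_min/x_max/y_max pixel coordinates; the real schema depends on model_type and task_type:

```python
import numpy as np

# Hypothetical VLM response format - the actual schema depends on
# model_type / task_type and is not specified in this document.
response = {
    "detections": [
        {"class_name": "car", "x_min": 10, "y_min": 20, "x_max": 110, "y_max": 220},
        {"class_name": "dog", "x_min": 5, "y_min": 5, "x_max": 50, "y_max": 60},
    ]
}
# Name -> id mapping derived from the block's `classes` input (assumed order).
class_to_id = {"car": 0, "person": 1, "dog": 2}

# Core arrays of an sv.Detections-style object: (N, 4) xyxy boxes
# plus a parallel vector of numeric class ids.
xyxy = np.array(
    [[d["x_min"], d["y_min"], d["x_max"], d["y_max"]] for d in response["detections"]],
    dtype=float,
)
class_id = np.array(
    [class_to_id[d["class_name"]] for d in response["detections"]], dtype=int
)
print(xyxy.shape, class_id.tolist())  # (2, 4) [0, 2]
```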
Example JSON definition of step VLM as Detector in version v2
{
"name": "<your_step_name_here>",
"type": "roboflow_core/vlm_as_detector@v2",
"image": "$inputs.image",
"vlm_output": [
"$steps.lmm.output"
],
"classes": [
"$steps.lmm.classes",
"$inputs.classes",
[
"class_a",
"class_b"
]
],
"model_type": [
"google-gemini",
"anthropic-claude",
"florence-2"
],
"task_type": "<block_does_not_provide_example>"
}
v1¶
Class: VLMAsDetectorBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v1.VLMAsDetectorBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects a string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:
- valid JSON strings
- JSON documents wrapped with Markdown tags
Example:
{"my": "json"}
Details regarding block behavior:
- error_status is set to True whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/vlm_as_detector@v1 to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| classes | List[str] | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
| model_type | str | Type of the model that generated the prediction. | ❌ |
| task_type | str | Task type to be performed by the model. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v1.
- inputs: Grid Visualization, Clip Comparison, Line Counter Visualization, SIFT, Icon Visualization, Stability AI Inpainting, Circle Visualization, Polygon Zone Visualization, OpenAI, QR Code Generator, Image Slicer, Dot Visualization, Blur Visualization, Camera Calibration, Perspective Correction, Clip Comparison, Anthropic Claude, Background Color Visualization, OpenAI, Mask Visualization, Camera Focus, Florence-2 Model, Llama 3.2 Vision, Dynamic Crop, Crop Visualization, Classification Label Visualization, Stability AI Outpainting, Image Preprocessing, Trace Visualization, Color Visualization, Morphological Transformation, Image Threshold, Depth Estimation, Relative Static Crop, Triangle Visualization, Reference Path Visualization, Ellipse Visualization, Stability AI Image Generation, Model Comparison Visualization, Dimension Collapse, Polygon Visualization, Corner Visualization, Halo Visualization, Image Slicer, Stitch Images, Image Blur, Absolute Static Crop, SIFT Comparison, Florence-2 Model, Bounding Box Visualization, Google Gemini, Size Measurement, Image Contours, Pixelate Visualization, Dynamic Zone, Buffer, Label Visualization, Keypoint Visualization, Contrast Equalization, Image Convert Grayscale
- outputs: CLIP Embedding Model, Detections Stitch, Circle Visualization, Time in Zone, Model Monitoring Inference Aggregator, QR Code Generator, Dot Visualization, Gaze Detection, Overlap Filter, Single-Label Classification Model, Slack Notification, Blur Visualization, Florence-2 Model, Object Detection Model, Keypoint Detection Model, Llama 3.2 Vision, Crop Visualization, Velocity, Image Threshold, Triangle Visualization, Reference Path Visualization, Model Comparison Visualization, Cache Set, Corner Visualization, Size Measurement, Image Blur, SIFT Comparison, Bounding Box Visualization, Dynamic Zone, Stitch OCR Detections, Detections Transformation, Multi-Label Classification Model, YOLO-World Model, Distance Measurement, Icon Visualization, Stability AI Inpainting, Google Vision OCR, Polygon Zone Visualization, Webhook Sink, CogVLM, Time in Zone, Cache Get, Stability AI Outpainting, Classification Label Visualization, Multi-Label Classification Model, Morphological Transformation, Detections Stabilizer, Instance Segmentation Model, Segment Anything 2 Model, Local File Sink, Roboflow Custom Metadata, PTZ Tracking (ONVIF), Pixelate Visualization, Pixel Color Count, Path Deviation, Byte Tracker, Single-Label Classification Model, Detections Filter, Detections Classes Replacement, Path Deviation, Template Matching, Roboflow Dataset Upload, Perception Encoder Embedding Model, Perspective Correction, Roboflow Dataset Upload, Anthropic Claude, Background Color Visualization, OpenAI, Dynamic Crop, Trace Visualization, Keypoint Detection Model, Detection Offset, Polygon Visualization, Florence-2 Model, Moondream2, Line Counter, Byte Tracker, Keypoint Visualization, Byte Tracker, Detections Combine, Clip Comparison, Line Counter Visualization, OpenAI, Line Counter, Instance Segmentation Model, Mask Visualization, Twilio SMS Notification, Detections Consensus, LMM, Image Preprocessing, Time in Zone, Color Visualization, LMM For Classification, Ellipse Visualization, Stability AI Image Generation, Email Notification, Halo Visualization, Google Gemini, Object Detection Model, Label Visualization, Detections Merge, OpenAI, Contrast Equalization
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
VLM as Detector in version v1 has.
Bindings
- input
  - image (image): The image which was the base to generate VLM prediction.
  - vlm_output (language_model_output): The string with raw classification prediction to parse.
  - classes (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id.
- output
  - error_status (boolean): Boolean flag.
  - predictions (object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object.
  - inference_id (string): String value.
Example JSON definition of step VLM as Detector in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/vlm_as_detector@v1",
"image": "$inputs.image",
"vlm_output": [
"$steps.lmm.output"
],
"classes": [
"$steps.lmm.classes",
"$inputs.classes",
[
"class_a",
"class_b"
]
],
"model_type": [
"google-gemini",
"anthropic-claude",
"florence-2"
],
"task_type": "<block_does_not_provide_example>"
}