# VLM as Detector

## v2

Class: `VLMAsDetectorBlockV2` (there are multiple versions of this block)

Source: `inference.core.workflows.core_steps.formatters.vlm_as_detector.v2.VLMAsDetectorBlockV2`

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects a string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:

- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:

```json
{"my": "json"}
```
Details regarding block behavior:

- `error_status` is set to `True` whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
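To make the parsing behaviour concrete, here is a minimal sketch (not the block's actual source code; `parse_vlm_output` and the sample payload are hypothetical names used only for illustration) of how a Markdown-wrapped JSON response can be reduced to a parsed document plus an error flag:

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence


def parse_vlm_output(raw: str):
    """Return (error_status, parsed_document), mimicking the behaviour described above."""
    # If Markdown fences are present, take only the first fenced block.
    match = re.search(rf"{FENCE}(?:json)?\s*(.*?){FENCE}", raw, flags=re.DOTALL)
    candidate = match.group(1) if match else raw
    try:
        return False, json.loads(candidate)
    except json.JSONDecodeError:
        # error_status becomes True whenever parsing cannot be completed.
        return True, None


# Hypothetical VLM response wrapped with Markdown tags.
raw_response = FENCE + 'json\n{"detections": [{"class_name": "class_a"}]}\n' + FENCE
error_status, document = parse_vlm_output(raw_response)
print(error_status, document)  # False {'detections': [{'class_name': 'class_a'}]}
```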
### Type identifier

Use the following identifier in the step `"type"` field to add the block as a step in your workflow: `roboflow_core/vlm_as_detector@v2`
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `classes` | `List[str]` | List of all classes used by the model, required to generate a mapping between class name and class id. | ✅ |
| `model_type` | `str` | Type of the model that generated the prediction. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. | ❌ |
The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections

Compatible Blocks

Check what blocks you can connect to VLM as Detector in version v2.
- inputs: Circle Visualization, Background Color Visualization, Corner Visualization, Bounding Box Visualization, Line Counter Visualization, Image Preprocessing, Trace Visualization, Label Visualization, Clip Comparison, Polygon Zone Visualization, Camera Focus, Image Slicer, Image Slicer, Image Blur, Anthropic Claude, Crop Visualization, Dot Visualization, Google Gemini, Relative Static Crop, Model Comparison Visualization, Stability AI Inpainting, Dimension Collapse, Pixelate Visualization, Perspective Correction, OpenAI, Image Convert Grayscale, Absolute Static Crop, Mask Visualization, Stability AI Image Generation, Color Visualization, Image Threshold, Clip Comparison, Dynamic Crop, Halo Visualization, Polygon Visualization, Florence-2 Model, Image Contours, Dynamic Zone, Buffer, Camera Calibration, SIFT, Reference Path Visualization, Florence-2 Model, Classification Label Visualization, Triangle Visualization, SIFT Comparison, Llama 3.2 Vision, Keypoint Visualization, Grid Visualization, Ellipse Visualization, Stitch Images, Size Measurement, Blur Visualization
- outputs: Circle Visualization, Background Color Visualization, Corner Visualization, Twilio SMS Notification, Slack Notification, Polygon Zone Visualization, Dot Visualization, Path Deviation, Detections Merge, Detection Offset, Roboflow Dataset Upload, Single-Label Classification Model, Pixelate Visualization, Line Counter, Detections Consensus, Gaze Detection, Distance Measurement, Webhook Sink, Color Visualization, Halo Visualization, Polygon Visualization, Detections Classes Replacement, Instance Segmentation Model, Email Notification, Object Detection Model, Classification Label Visualization, Single-Label Classification Model, Roboflow Dataset Upload, Byte Tracker, Ellipse Visualization, Size Measurement, Bounding Box Visualization, Object Detection Model, Line Counter Visualization, Keypoint Detection Model, Trace Visualization, Label Visualization, Detections Transformation, Crop Visualization, Detections Stitch, Model Comparison Visualization, Stitch OCR Detections, Perspective Correction, Byte Tracker, Path Deviation, Mask Visualization, Time in Zone, Detections Filter, Time in Zone, Dynamic Crop, Template Matching, Byte Tracker, Florence-2 Model, Instance Segmentation Model, Keypoint Detection Model, Reference Path Visualization, Multi-Label Classification Model, Florence-2 Model, Triangle Visualization, Model Monitoring Inference Aggregator, Velocity, SIFT Comparison, Keypoint Visualization, Multi-Label Classification Model, Roboflow Custom Metadata, Line Counter, Segment Anything 2 Model, Detections Stabilizer, Blur Visualization
### Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds VLM as Detector in version v2 has.
Bindings

- input
  - `image` (`image`): The image which was the base to generate the VLM prediction.
  - `vlm_output` (`language_model_output`): The string with the raw prediction to parse.
  - `classes` (`list_of_values`): List of all classes used by the model, required to generate a mapping between class name and class id.
- output
  - `error_status` (`boolean`): Boolean flag.
  - `predictions` (`object_detection_prediction`): Prediction with detected bounding boxes in the form of an `sv.Detections(...)` object.
  - `inference_id` (`inference_id`): Inference identifier.
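To illustrate the `predictions` output and the role of the `classes` property, the sketch below is an assumption-laden illustration, not the block's implementation: the `parsed` payload, its field names, and the index-based class mapping are hypothetical. It shows how parsed boxes and the classes list could be assembled into an `sv.Detections(...)` object:

```python
import numpy as np
import supervision as sv

# Classes as supplied to the block; here the list index stands in for the class id
# (one possible mapping between class name and class id).
classes = ["class_a", "class_b"]
class_name_to_id = {name: idx for idx, name in enumerate(classes)}

# Hypothetical boxes parsed from the VLM output.
parsed = [
    {"x_min": 10, "y_min": 20, "x_max": 50, "y_max": 80, "class_name": "class_b", "confidence": 0.9},
]

predictions = sv.Detections(
    xyxy=np.array([[d["x_min"], d["y_min"], d["x_max"], d["y_max"]] for d in parsed], dtype=float),
    confidence=np.array([d["confidence"] for d in parsed], dtype=float),
    class_id=np.array([class_name_to_id[d["class_name"]] for d in parsed], dtype=int),
)
print(predictions.class_id)  # [1]
```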
Example JSON definition of step VLM as Detector in version v2:

```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_detector@v2",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ],
    "model_type": [
        "google-gemini",
        "anthropic-claude",
        "florence-2"
    ],
    "task_type": "<block_does_not_provide_example>"
}
```
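For context on how this step is typically wired and parametrised at runtime (which is what the ✅ in the Refs column for `classes` enables), below is a hedged sketch using the `inference_sdk` HTTP client. The step name `parser`, the upstream step name `lmm`, the server URL, and the file path are assumptions, the upstream VLM step itself is elided, and the exact `run_workflow` signature may differ between SDK versions:

```python
from inference_sdk import InferenceHTTPClient

# Hypothetical inline workflow specification wiring an upstream VLM step into this formatter.
specification = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "classes"},
    ],
    "steps": [
        # ... an upstream VLM / LLM step named "lmm" exposing an "output" field goes here ...
        {
            "type": "roboflow_core/vlm_as_detector@v2",
            "name": "parser",
            "image": "$inputs.image",
            "vlm_output": "$steps.lmm.output",
            "classes": "$inputs.classes",
            "model_type": "google-gemini",
            "task_type": "<block_does_not_provide_example>",  # the docs do not provide an example value
        },
    ],
    "outputs": [
        {"type": "JsonField", "name": "predictions", "selector": "$steps.parser.predictions"},
    ],
}

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<YOUR_API_KEY>")
result = client.run_workflow(
    specification=specification,
    images={"image": "path/to/image.jpg"},
    parameters={"classes": ["class_a", "class_b"]},  # dynamic value bound to the classes property
)
```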
## v1

Class: `VLMAsDetectorBlockV1` (there are multiple versions of this block)

Source: `inference.core.workflows.core_steps.formatters.vlm_as_detector.v1.VLMAsDetectorBlockV1`

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects a string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:

- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:

```json
{"my": "json"}
```
Details regarding block behavior:

- `error_status` is set to `True` whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
### Type identifier

Use the following identifier in the step `"type"` field to add the block as a step in your workflow: `roboflow_core/vlm_as_detector@v1`
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `classes` | `List[str]` | List of all classes used by the model, required to generate a mapping between class name and class id. | ✅ |
| `model_type` | `str` | Type of the model that generated the prediction. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. | ❌ |
The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections

Compatible Blocks

Check what blocks you can connect to VLM as Detector in version v1.
- inputs: Circle Visualization, Background Color Visualization, Corner Visualization, Bounding Box Visualization, Line Counter Visualization, Image Preprocessing, Trace Visualization, Label Visualization, Clip Comparison, Polygon Zone Visualization, Camera Focus, Image Slicer, Image Slicer, Image Blur, Anthropic Claude, Crop Visualization, Dot Visualization, Google Gemini, Relative Static Crop, Model Comparison Visualization, Stability AI Inpainting, Dimension Collapse, Pixelate Visualization, Perspective Correction, OpenAI, Image Convert Grayscale, Absolute Static Crop, Mask Visualization, Stability AI Image Generation, Color Visualization, Image Threshold, Clip Comparison, Dynamic Crop, Halo Visualization, Polygon Visualization, Florence-2 Model, Image Contours, Dynamic Zone, Buffer, Camera Calibration, SIFT, Reference Path Visualization, Florence-2 Model, Classification Label Visualization, Triangle Visualization, SIFT Comparison, Llama 3.2 Vision, Keypoint Visualization, Grid Visualization, Ellipse Visualization, Stitch Images, Size Measurement, Blur Visualization
- outputs: Circle Visualization, Background Color Visualization, Corner Visualization, Twilio SMS Notification, Slack Notification, LMM, Polygon Zone Visualization, Image Blur, Cache Set, Dot Visualization, Path Deviation, Detections Merge, Detection Offset, Google Gemini, Roboflow Dataset Upload, Single-Label Classification Model, Stability AI Inpainting, Pixelate Visualization, Line Counter, OpenAI, Detections Consensus, Gaze Detection, Distance Measurement, Stability AI Image Generation, Webhook Sink, Color Visualization, Image Threshold, Halo Visualization, Polygon Visualization, Detections Classes Replacement, Instance Segmentation Model, CogVLM, Email Notification, Object Detection Model, Classification Label Visualization, Single-Label Classification Model, Llama 3.2 Vision, Google Vision OCR, Roboflow Dataset Upload, Byte Tracker, Ellipse Visualization, Size Measurement, Pixel Color Count, Cache Get, Bounding Box Visualization, Object Detection Model, Line Counter Visualization, Image Preprocessing, Keypoint Detection Model, Trace Visualization, Label Visualization, Local File Sink, Detections Transformation, Anthropic Claude, Crop Visualization, Detections Stitch, YOLO-World Model, Model Comparison Visualization, Stitch OCR Detections, Perspective Correction, Byte Tracker, OpenAI, Path Deviation, Mask Visualization, Time in Zone, Detections Filter, Clip Comparison, Time in Zone, Dynamic Crop, Template Matching, Byte Tracker, Florence-2 Model, Instance Segmentation Model, Keypoint Detection Model, Reference Path Visualization, Multi-Label Classification Model, Florence-2 Model, Triangle Visualization, Model Monitoring Inference Aggregator, CLIP Embedding Model, Velocity, SIFT Comparison, Keypoint Visualization, Multi-Label Classification Model, LMM For Classification, Roboflow Custom Metadata, Line Counter, Segment Anything 2 Model, Detections Stabilizer, Blur Visualization
### Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds VLM as Detector in version v1 has.
Bindings

- input
  - `image` (`image`): The image which was the base to generate the VLM prediction.
  - `vlm_output` (`language_model_output`): The string with the raw prediction to parse.
  - `classes` (`list_of_values`): List of all classes used by the model, required to generate a mapping between class name and class id.
- output
  - `error_status` (`boolean`): Boolean flag.
  - `predictions` (`object_detection_prediction`): Prediction with detected bounding boxes in the form of an `sv.Detections(...)` object.
  - `inference_id` (`string`): String value.
Example JSON definition of step VLM as Detector in version v1:

```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_detector@v1",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ],
    "model_type": [
        "google-gemini",
        "anthropic-claude",
        "florence-2"
    ],
    "task_type": "<block_does_not_provide_example>"
}
```