VLM as Detector¶
v2¶
Class: VLMAsDetectorBlockV2
(there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v2.VLMAsDetectorBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects a string input, such as one produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:
- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:
{"my": "json"}
Details regarding block behavior:
- error_status is set to True whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
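The parsing rules above can be sketched in plain Python. This is a minimal illustration, not the block's actual implementation: the helper name `parse_vlm_output` and the regular expression are assumptions made for this example, and the real block goes on to convert the parsed document into detections.

```python
import json
import re
from typing import Any, Optional, Tuple

# Three backticks, built programmatically so this snippet can itself be fenced.
FENCE = "`" * 3
# Hypothetical pattern: the body of a Markdown code fence, optionally tagged "json".
JSON_BLOCK = re.compile(
    FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL | re.IGNORECASE
)


def parse_vlm_output(raw: str) -> Tuple[bool, Optional[Any]]:
    """Return (error_status, parsed_document) for a raw model response."""
    match = JSON_BLOCK.search(raw)
    # Only the first fenced block is considered; any later blocks are ignored.
    candidate = match.group(1) if match else raw
    try:
        return False, json.loads(candidate)
    except json.JSONDecodeError:
        # Parsing could not be completed, so the error flag is raised.
        return True, None
```

A caller would check the first element of the returned tuple before using the parsed document, in the same way a workflow would check `error_status` before consuming `predictions`.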
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/vlm_as_detector@v2 to add the block as a step in your workflow.
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
classes | List[str] | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
model_type | str | Type of the model that generated prediction. | ❌ |
task_type | str | Task type performed by the model. | ❌ |
The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
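To illustrate the difference between a static value and a dynamic reference, the sketch below distinguishes the two for the `classes` property. The `$inputs.` and `$steps.` prefixes are taken from the example values on this page; the helper `is_selector` is hypothetical and does not reproduce the library's own validation.

```python
from typing import Any

# Prefixes observed in the example step definitions on this page.
SELECTOR_PREFIXES = ("$inputs.", "$steps.")


def is_selector(value: Any) -> bool:
    """True when a property value is a runtime reference rather than a literal."""
    return isinstance(value, str) and value.startswith(SELECTOR_PREFIXES)


# A literal list is fixed when the workflow is defined.
static_classes = ["class_a", "class_b"]
# A selector is resolved when the workflow runs.
dynamic_classes = "$inputs.classes"
```

Only properties marked ✅ in the Refs column, such as `classes`, accept the second form.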
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v2.
- inputs: Reference Path Visualization, Blur Visualization, Pixelate Visualization, Anthropic Claude, Classification Label Visualization, Llama 3.2 Vision, Dimension Collapse, Background Color Visualization, Dynamic Crop, Keypoint Visualization, Camera Focus, Mask Visualization, Clip Comparison, Image Slicer, Buffer, Absolute Static Crop, Stability AI Image Generation, Florence-2 Model, Image Blur, Florence-2 Model, Circle Visualization, Grid Visualization, Clip Comparison, Crop Visualization, Image Convert Grayscale, Image Threshold, Trace Visualization, Polygon Visualization, Triangle Visualization, Stability AI Inpainting, Halo Visualization, Dot Visualization, Polygon Zone Visualization, Google Gemini, Dynamic Zone, Size Measurement, OpenAI, Camera Calibration, SIFT, Corner Visualization, Image Contours, Model Comparison Visualization, Stitch Images, Bounding Box Visualization, Line Counter Visualization, Image Slicer, Perspective Correction, Image Preprocessing, SIFT Comparison, Label Visualization, Relative Static Crop, Color Visualization, Ellipse Visualization
- outputs: Multi-Label Classification Model, Single-Label Classification Model, Classification Label Visualization, Webhook Sink, Background Color Visualization, Dynamic Crop, Mask Visualization, Twilio SMS Notification, Segment Anything 2 Model, Detection Offset, Model Monitoring Inference Aggregator, Florence-2 Model, Roboflow Dataset Upload, Line Counter, Circle Visualization, Crop Visualization, Template Matching, Multi-Label Classification Model, Path Deviation, Stitch OCR Detections, Velocity, Detections Stitch, Detections Stabilizer, Path Deviation, Line Counter, Time in Zone, Model Comparison Visualization, Bounding Box Visualization, Keypoint Detection Model, SIFT Comparison, Perspective Correction, Color Visualization, Slack Notification, Ellipse Visualization, Reference Path Visualization, Blur Visualization, Pixelate Visualization, Email Notification, Instance Segmentation Model, Keypoint Visualization, Time in Zone, Single-Label Classification Model, Byte Tracker, Florence-2 Model, Detections Filter, Detections Transformation, Trace Visualization, Polygon Visualization, Triangle Visualization, Detections Consensus, Keypoint Detection Model, Halo Visualization, Dot Visualization, Polygon Zone Visualization, Detections Merge, Size Measurement, Instance Segmentation Model, Roboflow Custom Metadata, Detections Classes Replacement, Gaze Detection, Object Detection Model, Corner Visualization, Line Counter Visualization, Roboflow Dataset Upload, Byte Tracker, Byte Tracker, Label Visualization, Distance Measurement, Object Detection Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds VLM as Detector in version v2 has.
Bindings
- input
  - image (image): The image which was the base to generate VLM prediction.
  - vlm_output (language_model_output): The string with raw classification prediction to parse.
  - classes (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id.
- output
  - error_status (boolean): Boolean flag.
  - predictions (object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object.
  - inference_id (inference_id): Inference identifier.
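The `classes` input exists so that class names in the model response can be mapped to numeric class ids. A minimal sketch of such a mapping follows. It assumes ids are assigned by position in the list and that unknown names fall back to -1; both are assumptions made for illustration, and the helper names are hypothetical.

```python
from typing import Dict, List


def build_class_index(classes: List[str]) -> Dict[str, int]:
    """Map each class name to its position in the list (assumed id scheme)."""
    return {name: idx for idx, name in enumerate(classes)}


def class_id_for(name: str, index: Dict[str, int], unknown: int = -1) -> int:
    """Look up a class id, returning a sentinel for names absent from the list."""
    return index.get(name, unknown)


index = build_class_index(["class_a", "class_b"])
```

With this scheme, the order of `classes` matters: reordering the list changes the ids that downstream blocks receive.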
Example JSON definition of step VLM as Detector in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_detector@v2",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ],
    "model_type": [
        "google-gemini",
        "anthropic-claude",
        "florence-2"
    ],
    "task_type": "<block_does_not_provide_example>"
}
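The bracketed lists in the definition above appear to enumerate alternative example values rather than values to pass together; that reading is an assumption. Under it, a concrete step can be assembled by choosing one value per field, as in the sketch below. The helper `build_detector_step` is hypothetical, and the `task_type` value is an assumed placeholder because the page gives no example for it.

```python
import json
from typing import Any, Dict, List, Union


def build_detector_step(
    name: str,
    vlm_output: str,
    classes: Union[str, List[str]],
    model_type: str,
    task_type: str,
    image: str = "$inputs.image",
    version: str = "v2",
) -> Dict[str, Any]:
    """Assemble one step definition with a single value per field."""
    return {
        "name": name,
        "type": f"roboflow_core/vlm_as_detector@{version}",
        "image": image,
        "vlm_output": vlm_output,
        "classes": classes,
        "model_type": model_type,
        "task_type": task_type,
    }


step = build_detector_step(
    name="detector",
    vlm_output="$steps.lmm.output",  # selector taken from the example above
    classes=["class_a", "class_b"],  # literal list; a selector also works
    model_type="google-gemini",
    task_type="object-detection",  # assumed value, not given on this page
)
print(json.dumps(step, indent=4))
```

The resulting dictionary would be placed in the list of steps of a workflow definition, after the step that produces `$steps.lmm.output`.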
v1¶
Class: VLMAsDetectorBlockV1
(there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v1.VLMAsDetectorBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects a string input, such as one produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.
Accepted formats:
- valid JSON strings
- JSON documents wrapped with Markdown tags

Example:
{"my": "json"}
Details regarding block behavior:
- error_status is set to True whenever parsing cannot be completed
- in case of multiple Markdown blocks with raw JSON content, only the first will be parsed
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/vlm_as_detector@v1 to add the block as a step in your workflow.
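Because the two versions differ only in the suffix after `@`, it can help to split an identifier into its block name and version when checking which version a workflow uses. The helper below is a hypothetical sketch based solely on the identifier format shown on this page.

```python
from typing import Tuple


def split_type_identifier(identifier: str) -> Tuple[str, str]:
    """Split 'block_name@version' into (block_name, version)."""
    block, sep, version = identifier.rpartition("@")
    if not sep or not block or not version:
        raise ValueError(f"Expected '<block>@<version>', got {identifier!r}")
    return block, version


block, version = split_type_identifier("roboflow_core/vlm_as_detector@v1")
```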
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
classes | List[str] | List of all classes used by the model, required to generate mapping between class name and class id. | ✅ |
model_type | str | Type of the model that generated prediction. | ❌ |
task_type | str | Task type performed by the model. | ❌ |
The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM as Detector in version v1.
- inputs: Reference Path Visualization, Blur Visualization, Pixelate Visualization, Anthropic Claude, Classification Label Visualization, Llama 3.2 Vision, Dimension Collapse, Background Color Visualization, Dynamic Crop, Keypoint Visualization, Camera Focus, Mask Visualization, Clip Comparison, Image Slicer, Buffer, Absolute Static Crop, Stability AI Image Generation, Florence-2 Model, Image Blur, Florence-2 Model, Circle Visualization, Grid Visualization, Clip Comparison, Crop Visualization, Image Convert Grayscale, Image Threshold, Trace Visualization, Polygon Visualization, Triangle Visualization, Stability AI Inpainting, Halo Visualization, Dot Visualization, Polygon Zone Visualization, Google Gemini, Dynamic Zone, Size Measurement, OpenAI, Camera Calibration, SIFT, Corner Visualization, Image Contours, Model Comparison Visualization, Stitch Images, Bounding Box Visualization, Line Counter Visualization, Image Slicer, Perspective Correction, Image Preprocessing, SIFT Comparison, Label Visualization, Relative Static Crop, Color Visualization, Ellipse Visualization
- outputs: Multi-Label Classification Model, Single-Label Classification Model, Classification Label Visualization, Webhook Sink, Background Color Visualization, Dynamic Crop, Cache Get, Mask Visualization, Clip Comparison, Twilio SMS Notification, Google Vision OCR, Segment Anything 2 Model, Detection Offset, Model Monitoring Inference Aggregator, Florence-2 Model, Stability AI Image Generation, LMM For Classification, Image Blur, Cache Set, Roboflow Dataset Upload, Line Counter, CogVLM, Circle Visualization, Crop Visualization, Template Matching, Multi-Label Classification Model, Path Deviation, OpenAI, Stitch OCR Detections, Velocity, Detections Stitch, OpenAI, Pixel Color Count, Detections Stabilizer, Path Deviation, Line Counter, Time in Zone, Model Comparison Visualization, Bounding Box Visualization, Keypoint Detection Model, SIFT Comparison, Perspective Correction, Color Visualization, Slack Notification, Ellipse Visualization, Reference Path Visualization, Blur Visualization, Pixelate Visualization, Anthropic Claude, Email Notification, LMM, Llama 3.2 Vision, Instance Segmentation Model, Keypoint Visualization, Time in Zone, Single-Label Classification Model, Byte Tracker, Florence-2 Model, Detections Filter, Detections Transformation, YOLO-World Model, Trace Visualization, Image Threshold, Polygon Visualization, Triangle Visualization, Detections Consensus, Keypoint Detection Model, Stability AI Inpainting, Halo Visualization, Dot Visualization, Polygon Zone Visualization, Google Gemini, Detections Merge, CLIP Embedding Model, Local File Sink, Size Measurement, Instance Segmentation Model, Roboflow Custom Metadata, Detections Classes Replacement, Gaze Detection, Object Detection Model, Corner Visualization, Line Counter Visualization, Roboflow Dataset Upload, Byte Tracker, Image Preprocessing, Byte Tracker, Label Visualization, Distance Measurement, Object Detection Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds VLM as Detector in version v1 has.
Bindings
- input
  - image (image): The image which was the base to generate VLM prediction.
  - vlm_output (language_model_output): The string with raw classification prediction to parse.
  - classes (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id.
- output
  - error_status (boolean): Boolean flag.
  - predictions (object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object.
  - inference_id (string): String value.
Example JSON definition of step VLM as Detector in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_detector@v1",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ],
    "model_type": [
        "google-gemini",
        "anthropic-claude",
        "florence-2"
    ],
    "task_type": "<block_does_not_provide_example>"
}