VLM as Classifier
v2
Class: VLMAsClassifierBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_classifier.v2.VLMAsClassifierBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into a classification prediction and returned as the block output.
Accepted formats:
- valid JSON strings
- JSON documents wrapped with Markdown tags (very common for GPT responses)
Example:
{"my": "json"}
Details regarding block behavior:
- `error_status` is set to `True` whenever parsing cannot be completed
- if the input contains multiple Markdown blocks with raw JSON content, only the first one is parsed
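The parsing behavior described above can be approximated as follows. This is a minimal sketch, not the block's actual implementation: the function name `parse_vlm_json`, the regular expression, and the returned tuple are assumptions made for illustration.

```python
import json
import re
from typing import Any, Optional, Tuple

# Assumed pattern for a Markdown fenced block (three backticks), optionally tagged "json".
MARKDOWN_JSON = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL | re.IGNORECASE)


def parse_vlm_json(raw: str) -> Tuple[bool, Optional[Any]]:
    """Return (error_status, parsed_document) for a raw LLM/VLM response."""
    # search() returns the first match, so later fenced blocks are ignored.
    match = MARKDOWN_JSON.search(raw)
    candidate = match.group(1) if match else raw
    try:
        return False, json.loads(candidate)
    except json.JSONDecodeError:
        # Parsing could not be completed: raise the error flag.
        return True, None
```

A plain JSON string and a fenced JSON document both yield `error_status` equal to `False` in this sketch, while free-form text yields `True`.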
Type identifier
Use the following identifier in the step "type" field: `roboflow_core/vlm_as_classifier@v2` to add the block as a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `classes` | `List[str]` | List of all classes used by the model, required to generate the mapping between class name and class id. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
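The `classes` list is what allows a class name returned by the model to be translated into a numeric class id. A minimal sketch of such a mapping, assuming ids are simply the positions of the names in the list (this page does not state the numbering scheme). `build_class_index` is a hypothetical helper, and `-1` is only a placeholder for unknown names:

```python
from typing import Dict, List


def build_class_index(classes: List[str]) -> Dict[str, int]:
    """Map each class name to an id; positional ids are an assumption."""
    return {name: idx for idx, name in enumerate(classes)}


class_index = build_class_index(["class_a", "class_b"])
# A name the model returns that is not in the list has no id in this mapping;
# -1 is used here purely as a placeholder value.
unknown_id = class_index.get("class_c", -1)
```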
Available Connections
Compatible Blocks
Check what blocks you can connect to VLM as Classifier in version v2.
- inputs: QR Code Generator, Image Convert Grayscale, Google Gemini, Dynamic Crop, Blur Visualization, SIFT, Bounding Box Visualization, Stability AI Outpainting, Camera Focus, Keypoint Visualization, Trace Visualization, Polygon Visualization, Ellipse Visualization, Model Comparison Visualization, OpenAI, Anthropic Claude, Dimension Collapse, Triangle Visualization, Polygon Zone Visualization, Halo Visualization, Stability AI Image Generation, Florence-2 Model, Circle Visualization, Google Gemini, Motion Detection, Clip Comparison, Camera Focus, Anthropic Claude, Perspective Correction, Reference Path Visualization, Corner Visualization, Color Visualization, Image Slicer, OpenAI, Camera Calibration, Image Blur, Buffer, Dot Visualization, Image Threshold, Morphological Transformation, Label Visualization, Background Color Visualization, Classification Label Visualization, Mask Visualization, Dynamic Zone, Detections List Roll-Up, Pixelate Visualization, Absolute Static Crop, Size Measurement, Grid Visualization, Contrast Equalization, Image Preprocessing, Google Gemini, Relative Static Crop, Stability AI Inpainting, Image Contours, Line Counter Visualization, Stitch Images, Crop Visualization, OpenAI, Llama 3.2 Vision, Icon Visualization, Clip Comparison, SIFT Comparison, Depth Estimation, Florence-2 Model, Background Subtraction, Image Slicer
- outputs: Twilio SMS/MMS Notification, Color Visualization, Corner Visualization, Multi-Label Classification Model, Roboflow Custom Metadata, Dot Visualization, Model Monitoring Inference Aggregator, Blur Visualization, Label Visualization, Background Color Visualization, Bounding Box Visualization, Time in Zone, Classification Label Visualization, Slack Notification, Keypoint Visualization, Trace Visualization, Keypoint Detection Model, Roboflow Dataset Upload, Instance Segmentation Model, Polygon Visualization, Mask Visualization, Dynamic Zone, Pixelate Visualization, PTZ Tracking (ONVIF), Ellipse Visualization, Keypoint Detection Model, Model Comparison Visualization, Detections Consensus, Triangle Visualization, Webhook Sink, Polygon Zone Visualization, Stability AI Inpainting, SAM 3, Halo Visualization, Line Counter Visualization, Template Matching, Crop Visualization, Time in Zone, Reference Path Visualization, Single-Label Classification Model, Icon Visualization, Circle Visualization, Email Notification, SIFT Comparison, Gaze Detection, Motion Detection, Twilio SMS Notification, Single-Label Classification Model, Segment Anything 2 Model, Time in Zone, Detections Classes Replacement, SAM 3, Object Detection Model, Object Detection Model, Instance Segmentation Model, Multi-Label Classification Model, Perspective Correction, Roboflow Dataset Upload, Email Notification
Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds VLM as Classifier in version v2 has.
Bindings
- input
  - `image` (`image`): The image that was used to generate the VLM prediction.
  - `vlm_output` (`language_model_output`): The string with the raw classification prediction to parse.
  - `classes` (`list_of_values`): List of all classes used by the model, required to generate the mapping between class name and class id.
- output
  - `error_status` (`boolean`): Boolean flag.
  - `predictions` (`classification_prediction`): Predictions from classifier.
  - `inference_id` (`inference_id`): Inference identifier.
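Because `error_status` signals that parsing failed, code consuming this block's outputs would typically check the flag before reading `predictions`. A minimal sketch, assuming the step outputs arrive as a plain dictionary keyed by the output names listed above. The helper `usable_predictions` and the sample dictionaries (including the shape of the prediction) are hypothetical:

```python
from typing import Any, Dict, Optional


def usable_predictions(step_output: Dict[str, Any]) -> Optional[Any]:
    """Return predictions only when the block reports successful parsing."""
    # A missing flag is treated as a failure to stay on the safe side.
    if step_output.get("error_status", True):
        return None
    return step_output.get("predictions")


# Placeholder outputs; the real prediction structure is not shown on this page.
ok = usable_predictions(
    {"error_status": False, "predictions": {"top": "class_a"}, "inference_id": "abc"}
)
failed = usable_predictions(
    {"error_status": True, "predictions": None, "inference_id": None}
)
```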
Example JSON definition of step VLM as Classifier in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_classifier@v2",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ]
}
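The definition above appears to list alternative example values for `vlm_output` and `classes` rather than a single configuration. A minimal sketch of one concrete step, written as a Python dictionary and assuming each field takes exactly one of the listed values. The step name `classifier` and the upstream step name `lmm` are placeholders:

```python
import json

step = {
    "name": "classifier",  # placeholder step name
    "type": "roboflow_core/vlm_as_classifier@v2",
    "image": "$inputs.image",
    # Assumes an upstream LLM/VLM step named "lmm" exposing an "output" field.
    "vlm_output": "$steps.lmm.output",
    # A static list; a selector such as "$inputs.classes" is the listed alternative.
    "classes": ["class_a", "class_b"],
}

# Serialise the step the way it would appear inside a workflow definition.
print(json.dumps(step, indent=2))
```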
v1
Class: VLMAsClassifierBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_classifier.v1.VLMAsClassifierBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
The block expects string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into a classification prediction and returned as the block output.
Accepted formats:
- valid JSON strings
- JSON documents wrapped with Markdown tags (very common for GPT responses)
Example:
{"my": "json"}
Details regarding block behavior:
- `error_status` is set to `True` whenever parsing cannot be completed
- if the input contains multiple Markdown blocks with raw JSON content, only the first one is parsed
Type identifier
Use the following identifier in the step "type" field: `roboflow_core/vlm_as_classifier@v1` to add the block as a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `classes` | `List[str]` | List of all classes used by the model, required to generate the mapping between class name and class id. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections
Compatible Blocks
Check what blocks you can connect to VLM as Classifier in version v1.
- inputs: QR Code Generator, Image Convert Grayscale, Google Gemini, Dynamic Crop, Blur Visualization, SIFT, Bounding Box Visualization, Stability AI Outpainting, Camera Focus, Keypoint Visualization, Trace Visualization, Polygon Visualization, Ellipse Visualization, Model Comparison Visualization, OpenAI, Anthropic Claude, Dimension Collapse, Triangle Visualization, Polygon Zone Visualization, Halo Visualization, Stability AI Image Generation, Florence-2 Model, Circle Visualization, Google Gemini, Motion Detection, Clip Comparison, Camera Focus, Anthropic Claude, Perspective Correction, Reference Path Visualization, Corner Visualization, Color Visualization, Image Slicer, OpenAI, Camera Calibration, Image Blur, Buffer, Dot Visualization, Image Threshold, Morphological Transformation, Label Visualization, Background Color Visualization, Classification Label Visualization, Mask Visualization, Dynamic Zone, Detections List Roll-Up, Pixelate Visualization, Absolute Static Crop, Size Measurement, Grid Visualization, Contrast Equalization, Image Preprocessing, Google Gemini, Relative Static Crop, Stability AI Inpainting, Image Contours, Line Counter Visualization, Stitch Images, Crop Visualization, OpenAI, Llama 3.2 Vision, Icon Visualization, Clip Comparison, SIFT Comparison, Depth Estimation, Florence-2 Model, Background Subtraction, Image Slicer
- outputs: QR Code Generator, Google Gemini, Blur Visualization, Bounding Box Visualization, Stability AI Outpainting, Trace Visualization, Instance Segmentation Model, Pixel Color Count, Ellipse Visualization, Model Comparison Visualization, OpenAI, Triangle Visualization, SAM 3, Distance Measurement, Stability AI Image Generation, Path Deviation, CLIP Embedding Model, Florence-2 Model, Single-Label Classification Model, Email Notification, Google Gemini, Twilio SMS/MMS Notification, Color Visualization, Corner Visualization, Multi-Label Classification Model, OpenAI, Line Counter, SAM 3, Roboflow Custom Metadata, Dot Visualization, Image Threshold, Model Monitoring Inference Aggregator, Time in Zone, Classification Label Visualization, Roboflow Dataset Upload, Mask Visualization, Pixelate Visualization, Line Counter, Keypoint Detection Model, Detections Consensus, Size Measurement, Webhook Sink, Stability AI Inpainting, Template Matching, Line Counter Visualization, Crop Visualization, OpenAI, Llama 3.2 Vision, Icon Visualization, Time in Zone, LMM For Classification, SAM 3, Object Detection Model, CogVLM, Roboflow Dataset Upload, Cache Get, Detections Stitch, Seg Preview, YOLO-World Model, Dynamic Crop, Slack Notification, Keypoint Visualization, Polygon Visualization, Cache Set, Anthropic Claude, Local File Sink, Polygon Zone Visualization, Halo Visualization, LMM, Time in Zone, Circle Visualization, Google Vision OCR, Motion Detection, Clip Comparison, Detections Classes Replacement, Anthropic Claude, Object Detection Model, Instance Segmentation Model, Perspective Correction, Perception Encoder Embedding Model, Reference Path Visualization, Stitch OCR Detections, Image Blur, Morphological Transformation, Label Visualization, Background Color Visualization, Keypoint Detection Model, Path Deviation, Dynamic Zone, PTZ Tracking (ONVIF), Moondream2, Image Preprocessing, Contrast Equalization, Google Gemini, OpenAI, SIFT Comparison, Gaze Detection, Depth Estimation, Twilio SMS Notification, Single-Label Classification Model, Florence-2 Model, Multi-Label Classification Model, Segment Anything 2 Model, Email Notification
Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds VLM as Classifier in version v1 has.
Bindings
- input
  - `image` (`image`): The image that was used to generate the VLM prediction.
  - `vlm_output` (`language_model_output`): The string with the raw classification prediction to parse.
  - `classes` (`list_of_values`): List of all classes used by the model, required to generate the mapping between class name and class id.
- output
  - `error_status` (`boolean`): Boolean flag.
  - `predictions` (`classification_prediction`): Predictions from classifier.
  - `inference_id` (`string`): String value.
Example JSON definition of step VLM as Classifier in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_classifier@v1",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ]
}
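The v1 and v2 example definitions on this page differ only in the version suffix of the `type` identifier. A small sketch of switching a step definition between versions follows. `with_block_version` is a hypothetical helper, and this page does not confirm that the rest of a workflow stays compatible after the switch (the two versions declare different kinds for the `inference_id` output).

```python
from typing import Any, Dict


def with_block_version(step: Dict[str, Any], version: str) -> Dict[str, Any]:
    """Return a copy of a step definition with the '@vN' suffix replaced."""
    block_type = step["type"]
    if "@" not in block_type:
        raise ValueError(f"type identifier has no version suffix: {block_type!r}")
    block_name, _, _ = block_type.rpartition("@")
    # Build a new dictionary so the original definition is left untouched.
    return {**step, "type": f"{block_name}@{version}"}


v1_step = {
    "name": "classifier",  # placeholder step name
    "type": "roboflow_core/vlm_as_classifier@v1",
    "image": "$inputs.image",
    "vlm_output": "$steps.lmm.output",  # assumes an upstream step named "lmm"
    "classes": ["class_a", "class_b"],
}
v2_step = with_block_version(v1_step, "v2")
```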