VLM As Classifier¶
v2¶
Class: VLMAsClassifierBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_classifier.v2.VLMAsClassifierBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Parses JSON strings produced by Visual Language Models (VLMs) and Large Language Models (LLMs) into the standardized classification prediction format. The block extracts class predictions, maps class names to class IDs, and handles both single-class and multi-label formats, so that VLM/LLM text outputs become workflow-compatible classification results.
How This Block Works¶
This block converts VLM/LLM text outputs containing classification predictions into standardized classification prediction format. The block:
1. Receives image and VLM output string containing classification results in JSON format
2. Parses JSON content from VLM output:
   - Handles Markdown-wrapped JSON:
     - Searches for JSON wrapped in Markdown code blocks (```json ... ```)
     - This format is common in LLM/VLM responses
     - If multiple markdown JSON blocks are found, only the first block is parsed
     - Extracts JSON content from within markdown tags
   - Handles raw JSON strings:
     - If no markdown blocks are found, attempts to parse the entire string as JSON
     - Supports standard JSON format strings
3. Detects classification format and parses accordingly:
   - Single-Class Classification Format:
     - Detects format containing "class_name" and "confidence" fields
     - Extracts the predicted class name and confidence score
     - Creates classification prediction with single top class
     - Maps class name to class ID using provided classes list
   - Multi-Label Classification Format:
     - Detects format containing "predicted_classes" array
     - Extracts all predicted classes with their confidence scores
     - Handles duplicate classes by taking maximum confidence
     - Maps all class names to class IDs using provided classes list
4. Creates class name to class ID mapping:
   - Uses the provided classes list to create index mapping (class_name → class_id)
   - Maps classes in order (first class = ID 0, second = ID 1, etc.)
   - Classes not in the provided list get class_id = -1
5. Normalizes confidence scores:
   - Scales confidence values to valid range [0.0, 1.0]
   - Clamps values outside the range to 0.0 or 1.0
6. Constructs classification prediction:
   - Includes image dimensions (width, height) from input image
   - For single-class: includes "top" class, confidence, and predictions array
   - For multi-label: includes "predicted_classes" list and predictions dictionary
   - Includes inference_id and parent_id for tracking
   - Formats prediction in standard classification prediction format
7. Handles errors:
   - Sets error_status to True if JSON parsing fails
   - Sets error_status to True if classification format cannot be determined
   - Returns None for predictions when errors occur
   - Always includes inference_id for tracking
8. Returns classification prediction:
   - Outputs predictions in standard classification format (compatible with classification blocks)
   - Outputs error_status indicating parsing success/failure
   - Outputs inference_id with specific type for tracking and lineage
The block enables using VLMs/LLMs for classification by converting their text-based JSON outputs into standardized classification predictions that can be used in workflows like any other classification model output.
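The JSON extraction step described above can be sketched roughly as follows. This is a minimal illustration, not the block's actual source: the function name extract_json and the regular expression are assumptions introduced here, and the real implementation may differ in detail.

```python
import json
import re
from typing import Any, Optional, Tuple

# Build the Markdown fence marker programmatically (three backticks).
FENCE = "`" * 3

# Hypothetical pattern: captures the body of a Markdown block tagged "json".
JSON_MARKDOWN_BLOCK = re.compile(
    FENCE + r"json\s*(.*?)\s*" + FENCE, re.DOTALL | re.IGNORECASE
)


def extract_json(vlm_output: str) -> Tuple[bool, Optional[Any]]:
    """Return (error_status, parsed_data) for a raw VLM/LLM output string."""
    # search() returns the first match only, so later Markdown blocks are ignored.
    match = JSON_MARKDOWN_BLOCK.search(vlm_output)
    # Fall back to the whole string when no Markdown block is present.
    candidate = match.group(1) if match else vlm_output
    try:
        return False, json.loads(candidate)
    except json.JSONDecodeError:
        # Parsing failed: flag the error and return no data.
        return True, None
```

Calling extract_json on a response with two Markdown JSON blocks yields the content of the first block only, while a string that is not valid JSON yields an error status of True and no data.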
Common Use Cases¶
- VLM-Based Classification: Classify images with a Visual Language Model (e.g., GPT-4V or Claude Vision) and parse its output into classification predictions
- LLM Classification Parsing: Parse LLM text outputs that contain classification results (e.g., GPT classification outputs) into the standardized format
- Text-to-Classification Conversion: Convert text-based classification outputs from any model into workflow-compatible classification predictions
- Multi-Format Classification Support: Handle both single-class and multi-label classification formats from VLM/LLM outputs
- VLM Integration: Use VLM predictions in classification pipelines, alone or combined with traditional classification blocks
- Flexible Classification Sources: Enable classification from any model type that outputs text/JSON in one of the supported formats
Connecting to Other Blocks¶
This block receives images and VLM outputs and produces classification predictions:
- After VLM/LLM blocks, to parse their classification outputs into the standard format
- Before classification-based blocks, to provide the parsed classifications to downstream steps
- Before filtering blocks, to filter based on VLM classification results
- Before analytics blocks, to analyze or track VLM classification results
- Before visualization blocks, to display VLM classification results
- In workflow outputs, to return the parsed VLM classifications as the final result
Version Differences¶
This version (v2) includes the following enhancements over v1:
- Improved Type System: The inference_id output now uses INFERENCE_ID_KIND instead of generic STRING_KIND, providing better type safety and semantic clarity for inference ID values in the workflow type system
Requirements¶
This block requires an image input (for metadata and dimensions) and a VLM output string containing JSON classification data. The JSON can be raw JSON or wrapped in Markdown code blocks (```json ... ```). The block supports two JSON formats: single-class (with "class_name" and "confidence" fields) and multi-label (with "predicted_classes" array). The classes parameter must contain a list of all class names used by the model to generate class_id mappings. Classes are mapped to IDs by index (first class = 0, second = 1, etc.). Classes not in the list get class_id = -1. Confidence scores are normalized to the [0.0, 1.0] range. The block outputs classification predictions in standard format (compatible with classification blocks), error_status (boolean), and inference_id (INFERENCE_ID_KIND) for tracking.
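The class mapping, confidence normalization, and duplicate handling described here can be illustrated with a short sketch. The helper names are hypothetical, and the assumption that each entry of "predicted_classes" carries "class" and "confidence" keys is not stated in this page; treat the snippet as an approximation rather than the block's actual code.

```python
from typing import Dict, List


def build_class_index(classes: List[str]) -> Dict[str, int]:
    """Map each class name to its position in the provided list."""
    return {name: idx for idx, name in enumerate(classes)}


def clamp_confidence(value: float) -> float:
    """Clamp a confidence value into the [0.0, 1.0] range."""
    return max(0.0, min(1.0, float(value)))


def merge_multi_label(predicted: List[dict], classes: List[str]) -> Dict[str, dict]:
    """Collapse duplicate classes, keeping the maximum confidence for each."""
    index = build_class_index(classes)
    merged: Dict[str, dict] = {}
    for item in predicted:
        name = item["class"]  # assumed key name
        confidence = clamp_confidence(item["confidence"])  # assumed key name
        previous = merged.get(name)
        if previous is None or confidence > previous["confidence"]:
            # Classes missing from the provided list get class_id = -1.
            merged[name] = {"class_id": index.get(name, -1), "confidence": confidence}
    return merged
```

With classes set to dog, cat, bird, a class named "fish" in the model output maps to class_id -1, and a reported confidence of 2.0 is clamped to 1.0.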
Type identifier¶
Use the following identifier in step "type" field: roboflow_core/vlm_as_classifier@v2 to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| classes | List[str] | List of all class names used by the classification model, in order. Required to generate mapping between class names (from VLM output) and class IDs (for classification format). Classes are mapped to IDs by index: first class = ID 0, second = ID 1, etc. Classes from VLM output that are not in this list get class_id = -1. Should match the classes the VLM was asked to classify. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM As Classifier in version v2.
- inputs:
Halo Visualization,GLM-OCR,Image Threshold,Stitch Images,Morphological Transformation,Classification Label Visualization,Crop Visualization,Icon Visualization,Stability AI Outpainting,Blur Visualization,Reference Path Visualization,MoonshotAI Kimi,OpenAI,Google Gemini,Anthropic Claude,Camera Focus,QR Code Generator,Size Measurement,Model Comparison Visualization,Florence-2 Model,Trace Visualization,Ellipse Visualization,Anthropic Claude,Dot Visualization,Perspective Correction,Label Visualization,Image Convert Grayscale,Florence-2 Model,Text Display,Llama 3.2 Vision,Qwen-VL,PLC ModbusTCP,Image Blur,Absolute Static Crop,SIFT,Google Gemini,Dimension Collapse,Qwen 3.5 API,Qwen 3.6 API,Triangle Visualization,Camera Focus,Contrast Equalization,Polygon Visualization,OpenAI,Heatmap Visualization,Clip Comparison,Google Gemma API,Detections List Roll-Up,Contrast Enhancement,Google Gemini,PLC EthernetIP,Halo Visualization,Color Visualization,Morphological Transformation,MoonshotAI Kimi,Llama 3.2 Vision,Buffer,Polygon Visualization,Image Stack,Mask Visualization,Anthropic Claude,Stability AI Inpainting,Keypoint Visualization,Background Subtraction,Image Slicer,Image Contours,Line Counter Visualization,Image Preprocessing,Dynamic Crop,Depth Estimation,Bounding Box Visualization,Motion Detection,Clip Comparison,Corner Visualization,Polygon Zone Visualization,Camera Calibration,Grid Visualization,Stability AI Image Generation,Dynamic Zone,OpenAI,Circle Visualization,Image Slicer,Relative Static Crop,OpenAI-Compatible LLM,OpenRouter,SIFT Comparison,Pixelate Visualization,Background Color Visualization,Google Gemma
- outputs:
Halo Visualization,SAM 3 Interactive,Template Matching,Twilio SMS/MMS Notification,Classification Label Visualization,Crop Visualization,Icon Visualization,Blur Visualization,Reference Path Visualization,Single-Label Classification Model,Google Gemini,Detections Classes Replacement,Single-Label Classification Model,Webhook Sink,Instance Segmentation Model,Instance Segmentation Model,Model Comparison Visualization,MQTT Writer,Trace Visualization,Ellipse Visualization,Object Detection Model,Keypoint Detection Model,BoT-SORT Tracker,Dot Visualization,Perspective Correction,Label Visualization,Instance Segmentation Model,Text Display,Roboflow Dataset Upload,Keypoint Detection Model,Gaze Detection,Keypoint Detection Model,SAM 3,Triangle Visualization,Time in Zone,Polygon Visualization,Heatmap Visualization,Multi-Label Classification Model,Halo Visualization,Color Visualization,Event Writer,Multi-Label Classification Model,Image Stack,Email Notification,Polygon Visualization,Mask Visualization,Stability AI Inpainting,Time in Zone,Microsoft SQL Server Sink,Roboflow Asset Library Attributes,PTZ Tracking (ONVIF),Keypoint Visualization,Multi-Label Classification Model,Roboflow Vision Events,Twilio SMS Notification,Email Notification,Line Counter Visualization,Detections Consensus,Object Detection Model,OPC UA Writer Sink,SAM 3,Bounding Box Visualization,Motion Detection,Roboflow Dataset Upload,Polygon Zone Visualization,Camera Calibration,Segment Anything 2 Model,Corner Visualization,Circle Visualization,Time in Zone,Single-Label Classification Model,Roboflow Custom Metadata,Instance Segmentation Model,Model Monitoring Inference Aggregator,Slack Notification,Object Detection Model,SIFT Comparison,Pixelate Visualization,Background Color Visualization,Dynamic Zone
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds VLM As Classifier has in version v2.
Bindings
- input
  - image (image): Input image that was used to generate the VLM prediction. Used to extract image dimensions (width, height) and metadata (parent_id) for the classification prediction. The same image that was provided to the VLM/LLM block should be used here to maintain consistency.
  - vlm_output (language_model_output): String output from a VLM or LLM block containing classification prediction in JSON format. Can be raw JSON string (e.g., '{"class_name": "dog", "confidence": 0.95}') or JSON wrapped in Markdown code blocks (e.g., ```json {...}```). Supports two formats: single-class (with 'class_name' and 'confidence' fields) or multi-label (with 'predicted_classes' array). If multiple markdown blocks exist, only the first is parsed.
  - classes (list_of_values): List of all class names used by the classification model, in order. Required to generate mapping between class names (from VLM output) and class IDs (for classification format). Classes are mapped to IDs by index: first class = ID 0, second = ID 1, etc. Classes from VLM output that are not in this list get class_id = -1. Should match the classes the VLM was asked to classify.
- output
  - error_status (boolean): Boolean flag.
  - predictions (classification_prediction): Predictions from classifier.
  - inference_id (inference_id): Inference identifier.
Example JSON definition of step VLM As Classifier in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_classifier@v2",
    "image": "$inputs.image",
    "vlm_output": "$steps.lmm.output",
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "dog",
            "cat",
            "bird"
        ],
        [
            "class_a",
            "class_b"
        ]
    ]
}
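The classes array in the example above appears to enumerate alternative values (two selectors and two literal lists) rather than one literal value; that reading is an assumption. Under it, a concrete step definition with a single literal class list could be built and serialized as sketched below. The step name vlm_classifier is hypothetical, and the selectors are copied from the example.

```python
import json

# Step definition using one literal class list (assumed interpretation of the example).
step = {
    "name": "vlm_classifier",  # hypothetical step name
    "type": "roboflow_core/vlm_as_classifier@v2",
    "image": "$inputs.image",
    "vlm_output": "$steps.lmm.output",
    "classes": ["dog", "cat", "bird"],
}

# Serialize to the JSON form used in workflow definitions.
step_json = json.dumps(step, indent=2)
print(step_json)
```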
v1¶
Class: VLMAsClassifierBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.formatters.vlm_as_classifier.v1.VLMAsClassifierBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Parses JSON strings produced by Visual Language Models (VLMs) and Large Language Models (LLMs) into the standardized classification prediction format. The block extracts class predictions, maps class names to class IDs, and handles both single-class and multi-label formats, so that VLM/LLM text outputs become workflow-compatible classification results.
How This Block Works¶
This block converts VLM/LLM text outputs containing classification predictions into standardized classification prediction format. The block:
1. Receives image and VLM output string containing classification results in JSON format
2. Parses JSON content from VLM output:
   - Handles Markdown-wrapped JSON:
     - Searches for JSON wrapped in Markdown code blocks (```json ... ```)
     - This format is common in LLM/VLM responses
     - If multiple markdown JSON blocks are found, only the first block is parsed
     - Extracts JSON content from within markdown tags
   - Handles raw JSON strings:
     - If no markdown blocks are found, attempts to parse the entire string as JSON
     - Supports standard JSON format strings
3. Detects classification format and parses accordingly:
   - Single-Class Classification Format:
     - Detects format containing "class_name" and "confidence" fields
     - Extracts the predicted class name and confidence score
     - Creates classification prediction with single top class
     - Maps class name to class ID using provided classes list
   - Multi-Label Classification Format:
     - Detects format containing "predicted_classes" array
     - Extracts all predicted classes with their confidence scores
     - Handles duplicate classes by taking maximum confidence
     - Maps all class names to class IDs using provided classes list
4. Creates class name to class ID mapping:
   - Uses the provided classes list to create index mapping (class_name → class_id)
   - Maps classes in order (first class = ID 0, second = ID 1, etc.)
   - Classes not in the provided list get class_id = -1
5. Normalizes confidence scores:
   - Scales confidence values to valid range [0.0, 1.0]
   - Clamps values outside the range to 0.0 or 1.0
6. Constructs classification prediction:
   - Includes image dimensions (width, height) from input image
   - For single-class: includes "top" class, confidence, and predictions array
   - For multi-label: includes "predicted_classes" list and predictions dictionary
   - Includes inference_id and parent_id for tracking
   - Formats prediction in standard classification prediction format
7. Handles errors:
   - Sets error_status to True if JSON parsing fails
   - Sets error_status to True if classification format cannot be determined
   - Returns None for predictions when errors occur
   - Always includes inference_id for tracking
8. Returns classification prediction:
   - Outputs predictions in standard classification format (compatible with classification blocks)
   - Outputs error_status indicating parsing success/failure
   - Outputs inference_id for tracking and lineage
The block enables using VLMs/LLMs for classification by converting their text-based JSON outputs into standardized classification predictions that can be used in workflows like any other classification model output.
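To make the two output shapes more concrete, the sketch below assembles a single-class and a multi-label prediction. The keys "top", "confidence", "predictions", "predicted_classes", "inference_id" and "parent_id" come from the description above; the nested "image", "class" and "class_id" keys and all function names are assumptions made for illustration, so the real prediction format may differ.

```python
from typing import Dict, List, Optional
from uuid import uuid4


def class_id_of(name: str, classes: List[str]) -> int:
    """Index of the class in the provided list, or -1 when it is missing."""
    return classes.index(name) if name in classes else -1


def single_class_prediction(
    class_name: str, confidence: float, classes: List[str],
    width: int, height: int, parent_id: Optional[str] = None,
) -> dict:
    """Single-class shape: one top class plus a predictions array."""
    return {
        "image": {"width": width, "height": height},  # assumed key layout
        "predictions": [
            {
                "class": class_name,
                "class_id": class_id_of(class_name, classes),
                "confidence": confidence,
            }
        ],
        "top": class_name,
        "confidence": confidence,
        "inference_id": str(uuid4()),
        "parent_id": parent_id,
    }


def multi_label_prediction(
    scores: Dict[str, float], classes: List[str],
    width: int, height: int, parent_id: Optional[str] = None,
) -> dict:
    """Multi-label shape: predicted_classes list plus a predictions dictionary."""
    return {
        "image": {"width": width, "height": height},  # assumed key layout
        "predictions": {
            name: {"class_id": class_id_of(name, classes), "confidence": conf}
            for name, conf in scores.items()
        },
        "predicted_classes": list(scores),
        "inference_id": str(uuid4()),
        "parent_id": parent_id,
    }
```

Each call generates a fresh inference_id, which mirrors the statement that an identifier is always included for tracking.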
Common Use Cases¶
- VLM-Based Classification: Classify images with a Visual Language Model (e.g., GPT-4V or Claude Vision) and parse its output into classification predictions
- LLM Classification Parsing: Parse LLM text outputs that contain classification results (e.g., GPT classification outputs) into the standardized format
- Text-to-Classification Conversion: Convert text-based classification outputs from any model into workflow-compatible classification predictions
- Multi-Format Classification Support: Handle both single-class and multi-label classification formats from VLM/LLM outputs
- VLM Integration: Use VLM predictions in classification pipelines, alone or combined with traditional classification blocks
- Flexible Classification Sources: Enable classification from any model type that outputs text/JSON in one of the supported formats
Connecting to Other Blocks¶
This block receives images and VLM outputs and produces classification predictions:
- After VLM/LLM blocks, to parse their classification outputs into the standard format
- Before classification-based blocks, to provide the parsed classifications to downstream steps
- Before filtering blocks, to filter based on VLM classification results
- Before analytics blocks, to analyze or track VLM classification results
- Before visualization blocks, to display VLM classification results
- In workflow outputs, to return the parsed VLM classifications as the final result
Requirements¶
This block requires an image input (for metadata and dimensions) and a VLM output string containing JSON classification data. The JSON can be raw JSON or wrapped in Markdown code blocks (```json ... ```). The block supports two JSON formats: single-class (with "class_name" and "confidence" fields) and multi-label (with "predicted_classes" array). The classes parameter must contain a list of all class names used by the model to generate class_id mappings. Classes are mapped to IDs by index (first class = 0, second = 1, etc.). Classes not in the list get class_id = -1. Confidence scores are normalized to the [0.0, 1.0] range. The block outputs classification predictions in standard format (compatible with classification blocks), error_status (boolean), and inference_id (string) for tracking.
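The format detection and error outputs summarized here might look roughly like the following. The function names are hypothetical, and the "predictions" value on the success path is only a placeholder standing in for the full classification prediction; the snippet shows the control flow, not the block's real code.

```python
from typing import Any, Optional
from uuid import uuid4


def detect_format(data: Any) -> Optional[str]:
    """Return 'single-class', 'multi-label', or None when the format is unknown."""
    if not isinstance(data, dict):
        return None
    if "class_name" in data and "confidence" in data:
        return "single-class"
    if "predicted_classes" in data:
        return "multi-label"
    return None


def block_outputs(data: Any) -> dict:
    """Assemble the three outputs; inference_id is present even on errors."""
    inference_id = str(uuid4())
    detected = detect_format(data)
    if detected is None:
        # Unknown format: flag the error and return no predictions.
        return {"error_status": True, "predictions": None, "inference_id": inference_id}
    # Placeholder: the real block would build the full prediction here.
    return {
        "error_status": False,
        "predictions": {"format": detected},
        "inference_id": inference_id,
    }
```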
Type identifier¶
Use the following identifier in step "type" field: roboflow_core/vlm_as_classifier@v1 to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| classes | List[str] | List of all class names used by the classification model, in order. Required to generate mapping between class names (from VLM output) and class IDs (for classification format). Classes are mapped to IDs by index: first class = ID 0, second = ID 1, etc. Classes from VLM output that are not in this list get class_id = -1. Should match the classes the VLM was asked to classify. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to VLM As Classifier in version v1.
- inputs:
Halo Visualization,GLM-OCR,Image Threshold,Stitch Images,Morphological Transformation,Classification Label Visualization,Crop Visualization,Icon Visualization,Stability AI Outpainting,Blur Visualization,Reference Path Visualization,MoonshotAI Kimi,OpenAI,Google Gemini,Anthropic Claude,Camera Focus,QR Code Generator,Size Measurement,Model Comparison Visualization,Florence-2 Model,Trace Visualization,Ellipse Visualization,Anthropic Claude,Dot Visualization,Perspective Correction,Label Visualization,Image Convert Grayscale,Florence-2 Model,Text Display,Llama 3.2 Vision,Qwen-VL,PLC ModbusTCP,Image Blur,Absolute Static Crop,SIFT,Google Gemini,Dimension Collapse,Qwen 3.5 API,Qwen 3.6 API,Triangle Visualization,Camera Focus,Contrast Equalization,Polygon Visualization,OpenAI,Heatmap Visualization,Clip Comparison,Google Gemma API,Detections List Roll-Up,Contrast Enhancement,Google Gemini,PLC EthernetIP,Halo Visualization,Color Visualization,Morphological Transformation,MoonshotAI Kimi,Llama 3.2 Vision,Buffer,Polygon Visualization,Image Stack,Mask Visualization,Anthropic Claude,Stability AI Inpainting,Keypoint Visualization,Background Subtraction,Image Slicer,Image Contours,Line Counter Visualization,Image Preprocessing,Dynamic Crop,Depth Estimation,Bounding Box Visualization,Motion Detection,Clip Comparison,Corner Visualization,Polygon Zone Visualization,Camera Calibration,Grid Visualization,Stability AI Image Generation,Dynamic Zone,OpenAI,Circle Visualization,Image Slicer,Relative Static Crop,OpenAI-Compatible LLM,OpenRouter,SIFT Comparison,Pixelate Visualization,Background Color Visualization,Google Gemma
- outputs:
Template Matching,Morphological Transformation,Classification Label Visualization,Crop Visualization,Stability AI Outpainting,Blur Visualization,Reference Path Visualization,OpenAI,YOLO-World Model,Detections Classes Replacement,Anthropic Claude,Instance Segmentation Model,Size Measurement,Model Comparison Visualization,Florence-2 Model,Trace Visualization,Label Visualization,Florence-2 Model,Text Display,Qwen-VL,Llama 3.2 Vision,Keypoint Detection Model,Image Blur,Gaze Detection,Keypoint Detection Model,LMM,Qwen 3.5 API,Qwen 3.6 API,Line Counter,Multi-Label Classification Model,Detections Stitch,Clip Comparison,Google Gemma API,Halo Visualization,Color Visualization,Stitch OCR Detections,MoonshotAI Kimi,Morphological Transformation,Event Writer,Stability AI Inpainting,Cache Set,Time in Zone,Microsoft SQL Server Sink,Roboflow Asset Library Attributes,OpenAI,Roboflow Vision Events,CogVLM,Detections Consensus,Object Detection Model,OPC UA Writer Sink,Semantic Segmentation Model,Dynamic Crop,Path Deviation,Bounding Box Visualization,Qwen3.5-VL,SAM 3,Cache Get,OpenAI,Time in Zone,Single-Label Classification Model,Slack Notification,OpenRouter,SIFT Comparison,Pixelate Visualization,Google Vision OCR,SAM3 Video Tracker,Dynamic Zone,Google Gemma,Halo Visualization,CLIP Embedding Model,Stitch OCR Detections,GLM-OCR,Image Threshold,SAM 3 Interactive,Twilio SMS/MMS Notification,Icon Visualization,MoonshotAI Kimi,Single-Label Classification Model,Google Gemini,Single-Label Classification Model,Webhook Sink,Instance Segmentation Model,QR Code Generator,Path Deviation,MQTT Writer,Ellipse Visualization,Object Detection Model,Anthropic Claude,Keypoint Detection Model,BoT-SORT Tracker,Dot Visualization,Perspective Correction,Instance Segmentation Model,Seg Preview,Roboflow Dataset Upload,Google Gemini,Local File Sink,SAM 3,Triangle Visualization,Time in Zone,Contrast Equalization,Polygon Visualization,OpenAI,Heatmap Visualization,Perception Encoder Embedding Model,Google 
Gemini,LMM For Classification,Llama 3.2 Vision,Multi-Label Classification Model,Image Stack,Email Notification,Polygon Visualization,Mask Visualization,Anthropic Claude,Distance Measurement,PTZ Tracking (ONVIF),Keypoint Visualization,Multi-Label Classification Model,Twilio SMS Notification,Email Notification,Line Counter Visualization,Image Preprocessing,SAM 3,Depth Estimation,Pixel Color Count,Motion Detection,Current Time,Roboflow Dataset Upload,Polygon Zone Visualization,Camera Calibration,Segment Anything 2 Model,Corner Visualization,Moondream2,Stability AI Image Generation,S3 Sink,Circle Visualization,Roboflow Custom Metadata,Instance Segmentation Model,Model Monitoring Inference Aggregator,OpenAI-Compatible LLM,Object Detection Model,Background Color Visualization,Line Counter
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds VLM As Classifier has in version v1.
Bindings
- input
  - image (image): Input image that was used to generate the VLM prediction. Used to extract image dimensions (width, height) and metadata (parent_id) for the classification prediction. The same image that was provided to the VLM/LLM block should be used here to maintain consistency.
  - vlm_output (language_model_output): String output from a VLM or LLM block containing classification prediction in JSON format. Can be raw JSON string (e.g., '{"class_name": "dog", "confidence": 0.95}') or JSON wrapped in Markdown code blocks (e.g., ```json {...}```). Supports two formats: single-class (with 'class_name' and 'confidence' fields) or multi-label (with 'predicted_classes' array). If multiple markdown blocks exist, only the first is parsed.
  - classes (list_of_values): List of all class names used by the classification model, in order. Required to generate mapping between class names (from VLM output) and class IDs (for classification format). Classes are mapped to IDs by index: first class = ID 0, second = ID 1, etc. Classes from VLM output that are not in this list get class_id = -1. Should match the classes the VLM was asked to classify.
- output
  - error_status (boolean): Boolean flag.
  - predictions (classification_prediction): Predictions from classifier.
  - inference_id (string): String value.
Example JSON definition of step VLM As Classifier in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_classifier@v1",
    "image": "$inputs.image",
    "vlm_output": "$steps.lmm.output",
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "dog",
            "cat",
            "bird"
        ],
        [
            "class_a",
            "class_b"
        ]
    ]
}