LMM
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can pass an arbitrary text prompt to the LMMBlock.
The LMMBlock supports two LMMs:
- OpenAI's GPT-4 with Vision;
- CogVLM.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
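The API-key requirement above can be checked before a workflow is submitted. This is a minimal sketch, not part of the Roboflow API: the function `check_lmm_step` is hypothetical and encodes only the rule stated on this page (an OpenAI key is needed when `lmm_type` is `gpt_4v`). The step name and prompt are made up.

```python
from typing import Any, Dict, List


def check_lmm_step(step: Dict[str, Any]) -> List[str]:
    """Return problems found in an LMM step definition (hypothetical pre-flight check)."""
    problems: List[str] = []
    if step.get("type") != "roboflow_core/lmm@v1":
        problems.append("type must be 'roboflow_core/lmm@v1'")
    # Rule stated in the block description: GPT-4 with Vision needs an OpenAI API key.
    # If lmm_type is a selector (e.g. "$inputs.lmm_type"), its value is only known at
    # runtime, so this static check cannot apply the rule.
    if step.get("lmm_type") == "gpt_4v" and not step.get("remote_api_key"):
        problems.append("remote_api_key (an OpenAI key) is required when lmm_type is 'gpt_4v'")
    return problems


step = {
    "name": "lmm",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "Describe the image.",
    "lmm_type": "gpt_4v",
}
print(check_lmm_step(step))  # reports the missing remote_api_key
print(check_lmm_step({**step, "remote_api_key": "$inputs.openai_key"}))  # []
```

Passing the key through a selector such as `$inputs.openai_key` (a hypothetical input name) keeps the secret out of the stored definition.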
Type identifier
Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/lmm@v1
Properties
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
prompt | str | Holds an unconstrained text prompt for the LMM. | ✅ |
lmm_type | str | Type of LMM to be used. | ✅ |
lmm_config | LMMConfig | Configuration of the LMM. | ❌ |
remote_api_key | str | Holds the API key required to call the LMM. In the current state of development, an OpenAI key is required when lmm_type=gpt_4v. | ✅ |
json_output | Dict[str, str] | Holds a dictionary that maps the name of each requested output field to its description. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
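The Refs column can be read as a rule about which properties may hold a selector. The sketch below is a hypothetical helper, not part of the Workflows API. It assumes that selectors are strings beginning with `$` (as in `$inputs.image` in the example definition) and flags ❌ properties that were given one. The input names `question`, `openai_key` and `fields` are made up.

```python
from typing import Any, Dict, List

# Properties marked ✅ in the Refs column, plus the `images` input binding.
DYNAMIC_OK = {"images", "prompt", "lmm_type", "remote_api_key"}
# Properties marked ❌: they need literal values in the step definition.
STATIC_ONLY = {"name", "lmm_config", "json_output"}


def is_selector(value: Any) -> bool:
    """Treat strings such as "$inputs.image" as selectors (assumed convention)."""
    return isinstance(value, str) and value.startswith("$")


def dynamic_properties(step: Dict[str, Any]) -> List[str]:
    """Properties that will be resolved at runtime from a selector."""
    return sorted(key for key in DYNAMIC_OK if is_selector(step.get(key)))


def misused_selectors(step: Dict[str, Any]) -> List[str]:
    """Static-only properties that were (incorrectly) given a selector."""
    return sorted(key for key in STATIC_ONLY if is_selector(step.get(key)))


step = {
    "name": "lmm",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "$inputs.question",            # allowed: prompt is ✅
    "lmm_type": "gpt_4v",                    # a literal value is always allowed
    "remote_api_key": "$inputs.openai_key",  # allowed: remote_api_key is ✅
    "json_output": "$inputs.fields",         # not allowed: json_output is ❌
}

print(dynamic_properties(step))  # ['images', 'prompt', 'remote_api_key']
print(misused_selectors(step))   # ['json_output']
```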
Available Connections
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs: Webhook Sink, Single-Label Classification Model, Image Preprocessing, Mask Visualization, Object Detection Model, Depth Estimation, Image Convert Grayscale, Trace Visualization, Polygon Zone Visualization, Triangle Visualization, Image Slicer, Color Visualization, Florence-2 Model, Llama 3.2 Vision, LMM For Classification, Stitch OCR Detections, Pixelate Visualization, CSV Formatter, Email Notification, EasyOCR, Image Blur, Background Color Visualization, Stability AI Inpainting, Classification Label Visualization, OpenAI, Grid Visualization, QR Code Generator, LMM, Roboflow Dataset Upload, Morphological Transformation, Slack Notification, Keypoint Detection Model, Model Monitoring Inference Aggregator, CogVLM, Twilio SMS Notification, Google Vision OCR, Camera Focus, Local File Sink, SIFT Comparison, Dynamic Crop, Stability AI Image Generation, Dot Visualization, Bounding Box Visualization, Reference Path Visualization, Crop Visualization, Stability AI Outpainting, Relative Static Crop, Circle Visualization, SIFT, Perspective Correction, Model Comparison Visualization, Camera Calibration, Label Visualization, Image Threshold, Line Counter Visualization, Roboflow Custom Metadata, Blur Visualization, Keypoint Visualization, OCR Model, Contrast Equalization, Instance Segmentation Model, Corner Visualization, Google Gemini, Icon Visualization, Ellipse Visualization, Halo Visualization, Image Contours, Polygon Visualization, VLM as Detector, Multi-Label Classification Model, Stitch Images, Clip Comparison, VLM as Classifier, Anthropic Claude, Absolute Static Crop
- outputs: Velocity, Cache Set, Single-Label Classification Model, Image Preprocessing, Mask Visualization, Object Detection Model, VLM as Detector, CLIP Embedding Model, Image Convert Grayscale, Dominant Color, Color Visualization, Line Counter, Image Slicer, Multi-Label Classification Model, Llama 3.2 Vision, Cache Get, Path Deviation, LMM For Classification, Segment Anything 2 Model, Email Notification, Continue If, Qwen2.5-VL, CSV Formatter, Overlap Filter, Image Blur, Background Color Visualization, Instance Segmentation Model, Stability AI Inpainting, Classification Label Visualization, OpenAI, QR Code Generator, LMM, Clip Comparison, Slack Notification, Detections Classes Replacement, Model Monitoring Inference Aggregator, Identify Changes, Google Vision OCR, YOLO-World Model, Camera Focus, Local File Sink, SIFT Comparison, Stability AI Image Generation, Bounding Box Visualization, Gaze Detection, Crop Visualization, Identify Outliers, Data Aggregator, Relative Static Crop, Circle Visualization, Label Visualization, Camera Calibration, Roboflow Custom Metadata, Contrast Equalization, Moondream2, SmolVLM2, Corner Visualization, Google Gemini, Icon Visualization, Ellipse Visualization, Dynamic Zone, Image Contours, Polygon Visualization, Detections Combine, Stitch Images, Dimension Collapse, Detections Filter, First Non Empty Or Default, QR Code Detection, Anthropic Claude, Pixel Color Count, Absolute Static Crop, Webhook Sink, Cosine Similarity, Detections Merge, Depth Estimation, Trace Visualization, Perception Encoder Embedding Model, Polygon Zone Visualization, Bounding Rectangle, Triangle Visualization, Delta Filter, Florence-2 Model, Byte Tracker, Stitch OCR Detections, Pixelate Visualization, VLM as Classifier, EasyOCR, Template Matching, Buffer, Barcode Detection, Grid Visualization, JSON Parser, Roboflow Dataset Upload, Morphological Transformation, Property Definition, Keypoint Detection Model, Size Measurement, CogVLM, Expression, Twilio SMS Notification, Time in Zone, Dynamic Crop, PTZ Tracking (ONVIF), Dot Visualization, Reference Path Visualization, Distance Measurement, Rate Limiter, Detections Consensus, Stability AI Outpainting, SIFT, Perspective Correction, Model Comparison Visualization, Image Threshold, Line Counter Visualization, Blur Visualization, Detection Offset, Keypoint Visualization, OCR Model, Detections Transformation, Detections Stabilizer, Halo Visualization, Detections Stitch
Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds LMM in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Holds an unconstrained text prompt for the LMM.
  - lmm_type (string): Type of LMM to be used.
  - remote_api_key (Union[string, secret]): Holds the API key required to call the LMM. In the current state of development, an OpenAI key is required when lmm_type=gpt_4v.
- output
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary.
  - raw_output (string): String value.
  - * (*): Equivalent of any element.
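To show how the bindings fit together, here is a sketch of a complete workflow definition built as a Python dict. It assumes the Roboflow Workflows definition layout (`version`, `inputs`, `steps`, `outputs`), the `WorkflowImage`, `WorkflowParameter` and `JsonField` entry types, and the `$steps.<step_name>.<output>` selector form. The step name `lmm` and the input and output names are hypothetical; check all of these against the Workflows documentation for your `inference` version.

```python
import json

# Hypothetical names: the step is called "lmm"; inputs are "image" and "openai_api_key".
STEP_NAME = "lmm"

workflow = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "openai_api_key"},
    ],
    "steps": [
        {
            "name": STEP_NAME,
            "type": "roboflow_core/lmm@v1",
            "images": "$inputs.image",                      # input binding: image
            "prompt": "How many cats are in the picture?",  # input binding: string
            "lmm_type": "gpt_4v",                           # input binding: string
            "remote_api_key": "$inputs.openai_api_key",     # input binding: string or secret
            "json_output": {"count": "number of cats in the picture"},
        }
    ],
    "outputs": [
        # Output bindings are read with $steps.<step_name>.<output_name>.
        {"type": "JsonField", "name": "structured", "selector": f"$steps.{STEP_NAME}.structured_output"},
        {"type": "JsonField", "name": "raw", "selector": f"$steps.{STEP_NAME}.raw_output"},
    ],
}

print(json.dumps(workflow, indent=2))
```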
Example JSON definition of step LMM in version v1:
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {
        "count": "number of cats in the picture"
    }
}
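Downstream code may want to confirm that every field requested in `json_output` came back before using it. This is a hedged sketch: the helper `missing_fields` is hypothetical, and it assumes that `structured_output` is keyed by the `json_output` field names, which this page does not state explicitly. The sample result is made up.

```python
from typing import Any, Dict, List

# json_output from the example step above: requested field name -> description.
JSON_OUTPUT: Dict[str, str] = {"count": "number of cats in the picture"}


def missing_fields(structured_output: Dict[str, Any], requested: Dict[str, str]) -> List[str]:
    """Return the requested json_output keys that are absent from structured_output.

    Hypothetical helper: it ASSUMES structured_output is keyed by the json_output field names.
    """
    return [key for key in requested if key not in structured_output]


# Made-up step result used only for illustration; real values come from the model.
sample_result: Dict[str, Any] = {"structured_output": {"count": "2"}}

problems = missing_fields(sample_result["structured_output"], JSON_OUTPUT)
if problems:
    print("model did not return:", problems)
else:
    print("count =", sample_result["structured_output"]["count"])
```

Because an LMM's answer is free-form, checking for missing keys before use is a cheap guard against a reply that ignored part of the request.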