LMM¶
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports two LMMs:
- OpenAI's GPT-4 with Vision;
- CogVLM.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/lmm@v1 to add the block as a step in your workflow.
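For illustration, a sketch of how this identifier appears inside a step entry; the step name my_lmm_step is a placeholder, and the remaining properties of the step are listed in the table below:
{
  "name": "my_lmm_step",
  "type": "roboflow_core/lmm@v1"
}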
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
prompt | str | Holds unconstrained text prompt to the LMM model. | ✅ |
lmm_type | str | Type of LMM to be used. | ✅ |
lmm_config | LMMConfig | Configuration of LMM. | ❌ |
remote_api_key | str | Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v. | ✅ |
json_output | Dict[str, str] | Holds dictionary that maps name of requested output field into its description. | ❌ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
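For example, the ✅-marked properties can be bound to workflow inputs with selectors instead of literal values; the input names used here (prompt, openai_api_key) are illustrative placeholders:
{
  "name": "my_lmm_step",
  "type": "roboflow_core/lmm@v1",
  "images": "$inputs.image",
  "prompt": "$inputs.prompt",
  "lmm_type": "gpt_4v",
  "remote_api_key": "$inputs.openai_api_key"
}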
Available Connections¶
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs: Model Monitoring Inference Aggregator, Bounding Box Visualization, Llama 3.2 Vision, Twilio SMS Notification, Stability AI Outpainting, Image Threshold, Model Comparison Visualization, SIFT Comparison, LMM, Image Slicer, Corner Visualization, Background Color Visualization, Image Contours, CogVLM, Mask Visualization, QR Code Generator, Classification Label Visualization, Trace Visualization, Polygon Visualization, Perspective Correction, Florence-2 Model, Local File Sink, Grid Visualization, Clip Comparison, Instance Segmentation Model, Image Convert Grayscale, LMM For Classification, Dot Visualization, Google Gemini, Relative Static Crop, Ellipse Visualization, Keypoint Detection Model, Object Detection Model, Halo Visualization, Polygon Zone Visualization, Icon Visualization, Triangle Visualization, Crop Visualization, Slack Notification, Pixelate Visualization, CSV Formatter, Stitch Images, SIFT, Color Visualization, Stitch OCR Detections, Single-Label Classification Model, Email Notification, Blur Visualization, Anthropic Claude, Camera Focus, Absolute Static Crop, Label Visualization, Florence-2 Model, Line Counter Visualization, Multi-Label Classification Model, Reference Path Visualization, Camera Calibration, Image Blur, Dynamic Crop, OpenAI, VLM as Classifier, Roboflow Dataset Upload, Circle Visualization, Webhook Sink, OCR Model, Image Slicer, Depth Estimation, OpenAI, OpenAI, Image Preprocessing, Stability AI Inpainting, Keypoint Visualization, Roboflow Dataset Upload, Stability AI Image Generation, VLM as Detector, Roboflow Custom Metadata, Google Vision OCR
- outputs: Llama 3.2 Vision, Bounding Box Visualization, Twilio SMS Notification, Stability AI Outpainting, Overlap Filter, Model Comparison Visualization, Image Slicer, Background Color Visualization, Mask Visualization, QR Code Generator, Classification Label Visualization, Buffer, Polygon Visualization, Instance Segmentation Model, Florence-2 Model, Local File Sink, Grid Visualization, Clip Comparison, Instance Segmentation Model, LMM For Classification, Dot Visualization, Pixel Color Count, Keypoint Detection Model, Halo Visualization, VLM as Detector, Icon Visualization, Crop Visualization, Slack Notification, Pixelate Visualization, CLIP Embedding Model, Data Aggregator, Single-Label Classification Model, SIFT, Stitch OCR Detections, QR Code Detection, JSON Parser, Cosine Similarity, SmolVLM2, Camera Focus, Florence-2 Model, Line Counter Visualization, Dynamic Crop, Byte Tracker, Roboflow Dataset Upload, Segment Anything 2 Model, Cache Set, Webhook Sink, YOLO-World Model, Identify Outliers, Byte Tracker, Image Slicer, OpenAI, Image Preprocessing, Stability AI Inpainting, Keypoint Visualization, Identify Changes, Barcode Detection, Multi-Label Classification Model, Stability AI Image Generation, Detections Filter, Model Monitoring Inference Aggregator, Keypoint Detection Model, Detections Merge, First Non Empty Or Default, Moondream2, Image Threshold, Detections Stabilizer, SIFT Comparison, LMM, Gaze Detection, Corner Visualization, Distance Measurement, Dimension Collapse, Time in Zone, Image Contours, CogVLM, Rate Limiter, Property Definition, Detections Transformation, Trace Visualization, Perspective Correction, Path Deviation, Continue If, SIFT Comparison, Byte Tracker, Image Convert Grayscale, PTZ Tracking (ONVIF), Line Counter, Google Gemini, Ellipse Visualization, Object Detection Model, Relative Static Crop, Dynamic Zone, Time in Zone, Polygon Zone Visualization, Triangle Visualization, Size Measurement, Single-Label Classification Model, CSV Formatter, Path Deviation, Stitch Images, Color Visualization, Email Notification, Blur Visualization, VLM as Classifier, Expression, Anthropic Claude, Absolute Static Crop, Label Visualization, Detections Consensus, Detection Offset, Cache Get, Multi-Label Classification Model, Reference Path Visualization, Velocity, Camera Calibration, Image Blur, Qwen2.5-VL, OpenAI, VLM as Classifier, Circle Visualization, Template Matching, Dominant Color, Delta Filter, OCR Model, Depth Estimation, Bounding Rectangle, OpenAI, Line Counter, Detections Classes Replacement, Roboflow Dataset Upload, Clip Comparison, Object Detection Model, VLM as Detector, Detections Stitch, Perception Encoder Embedding Model, Roboflow Custom Metadata, Google Vision OCR
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds LMM in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Holds unconstrained text prompt to the LMM model.
  - lmm_type (string): Type of LMM to be used.
  - remote_api_key (Union[string, secret]): Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v.
- output
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary.
  - raw_output (string): String value.
  - * (*): Equivalent of any element.
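As an example of consuming these outputs, a workflow's outputs section can expose them through step selectors; the step name my_lmm_step and the output name below are placeholders, assuming the standard JsonField output entry of a workflow definition:
{
  "type": "JsonField",
  "name": "lmm_structured_output",
  "selector": "$steps.my_lmm_step.structured_output"
}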
Example JSON definition of step LMM in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/lmm@v1",
"images": "$inputs.image",
"prompt": "my prompt",
"lmm_type": "gpt_4v",
"lmm_config": {
"gpt_image_detail": "low",
"gpt_model_version": "gpt-4o",
"max_tokens": 200
},
"remote_api_key": "xxx-xxx",
"json_output": {
"count": "number of cats in the picture"
}
}
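For context, a sketch of a complete workflow specification embedding this step might look as follows; the input names, step name, and output name are illustrative assumptions rather than required values:
{
  "version": "1.0",
  "inputs": [
    {"type": "WorkflowImage", "name": "image"},
    {"type": "WorkflowParameter", "name": "openai_api_key"}
  ],
  "steps": [
    {
      "name": "my_lmm_step",
      "type": "roboflow_core/lmm@v1",
      "images": "$inputs.image",
      "prompt": "my prompt",
      "lmm_type": "gpt_4v",
      "remote_api_key": "$inputs.openai_api_key",
      "json_output": {
        "count": "number of cats in the picture"
      }
    }
  ],
  "outputs": [
    {
      "type": "JsonField",
      "name": "result",
      "selector": "$steps.my_lmm_step.structured_output"
    }
  ]
}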