LMM
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports the following LMM:
- OpenAI's GPT-4 with Vision.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier
Use the following identifier in the step "type" field: roboflow_core/lmm@v1 to add the block as a step in your workflow.
Properties
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
prompt | str | Holds unconstrained text prompt to LMM model. | ✅ |
lmm_type | str | Type of LMM to be used. | ✅ |
lmm_config | LMMConfig | Configuration of LMM. | ❌ |
remote_api_key | str | Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v. | ✅ |
json_output | Dict[str, str] | Holds dictionary that maps name of requested output field into its description. | ❌ |
The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
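As a minimal sketch of what the Refs column means in practice, the Python snippet below builds a step definition in which properties marked ✅ are given as selectors instead of literal values. The `$inputs.image` selector form is taken from the example JSON definition on this page; the input names `prompt` and `api_key` and the helper `build_lmm_step` are assumptions made for illustration and are not part of the inference package.

```python
import json

# Properties the table marks with ✅ in the Refs column.
REF_CAPABLE = {"prompt", "lmm_type", "remote_api_key"}


def build_lmm_step(name, **properties):
    """Build an LMM step dict, rejecting selectors on properties without Refs support.

    Hypothetical helper: any string value starting with "$" is treated as a selector.
    """
    step = {"name": name, "type": "roboflow_core/lmm@v1", "images": "$inputs.image"}
    for key, value in properties.items():
        is_selector = isinstance(value, str) and value.startswith("$")
        if is_selector and key not in REF_CAPABLE:
            raise ValueError(f"'{key}' cannot be parametrised with a selector")
        step[key] = value
    return step


step = build_lmm_step(
    "lmm_step",
    prompt="$inputs.prompt",           # hypothetical workflow input
    lmm_type="gpt_4v",                 # literal value
    remote_api_key="$inputs.api_key",  # hypothetical workflow input
    json_output={"count": "number of cats in the picture"},
)
print(json.dumps(step, indent=2))
```

Passing the API key as a runtime value rather than a literal keeps the secret out of the stored definition, which is one reason a property might be worth parametrising.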
Available Connections
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs: Stability AI Outpainting, CogVLM, Image Slicer, Pixelate Visualization, Webhook Sink, Image Threshold, Blur Visualization, VLM as Classifier, Twilio SMS Notification, Image Slicer, Image Blur, Camera Calibration, Morphological Transformation, OpenAI, Polygon Visualization, Dot Visualization, Florence-2 Model, Single-Label Classification Model, Background Color Visualization, VLM as Detector, Stability AI Image Generation, Multi-Label Classification Model, Roboflow Dataset Upload, Corner Visualization, Halo Visualization, CSV Formatter, Mask Visualization, Trace Visualization, OCR Model, Model Monitoring Inference Aggregator, Camera Focus, Color Visualization, Clip Comparison, LMM, Ellipse Visualization, Model Comparison Visualization, Triangle Visualization, Image Preprocessing, Anthropic Claude, Roboflow Custom Metadata, SIFT, Depth Estimation, Email Notification, Dynamic Crop, Line Counter Visualization, Crop Visualization, Image Contours, Google Gemini, Grid Visualization, Contrast Equalization, Instance Segmentation Model, Google Vision OCR, OpenAI, SIFT Comparison, Classification Label Visualization, Roboflow Dataset Upload, Stitch Images, Keypoint Visualization, Absolute Static Crop, OpenAI, Perspective Correction, Polygon Zone Visualization, Keypoint Detection Model, Llama 3.2 Vision, Stitch OCR Detections, Image Convert Grayscale, QR Code Generator, Local File Sink, Circle Visualization, Slack Notification, Icon Visualization, Object Detection Model, Stability AI Inpainting, Florence-2 Model, Bounding Box Visualization, LMM For Classification, EasyOCR, Label Visualization, Reference Path Visualization, Relative Static Crop
- outputs: Size Measurement, CogVLM, Image Slicer, Pixelate Visualization, Webhook Sink, Clip Comparison, Image Threshold, SmolVLM2, Twilio SMS Notification, Image Slicer, Moondream2, Image Blur, Detections Consensus, Morphological Transformation, Polygon Visualization, Single-Label Classification Model, Florence-2 Model, Time in Zone, Single-Label Classification Model, Background Color Visualization, VLM as Detector, Stability AI Image Generation, Overlap Filter, Line Counter, Cache Set, Detections Merge, Detections Classes Replacement, Trace Visualization, Qwen2.5-VL, LMM, Detections Stitch, Instance Segmentation Model, Ellipse Visualization, Velocity, Object Detection Model, First Non Empty Or Default, Time in Zone, Triangle Visualization, Image Preprocessing, Anthropic Claude, Template Matching, Keypoint Detection Model, SIFT, Depth Estimation, Email Notification, Gaze Detection, Property Definition, VLM as Detector, Crop Visualization, Image Contours, Segment Anything 2 Model, Contrast Equalization, Instance Segmentation Model, Google Vision OCR, Time in Zone, OpenAI, SIFT Comparison, Detections Combine, Barcode Detection, Roboflow Dataset Upload, CLIP Embedding Model, Classification Label Visualization, Keypoint Visualization, OpenAI, Perspective Correction, Pixel Color Count, Byte Tracker, Bounding Rectangle, Polygon Zone Visualization, Llama 3.2 Vision, Dynamic Zone, Image Convert Grayscale, Detections Stabilizer, Circle Visualization, Icon Visualization, Stability AI Inpainting, Detections Transformation, Path Deviation, Dominant Color, Rate Limiter, EasyOCR, Google Gemini, Relative Static Crop, Expression, Buffer, Stability AI Outpainting, VLM as Classifier, SIFT Comparison, Blur Visualization, Cache Get, VLM as Classifier, Byte Tracker, Camera Calibration, Delta Filter, OpenAI, Dot Visualization, PTZ Tracking (ONVIF), Identify Changes, Distance Measurement, Multi-Label Classification Model, Roboflow Dataset Upload, Corner Visualization, Halo Visualization, CSV Formatter, Mask Visualization, OCR Model, Model Monitoring Inference Aggregator, Camera Focus, Color Visualization, Clip Comparison, Identify Outliers, Continue If, Cosine Similarity, Model Comparison Visualization, Roboflow Custom Metadata, Data Aggregator, Detections Filter, Line Counter Visualization, Dynamic Crop, Multi-Label Classification Model, Dimension Collapse, Perception Encoder Embedding Model, Grid Visualization, QR Code Detection, Stitch Images, Byte Tracker, Absolute Static Crop, Path Deviation, Keypoint Detection Model, Stitch OCR Detections, YOLO-World Model, QR Code Generator, Local File Sink, Slack Notification, Object Detection Model, Florence-2 Model, Bounding Box Visualization, LMM For Classification, Detection Offset, JSON Parser, Label Visualization, Reference Path Visualization, Line Counter
Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds LMM in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Holds unconstrained text prompt to LMM model.
  - lmm_type (string): Type of LMM to be used.
  - remote_api_key (Union[secret, string]): Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v.
- output
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary.
  - raw_output (string): String value.
  - * (*): Equivalent of any element.
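To illustrate how the outputs might be consumed, here is a hedged sketch that checks a step result against the fields requested in json_output. It assumes the keys of structured_output mirror the names requested in json_output, which this page does not state explicitly. The `sample_result` dictionary is made up for illustration (it is not real model output), and `missing_fields` is a hypothetical helper.

```python
def missing_fields(json_output, step_result):
    """Return requested field names that are absent from structured_output."""
    structured = step_result.get("structured_output") or {}
    return [field for field in json_output if field not in structured]


# Field descriptions as they would be passed in the json_output property.
json_output = {"count": "number of cats in the picture"}

# Made-up result shaped after the output bindings listed above.
sample_result = {
    "raw_output": '{"count": "2"}',
    "structured_output": {"count": "2"},
}

print(missing_fields(json_output, sample_result))  # → []
```

A non-empty return value would suggest falling back to raw_output, which holds the model's response as a plain string.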
Example JSON definition of step LMM in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {
        "count": "number of cats in the picture"
    }
}
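A step definition only runs as part of a full workflow. The sketch below wraps the step above in a workflow definition and serialises it to JSON. The top-level keys (`version`, `inputs`, `steps`, `outputs`) and the `WorkflowImage`, `WorkflowParameter` and `JsonField` type names are not stated on this page; they are assumptions based on the general Workflows definition format and should be checked against the Workflows documentation. The names `lmm_step`, `api_key`, `answer` and `raw` are placeholders.

```python
import json

# The step from the example above, with the API key supplied at runtime.
lmm_step = {
    "name": "lmm_step",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200,
    },
    "remote_api_key": "$inputs.api_key",
    "json_output": {"count": "number of cats in the picture"},
}

# Assumed workflow envelope: declared inputs, the step list, and exposed outputs.
workflow = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "api_key"},
    ],
    "steps": [lmm_step],
    "outputs": [
        {"type": "JsonField", "name": "answer", "selector": "$steps.lmm_step.structured_output"},
        {"type": "JsonField", "name": "raw", "selector": "$steps.lmm_step.raw_output"},
    ],
}

workflow_json = json.dumps(workflow, indent=2)
print(workflow_json)
```

The output selectors refer to structured_output and raw_output, two of the outputs listed in the bindings section.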