LMM¶
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports two LMMs:
- OpenAI's GPT-4 with Vision;
- CogVLM.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
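To make the prompt and structured-output relationship concrete, here is a minimal, hypothetical step configuration (the step name, prompt, and json_output fields are illustrative, not defaults of the block). When json_output is set, the step is expected to return a structured_output dictionary keyed by the requested fields, alongside the raw model response in raw_output:

```python
# Hypothetical LMM step configuration, shown as a Python dict mirroring the JSON
# step definition. Field names in json_output are chosen purely for illustration.
lmm_step = {
    "type": "roboflow_core/lmm@v1",
    "name": "animal_counter",                     # hypothetical step name
    "images": "$inputs.image",
    "prompt": "Count the animals visible in the picture.",
    "lmm_type": "gpt_4v",
    "remote_api_key": "$inputs.openai_api_key",   # hypothetical workflow input
    "json_output": {
        # requested output field -> description of what the model should return
        "cats": "number of cats in the picture",
        "dogs": "number of dogs in the picture",
    },
}
# Assumed shape of the step outputs:
#   structured_output -> {"cats": "...", "dogs": "..."}
#   raw_output        -> the unparsed text returned by the model
```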
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/lmm@v1 to add the block as a step in your workflow.
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
prompt | str | Holds unconstrained text prompt to LMM model. | ✅ |
lmm_type | str | Type of LMM to be used. | ✅ |
lmm_config | LMMConfig | Configuration of LMM. | ❌ |
remote_api_key | str | Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v. | ✅ |
json_output | Dict[str, str] | Holds dictionary that maps name of requested output field into its description. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
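As an illustration of the Refs column, properties marked ✅ (prompt, lmm_type, remote_api_key) can be bound to workflow inputs instead of literals, while ❌ properties must be literal values. The sketch below assumes hypothetical input names:

```python
# Properties marked ✅ can take selectors that resolve at runtime
# (the $inputs.* names below are hypothetical).
parametrised_fields = {
    "prompt": "$inputs.prompt",                  # dynamic: bound to a workflow parameter
    "lmm_type": "$inputs.lmm_type",              # dynamic: e.g. "gpt_4v" supplied at runtime
    "remote_api_key": "$inputs.openai_api_key",  # dynamic: keeps the secret out of the spec
    "name": "my_lmm",                            # ❌ static only: must be a literal
    "json_output": {"count": "number of cats in the picture"},  # ❌ static only
}
```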
Available Connections¶
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs: Circle Visualization, Polygon Zone Visualization, Image Slicer, Reference Path Visualization, Image Contours, Dynamic Crop, Blur Visualization, Stability AI Inpainting, SIFT Comparison, OpenAI, LMM For Classification, Instance Segmentation Model, Absolute Static Crop, Trace Visualization, Classification Label Visualization, Crop Visualization, Camera Calibration, Stitch Images, Line Counter Visualization, Email Notification, Corner Visualization, VLM as Classifier, Google Vision OCR, Multi-Label Classification Model, Perspective Correction, Object Detection Model, Triangle Visualization, Image Blur, Clip Comparison, Webhook Sink, Twilio SMS Notification, VLM as Detector, Image Preprocessing, CSV Formatter, Pixelate Visualization, Anthropic Claude, Background Color Visualization, Roboflow Custom Metadata, Polygon Visualization, Stability AI Outpainting, Grid Visualization, CogVLM, Color Visualization, Google Gemini, Model Monitoring Inference Aggregator, Relative Static Crop, Bounding Box Visualization, Llama 3.2 Vision, Halo Visualization, Florence-2 Model, Image Convert Grayscale, Camera Focus, OpenAI, Ellipse Visualization, Single-Label Classification Model, OpenAI, Mask Visualization, Roboflow Dataset Upload, Model Comparison Visualization, Dot Visualization, Stability AI Image Generation, Depth Estimation, Keypoint Visualization, LMM, Keypoint Detection Model, OCR Model, Stitch OCR Detections, Image Threshold, Roboflow Dataset Upload, Florence-2 Model, SIFT, Local File Sink, Label Visualization, Image Slicer, Slack Notification
- outputs: Detections Consensus, Circle Visualization, Single-Label Classification Model, Image Contours, Dynamic Crop, Size Measurement, Distance Measurement, VLM as Detector, Perception Encoder Embedding Model, Detections Filter, Gaze Detection, OpenAI, Absolute Static Crop, Property Definition, Crop Visualization, Segment Anything 2 Model, Camera Calibration, Stitch Images, Corner Visualization, Time in Zone, VLM as Classifier, Google Vision OCR, Dimension Collapse, Triangle Visualization, Pixel Color Count, Cache Get, Clip Comparison, Twilio SMS Notification, Image Preprocessing, CSV Formatter, Pixelate Visualization, Keypoint Detection Model, Buffer, Background Color Visualization, Polygon Visualization, Stability AI Outpainting, Identify Changes, Color Visualization, Dominant Color, Delta Filter, Model Monitoring Inference Aggregator, Continue If, Relative Static Crop, SmolVLM2, YOLO-World Model, Halo Visualization, Camera Focus, Template Matching, Clip Comparison, OpenAI, Ellipse Visualization, OpenAI, First Non Empty Or Default, Dynamic Zone, Detections Transformation, Byte Tracker, Keypoint Visualization, Barcode Detection, Time in Zone, Roboflow Dataset Upload, Local File Sink, Image Threshold, SIFT, Detections Merge, Label Visualization, Image Slicer, Cosine Similarity, Detections Stitch, Polygon Zone Visualization, Reference Path Visualization, Image Slicer, Qwen2.5-VL, Blur Visualization, Detection Offset, Cache Set, Stability AI Inpainting, SIFT Comparison, LMM For Classification, Instance Segmentation Model, Trace Visualization, Classification Label Visualization, Line Counter Visualization, Email Notification, CLIP Embedding Model, Path Deviation, Line Counter, Moondream2, Multi-Label Classification Model, Perspective Correction, Object Detection Model, Path Deviation, Image Blur, Detections Classes Replacement, Webhook Sink, VLM as Detector, Anthropic Claude, Expression, JSON Parser, Roboflow Custom Metadata, Byte Tracker, Grid Visualization, CogVLM, Google Gemini, Bounding Rectangle, Bounding Box Visualization, Llama 3.2 Vision, Identify Outliers, Multi-Label Classification Model, Rate Limiter, Florence-2 Model, Image Convert Grayscale, QR Code Detection, Single-Label Classification Model, Data Aggregator, Mask Visualization, Roboflow Dataset Upload, Object Detection Model, Model Comparison Visualization, Line Counter, Dot Visualization, Stability AI Image Generation, Depth Estimation, Instance Segmentation Model, Velocity, Byte Tracker, LMM, VLM as Classifier, Keypoint Detection Model, OCR Model, Stitch OCR Detections, SIFT Comparison, Florence-2 Model, PTZ Tracking (ONVIF), Detections Stabilizer, Overlap Filter, Slack Notification
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds LMM in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - prompt (string): Holds unconstrained text prompt to LMM model.
    - lmm_type (string): Type of LMM to be used.
    - remote_api_key (Union[secret, string]): Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v.
- output
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - image (image_metadata): Dictionary with image metadata required by supervision.
    - structured_output (dictionary): Dictionary.
    - raw_output (string): String value.
    - * (*): Equivalent of any element.
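In practice, downstream steps and workflow outputs reference these bindings through selectors built from the step name. A brief sketch (the step name "my_lmm" and the output names are hypothetical, and the standard JsonField output declaration is assumed):

```python
# Hypothetical workflow outputs consuming this step's bindings via selectors.
workflow_outputs = [
    {"type": "JsonField", "name": "structured", "selector": "$steps.my_lmm.structured_output"},
    {"type": "JsonField", "name": "raw_answer", "selector": "$steps.my_lmm.raw_output"},
]
```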
Example JSON definition of step LMM in version v1:

```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {
        "count": "number of cats in the picture"
    }
}
```
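One way to execute a workflow containing this step is through the inference_sdk HTTP client. The sketch below is not from this page; it assumes a reachable inference server, placeholder API keys and image path, and mirrors the step definition above:

```python
from inference_sdk import InferenceHTTPClient

# Placeholder values - replace with your server URL, API keys and image path.
client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="<ROBOFLOW_API_KEY>",
)

# Minimal workflow specification embedding the LMM step defined above.
specification = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "openai_api_key"},
    ],
    "steps": [
        {
            "name": "my_lmm",
            "type": "roboflow_core/lmm@v1",
            "images": "$inputs.image",
            "prompt": "my prompt",
            "lmm_type": "gpt_4v",
            "remote_api_key": "$inputs.openai_api_key",
            "json_output": {"count": "number of cats in the picture"},
        }
    ],
    "outputs": [
        {"type": "JsonField", "name": "result", "selector": "$steps.my_lmm.structured_output"}
    ],
}

# run_workflow accepts an ad-hoc specification plus named images and parameters
# (usage assumed from inference_sdk; check your installed version).
result = client.run_workflow(
    specification=specification,
    images={"image": "path/to/image.jpg"},
    parameters={"openai_api_key": "<OPENAI_API_KEY>"},
)
print(result)
```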