LMM
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.

You can specify arbitrary text prompts to an LMMBlock.

The LMMBlock supports two LMMs:

- OpenAI's GPT-4 with Vision;
- CogVLM.

You need to provide your OpenAI API key to use the GPT-4 with Vision model.

If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier

Use the following identifier in the step "type" field: roboflow_core/lmm@v1 to add the block as a step in your workflow.
Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Holds unconstrained text prompt to LMM model. | ✅ |
| lmm_type | str | Type of LMM to be used. | ✅ |
| lmm_config | LMMConfig | Configuration of LMM. | ❌ |
| remote_api_key | str | Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v. | ✅ |
| json_output | Dict[str, str] | Holds dictionary that maps name of requested output field into its description. | ❌ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
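For illustration, the properties marked ✅ can be bound to workflow inputs (or other steps' outputs) instead of being given literal values. A minimal sketch of such a step definition, expressed as a Python dictionary so it can be embedded directly in a workflow specification; the step name my_lmm and the input names prompt and open_ai_key are illustrative, not prescribed by the block:

```python
# Hypothetical step definition: 'prompt' and 'remote_api_key' are bound to
# workflow inputs (Refs ✅), while 'lmm_config' and 'json_output' must be
# provided as literal values (Refs ❌).
lmm_step = {
    "type": "roboflow_core/lmm@v1",
    "name": "my_lmm",                          # unique step identifier
    "images": "$inputs.image",                 # selector pointing at a workflow image input
    "prompt": "$inputs.prompt",                # bound to a workflow parameter
    "lmm_type": "gpt_4v",
    "remote_api_key": "$inputs.open_ai_key",   # bound to a workflow parameter
    "json_output": {"count": "number of cats in the picture"},
}
```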
Available Connections

Compatible Blocks

Check what blocks you can connect to LMM in version v1.
- inputs: Image Contours, Stability AI Inpainting, Corner Visualization, CSV Formatter, Google Gemini, Line Counter Visualization, Reference Path Visualization, Keypoint Detection Model, Model Monitoring Inference Aggregator, Florence-2 Model, Circle Visualization, OCR Model, Llama 3.2 Vision, Relative Static Crop, Roboflow Dataset Upload, Image Convert Grayscale, Pixelate Visualization, Model Comparison Visualization, Trace Visualization, LMM, Twilio SMS Notification, Roboflow Dataset Upload, Depth Estimation, Label Visualization, Classification Label Visualization, Blur Visualization, OpenAI, Color Visualization, Bounding Box Visualization, Anthropic Claude, Ellipse Visualization, Instance Segmentation Model, Polygon Zone Visualization, Object Detection Model, Roboflow Custom Metadata, Image Slicer, Image Slicer, Crop Visualization, Perspective Correction, Halo Visualization, Dot Visualization, Mask Visualization, Keypoint Visualization, Local File Sink, Absolute Static Crop, Stitch OCR Detections, Image Blur, OpenAI, VLM as Classifier, Clip Comparison, Triangle Visualization, Background Color Visualization, SIFT Comparison, Florence-2 Model, Camera Calibration, Google Vision OCR, Image Threshold, Single-Label Classification Model, Image Preprocessing, OpenAI, CogVLM, Slack Notification, VLM as Detector, Stability AI Image Generation, SIFT, Grid Visualization, Camera Focus, Stitch Images, Stability AI Outpainting, Polygon Visualization, Multi-Label Classification Model, Webhook Sink, Dynamic Crop, Email Notification, LMM For Classification
- outputs: Image Contours, Line Counter, Stability AI Inpainting, Corner Visualization, CSV Formatter, Google Gemini, Model Monitoring Inference Aggregator, Florence-2 Model, Expression, YOLO-World Model, SIFT Comparison, Property Definition, JSON Parser, PTZ Tracking (ONVIF), Image Convert Grayscale, Byte Tracker, Pixelate Visualization, Dominant Color, Model Comparison Visualization, Trace Visualization, Detections Stitch, Path Deviation, Twilio SMS Notification, Velocity, Bounding Rectangle, Label Visualization, Classification Label Visualization, Blur Visualization, Moondream2, Bounding Box Visualization, Template Matching, Anthropic Claude, Pixel Color Count, Detection Offset, Polygon Zone Visualization, VLM as Classifier, Gaze Detection, Image Slicer, Image Slicer, Cache Get, Crop Visualization, Perspective Correction, Dot Visualization, QR Code Detection, Mask Visualization, Local File Sink, Absolute Static Crop, Image Blur, OpenAI, Path Deviation, Detections Merge, Distance Measurement, Triangle Visualization, Background Color Visualization, CLIP Embedding Model, SIFT Comparison, Florence-2 Model, Camera Calibration, Image Threshold, Single-Label Classification Model, Buffer, OpenAI, Rate Limiter, CogVLM, Keypoint Detection Model, SIFT, Grid Visualization, Stitch Images, Identify Outliers, Stability AI Outpainting, Size Measurement, Polygon Visualization, Barcode Detection, Byte Tracker, Delta Filter, LMM For Classification, Detections Consensus, Time in Zone, Line Counter Visualization, Reference Path Visualization, Keypoint Detection Model, Continue If, Circle Visualization, Llama 3.2 Vision, OCR Model, Relative Static Crop, Dimension Collapse, Roboflow Dataset Upload, Dynamic Zone, Detections Classes Replacement, Multi-Label Classification Model, Object Detection Model, LMM, Roboflow Dataset Upload, Depth Estimation, Perception Encoder Embedding Model, OpenAI, Color Visualization, Instance Segmentation Model, Ellipse Visualization, Detections Transformation, Object Detection Model, Roboflow Custom Metadata, VLM as Detector, Halo Visualization, Cache Set, Keypoint Visualization, Detections Stabilizer, SmolVLM2, Line Counter, Stitch OCR Detections, Clip Comparison, Cosine Similarity, VLM as Classifier, Qwen2.5-VL, Clip Comparison, Byte Tracker, Identify Changes, Instance Segmentation Model, Data Aggregator, Overlap Filter, Segment Anything 2 Model, Google Vision OCR, Image Preprocessing, Slack Notification, VLM as Detector, Stability AI Image Generation, First Non Empty Or Default, Camera Focus, Multi-Label Classification Model, Webhook Sink, Dynamic Crop, Time in Zone, Email Notification, Single-Label Classification Model, Detections Filter
Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds LMM in version v1 has.
Bindings

- input
  - images (image): The image to infer on.
  - prompt (string): Holds unconstrained text prompt to LMM model.
  - lmm_type (string): Type of LMM to be used.
  - remote_api_key (Union[secret, string]): Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v.
- output
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary.
  - raw_output (string): String value.
  - * (*): Equivalent of any element.
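The output bindings above are what you reference when exposing this step's results in a workflow's outputs section. A minimal sketch, assuming a step named my_lmm as in the earlier example and using the JsonField output type and $steps.&lt;step_name&gt;.&lt;field&gt; selector syntax of the Workflows specification format:

```python
# Hypothetical 'outputs' section of a workflow specification; it exposes the
# structured and raw responses of a step named "my_lmm".
workflow_outputs = [
    {"type": "JsonField", "name": "structured", "selector": "$steps.my_lmm.structured_output"},
    {"type": "JsonField", "name": "raw", "selector": "$steps.my_lmm.raw_output"},
]
```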
Example JSON definition of step LMM in version v1

```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {
        "count": "number of cats in the picture"
    }
}
```
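To run a workflow containing this step, one option is the Python inference_sdk HTTP client. Below is a minimal sketch, assuming that InferenceHTTPClient.run_workflow accepts an inline specification together with images and parameters dictionaries; the API URL, keys, image path, and all input/step names are placeholders, not values prescribed by the block.

```python
# A minimal sketch, assuming the inference_sdk HTTP client can execute an
# inline workflow specification that embeds the LMM step shown above.
from inference_sdk import InferenceHTTPClient

specification = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "open_ai_key"},
    ],
    "steps": [
        {
            "type": "roboflow_core/lmm@v1",
            "name": "my_lmm",
            "images": "$inputs.image",
            "prompt": "How many cats are in the picture?",
            "lmm_type": "gpt_4v",
            "remote_api_key": "$inputs.open_ai_key",
            "json_output": {"count": "number of cats in the picture"},
        }
    ],
    "outputs": [
        {"type": "JsonField", "name": "raw", "selector": "$steps.my_lmm.raw_output"},
        {"type": "JsonField", "name": "structured", "selector": "$steps.my_lmm.structured_output"},
    ],
}

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",   # or the URL of a self-hosted inference server
    api_key="<ROBOFLOW_API_KEY>",            # placeholder
)
result = client.run_workflow(
    specification=specification,
    images={"image": "path/to/image.jpg"},               # placeholder image
    parameters={"open_ai_key": "<OPENAI_API_KEY>"},      # placeholder OpenAI key
)
print(result)
```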