LMM¶
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports two LMMs:
- OpenAI's GPT-4 with Vision;
- CogVLM.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier¶
Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/lmm@v1
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
prompt | str | Holds unconstrained text prompt to LMM model. | ✅ |
lmm_type | str | Type of LMM to be used. | ✅ |
lmm_config | LMMConfig | Configuration of LMM. | ❌ |
remote_api_key | str | Holds API key required to call LMM model - in the current state of development, an OpenAI key is required when lmm_type=gpt_4v. | ✅ |
json_output | Dict[str, str] | Holds dictionary that maps the name of a requested output field to its description. | ❌ |
The Refs column marks the possibility to parametrise the property with dynamic values available at workflow runtime. See Bindings for more info.
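For example, the sketch below (illustrative Python, not generated from the block's schema) shows how ✅-marked properties such as prompt and remote_api_key can be given either literal values or selectors that are resolved at runtime; the input names image and api_key are assumptions about the surrounding workflow, not requirements of the block.

```python
# Illustrative sketch: the LMM step expressed as a Python dict.
# Properties marked ✅ in the table above accept selectors (e.g. "$inputs.api_key")
# in place of literals; "image" and "api_key" are assumed workflow inputs.
lmm_step = {
    "type": "roboflow_core/lmm@v1",
    "name": "lmm",
    "images": "$inputs.image",            # bound to a workflow image input
    "prompt": "Describe what is in the picture",  # a static literal is also valid
    "lmm_type": "gpt_4v",
    "remote_api_key": "$inputs.api_key",  # keeps the OpenAI key out of the definition
    "json_output": {"count": "number of cats in the picture"},
}
```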
Available Connections¶
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs: Image Slicer, Stability AI Inpainting, Clip Comparison, Perspective Correction, Object Detection Model, Roboflow Custom Metadata, SIFT Comparison, Grid Visualization, Ellipse Visualization, SIFT, CogVLM, VLM as Detector, Image Contours, OpenAI, Absolute Static Crop, Camera Focus, Polygon Visualization, Trace Visualization, Multi-Label Classification Model, Dot Visualization, Google Vision OCR, Polygon Zone Visualization, Roboflow Dataset Upload, Classification Label Visualization, Corner Visualization, Llama 3.2 Vision, Dynamic Crop, Reference Path Visualization, Label Visualization, Mask Visualization, Triangle Visualization, Line Counter Visualization, Model Monitoring Inference Aggregator, Blur Visualization, Anthropic Claude, Webhook Sink, Instance Segmentation Model, Slack Notification, Stitch OCR Detections, Pixelate Visualization, Relative Static Crop, Twilio SMS Notification, VLM as Classifier, Roboflow Dataset Upload, Google Gemini, Model Comparison Visualization, Halo Visualization, Crop Visualization, Image Blur, Circle Visualization, Keypoint Detection Model, Image Preprocessing, Background Color Visualization, Florence-2 Model, Bounding Box Visualization, Florence-2 Model, Local File Sink, Image Slicer, LMM For Classification, Stitch Images, Stability AI Image Generation, Image Threshold, OCR Model, LMM, Keypoint Visualization, Email Notification, Color Visualization, Single-Label Classification Model, CSV Formatter, Image Convert Grayscale, OpenAI
- outputs: Cache Set, Object Detection Model, Object Detection Model, Detection Offset, CogVLM, Ellipse Visualization, Grid Visualization, SIFT, Camera Focus, Property Definition, CLIP Embedding Model, Dot Visualization, Google Vision OCR, Clip Comparison, Identify Changes, Polygon Zone Visualization, Gaze Detection, Classification Label Visualization, Corner Visualization, Dynamic Crop, Label Visualization, Detections Stabilizer, Triangle Visualization, Dynamic Zone, Dominant Color, Time in Zone, Barcode Detection, Blur Visualization, Line Counter, Instance Segmentation Model, Webhook Sink, Cosine Similarity, Path Deviation, Relative Static Crop, Detections Consensus, Twilio SMS Notification, Crop Visualization, Qwen2.5-VL, Distance Measurement, Circle Visualization, Velocity, Keypoint Detection Model, QR Code Detection, Size Measurement, Single-Label Classification Model, Bounding Box Visualization, LMM For Classification, Image Threshold, Detections Stitch, OCR Model, Keypoint Visualization, Single-Label Classification Model, Detections Classes Replacement, Polygon Visualization, Segment Anything 2 Model, Cache Get, Image Slicer, Stability AI Inpainting, Clip Comparison, Perspective Correction, Roboflow Custom Metadata, SIFT Comparison, VLM as Detector, Image Contours, Multi-Label Classification Model, OpenAI, Absolute Static Crop, Trace Visualization, Multi-Label Classification Model, VLM as Detector, Identify Outliers, Roboflow Dataset Upload, First Non Empty Or Default, VLM as Classifier, Llama 3.2 Vision, Byte Tracker, Line Counter, Reference Path Visualization, Mask Visualization, Line Counter Visualization, Template Matching, Detections Transformation, Model Monitoring Inference Aggregator, Anthropic Claude, SIFT Comparison, Time in Zone, Instance Segmentation Model, Slack Notification, Detections Filter, Stitch OCR Detections, Pixelate Visualization, Delta Filter, Dimension Collapse, VLM as Classifier, Roboflow Dataset Upload, Keypoint Detection Model, Google Gemini, Rate Limiter, Model Comparison Visualization, Halo Visualization, JSON Parser, Byte Tracker, Expression, Image Blur, Buffer, Image Preprocessing, Background Color Visualization, Continue If, Bounding Rectangle, Pixel Color Count, Florence-2 Model, Florence-2 Model, Local File Sink, Byte Tracker, Image Slicer, Stitch Images, Stability AI Image Generation, LMM, Email Notification, Color Visualization, Path Deviation, YOLO-World Model, Data Aggregator, CSV Formatter, Image Convert Grayscale, OpenAI
Input and Output Bindings¶
The available connections depend on its binding kinds. Check what binding kinds
LMM
in version v1
has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Holds unconstrained text prompt to LMM model.
  - lmm_type (string): Type of LMM to be used.
  - remote_api_key (Union[secret, string]): Holds API key required to call LMM model - in the current state of development, an OpenAI key is required when lmm_type=gpt_4v.
- output
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary.
  - raw_output (string): String value.
  - * (*): Equivalent of any element.
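To show how these bindings are wired end to end, below is a hedged sketch of a complete workflow definition (the names image, api_key and lmm are illustrative, not required by the block). The step's structured_output and raw_output bindings are exposed as workflow outputs through JsonField selectors.

```python
# Hedged sketch of a full workflow definition embedding the LMM step.
# Input/output declarations follow the general Workflows definition format;
# verify field names against your version of the Workflows documentation.
workflow_definition = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "api_key"},
    ],
    "steps": [
        {
            "type": "roboflow_core/lmm@v1",
            "name": "lmm",
            "images": "$inputs.image",
            "prompt": "Count the cats in the picture",
            "lmm_type": "gpt_4v",
            "remote_api_key": "$inputs.api_key",
            "json_output": {"count": "number of cats in the picture"},
        }
    ],
    "outputs": [
        # structured_output holds the fields requested via json_output,
        # raw_output holds the unparsed model response
        {"type": "JsonField", "name": "structured", "selector": "$steps.lmm.structured_output"},
        {"type": "JsonField", "name": "raw", "selector": "$steps.lmm.raw_output"},
    ],
}
```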
Example JSON definition of step LMM in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/lmm@v1",
"images": "$inputs.image",
"prompt": "my prompt",
"lmm_type": "gpt_4v",
"lmm_config": {
"gpt_image_detail": "low",
"gpt_model_version": "gpt-4o",
"max_tokens": 200
},
"remote_api_key": "xxx-xxx",
"json_output": {
"count": "number of cats in the picture"
}
}
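Assuming a definition like the workflow_definition dictionary sketched in the Bindings section above, the workflow can be executed against a running inference server with the inference_sdk HTTP client, roughly as follows. Method and argument names here are assumptions based on the client's workflow-execution API and should be checked against your installed version.

```python
# Hedged usage sketch: executing the workflow via the inference_sdk client.
# api_key below is the Roboflow API key; the OpenAI key is passed separately
# as the "api_key" workflow parameter bound to remote_api_key.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="http://localhost:9001",    # assumed local inference server
    api_key="<ROBOFLOW_API_KEY>",
)
result = client.run_workflow(
    specification=workflow_definition,  # the sketch from the Bindings section
    images={"image": "path/to/image.jpg"},
    parameters={"api_key": "<OPENAI_API_KEY>"},
)
print(result)
```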