LMM¶
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports the following LMM:
- OpenAI's GPT-4 with Vision.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/lmm@v1 to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `prompt` | `str` | Holds the unconstrained text prompt sent to the LMM model. | ✅ |
| `lmm_type` | `str` | Type of LMM to be used. | ✅ |
| `lmm_config` | `LMMConfig` | Configuration of LMM. | ❌ |
| `remote_api_key` | `str` | Holds the API key required to call the LMM model. In the current state of development, an OpenAI key is required when `lmm_type=gpt_4v`. | ✅ |
| `json_output` | `Dict[str, str]` | Holds a dictionary that maps the name of each requested output field to its description. | ❌ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
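To make the Refs column concrete, here is a minimal sketch that checks a step definition against the table above. It assumes that dynamic values are written as selector strings beginning with `$inputs.` or `$steps.` (only `$inputs.image` appears on this page; the `$steps.` prefix is an assumption). The helper names `is_selector`, `bound_properties`, and `misplaced_selectors` are hypothetical and not part of any library.

```python
# Properties marked ✅ (may hold a selector) and ❌ (literal only) in the table above.
REF_ALLOWED = {"prompt", "lmm_type", "remote_api_key"}
LITERAL_ONLY = {"name", "lmm_config", "json_output"}


def is_selector(value):
    """Return True for strings that look like workflow selectors.

    Assumption: selectors start with "$inputs." or "$steps.".
    """
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def bound_properties(step):
    """List the ✅ properties that are actually bound to a selector."""
    return sorted(k for k, v in step.items() if k in REF_ALLOWED and is_selector(v))


def misplaced_selectors(step):
    """List the ❌ properties that were (incorrectly) given a selector."""
    return sorted(k for k, v in step.items() if k in LITERAL_ONLY and is_selector(v))


step = {
    "name": "lmm_step",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "$inputs.prompt",            # bound at runtime
    "lmm_type": "gpt_4v",                  # ✅ property given as a literal
    "remote_api_key": "$inputs.api_key",   # bound at runtime
    "json_output": {"count": "number of cats in the picture"},  # literal only
}

print(bound_properties(step))
print(misplaced_selectors(step))
```

A check like this is only a convenience for catching mistakes early; the workflow engine performs its own validation.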
Available Connections¶
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs:
Corner Visualization, Image Convert Grayscale, Label Visualization, Image Slicer, Image Blur, CSV Formatter, SIFT Comparison, Florence-2 Model, Google Gemini, OCR Model, Ellipse Visualization, Halo Visualization, Single-Label Classification Model, Webhook Sink, Contrast Equalization, Stability AI Outpainting, Camera Focus, Model Comparison Visualization, Stitch Images, Polygon Visualization, Object Detection Model, Stability AI Inpainting, Reference Path Visualization, OpenAI, OpenAI, Slack Notification, Circle Visualization, Background Subtraction, Stability AI Image Generation, Roboflow Dataset Upload, Icon Visualization, LMM For Classification, VLM as Classifier, Twilio SMS/MMS Notification, Model Monitoring Inference Aggregator, Color Visualization, Clip Comparison, Mask Visualization, Roboflow Dataset Upload, Anthropic Claude, Image Slicer, Pixelate Visualization, OpenAI, Email Notification, Image Contours, Google Gemini, Text Display, Blur Visualization, Stitch OCR Detections, Roboflow Custom Metadata, Triangle Visualization, Google Vision OCR, Relative Static Crop, Camera Focus, Classification Label Visualization, Multi-Label Classification Model, Image Threshold, LMM, Camera Calibration, Dot Visualization, Anthropic Claude, Background Color Visualization, Stitch OCR Detections, Polygon Zone Visualization, Keypoint Visualization, Grid Visualization, Dynamic Crop, Anthropic Claude, Keypoint Detection Model, Trace Visualization, Crop Visualization, Absolute Static Crop, Line Counter Visualization, Florence-2 Model, Google Gemini, Twilio SMS Notification, Image Preprocessing, Instance Segmentation Model, SIFT, Perspective Correction, Email Notification, Halo Visualization, EasyOCR, Local File Sink, Depth Estimation, CogVLM, Morphological Transformation, Polygon Visualization, OpenAI, QR Code Generator, Llama 3.2 Vision, VLM as Detector, Bounding Box Visualization
- outputs:
Halo Visualization, Perception Encoder Embedding Model, Camera Focus, Detection Offset, Detection Event Log, Stability AI Inpainting, Reference Path Visualization, Slack Notification, Circle Visualization, Stability AI Image Generation, Roboflow Dataset Upload, LMM For Classification, YOLO-World Model, Cache Get, Clip Comparison, Pixelate Visualization, Byte Tracker, CLIP Embedding Model, Email Notification, Image Contours, VLM as Detector, Camera Focus, Byte Tracker, LMM, Dot Visualization, Dimension Collapse, Trace Visualization, Crop Visualization, Absolute Static Crop, Google Gemini, Segment Anything 2 Model, Byte Tracker, Image Preprocessing, Instance Segmentation Model, Perspective Correction, Cosine Similarity, Halo Visualization, SIFT Comparison, Depth Estimation, OpenAI, Continue If, QR Code Generator, Cache Set, Bounding Box Visualization, Corner Visualization, Data Aggregator, Label Visualization, Florence-2 Model, SIFT Comparison, Google Gemini, OCR Model, Contrast Equalization, Stability AI Outpainting, Qwen3-VL, Stitch Images, Distance Measurement, Polygon Visualization, Detections Stabilizer, OpenAI, Path Deviation, Twilio SMS/MMS Notification, SAM 3, Mask Visualization, Time in Zone, Image Slicer, Template Matching, OpenAI, Dynamic Zone, Roboflow Custom Metadata, Triangle Visualization, Google Vision OCR, Detections Combine, Image Threshold, PTZ Tracking (ONVIF), Time in Zone, Polygon Zone Visualization, Keypoint Detection Model, SAM 3, Line Counter Visualization, Florence-2 Model, Detections Stitch, Moondream2, Twilio SMS Notification, SIFT, Morphological Transformation, Single-Label Classification Model, Velocity, VLM as Detector, Image Convert Grayscale, Image Slicer, SmolVLM2, Image Blur, Ellipse Visualization, Line Counter, OpenAI, Background Subtraction, VLM as Classifier, Pixel Color Count, Detections Merge, Barcode Detection, Rate Limiter, Anthropic Claude, Line Counter, Buffer, JSON Parser, Stitch OCR Detections, Relative Static Crop, Detections Consensus, Delta Filter, Multi-Label Classification Model, Anthropic Claude, Stitch OCR Detections, Qwen2.5-VL, Dominant Color, Keypoint Visualization, Anthropic Claude, Detections Transformation, First Non Empty Or Default, Expression, Overlap Filter, Gaze Detection, Identify Changes, Email Notification, Motion Detection, VLM as Classifier, Path Deviation, Local File Sink, EasyOCR, CogVLM, Polygon Visualization, QR Code Detection, Property Definition, Size Measurement, Clip Comparison, CSV Formatter, Webhook Sink, Single-Label Classification Model, Model Comparison Visualization, Detections Filter, Object Detection Model, Detections List Roll-Up, Icon Visualization, Model Monitoring Inference Aggregator, Object Detection Model, Multi-Label Classification Model, Color Visualization, Roboflow Dataset Upload, Detections Classes Replacement, Bounding Rectangle, Instance Segmentation Model, Keypoint Detection Model, Google Gemini, Text Display, Blur Visualization, Identify Outliers, SAM 3, Classification Label Visualization, Camera Calibration, Background Color Visualization, Seg Preview, Grid Visualization, Dynamic Crop, Time in Zone, Llama 3.2 Vision
Input and Output Bindings¶
The connections available to a block depend on its binding kinds. Check which binding kinds
LMM in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - prompt (string): Holds unconstrained text prompt to LMM mode.
    - lmm_type (string): Type of LMM to be used.
    - remote_api_key (Union[string, secret]): Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v.
- output
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - image (image_metadata): Dictionary with image metadata required by supervision.
    - structured_output (dictionary): Dictionary.
    - raw_output (string): String value.
    - * (*): Equivalent of any element.
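The output bindings above suggest that downstream code can read requested fields from `structured_output` and fall back to `raw_output` when no structured result is present. The sketch below illustrates that idea only. The shape of the result dictionary is an assumption based on the output names listed above, the sample values are hand-written for illustration (not real model output), and `read_field` is a hypothetical helper.

```python
def read_field(step_result, field, default=None):
    """Return one requested field from a step result, or a default.

    Assumption: `step_result` is a dict keyed by the output names listed
    above, with `structured_output` being a dict or None.
    """
    structured = step_result.get("structured_output") or {}
    return structured.get(field, default)


# Hypothetical results, written by hand for illustration.
with_structure = {
    "raw_output": '{"count": "2"}',
    "structured_output": {"count": "2"},
}
without_structure = {
    "raw_output": "I can see two cats.",
    "structured_output": None,
}

print(read_field(with_structure, "count"))
print(read_field(without_structure, "count", default="unknown"))
```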
Example JSON definition of step LMM in version v1
    {
        "name": "<your_step_name_here>",
        "type": "roboflow_core/lmm@v1",
        "images": "$inputs.image",
        "prompt": "my prompt",
        "lmm_type": "gpt_4v",
        "lmm_config": {
            "gpt_image_detail": "low",
            "gpt_model_version": "gpt-4o",
            "max_tokens": 200
        },
        "remote_api_key": "xxx-xxx",
        "json_output": {
            "count": "number of cats in the picture"
        }
    }
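A step definition like the one above normally sits inside a larger workflow definition. The following sketch shows one way to assemble that in Python and serialise it to JSON. The surrounding structure (the `version`, `inputs`, `steps`, and `outputs` keys, and the `WorkflowImage`, `WorkflowParameter`, and `JsonField` type names) is an assumption about the Workflows definition format and is not described on this page; the step name `cat_counter`, the input name `openai_api_key`, and the helper `undeclared_inputs` are hypothetical.

```python
import json

# The LMM step, with the API key bound to a runtime input instead of a literal.
lmm_step = {
    "name": "cat_counter",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "How many cats are in the picture?",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200,
    },
    "remote_api_key": "$inputs.openai_api_key",
    "json_output": {"count": "number of cats in the picture"},
}

# Assumed enclosing workflow definition (format not confirmed by this page).
workflow = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "openai_api_key"},
    ],
    "steps": [lmm_step],
    "outputs": [
        {
            "type": "JsonField",
            "name": "cats",
            "selector": "$steps.cat_counter.structured_output",
        },
    ],
}


def undeclared_inputs(definition):
    """Return input names referenced by steps but missing from `inputs`."""
    declared = {entry["name"] for entry in definition["inputs"]}
    used = set()
    for step in definition["steps"]:
        for value in step.values():
            if isinstance(value, str) and value.startswith("$inputs."):
                used.add(value.split(".", 1)[1])
    return sorted(used - declared)


print(json.dumps(workflow, indent=2))
print(undeclared_inputs(workflow))
```

Binding `remote_api_key` to an input keeps the secret out of the stored definition, which the ✅ in the Refs column permits.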