LMM
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports the following LMM:
- OpenAI's GPT-4 with Vision.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier
Use the following identifier in the step "type" field: roboflow_core/lmm@v1 to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Holds unconstrained text prompt to LMM model. | ✅ |
| lmm_type | str | Type of LMM to be used. | ✅ |
| lmm_config | LMMConfig | Configuration of LMM. | ❌ |
| remote_api_key | str | Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v. | ✅ |
| json_output | Dict[str, str] | Holds dictionary that maps name of requested output field into its description. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
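As a rough illustration of what the Refs column means in practice, the sketch below assembles an LMM step as a Python dictionary and rejects dynamic references on properties marked ❌. The helper names `is_selector` and `build_lmm_step` are hypothetical, and the assumption that dynamic references are strings starting with `$inputs.` or `$steps.` is inferred from the `$inputs.image` value used in the example step definition; it is not a check performed by the library itself.

```python
from typing import Any, Dict

# Properties marked ✅ in the Refs column of the table above.
REF_ENABLED = {"prompt", "lmm_type", "remote_api_key"}


def is_selector(value: Any) -> bool:
    """Return True for strings that look like workflow selectors (assumed prefixes)."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def build_lmm_step(name: str, **properties: Any) -> Dict[str, Any]:
    """Assemble an LMM step, refusing selectors on properties marked ❌."""
    if is_selector(name):
        raise ValueError("'name' does not accept dynamic references")
    step: Dict[str, Any] = {
        "name": name,
        "type": "roboflow_core/lmm@v1",
        "images": "$inputs.image",
    }
    for key, value in properties.items():
        if is_selector(value) and key not in REF_ENABLED:
            raise ValueError(f"'{key}' does not accept dynamic references")
        step[key] = value
    return step


step = build_lmm_step(
    "cat_counter",
    prompt="$inputs.prompt",              # ✅ may be bound at runtime
    lmm_type="gpt_4v",                    # ✅ a literal value is also fine
    remote_api_key="$inputs.openai_key",  # ✅ keeps the secret out of the definition
    json_output={"count": "number of cats in the picture"},  # ❌ literal only
)
```

Passing `lmm_config="$inputs.config"` to this helper raises `ValueError`, mirroring the ❌ mark on that property.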
Available Connections
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs: Stitch OCR Detections, Image Threshold, Stitch Images, OpenAI, Mask Visualization, Circle Visualization, EasyOCR, Crop Visualization, Multi-Label Classification Model, Stability AI Outpainting, QR Code Generator, Text Display, Anthropic Claude, Relative Static Crop, Clip Comparison, OpenAI, Local File Sink, Stability AI Image Generation, Google Gemini, Image Slicer, Keypoint Detection Model, VLM As Detector, VLM As Classifier, Google Gemini, Qwen3.5-VL, Object Detection Model, Slack Notification, Florence-2 Model, Ellipse Visualization, Dot Visualization, Halo Visualization, Anthropic Claude, Model Comparison Visualization, OpenAI, Corner Visualization, Email Notification, Absolute Static Crop, Image Contours, Classification Label Visualization, Instance Segmentation Model, Roboflow Dataset Upload, Depth Estimation, Contrast Equalization, Image Slicer, Camera Focus, Label Visualization, Stitch OCR Detections, Llama 3.2 Vision, Grid Visualization, Background Subtraction, Camera Focus, Polygon Zone Visualization, Color Visualization, Heatmap Visualization, SIFT, OpenAI, CogVLM, Florence-2 Model, Model Monitoring Inference Aggregator, Bounding Box Visualization, Polygon Visualization, Roboflow Dataset Upload, Pixelate Visualization, Roboflow Custom Metadata, Image Blur, SIFT Comparison, Morphological Transformation, Dynamic Crop, Stability AI Inpainting, Background Color Visualization, Webhook Sink, LMM, Line Counter Visualization, Icon Visualization, Image Preprocessing, Twilio SMS Notification, Blur Visualization, CSV Formatter, Triangle Visualization, Google Vision OCR, Polygon Visualization, Google Gemini, OCR Model, Anthropic Claude, Trace Visualization, Email Notification, Twilio SMS/MMS Notification, Image Convert Grayscale, Reference Path Visualization, Single-Label Classification Model, Halo Visualization, Camera Calibration, LMM For Classification, Perspective Correction, Keypoint Visualization
- outputs: Moondream2, Image Threshold, Stitch Images, Byte Tracker, Size Measurement, Keypoint Detection Model, Path Deviation, QR Code Generator, VLM As Classifier, Google Gemini, Qwen3.5-VL, Slack Notification, OpenAI, Motion Detection, Email Notification, Rate Limiter, Instance Segmentation Model, Roboflow Dataset Upload, Depth Estimation, Label Visualization, Polygon Zone Visualization, Camera Focus, OpenAI, Dimension Collapse, Template Matching, Background Color Visualization, Object Detection Model, Clip Comparison, SIFT Comparison, Image Preprocessing, CSV Formatter, Twilio SMS/MMS Notification, Reference Path Visualization, LMM For Classification, Florence-2 Model, Perspective Correction, Stitch OCR Detections, OpenAI, Time in Zone, Circle Visualization, Seg Preview, Stability AI Outpainting, Text Display, Line Counter, Detections Combine, Local File Sink, Google Gemini, Image Slicer, Ellipse Visualization, Byte Tracker, Halo Visualization, Model Comparison Visualization, Absolute Static Crop, Classification Label Visualization, Dominant Color, Image Slicer, Detections Stitch, Camera Focus, Barcode Detection, Time in Zone, Background Subtraction, Cosine Similarity, SAM 3, Heatmap Visualization, SIFT, CogVLM, Line Counter, Cache Set, Bounding Box Visualization, Polygon Visualization, Roboflow Custom Metadata, Pixelate Visualization, Pixel Color Count, SIFT Comparison, Detections Classes Replacement, Morphological Transformation, Perception Encoder Embedding Model, Detection Offset, Detection Event Log, VLM As Classifier, SmolVLM2, Twilio SMS Notification, Google Gemini, Detections Merge, Velocity, VLM As Detector, Keypoint Visualization, Multi-Label Classification Model, Mask Visualization, Instance Segmentation Model, Crop Visualization, Detections Stabilizer, Clip Comparison, Continue If, Property Definition, Segment Anything 2 Model, Stability AI Image Generation, VLM As Detector, Overlap Filter, Object Detection Model, Dot Visualization, Detections List Roll-Up, Contrast Equalization, Cache Get, Stitch OCR Detections, Llama 3.2 Vision, Detections Filter, Color Visualization, Florence-2 Model, Model Monitoring Inference Aggregator, JSON Parser, Dynamic Crop, Line Counter Visualization, PTZ Tracking (ONVIF), Blur Visualization, Triangle Visualization, Gaze Detection, OCR Model, Trace Visualization, Email Notification, CLIP Embedding Model, Byte Tracker, Image Convert Grayscale, First Non Empty Or Default, Expression, YOLO-World Model, Single-Label Classification Model, EasyOCR, Detections Consensus, Multi-Label Classification Model, SAM 3, Detections Transformation, Anthropic Claude, Path Deviation, QR Code Detection, Relative Static Crop, OpenAI, Keypoint Detection Model, Bounding Rectangle, Distance Measurement, Anthropic Claude, Corner Visualization, Buffer, Identify Outliers, Image Contours, Grid Visualization, Qwen2.5-VL, Identify Changes, Roboflow Dataset Upload, Image Blur, Webhook Sink, Stability AI Inpainting, LMM, Icon Visualization, Qwen3-VL, Google Vision OCR, Data Aggregator, Polygon Visualization, Anthropic Claude, Mask Area Measurement, SAM 3, Time in Zone, Single-Label Classification Model, Delta Filter, Dynamic Zone, Halo Visualization, Camera Calibration
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds
LMM in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - prompt (string): Holds unconstrained text prompt to LMM model.
    - lmm_type (string): Type of LMM to be used.
    - remote_api_key (Union[secret, string]): Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v.
- output
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - image (image_metadata): Dictionary with image metadata required by supervision.
    - structured_output (dictionary): Dictionary.
    - raw_output (string): String value.
    - * (*): Equivalent of any element.
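The output bindings expose both a parsed dictionary (structured_output) and the model's raw text (raw_output). The sketch below shows one way a consumer might read them, preferring the dictionary and falling back to parsing the raw string as JSON. The function name `read_lmm_fields` is hypothetical, and the assumption that a step result arrives as a plain dictionary keyed by output name is made for illustration only.

```python
import json
from typing import Any, Dict


def read_lmm_fields(step_result: Dict[str, Any]) -> Dict[str, Any]:
    """Prefer structured_output; otherwise try to parse raw_output as a JSON object."""
    structured = step_result.get("structured_output")
    if isinstance(structured, dict) and structured:
        return structured
    raw = step_result.get("raw_output") or ""
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # The model answered in free text, so there is nothing structured to return.
        return {}
    return parsed if isinstance(parsed, dict) else {}
```

Returning an empty dictionary on failure keeps downstream code simple, at the cost of hiding the difference between "no fields requested" and "unparseable answer"; a stricter variant could raise instead.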
Example JSON definition of step LMM in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {
        "count": "number of cats in the picture"
    }
}
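To show where such a step sits, the sketch below embeds it in a complete workflow definition built as a Python dictionary. The surrounding keys (`version`, `inputs`, `steps`, `outputs`) and the type names `WorkflowImage`, `WorkflowParameter`, and `JsonField` are assumptions about the workflow definition format and are not stated on this page; the step name `cat_counter` and the input name `openai_api_key` are illustrative. The API key is passed as a runtime input rather than hardcoded, which the ✅ mark on remote_api_key permits.

```python
import json

# The step from the example above, with the key replaced by a runtime reference.
lmm_step = {
    "name": "cat_counter",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "How many cats are in the picture?",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200,
    },
    "remote_api_key": "$inputs.openai_api_key",
    "json_output": {"count": "number of cats in the picture"},
}

# Assumed overall layout of a workflow definition.
workflow_definition = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "openai_api_key"},
    ],
    "steps": [lmm_step],
    "outputs": [
        {
            "type": "JsonField",
            "name": "cat_count",
            "selector": "$steps.cat_counter.structured_output",
        },
        {
            "type": "JsonField",
            "name": "raw_answer",
            "selector": "$steps.cat_counter.raw_output",
        },
    ],
}

# Serialise to JSON, the form in which definitions are usually stored or sent.
print(json.dumps(workflow_definition, indent=2))
```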