LMM¶
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to the LMMBlock.
The LMMBlock supports the following LMMs:
- OpenAI's GPT-4 with Vision;
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/lmm@v1` to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `prompt` | `str` | Holds unconstrained text prompt to LMM model. | ✅ |
| `lmm_type` | `str` | Type of LMM to be used. | ✅ |
| `lmm_config` | `LMMConfig` | Configuration of LMM. | ❌ |
| `remote_api_key` | `str` | Holds API key required to call the LMM model - in the current state of development, we require an OpenAI key when `lmm_type=gpt_4v`. | ✅ |
| `json_output` | `Dict[str, str]` | Holds dictionary that maps name of requested output field into its description. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
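For instance, `json_output` defines the shape of the step's `structured_output`: each key names a requested output field and each value describes, in natural language, what the model should put there. A minimal sketch (the field names below are illustrative, not required by the block):

```python
# json_output maps requested field names to natural-language descriptions
# that guide the model; the keys here are purely illustrative.
json_output = {
    "count": "number of cats in the picture",
    "breed": "most likely breed of the largest cat",
}

# The step is expected to parse the model's reply into a dictionary whose
# keys mirror json_output (simulated here with placeholder answers).
structured_output = {field: "<model answer>" for field in json_output}
```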
Available Connections¶
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs:
Clip Comparison,Florence-2 Model,Morphological Transformation,Google Gemini,LMM,Instance Segmentation Model,Polygon Zone Visualization,Email Notification,Keypoint Visualization,Roboflow Custom Metadata,Camera Focus,Anthropic Claude,Multi-Label Classification Model,Image Threshold,LMM For Classification,Keypoint Detection Model,Anthropic Claude,Email Notification,Reference Path Visualization,Stitch OCR Detections,Camera Focus,Image Slicer,Stability AI Image Generation,Stability AI Outpainting,Stitch Images,Blur Visualization,OpenAI,Roboflow Dataset Upload,Depth Estimation,Google Gemini,CogVLM,Image Preprocessing,Local File Sink,Florence-2 Model,Image Convert Grayscale,Dynamic Crop,Dot Visualization,Triangle Visualization,OCR Model,Crop Visualization,Twilio SMS Notification,Perspective Correction,Twilio SMS/MMS Notification,EasyOCR,Grid Visualization,Google Gemini,Trace Visualization,QR Code Generator,Pixelate Visualization,OpenAI,Camera Calibration,Roboflow Dataset Upload,Webhook Sink,Single-Label Classification Model,Object Detection Model,VLM as Detector,Background Subtraction,Bounding Box Visualization,Contrast Equalization,Halo Visualization,Model Comparison Visualization,Label Visualization,Slack Notification,OpenAI,Circle Visualization,Image Contours,Background Color Visualization,Image Blur,Mask Visualization,VLM as Classifier,Google Vision OCR,Llama 3.2 Vision,Color Visualization,Corner Visualization,Classification Label Visualization,OpenAI,Line Counter Visualization,Ellipse Visualization,Icon Visualization,Model Monitoring Inference Aggregator,Image Slicer,Absolute Static Crop,Polygon Visualization,SIFT Comparison,Stability AI Inpainting,Relative Static Crop,SIFT,CSV Formatter,Text Display
- outputs:
Clip Comparison,Morphological Transformation,Email Notification,Motion Detection,Detections Stitch,Anthropic Claude,Pixel Color Count,Detections Merge,Keypoint Detection Model,Reference Path Visualization,Stitch OCR Detections,Camera Focus,Expression,Stability AI Image Generation,Stitch Images,Stability AI Outpainting,Time in Zone,Rate Limiter,Bounding Rectangle,Roboflow Dataset Upload,Depth Estimation,Detections Transformation,CogVLM,Local File Sink,Identify Outliers,JSON Parser,SAM 3,Dynamic Crop,Time in Zone,Perception Encoder Embedding Model,Moondream2,Dot Visualization,Triangle Visualization,Cosine Similarity,Crop Visualization,PTZ Tracking (ONVIF),Twilio SMS Notification,Twilio SMS/MMS Notification,Perspective Correction,EasyOCR,Dimension Collapse,First Non Empty Or Default,Pixelate Visualization,Detections Consensus,OpenAI,Roboflow Dataset Upload,Buffer,Single-Label Classification Model,Object Detection Model,Barcode Detection,SIFT Comparison,Cache Set,Contrast Equalization,Byte Tracker,Halo Visualization,Model Comparison Visualization,Slack Notification,Byte Tracker,Dynamic Zone,Cache Get,Qwen2.5-VL,Image Contours,Image Blur,Background Color Visualization,Mask Visualization,Google Vision OCR,Path Deviation,Corner Visualization,Color Visualization,Clip Comparison,Template Matching,Line Counter Visualization,Icon Visualization,Ellipse Visualization,Velocity,Image Slicer,Detections Stabilizer,Absolute Static Crop,Stability AI Inpainting,SAM 3,Distance Measurement,Relative Static Crop,SIFT,CSV Formatter,Detections Filter,Blur Visualization,Multi-Label Classification Model,Instance Segmentation Model,Florence-2 Model,Google Gemini,LMM,Instance Segmentation Model,Polygon Zone Visualization,Keypoint Visualization,Roboflow Custom Metadata,Camera Focus,Multi-Label Classification Model,Detection Offset,Image Threshold,LMM For Classification,Anthropic Claude,Delta Filter,Email Notification,Gaze Detection,Overlap Filter,Property Definition,Image Slicer,SmolVLM2,OpenAI,Detection Event Log,YOLO-World Model,Google Gemini,Image Preprocessing,Florence-2 Model,VLM as Detector,Image Convert Grayscale,Time in Zone,Byte Tracker,OCR Model,Seg Preview,Path Deviation,Continue If,SAM 3,Detections List Roll-Up,Grid Visualization,Google Gemini,Line Counter,Object Detection Model,Trace Visualization,QR Code Generator,CLIP Embedding Model,Camera Calibration,Webhook Sink,QR Code Detection,VLM as Detector,Data Aggregator,Background Subtraction,Bounding Box Visualization,Label Visualization,OpenAI,Circle Visualization,VLM as Classifier,Dominant Color,Size Measurement,Llama 3.2 Vision,Classification Label Visualization,Single-Label Classification Model,Segment Anything 2 Model,OpenAI,Detections Combine,Detections Classes Replacement,Model Monitoring Inference Aggregator,Line Counter,VLM as Classifier,Polygon Visualization,SIFT Comparison,Keypoint Detection Model,Qwen3-VL,Identify Changes,Text Display
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
LMM in version v1 has.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `prompt` (`string`): Holds unconstrained text prompt to LMM model.
    - `lmm_type` (`string`): Type of LMM to be used.
    - `remote_api_key` (`Union[string, secret]`): Holds API key required to call the LMM model - in the current state of development, we require an OpenAI key when `lmm_type=gpt_4v`.
- output
    - `parent_id` (`parent_id`): Identifier of parent for step output.
    - `root_parent_id` (`parent_id`): Identifier of parent for step output.
    - `image` (`image_metadata`): Dictionary with image metadata required by supervision.
    - `structured_output` (`dictionary`): Dictionary.
    - `raw_output` (`string`): String value.
    - `*` (`*`): Equivalent of any element.
Example JSON definition of step LMM in version v1
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {
        "count": "number of cats in the picture"
    }
}
```
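To see the step in context, here is a minimal sketch of a complete workflow specification embedding it, run against a locally hosted inference server via `inference_sdk`. The step, input, and output names (`lmm_step`, `open_ai_key`, `result`) are illustrative, and the client call assumes a server on `localhost:9001`; check the `inference_sdk` documentation for the exact client API.

```python
import os

# Illustrative workflow specification wiring the LMM step's
# structured_output to a workflow output via standard selectors
# ($inputs.*, $steps.<step_name>.<field>).
WORKFLOW_SPECIFICATION = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "open_ai_key"},
    ],
    "steps": [
        {
            "name": "lmm_step",
            "type": "roboflow_core/lmm@v1",
            "images": "$inputs.image",
            "prompt": "How many cats are in this picture?",
            "lmm_type": "gpt_4v",
            "remote_api_key": "$inputs.open_ai_key",
            "json_output": {"count": "number of cats in the picture"},
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.lmm_step.structured_output",
        }
    ],
}


def run_on_image(image_path: str) -> list:
    """Execute the workflow on a locally running inference server.

    Assumes `pip install inference-sdk` and a server started on
    localhost:9001 (e.g. via `inference server start`).
    """
    from inference_sdk import InferenceHTTPClient

    client = InferenceHTTPClient(
        api_url="http://localhost:9001",
        api_key=os.environ.get("ROBOFLOW_API_KEY"),
    )
    return client.run_workflow(
        specification=WORKFLOW_SPECIFICATION,
        images={"image": image_path},
        parameters={"open_ai_key": os.environ.get("OPENAI_API_KEY")},
    )
```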