LMM
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports the following LMM:
- OpenAI's GPT-4 with Vision.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier
Use the following identifier in the step "type" field: roboflow_core/lmm@v1 to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `prompt` | `str` | Holds unconstrained text prompt to LMM mode. | ✅ |
| `lmm_type` | `str` | Type of LMM to be used. | ✅ |
| `lmm_config` | `LMMConfig` | Configuration of LMM. | ❌ |
| `remote_api_key` | `str` | Holds API key required to call LMM model - in current state of development, we require OpenAI key when `lmm_type=gpt_4v`. | ✅ |
| `json_output` | `Dict[str, str]` | Holds dictionary that maps name of requested output field into its description. | ❌ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
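The Refs column can be read as a simple rule: properties marked ✅ may hold a selector, while properties marked ❌ must be literal values. The sketch below illustrates that rule in Python. The helper names (`is_selector`, `build_lmm_step`) are hypothetical and not part of the inference library, and treating any string that starts with `$inputs.` or `$steps.` as a selector is an assumption about the selector syntax.

```python
from typing import Any, Dict, Optional

# Properties marked ❌ in the Refs column of the table above.
LITERAL_ONLY = ("name", "lmm_config", "json_output")


def is_selector(value: Any) -> bool:
    """Assumption: selectors are strings starting with '$inputs.' or '$steps.'."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def build_lmm_step(
    name: str,
    prompt: str,
    lmm_type: str,
    remote_api_key: str,
    images: str = "$inputs.image",
    lmm_config: Optional[Dict[str, Any]] = None,
    json_output: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a step dict and reject selectors
    in properties that the table marks as literal-only."""
    step: Dict[str, Any] = {
        "name": name,
        "type": "roboflow_core/lmm@v1",
        "images": images,
        "prompt": prompt,
        "lmm_type": lmm_type,
        "remote_api_key": remote_api_key,
    }
    if lmm_config is not None:
        step["lmm_config"] = lmm_config
    if json_output is not None:
        step["json_output"] = json_output
    for key in LITERAL_ONLY:
        if is_selector(step.get(key)):
            raise ValueError(f"'{key}' must be a literal value, not a selector")
    return step
```

With this helper, `prompt="$inputs.prompt"` is accepted, whereas passing a selector string as `json_output` raises `ValueError` before the definition is ever sent to a workflow engine.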
Available Connections
Compatible Blocks
Check which blocks you can connect to LMM in version v1.
- inputs:
Polygon Zone Visualization, Relative Static Crop, Stitch OCR Detections, CSV Formatter, Image Preprocessing, Webhook Sink, EasyOCR, OCR Model, Stitch OCR Detections, Google Gemini, Roboflow Dataset Upload, Contrast Equalization, Corner Visualization, Triangle Visualization, Text Display, Slack Notification, Image Slicer, Dynamic Crop, Bounding Box Visualization, Stitch Images, SIFT, CogVLM, Email Notification, LMM For Classification, Halo Visualization, Icon Visualization, Morphological Transformation, Anthropic Claude, Object Detection Model, QR Code Generator, Grid Visualization, Heatmap Visualization, Google Vision OCR, Image Blur, Florence-2 Model, Instance Segmentation Model, Google Gemini, Model Comparison Visualization, Email Notification, OpenAI, Twilio SMS Notification, Blur Visualization, Absolute Static Crop, Google Gemini, Multi-Label Classification Model, Halo Visualization, Roboflow Dataset Upload, Single-Label Classification Model, Local File Sink, Dot Visualization, Stability AI Outpainting, Florence-2 Model, OpenAI, VLM As Classifier, Image Contours, Stability AI Image Generation, Ellipse Visualization, Pixelate Visualization, Reference Path Visualization, Keypoint Visualization, Crop Visualization, Twilio SMS/MMS Notification, Circle Visualization, Roboflow Custom Metadata, Background Color Visualization, Trace Visualization, Camera Focus, Color Visualization, Polygon Visualization, VLM As Detector, OpenAI, Polygon Visualization, Anthropic Claude, Image Threshold, Camera Focus, Keypoint Detection Model, Mask Visualization, Llama 3.2 Vision, Model Monitoring Inference Aggregator, Stability AI Inpainting, Background Subtraction, Classification Label Visualization, Image Slicer, OpenAI, Line Counter Visualization, Perspective Correction, Clip Comparison, Camera Calibration, Depth Estimation, Label Visualization, SIFT Comparison, Image Convert Grayscale, Anthropic Claude, LMM
- outputs:
Motion Detection, Webhook Sink, Qwen2.5-VL, Path Deviation, Triangle Visualization, Slack Notification, Bounding Box Visualization, Stitch Images, Email Notification, Icon Visualization, Anthropic Claude, Mask Area Measurement, Heatmap Visualization, Grid Visualization, Property Definition, Detections Filter, Model Comparison Visualization, Cache Get, Absolute Static Crop, Byte Tracker, Delta Filter, Halo Visualization, Local File Sink, Detections Merge, Stability AI Outpainting, Clip Comparison, Detection Event Log, Dynamic Zone, SAM 3, Time in Zone, Seg Preview, Keypoint Visualization, Roboflow Custom Metadata, Template Matching, Instance Segmentation Model, Camera Focus, Line Counter, Detections Consensus, CLIP Embedding Model, Velocity, Polygon Visualization, VLM As Detector, Time in Zone, Line Counter, Classification Label Visualization, Clip Comparison, Identify Outliers, Polygon Zone Visualization, Relative Static Crop, EasyOCR, Continue If, Detections Transformation, Dimension Collapse, Contrast Equalization, Corner Visualization, SIFT Comparison, CogVLM, Expression, Object Detection Model, Google Vision OCR, Florence-2 Model, Google Gemini, Blur Visualization, Cosine Similarity, JSON Parser, Google Gemini, Multi-Label Classification Model, Roboflow Dataset Upload, Dot Visualization, Dominant Color, Image Contours, SAM 3, Stability AI Image Generation, First Non Empty Or Default, Pixel Color Count, Multi-Label Classification Model, Twilio SMS/MMS Notification, Circle Visualization, Trace Visualization, Object Detection Model, Segment Anything 2 Model, Cache Set, VLM As Detector, Stability AI Inpainting, Buffer, Camera Calibration, Anthropic Claude, LMM, PTZ Tracking (ONVIF), Image Preprocessing, Detections Classes Replacement, Stitch OCR Detections, Google Gemini, Detections Stitch, Barcode Detection, LMM For Classification, Morphological Transformation, Bounding Rectangle, Email Notification, OpenAI, Twilio SMS Notification, Distance Measurement, Path Deviation, Perception Encoder Embedding Model, Detections List Roll-Up, OpenAI, Florence-2 Model, VLM As Classifier, SmolVLM2, Background Color Visualization, Anthropic Claude, Camera Focus, Keypoint Detection Model, Mask Visualization, Llama 3.2 Vision, Detection Offset, Byte Tracker, OpenAI, Line Counter Visualization, Single-Label Classification Model, Keypoint Detection Model, SIFT Comparison, Image Convert Grayscale, Detections Stabilizer, Stitch OCR Detections, CSV Formatter, Byte Tracker, Data Aggregator, OCR Model, Roboflow Dataset Upload, Text Display, Time in Zone, Image Slicer, Dynamic Crop, Detections Combine, SIFT, Identify Changes, Halo Visualization, Overlap Filter, Size Measurement, QR Code Generator, Gaze Detection, Image Blur, Instance Segmentation Model, QR Code Detection, Single-Label Classification Model, VLM As Classifier, Ellipse Visualization, Pixelate Visualization, Reference Path Visualization, Moondream2, YOLO-World Model, Crop Visualization, Rate Limiter, Color Visualization, Qwen3-VL, OpenAI, Polygon Visualization, Image Threshold, Model Monitoring Inference Aggregator, Background Subtraction, Image Slicer, Perspective Correction, Depth Estimation, Label Visualization, SAM 3
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds
LMM in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - prompt (string): Holds unconstrained text prompt to LMM mode.
    - lmm_type (string): Type of LMM to be used.
    - remote_api_key (Union[string, secret]): Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v.
- output
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - image (image_metadata): Dictionary with image metadata required by supervision.
    - structured_output (dictionary): Dictionary.
    - raw_output (string): String value.
    - * (*): Equivalent of any element.
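To consume the outputs listed above from elsewhere in a workflow, they are typically addressed by selector. The sketch below builds such selectors for the named outputs. The `$steps.<step_name>.<output_name>` form is an assumption about the selector syntax, and `lmm_output_selectors` is a hypothetical helper, not a library function. The wildcard `*` binding suggests further outputs may exist; they are not covered here.

```python
from typing import Dict

# Named outputs taken from the bindings list above (the wildcard is omitted).
LMM_V1_OUTPUTS = (
    "parent_id",
    "root_parent_id",
    "image",
    "structured_output",
    "raw_output",
)


def lmm_output_selectors(step_name: str) -> Dict[str, str]:
    """Hypothetical helper: map each named output to a selector string.

    Assumes the '$steps.<step_name>.<output_name>' selector form.
    """
    return {output: f"$steps.{step_name}.{output}" for output in LMM_V1_OUTPUTS}


selectors = lmm_output_selectors("lmm")
print(selectors["structured_output"])
print(selectors["raw_output"])
```

In this sketch, `structured_output` would be the natural choice when `json_output` is set on the step, and `raw_output` when the unparsed model response is needed.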
Example JSON definition of step LMM in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {
        "count": "number of cats in the picture"
    }
}
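Because `prompt` and `remote_api_key` are marked ✅ in the Refs column, the step above can take both from workflow inputs rather than hard-coding them. The following is a minimal sketch of a full workflow definition built in Python around this step. The surrounding structure (`version`, `inputs`, `steps`, `outputs`), the input types `WorkflowImage` and `WorkflowParameter`, the output type `JsonField`, and the input name `openai_api_key` are assumptions for illustration and should be checked against the Workflows documentation.

```python
import json
from typing import Any, Dict


def build_workflow(step_name: str = "lmm") -> Dict[str, Any]:
    """Sketch of a workflow definition wrapping the LMM v1 step.

    The top-level layout and the input/output type names are assumptions.
    """
    return {
        "version": "1.0",
        "inputs": [
            {"type": "WorkflowImage", "name": "image"},
            {"type": "WorkflowParameter", "name": "prompt"},
            # Hypothetical input name; keeps the key out of the definition.
            {"type": "WorkflowParameter", "name": "openai_api_key"},
        ],
        "steps": [
            {
                "name": step_name,
                "type": "roboflow_core/lmm@v1",
                "images": "$inputs.image",
                "prompt": "$inputs.prompt",
                "lmm_type": "gpt_4v",
                "lmm_config": {
                    "gpt_image_detail": "low",
                    "gpt_model_version": "gpt-4o",
                    "max_tokens": 200,
                },
                "remote_api_key": "$inputs.openai_api_key",
                "json_output": {"count": "number of cats in the picture"},
            }
        ],
        "outputs": [
            {
                "type": "JsonField",
                "name": "answer",
                "selector": f"$steps.{step_name}.structured_output",
            }
        ],
    }


if __name__ == "__main__":
    # Serialise the definition, e.g. to store it or send it to a server.
    print(json.dumps(build_workflow(), indent=4))
```

Passing the API key as a runtime parameter means the serialised definition contains no secret, which makes it safer to store or share.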