LMM
Deprecated
This block is deprecated and may be removed in a future release.
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can pass an arbitrary text prompt to the LMMBlock.
The LMMBlock supports the following LMM:
- OpenAI's GPT-4 with Vision.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier
Use the following identifier in the step "type" field to add the block as
a step in your workflow: roboflow_core/lmm@v1
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Unconstrained text prompt for the LMM. | ✅ |
| lmm_type | str | Type of LMM to be used. | ✅ |
| lmm_config | LMMConfig | Configuration of the LMM. | ❌ |
| remote_api_key | str | API key required to call the LMM. Currently an OpenAI key is required when lmm_type=gpt_4v. | ✅ |
| json_output | Dict[str, str] | Dictionary that maps the name of each requested output field to its description. | ❌ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
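As a rough illustration of the Refs column, the sketch below binds the ✅ properties to runtime values and flags any ❌ property that was given a selector by mistake. The `$inputs.<name>` selector form follows the example definition on this page; the step name, the input names `prompt` and `openai_api_key`, and the helper functions are hypothetical and not part of the block's API.

```python
# Properties marked ✅ in the Refs column: these may hold a selector.
REF_ENABLED = {"prompt", "lmm_type", "remote_api_key"}
# Properties marked ❌: these must be literal values in the step definition.
LITERAL_ONLY = {"name", "lmm_config", "json_output"}


def is_selector(value):
    """Return True when a value looks like a selector such as "$inputs.prompt"."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def bound_properties(step):
    """List the Refs-enabled properties that are bound to a dynamic value."""
    return sorted(k for k in REF_ENABLED if is_selector(step.get(k)))


def selector_misuse(step):
    """List literal-only properties that were given a selector by mistake."""
    return sorted(k for k in LITERAL_ONLY if is_selector(step.get(k)))


step = {
    "name": "lmm_step",                          # hypothetical step name
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "$inputs.prompt",                  # hypothetical input name
    "lmm_type": "gpt_4v",                        # literal value, also allowed
    "remote_api_key": "$inputs.openai_api_key",  # hypothetical input name
    "json_output": {"count": "number of cats in the picture"},
}

print("bound at runtime:", bound_properties(step))
print("misused selectors:", selector_misuse(step))
```

Binding remote_api_key to an input rather than hard-coding it keeps the key out of the stored definition, which is one practical reason to use a reference there.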
Available Connections
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs: Icon Visualization, Roboflow Dataset Upload, Slack Notification, Label Visualization, Instance Segmentation Model, Object Detection Model, Dot Visualization, Camera Calibration, Polygon Zone Visualization, SIFT, Trace Visualization, Morphological Transformation, Perspective Correction, Roboflow Custom Metadata, Florence-2 Model, Anthropic Claude, Relative Static Crop, Image Slicer, Image Threshold, Keypoint Visualization, Absolute Static Crop, Blur Visualization, Circle Visualization, Email Notification, Camera Focus, Crop Visualization, Email Notification, Classification Label Visualization, EasyOCR, OpenAI, Google Gemini, OpenAI, Polygon Visualization, OpenAI, Stitch OCR Detections, Color Visualization, Local File Sink, Twilio SMS Notification, CSV Formatter, Twilio SMS/MMS Notification, Image Contours, Model Monitoring Inference Aggregator, Keypoint Detection Model, Anthropic Claude, Roboflow Vision Events, OpenAI, Google Gemini, Image Convert Grayscale, Llama 3.2 Vision, Clip Comparison, Triangle Visualization, Stability AI Inpainting, LMM For Classification, Background Color Visualization, Stitch OCR Detections, Depth Estimation, S3 Sink, Model Comparison Visualization, Qwen3.5-VL, CogVLM, Image Blur, Stitch Images, Anthropic Claude, Contrast Equalization, Google Gemini, Corner Visualization, Halo Visualization, Stability AI Image Generation, Florence-2 Model, Reference Path Visualization, LMM, Dynamic Crop, Line Counter Visualization, Roboflow Dataset Upload, Heatmap Visualization, Text Display, Multi-Label Classification Model, VLM As Classifier, VLM As Detector, Grid Visualization, Polygon Visualization, Camera Focus, Single-Label Classification Model, Webhook Sink, Image Slicer, Image Preprocessing, SIFT Comparison, Bounding Box Visualization, Stability AI Outpainting, Halo Visualization, OCR Model, Background Subtraction, GLM-OCR, QR Code Generator, Pixelate Visualization, Ellipse Visualization, Google Vision OCR, Mask Visualization
- outputs: Icon Visualization, Instance Segmentation Model, SAM 3, Dynamic Zone, Semantic Segmentation Model, Delta Filter, Relative Static Crop, Image Threshold, Keypoint Visualization, Blur Visualization, Circle Visualization, Detections Merge, Classification Label Visualization, OpenAI, Google Gemini, Identify Outliers, Twilio SMS Notification, CSV Formatter, Model Monitoring Inference Aggregator, Keypoint Detection Model, Path Deviation, Detections Filter, Inner Workflow, Model Comparison Visualization, Byte Tracker, Detections Classes Replacement, Florence-2 Model, Barcode Detection, Grid Visualization, Mask Visualization, Seg Preview, Bounding Rectangle, Qwen3-VL, Detection Event Log, Object Detection Model, Object Detection Model, Detections Stitch, SIFT, Morphological Transformation, Detections Combine, Cache Get, Anthropic Claude, Rate Limiter, Image Slicer, Absolute Static Crop, EasyOCR, Gaze Detection, Color Visualization, Local File Sink, Line Counter, Byte Tracker, Continue If, Mask Area Measurement, OpenAI, Single-Label Classification Model, Clip Comparison, Background Color Visualization, Stitch OCR Detections, Identify Changes, CogVLM, Qwen2.5-VL, Stitch Images, Dominant Color, Contrast Equalization, Velocity, Stability AI Image Generation, Reference Path Visualization, QR Code Detection, Heatmap Visualization, Text Display, VLM As Classifier, Camera Focus, Single-Label Classification Model, Image Preprocessing, SIFT Comparison, Bounding Box Visualization, Background Subtraction, QR Code Generator, Path Deviation, Florence-2 Model, Moondream2, Slack Notification, Label Visualization, Multi-Label Classification Model, Dot Visualization, Trace Visualization, Camera Calibration, Time in Zone, Roboflow Custom Metadata, PTZ Tracking (ONVIF), Overlap Filter, Template Matching, Single-Label Classification Model, Keypoint Detection Model, Crop Visualization, Email Notification, OpenAI, YOLO-World Model, Twilio SMS/MMS Notification, Anthropic Claude, Image Convert Grayscale, Time in Zone, Distance Measurement, Stability AI Inpainting, S3 Sink, Depth Estimation, SAM2 Video Tracker, Motion Detection, Google Gemini, Detections List Roll-Up, CLIP Embedding Model, Size Measurement, LMM, Dynamic Crop, Detection Offset, SAM 3, Perception Encoder Embedding Model, Clip Comparison, SIFT Comparison, Detections Stabilizer, Byte Tracker, VLM As Detector, Multi-Label Classification Model, Property Definition, Polygon Visualization, Webhook Sink, Semantic Segmentation Model, Image Slicer, SORT Tracker, OC-SORT Tracker, OCR Model, GLM-OCR, Roboflow Dataset Upload, Object Detection Model, Polygon Zone Visualization, Instance Segmentation Model, Perspective Correction, SmolVLM2, Cache Set, Data Aggregator, Email Notification, Camera Focus, SAM 3, Polygon Visualization, OpenAI, VLM As Classifier, Stitch OCR Detections, Image Contours, Roboflow Vision Events, Llama 3.2 Vision, First Non Empty Or Default, VLM As Detector, Instance Segmentation Model, Cosine Similarity, Time in Zone, JSON Parser, Triangle Visualization, LMM For Classification, Pixel Color Count, Expression, Line Counter, Qwen3.5-VL, Anthropic Claude, Image Blur, Corner Visualization, Halo Visualization, Detections Consensus, Buffer, Line Counter Visualization, Multi-Label Classification Model, ByteTrack Tracker, Keypoint Detection Model, Roboflow Dataset Upload, Segment Anything 2 Model, Detections Transformation, Stability AI Outpainting, Halo Visualization, Dimension Collapse, Pixelate Visualization, Google Vision OCR, Ellipse Visualization, Google Gemini
Input and Output Bindings
The connections available to a block depend on its binding kinds. Check which binding kinds
LMM in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - prompt (string): Unconstrained text prompt for the LMM.
    - lmm_type (string): Type of LMM to be used.
    - remote_api_key (Union[string, secret]): API key required to call the LMM. Currently an OpenAI key is required when lmm_type=gpt_4v.
- output
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - image (image_metadata): Dictionary with image metadata required by supervision.
    - structured_output (dictionary): Dictionary.
    - raw_output (string): String value.
    - * (*): Equivalent of any element.
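To make the output bindings concrete, here is a minimal sketch of how a downstream step or workflow output might refer to them. It assumes the `$steps.<step_name>.<output_name>` selector form, which mirrors the `$inputs.image` selector used in the example definition but is not spelled out on this page; the helper `output_selector` and the step name `lmm_step` are hypothetical.

```python
# Named outputs listed in the bindings above (the "*" wildcard is left out
# because this page does not say which extra names it expands to).
DOCUMENTED_OUTPUTS = {
    "parent_id",
    "root_parent_id",
    "image",
    "structured_output",
    "raw_output",
}


def output_selector(step_name, output_name):
    """Build a selector for one documented output of an LMM v1 step."""
    if output_name not in DOCUMENTED_OUTPUTS:
        raise ValueError(f"{output_name!r} is not a documented LMM v1 output")
    return f"$steps.{step_name}.{output_name}"


# "lmm_step" is a hypothetical step name chosen for this example.
print(output_selector("lmm_step", "structured_output"))
print(output_selector("lmm_step", "raw_output"))
```

structured_output is the natural choice when json_output is set, since it is the dictionary-kind output; raw_output carries the unparsed string response.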
Example JSON definition of step LMM in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {
        "count": "number of cats in the picture"
    }
}
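The step definition above is only one entry in a workflow. The sketch below shows one way it might be wrapped into a complete definition and serialised to JSON. The top-level keys (`version`, `inputs`, `steps`, `outputs`) and the type names `WorkflowImage`, `WorkflowParameter` and `JsonField` are assumptions about the workflow definition format and do not appear on this page; the step name, input names and output names are hypothetical.

```python
import json

# Step definition taken from the example above, with a hypothetical step name.
lmm_step = {
    "name": "lmm_step",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200,
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {"count": "number of cats in the picture"},
}


def build_workflow(step, api_key_input="openai_api_key"):
    """Wrap a single LMM step into a workflow definition (assumed layout)."""
    # Bind the API key to a runtime input instead of embedding it; the
    # original step dict is copied, not modified.
    step = dict(step, remote_api_key=f"$inputs.{api_key_input}")
    return {
        "version": "1.0",  # assumed version string
        "inputs": [
            {"type": "WorkflowImage", "name": "image"},            # assumed type name
            {"type": "WorkflowParameter", "name": api_key_input},  # assumed type name
        ],
        "steps": [step],
        "outputs": [
            {
                "type": "JsonField",  # assumed type name
                "name": "answer",
                "selector": f"$steps.{step['name']}.structured_output",
            },
            {
                "type": "JsonField",
                "name": "raw_answer",
                "selector": f"$steps.{step['name']}.raw_output",
            },
        ],
    }


definition = build_workflow(lmm_step)
print(json.dumps(definition, indent=2))
```

Check the input and output type names against the workflow definition documentation for your installed version before relying on this layout.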