LMM
Deprecated
This block is deprecated and may be removed in a future release.
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can pass an arbitrary text prompt to the LMM block.
The LMM block supports the following LMM:
- OpenAI's GPT-4 with Vision.
You must provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier
Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/lmm@v1
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Holds an unconstrained text prompt for the LMM. | ✅ |
| lmm_type | str | Type of LMM to be used. | ✅ |
| lmm_config | LMMConfig | Configuration of the LMM. | ❌ |
| remote_api_key | str | Holds the API key required to call the LMM. In the current state of development, an OpenAI key is required when lmm_type=gpt_4v. | ✅ |
| json_output | Dict[str, str] | Holds a dictionary that maps the name of each requested output field to its description. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
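To make the Refs column concrete, here is a minimal Python sketch. It assumes that a dynamic value is written as a selector string starting with $inputs. or $steps., the same form that images uses ($inputs.image) in the example JSON definition. The helpers is_selector and misbound_properties, the NON_BINDABLE set and the input name open_ai_key are hypothetical and exist only for illustration. They are not part of the Roboflow Workflows API.

```python
# Hypothetical illustration only: not part of the Roboflow Workflows API.
# Assumption: a dynamic value is a selector string with one of these prefixes.
SELECTOR_PREFIXES = ("$inputs.", "$steps.")

# Properties marked ❌ in the Refs column; they must be literal values.
NON_BINDABLE = {"name", "lmm_config", "json_output"}


def is_selector(value: object) -> bool:
    """Return True if value looks like a workflow selector string."""
    return isinstance(value, str) and value.startswith(SELECTOR_PREFIXES)


def misbound_properties(step: dict) -> list[str]:
    """Return the non-bindable properties that were given a selector."""
    return sorted(k for k in NON_BINDABLE if k in step and is_selector(step[k]))


step = {
    "name": "lmm",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "$inputs.prompt",               # ✅ resolved at runtime
    "lmm_type": "gpt_4v",                     # ✅ a literal is also allowed
    "remote_api_key": "$inputs.open_ai_key",  # ✅ hypothetical input name
    "lmm_config": {"max_tokens": 200},        # ❌ must be a literal
    "json_output": {"count": "number of cats in the picture"},  # ❌ literal
}

print(misbound_properties(step))  # []
```

Setting, for example, "lmm_config": "$inputs.cfg" would make misbound_properties report ["lmm_config"], because that property is not marked as bindable.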
Available Connections
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs:
Google Gemini,Google Vision OCR,Mask Visualization,Google Gemini,Image Convert Grayscale,Florence-2 Model,VLM As Detector,EasyOCR,Single-Label Classification Model,Image Blur,GLM-OCR,Depth Estimation,Multi-Label Classification Model,Crop Visualization,Florence-2 Model,Model Monitoring Inference Aggregator,Corner Visualization,Anthropic Claude,Keypoint Visualization,Google Gemini,Clip Comparison,VLM As Classifier,Triangle Visualization,OpenAI,Stability AI Inpainting,Stitch Images,Keypoint Detection Model,Llama 3.2 Vision,Perspective Correction,Stability AI Image Generation,Local File Sink,Reference Path Visualization,Halo Visualization,Trace Visualization,Grid Visualization,Polygon Visualization,Polygon Zone Visualization,Stitch OCR Detections,CSV Formatter,CogVLM,Stitch OCR Detections,Halo Visualization,SIFT Comparison,Label Visualization,Image Slicer,Dynamic Crop,Anthropic Claude,Image Threshold,Line Counter Visualization,OpenAI,Email Notification,Slack Notification,Webhook Sink,Stability AI Outpainting,Instance Segmentation Model,Heatmap Visualization,Ellipse Visualization,LMM For Classification,Camera Focus,Icon Visualization,SIFT,OpenAI,Roboflow Dataset Upload,Background Color Visualization,Text Display,LMM,Qwen3.5-VL,Image Contours,Blur Visualization,Contrast Equalization,QR Code Generator,Dot Visualization,Background Subtraction,Roboflow Dataset Upload,Color Visualization,Roboflow Custom Metadata,Twilio SMS Notification,Absolute Static Crop,Relative Static Crop,Bounding Box Visualization,OpenAI,S3 Sink,OCR Model,Polygon Visualization,Twilio SMS/MMS Notification,Camera Calibration,Model Comparison Visualization,Camera Focus,Image Slicer,Circle Visualization,Email Notification,Classification Label Visualization,Image Preprocessing,Object Detection Model,Anthropic Claude,Morphological Transformation,Pixelate Visualization
- outputs:
Google Gemini,Mask Visualization,VLM As Detector,Identify Outliers,Image Blur,Depth Estimation,Byte Tracker,Cache Get,Keypoint Visualization,SORT Tracker,OpenAI,Stability AI Inpainting,Keypoint Detection Model,YOLO-World Model,Reference Path Visualization,Buffer,Grid Visualization,Polygon Visualization,Polygon Zone Visualization,PTZ Tracking (ONVIF),Stitch OCR Detections,CLIP Embedding Model,Continue If,SIFT Comparison,Detections Combine,Dynamic Crop,Byte Tracker,OpenAI,Email Notification,Webhook Sink,Instance Segmentation Model,Roboflow Dataset Upload,Time in Zone,Keypoint Detection Model,Detection Offset,Contrast Equalization,Dot Visualization,Background Subtraction,Roboflow Custom Metadata,Twilio SMS/MMS Notification,Polygon Visualization,Detections Classes Replacement,Detections Transformation,Single-Label Classification Model,Detections Merge,Anthropic Claude,Path Deviation,Google Vision OCR,Detections Filter,Image Convert Grayscale,EasyOCR,GLM-OCR,Dominant Color,Multi-Label Classification Model,Semantic Segmentation Model,Detections Stabilizer,OC-SORT Tracker,Delta Filter,VLM As Classifier,Clip Comparison,VLM As Classifier,Triangle Visualization,Stitch Images,Llama 3.2 Vision,QR Code Detection,CSV Formatter,CogVLM,Detections Stitch,Detections List Roll-Up,Size Measurement,Label Visualization,Bounding Rectangle,Moondream2,Stability AI Outpainting,Heatmap Visualization,LMM For Classification,First Non Empty Or Default,SIFT Comparison,Velocity,LMM,Qwen3.5-VL,Dynamic Zone,Detection Event Log,Color Visualization,Absolute Static Crop,Byte Tracker,Rate Limiter,Line Counter,Image Slicer,Classification Label Visualization,Image Preprocessing,Local File Sink,Morphological Transformation,Dimension Collapse,Single-Label Classification Model,Cache Set,Clip Comparison,Crop Visualization,SmolVLM2,Data Aggregator,Property Definition,Model Monitoring Inference Aggregator,Multi-Label Classification Model,Anthropic Claude,Google Gemini,Seg Preview,Perspective Correction,Trace Visualization,Stitch OCR Detections,JSON Parser,SAM 3,Image Slicer,Qwen3-VL,Distance Measurement,Image Threshold,SAM 3,Line Counter,Pixel Color Count,Camera Focus,Qwen2.5-VL,Blur Visualization,QR Code Generator,Bounding Box Visualization,S3 Sink,Relative Static Crop,Camera Calibration,Overlap Filter,Email Notification,Circle Visualization,Object Detection Model,Pixelate Visualization,Google Gemini,Path Deviation,Florence-2 Model,VLM As Detector,Florence-2 Model,Corner Visualization,Detections Consensus,Expression,Stability AI Image Generation,Halo Visualization,Motion Detection,Time in Zone,Object Detection Model,Template Matching,Identify Changes,Halo Visualization,Perception Encoder Embedding Model,Anthropic Claude,Cosine Similarity,Instance Segmentation Model,Line Counter Visualization,Slack Notification,Time in Zone,Ellipse Visualization,Icon Visualization,OpenAI,SIFT,Background Color Visualization,Text Display,Gaze Detection,Image Contours,Mask Area Measurement,Barcode Detection,Roboflow Dataset Upload,Twilio SMS Notification,OpenAI,OCR Model,ByteTrack Tracker,SAM 3,Segment Anything 2 Model,Model Comparison Visualization,Camera Focus
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds
LMM in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Holds an unconstrained text prompt for the LMM.
  - lmm_type (string): Type of LMM to be used.
  - remote_api_key (Union[secret, string]): Holds the API key required to call the LMM. In the current state of development, an OpenAI key is required when lmm_type=gpt_4v.
- output
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary with the fields requested in json_output.
  - raw_output (string): Raw text response returned by the LMM.
  - \* (\*): Equivalent of any element.
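The json_output property is linked to the structured_output and raw_output bindings. You name the fields you want, and the step returns them as a dictionary alongside the model's raw text. The Python sketch below illustrates that relationship only. parse_structured_output is a hypothetical helper, not the block's actual implementation. It assumes the model answers with a JSON object whose keys match json_output and falls back to None for any field it cannot recover.

```python
import json


def parse_structured_output(raw_output: str, json_output: dict[str, str]) -> dict:
    """Hypothetical: map an LMM's raw text reply onto the json_output fields.

    Not the LMM block's real implementation. Assumes the model was asked
    to reply with a JSON object whose keys match json_output.
    """
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        # A valid JSON value that is not an object carries no named fields.
        parsed = {}
    # Keep only the requested fields; any missing field becomes None.
    return {field: parsed.get(field) for field in json_output}


json_output = {"count": "number of cats in the picture"}
print(parse_structured_output('{"count": 2, "note": "x"}', json_output))  # {'count': 2}
print(parse_structured_output("two cats", json_output))  # {'count': None}
```

In this sketch, fields that the model returns but json_output did not request (note above) are dropped. Only the requested keys appear in the resulting dictionary.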
Example JSON definition of step LMM in version v1
```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/lmm@v1",
  "images": "$inputs.image",
  "prompt": "my prompt",
  "lmm_type": "gpt_4v",
  "lmm_config": {
    "gpt_image_detail": "low",
    "gpt_model_version": "gpt-4o",
    "max_tokens": 200
  },
  "remote_api_key": "xxx-xxx",
  "json_output": {
    "count": "number of cats in the picture"
  }
}
```
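The example above defines a single step. The following minimal Python sketch embeds that step in a complete workflow definition and serializes it to JSON. Several details are assumptions about the surrounding Workflows schema: the top-level version/inputs/steps/outputs layout and the WorkflowImage, WorkflowParameter and JsonField type names. The step name lmm and the input name open_ai_key are illustrative. Moving the API key into a workflow input relies on remote_api_key being marked ✅ in the Refs column.

```python
import json

# The example step, with the API key bound to a workflow input instead of
# being hard-coded. "lmm" and "open_ai_key" are illustrative names.
lmm_step = {
    "name": "lmm",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200,
    },
    "remote_api_key": "$inputs.open_ai_key",
    "json_output": {"count": "number of cats in the picture"},
}

# Assumed top-level Workflows layout: version, inputs, steps, outputs.
workflow = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "open_ai_key"},
    ],
    "steps": [lmm_step],
    "outputs": [
        # Expose both output bindings of the step by selector.
        {"type": "JsonField", "name": "structured", "selector": "$steps.lmm.structured_output"},
        {"type": "JsonField", "name": "raw", "selector": "$steps.lmm.raw_output"},
    ],
}

print(json.dumps(workflow, indent=2))
```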