LMM¶
Deprecated
This block is deprecated and may be removed in a future release.
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports the following LMM:
- OpenAI's GPT-4 with Vision.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier¶
Use the following identifier in step "type" field: roboflow_core/lmm@v1 to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Unconstrained text prompt passed to the LMM. | ✅ |
| lmm_type | str | Type of LMM to be used. | ✅ |
| lmm_config | LMMConfig | Configuration of LMM. | ❌ |
| remote_api_key | str | API key required to call the LMM. Currently an OpenAI key is required when lmm_type=gpt_4v. | ✅ |
| json_output | Dict[str, str] | Dictionary that maps the name of each requested output field to its description. | ❌ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
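As a rough illustration of the Refs column, the sketch below builds a step definition in which the ✅ properties are bound to workflow inputs and the ❌ properties stay literal. The `$inputs.<name>` selector form is taken from the example definition on this page. The input names `prompt` and `openai_api_key`, the step name `cat_counter`, and the helper functions are hypothetical and not part of the library.

```python
# Properties marked ✅ / ❌ in the Refs column of the table above.
REF_CAPABLE = {"prompt", "lmm_type", "remote_api_key"}
LITERAL_ONLY = {"name", "lmm_config", "json_output"}


def is_selector(value):
    """Assumed convention: a string starting with '$' is a selector."""
    return isinstance(value, str) and value.startswith("$")


def bound_properties(step):
    """Names of ✅ properties that are bound to a dynamic value."""
    return sorted(key for key in REF_CAPABLE if is_selector(step.get(key)))


def misplaced_selectors(step):
    """Names of ❌ properties that wrongly hold a selector."""
    return sorted(key for key in LITERAL_ONLY if is_selector(step.get(key)))


# Hypothetical step: prompt and API key come from workflow inputs,
# everything marked ❌ is a literal value.
step = {
    "name": "cat_counter",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "$inputs.prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {"max_tokens": 200},
    "remote_api_key": "$inputs.openai_api_key",
    "json_output": {"count": "number of cats in the picture"},
}

print(bound_properties(step))
print(misplaced_selectors(step))
```

Binding `remote_api_key` to an input rather than writing the key into the definition keeps the secret out of the stored workflow, which is one practical reason to use a ✅ property dynamically.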
Runtime compatibility¶
- requires_internet (air-gapped / offline deployments): This block depends on a service that is not reachable from fully offline / air-gapped deployments.
- hard (runtime: hosted_serverless; execution: remote): LMM_ENABLED=False on Roboflow Hosted Serverless, so the /llm_v1 endpoint is not registered and run_remotely() returns 404.
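The two constraints above can be encoded as a small pre-flight check. This is only a sketch: the function `lmm_v1_blockers` is hypothetical, and the values `"self_hosted"` and `"local"` in the usage lines are placeholders. Only `hosted_serverless` and `remote` come from this page.

```python
def lmm_v1_blockers(runtime, execution, has_internet):
    """Return the reasons LMM v1 cannot run; an empty list means no known blocker."""
    blockers = []
    if not has_internet:
        # requires_internet: the block calls an external service.
        blockers.append("requires_internet: external service is unreachable offline")
    if runtime == "hosted_serverless" and execution == "remote":
        # hard: /llm_v1 is not registered there, so run_remotely() returns 404.
        blockers.append("hard: /llm_v1 endpoint is not registered (LMM_ENABLED=False)")
    return blockers


# Placeholder runtime/execution names for a deployment with no known blocker.
print(lmm_v1_blockers("self_hosted", "local", has_internet=True))
# The combination this page marks as a hard incompatibility.
print(lmm_v1_blockers("hosted_serverless", "remote", has_internet=True))
```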
Available Connections¶
Compatible Blocks
Check which blocks you can connect to LMM in version v1.
- inputs:
VLM As Classifier,MoonshotAI Kimi,Stability AI Image Generation,Trace Visualization,Anthropic Claude,Icon Visualization,SIFT Comparison,Morphological Transformation,Color Visualization,LMM For Classification,Perspective Correction,Corner Visualization,Roboflow Custom Metadata,Halo Visualization,Qwen-VL,Email Notification,Halo Visualization,Google Gemma,Background Color Visualization,Ellipse Visualization,Email Notification,Twilio SMS/MMS Notification,Text Display,Polygon Visualization,Crop Visualization,Absolute Static Crop,Image Preprocessing,Model Monitoring Inference Aggregator,Relative Static Crop,OpenRouter,OpenAI,Florence-2 Model,OpenAI,Heatmap Visualization,OCR Model,Blur Visualization,Depth Estimation,Instance Segmentation Model,Stability AI Outpainting,Anthropic Claude,Google Gemini,Clip Comparison,Google Gemini,Background Subtraction,Keypoint Visualization,CSV Formatter,Webhook Sink,Stitch Images,Florence-2 Model,Current Time,Contrast Equalization,OpenAI,VLM As Detector,Google Gemini,Triangle Visualization,Slack Notification,SIFT,Local File Sink,Image Contours,Keypoint Detection Model,GLM-OCR,Roboflow Asset Library Attributes,Image Slicer,Polygon Zone Visualization,Contrast Enhancement,Google Gemma API,Stitch OCR Detections,Image Threshold,Line Counter Visualization,Camera Calibration,QR Code Generator,S3 Sink,Microsoft SQL Server Sink,Google Vision OCR,Twilio SMS Notification,Image Blur,Morphological Transformation,Camera Focus,Roboflow Vision Events,Stability AI Inpainting,Classification Label Visualization,Stitch OCR Detections,Event Writer,Grid Visualization,Qwen3.5-VL,Mask Visualization,Llama 3.2 Vision,Reference Path Visualization,Image Slicer,Label Visualization,OPC UA Writer Sink,Dot Visualization,Dynamic Crop,Circle Visualization,Llama 3.2 Vision,Camera Focus,OpenAI-Compatible LLM,MoonshotAI Kimi,Single-Label Classification Model,CogVLM,Qwen 3.6 API,Bounding Box Visualization,Multi-Label Classification Model,LMM,OpenAI,Image Convert 
Grayscale,Roboflow Visual Search,EasyOCR,Roboflow Dataset Upload,Pixelate Visualization,Roboflow Dataset Upload,PLC Writer,Qwen 3.5 API,Anthropic Claude,Object Detection Model,MQTT Writer,Polygon Visualization,Model Comparison Visualization
- outputs:
Image Stack,Anthropic Claude,Per-Class Confidence Filter,Color Visualization,Single-Label Classification Model,Perspective Correction,Corner Visualization,Roboflow Custom Metadata,Halo Visualization,Dynamic Zone,Qwen-VL,Keypoint Detection Model,JSON Parser,Email Notification,Object Detection Model,Background Color Visualization,Email Notification,Text Display,Image Preprocessing,Template Matching,Relative Static Crop,Florence-2 Model,VLM As Detector,OpenAI,OCR Model,Blur Visualization,Depth Estimation,Instance Segmentation Model,Stability AI Outpainting,Anthropic Claude,PLC EthernetIP,Buffer,Webhook Sink,Byte Tracker,Contrast Equalization,Mask Edge Snap,Moondream2,Line Counter,VLM As Detector,Google Gemini,Triangle Visualization,Overlap Filter,Time in Zone,Inner Workflow,First Non Empty Or Default,Detections Stabilizer,Keypoint Detection Model,VLM As Classifier,Roboflow Asset Library Attributes,Polygon Zone Visualization,Google Gemma API,Contrast Enhancement,Line Counter Visualization,Image Threshold,Distance Measurement,Camera Calibration,Detection Offset,ByteTrack Tracker,Expression,S3 Sink,Microsoft SQL Server Sink,Twilio SMS Notification,Detections Combine,Morphological Transformation,Camera Focus,Size Measurement,Delta Filter,PTZ Tracking (ONVIF),Stability AI Inpainting,Classification Label Visualization,Stitch OCR Detections,Event Writer,Mask Visualization,Dominant Color,Byte Tracker,Rate Limiter,Switch Case,Reference Path Visualization,Image Slicer,Identify Outliers,Byte Tracker,OPC UA Writer Sink,Dot Visualization,Cache Set,Identify Changes,Dynamic Crop,Path Deviation,Llama 3.2 Vision,BoT-SORT Tracker,Gaze Detection,Segment Anything 2 Model,OpenAI-Compatible LLM,Single-Label Classification Model,Overlap Analysis,Qwen3.5,QR Code Detection,Object Detection Model,Qwen 3.6 API,Detections Consensus,Multi-Label Classification Model,OpenAI,SAM 3,PLC Reader,Image Convert Grayscale,Instance Segmentation Model,Roboflow Dataset Upload,SAM 3,Detections Classes 
Replacement,Instance Segmentation Model,Roboflow Dataset Upload,PLC Writer,Qwen 3.5 API,OC-SORT Tracker,Seg Preview,VLM As Classifier,Line Counter,MoonshotAI Kimi,Stability AI Image Generation,Trace Visualization,Path Deviation,Qwen2.5-VL,Icon Visualization,SIFT Comparison,Morphological Transformation,SmolVLM2,LMM For Classification,Clip Comparison,Detections Merge,Halo Visualization,Data Aggregator,Google Gemma,Ellipse Visualization,Twilio SMS/MMS Notification,Polygon Visualization,Crop Visualization,Absolute Static Crop,Model Monitoring Inference Aggregator,OpenRouter,OpenAI,PLC ModbusTCP,Motion Detection,Heatmap Visualization,Detections Filter,Perception Encoder Embedding Model,Barcode Detection,Dimension Collapse,YOLO-World Model,Google Gemini,Clip Comparison,Google Gemini,Background Subtraction,Keypoint Visualization,CSV Formatter,Stitch Images,Florence-2 Model,Current Time,Detections List Roll-Up,OpenAI,Qwen3-VL,Slack Notification,CLIP Embedding Model,SIFT,Multi-Label Classification Model,Local File Sink,Cosine Similarity,Image Contours,Pixel Color Count,GLM-OCR,Image Slicer,Time in Zone,Semantic Segmentation Model,Stitch OCR Detections,Semantic Segmentation Model,Multi-Label Classification Model,QR Code Generator,Detection Event Log,Detections Transformation,Mask Area Measurement,Google Vision OCR,Image Blur,Property Definition,Roboflow Vision Events,SAM2 Video Tracker,Bounding Rectangle,Qwen3.5-VL,Grid Visualization,Llama 3.2 Vision,Velocity,Label Visualization,SIFT Comparison,Detections Stitch,Circle Visualization,SAM3 Video Tracker,Camera Focus,MoonshotAI Kimi,CogVLM,SAM 3 Interactive,Bounding Box Visualization,LMM,Continue If,Roboflow Visual Search,EasyOCR,Cache Get,Instance Segmentation Model,Pixelate Visualization,Keypoint Detection Model,SORT Tracker,Track Class Lock,Anthropic Claude,Object Detection Model,Time in Zone,MQTT Writer,Polygon Visualization,SAM 3,Model Comparison Visualization,Single-Label Classification Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
LMM in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - prompt (string): Unconstrained text prompt passed to the LMM.
    - lmm_type (string): Type of LMM to be used.
    - remote_api_key (Union[secret, string]): API key required to call the LMM. Currently an OpenAI key is required when lmm_type=gpt_4v.
- output
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - image (image_metadata): Dictionary with image metadata required by supervision.
    - structured_output (dictionary): Dictionary.
    - raw_output (string): String value.
    - * (*): Equivalent of any element.
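To show how the structured_output and raw_output bindings might be consumed together, here is a minimal sketch. It assumes the step result arrives as a plain dictionary keyed by the output names listed above, and that structured_output can be empty when the model's reply is not a valid mapping. Neither is documented on this page, and the helper `extract_fields` is hypothetical.

```python
import json


def extract_fields(step_result, expected_fields):
    """Read requested fields, preferring structured_output over raw_output."""
    structured = step_result.get("structured_output")
    if not isinstance(structured, dict):
        # Fall back to parsing the raw string reply as JSON.
        try:
            structured = json.loads(step_result.get("raw_output") or "")
        except json.JSONDecodeError:
            structured = {}
        if not isinstance(structured, dict):
            structured = {}
    # Missing fields map to None instead of raising KeyError.
    return {field: structured.get(field) for field in expected_fields}


# Hypothetical result for json_output = {"count": "number of cats in the picture"}.
result = {"structured_output": None, "raw_output": '{"count": "2"}'}
print(extract_fields(result, ["count"]))
```

Returning `None` for absent fields keeps downstream code simple when the model omits a requested key.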
Example JSON definition of step LMM in version v1
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/lmm@v1",
  "images": "$inputs.image",
  "prompt": "my prompt",
  "lmm_type": "gpt_4v",
  "lmm_config": {
    "gpt_image_detail": "low",
    "gpt_model_version": "gpt-4o",
    "max_tokens": 200
  },
  "remote_api_key": "xxx-xxx",
  "json_output": {
    "count": "number of cats in the picture"
  }
}
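For context, a step like the one above normally sits inside a full workflow definition. The sketch below assembles one in Python. The surrounding envelope (`version`, `inputs`, `outputs`, the `WorkflowImage`, `WorkflowParameter`, and `JsonField` types, and the `$steps.<step_name>.<output>` selector form) is an assumption about the Workflows definition format and is not described on this page. The names `cat_counter`, `openai_api_key`, and `cat_count` are hypothetical.

```python
import json

# The step from the example above, with the API key bound to a workflow input.
lmm_step = {
    "name": "cat_counter",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "How many cats are in the picture?",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200,
    },
    "remote_api_key": "$inputs.openai_api_key",
    "json_output": {"count": "number of cats in the picture"},
}

# Assumed envelope: declared inputs, the list of steps, and exposed outputs.
workflow = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "openai_api_key"},
    ],
    "steps": [lmm_step],
    "outputs": [
        {
            "type": "JsonField",
            "name": "cat_count",
            "selector": "$steps.cat_counter.structured_output",
        },
    ],
}

# Serialise to JSON, the form in which definitions are usually stored or sent.
print(json.dumps(workflow, indent=2))
```

Because this block is deprecated, a definition like this is mainly useful for reading or migrating existing workflows, not for building new ones.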