LMM¶
Deprecated
This block is deprecated and may be removed in a future release.
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports the following LMM:
- OpenAI's GPT-4 with Vision.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/lmm@v1` to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `prompt` | `str` | Unconstrained text prompt passed to the LMM. | ✅ |
| `lmm_type` | `str` | Type of LMM to be used. | ✅ |
| `lmm_config` | `LMMConfig` | Configuration of the LMM. | ❌ |
| `remote_api_key` | `str` | API key required to call the LMM. Currently, an OpenAI key is required when `lmm_type=gpt_4v`. | ✅ |
| `json_output` | `Dict[str, str]` | Dictionary that maps the name of each requested output field to its description. | ❌ |
The Refs column indicates whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
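To make the Refs column concrete, here is a minimal Python sketch that flags properties holding a dynamic reference even though the table marks them ❌. It assumes that references are strings starting with `$inputs.` or `$steps.`; the helper names, the step name `ask_lmm`, and the workflow inputs `prompt` and `openai_api_key` are hypothetical. `images` is treated as reference-enabled because the bindings list it as an input.

```python
# Properties marked ✅ in the Refs column, plus `images` from the input bindings.
REF_ENABLED = {"images", "prompt", "lmm_type", "remote_api_key"}


def is_selector(value):
    """Return True for strings that look like workflow references (assumed prefixes)."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def check_step(step):
    """Return the names of properties that hold a reference but are not marked ✅."""
    return sorted(
        key for key, value in step.items()
        if is_selector(value) and key not in REF_ENABLED
    )


step = {
    "name": "ask_lmm",                            # hypothetical step name
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "$inputs.prompt",                   # hypothetical workflow input
    "lmm_type": "gpt_4v",
    "remote_api_key": "$inputs.openai_api_key",   # hypothetical workflow input
}

violations = check_step(step)
print(violations)
```

A check like this is only a local convenience; it does not replace whatever validation the workflow engine itself performs.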
Runtime compatibility¶
- `requires_internet` (air-gapped / offline deployments): This block depends on a service that is not reachable from fully offline / air-gapped deployments.
- `hard` (runtime `hosted_serverless`; execution `remote`): LMM_ENABLED=False on Roboflow Hosted Serverless: the /llm_v1 endpoint is not registered, so run_remotely() returns 404.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs:
Morphological Transformation,Image Preprocessing,Email Notification,VLM As Classifier,Halo Visualization,Morphological Transformation,Object Detection Model,Text Display,Image Threshold,Model Monitoring Inference Aggregator,Pixelate Visualization,Keypoint Detection Model,Qwen-VL,OpenAI,CogVLM,Crop Visualization,Dot Visualization,Google Vision OCR,Florence-2 Model,Roboflow Dataset Upload,Qwen3.5-VL,Roboflow Vision Events,Polygon Zone Visualization,Polygon Visualization,Absolute Static Crop,S3 Sink,Twilio SMS/MMS Notification,QR Code Generator,SIFT Comparison,Contrast Enhancement,Stitch OCR Detections,OCR Model,Color Visualization,Roboflow Dataset Upload,Roboflow Custom Metadata,LMM For Classification,OpenAI-Compatible LLM,Line Counter Visualization,Image Blur,Stability AI Inpainting,Blur Visualization,Current Time,Perspective Correction,Keypoint Visualization,Anthropic Claude,MQTT Writer,MoonshotAI Kimi,Google Gemini,Image Slicer,Depth Estimation,Ellipse Visualization,Google Gemma API,Slack Notification,Google Gemini,Bounding Box Visualization,Label Visualization,Stitch OCR Detections,Camera Focus,CSV Formatter,OpenAI,SIFT,Anthropic Claude,Image Convert Grayscale,Roboflow Asset Library Attributes,Florence-2 Model,EasyOCR,Twilio SMS Notification,Local File Sink,Icon Visualization,Triangle Visualization,Qwen 3.5 API,OpenRouter,OpenAI,Background Color Visualization,MoonshotAI Kimi,Google Gemini,Grid Visualization,Corner Visualization,Reference Path Visualization,Image Slicer,Single-Label Classification Model,Halo Visualization,Dynamic Crop,Webhook Sink,Stability AI Outpainting,Relative Static Crop,Anthropic Claude,Clip Comparison,Multi-Label Classification Model,OpenAI,Llama 3.2 Vision,Camera Calibration,Model Comparison Visualization,Trace Visualization,Google Gemma,OPC UA Writer Sink,Circle Visualization,Email Notification,LMM,Event Writer,Instance Segmentation Model,Contrast Equalization,Camera Focus,Heatmap Visualization,Background Subtraction,Image Contours,Qwen 
3.6 API,GLM-OCR,VLM As Detector,Classification Label Visualization,Llama 3.2 Vision,Stitch Images,Mask Visualization,Microsoft SQL Server Sink,Stability AI Image Generation,Polygon Visualization
- outputs:
Image Preprocessing,Detections Transformation,Object Detection Model,Time in Zone,Text Display,Template Matching,Image Threshold,Keypoint Detection Model,Time in Zone,OpenAI,Crop Visualization,Cosine Similarity,Florence-2 Model,Roboflow Dataset Upload,Qwen3.5-VL,Roboflow Vision Events,Polygon Zone Visualization,Qwen2.5-VL,Polygon Visualization,S3 Sink,Absolute Static Crop,QR Code Generator,SIFT Comparison,Cache Get,Stitch OCR Detections,OCR Model,Color Visualization,Gaze Detection,OpenAI-Compatible LLM,Stability AI Inpainting,Image Blur,Line Counter Visualization,Blur Visualization,SAM 3,Detection Offset,Anthropic Claude,MQTT Writer,Image Slicer,SAM 3,Detections Consensus,Detections Stitch,Ellipse Visualization,Google Gemma API,PLC ModbusTCP,Overlap Analysis,Rate Limiter,Image Stack,Google Gemini,Delta Filter,Cache Set,Keypoint Detection Model,Stitch OCR Detections,Camera Focus,CSV Formatter,Keypoint Detection Model,Perception Encoder Embedding Model,Image Convert Grayscale,Roboflow Asset Library Attributes,Seg Preview,Overlap Filter,Multi-Label Classification Model,Buffer,Triangle Visualization,Icon Visualization,Qwen 3.5 API,VLM As Classifier,Dominant Color,Instance Segmentation Model,Distance Measurement,Qwen3-VL,Background Color Visualization,Byte Tracker,Clip Comparison,Corner Visualization,Image Slicer,Single-Label Classification Model,Dynamic Crop,Stability AI Outpainting,Detection Event Log,VLM As Detector,Anthropic Claude,Clip Comparison,SORT Tracker,OpenAI,ByteTrack Tracker,Detections Combine,Trace Visualization,PTZ Tracking (ONVIF),Camera Calibration,Email Notification,Camera Focus,Background Subtraction,GLM-OCR,SAM2 Video Tracker,Qwen3.5,VLM As Detector,Llama 3.2 Vision,Property Definition,Stitch Images,Mask Visualization,Microsoft SQL Server Sink,Data Aggregator,Detections Classes Replacement,Morphological Transformation,Email Notification,VLM As Classifier,Morphological Transformation,Halo Visualization,Pixel Color Count,BoT-SORT Tracker,Model 
Monitoring Inference Aggregator,Pixelate Visualization,Qwen-VL,CogVLM,SAM 3,Dot Visualization,Google Vision OCR,PLC EthernetIP,Detections List Roll-Up,Detections Merge,Dimension Collapse,Mask Edge Snap,SIFT Comparison,Twilio SMS/MMS Notification,Contrast Enhancement,Per-Class Confidence Filter,Single-Label Classification Model,Dynamic Zone,Detections Filter,Byte Tracker,Roboflow Dataset Upload,Roboflow Custom Metadata,Bounding Rectangle,LMM For Classification,Detections Stabilizer,Object Detection Model,Path Deviation,Current Time,Perspective Correction,Keypoint Visualization,Byte Tracker,MoonshotAI Kimi,Google Gemini,Identify Changes,Depth Estimation,Object Detection Model,Slack Notification,Identify Outliers,Time in Zone,Inner Workflow,Label Visualization,Bounding Box Visualization,Size Measurement,Multi-Label Classification Model,OpenAI,SIFT,Anthropic Claude,Moondream2,OC-SORT Tracker,CLIP Embedding Model,Florence-2 Model,EasyOCR,YOLO-World Model,Segment Anything 2 Model,Twilio SMS Notification,Local File Sink,Single-Label Classification Model,Mask Area Measurement,Path Deviation,JSON Parser,OpenRouter,Instance Segmentation Model,OpenAI,MoonshotAI Kimi,Continue If,Google Gemini,Grid Visualization,Semantic Segmentation Model,Reference Path Visualization,SmolVLM2,Line Counter,Halo Visualization,First Non Empty Or Default,Webhook Sink,Instance Segmentation Model,Relative Static Crop,Expression,Multi-Label Classification Model,Llama 3.2 Vision,Barcode Detection,Velocity,Motion Detection,Model Comparison Visualization,Google Gemma,OPC UA Writer Sink,Line Counter,QR Code Detection,Circle Visualization,LMM,Event Writer,Instance Segmentation Model,Contrast Equalization,Heatmap Visualization,Qwen 3.6 API,Image Contours,Classification Label Visualization,Stability AI Image Generation,Semantic Segmentation Model,Polygon Visualization
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
LMM in version v1 has.
Bindings
- input
    - `images` (image): The image to infer on.
    - `prompt` (string): Unconstrained text prompt passed to the LMM.
    - `lmm_type` (string): Type of LMM to be used.
    - `remote_api_key` (Union[string, secret]): API key required to call the LMM. Currently, an OpenAI key is required when `lmm_type=gpt_4v`.
- output
    - `parent_id` (parent_id): Identifier of parent for step output.
    - `root_parent_id` (parent_id): Identifier of parent for step output.
    - `image` (image_metadata): Dictionary with image metadata required by supervision.
    - `structured_output` (dictionary): Dictionary.
    - `raw_output` (string): String value.
    - `*` (*): Equivalent of any element.
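As an illustration of consuming the `structured_output` and `raw_output` bindings, the following Python sketch reads one requested field and converts it to an integer. The `step_result` dictionary is a hypothetical stand-in for a single step result, and the helper names are invented; this page only states that `structured_output` is a dictionary and `raw_output` is a string, so the value types inside the dictionary are an assumption.

```python
def read_field(step_result, field, default=None):
    """Look up one requested field in structured_output, tolerating a missing dictionary."""
    structured = step_result.get("structured_output") or {}
    return structured.get(field, default)


def read_count(step_result):
    """Return the `count` field as an int, or None when it is absent or not numeric."""
    value = read_field(step_result, "count")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Hypothetical result shape, for illustration only.
step_result = {
    "structured_output": {"count": "2"},
    "raw_output": '{"count": "2"}',
}

print(read_count(step_result))
```

Returning `None` instead of raising keeps downstream code simple when the model's answer does not match the requested field; callers can then fall back to inspecting `raw_output`.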
Example JSON definition of step LMM in version v1
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200
    },
    "remote_api_key": "xxx-xxx",
    "json_output": {
        "count": "number of cats in the picture"
    }
}
```
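The step definition above is only one entry in a workflow. The Python sketch below shows how it might be embedded in a complete workflow definition and serialized to JSON. The surrounding structure (`version`, `inputs`, `steps`, `outputs`), the `WorkflowImage`, `WorkflowParameter` and `JsonField` type names, and the `$steps.<step_name>.<output>` selector form are not shown on this page and are assumed from the general Workflows definition format; the step name `cat_counter` and the input name `openai_api_key` are hypothetical.

```python
import json

# The step from the example above, with the API key supplied as a workflow input
# instead of a literal value (remote_api_key is marked ✅ in the Refs column).
lmm_step = {
    "name": "cat_counter",  # hypothetical step name
    "type": "roboflow_core/lmm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "lmm_type": "gpt_4v",
    "lmm_config": {
        "gpt_image_detail": "low",
        "gpt_model_version": "gpt-4o",
        "max_tokens": 200,
    },
    "remote_api_key": "$inputs.openai_api_key",  # hypothetical workflow input
    "json_output": {"count": "number of cats in the picture"},
}

# Assumed wrapper structure; verify against the Workflows definition reference.
workflow = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "openai_api_key"},
    ],
    "steps": [lmm_step],
    "outputs": [
        {
            "type": "JsonField",
            "name": "structured",
            "selector": "$steps.cat_counter.structured_output",
        },
        {
            "type": "JsonField",
            "name": "raw",
            "selector": "$steps.cat_counter.raw_output",
        },
    ],
}

serialized = json.dumps(workflow, indent=2)
print(serialized)
```

Passing the key as a workflow input keeps the secret out of the stored definition, which is the main reason to prefer a reference over the literal `"xxx-xxx"` placeholder.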