LMM¶
Class: LMMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.lmm.v1.LMMBlockV1
Ask a question to a Large Multimodal Model (LMM) with an image and text.
You can specify arbitrary text prompts to an LMMBlock.
The LMMBlock supports two LMMs:
- OpenAI's GPT-4 with Vision;
- CogVLM.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
If you want to classify an image into one or more categories, we recommend using the dedicated LMMForClassificationBlock.
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/lmm@v1 to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Holds unconstrained text prompt to LMM model. | ✅ |
| lmm_type | str | Type of LMM to be used. | ✅ |
| lmm_config | LMMConfig | Configuration of LMM. | ❌ |
| remote_api_key | str | Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v. | ✅ |
| json_output | Dict[str, str] | Holds dictionary that maps the name of each requested output field to its description. | ❌ |
The Refs column marks whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
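To illustrate how json_output relates to the structured_output field produced by the block, here is a hedged sketch (parse_lmm_response is a hypothetical helper, not part of the inference package): the keys of json_output name the fields the LMM is asked to return as JSON, and a reply that cannot be parsed degrades gracefully.

```python
import json

def parse_lmm_response(raw_output: str, json_output: dict) -> dict:
    """Hypothetical sketch: coerce a raw LMM reply into a structured
    dictionary whose keys come from the json_output property.
    Fields missing from the reply default to None."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        parsed = {}
    return {field: parsed.get(field) for field in json_output}

# json_output as it would appear in the step definition
json_output = {"count": "number of cats in the picture"}

# A well-formed JSON reply fills the requested field ...
print(parse_lmm_response('{"count": 2}', json_output))  # {'count': 2}

# ... while free-form text yields None-valued fields.
print(parse_lmm_response("two cats", json_output))  # {'count': None}
```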
Available Connections¶
Compatible Blocks
Check what blocks you can connect to LMM in version v1.
- inputs:
Absolute Static Crop,Relative Static Crop,Keypoint Visualization,Object Detection Model,LMM For Classification,Google Vision OCR,Slack Notification,Trace Visualization,Color Visualization,Instance Segmentation Model,Polygon Zone Visualization,Camera Focus,Halo Visualization,CSV Formatter,OCR Model,Camera Calibration,VLM as Classifier,Triangle Visualization,Stability AI Inpainting,Image Threshold,Reference Path Visualization,Corner Visualization,Ellipse Visualization,OpenAI,Single-Label Classification Model,Morphological Transformation,Roboflow Custom Metadata,Grid Visualization,Image Preprocessing,CogVLM,Line Counter Visualization,OpenAI,Florence-2 Model,Roboflow Dataset Upload,Stitch OCR Detections,Label Visualization,Model Comparison Visualization,Multi-Label Classification Model,Roboflow Dataset Upload,OpenAI,Model Monitoring Inference Aggregator,Polygon Visualization,Llama 3.2 Vision,Icon Visualization,Local File Sink,Blur Visualization,Image Contours,Clip Comparison,VLM as Detector,Twilio SMS Notification,Bounding Box Visualization,SIFT,Classification Label Visualization,Background Color Visualization,Webhook Sink,Dynamic Crop,Dot Visualization,Pixelate Visualization,Email Notification,Image Slicer,Stitch Images,Crop Visualization,Mask Visualization,SIFT Comparison,QR Code Generator,Depth Estimation,Image Slicer,Google Gemini,Perspective Correction,Image Convert Grayscale,Stability AI Image Generation,Keypoint Detection Model,EasyOCR,Contrast Equalization,Anthropic Claude,Image Blur,Circle Visualization,Stability AI Outpainting,LMM,Florence-2 Model
- outputs:
Absolute Static Crop,Relative Static Crop,Expression,LMM For Classification,VLM as Classifier,Trace Visualization,Color Visualization,Instance Segmentation Model,Seg Preview,Polygon Zone Visualization,Camera Focus,Halo Visualization,Identify Changes,Camera Calibration,VLM as Classifier,Triangle Visualization,Single-Label Classification Model,Image Threshold,Detections Classes Replacement,Gaze Detection,Single-Label Classification Model,Detection Offset,Morphological Transformation,Roboflow Custom Metadata,Image Preprocessing,Grid Visualization,Cache Set,Line Counter Visualization,SIFT Comparison,Stitch OCR Detections,Path Deviation,PTZ Tracking (ONVIF),Model Comparison Visualization,Multi-Label Classification Model,Multi-Label Classification Model,Roboflow Dataset Upload,Template Matching,Line Counter,Path Deviation,Polygon Visualization,Detections Stabilizer,Llama 3.2 Vision,Icon Visualization,Bounding Rectangle,Local File Sink,VLM as Detector,Twilio SMS Notification,SIFT,Cosine Similarity,Classification Label Visualization,Background Color Visualization,Webhook Sink,Dynamic Crop,Pixelate Visualization,Dominant Color,Detections Consensus,QR Code Detection,Email Notification,Buffer,Image Slicer,Crop Visualization,Keypoint Detection Model,Mask Visualization,Detections Merge,Barcode Detection,Detections Transformation,Depth Estimation,Rate Limiter,Perspective Correction,Keypoint Detection Model,Distance Measurement,EasyOCR,Circle Visualization,Stability AI Outpainting,Byte Tracker,Size Measurement,VLM as Detector,JSON Parser,Keypoint Visualization,Byte Tracker,Object Detection Model,Clip Comparison,SmolVLM2,Google Vision OCR,Slack Notification,Dimension Collapse,CSV Formatter,OCR Model,Dynamic Zone,Segment Anything 2 Model,Stability AI Inpainting,Reference Path Visualization,Corner Visualization,Ellipse Visualization,OpenAI,Time in Zone,CogVLM,OpenAI,YOLO-World Model,Continue If,Florence-2 Model,Roboflow Dataset Upload,Data Aggregator,Label Visualization,OpenAI,Line Counter,First Non Empty Or Default,Model Monitoring Inference Aggregator,Time in Zone,Velocity,Instance Segmentation Model,Time in Zone,Blur Visualization,Delta Filter,Image Contours,Clip Comparison,Identify Outliers,Object Detection Model,Overlap Filter,Moondream2,Bounding Box Visualization,Property Definition,Dot Visualization,Byte Tracker,CLIP Embedding Model,Detections Combine,Stitch Images,Detections Filter,Qwen2.5-VL,Perception Encoder Embedding Model,SIFT Comparison,Pixel Color Count,QR Code Generator,Google Gemini,Image Slicer,Stability AI Image Generation,Image Convert Grayscale,Contrast Equalization,Anthropic Claude,Image Blur,Cache Get,LMM,Detections Stitch,Florence-2 Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
LMM in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Holds unconstrained text prompt to LMM model.
  - lmm_type (string): Type of LMM to be used.
  - remote_api_key (Union[secret, string]): Holds API key required to call LMM model - in current state of development, we require OpenAI key when lmm_type=gpt_4v.
- output
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary.
  - raw_output (string): String value.
  - * (*): Equivalent of any element.
Example JSON definition of step LMM in version v1
```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/lmm@v1",
  "images": "$inputs.image",
  "prompt": "my prompt",
  "lmm_type": "gpt_4v",
  "lmm_config": {
    "gpt_image_detail": "low",
    "gpt_model_version": "gpt-4o",
    "max_tokens": 200
  },
  "remote_api_key": "xxx-xxx",
  "json_output": {
    "count": "number of cats in the picture"
  }
}
```
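A step definition like the one above lives inside a full workflow specification. Below is a minimal, hedged sketch of such a specification: the input names (openai_api_key), the step name (lmm_step), and the output name (cat_count) are illustrative choices, not part of this block's schema.

```python
# A sketch (assumed surrounding structure) of a workflow specification that
# wires the roboflow_core/lmm@v1 step between an image input and a JSON output.
workflow_specification = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        # Hypothetical runtime parameter carrying the OpenAI key.
        {"type": "WorkflowParameter", "name": "openai_api_key"},
    ],
    "steps": [
        {
            "name": "lmm_step",
            "type": "roboflow_core/lmm@v1",
            "images": "$inputs.image",
            "prompt": "Count the cats in the picture.",
            "lmm_type": "gpt_4v",
            "remote_api_key": "$inputs.openai_api_key",
            "json_output": {"count": "number of cats in the picture"},
        }
    ],
    "outputs": [
        {
            # Expose the block's structured_output binding under a chosen name.
            "type": "JsonField",
            "name": "cat_count",
            "selector": "$steps.lmm_step.structured_output",
        }
    ],
}
```

The selector syntax ($inputs.…, $steps.…) matches the bindings listed above: inputs flow into the step's images, prompt, lmm_type, and remote_api_key properties, and the step's structured_output is surfaced as a workflow output.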