CogVLM
Class: CogVLMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.cog_vlm.v1.CogVLMBlockV1
CogVLM reached End Of Life
Because of dependency conflicts with newer models and security vulnerabilities discovered in the transformers
library, which were patched only in library versions incompatible with the model, we announced End Of Life for
CogVLM support in inference, effective as of release 0.38.0.
We are leaving this block in the ecosystem until release 0.42.0 so that clients can learn about the change.
As of now, all Workflows that use the block are non-functional (a runtime error is raised).
After inference release 0.42.0, the block will be removed and the Execution Engine will raise a compilation
error when it encounters the block in a Workflow definition.
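Because affected Workflows currently fail only at runtime, it can help to find them before they are executed. The following is a minimal sketch, assuming a Workflow definition is a dictionary with a top-level "steps" list whose entries carry "name" and "type" fields. The helper name `find_cogvlm_steps` and the sample definition are hypothetical and not part of inference.

```python
from typing import Any, Dict, List

# Type identifier of the End Of Life block, as documented on this page.
COGVLM_TYPE = "roboflow_core/cog_vlm@v1"


def find_cogvlm_steps(definition: Dict[str, Any]) -> List[str]:
    """Return the names of steps in a workflow definition that use CogVLM."""
    steps = definition.get("steps", [])
    return [
        step.get("name", "<unnamed>")
        for step in steps
        if step.get("type") == COGVLM_TYPE
    ]


# Hypothetical definition, used only for illustration.
example_definition = {
    "steps": [
        {"name": "describe", "type": "roboflow_core/cog_vlm@v1"},
        {"name": "other", "type": "some_other/block@v1"},
    ]
}

affected = find_cogvlm_steps(example_definition)
```

Any step name returned here points at a step that needs to be replaced with another vision-language block before release 0.42.0.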
Ask a question to CogVLM, an open source vision-language model.
This model requires a GPU and can only be run on self-hosted devices, and is not available on the Roboflow Hosted API.
This model was previously part of the LMM block.
Type identifier
Use the following identifier in the step "type" field: roboflow_core/cog_vlm@v1 to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Text prompt to the CogVLM model. | ✅ |
| json_output_format | Dict[str, str] | Dictionary that maps the name of each requested output field to its description. | ❌ |
The Refs column indicates whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
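The Refs column can be read as a simple rule: only properties marked ✅ may hold a dynamic reference. The sketch below encodes that rule, assuming dynamic references are strings that start with "$" (as in "$inputs.image" from the example step definition). The helper names and the "$inputs.prompt" input are hypothetical and serve only as illustration.

```python
from typing import Any, Dict, List

# Whether each property accepts a dynamic reference, copied from the Refs column.
ACCEPTS_REFS = {"name": False, "prompt": True, "json_output_format": False}


def is_selector(value: Any) -> bool:
    """Assumption: a dynamic reference is a string starting with '$'."""
    return isinstance(value, str) and value.startswith("$")


def misplaced_refs(step: Dict[str, Any]) -> List[str]:
    """List properties that hold a reference although Refs marks them as static."""
    return [
        prop
        for prop, accepts in ACCEPTS_REFS.items()
        if not accepts and is_selector(step.get(prop))
    ]


step = {
    "name": "cog",
    "type": "roboflow_core/cog_vlm@v1",
    "images": "$inputs.image",
    "prompt": "$inputs.prompt",  # allowed: prompt is marked ✅ in Refs
    "json_output_format": {"count": "number of cats in the picture"},
}

problems = misplaced_refs(step)
```

An empty list means every static property holds a literal value.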
Runtime compatibility
- requires_internet (air-gapped / offline deployments): This block depends on a service that is not reachable from fully offline / air-gapped deployments.
- hard (runtime self_hosted_cpu; execution local): Requires a GPU; run_locally() loads a model that needs CUDA.
- hard (runtime hosted_serverless; execution remote): LMM_ENABLED=False on Roboflow Hosted Serverless, so the /llm_v1 and /infer/cog_vlm endpoints are not registered and run_remotely() returns 404.
Available Connections
Compatible Blocks
Check what blocks you can connect to CogVLM in version v1.
- inputs:
Image Slicer,Polygon Zone Visualization,VLM As Classifier,Contrast Enhancement,Google Gemma API,MoonshotAI Kimi,Stability AI Image Generation,Image Threshold,Line Counter Visualization,Trace Visualization,Stitch OCR Detections,Camera Calibration,QR Code Generator,Anthropic Claude,Icon Visualization,SIFT Comparison,Morphological Transformation,S3 Sink,Color Visualization,LMM For Classification,Perspective Correction,Microsoft SQL Server Sink,Corner Visualization,Roboflow Custom Metadata,Google Vision OCR,Twilio SMS Notification,Halo Visualization,Image Blur,Morphological Transformation,Qwen-VL,Camera Focus,Email Notification,Roboflow Vision Events,Halo Visualization,Stability AI Inpainting,Classification Label Visualization,Stitch OCR Detections,Google Gemma,Event Writer,Grid Visualization,Qwen3.5-VL,Background Color Visualization,Mask Visualization,Llama 3.2 Vision,Ellipse Visualization,Email Notification,Reference Path Visualization,Image Slicer,Label Visualization,Twilio SMS/MMS Notification,Text Display,OPC UA Writer Sink,Dot Visualization,Polygon Visualization,Crop Visualization,Dynamic Crop,Absolute Static Crop,Circle Visualization,Image Preprocessing,Llama 3.2 Vision,Model Monitoring Inference Aggregator,Relative Static Crop,Camera Focus,OpenRouter,OpenAI,Florence-2 Model,OpenAI-Compatible LLM,MoonshotAI Kimi,Heatmap Visualization,Single-Label Classification Model,OpenAI,OCR Model,CogVLM,Blur Visualization,Depth Estimation,Instance Segmentation Model,Stability AI Outpainting,Anthropic Claude,Google Gemini,Qwen 3.6 API,Clip Comparison,Google Gemini,Background Subtraction,Keypoint Visualization,CSV Formatter,Webhook Sink,Bounding Box Visualization,Multi-Label Classification Model,LMM,OpenAI,Stitch Images,Florence-2 Model,Image Convert Grayscale,Current Time,Contrast Equalization,OpenAI,VLM As Detector,Google Gemini,Roboflow Visual Search,Triangle Visualization,Slack Notification,EasyOCR,Roboflow Dataset Upload,Pixelate Visualization,Roboflow Dataset Upload,PLC 
Writer,SIFT,Qwen 3.5 API,Anthropic Claude,Object Detection Model,Local File Sink,MQTT Writer,Image Contours,Polygon Visualization,Keypoint Detection Model,GLM-OCR,Model Comparison Visualization,Roboflow Asset Library Attributes
- outputs:
Image Stack,Anthropic Claude,Per-Class Confidence Filter,Color Visualization,Single-Label Classification Model,Perspective Correction,Corner Visualization,Roboflow Custom Metadata,Halo Visualization,Dynamic Zone,Qwen-VL,Keypoint Detection Model,JSON Parser,Email Notification,Object Detection Model,Background Color Visualization,Email Notification,Text Display,Image Preprocessing,Template Matching,Relative Static Crop,Florence-2 Model,VLM As Detector,OpenAI,OCR Model,Blur Visualization,Depth Estimation,Instance Segmentation Model,Stability AI Outpainting,Anthropic Claude,PLC EthernetIP,Buffer,Webhook Sink,Byte Tracker,Contrast Equalization,Mask Edge Snap,Moondream2,Line Counter,VLM As Detector,Google Gemini,Triangle Visualization,Overlap Filter,Time in Zone,Inner Workflow,First Non Empty Or Default,Detections Stabilizer,Keypoint Detection Model,VLM As Classifier,Roboflow Asset Library Attributes,Polygon Zone Visualization,Google Gemma API,Contrast Enhancement,Line Counter Visualization,Image Threshold,Distance Measurement,Camera Calibration,Detection Offset,ByteTrack Tracker,Expression,S3 Sink,Microsoft SQL Server Sink,Twilio SMS Notification,Detections Combine,Morphological Transformation,Camera Focus,Size Measurement,Delta Filter,PTZ Tracking (ONVIF),Stability AI Inpainting,Classification Label Visualization,Stitch OCR Detections,Event Writer,Mask Visualization,Dominant Color,Byte Tracker,Rate Limiter,Switch Case,Reference Path Visualization,Image Slicer,Identify Outliers,Byte Tracker,OPC UA Writer Sink,Dot Visualization,Cache Set,Identify Changes,Dynamic Crop,Path Deviation,Llama 3.2 Vision,BoT-SORT Tracker,Gaze Detection,Segment Anything 2 Model,OpenAI-Compatible LLM,Single-Label Classification Model,Overlap Analysis,Qwen3.5,QR Code Detection,Object Detection Model,Qwen 3.6 API,Detections Consensus,Multi-Label Classification Model,OpenAI,SAM 3,PLC Reader,Image Convert Grayscale,Instance Segmentation Model,Roboflow Dataset Upload,SAM 3,Detections Classes 
Replacement,Instance Segmentation Model,Roboflow Dataset Upload,PLC Writer,Qwen 3.5 API,OC-SORT Tracker,Seg Preview,VLM As Classifier,Line Counter,MoonshotAI Kimi,Stability AI Image Generation,Trace Visualization,Path Deviation,Qwen2.5-VL,Icon Visualization,SIFT Comparison,Morphological Transformation,SmolVLM2,LMM For Classification,Clip Comparison,Detections Merge,Halo Visualization,Data Aggregator,Google Gemma,Ellipse Visualization,Twilio SMS/MMS Notification,Polygon Visualization,Crop Visualization,Absolute Static Crop,Model Monitoring Inference Aggregator,OpenRouter,OpenAI,PLC ModbusTCP,Motion Detection,Heatmap Visualization,Detections Filter,Perception Encoder Embedding Model,Barcode Detection,Dimension Collapse,YOLO-World Model,Google Gemini,Clip Comparison,Google Gemini,Background Subtraction,Keypoint Visualization,CSV Formatter,Stitch Images,Florence-2 Model,Current Time,Detections List Roll-Up,OpenAI,Qwen3-VL,Slack Notification,CLIP Embedding Model,SIFT,Multi-Label Classification Model,Local File Sink,Cosine Similarity,Image Contours,Pixel Color Count,GLM-OCR,Image Slicer,Time in Zone,Semantic Segmentation Model,Stitch OCR Detections,Semantic Segmentation Model,Multi-Label Classification Model,QR Code Generator,Detection Event Log,Detections Transformation,Mask Area Measurement,Google Vision OCR,Image Blur,Property Definition,Roboflow Vision Events,SAM2 Video Tracker,Bounding Rectangle,Qwen3.5-VL,Grid Visualization,Llama 3.2 Vision,Velocity,Label Visualization,SIFT Comparison,Detections Stitch,Circle Visualization,SAM3 Video Tracker,Camera Focus,MoonshotAI Kimi,CogVLM,SAM 3 Interactive,Bounding Box Visualization,LMM,Continue If,Roboflow Visual Search,EasyOCR,Cache Get,Instance Segmentation Model,Pixelate Visualization,Keypoint Detection Model,SORT Tracker,Track Class Lock,Anthropic Claude,Object Detection Model,Time in Zone,MQTT Writer,Polygon Visualization,SAM 3,Model Comparison Visualization,Single-Label Classification Model
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds
CogVLM in version v1 has.
Bindings
- input
- output
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary.
  - raw_output (string): String value.
  - * (*): Equivalent of any element.
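To illustrate how the structured_output and raw_output bindings might be consumed together, here is a sketch that assumes the step result is available as a plain dictionary keyed by the output names listed above. The result dictionary and the helper name `extract_answer` are hypothetical. The fallback of parsing raw_output as JSON is an illustrative choice, not documented behaviour of the block.

```python
import json
from typing import Any, Dict, Optional


def extract_answer(step_result: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Prefer structured_output; otherwise try to parse raw_output as a JSON object."""
    structured = step_result.get("structured_output")
    if isinstance(structured, dict):
        return structured

    raw = step_result.get("raw_output")
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            # The raw text is free-form and cannot be interpreted as JSON.
            return None
        return parsed if isinstance(parsed, dict) else None
    return None
```

A caller would then read individual fields, for example the "count" key requested through json_output_format, from the returned dictionary.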
Example JSON definition of step CogVLM in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/cog_vlm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "json_output_format": {
        "count": "number of cats in the picture"
    }
}
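The step definition above is only one entry of a complete Workflow definition. The sketch below shows how it might be embedded, assuming the surrounding definition uses "version", "inputs", "steps" and "outputs" keys with a "WorkflowImage" input and a "JsonField" output. That envelope, the step name "cog" and the output name "answer" are assumptions not stated on this page. Per the End Of Life notice, running such a definition currently raises a runtime error.

```python
import json

# Step definition mirroring the example above, with a concrete step name.
cogvlm_step = {
    "name": "cog",
    "type": "roboflow_core/cog_vlm@v1",
    "images": "$inputs.image",
    "prompt": "How many cats are in the picture?",
    "json_output_format": {"count": "number of cats in the picture"},
}

# Assumed envelope of a Workflow definition (not taken from this page).
workflow_definition = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [cogvlm_step],
    "outputs": [
        {
            "type": "JsonField",
            "name": "answer",
            "selector": "$steps.cog.structured_output",
        }
    ],
}

# Serialise to JSON, e.g. for storage or inspection.
serialized = json.dumps(workflow_definition, indent=2)
```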