CogVLM
Class: CogVLMBlockV1
Source: inference.core.workflows.core_steps.models.foundation.cog_vlm.v1.CogVLMBlockV1
CogVLM reached End Of Life
Dependency conflicts with newer models, together with security vulnerabilities in the transformers
library that were patched only in versions incompatible with this model, led us to announce End Of Life
for CogVLM support in inference, effective since release 0.38.0.
The block remains in the ecosystem until release 0.42.0 so that clients can learn about the change.
As of now, all Workflows using the block are non-functional (a runtime error is raised).
After inference release 0.42.0, the block will be removed and the Execution Engine will raise a
compilation error when it encounters the block in a Workflow definition.
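Before upgrading, it may help to locate Workflows that still reference the block. The sketch below is a minimal, hypothetical helper (`find_cog_vlm_steps` is not part of inference). It assumes a Workflow definition is a dictionary with a top-level `steps` list whose entries carry `name` and `type` keys; the second step type is a made-up placeholder.

```python
from typing import Any, Dict, List

# Block identifier taken from this page; everything else is illustrative.
COG_VLM_TYPE = "roboflow_core/cog_vlm@v1"


def find_cog_vlm_steps(definition: Dict[str, Any]) -> List[str]:
    """Return names of steps in a Workflow definition that use the CogVLM block.

    Assumes definition["steps"] is a list of dicts with "name" and "type" keys.
    """
    return [
        step.get("name", "<unnamed>")
        for step in definition.get("steps", [])
        if step.get("type") == COG_VLM_TYPE
    ]


workflow = {
    "steps": [
        {"name": "describe", "type": "roboflow_core/cog_vlm@v1", "prompt": "my prompt"},
        # Hypothetical block type, present only to show that other steps are ignored.
        {"name": "other", "type": "some_plugin/other_block@v1"},
    ]
}
print(find_cog_vlm_steps(workflow))  # ['describe']
```

Any step names returned this way point at Workflows that need a replacement block before the removal release.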
Ask a question to CogVLM, an open source vision-language model.
This model requires a GPU and can only be run on self-hosted devices, and is not available on the Roboflow Hosted API.
This model was previously part of the LMM block.
Type identifier
Use the following identifier in the step "type" field: roboflow_core/cog_vlm@v1 to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Text prompt to the CogVLM model. | ✅ |
| json_output_format | Dict[str, str] | Dictionary that maps the name of each requested output field to its description. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
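As an illustration of the Refs column, the sketch below contrasts a static prompt with one bound to a runtime value. It assumes the `$inputs.<name>` selector syntax seen in the example step definition on this page, plus a runtime parameter named `prompt`, which is a hypothetical name chosen here. `is_selector` is a helper written for this example, not an inference API.

```python
from typing import Any


def is_selector(value: Any) -> bool:
    """Heuristic: treat strings starting with "$inputs." or "$steps." as selectors."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


# Static prompt: the same text is sent on every run.
static_step = {
    "name": "cog_static",
    "type": "roboflow_core/cog_vlm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
}

# Dynamic prompt: resolved at runtime from a (hypothetical) input named "prompt".
dynamic_step = {**static_step, "name": "cog_dynamic", "prompt": "$inputs.prompt"}

print(is_selector(static_step["prompt"]), is_selector(dynamic_step["prompt"]))
```

Properties marked ❌, such as name and json_output_format, would be given as literal values instead.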
Runtime compatibility
- requires_internet (air-gapped / offline deployments): This block depends on a service that is not reachable from fully offline / air-gapped deployments.
- hard (runtime self_hosted_cpu; execution local): Requires a GPU; run_locally() loads a model that needs CUDA.
- hard (runtime hosted_serverless; execution remote): LMM_ENABLED=False on Roboflow Hosted Serverless; the /llm_v1 and /infer/cog_vlm endpoints are not registered, so run_remotely() returns 404.
Available Connections
Compatible Blocks
Check what blocks you can connect to CogVLM in version v1.
- inputs:
Halo Visualization,Stitch OCR Detections,GLM-OCR,Image Threshold,Stitch Images,Morphological Transformation,Classification Label Visualization,Twilio SMS/MMS Notification,Crop Visualization,Icon Visualization,Stability AI Outpainting,Blur Visualization,VLM As Classifier,Reference Path Visualization,MoonshotAI Kimi,OpenAI,Google Gemini,Anthropic Claude,Webhook Sink,Camera Focus,QR Code Generator,Model Comparison Visualization,Florence-2 Model,MQTT Writer,Trace Visualization,Ellipse Visualization,Anthropic Claude,Dot Visualization,Perspective Correction,Label Visualization,Image Convert Grayscale,Florence-2 Model,Text Display,Qwen-VL,Llama 3.2 Vision,Roboflow Dataset Upload,Image Blur,Keypoint Detection Model,Absolute Static Crop,SIFT,CSV Formatter,LMM,Google Gemini,EasyOCR,Qwen 3.5 API,Local File Sink,Qwen 3.6 API,Triangle Visualization,Camera Focus,Contrast Equalization,Polygon Visualization,OpenAI,Heatmap Visualization,Clip Comparison,Google Gemma API,Contrast Enhancement,Google Gemini,Halo Visualization,Color Visualization,Morphological Transformation,MoonshotAI Kimi,Stitch OCR Detections,LMM For Classification,Event Writer,VLM As Detector,Llama 3.2 Vision,Polygon Visualization,Email Notification,Mask Visualization,Anthropic Claude,Stability AI Inpainting,Roboflow Asset Library Attributes,Microsoft SQL Server Sink,Keypoint Visualization,OpenAI,Background Subtraction,Multi-Label Classification Model,Roboflow Vision Events,Twilio SMS Notification,Email Notification,Image Slicer,Image Contours,Line Counter Visualization,CogVLM,Object Detection Model,Image Preprocessing,OPC UA Writer Sink,Dynamic Crop,Depth Estimation,Bounding Box Visualization,Qwen3.5-VL,Current Time,Corner Visualization,Polygon Zone Visualization,Camera Calibration,Roboflow Dataset Upload,Grid Visualization,Stability AI Image Generation,OpenAI,S3 Sink,Circle Visualization,Image Slicer,OCR Model,Single-Label Classification Model,Relative Static Crop,Roboflow Custom Metadata,Instance Segmentation 
Model,Model Monitoring Inference Aggregator,OpenAI-Compatible LLM,Slack Notification,OpenRouter,SIFT Comparison,Pixelate Visualization,Google Vision OCR,Background Color Visualization,Google Gemma
- outputs:
Overlap Analysis,Stability AI Outpainting,Detections Transformation,YOLO-World Model,Detections Classes Replacement,Anthropic Claude,Camera Focus,Track Class Lock,SmolVLM2,Label Visualization,Florence-2 Model,Qwen-VL,Text Display,Velocity,CSV Formatter,Gaze Detection,LMM,Qwen 3.6 API,Qwen2.5-VL,Line Counter,Qwen3-VL,Clip Comparison,Google Gemma API,Contrast Enhancement,Halo Visualization,Event Writer,Stability AI Inpainting,Property Definition,Bounding Rectangle,Roboflow Asset Library Attributes,Identify Outliers,Semantic Segmentation Model,Bounding Box Visualization,Clip Comparison,SIFT Comparison,Time in Zone,Single-Label Classification Model,Slack Notification,OpenRouter,Detection Event Log,SAM3 Video Tracker,Google Gemma,Dynamic Zone,CLIP Embedding Model,Stitch OCR Detections,GLM-OCR,Icon Visualization,ByteTrack Tracker,Single-Label Classification Model,Single-Label Classification Model,QR Code Generator,Path Deviation,MQTT Writer,Object Detection Model,Keypoint Detection Model,BoT-SORT Tracker,Dot Visualization,Perspective Correction,Instance Segmentation Model,Seg Preview,Per-Class Confidence Filter,SIFT,Local File Sink,Triangle Visualization,Contrast Equalization,Polygon Visualization,SAM2 Video Tracker,Data Aggregator,Rate Limiter,PLC EthernetIP,Google Gemini,LMM For Classification,Multi-Label Classification Model,Image Stack,Email Notification,Mask Visualization,Distance Measurement,Barcode Detection,PTZ Tracking (ONVIF),Keypoint Visualization,Multi-Label Classification Model,Overlap Filter,Semantic Segmentation Model,Image Contours,Byte Tracker,SAM 3,Motion Detection,Current Time,Polygon Zone Visualization,Corner Visualization,Stability AI Image Generation,Circle Visualization,Anthropic Claude,Background Color Visualization,Line Counter,Template Matching,Morphological Transformation,Classification Label Visualization,Crop Visualization,Blur Visualization,Reference Path Visualization,Delta Filter,OpenAI,Instance Segmentation Model,Size Measurement,Mask 
Edge Snap,Model Comparison Visualization,Florence-2 Model,Trace Visualization,JSON Parser,Image Convert Grayscale,Llama 3.2 Vision,Image Blur,Keypoint Detection Model,Absolute Static Crop,Keypoint Detection Model,OC-SORT Tracker,Qwen 3.5 API,QR Code Detection,Camera Focus,SORT Tracker,VLM As Detector,Multi-Label Classification Model,Detections Stitch,Stitch OCR Detections,MoonshotAI Kimi,Color Visualization,Morphological Transformation,Buffer,Cache Set,Microsoft SQL Server Sink,Time in Zone,OpenAI,Roboflow Vision Events,Mask Area Measurement,Detection Offset,Dominant Color,CogVLM,Detections Consensus,Object Detection Model,OPC UA Writer Sink,Dynamic Crop,Path Deviation,Byte Tracker,Expression,Detections Combine,Continue If,Qwen3.5-VL,First Non Empty Or Default,SAM 3,Cache Get,OpenAI,OCR Model,Google Vision OCR,SIFT Comparison,Pixelate Visualization,Halo Visualization,Image Threshold,SAM 3 Interactive,Stitch Images,Twilio SMS/MMS Notification,VLM As Classifier,MoonshotAI Kimi,Google Gemini,Byte Tracker,Webhook Sink,Instance Segmentation Model,Ellipse Visualization,Roboflow Dataset Upload,PLC ModbusTCP,Detections Stabilizer,Detections Merge,Google Gemini,Dimension Collapse,EasyOCR,SAM 3,Time in Zone,OpenAI,Heatmap Visualization,Perception Encoder Embedding Model,Detections List Roll-Up,VLM As Detector,Llama 3.2 Vision,Identify Changes,Polygon Visualization,Anthropic Claude,Detections Filter,Background Subtraction,Twilio SMS Notification,Email Notification,Image Slicer,Line Counter Visualization,Image Preprocessing,VLM As Classifier,Depth Estimation,Pixel Color Count,Qwen3.5,Cosine Similarity,Roboflow Dataset Upload,Moondream2,Segment Anything 2 Model,Camera Calibration,Inner Workflow,Grid Visualization,S3 Sink,Image Slicer,Roboflow Custom Metadata,Relative Static Crop,Instance Segmentation Model,Model Monitoring Inference Aggregator,OpenAI-Compatible LLM,Object Detection Model
Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds
CogVLM in version v1 has.
Bindings
- input
- output
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - image (image_metadata): Dictionary with image metadata required by supervision.
    - structured_output (dictionary): Dictionary.
    - raw_output (string): String value.
    - * (*): Equivalent of any element.
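A small sketch of how the `structured_output` dictionary might be checked against the requested fields. It assumes the keys of `structured_output` mirror those of `json_output_format`, which this page does not state explicitly. `missing_fields` and the sample values are hypothetical.

```python
from typing import Any, Dict, List


def missing_fields(requested: Dict[str, str], structured_output: Dict[str, Any]) -> List[str]:
    """List requested field names that are absent from the structured output."""
    return [field for field in requested if field not in structured_output]


json_output_format = {"count": "number of cats in the picture"}
sample_output = {"count": "2"}  # made-up value, for illustration only

print(missing_fields(json_output_format, sample_output))  # []
print(missing_fields(json_output_format, {}))             # ['count']
```

A check like this can guard downstream steps that expect every requested field to be present.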
Example JSON definition of step CogVLM in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/cog_vlm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "json_output_format": {
        "count": "number of cats in the picture"
    }
}
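For context, the step can be embedded in a complete Workflow definition. The sketch below is an illustration under assumptions: the top-level layout (`version`, `inputs`, `steps`, `outputs`), the `WorkflowImage` and `JsonField` types, and the step and output names are assumed from common Workflows usage rather than taken from this page. Given the End Of Life notice, such a definition is expected to raise a runtime error on current releases.

```python
import json

# Assumed Workflow layout; only the step body comes from this page.
workflow_definition = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "name": "cog_vlm",  # hypothetical step name
            "type": "roboflow_core/cog_vlm@v1",
            "images": "$inputs.image",
            "prompt": "my prompt",
            "json_output_format": {"count": "number of cats in the picture"},
        }
    ],
    "outputs": [
        # Output names "answer" and "raw" are hypothetical.
        {"type": "JsonField", "name": "answer", "selector": "$steps.cog_vlm.structured_output"},
        {"type": "JsonField", "name": "raw", "selector": "$steps.cog_vlm.raw_output"},
    ],
}

# Serialise to JSON, the form in which definitions are usually stored or sent.
print(json.dumps(workflow_definition, indent=2))
```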