CogVLM
Version v1
Ask a question to CogVLM, an open source vision-language model.
This model requires a GPU and can only be run on self-hosted devices; it is not available on the Roboflow Hosted API.
This model was previously part of the LMM block.
Type identifier
Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/cog_vlm@v1
Properties
Name | Type | Description | Refs |
---|---|---|---|
name | str | The unique name of this step. | ❌ |
prompt | str | Text prompt to the CogVLM model. | ✅ |
json_output_format | Dict[str, str] | Dictionary that maps the name of each requested output field to its description. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
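As a rough illustration of the properties table, the sketch below assembles a step definition as a plain Python dict and enforces which properties accept dynamic references. The helpers `is_selector` and `build_cogvlm_step` are hypothetical and not part of any Roboflow library. The `$inputs.` prefix is borrowed from the example definition on this page; treating any string that starts with `$` as a dynamic reference is an assumption.

```python
from typing import Dict

STEP_TYPE = "roboflow_core/cog_vlm@v1"


def is_selector(value: object) -> bool:
    """Assumption: dynamic references are strings that start with '$'."""
    return isinstance(value, str) and value.startswith("$")


def build_cogvlm_step(
    name: str,
    prompt: str,
    json_output_format: Dict[str, str],
    images: str = "$inputs.image",
) -> dict:
    """Hypothetical helper: assemble a CogVLM step definition as a dict."""
    # `name` has a cross in the Refs column, so only literal values pass.
    if is_selector(name):
        raise ValueError("name must be a literal string, not a selector")
    # `json_output_format` is also literal-only: a str -> str mapping.
    if not isinstance(json_output_format, dict) or not all(
        isinstance(key, str) and isinstance(value, str)
        for key, value in json_output_format.items()
    ):
        raise ValueError("json_output_format must map str to str")
    # `prompt` has a check in the Refs column: literal text or a selector.
    return {
        "name": name,
        "type": STEP_TYPE,
        "images": images,
        "prompt": prompt,
        "json_output_format": dict(json_output_format),
    }


# The prompt is supplied at runtime through a (hypothetical) `prompt` input.
step = build_cogvlm_step(
    name="cat_counter",
    prompt="$inputs.prompt",
    json_output_format={"count": "number of cats in the picture"},
)
```

Validating at build time is a design choice of this sketch; the workflow engine performs its own validation when a definition is loaded.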
Available Connections
Check what blocks you can connect to CogVLM in version v1.
- inputs: Label Visualization, Crop Visualization, Mask Visualization, Blur Visualization, Image Contours, Bounding Box Visualization, Image Convert Grayscale, Camera Focus, Dot Visualization, Color Visualization, Corner Visualization, Circle Visualization, Perspective Correction, Image Slicer, Triangle Visualization, Relative Static Crop, Absolute Static Crop, Halo Visualization, Background Color Visualization, SIFT, Pixelate Visualization, Polygon Visualization, Dynamic Crop, Image Blur, Ellipse Visualization, Image Threshold
- outputs: Template Matching, Segment Anything 2 Model, Blur Visualization, Image Convert Grayscale, VLM as Detector, Circle Visualization, CogVLM, Absolute Static Crop, LMM For Classification, Background Color Visualization, YOLO-World Model, Detections Transformation, Polygon Visualization, Image Threshold, OCR Model, Detections Classes Replacement, Image Contours, Bounding Box Visualization, Object Detection Model, Roboflow Custom Metadata, Dimension Collapse, Clip Comparison, First Non Empty Or Default, Detections Filter, Instance Segmentation Model, Halo Visualization, SIFT, Roboflow Dataset Upload, Google Gemini, Dynamic Crop, Image Slicer, Detection Offset, Roboflow Dataset Upload, Detections Stitch, Multi-Label Classification Model, Barcode Detection, Camera Focus, Dot Visualization, Corner Visualization, OpenAI, Triangle Visualization, Relative Static Crop, Pixel Color Count, Pixelate Visualization, JSON Parser, Dynamic Zone, Detections Consensus, Label Visualization, Crop Visualization, Mask Visualization, LMM, Property Definition, QR Code Detection, Color Visualization, VLM as Classifier, Perspective Correction, OpenAI, Keypoint Detection Model, Clip Comparison, Anthropic Claude, Dominant Color, Continue If, Expression, SIFT Comparison, Image Blur, Ellipse Visualization, Single-Label Classification Model
The available connections depend on the block's binding kinds. Check what binding kinds CogVLM in version v1 has.
Bindings
- input
- output
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - image (image_metadata): Dictionary with image metadata required by supervision.
    - structured_output (dictionary): Dictionary.
    - raw_output (string): String value.
    - * (*): Equivalent of any element.
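To make the output kinds more concrete, here is a minimal sketch of reading one step result. The `sample_result` dict is made-up data shaped after the output names listed above, not captured model output, and `read_cogvlm_output` is a hypothetical helper. It assumes `structured_output` may be missing or empty (for instance when the model's reply could not be parsed), in which case the raw string is returned instead.

```python
from typing import Any, Dict, Union


def read_cogvlm_output(result: Dict[str, Any]) -> Union[Dict[str, Any], str]:
    """Prefer the parsed dictionary; fall back to the raw string."""
    structured = result.get("structured_output")
    if isinstance(structured, dict) and structured:
        return structured
    # Assumption: raw_output holds the model's unparsed text reply.
    return result.get("raw_output", "")


# Made-up sample result, for illustration only.
sample_result = {
    "raw_output": '{"count": "2"}',
    "structured_output": {"count": "2"},
}
answer = read_cogvlm_output(sample_result)
```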
Example JSON definition of step CogVLM in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/cog_vlm@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "json_output_format": {
        "count": "number of cats in the picture"
    }
}
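A step definition like this one sits inside a larger workflow definition. The sketch below wraps it in one and serialises the result to JSON. The top-level layout (`version`, `inputs`, `steps`, `outputs`), the `WorkflowImage` and `JsonField` type names, and the `$steps.<step_name>.<output_name>` selector form are assumptions based on common Workflows usage and should be checked against the Workflows documentation for your version. The step name `cat_counter` and the output name `answer` are placeholders.

```python
import json

# Same shape as the example step definition, with a placeholder name.
cogvlm_step = {
    "name": "cat_counter",
    "type": "roboflow_core/cog_vlm@v1",
    "images": "$inputs.image",
    "prompt": "How many cats are in the picture?",
    "json_output_format": {"count": "number of cats in the picture"},
}

# Assumed workflow layout: one image input, one step, one exposed output.
workflow_definition = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [cogvlm_step],
    "outputs": [
        {
            "type": "JsonField",
            "name": "answer",
            # Assumed selector form: $steps.<step_name>.<output_name>
            "selector": "$steps.cat_counter.structured_output",
        }
    ],
}

serialised = json.dumps(workflow_definition, indent=2)
print(serialised)
```

Because CogVLM needs a GPU and is not served by the Roboflow Hosted API, a definition like this would have to be submitted to a self-hosted deployment.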