Clip Comparison
v2
Class: ClipComparisonBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v2.ClipComparisonBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to determine whether an image contains NSFW material.
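The idea behind the returned similarities can be sketched as follows. This is only an illustration, assuming the score is a cosine similarity between an image embedding and one text embedding per label (this page does not state the exact metric). The vectors are made-up toy values, not real CLIP embeddings, and `cosine_similarity` is a helper written for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (assumption: real CLIP embeddings are much larger).
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "car": [1.0, 0.0, 0.1],
    "truck": [0.2, 0.9, 0.1],
    "bicycle": [0.0, 0.2, 1.0],
}

# One score per label: how closely each text matches the image.
similarities = {
    label: cosine_similarity(image_embedding, vector)
    for label, vector in text_embeddings.items()
}

# Zero-shot classification: pick the label with the highest score.
best_label = max(similarities, key=similarities.get)
print(best_label, round(similarities[best_label], 3))
```

No training step is involved: changing the label set only changes which text embeddings the image is compared against.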
Type identifier
Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/clip_comparison@v2
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Unique name of step in workflows. | ❌ |
| classes | List[str] | List of classes to calculate similarity against each input image. | ✅ |
| version | str | Variant of CLIP model. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
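As an illustration of a property marked ✅, the sketch below replaces the literal values of classes and version with selectors. The `$inputs.` selector form is taken from the example on this page; the input names `classes` and `clip_version`, the step name, and the `is_selector` helper are hypothetical.

```python
import json

# Hypothetical step definition: classes and version are not hard-coded,
# they are selectors that would be resolved when the workflow runs.
step = {
    "name": "clip",  # hypothetical step name; per the table, name takes no selector
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": "$inputs.classes",       # assumed workflow input name
    "version": "$inputs.clip_version",  # assumed workflow input name
}


def is_selector(value):
    """Treat any string starting with '$' as a runtime reference (sketch only)."""
    return isinstance(value, str) and value.startswith("$")


# Fields of this step whose values are supplied at runtime.
dynamic_fields = sorted(key for key, value in step.items() if is_selector(value))
print(json.dumps(step, indent=2))
print(dynamic_fields)
```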
Available Connections
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v2.
- inputs:
Image Slicer,Polygon Zone Visualization,VLM As Classifier,Contrast Enhancement,Google Gemma API,MoonshotAI Kimi,Stability AI Image Generation,Image Threshold,Line Counter Visualization,Trace Visualization,Image Stack,Stitch OCR Detections,Camera Calibration,QR Code Generator,Anthropic Claude,Icon Visualization,SIFT Comparison,Morphological Transformation,S3 Sink,Color Visualization,LMM For Classification,Perspective Correction,Microsoft SQL Server Sink,Corner Visualization,Clip Comparison,Roboflow Custom Metadata,Google Vision OCR,Twilio SMS Notification,Halo Visualization,Image Blur,Dynamic Zone,Morphological Transformation,Qwen-VL,Camera Focus,Size Measurement,Email Notification,Halo Visualization,Roboflow Vision Events,Stability AI Inpainting,Classification Label Visualization,Google Gemma,Stitch OCR Detections,Event Writer,Grid Visualization,Qwen3.5-VL,Background Color Visualization,Mask Visualization,Llama 3.2 Vision,Ellipse Visualization,Email Notification,Reference Path Visualization,Image Slicer,Label Visualization,Twilio SMS/MMS Notification,Text Display,OPC UA Writer Sink,Dot Visualization,Polygon Visualization,Crop Visualization,Dynamic Crop,Absolute Static Crop,Circle Visualization,Image Preprocessing,Llama 3.2 Vision,Model Monitoring Inference Aggregator,Relative Static Crop,Camera Focus,OpenRouter,OpenAI,PLC ModbusTCP,Florence-2 Model,MoonshotAI Kimi,OpenAI,Heatmap Visualization,Motion Detection,Single-Label Classification Model,OpenAI-Compatible LLM,OCR Model,CogVLM,Blur Visualization,Dimension Collapse,Depth Estimation,Instance Segmentation Model,Stability AI Outpainting,Anthropic Claude,Google Gemini,Qwen 3.6 API,Clip Comparison,Google Gemini,PLC EthernetIP,Background Subtraction,Keypoint Visualization,Buffer,CSV Formatter,Webhook Sink,Bounding Box Visualization,Multi-Label Classification Model,LMM,OpenAI,Stitch Images,Florence-2 Model,Image Convert Grayscale,Current Time,Detections List Roll-Up,Contrast Equalization,OpenAI,VLM As Detector,Google 
Gemini,Roboflow Visual Search,Triangle Visualization,Slack Notification,EasyOCR,Roboflow Dataset Upload,Pixelate Visualization,Roboflow Dataset Upload,PLC Writer,SIFT,Qwen 3.5 API,Anthropic Claude,Object Detection Model,Local File Sink,MQTT Writer,Image Contours,Polygon Visualization,Keypoint Detection Model,GLM-OCR,Model Comparison Visualization,Roboflow Asset Library Attributes
- outputs:
VLM As Classifier,Line Counter,MoonshotAI Kimi,Stability AI Image Generation,Trace Visualization,Path Deviation,Anthropic Claude,Per-Class Confidence Filter,Icon Visualization,SIFT Comparison,Morphological Transformation,Color Visualization,LMM For Classification,Single-Label Classification Model,Perspective Correction,Clip Comparison,Corner Visualization,Roboflow Custom Metadata,Halo Visualization,Dynamic Zone,Keypoint Detection Model,Qwen-VL,Email Notification,Halo Visualization,Object Detection Model,Google Gemma,Background Color Visualization,Email Notification,Ellipse Visualization,Twilio SMS/MMS Notification,Text Display,Polygon Visualization,Crop Visualization,Image Preprocessing,Template Matching,Model Monitoring Inference Aggregator,Relative Static Crop,OpenRouter,OpenAI,VLM As Detector,Florence-2 Model,OpenAI,Motion Detection,Heatmap Visualization,Perception Encoder Embedding Model,Depth Estimation,Instance Segmentation Model,Stability AI Outpainting,Anthropic Claude,YOLO-World Model,Google Gemini,Clip Comparison,Google Gemini,PLC EthernetIP,Keypoint Visualization,Buffer,Webhook Sink,Byte Tracker,Stitch Images,Florence-2 Model,Current Time,Detections List Roll-Up,Contrast Equalization,OpenAI,Moondream2,Line Counter,VLM As Detector,Google Gemini,Triangle Visualization,Slack Notification,Time in Zone,CLIP Embedding Model,Detections Stabilizer,Multi-Label Classification Model,Local File Sink,Keypoint Detection Model,VLM As Classifier,Pixel Color Count,GLM-OCR,Roboflow Asset Library Attributes,Polygon Zone Visualization,Image Slicer,Time in Zone,Google Gemma API,Stitch OCR Detections,Line Counter Visualization,Semantic Segmentation Model,Distance Measurement,Image Threshold,Multi-Label Classification Model,QR Code Generator,ByteTrack Tracker,S3 Sink,Microsoft SQL Server Sink,Twilio SMS Notification,Google Vision OCR,Image Blur,Morphological Transformation,Size Measurement,Roboflow Vision Events,PTZ Tracking (ONVIF),Stability AI Inpainting,Classification Label 
Visualization,Stitch OCR Detections,Event Writer,Grid Visualization,Qwen3.5-VL,Mask Visualization,Llama 3.2 Vision,Byte Tracker,Reference Path Visualization,Image Slicer,Label Visualization,Identify Outliers,Byte Tracker,OPC UA Writer Sink,Dot Visualization,Cache Set,Identify Changes,Path Deviation,Dynamic Crop,Circle Visualization,Llama 3.2 Vision,Detections Stitch,SAM3 Video Tracker,BoT-SORT Tracker,Segment Anything 2 Model,MoonshotAI Kimi,OpenAI-Compatible LLM,Single-Label Classification Model,Overlap Analysis,CogVLM,Object Detection Model,Qwen 3.6 API,Detections Consensus,Bounding Box Visualization,Multi-Label Classification Model,LMM,SAM 3,OpenAI,PLC Reader,Instance Segmentation Model,Roboflow Visual Search,Roboflow Dataset Upload,SAM 3,Cache Get,Instance Segmentation Model,Detections Classes Replacement,Keypoint Detection Model,Instance Segmentation Model,Roboflow Dataset Upload,SORT Tracker,Track Class Lock,Qwen 3.5 API,Object Detection Model,Anthropic Claude,Time in Zone,MQTT Writer,Polygon Visualization,OC-SORT Tracker,SAM 3,Model Comparison Visualization,Single-Label Classification Model,Seg Preview
Input and Output Bindings
The available connections depend on the block's binding kinds. The binding kinds of Clip Comparison in version v2 are listed below.
Bindings
- input
    - images (image): The image to infer on.
    - classes (list_of_values): List of classes to calculate similarity against each input image.
    - version (string): Variant of CLIP model.
- output
    - similarities (list_of_values): List of values of any type.
    - max_similarity (float_zero_to_one): float value in range [0.0, 1.0].
    - most_similar_class (string): String value.
    - min_similarity (float_zero_to_one): float value in range [0.0, 1.0].
    - least_similar_class (string): String value.
    - classification_predictions (classification_prediction): Predictions from classifier.
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
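The sketch below shows one plausible way the summary outputs relate to the similarities list. It is an assumption, not the block's confirmed implementation: it supposes that similarities holds one score per entry of classes, in the same order, and that the max/min fields are simply the extremes of that list. The function name `summarize` and the scores are invented for illustration.

```python
def summarize(classes, similarities):
    """Derive the max/min summary fields from per-class similarity scores."""
    if not classes or len(classes) != len(similarities):
        raise ValueError("classes and similarities must be non-empty and equal in length")
    indices = range(len(classes))
    best = max(indices, key=lambda i: similarities[i])
    worst = min(indices, key=lambda i: similarities[i])
    return {
        "similarities": list(similarities),
        "max_similarity": similarities[best],
        "most_similar_class": classes[best],
        "min_similarity": similarities[worst],
        "least_similar_class": classes[worst],
    }


# Made-up scores for three classes.
result = summarize(["car", "truck", "bicycle"], [0.31, 0.24, 0.12])
print(result["most_similar_class"], result["max_similarity"])
```

A downstream step that only needs the winning label could reference most_similar_class directly instead of post-processing the full list.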
Example JSON definition of step Clip Comparison in version v2
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/clip_comparison@v2",
  "images": "$inputs.image",
  "classes": [
    "a",
    "b",
    "c"
  ],
  "version": "ViT-B-16"
}
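To show where such a step sits, here is a minimal sketch of a complete workflow definition written as a Python dictionary. Only the step itself comes from this page. The surrounding `version`, `inputs`, and `outputs` keys, the `WorkflowImage` and `JsonField` types, and the `$steps.<name>.<output>` selector form are assumptions about the general workflow definition format; the step name, class list, and output names are hypothetical.

```python
import json

workflow_definition = {
    "version": "1.0",  # assumed definition-format version
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},  # assumed input declaration
    ],
    "steps": [
        {
            "name": "vehicle_clip",  # hypothetical step name
            "type": "roboflow_core/clip_comparison@v2",
            "images": "$inputs.image",
            "classes": ["car", "truck", "bicycle"],
            "version": "ViT-B-16",
        }
    ],
    "outputs": [
        {
            "type": "JsonField",  # assumed output declaration
            "name": "top_class",
            "selector": "$steps.vehicle_clip.most_similar_class",
        },
        {
            "type": "JsonField",
            "name": "all_scores",
            "selector": "$steps.vehicle_clip.similarities",
        },
    ],
}


def referenced_steps(definition):
    """Collect the step names that the output selectors point at."""
    names = set()
    for output in definition["outputs"]:
        prefix, step_name, _field = output["selector"].split(".", 2)
        if prefix == "$steps":
            names.add(step_name)
    return names


defined_steps = {step["name"] for step in workflow_definition["steps"]}
print(json.dumps(workflow_definition, indent=2))
```

The `referenced_steps` check is only a local sanity test that every output selector names a defined step; it does not validate the definition against the actual workflow engine.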
v1
Class: ClipComparisonBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v1.ClipComparisonBlockV1
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to determine whether an image contains NSFW material.
Type identifier
Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/clip_comparison@v1
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Unique name of step in workflows. | ❌ |
| texts | List[str] | List of texts to calculate similarity against each input image. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v1.
- inputs:
Image Slicer,Polygon Zone Visualization,Contrast Enhancement,Google Gemma API,MoonshotAI Kimi,Stability AI Image Generation,Image Threshold,Line Counter Visualization,Trace Visualization,Image Stack,Camera Calibration,QR Code Generator,Anthropic Claude,Icon Visualization,SIFT Comparison,Morphological Transformation,Color Visualization,Perspective Correction,Corner Visualization,Clip Comparison,Halo Visualization,Image Blur,Dynamic Zone,Morphological Transformation,Qwen-VL,Camera Focus,Size Measurement,Halo Visualization,Stability AI Inpainting,Classification Label Visualization,Google Gemma,Grid Visualization,Background Color Visualization,Mask Visualization,Llama 3.2 Vision,Ellipse Visualization,Reference Path Visualization,Image Slicer,Label Visualization,Text Display,Dot Visualization,Polygon Visualization,Crop Visualization,Dynamic Crop,Absolute Static Crop,Circle Visualization,Image Preprocessing,Llama 3.2 Vision,Relative Static Crop,Camera Focus,OpenRouter,OpenAI,PLC ModbusTCP,Florence-2 Model,MoonshotAI Kimi,OpenAI,Heatmap Visualization,Motion Detection,Blur Visualization,Dimension Collapse,Depth Estimation,Stability AI Outpainting,Anthropic Claude,Google Gemini,Qwen 3.6 API,Clip Comparison,Google Gemini,PLC EthernetIP,Background Subtraction,Keypoint Visualization,Buffer,Bounding Box Visualization,Stitch Images,Florence-2 Model,Image Convert Grayscale,Detections List Roll-Up,Contrast Equalization,OpenAI,Google Gemini,Roboflow Visual Search,Triangle Visualization,Pixelate Visualization,SIFT,Qwen 3.5 API,Anthropic Claude,Image Contours,Polygon Visualization,Model Comparison Visualization
- outputs:
Polygon Zone Visualization,VLM As Classifier,Line Counter,MoonshotAI Kimi,Time in Zone,Google Gemma API,Seg Preview,Trace Visualization,Path Deviation,Line Counter Visualization,Anthropic Claude,Color Visualization,LMM For Classification,Perspective Correction,Clip Comparison,Corner Visualization,Halo Visualization,Keypoint Detection Model,Qwen-VL,Email Notification,Size Measurement,Halo Visualization,Object Detection Model,Classification Label Visualization,Google Gemma,Grid Visualization,Mask Visualization,Llama 3.2 Vision,Email Notification,Ellipse Visualization,Reference Path Visualization,Twilio SMS/MMS Notification,Label Visualization,Dot Visualization,Polygon Visualization,Cache Set,Crop Visualization,Path Deviation,Circle Visualization,Llama 3.2 Vision,SAM3 Video Tracker,OpenRouter,OpenAI,VLM As Detector,Florence-2 Model,OpenAI,Motion Detection,MoonshotAI Kimi,Object Detection Model,Instance Segmentation Model,Anthropic Claude,YOLO-World Model,Google Gemini,Qwen 3.6 API,Clip Comparison,Google Gemini,PLC EthernetIP,Detections Consensus,Keypoint Visualization,Buffer,Webhook Sink,Bounding Box Visualization,SAM 3,Florence-2 Model,PLC Reader,Instance Segmentation Model,Detections List Roll-Up,OpenAI,Line Counter,VLM As Detector,Google Gemini,Triangle Visualization,Roboflow Dataset Upload,SAM 3,Time in Zone,Instance Segmentation Model,Detections Classes Replacement,Keypoint Detection Model,Instance Segmentation Model,Roboflow Dataset Upload,Qwen 3.5 API,Object Detection Model,Anthropic Claude,Time in Zone,Keypoint Detection Model,VLM As Classifier,Polygon Visualization,SAM 3,Roboflow Asset Library Attributes
Input and Output Bindings
The available connections depend on the block's binding kinds. The binding kinds of Clip Comparison in version v1 are listed below.
Bindings
- input
    - images (image): The image to infer on.
    - texts (list_of_values): List of texts to calculate similarity against each input image.
- output
    - similarity (list_of_values): List of values of any type.
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - prediction_type (prediction_type): String value with type of prediction.
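Because v1 exposes only the raw similarity list, any ranking has to happen after the step. The sketch below pairs each text with its score and sorts the pairs, assuming the similarity values come back in the same order as texts (an assumption this page does not confirm). The function name `rank_texts` and the scores are made up.

```python
def rank_texts(texts, similarity):
    """Pair each text with its score and sort from most to least similar."""
    if len(texts) != len(similarity):
        raise ValueError("texts and similarity must have the same length")
    return sorted(zip(texts, similarity), key=lambda pair: pair[1], reverse=True)


# Made-up scores for three prompts.
ranking = rank_texts(["a cat", "a dog", "a bird"], [0.18, 0.27, 0.09])
top_text, top_score = ranking[0]
print(top_text, top_score)
```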
Example JSON definition of step Clip Comparison in version v1
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/clip_comparison@v1",
  "images": "$inputs.image",
  "texts": [
    "a",
    "b",
    "c"
  ]
}
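Comparing the two example definitions, a v1 step differs from a v2 step in its type identifier, in the name of the prompt list (texts versus classes), and in the extra version property. A hedged sketch of a converter is shown below. The function `migrate_v1_step_to_v2` is hypothetical, and passing "ViT-B-16" as the CLIP variant simply mirrors the v2 example rather than any documented default.

```python
V1_TYPE = "roboflow_core/clip_comparison@v1"
V2_TYPE = "roboflow_core/clip_comparison@v2"


def migrate_v1_step_to_v2(step_v1, version="ViT-B-16"):
    """Return a new v2-style step definition built from a v1 one."""
    if step_v1.get("type") != V1_TYPE:
        raise ValueError("expected a step of type " + V1_TYPE)
    # Copy everything except the renamed property.
    step_v2 = {key: value for key, value in step_v1.items() if key != "texts"}
    step_v2["type"] = V2_TYPE
    step_v2["classes"] = step_v1["texts"]  # texts (v1) becomes classes (v2)
    step_v2["version"] = version           # CLIP variant, new in v2
    return step_v2


step_v1 = {
    "name": "clip",  # hypothetical step name
    "type": V1_TYPE,
    "images": "$inputs.image",
    "texts": ["a", "b", "c"],
}
step_v2 = migrate_v1_step_to_v2(step_v1)
print(step_v2)
```

Output names also differ between versions (similarity in v1, similarities in v2), so any selectors that read this step's outputs would need updating by hand.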