Clip Comparison¶
v2¶
Class: ClipComparisonBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v2.ClipComparisonBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to determine whether an image contains NSFW material.
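To make "similarity" concrete: CLIP maps the image and each text prompt to embedding vectors, and a score compares the two. The sketch below assumes the score is a cosine similarity and uses made-up three-dimensional vectors in place of real CLIP embeddings. The function name and the numbers are illustrative only and are not the block's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings; real CLIP embeddings have hundreds of dimensions.
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = {
    "car": [1.0, 0.0, 0.0],
    "truck": [0.6, 0.8, 0.0],
    "bicycle": [0.0, 0.0, 1.0],
}

# One score per text label, as the block description states.
similarities = {
    label: cosine_similarity(image_embedding, embedding)
    for label, embedding in text_embeddings.items()
}
best_label = max(similarities, key=similarities.get)
```

With these toy vectors the image embedding points almost exactly along the "car" direction, so that label scores highest.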
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/clip_comparison@v2` to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique name of step in workflows. | ❌ |
| `classes` | `List[str]` | List of classes to calculate similarity against each input image. | ✅ |
| `version` | `str` | Variant of CLIP model. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
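As a sketch of what the Refs column means in practice, the snippet below builds a v2 step whose `classes` and `version` are bound to workflow inputs instead of literal values. The `$inputs.<name>` selector form is taken from the example definition on this page. The `$steps.` prefix, the input names `classes` and `clip_version`, the step name, and the helper `is_selector` are assumptions made for illustration.

```python
def is_selector(value):
    """Return True if value looks like a workflow selector string."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


step = {
    "name": "clip",  # Refs is marked as unsupported: must be a literal string
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": "$inputs.classes",       # hypothetical input, resolved at runtime
    "version": "$inputs.clip_version",  # hypothetical input, resolved at runtime
}

# Fields whose values are supplied dynamically rather than hard-coded.
dynamic_fields = sorted(key for key, value in step.items() if is_selector(value))
```

Binding `classes` to an input lets the same workflow be reused with a different label set on each run.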
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v2.
- inputs:
Halo Visualization,Stitch OCR Detections,GLM-OCR,Image Threshold,Stitch Images,Morphological Transformation,Classification Label Visualization,Twilio SMS/MMS Notification,Crop Visualization,Icon Visualization,Stability AI Outpainting,Blur Visualization,VLM As Classifier,Reference Path Visualization,MoonshotAI Kimi,OpenAI,Google Gemini,Anthropic Claude,Webhook Sink,Camera Focus,QR Code Generator,Size Measurement,Model Comparison Visualization,Florence-2 Model,MQTT Writer,Trace Visualization,Ellipse Visualization,Anthropic Claude,Dot Visualization,Perspective Correction,Label Visualization,Image Convert Grayscale,Florence-2 Model,Text Display,Qwen-VL,Llama 3.2 Vision,Roboflow Dataset Upload,PLC ModbusTCP,Image Blur,Keypoint Detection Model,Absolute Static Crop,SIFT,CSV Formatter,LMM,Google Gemini,Dimension Collapse,EasyOCR,Qwen 3.5 API,Qwen 3.6 API,Local File Sink,Triangle Visualization,Camera Focus,Contrast Equalization,Polygon Visualization,OpenAI,Heatmap Visualization,Clip Comparison,Google Gemma API,Detections List Roll-Up,Contrast Enhancement,Google Gemini,PLC EthernetIP,Halo Visualization,Color Visualization,Morphological Transformation,MoonshotAI Kimi,Stitch OCR Detections,LMM For Classification,Event Writer,VLM As Detector,Llama 3.2 Vision,Buffer,Polygon Visualization,Image Stack,Email Notification,Mask Visualization,Anthropic Claude,Stability AI Inpainting,Roboflow Asset Library Attributes,Microsoft SQL Server Sink,Keypoint Visualization,OpenAI,Background Subtraction,Multi-Label Classification Model,Roboflow Vision Events,Twilio SMS Notification,Email Notification,Image Slicer,Image Contours,Line Counter Visualization,CogVLM,Object Detection Model,Image Preprocessing,OPC UA Writer Sink,Dynamic Crop,Depth Estimation,Bounding Box Visualization,Motion Detection,Qwen3.5-VL,Current Time,Clip Comparison,Corner Visualization,Polygon Zone Visualization,Camera Calibration,Roboflow Dataset Upload,Grid Visualization,Stability AI Image Generation,Dynamic Zone,OpenAI,S3 
Sink,Circle Visualization,Image Slicer,OCR Model,Single-Label Classification Model,Relative Static Crop,Roboflow Custom Metadata,Instance Segmentation Model,Model Monitoring Inference Aggregator,OpenAI-Compatible LLM,Slack Notification,OpenRouter,SIFT Comparison,Pixelate Visualization,Google Vision OCR,Background Color Visualization,Google Gemma
- outputs:
Overlap Analysis,Template Matching,Morphological Transformation,Classification Label Visualization,Crop Visualization,Stability AI Outpainting,Reference Path Visualization,OpenAI,YOLO-World Model,Detections Classes Replacement,Anthropic Claude,Track Class Lock,Size Measurement,Instance Segmentation Model,Model Comparison Visualization,Florence-2 Model,Trace Visualization,Label Visualization,Florence-2 Model,Qwen-VL,Llama 3.2 Vision,Text Display,Keypoint Detection Model,Image Blur,Keypoint Detection Model,LMM,OC-SORT Tracker,Qwen 3.5 API,Qwen 3.6 API,Line Counter,SORT Tracker,VLM As Detector,Multi-Label Classification Model,Clip Comparison,Detections Stitch,Google Gemma API,Halo Visualization,MoonshotAI Kimi,Color Visualization,Stitch OCR Detections,Morphological Transformation,Event Writer,Buffer,Stability AI Inpainting,Cache Set,Time in Zone,Roboflow Asset Library Attributes,Microsoft SQL Server Sink,OpenAI,Roboflow Vision Events,Identify Outliers,CogVLM,Detections Consensus,Object Detection Model,OPC UA Writer Sink,Semantic Segmentation Model,Path Deviation,Dynamic Crop,Byte Tracker,Bounding Box Visualization,Qwen3.5-VL,Clip Comparison,SAM 3,Cache Get,OpenAI,Time in Zone,Single-Label Classification Model,Slack Notification,OpenRouter,SIFT Comparison,Google Vision OCR,SAM3 Video Tracker,Dynamic Zone,Google Gemma,Halo Visualization,CLIP Embedding Model,Stitch OCR Detections,GLM-OCR,Image Threshold,Stitch Images,Twilio SMS/MMS Notification,VLM As Classifier,Icon Visualization,MoonshotAI Kimi,ByteTrack Tracker,Google Gemini,Single-Label Classification Model,Byte Tracker,Single-Label Classification Model,Webhook Sink,Instance Segmentation Model,QR Code Generator,Path Deviation,MQTT Writer,Ellipse Visualization,Object Detection Model,Anthropic Claude,Keypoint Detection Model,BoT-SORT Tracker,Dot Visualization,Perspective Correction,Instance Segmentation Model,Seg Preview,Per-Class Confidence Filter,Roboflow Dataset Upload,Detections Stabilizer,Google Gemini,Local File 
Sink,SAM 3,Triangle Visualization,Time in Zone,Contrast Equalization,Polygon Visualization,OpenAI,Heatmap Visualization,Perception Encoder Embedding Model,Detections List Roll-Up,Google Gemini,PLC EthernetIP,LMM For Classification,VLM As Detector,Llama 3.2 Vision,Multi-Label Classification Model,Polygon Visualization,Email Notification,Identify Changes,Mask Visualization,Anthropic Claude,Distance Measurement,PTZ Tracking (ONVIF),Keypoint Visualization,Multi-Label Classification Model,Twilio SMS Notification,Email Notification,Image Slicer,Line Counter Visualization,Byte Tracker,SAM 3,Image Preprocessing,VLM As Classifier,Depth Estimation,Pixel Color Count,Motion Detection,Current Time,Roboflow Dataset Upload,Corner Visualization,Polygon Zone Visualization,Moondream2,Grid Visualization,Stability AI Image Generation,Segment Anything 2 Model,S3 Sink,Circle Visualization,Image Slicer,Roboflow Custom Metadata,Relative Static Crop,Instance Segmentation Model,Model Monitoring Inference Aggregator,OpenAI-Compatible LLM,Object Detection Model,Background Color Visualization,Line Counter
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
Clip Comparison in version v2 has.
Bindings
- input
    - `images` (image): The image to infer on.
    - `classes` (list_of_values): List of classes to calculate similarity against each input image.
    - `version` (string): Variant of CLIP model.
- output
    - `similarities` (list_of_values): List of values of any type.
    - `max_similarity` (float_zero_to_one): `float` value in range `[0.0, 1.0]`.
    - `most_similar_class` (string): String value.
    - `min_similarity` (float_zero_to_one): `float` value in range `[0.0, 1.0]`.
    - `least_similar_class` (string): String value.
    - `classification_predictions` (classification_prediction): Predictions from classifier.
    - `parent_id` (parent_id): Identifier of parent for step output.
    - `root_parent_id` (parent_id): Identifier of parent for step output.
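One way to consume these outputs is to accept the top class only when `max_similarity` clears a threshold. The sketch below works on a plain dictionary shaped like the outputs listed above. The sample values, the `0.2` threshold, and the function name `pick_label` are assumptions for illustration, and a sensible threshold depends on your data.

```python
def pick_label(step_output, threshold=0.2, fallback="unknown"):
    """Return most_similar_class if max_similarity >= threshold, else fallback."""
    if step_output["max_similarity"] >= threshold:
        return step_output["most_similar_class"]
    return fallback


# Hypothetical output for classes ["car", "truck", "bicycle"].
example_output = {
    "similarities": [0.27, 0.21, 0.12],
    "max_similarity": 0.27,
    "most_similar_class": "car",
    "min_similarity": 0.12,
    "least_similar_class": "bicycle",
}

label = pick_label(example_output)
strict_label = pick_label(example_output, threshold=0.5)
```

Here `label` is accepted under the default threshold, while the stricter call falls back to `"unknown"`.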
Example JSON definition of step Clip Comparison in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": [
        "a",
        "b",
        "c"
    ],
    "version": "ViT-B-16"
}
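For context, a step definition like this would normally sit inside a complete workflow definition. The following sketch assembles one in Python and serialises it to JSON. The surrounding structure (`version`, `inputs`, `steps`, `outputs`, the `WorkflowImage` and `JsonField` types, and `$steps.<name>.<output>` selectors) is an assumption about the Workflows definition format and should be checked against the Workflows documentation. The step name `clip` and the class labels are placeholders.

```python
import json

workflow_definition = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "name": "clip",
            "type": "roboflow_core/clip_comparison@v2",
            "images": "$inputs.image",
            "classes": ["car", "truck", "bicycle"],
            "version": "ViT-B-16",
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "top_class",
            "selector": "$steps.clip.most_similar_class",
        },
        {
            "type": "JsonField",
            "name": "top_score",
            "selector": "$steps.clip.max_similarity",
        },
    ],
}

# Serialise for storage or for sending to a workflow execution endpoint.
workflow_json = json.dumps(workflow_definition, indent=2)
```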
v1¶
Class: ClipComparisonBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v1.ClipComparisonBlockV1
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to determine whether an image contains NSFW material.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/clip_comparison@v1` to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique name of step in workflows. | ❌ |
| `texts` | `List[str]` | List of texts to calculate similarity against each input image. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v1.
- inputs:
Halo Visualization,Image Threshold,Stitch Images,Morphological Transformation,Classification Label Visualization,Crop Visualization,Icon Visualization,Stability AI Outpainting,Blur Visualization,Reference Path Visualization,MoonshotAI Kimi,OpenAI,Google Gemini,Anthropic Claude,Camera Focus,QR Code Generator,Size Measurement,Model Comparison Visualization,Florence-2 Model,Trace Visualization,Ellipse Visualization,Anthropic Claude,Dot Visualization,Perspective Correction,Label Visualization,Image Convert Grayscale,Florence-2 Model,Text Display,Qwen-VL,Llama 3.2 Vision,PLC ModbusTCP,Image Blur,Absolute Static Crop,SIFT,Google Gemini,Dimension Collapse,Qwen 3.5 API,Qwen 3.6 API,Triangle Visualization,Camera Focus,Contrast Equalization,Polygon Visualization,OpenAI,Heatmap Visualization,Clip Comparison,Google Gemma API,Detections List Roll-Up,Contrast Enhancement,Google Gemini,PLC EthernetIP,Halo Visualization,Color Visualization,Morphological Transformation,MoonshotAI Kimi,Llama 3.2 Vision,Buffer,Polygon Visualization,Image Stack,Mask Visualization,Anthropic Claude,Stability AI Inpainting,Keypoint Visualization,Background Subtraction,Image Slicer,Image Contours,Line Counter Visualization,Image Preprocessing,Dynamic Crop,Depth Estimation,Bounding Box Visualization,Motion Detection,Clip Comparison,Corner Visualization,Polygon Zone Visualization,Camera Calibration,Grid Visualization,Stability AI Image Generation,Dynamic Zone,OpenAI,Circle Visualization,Image Slicer,Relative Static Crop,OpenRouter,SIFT Comparison,Pixelate Visualization,Background Color Visualization,Google Gemma
- outputs:
Halo Visualization,Twilio SMS/MMS Notification,Classification Label Visualization,Crop Visualization,VLM As Classifier,Reference Path Visualization,MoonshotAI Kimi,OpenAI,YOLO-World Model,Detections Classes Replacement,Google Gemini,Anthropic Claude,Webhook Sink,Instance Segmentation Model,Size Measurement,Instance Segmentation Model,Path Deviation,Florence-2 Model,Trace Visualization,Ellipse Visualization,Object Detection Model,Keypoint Detection Model,Dot Visualization,Perspective Correction,Label Visualization,Instance Segmentation Model,Florence-2 Model,Seg Preview,Qwen-VL,Llama 3.2 Vision,Roboflow Dataset Upload,Keypoint Detection Model,Keypoint Detection Model,Google Gemini,Qwen 3.5 API,Qwen 3.6 API,SAM 3,Triangle Visualization,Time in Zone,Line Counter,Polygon Visualization,VLM As Detector,OpenAI,Clip Comparison,Google Gemma API,Detections List Roll-Up,Google Gemini,PLC EthernetIP,Halo Visualization,MoonshotAI Kimi,Color Visualization,LMM For Classification,VLM As Detector,Buffer,Llama 3.2 Vision,Polygon Visualization,Email Notification,Mask Visualization,Anthropic Claude,Cache Set,Time in Zone,Roboflow Asset Library Attributes,Keypoint Visualization,Email Notification,Line Counter Visualization,Detections Consensus,Object Detection Model,SAM 3,Path Deviation,VLM As Classifier,Bounding Box Visualization,Motion Detection,Clip Comparison,Roboflow Dataset Upload,Corner Visualization,Polygon Zone Visualization,Grid Visualization,SAM 3,OpenAI,Circle Visualization,Time in Zone,Instance Segmentation Model,Object Detection Model,OpenRouter,Anthropic Claude,Line Counter,SAM3 Video Tracker,Google Gemma
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
Clip Comparison in version v1 has.
Bindings
- input
    - `images` (image): The image to infer on.
    - `texts` (list_of_values): List of texts to calculate similarity against each input image.
- output
    - `similarity` (list_of_values): List of values of any type.
    - `parent_id` (parent_id): Identifier of parent for step output.
    - `root_parent_id` (parent_id): Identifier of parent for step output.
    - `prediction_type` (prediction_type): String value with type of prediction.
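Version v1 exposes only the raw `similarity` list, so choosing the best-matching text is left to the caller. A minimal sketch, assuming the scores come back in the same order as `texts` (this page does not state that explicitly). The sample scores and the function name `rank_texts` are made up.

```python
def rank_texts(texts, similarity):
    """Pair each text with its score and sort from most to least similar."""
    if len(texts) != len(similarity):
        raise ValueError("texts and similarity must have the same length")
    return sorted(zip(texts, similarity), key=lambda pair: pair[1], reverse=True)


texts = ["car", "truck", "bicycle"]
similarity = [0.21, 0.27, 0.12]  # hypothetical scores, one per text

ranking = rank_texts(texts, similarity)
best_text, best_score = ranking[0]
```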
Example JSON definition of step Clip Comparison in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v1",
    "images": "$inputs.image",
    "texts": [
        "a",
        "b",
        "c"
    ]
}
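Comparing the two example definitions, v2 renames `texts` to `classes` and adds a `version` property. A hypothetical helper for converting a v1 step definition to v2 might look like the sketch below. `migrate_v1_to_v2` is not part of any library, and the default `ViT-B-16` is simply the value used in the v2 example on this page.

```python
def migrate_v1_to_v2(step, clip_version="ViT-B-16"):
    """Return a v2 step definition built from a v1 one (input is not modified)."""
    if step.get("type") != "roboflow_core/clip_comparison@v1":
        raise ValueError("expected a clip_comparison@v1 step")
    # Copy everything except the fields that change between versions.
    migrated = {k: v for k, v in step.items() if k not in ("type", "texts")}
    migrated["type"] = "roboflow_core/clip_comparison@v2"
    migrated["classes"] = step["texts"]  # v1 "texts" becomes v2 "classes"
    migrated["version"] = clip_version   # new in v2: CLIP model variant
    return migrated


v1_step = {
    "name": "clip",
    "type": "roboflow_core/clip_comparison@v1",
    "images": "$inputs.image",
    "texts": ["a", "b", "c"],
}
v2_step = migrate_v1_to_v2(v1_step)
```

The outputs also differ between versions (`similarity` in v1 versus `similarities` plus the summary fields in v2), so any downstream selectors would need updating separately.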