Clip Comparison
v2
Class: ClipComparisonBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v2.ClipComparisonBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or if an image contains NSFW material.
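Conceptually, CLIP comparison embeds the image and each text prompt into a shared vector space and scores each prompt by cosine similarity against the image embedding. A minimal sketch of that scoring step, using made-up toy vectors in place of real CLIP encoder outputs:

```python
import numpy as np

def clip_style_similarities(image_emb: np.ndarray, text_embs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one image embedding and each text embedding."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return text_embs @ image_emb

# Toy 4-dimensional embeddings standing in for real CLIP vectors.
image_emb = np.array([0.9, 0.1, 0.0, 0.1])
text_embs = np.array([
    [0.8, 0.2, 0.1, 0.0],   # e.g. "car"
    [0.0, 0.9, 0.3, 0.1],   # e.g. "truck"
])

sims = clip_style_similarities(image_emb, text_embs)
best = int(sims.argmax())  # index of the most similar prompt -> 0 ("car")
```

The block performs this comparison for you; the sketch only illustrates why each class receives a score in `[-1.0, 1.0]` and how the most similar class falls out of an argmax.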
Type identifier
Use the following identifier in the step "type" field: roboflow_core/clip_comparison@v2 to add the block as a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Unique name of step in workflows. | ❌ |
| classes | List[str] | List of classes to calculate similarity against each input image. | ✅ |
| version | str | Variant of CLIP model. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
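For example, a property marked ✅ such as classes can be bound to a workflow input instead of a literal list (the input name workflow_classes below is illustrative):

```json
{
  "name": "clip",
  "type": "roboflow_core/clip_comparison@v2",
  "images": "$inputs.image",
  "classes": "$inputs.workflow_classes"
}
```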
Available Connections
Compatible Blocks
Check which blocks you can connect to Clip Comparison in version v2.
- inputs: Dynamic Crop, Motion Detection, OCR Model, Email Notification, Image Blur, Background Subtraction, SIFT Comparison, OpenAI, Google Gemini, Google Vision OCR, Image Preprocessing, Instance Segmentation Model, Local File Sink, Single-Label Classification Model, Bounding Box Visualization, Anthropic Claude, Model Monitoring Inference Aggregator, Multi-Label Classification Model, Keypoint Detection Model, Email Notification, Slack Notification, Camera Focus, Twilio SMS/MMS Notification, Dot Visualization, Florence-2 Model, Roboflow Dataset Upload, CSV Formatter, Camera Focus, Stitch OCR Detections, Depth Estimation, Polygon Visualization, Perspective Correction, OpenAI, Camera Calibration, Corner Visualization, Icon Visualization, Image Slicer, Detections List Roll-Up, Line Counter Visualization, Heatmap Visualization, Morphological Transformation, Stability AI Image Generation, Google Gemini, Keypoint Visualization, VLM As Detector, Halo Visualization, Background Color Visualization, Label Visualization, Polygon Visualization, Pixelate Visualization, LMM, CogVLM, Contrast Equalization, Triangle Visualization, Stability AI Outpainting, Dimension Collapse, Mask Visualization, VLM As Classifier, Color Visualization, Text Display, Relative Static Crop, Reference Path Visualization, Stitch OCR Detections, Llama 3.2 Vision, OpenAI, Clip Comparison, Image Threshold, Clip Comparison, Classification Label Visualization, Webhook Sink, Circle Visualization, Polygon Zone Visualization, Image Contours, Image Convert Grayscale, Grid Visualization, Florence-2 Model, Buffer, Roboflow Custom Metadata, Dynamic Zone, LMM For Classification, SIFT, Halo Visualization, Object Detection Model, Anthropic Claude, Google Gemini, Model Comparison Visualization, Blur Visualization, QR Code Generator, EasyOCR, Absolute Static Crop, Image Slicer, S3 Sink, Anthropic Claude, Stability AI Inpainting, Ellipse Visualization, Crop Visualization, Qwen3.5-VL, Trace Visualization, Twilio SMS Notification, Stitch Images, Size Measurement, OpenAI, Roboflow Dataset Upload
- outputs: Dynamic Crop, Image Blur, Google Vision OCR, Google Gemini, Image Preprocessing, Object Detection Model, Local File Sink, Single-Label Classification Model, Bounding Box Visualization, Multi-Label Classification Model, Model Monitoring Inference Aggregator, Keypoint Detection Model, Identify Outliers, Dot Visualization, Florence-2 Model, Roboflow Dataset Upload, Depth Estimation, Polygon Visualization, OpenAI, Line Counter, Detections List Roll-Up, Image Slicer, Line Counter Visualization, Heatmap Visualization, Google Gemini, Stability AI Image Generation, Morphological Transformation, Distance Measurement, Keypoint Visualization, Keypoint Detection Model, Background Color Visualization, Label Visualization, Polygon Visualization, LMM, CogVLM, Time in Zone, Single-Label Classification Model, Triangle Visualization, Stability AI Outpainting, Mask Visualization, Color Visualization, Text Display, Reference Path Visualization, Llama 3.2 Vision, OpenAI, Clip Comparison, Clip Comparison, Classification Label Visualization, Image Threshold, Polygon Zone Visualization, VLM As Classifier, Roboflow Custom Metadata, LMM For Classification, Dynamic Zone, Halo Visualization, Path Deviation, Anthropic Claude, SAM 3, Ellipse Visualization, Identify Changes, Path Deviation, Crop Visualization, Trace Visualization, Twilio SMS Notification, Size Measurement, Stitch Images, Detections Stabilizer, Time in Zone, Motion Detection, Email Notification, OpenAI, SIFT Comparison, Seg Preview, Time in Zone, Instance Segmentation Model, Anthropic Claude, Multi-Label Classification Model, Email Notification, Twilio SMS/MMS Notification, Detections Stitch, Slack Notification, VLM As Detector, Cache Set, SAM 3, Stitch OCR Detections, Perspective Correction, PTZ Tracking (ONVIF), Moondream2, Icon Visualization, Corner Visualization, Byte Tracker, VLM As Detector, Halo Visualization, Contrast Equalization, VLM As Classifier, Instance Segmentation Model, Detections Classes Replacement, Line Counter, Relative Static Crop, Stitch OCR Detections, Webhook Sink, Circle Visualization, Grid Visualization, Byte Tracker, Buffer, Florence-2 Model, SAM 3, Perception Encoder Embedding Model, Cache Get, YOLO-World Model, Object Detection Model, Template Matching, Detections Consensus, Byte Tracker, Anthropic Claude, Google Gemini, Model Comparison Visualization, QR Code Generator, Image Slicer, S3 Sink, CLIP Embedding Model, Stability AI Inpainting, Segment Anything 2 Model, OpenAI, Pixel Color Count, Roboflow Dataset Upload
Input and Output Bindings
The available connections depend on the block's binding kinds. The binding kinds for Clip Comparison in version v2 are listed below.
Bindings
- input
  - images (image): The image to infer on.
  - classes (list_of_values): List of classes to calculate similarity against each input image.
  - version (string): Variant of CLIP model.
- output
  - similarities (list_of_values): List of values of any type.
  - max_similarity (float_zero_to_one): float value in range [0.0, 1.0].
  - most_similar_class (string): String value.
  - min_similarity (float_zero_to_one): float value in range [0.0, 1.0].
  - least_similar_class (string): String value.
  - classification_predictions (classification_prediction): Predictions from classifier.
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
Example JSON definition of step Clip Comparison in version v2
{
"name": "<your_step_name_here>",
"type": "roboflow_core/clip_comparison@v2",
"images": "$inputs.image",
"classes": [
"a",
"b",
"c"
],
"version": "ViT-B-16"
}
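Assuming the similarities output is ordered the same way as the classes list, the derived scalar outputs (max_similarity, most_similar_class, min_similarity, least_similar_class) follow directly from it. A sketch with hypothetical similarity values:

```python
classes = ["a", "b", "c"]
similarities = [0.12, 0.81, 0.35]  # hypothetical block output, one score per class

# Derive the scalar outputs the v2 block exposes alongside the raw list.
max_similarity = max(similarities)
most_similar_class = classes[similarities.index(max_similarity)]
min_similarity = min(similarities)
least_similar_class = classes[similarities.index(min_similarity)]
# -> most_similar_class == "b", least_similar_class == "a"
```

The block computes these for you; the sketch only shows how the outputs relate to each other, which is useful when deciding which output to wire into a downstream step (e.g. most_similar_class into a label visualization).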
v1
Class: ClipComparisonBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v1.ClipComparisonBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or if an image contains NSFW material.
Type identifier
Use the following identifier in the step "type" field: roboflow_core/clip_comparison@v1 to add the block as a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Unique name of step in workflows. | ❌ |
| texts | List[str] | List of texts to calculate similarity against each input image. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections
Compatible Blocks
Check which blocks you can connect to Clip Comparison in version v1.
- inputs: Dynamic Crop, Motion Detection, Circle Visualization, Image Blur, Background Subtraction, Polygon Zone Visualization, SIFT Comparison, OpenAI, Image Contours, Image Preprocessing, Image Convert Grayscale, Grid Visualization, Google Gemini, Clip Comparison, Bounding Box Visualization, Anthropic Claude, Florence-2 Model, Buffer, Camera Focus, Dot Visualization, Florence-2 Model, Dynamic Zone, Camera Focus, SIFT, Depth Estimation, Polygon Visualization, Perspective Correction, Halo Visualization, OpenAI, Camera Calibration, Corner Visualization, Icon Visualization, Image Slicer, Detections List Roll-Up, Line Counter Visualization, Heatmap Visualization, Morphological Transformation, Stability AI Image Generation, Anthropic Claude, Google Gemini, Model Comparison Visualization, Google Gemini, Blur Visualization, Keypoint Visualization, QR Code Generator, Halo Visualization, Background Color Visualization, Absolute Static Crop, Label Visualization, Image Slicer, Polygon Visualization, Pixelate Visualization, Contrast Equalization, Anthropic Claude, Triangle Visualization, Stability AI Outpainting, Dimension Collapse, Stability AI Inpainting, Ellipse Visualization, Mask Visualization, Crop Visualization, Trace Visualization, Color Visualization, Stitch Images, Text Display, Relative Static Crop, Size Measurement, Reference Path Visualization, OpenAI, Llama 3.2 Vision, Image Threshold, Clip Comparison, Classification Label Visualization
- outputs: Time in Zone, Motion Detection, Email Notification, OpenAI, Seg Preview, Time in Zone, Google Gemini, Instance Segmentation Model, Object Detection Model, Bounding Box Visualization, Anthropic Claude, Keypoint Detection Model, Email Notification, Twilio SMS/MMS Notification, VLM As Detector, Dot Visualization, Florence-2 Model, Roboflow Dataset Upload, Cache Set, SAM 3, Polygon Visualization, Perspective Correction, OpenAI, Line Counter, Detections List Roll-Up, Corner Visualization, Line Counter Visualization, Google Gemini, VLM As Detector, Keypoint Visualization, Halo Visualization, Keypoint Detection Model, Label Visualization, Polygon Visualization, Time in Zone, Triangle Visualization, Mask Visualization, VLM As Classifier, Color Visualization, Instance Segmentation Model, Detections Classes Replacement, Line Counter, Reference Path Visualization, Llama 3.2 Vision, Clip Comparison, Clip Comparison, Classification Label Visualization, Webhook Sink, Circle Visualization, Polygon Zone Visualization, Grid Visualization, VLM As Classifier, Buffer, Florence-2 Model, SAM 3, LMM For Classification, YOLO-World Model, Halo Visualization, Object Detection Model, Detections Consensus, Anthropic Claude, Google Gemini, Path Deviation, Anthropic Claude, SAM 3, Ellipse Visualization, Path Deviation, Crop Visualization, Trace Visualization, Size Measurement, OpenAI, Roboflow Dataset Upload
Input and Output Bindings
The available connections depend on the block's binding kinds. The binding kinds for Clip Comparison in version v1 are listed below.
Bindings
- input
  - images (image): The image to infer on.
  - texts (list_of_values): List of texts to calculate similarity against each input image.
- output
  - similarity (list_of_values): List of values of any type.
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - prediction_type (prediction_type): String value with type of prediction.
Example JSON definition of step Clip Comparison in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/clip_comparison@v1",
"images": "$inputs.image",
"texts": [
"a",
"b",
"c"
]
}
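Unlike v2, the v1 block's only score output is the similarity list; it does not expose most_similar_class or max_similarity. Assuming the scores are ordered the same way as the texts list, pairing the two recovers a ranking. A sketch with hypothetical values:

```python
texts = ["a", "b", "c"]
similarity = [0.05, 0.40, 0.72]  # hypothetical v1 block output, one score per text

# Pair each text with its score and sort from most to least similar.
ranked = sorted(zip(texts, similarity), key=lambda pair: pair[1], reverse=True)
best_text = ranked[0][0]  # -> "c"
```

This kind of post-processing is one reason to prefer v2, which computes the best and worst matches for you.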