Clip Comparison¶
v2¶
Class: ClipComparisonBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v2.ClipComparisonBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or if an image contains NSFW material.
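For intuition, CLIP scores each prompt by embedding both the image and the text into a shared vector space and comparing the embeddings; a common formulation (a sketch of how CLIP similarity is typically computed, not a statement about this block's exact post-processing) is the cosine similarity between the image embedding and each text embedding:

$$
\operatorname{sim}(I, t_k) = \frac{f_{\text{img}}(I) \cdot f_{\text{txt}}(t_k)}{\lVert f_{\text{img}}(I) \rVert \, \lVert f_{\text{txt}}(t_k) \rVert}
$$

The prompt with the highest score is the most similar class; the one with the lowest score is the least similar.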
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/clip_comparison@v2 to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Unique name of step in workflows. | ❌ |
| classes | List[str] | List of classes to calculate similarity against each input image. | ✅ |
| version | str | Variant of CLIP model. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v2.
- inputs:
  Icon Visualization, Image Preprocessing, LMM, Blur Visualization, Twilio SMS Notification, Morphological Transformation, Roboflow Custom Metadata, Stitch Images, Color Visualization, Contrast Equalization, Llama 3.2 Vision, Circle Visualization, Stability AI Image Generation, Image Blur, Reference Path Visualization, SIFT, Detections List Roll-Up, Buffer, OpenAI, Email Notification, Halo Visualization, EasyOCR, Google Gemini, Trace Visualization, Dimension Collapse, Roboflow Dataset Upload, Twilio SMS/MMS Notification, Single-Label Classification Model, Classification Label Visualization, Clip Comparison, Image Convert Grayscale, CogVLM, Google Vision OCR, Background Color Visualization, Stitch OCR Detections, Multi-Label Classification Model, Camera Calibration, VLM as Detector, LMM For Classification, Triangle Visualization, Text Display, Dynamic Zone, Ellipse Visualization, Slack Notification, Mask Visualization, OpenAI, Local File Sink, Anthropic Claude, Polygon Zone Visualization, Google Gemini, Polygon Visualization, Absolute Static Crop, Model Comparison Visualization, Label Visualization, Webhook Sink, Line Counter Visualization, Perspective Correction, Florence-2 Model, Image Slicer, QR Code Generator, Instance Segmentation Model, Stability AI Outpainting, Anthropic Claude, Object Detection Model, VLM as Classifier, Grid Visualization, Relative Static Crop, CSV Formatter, Image Slicer, Size Measurement, Image Contours, Stability AI Inpainting, Camera Focus, Google Gemini, Motion Detection, Florence-2 Model, SIFT Comparison, Keypoint Detection Model, Dot Visualization, Camera Focus, Crop Visualization, Clip Comparison, Bounding Box Visualization, OCR Model, Background Subtraction, OpenAI, Roboflow Dataset Upload, Dynamic Crop, Keypoint Visualization, Email Notification, Model Monitoring Inference Aggregator, Image Threshold, Anthropic Claude, Corner Visualization, Depth Estimation, OpenAI, Pixelate Visualization
- outputs:
  Icon Visualization, LMM, Image Preprocessing, Detections Classes Replacement, Color Visualization, Contrast Equalization, Cache Set, Llama 3.2 Vision, Reference Path Visualization, OpenAI, Buffer, SAM 3, Halo Visualization, Trace Visualization, Roboflow Dataset Upload, Twilio SMS/MMS Notification, VLM as Detector, Single-Label Classification Model, Path Deviation, Background Color Visualization, Multi-Label Classification Model, Single-Label Classification Model, VLM as Detector, Triangle Visualization, Dynamic Zone, Ellipse Visualization, Seg Preview, Slack Notification, Google Gemini, Time in Zone, Webhook Sink, Line Counter Visualization, Florence-2 Model, Detections Consensus, QR Code Generator, Anthropic Claude, Object Detection Model, Keypoint Detection Model, Image Slicer, Byte Tracker, Pixel Color Count, VLM as Classifier, Stability AI Inpainting, Line Counter, Google Gemini, Motion Detection, SIFT Comparison, Line Counter, Keypoint Detection Model, Dot Visualization, Byte Tracker, YOLO-World Model, Bounding Box Visualization, OpenAI, SAM 3, Dynamic Crop, Distance Measurement, Keypoint Visualization, Email Notification, Image Threshold, Anthropic Claude, Corner Visualization, Twilio SMS Notification, Moondream2, Morphological Transformation, Roboflow Custom Metadata, Stitch Images, Circle Visualization, Stability AI Image Generation, Template Matching, Image Blur, Detections List Roll-Up, Email Notification, Perception Encoder Embedding Model, Google Gemini, Instance Segmentation Model, Classification Label Visualization, Clip Comparison, Google Vision OCR, CogVLM, Stitch OCR Detections, Segment Anything 2 Model, LMM For Classification, Text Display, CLIP Embedding Model, SAM 3, Multi-Label Classification Model, OpenAI, Mask Visualization, Local File Sink, Anthropic Claude, Polygon Visualization, Polygon Zone Visualization, Model Comparison Visualization, Label Visualization, Perspective Correction, Identify Changes, Image Slicer, Instance Segmentation Model, Stability AI Outpainting, VLM as Classifier, Grid Visualization, Relative Static Crop, Size Measurement, Path Deviation, Time in Zone, Florence-2 Model, Identify Outliers, Object Detection Model, Detections Stitch, Cache Get, Crop Visualization, Clip Comparison, Byte Tracker, Detections Stabilizer, Roboflow Dataset Upload, PTZ Tracking (ONVIF), Model Monitoring Inference Aggregator, Time in Zone, Depth Estimation, OpenAI
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds Clip Comparison in version v2 has.
Bindings
- input
  - images (image): The image to infer on.
  - classes (list_of_values): List of classes to calculate similarity against each input image.
  - version (string): Variant of CLIP model.
- output
  - similarities (list_of_values): List of values of any type.
  - max_similarity (float_zero_to_one): float value in range [0.0, 1.0].
  - most_similar_class (string): String value.
  - min_similarity (float_zero_to_one): float value in range [0.0, 1.0].
  - least_similar_class (string): String value.
  - classification_predictions (classification_prediction): Predictions from classifier.
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
Example JSON definition of step Clip Comparison in version v2:

```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": [
        "a",
        "b",
        "c"
    ],
    "version": "ViT-B-16"
}
```
v1¶
Class: ClipComparisonBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v1.ClipComparisonBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or if an image contains NSFW material.
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/clip_comparison@v1 to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Unique name of step in workflows. | ❌ |
| texts | List[str] | List of texts to calculate similarity against each input image. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v1.
- inputs:
  Icon Visualization, Line Counter Visualization, Perspective Correction, Image Preprocessing, Blur Visualization, Florence-2 Model, Morphological Transformation, Image Slicer, QR Code Generator, Pixelate Visualization, Stitch Images, Color Visualization, Contrast Equalization, Stability AI Outpainting, Anthropic Claude, Llama 3.2 Vision, Circle Visualization, Stability AI Image Generation, Grid Visualization, Relative Static Crop, Image Blur, Reference Path Visualization, SIFT, Image Slicer, Detections List Roll-Up, Size Measurement, Buffer, Image Contours, Stability AI Inpainting, Camera Focus, OpenAI, Halo Visualization, Google Gemini, Motion Detection, Google Gemini, Polygon Zone Visualization, Florence-2 Model, Trace Visualization, Google Gemini, SIFT Comparison, Dimension Collapse, Dot Visualization, Camera Focus, Image Convert Grayscale, Classification Label Visualization, Crop Visualization, Clip Comparison, Clip Comparison, Background Color Visualization, Bounding Box Visualization, Background Subtraction, Camera Calibration, Dynamic Crop, Triangle Visualization, Polygon Visualization, Text Display, Keypoint Visualization, Dynamic Zone, Ellipse Visualization, Image Threshold, Anthropic Claude, Corner Visualization, Mask Visualization, Depth Estimation, OpenAI, OpenAI, Absolute Static Crop, Anthropic Claude, Model Comparison Visualization, Label Visualization
- outputs:
  Color Visualization, Cache Set, Llama 3.2 Vision, Circle Visualization, Reference Path Visualization, Detections List Roll-Up, OpenAI, Buffer, Email Notification, SAM 3, Halo Visualization, Google Gemini, Trace Visualization, Roboflow Dataset Upload, Twilio SMS/MMS Notification, Instance Segmentation Model, VLM as Detector, Classification Label Visualization, Clip Comparison, Path Deviation, VLM as Detector, LMM For Classification, Triangle Visualization, SAM 3, Ellipse Visualization, Seg Preview, OpenAI, Mask Visualization, Anthropic Claude, Polygon Visualization, Google Gemini, Polygon Zone Visualization, Time in Zone, Label Visualization, Webhook Sink, Line Counter Visualization, Perspective Correction, Florence-2 Model, Detections Consensus, Instance Segmentation Model, Anthropic Claude, Object Detection Model, VLM as Classifier, Keypoint Detection Model, Grid Visualization, Size Measurement, VLM as Classifier, Path Deviation, Line Counter, Time in Zone, Google Gemini, Motion Detection, Florence-2 Model, Object Detection Model, Line Counter, Keypoint Detection Model, Dot Visualization, YOLO-World Model, Crop Visualization, Clip Comparison, Bounding Box Visualization, SAM 3, Roboflow Dataset Upload, Keypoint Visualization, Email Notification, Time in Zone, Anthropic Claude, Corner Visualization, OpenAI
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds Clip Comparison in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - texts (list_of_values): List of texts to calculate similarity against each input image.
- output
  - similarity (list_of_values): List of values of any type.
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - prediction_type (prediction_type): String value with type of prediction.
Example JSON definition of step Clip Comparison in version v1:

```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v1",
    "images": "$inputs.image",
    "texts": [
        "a",
        "b",
        "c"
    ]
}
```
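As with v2, the v1 step can be embedded in a full workflow definition. The sketch below is illustrative (same assumed schema and placeholder names as any general Workflows definition); note that v1 exposes a single similarity output rather than v2's richer set of bindings:

```json
{
    "version": "1.0",
    "inputs": [
        { "type": "WorkflowImage", "name": "image" }
    ],
    "steps": [
        {
            "type": "roboflow_core/clip_comparison@v1",
            "name": "clip",
            "images": "$inputs.image",
            "texts": ["a cat", "a dog"]
        }
    ],
    "outputs": [
        { "type": "JsonField", "name": "similarity", "selector": "$steps.clip.similarity" }
    ]
}
```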