Clip Comparison¶
v2¶
Class: ClipComparisonBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v2.ClipComparisonBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to check whether an image contains NSFW material.
Type identifier¶
Use the following identifier in step "type" field: roboflow_core/clip_comparison@v2 to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique name of step in workflows. | ❌ |
| `classes` | `List[str]` | List of classes to calculate similarity against each input image. | ✅ |
| `version` | `str` | Variant of CLIP model. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
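For example, since `classes` is marked ✅, it can be bound to a workflow input instead of a hard-coded list. A minimal sketch (the input name `clip_classes` is illustrative, not part of the block definition):
{
    "name": "clip_comparison",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": "$inputs.clip_classes",
    "version": "ViT-B-16"
}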
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v2.
- inputs:
Llama 3.2 Vision, Blur Visualization, Dimension Collapse, Perspective Correction, Polygon Zone Visualization, Bounding Box Visualization, QR Code Generator, Pixelate Visualization, Trace Visualization, Roboflow Custom Metadata, Image Threshold, Polygon Visualization, Dynamic Crop, Icon Visualization, Image Slicer, Stability AI Outpainting, Model Comparison Visualization, Dynamic Zone, Clip Comparison, LMM, OpenAI, Classification Label Visualization, Stitch Images, Size Measurement, Mask Visualization, Florence-2 Model, Single-Label Classification Model, Relative Static Crop, Absolute Static Crop, SIFT Comparison, Google Gemini, Circle Visualization, Florence-2 Model, LMM For Classification, Ellipse Visualization, Image Convert Grayscale, Object Detection Model, OCR Model, Image Preprocessing, Color Visualization, Image Blur, Stability AI Image Generation, Anthropic Claude, Google Vision OCR, Keypoint Visualization, Camera Calibration, Local File Sink, EasyOCR, Image Slicer, Email Notification, VLM as Detector, Roboflow Dataset Upload, Background Color Visualization, Triangle Visualization, Slack Notification, Keypoint Detection Model, Halo Visualization, Corner Visualization, Google Gemini, Model Monitoring Inference Aggregator, Roboflow Dataset Upload, Dot Visualization, Image Contours, Multi-Label Classification Model, Twilio SMS Notification, Instance Segmentation Model, VLM as Classifier, CSV Formatter, Reference Path Visualization, Morphological Transformation, Motion Detection, OpenAI, Webhook Sink, Contrast Equalization, Camera Focus, Stitch OCR Detections, Stability AI Inpainting, Clip Comparison, CogVLM, Line Counter Visualization, Email Notification, Crop Visualization, Grid Visualization, Buffer, OpenAI, SIFT, Depth Estimation, Background Subtraction, Label Visualization, Anthropic Claude, OpenAI
- outputs:
Llama 3.2 Vision, SAM 3, Polygon Zone Visualization, Distance Measurement, Trace Visualization, Roboflow Custom Metadata, Image Threshold, Image Slicer, Icon Visualization, Stability AI Outpainting, Single-Label Classification Model, Clip Comparison, Model Comparison Visualization, Dynamic Zone, Stitch Images, Cache Get, Florence-2 Model, Size Measurement, Single-Label Classification Model, SAM 3, Relative Static Crop, SIFT Comparison, Moondream2, Florence-2 Model, LMM For Classification, Anthropic Claude, Image Blur, Stability AI Image Generation, VLM as Detector, Local File Sink, Keypoint Detection Model, Background Color Visualization, Keypoint Detection Model, Multi-Label Classification Model, Google Gemini, Model Monitoring Inference Aggregator, Roboflow Dataset Upload, Byte Tracker, Instance Segmentation Model, VLM as Classifier, Motion Detection, Morphological Transformation, OpenAI, YOLO-World Model, Clip Comparison, CogVLM, Identify Changes, Path Deviation, CLIP Embedding Model, Crop Visualization, Grid Visualization, Buffer, SAM 3, Anthropic Claude, Time in Zone, OpenAI, Line Counter, Detections Consensus, Path Deviation, Perception Encoder Embedding Model, Perspective Correction, Bounding Box Visualization, QR Code Generator, Segment Anything 2 Model, Polygon Visualization, Dynamic Crop, Identify Outliers, LMM, OpenAI, Classification Label Visualization, Mask Visualization, Time in Zone, Google Gemini, Circle Visualization, Ellipse Visualization, Time in Zone, Object Detection Model, Image Preprocessing, Color Visualization, Google Vision OCR, Keypoint Visualization, Email Notification, Line Counter, VLM as Detector, Image Slicer, Byte Tracker, Roboflow Dataset Upload, Triangle Visualization, Slack Notification, Object Detection Model, Halo Visualization, Corner Visualization, Detections Stabilizer, Dot Visualization, Multi-Label Classification Model, Twilio SMS Notification, Seg Preview, Reference Path Visualization, Byte Tracker, Webhook Sink, PTZ Tracking (ONVIF), Detections Classes Replacement, Instance Segmentation Model, Detections Stitch, Contrast Equalization, Stitch OCR Detections, Stability AI Inpainting, Line Counter Visualization, Cache Set, Template Matching, Email Notification, VLM as Classifier, OpenAI, Label Visualization, Pixel Color Count
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
Clip Comparison in version v2 has.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `classes` (`list_of_values`): List of classes to calculate similarity against each input image.
    - `version` (`string`): Variant of CLIP model.
- output
    - `similarities` (`list_of_values`): List of values of any type.
    - `max_similarity` (`float_zero_to_one`): `float` value in range `[0.0, 1.0]`.
    - `most_similar_class` (`string`): String value.
    - `min_similarity` (`float_zero_to_one`): `float` value in range `[0.0, 1.0]`.
    - `least_similar_class` (`string`): String value.
    - `classification_predictions` (`classification_prediction`): Predictions from classifier.
    - `parent_id` (`parent_id`): Identifier of parent for step output.
    - `root_parent_id` (`parent_id`): Identifier of parent for step output.
Example JSON definition of step Clip Comparison in version v2
{
"name": "<your_step_name_here>",
"type": "roboflow_core/clip_comparison@v2",
"images": "$inputs.image",
"classes": [
"a",
"b",
"c"
],
"version": "ViT-B-16"
}
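Downstream steps and workflow outputs can reference this step's results via `$steps` selectors. For instance, assuming the step were named `clip_comparison`, a workflow output exposing the best-matching class could be declared like this (a sketch using the standard `JsonField` output entry):
{
    "type": "JsonField",
    "name": "top_class",
    "selector": "$steps.clip_comparison.most_similar_class"
}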
v1¶
Class: ClipComparisonBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v1.ClipComparisonBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to check whether an image contains NSFW material.
Type identifier¶
Use the following identifier in step "type" field: roboflow_core/clip_comparison@v1 to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique name of step in workflows. | ❌ |
| `texts` | `List[str]` | List of texts to calculate similarity against each input image. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
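As in v2, the ✅ marker means `texts` can be bound dynamically, e.g. to a workflow parameter (a minimal sketch; the input name `prompts` is illustrative):
{
    "name": "clip_comparison",
    "type": "roboflow_core/clip_comparison@v1",
    "images": "$inputs.image",
    "texts": "$inputs.prompts"
}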
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v1.
- inputs:
Google Gemini, Llama 3.2 Vision, Blur Visualization, Dot Visualization, Image Contours, Dimension Collapse, Perspective Correction, Polygon Zone Visualization, Bounding Box Visualization, QR Code Generator, Pixelate Visualization, Reference Path Visualization, Morphological Transformation, Trace Visualization, Motion Detection, OpenAI, Image Threshold, Polygon Visualization, Dynamic Crop, Icon Visualization, Image Slicer, Stability AI Outpainting, Model Comparison Visualization, Dynamic Zone, Clip Comparison, Contrast Equalization, OpenAI, Classification Label Visualization, Camera Focus, Stitch Images, Mask Visualization, Size Measurement, Stability AI Inpainting, Florence-2 Model, Clip Comparison, Relative Static Crop, Absolute Static Crop, Line Counter Visualization, SIFT Comparison, Google Gemini, Circle Visualization, Florence-2 Model, Ellipse Visualization, Image Convert Grayscale, Crop Visualization, Grid Visualization, Color Visualization, Image Blur, Image Preprocessing, Stability AI Image Generation, Anthropic Claude, Keypoint Visualization, Camera Calibration, Buffer, SIFT, Image Slicer, Depth Estimation, Background Subtraction, Label Visualization, Anthropic Claude, Background Color Visualization, Triangle Visualization, OpenAI, Halo Visualization, Corner Visualization
- outputs:
Line Counter, Detections Consensus, Path Deviation, Llama 3.2 Vision, SAM 3, Perspective Correction, Polygon Zone Visualization, Bounding Box Visualization, Trace Visualization, Polygon Visualization, Clip Comparison, OpenAI, Classification Label Visualization, Florence-2 Model, Mask Visualization, Size Measurement, SAM 3, Time in Zone, Google Gemini, Circle Visualization, Florence-2 Model, LMM For Classification, Ellipse Visualization, Time in Zone, Object Detection Model, Anthropic Claude, Color Visualization, Keypoint Visualization, VLM as Detector, Keypoint Detection Model, Email Notification, Line Counter, VLM as Detector, Roboflow Dataset Upload, Triangle Visualization, Keypoint Detection Model, Object Detection Model, Halo Visualization, Corner Visualization, Google Gemini, Roboflow Dataset Upload, Dot Visualization, Instance Segmentation Model, VLM as Classifier, Seg Preview, Reference Path Visualization, Motion Detection, OpenAI, Webhook Sink, Instance Segmentation Model, YOLO-World Model, Clip Comparison, Line Counter Visualization, Cache Set, Path Deviation, Email Notification, Crop Visualization, VLM as Classifier, Grid Visualization, Buffer, SAM 3, Label Visualization, Anthropic Claude, Time in Zone, OpenAI
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
Clip Comparison in version v1 has.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `texts` (`list_of_values`): List of texts to calculate similarity against each input image.
- output
    - `similarity` (`list_of_values`): List of values of any type.
    - `parent_id` (`parent_id`): Identifier of parent for step output.
    - `root_parent_id` (`parent_id`): Identifier of parent for step output.
    - `prediction_type` (`prediction_type`): String value with type of prediction.
Example JSON definition of step Clip Comparison in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/clip_comparison@v1",
"images": "$inputs.image",
"texts": [
"a",
"b",
"c"
]
}
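The snippet above defines only the step itself. A complete workflow definition also wires the step to an input image and exposes its results. A minimal sketch, assuming the standard Workflows definition layout (input and output names are illustrative):
{
    "version": "1.0",
    "inputs": [
        { "type": "WorkflowImage", "name": "image" }
    ],
    "steps": [
        {
            "type": "roboflow_core/clip_comparison@v1",
            "name": "clip_comparison",
            "images": "$inputs.image",
            "texts": ["cat", "dog"]
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "similarity",
            "selector": "$steps.clip_comparison.similarity"
        }
    ]
}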