# Clip Comparison

## v2
Class: `ClipComparisonBlockV2`
Source: `inference.core.workflows.core_steps.models.foundation.clip_comparison.v2.ClipComparisonBlockV2`
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts, and returns the similarity between each prompt and the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to check whether an image contains NSFW material.
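To make the comparison concrete, the following is a minimal sketch of the image-text similarity CLIP computes, written against the open-source `open_clip` package. It illustrates the idea only, not this block's internal implementation; the image path and class list are placeholders.

```python
# Conceptual sketch of CLIP image-text similarity (not this block's internals).
# Assumes the open-source `open_clip` package; image path and classes are placeholders.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-16", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-16")

image = preprocess(Image.open("example.jpg")).unsqueeze(0)  # placeholder image
classes = ["car", "truck", "motorcycle"]

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(tokenizer(classes))
    # Cosine similarity: normalize both embeddings, then take dot products.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    similarities = (image_features @ text_features.T).squeeze(0)

for label, score in zip(classes, similarities.tolist()):
    print(f"{label}: {score:.3f}")
```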
### Type identifier
Use the identifier `roboflow_core/clip_comparison@v2` in the step `"type"` field to add the block as a step in your workflow.
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique name of step in workflows. | ❌ |
| `classes` | `List[str]` | List of classes to calculate similarity against each input image. | ✅ |
| `version` | `str` | Variant of CLIP model. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
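For example, `classes` can be bound to a workflow input instead of a hard-coded list. A minimal sketch of such a step definition follows; the input names are illustrative, not prescribed.

```python
# Hedged sketch: binding "classes" to a workflow input so the class list is
# supplied at runtime. Input names ("image", "classes") are illustrative.
workflow_inputs = [
    {"type": "WorkflowImage", "name": "image"},
    {"type": "WorkflowParameter", "name": "classes"},
]
step = {
    "name": "clip",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": "$inputs.classes",  # resolved from the workflow input at runtime
    "version": "ViT-B-16",
}
```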
### Available Connections
Check what blocks you can connect to Clip Comparison in version v2.
- inputs:
  Google Vision OCR, Label Visualization, LMM For Classification, Blur Visualization, Background Color Visualization, Contrast Equalization, Bounding Box Visualization, Keypoint Visualization, Stability AI Outpainting, Reference Path Visualization, Image Slicer, Pixelate Visualization, Single-Label Classification Model, Clip Comparison, CSV Formatter, Image Preprocessing, Color Visualization, SIFT Comparison, Object Detection Model, Email Notification, Anthropic Claude, Circle Visualization, Image Contours, Polygon Zone Visualization, Ellipse Visualization, Clip Comparison, Email Notification, VLM as Classifier, Model Monitoring Inference Aggregator, OCR Model, Absolute Static Crop, Depth Estimation, LMM, Morphological Transformation, Roboflow Dataset Upload, Crop Visualization, OpenAI, Image Convert Grayscale, Florence-2 Model, CogVLM, Roboflow Custom Metadata, VLM as Detector, Classification Label Visualization, Buffer, Stitch OCR Detections, Keypoint Detection Model, Camera Calibration, Polygon Visualization, Icon Visualization, Triangle Visualization, Roboflow Dataset Upload, Anthropic Claude, Model Comparison Visualization, Corner Visualization, Florence-2 Model, Google Gemini, Google Gemini, EasyOCR, Line Counter Visualization, Grid Visualization, Halo Visualization, Size Measurement, Stability AI Image Generation, QR Code Generator, Dynamic Zone, Twilio SMS Notification, Relative Static Crop, Dot Visualization, Llama 3.2 Vision, Image Blur, Dimension Collapse, Slack Notification, OpenAI, Local File Sink, Multi-Label Classification Model, Image Slicer, OpenAI, Stability AI Inpainting, Dynamic Crop, Camera Focus, Webhook Sink, Image Threshold, Instance Segmentation Model, Perspective Correction, Mask Visualization, Trace Visualization, OpenAI, Stitch Images, SIFT
- outputs:
  Label Visualization, Background Color Visualization, Contrast Equalization, Reference Path Visualization, Stability AI Outpainting, Image Slicer, Single-Label Classification Model, Clip Comparison, Perception Encoder Embedding Model, Seg Preview, Image Preprocessing, Color Visualization, SIFT Comparison, Email Notification, Cache Set, Circle Visualization, Object Detection Model, Moondream2, VLM as Classifier, Model Monitoring Inference Aggregator, Path Deviation, LMM, Time in Zone, Morphological Transformation, Detections Consensus, Crop Visualization, OpenAI, Florence-2 Model, Classification Label Visualization, Byte Tracker, Segment Anything 2 Model, Time in Zone, YOLO-World Model, PTZ Tracking (ONVIF), Icon Visualization, Distance Measurement, VLM as Detector, Line Counter Visualization, Grid Visualization, Halo Visualization, Size Measurement, Dynamic Zone, Time in Zone, Twilio SMS Notification, Detections Stitch, Llama 3.2 Vision, Image Blur, Slack Notification, Byte Tracker, OpenAI, Multi-Label Classification Model, Image Slicer, OpenAI, Dynamic Crop, Pixel Color Count, Mask Visualization, Stitch Images, Google Vision OCR, LMM For Classification, Keypoint Visualization, Bounding Box Visualization, SAM 3, Byte Tracker, SAM 3, Object Detection Model, Path Deviation, Anthropic Claude, Polygon Zone Visualization, Line Counter, Ellipse Visualization, Email Notification, Clip Comparison, Roboflow Dataset Upload, SAM 3, VLM as Detector, Multi-Label Classification Model, CogVLM, Roboflow Custom Metadata, Buffer, Keypoint Detection Model, Stitch OCR Detections, Keypoint Detection Model, Line Counter, Polygon Visualization, CLIP Embedding Model, Detections Classes Replacement, Cache Get, Identify Changes, Triangle Visualization, Template Matching, Roboflow Dataset Upload, Anthropic Claude, Model Comparison Visualization, Corner Visualization, Florence-2 Model, Google Gemini, Google Gemini, Stability AI Image Generation, Identify Outliers, QR Code Generator, Dot Visualization, Relative Static Crop, Local File Sink, Instance Segmentation Model, Stability AI Inpainting, Single-Label Classification Model, Detections Stabilizer, Webhook Sink, Instance Segmentation Model, VLM as Classifier, Perspective Correction, OpenAI, Image Threshold, Trace Visualization
### Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds Clip Comparison in version v2 has.
- input
    - `images` (image): The image to infer on.
    - `classes` (list_of_values): List of classes to calculate similarity against each input image.
    - `version` (string): Variant of CLIP model.
- output
    - `similarities` (list_of_values): List of values of any type.
    - `max_similarity` (float_zero_to_one): `float` value in range `[0.0, 1.0]`.
    - `most_similar_class` (string): String value.
    - `min_similarity` (float_zero_to_one): `float` value in range `[0.0, 1.0]`.
    - `least_similar_class` (string): String value.
    - `classification_predictions` (classification_prediction): Predictions from classifier.
    - `parent_id` (parent_id): Identifier of parent for step output.
    - `root_parent_id` (parent_id): Identifier of parent for step output.
Example JSON definition of step Clip Comparison in version v2
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": [
        "a",
        "b",
        "c"
    ],
    "version": "ViT-B-16"
}
```
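To show the step in context, here is a hedged sketch of a complete workflow specification embedding this step, executed through the `inference_sdk` HTTP client. The server URL, API key, and image path are placeholders, and the exact client call may vary between SDK versions.

```python
# Hedged sketch: running a workflow that contains the clip_comparison@v2 step.
# URL, API key, and image path are placeholders; verify against your SDK version.
from inference_sdk import InferenceHTTPClient

specification = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "name": "clip",
            "type": "roboflow_core/clip_comparison@v2",
            "images": "$inputs.image",
            "classes": ["car", "truck", "motorcycle"],
            "version": "ViT-B-16",
        }
    ],
    "outputs": [
        {"type": "JsonField", "name": "most_similar_class", "selector": "$steps.clip.most_similar_class"},
        {"type": "JsonField", "name": "similarities", "selector": "$steps.clip.similarities"},
    ],
}

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<YOUR_API_KEY>")
result = client.run_workflow(specification=specification, images={"image": "example.jpg"})
print(result)
```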
## v1
Class: `ClipComparisonBlockV1`
Source: `inference.core.workflows.core_steps.models.foundation.clip_comparison.v1.ClipComparisonBlockV1`
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts, and returns the similarity between each prompt and the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to check whether an image contains NSFW material.
### Type identifier
Use the identifier `roboflow_core/clip_comparison@v1` in the step `"type"` field to add the block as a step in your workflow.
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique name of step in workflows. | ❌ |
| `texts` | `List[str]` | List of texts to calculate similarity against each input image. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections
Check what blocks you can connect to Clip Comparison in version v1.
- inputs:
  Label Visualization, Blur Visualization, Background Color Visualization, Contrast Equalization, Bounding Box Visualization, Camera Calibration, Polygon Visualization, Stability AI Outpainting, Image Slicer, Keypoint Visualization, Reference Path Visualization, Pixelate Visualization, Icon Visualization, Clip Comparison, Triangle Visualization, Anthropic Claude, Model Comparison Visualization, Corner Visualization, Florence-2 Model, Image Preprocessing, Google Gemini, Google Gemini, Color Visualization, SIFT Comparison, Line Counter Visualization, Grid Visualization, Stitch Images, Halo Visualization, Size Measurement, Stability AI Image Generation, Anthropic Claude, QR Code Generator, Circle Visualization, Image Contours, Dynamic Zone, Relative Static Crop, Dot Visualization, Polygon Zone Visualization, Ellipse Visualization, Llama 3.2 Vision, Image Blur, Dimension Collapse, Clip Comparison, Absolute Static Crop, Depth Estimation, Image Slicer, OpenAI, Morphological Transformation, Stability AI Inpainting, Dynamic Crop, Camera Focus, Crop Visualization, Image Threshold, Perspective Correction, Image Convert Grayscale, Mask Visualization, Trace Visualization, OpenAI, OpenAI, Florence-2 Model, Classification Label Visualization, Buffer, SIFT
- outputs:
  Label Visualization, LMM For Classification, Keypoint Visualization, Bounding Box Visualization, Reference Path Visualization, Clip Comparison, SAM 3, Seg Preview, SAM 3, Color Visualization, Object Detection Model, Path Deviation, Email Notification, Cache Set, Anthropic Claude, Circle Visualization, Object Detection Model, Polygon Zone Visualization, Line Counter, Ellipse Visualization, Email Notification, Clip Comparison, VLM as Classifier, Path Deviation, Time in Zone, Roboflow Dataset Upload, Detections Consensus, Crop Visualization, OpenAI, Florence-2 Model, VLM as Detector, SAM 3, Classification Label Visualization, Buffer, Keypoint Detection Model, Keypoint Detection Model, Time in Zone, Line Counter, Polygon Visualization, YOLO-World Model, Triangle Visualization, Roboflow Dataset Upload, Anthropic Claude, Corner Visualization, Florence-2 Model, Google Gemini, Google Gemini, VLM as Detector, Line Counter Visualization, Grid Visualization, Halo Visualization, Size Measurement, Time in Zone, Dot Visualization, Llama 3.2 Vision, Instance Segmentation Model, OpenAI, Webhook Sink, Instance Segmentation Model, VLM as Classifier, Mask Visualization, Perspective Correction, OpenAI, Trace Visualization
### Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds Clip Comparison in version v1 has.
- input
    - `images` (image): The image to infer on.
    - `texts` (list_of_values): List of texts to calculate similarity against each input image.
- output
    - `similarity` (list_of_values): List of values of any type.
    - `parent_id` (parent_id): Identifier of parent for step output.
    - `root_parent_id` (parent_id): Identifier of parent for step output.
    - `prediction_type` (prediction_type): String value with type of prediction.
Example JSON definition of step Clip Comparison in version v1
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v1",
    "images": "$inputs.image",
    "texts": [
        "a",
        "b",
        "c"
    ]
}
```
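Unlike v2, v1 exposes only the raw `similarity` list, so deriving the best-matching text is left to the caller. A hedged sketch of that post-processing follows; the similarity values shown are invented for illustration.

```python
# Hedged sketch: picking the best-matching text from the v1 "similarity" output.
# The similarity values below are invented; read yours from the step's output.
texts = ["a", "b", "c"]
similarity = [0.21, 0.64, 0.15]  # e.g. the $steps.<step_name>.similarity list
best_text, best_score = max(zip(texts, similarity), key=lambda pair: pair[1])
print(f"most similar: {best_text} ({best_score:.2f})")
```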