# Clip Comparison
## v2
Class: `ClipComparisonBlockV2` (there are multiple versions of this block)
Source: `inference.core.workflows.core_steps.models.foundation.clip_comparison.v2.ClipComparisonBlockV2`
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or if an image contains NSFW material.
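Conceptually, the block embeds the input image and each text prompt with CLIP and scores each pair by cosine similarity. Below is a minimal sketch of that computation using the open-source `clip` package: an illustration of the underlying idea, not this block's actual implementation. The image path and prompts are placeholders.

```python
# Conceptual sketch only: NOT this block's implementation. Illustrates the
# CLIP zero-shot comparison the block performs, via the open-source "clip"
# package (pip install git+https://github.com/openai/CLIP.git).
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)  # placeholder path
prompts = ["a car", "a truck", "a motorcycle"]
tokens = clip.tokenize(prompts).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(tokens)

# Cosine similarity between the image embedding and each prompt embedding.
image_features /= image_features.norm(dim=-1, keepdim=True)
text_features /= text_features.norm(dim=-1, keepdim=True)
similarities = (image_features @ text_features.T).squeeze(0)

for prompt, score in zip(prompts, similarities.tolist()):
    print(f"{prompt}: {score:.3f}")
```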
### Type identifier
Use the following identifier in the step `"type"` field to add the block as a step in your workflow: `roboflow_core/clip_comparison@v2`
### Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique name of step in workflows. | ❌ |
| `classes` | `List[str]` | List of classes to calculate similarity against each input image. | ✅ |
| `version` | `str` | Variant of CLIP model. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
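For example, setting `"classes": "$inputs.classes"` in the step definition binds the class list to a workflow input rather than a hard-coded list, assuming a corresponding `WorkflowParameter` named `classes` is declared in the workflow's `inputs` section (the input name here is illustrative).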
### Available Connections
Compatible Blocks
Check which blocks you can connect to Clip Comparison in version v2.
- inputs: Local File Sink, Dot Visualization, Stability AI Inpainting, Reference Path Visualization, VLM as Classifier, Stability AI Outpainting, Multi-Label Classification Model, QR Code Generator, Line Counter Visualization, Size Measurement, Ellipse Visualization, Dimension Collapse, Roboflow Custom Metadata, Background Color Visualization, Polygon Zone Visualization, CSV Formatter, Roboflow Dataset Upload, Contrast Equalization, EasyOCR, Object Detection Model, Image Slicer, Dynamic Zone, Google Gemini, Florence-2 Model, Google Vision OCR, Image Threshold, SIFT Comparison, Image Preprocessing, Icon Visualization, OCR Model, Roboflow Dataset Upload, Clip Comparison, Absolute Static Crop, Pixelate Visualization, Buffer, Image Blur, Perspective Correction, Relative Static Crop, Florence-2 Model, VLM as Detector, Llama 3.2 Vision, LMM For Classification, Clip Comparison, LMM, SIFT, Halo Visualization, Model Monitoring Inference Aggregator, Image Convert Grayscale, Anthropic Claude, Triangle Visualization, Depth Estimation, Image Contours, Mask Visualization, Keypoint Detection Model, Image Slicer, CogVLM, Model Comparison Visualization, Stitch OCR Detections, Twilio SMS Notification, Single-Label Classification Model, Polygon Visualization, Corner Visualization, Crop Visualization, Stitch Images, Blur Visualization, Dynamic Crop, Camera Focus, OpenAI, Email Notification, Color Visualization, Classification Label Visualization, Label Visualization, OpenAI, Circle Visualization, Keypoint Visualization, Trace Visualization, Camera Calibration, Instance Segmentation Model, Morphological Transformation, OpenAI, Bounding Box Visualization, Grid Visualization, Slack Notification, Webhook Sink, Stability AI Image Generation
- outputs: PTZ Tracking (ONVIF), Local File Sink, Stability AI Inpainting, VLM as Classifier, Distance Measurement, Perception Encoder Embedding Model, QR Code Generator, Path Deviation, Time in Zone, Size Measurement, Identify Outliers, Polygon Zone Visualization, Contrast Equalization, Object Detection Model, Florence-2 Model, Detections Consensus, Roboflow Dataset Upload, Clip Comparison, Perspective Correction, Relative Static Crop, Line Counter, VLM as Detector, Single-Label Classification Model, LMM For Classification, Detections Stitch, LMM, Multi-Label Classification Model, Model Monitoring Inference Aggregator, CogVLM, Model Comparison Visualization, Template Matching, Twilio SMS Notification, Single-Label Classification Model, Polygon Visualization, Keypoint Detection Model, Byte Tracker, Instance Segmentation Model, VLM as Detector, Label Visualization, OpenAI, Circle Visualization, Keypoint Visualization, Trace Visualization, Instance Segmentation Model, Stability AI Image Generation, Dot Visualization, Reference Path Visualization, VLM as Classifier, CLIP Embedding Model, Object Detection Model, Slack Notification, Stability AI Outpainting, Multi-Label Classification Model, Cache Set, Line Counter Visualization, Detections Classes Replacement, Ellipse Visualization, Roboflow Custom Metadata, Background Color Visualization, Roboflow Dataset Upload, Image Slicer, Dynamic Zone, Google Gemini, Byte Tracker, Google Vision OCR, Identify Changes, SIFT Comparison, Image Threshold, Image Preprocessing, Icon Visualization, YOLO-World Model, Buffer, Image Blur, Florence-2 Model, Pixel Color Count, Llama 3.2 Vision, Line Counter, Time in Zone, Clip Comparison, Halo Visualization, Cache Get, Anthropic Claude, Triangle Visualization, Mask Visualization, Keypoint Detection Model, Image Slicer, Stitch OCR Detections, Time in Zone, Moondream2, Corner Visualization, Crop Visualization, Stitch Images, Dynamic Crop, Detections Stabilizer, OpenAI, Email Notification, Segment Anything 2 Model, Color Visualization, Classification Label Visualization, Morphological Transformation, OpenAI, Bounding Box Visualization, Path Deviation, Grid Visualization, Seg Preview, Webhook Sink, Byte Tracker
### Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds Clip Comparison in version v2 has.
Bindings
- input
  - `images` (`image`): The image to infer on.
  - `classes` (`list_of_values`): List of classes to calculate similarity against each input image.
  - `version` (`string`): Variant of CLIP model.
- output
  - `similarities` (`list_of_values`): List of values of any type.
  - `max_similarity` (`float_zero_to_one`): `float` value in range `[0.0, 1.0]`.
  - `most_similar_class` (`string`): String value.
  - `min_similarity` (`float_zero_to_one`): `float` value in range `[0.0, 1.0]`.
  - `least_similar_class` (`string`): String value.
  - `classification_predictions` (`classification_prediction`): Predictions from classifier.
  - `parent_id` (`parent_id`): Identifier of parent for step output.
  - `root_parent_id` (`parent_id`): Identifier of parent for step output.
Example JSON definition of step Clip Comparison in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": [
        "a",
        "b",
        "c"
    ],
    "version": "ViT-B-16"
}
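With a step like the one above in place, the workflow can be executed through the inference SDK. Below is a minimal sketch, assuming `inference_sdk` is installed and the workflow is saved in your Roboflow workspace; the workspace name, workflow id, API key, and image path are placeholders.

```python
# Hedged sketch: run a hosted workflow that contains the Clip Comparison step.
# Workspace name, workflow id, API key, and image path are placeholders.
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="<YOUR_API_KEY>",
)

results = client.run_workflow(
    workspace_name="<your-workspace>",
    workflow_id="<your-workflow-id>",
    images={"image": "path/to/image.jpg"},
)

# One entry per input image; which fields appear here depends on the outputs
# configured in the workflow definition (e.g. most_similar_class).
print(results)
```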
## v1
Class: `ClipComparisonBlockV1` (there are multiple versions of this block)
Source: `inference.core.workflows.core_steps.models.foundation.clip_comparison.v1.ClipComparisonBlockV1`
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or if an image contains NSFW material.
### Type identifier
Use the following identifier in the step `"type"` field to add the block as a step in your workflow: `roboflow_core/clip_comparison@v1`
### Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique name of step in workflows. | ❌ |
| `texts` | `List[str]` | List of texts to calculate similarity against each input image. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections
Compatible Blocks
Check which blocks you can connect to Clip Comparison in version v1.
- inputs: Llama 3.2 Vision, Dot Visualization, Stability AI Inpainting, Reference Path Visualization, Clip Comparison, Buffer, SIFT, Halo Visualization, Image Convert Grayscale, Stability AI Outpainting, QR Code Generator, Anthropic Claude, Triangle Visualization, Depth Estimation, Image Contours, Line Counter Visualization, Mask Visualization, Image Slicer, Size Measurement, Ellipse Visualization, Model Comparison Visualization, Dimension Collapse, Polygon Visualization, Background Color Visualization, Polygon Zone Visualization, Corner Visualization, Crop Visualization, Stitch Images, Contrast Equalization, Blur Visualization, Dynamic Crop, Image Slicer, Camera Focus, Dynamic Zone, OpenAI, Google Gemini, Color Visualization, Florence-2 Model, Classification Label Visualization, Label Visualization, OpenAI, Circle Visualization, Image Threshold, SIFT Comparison, Keypoint Visualization, Camera Calibration, Trace Visualization, Image Preprocessing, Morphological Transformation, Icon Visualization, Perspective Correction, Bounding Box Visualization, Clip Comparison, Absolute Static Crop, Grid Visualization, Pixelate Visualization, Image Blur, Relative Static Crop, Florence-2 Model, Stability AI Image Generation
- outputs: VLM as Detector, Llama 3.2 Vision, LMM For Classification, Dot Visualization, Line Counter, Time in Zone, Clip Comparison, Reference Path Visualization, Halo Visualization, VLM as Classifier, Object Detection Model, VLM as Classifier, Florence-2 Model, Path Deviation, Anthropic Claude, Time in Zone, Triangle Visualization, Cache Set, Mask Visualization, Line Counter Visualization, Size Measurement, Keypoint Detection Model, Ellipse Visualization, Time in Zone, Polygon Visualization, Polygon Zone Visualization, Corner Visualization, Crop Visualization, Roboflow Dataset Upload, Keypoint Detection Model, Object Detection Model, Instance Segmentation Model, OpenAI, Email Notification, VLM as Detector, Google Gemini, Color Visualization, Florence-2 Model, Classification Label Visualization, Label Visualization, OpenAI, Circle Visualization, Keypoint Visualization, Trace Visualization, Detections Consensus, Instance Segmentation Model, YOLO-World Model, Roboflow Dataset Upload, Bounding Box Visualization, Clip Comparison, Webhook Sink, Path Deviation, Grid Visualization, Buffer, Seg Preview, Perspective Correction, Line Counter
### Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds Clip Comparison in version v1 has.
Bindings
- input
  - `images` (`image`): The image to infer on.
  - `texts` (`list_of_values`): List of texts to calculate similarity against each input image.
- output
  - `similarity` (`list_of_values`): List of values of any type.
  - `parent_id` (`parent_id`): Identifier of parent for step output.
  - `root_parent_id` (`parent_id`): Identifier of parent for step output.
  - `prediction_type` (`prediction_type`): String value with type of prediction.
Example JSON definition of step Clip Comparison in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v1",
    "images": "$inputs.image",
    "texts": [
        "a",
        "b",
        "c"
    ]
}
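A step definition like the one above lives inside a complete workflow specification. Below is a minimal sketch that wires a workflow image into the step and exposes its `similarity` output, run directly from the specification via the inference SDK; the step, input, and output names are illustrative.

```python
# Hedged sketch: embed clip_comparison@v1 in a full workflow specification
# and run it from the specification. Names and API key are placeholders.
from inference_sdk import InferenceHTTPClient

specification = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "name": "clip",
            "type": "roboflow_core/clip_comparison@v1",
            "images": "$inputs.image",
            "texts": ["a", "b", "c"],
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "similarity",
            "selector": "$steps.clip.similarity",
        }
    ],
}

client = InferenceHTTPClient(
    api_url="https://detect.roboflow.com",
    api_key="<YOUR_API_KEY>",
)
results = client.run_workflow(
    specification=specification,
    images={"image": "path/to/image.jpg"},
)
print(results)
```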