Clip Comparison
v2
Class: ClipComparisonBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v2.ClipComparisonBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to check whether an image contains NSFW material.
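As a rough illustration of what "similarity" means here: CLIP maps images and text into a shared embedding space, and the two are typically compared by cosine similarity. The sketch below uses made-up three-dimensional vectors in place of real CLIP embeddings. The vectors, the labels, and the `cosine_similarity` helper are illustrative assumptions, not the block's actual implementation, and the block's exact score scaling may differ.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for embeddings (assumption): real CLIP embeddings
# have hundreds of dimensions and come from the model itself.
image_embedding = [0.9, 0.1, 0.0]
text_embeddings: Dict[str, List[float]] = {
    "car": [1.0, 0.0, 0.0],
    "truck": [0.6, 0.8, 0.0],
    "bicycle": [0.0, 0.0, 1.0],
}

# One score per text label, mirroring the "similarity per label" idea.
scores = {
    label: cosine_similarity(image_embedding, vector)
    for label, vector in text_embeddings.items()
}
best_label = max(scores, key=scores.get)
```

The label whose toy vector points in nearly the same direction as the image vector gets the highest score, which is the intuition behind zero-shot classification with text prompts.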
Type identifier
Use the following identifier in the step "type" field: roboflow_core/clip_comparison@v2 to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Unique name of step in workflows. | ❌ |
| classes | List[str] | List of classes to calculate similarity against each input image. | ✅ |
| version | str | Variant of CLIP model. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
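To make the distinction concrete, a property marked ✅ can hold a selector string instead of a literal value. The sketch below reuses the `$inputs.<name>` selector form from the example step definition on this page. The step name `clip`, the workflow input named `classes`, and the `is_selector` helper are hypothetical and exist only for illustration.

```python
# Step definition with a literal list of classes.
static_step = {
    "name": "clip",  # hypothetical step name
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": ["car", "truck", "bicycle"],
    "version": "ViT-B-16",
}

# Same step, but "classes" is resolved at runtime from a workflow input
# that is assumed to be called "classes".
dynamic_step = {**static_step, "classes": "$inputs.classes"}


def is_selector(value: object) -> bool:
    """Return True when a property value looks like a runtime selector."""
    return isinstance(value, str) and value.startswith("$")
```

With this shape, the list of classes can change per request without editing the workflow definition, whereas `name` (marked ❌) must stay a literal string.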
Available Connections
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v2.
- inputs:
Stability AI Outpainting,OpenAI-Compatible LLM,Morphological Transformation,Motion Detection,Contrast Enhancement,Crop Visualization,Camera Focus,Blur Visualization,Image Preprocessing,Corner Visualization,Ellipse Visualization,Mask Visualization,Stability AI Image Generation,Qwen-VL,Google Gemma API,Heatmap Visualization,Object Detection Model,Stitch OCR Detections,Roboflow Vision Events,Image Slicer,Qwen 3.5 API,Trace Visualization,Background Color Visualization,Slack Notification,Anthropic Claude,Dimension Collapse,Qwen 3.6 API,OpenAI,Webhook Sink,Email Notification,Color Visualization,Bounding Box Visualization,Keypoint Visualization,Model Comparison Visualization,Google Gemma,Relative Static Crop,Llama 3.2 Vision,CogVLM,Polygon Zone Visualization,Qwen3.5-VL,Dynamic Crop,Camera Focus,Polygon Visualization,QR Code Generator,Google Vision OCR,GLM-OCR,Stitch Images,Google Gemini,Llama 3.2 Vision,OpenRouter,Twilio SMS Notification,Image Blur,Clip Comparison,Dynamic Zone,Model Monitoring Inference Aggregator,Anthropic Claude,Image Slicer,Depth Estimation,OpenAI,Detections List Roll-Up,Multi-Label Classification Model,Buffer,Google Gemini,Classification Label Visualization,Pixelate Visualization,EasyOCR,SIFT,Florence-2 Model,MoonshotAI Kimi,Contrast Equalization,Image Threshold,Instance Segmentation Model,MoonshotAI Kimi,Dot Visualization,Polygon Visualization,Background Subtraction,Roboflow Dataset Upload,Anthropic Claude,Halo Visualization,Stability AI Inpainting,Roboflow Custom Metadata,Florence-2 Model,Label Visualization,Local File Sink,Icon Visualization,Single-Label Classification Model,Image Contours,OpenAI,Absolute Static Crop,Google Gemini,Grid Visualization,VLM As Classifier,Camera Calibration,Halo Visualization,Email Notification,Size Measurement,OpenAI,Clip Comparison,LMM,LMM For Classification,Text Display,Image Convert Grayscale,Reference Path Visualization,Circle Visualization,Line Counter Visualization,Stitch OCR Detections,OCR Model,Keypoint Detection Model,SIFT Comparison,VLM As Detector,Image Stack,Morphological Transformation,Twilio SMS/MMS Notification,Roboflow Dataset Upload,CSV Formatter,S3 Sink,Triangle Visualization,Perspective Correction
- outputs:
Stability AI Outpainting,Multi-Label Classification Model,CLIP Embedding Model,SAM 3,Motion Detection,Seg Preview,Corner Visualization,Ellipse Visualization,Object Detection Model,Image Preprocessing,Roboflow Vision Events,Heatmap Visualization,Trace Visualization,OC-SORT Tracker,VLM As Classifier,Time in Zone,OpenAI,Email Notification,Byte Tracker,Keypoint Visualization,Detections Consensus,Model Comparison Visualization,YOLO-World Model,Polygon Zone Visualization,Byte Tracker,Single-Label Classification Model,Dynamic Crop,Polygon Visualization,QR Code Generator,GLM-OCR,Stitch Images,OpenRouter,Clip Comparison,Dynamic Zone,Model Monitoring Inference Aggregator,Image Blur,Cache Get,Detections Stitch,Time in Zone,Detections List Roll-Up,Segment Anything 2 Model,Instance Segmentation Model,Google Gemini,Buffer,Contrast Equalization,Image Threshold,Instance Segmentation Model,Polygon Visualization,Anthropic Claude,Halo Visualization,Roboflow Custom Metadata,Keypoint Detection Model,Florence-2 Model,Local File Sink,Icon Visualization,Single-Label Classification Model,OpenAI,SAM 3,Grid Visualization,ByteTrack Tracker,VLM As Detector,Size Measurement,Multi-Label Classification Model,Object Detection Model,LMM,Reference Path Visualization,Stitch OCR Detections,Keypoint Detection Model,SIFT Comparison,Identify Changes,Roboflow Dataset Upload,S3 Sink,Cache Set,BoT-SORT Tracker,OpenAI-Compatible LLM,Object Detection Model,Morphological Transformation,Identify Outliers,Crop Visualization,Mask Visualization,Qwen-VL,Stability AI Image Generation,Stitch OCR Detections,Google Gemma API,Image Slicer,Qwen 3.5 API,Path Deviation,Perception Encoder Embedding Model,Background Color Visualization,Slack Notification,Anthropic Claude,Qwen 3.6 API,Webhook Sink,Color Visualization,Bounding Box Visualization,Google Gemma,Relative Static Crop,Llama 3.2 Vision,Path Deviation,CogVLM,Instance Segmentation Model,Instance Segmentation Model,Google Vision OCR,Google Gemini,SAM 3,Llama 3.2 Vision,Single-Label Classification Model,Distance Measurement,SORT Tracker,Twilio SMS Notification,Detections Stabilizer,Moondream2,Anthropic Claude,OpenAI,Image Slicer,Depth Estimation,Multi-Label Classification Model,Template Matching,Classification Label Visualization,PTZ Tracking (ONVIF),MoonshotAI Kimi,Florence-2 Model,Time in Zone,MoonshotAI Kimi,Line Counter,Dot Visualization,Keypoint Detection Model,Roboflow Dataset Upload,Per-Class Confidence Filter,Stability AI Inpainting,Semantic Segmentation Model,Line Counter,Detections Classes Replacement,Label Visualization,Google Gemini,VLM As Classifier,Email Notification,Halo Visualization,OpenAI,Clip Comparison,Pixel Color Count,LMM For Classification,Text Display,Circle Visualization,Line Counter Visualization,Byte Tracker,VLM As Detector,Overlap Analysis,Twilio SMS/MMS Notification,Morphological Transformation,Triangle Visualization,Perspective Correction
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds
Clip Comparison in version v2 has.
Bindings
- input
    - images (image): The image to infer on.
    - classes (list_of_values): List of classes to calculate similarity against each input image.
    - version (string): Variant of CLIP model.
- output
    - similarities (list_of_values): List of values of any type.
    - max_similarity (float_zero_to_one): float value in range [0.0, 1.0].
    - most_similar_class (string): String value.
    - min_similarity (float_zero_to_one): float value in range [0.0, 1.0].
    - least_similar_class (string): String value.
    - classification_predictions (classification_prediction): Predictions from classifier.
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
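The relationship between the summary outputs and the per-class scores can be sketched as follows. This assumes that `similarities` is ordered like `classes` and that the max/min fields are the highest and lowest entries with their class names; the page does not state either point explicitly, and `summarise_similarities` is a hypothetical helper rather than part of the block.

```python
from typing import Dict, List, Union


def summarise_similarities(
    classes: List[str], similarities: List[float]
) -> Dict[str, Union[str, float]]:
    """Derive max/min summary fields from per-class similarity scores."""
    if not classes or len(classes) != len(similarities):
        raise ValueError("classes and similarities must be non-empty and equal in length")
    # Pair each class with its score (assumes matching order).
    paired = list(zip(classes, similarities))
    most_class, max_score = max(paired, key=lambda item: item[1])
    least_class, min_score = min(paired, key=lambda item: item[1])
    return {
        "max_similarity": max_score,
        "most_similar_class": most_class,
        "min_similarity": min_score,
        "least_similar_class": least_class,
    }


# Made-up scores for the classes used in the example step definition.
summary = summarise_similarities(["a", "b", "c"], [0.12, 0.71, 0.33])
```

Downstream steps that only need the winning label can bind to `most_similar_class` directly instead of post-processing the full `similarities` list.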
Example JSON definition of step Clip Comparison in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": [
        "a",
        "b",
        "c"
    ],
    "version": "ViT-B-16"
}
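For context, a step definition like this normally sits inside a larger workflow definition. The sketch below builds one as a Python dictionary and serialises it to JSON. The surrounding structure (`version`, `inputs`, `steps`, `outputs`, and the `WorkflowImage` and `JsonField` types) is an assumption based on the general Workflows format and should be checked against the Workflows documentation; the step name `clip` and the output names are hypothetical.

```python
import json

# Step definition using the properties documented for v2.
clip_step = {
    "name": "clip",  # hypothetical step name
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": ["car", "truck", "bicycle"],
    "version": "ViT-B-16",
}

# Assumed surrounding workflow structure (not taken from this page).
workflow_definition = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [clip_step],
    "outputs": [
        {
            "type": "JsonField",
            "name": "top_class",
            "selector": "$steps.clip.most_similar_class",
        },
        {
            "type": "JsonField",
            "name": "top_score",
            "selector": "$steps.clip.max_similarity",
        },
    ],
}

# Serialise to JSON, for example to paste into a workflow editor.
workflow_json = json.dumps(workflow_definition, indent=2)
```

The output selectors reference the step by its `name` and one of the output fields listed under Bindings.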
v1
Class: ClipComparisonBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v1.ClipComparisonBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to check whether an image contains NSFW material.
Type identifier
Use the following identifier in the step "type" field: roboflow_core/clip_comparison@v1 to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Unique name of step in workflows. | ❌ |
| texts | List[str] | List of texts to calculate similarity against each input image. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v1.
- inputs:
Stability AI Outpainting,Morphological Transformation,Motion Detection,Contrast Enhancement,Crop Visualization,Camera Focus,Blur Visualization,Image Preprocessing,Corner Visualization,Ellipse Visualization,Mask Visualization,Stability AI Image Generation,Qwen-VL,Google Gemma API,Heatmap Visualization,Image Slicer,Qwen 3.5 API,Trace Visualization,Background Color Visualization,Anthropic Claude,Dimension Collapse,Qwen 3.6 API,OpenAI,Color Visualization,Bounding Box Visualization,Keypoint Visualization,Model Comparison Visualization,Google Gemma,Relative Static Crop,Llama 3.2 Vision,Polygon Zone Visualization,Dynamic Crop,Camera Focus,Polygon Visualization,QR Code Generator,Stitch Images,Google Gemini,Llama 3.2 Vision,OpenRouter,Image Blur,Clip Comparison,Dynamic Zone,Anthropic Claude,Image Slicer,Depth Estimation,OpenAI,Detections List Roll-Up,Buffer,Google Gemini,Classification Label Visualization,Pixelate Visualization,SIFT,Florence-2 Model,MoonshotAI Kimi,Contrast Equalization,Image Threshold,MoonshotAI Kimi,Dot Visualization,Polygon Visualization,Background Subtraction,Anthropic Claude,Halo Visualization,Stability AI Inpainting,Florence-2 Model,Label Visualization,Icon Visualization,Image Contours,Absolute Static Crop,Google Gemini,Grid Visualization,Camera Calibration,Halo Visualization,Size Measurement,OpenAI,Clip Comparison,Text Display,Image Convert Grayscale,Reference Path Visualization,Circle Visualization,Line Counter Visualization,SIFT Comparison,Image Stack,Morphological Transformation,Triangle Visualization,Perspective Correction
- outputs:
Cache Set,Object Detection Model,SAM 3,Motion Detection,Crop Visualization,Mask Visualization,Qwen-VL,Seg Preview,Corner Visualization,Ellipse Visualization,Object Detection Model,Google Gemma API,Qwen 3.5 API,Trace Visualization,Path Deviation,VLM As Classifier,Anthropic Claude,Qwen 3.6 API,Time in Zone,OpenAI,Webhook Sink,Email Notification,Color Visualization,Bounding Box Visualization,Keypoint Visualization,Detections Consensus,Google Gemma,YOLO-World Model,Llama 3.2 Vision,Path Deviation,Polygon Zone Visualization,Instance Segmentation Model,Polygon Visualization,Instance Segmentation Model,Google Gemini,OpenRouter,SAM 3,Llama 3.2 Vision,Clip Comparison,Anthropic Claude,OpenAI,Time in Zone,Detections List Roll-Up,Instance Segmentation Model,Google Gemini,Buffer,Classification Label Visualization,MoonshotAI Kimi,Florence-2 Model,Time in Zone,Instance Segmentation Model,MoonshotAI Kimi,Line Counter,Dot Visualization,Polygon Visualization,Keypoint Detection Model,Roboflow Dataset Upload,Anthropic Claude,Halo Visualization,Keypoint Detection Model,Line Counter,Detections Classes Replacement,Florence-2 Model,Label Visualization,Google Gemini,SAM 3,Grid Visualization,VLM As Classifier,VLM As Detector,Email Notification,Halo Visualization,Size Measurement,OpenAI,Clip Comparison,Object Detection Model,LMM For Classification,Reference Path Visualization,Circle Visualization,Line Counter Visualization,Keypoint Detection Model,VLM As Detector,Twilio SMS/MMS Notification,Roboflow Dataset Upload,Triangle Visualization,Perspective Correction
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds
Clip Comparison in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - texts (list_of_values): List of texts to calculate similarity against each input image.
- output
    - similarity (list_of_values): List of values of any type.
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - prediction_type (prediction_type): String value with type of prediction.
Example JSON definition of step Clip Comparison in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v1",
    "images": "$inputs.image",
    "texts": [
        "a",
        "b",
        "c"
    ]
}
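Comparing the two example definitions, the step-level differences are the `type` identifier, the rename of `texts` to `classes`, and the added `version` property. A minimal sketch of converting a v1 step definition into the v2 shape might look like this. `upgrade_clip_step` is a hypothetical helper, not part of the library, and the default `"ViT-B-16"` is simply the value shown in the v2 example.

```python
from typing import Any, Dict


def upgrade_clip_step(step_v1: Dict[str, Any], version: str = "ViT-B-16") -> Dict[str, Any]:
    """Return a v2-style step definition built from a v1-style one."""
    if step_v1.get("type") != "roboflow_core/clip_comparison@v1":
        raise ValueError("expected a roboflow_core/clip_comparison@v1 step")
    # Copy everything except the v1-only "texts" property.
    step_v2 = {key: value for key, value in step_v1.items() if key != "texts"}
    step_v2["type"] = "roboflow_core/clip_comparison@v2"
    step_v2["classes"] = step_v1["texts"]  # v1 "texts" becomes v2 "classes"
    step_v2["version"] = version  # v2 adds the CLIP variant property
    return step_v2


step_v1 = {
    "name": "clip",  # hypothetical step name
    "type": "roboflow_core/clip_comparison@v1",
    "images": "$inputs.image",
    "texts": ["a", "b", "c"],
}
step_v2 = upgrade_clip_step(step_v1)
```

The output names also differ between versions (`similarity` in v1, `similarities` plus the summary fields in v2), so any selectors that reference the step's outputs would need updating separately.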