Clip Comparison
v2
Class: ClipComparisonBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v2.ClipComparisonBlockV2
Warning: This block has multiple versions. Refer to the specific version for details. See Versioning to learn more about how versions work.
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to check whether an image contains NSFW material.
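To make the idea of "similarity of each text label to the image" concrete, here is a minimal sketch. It assumes the comparison is a cosine similarity between an image embedding and one embedding per text label, which is how CLIP-style comparisons are commonly done; this page does not state the exact metric. The three-dimensional vectors are hand-written stand-ins, not real CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: a real CLIP model would produce these vectors.
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = {
    "car": [1.0, 0.0, 0.0],
    "truck": [0.0, 1.0, 0.0],
    "bicycle": [0.0, 0.0, 1.0],
}

# One similarity score per text label, as described above.
similarities = {
    label: cosine_similarity(image_embedding, embedding)
    for label, embedding in text_embeddings.items()
}

# The label whose embedding is closest to the image embedding.
best_label = max(similarities, key=similarities.get)
```

With these stand-in vectors the image is closest to "car"; with a real model, the prompts you choose determine which classes can be told apart.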
Type identifier
Use the following identifier in the step "type" field: `roboflow_core/clip_comparison@v2` to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique name of step in workflows. | ❌ |
| `classes` | `List[str]` | List of classes to calculate similarity against each input image. | ✅ |
| `version` | `str` | Variant of CLIP model. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
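As an illustration of what a ✅ in the Refs column allows, the sketch below contrasts a step with literal values against one whose `classes` and `version` are supplied at runtime. The input names `classes` and `clip_version` are hypothetical, and the helper functions are not part of the library; they only assume that dynamic values are written as string selectors starting with `$inputs.` or `$steps.`, in the same style as the `$inputs.image` selector used on this page.

```python
# Step with literal (static) property values.
static_step = {
    "name": "clip",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": ["car", "truck"],
    "version": "ViT-B-16",
}

# Same step, but `classes` and `version` are bound to runtime inputs.
# The input names "classes" and "clip_version" are made up for this example.
dynamic_step = {
    "name": "clip",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": "$inputs.classes",
    "version": "$inputs.clip_version",
}


def is_selector(value):
    """Return True if the value looks like a runtime selector (assumed syntax)."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def dynamic_properties(step):
    """List the property names of a step that are bound to runtime values."""
    return sorted(key for key, value in step.items() if is_selector(value))


static_bound = dynamic_properties(static_step)
dynamic_bound = dynamic_properties(dynamic_step)
```

Note that `name` is marked ❌ in the table, so it must remain a literal string.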
Available Connections
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v2.
- inputs:
Perspective Correction,S3 Sink,Stability AI Inpainting,Image Convert Grayscale,Clip Comparison,Morphological Transformation,Email Notification,Qwen-VL,VLM As Detector,QR Code Generator,Twilio SMS/MMS Notification,OpenRouter,Model Monitoring Inference Aggregator,OpenAI,Llama 3.2 Vision,MoonshotAI Kimi,Polygon Zone Visualization,Image Threshold,Stitch OCR Detections,Anthropic Claude,OpenAI-Compatible LLM,OpenAI,Dynamic Crop,Size Measurement,Heatmap Visualization,Keypoint Visualization,Email Notification,Llama 3.2 Vision,Anthropic Claude,Stability AI Image Generation,Clip Comparison,Google Vision OCR,Camera Focus,Label Visualization,Instance Segmentation Model,Contrast Enhancement,Bounding Box Visualization,Local File Sink,Depth Estimation,Google Gemini,Image Contours,EasyOCR,Relative Static Crop,Motion Detection,Polygon Visualization,Google Gemma API,Background Color Visualization,Qwen 3.6 API,Qwen 3.5 API,Image Blur,Polygon Visualization,Google Gemini,SIFT Comparison,Grid Visualization,Anthropic Claude,Florence-2 Model,Triangle Visualization,Object Detection Model,OCR Model,Roboflow Custom Metadata,OpenAI,Slack Notification,VLM As Classifier,Image Stack,Pixelate Visualization,Stitch Images,Single-Label Classification Model,OpenAI,Buffer,Image Slicer,LMM For Classification,Keypoint Detection Model,Image Preprocessing,SIFT,Line Counter Visualization,Roboflow Dataset Upload,Image Slicer,Dynamic Zone,Corner Visualization,Stability AI Outpainting,Halo Visualization,Multi-Label Classification Model,LMM,Roboflow Dataset Upload,Qwen3.5-VL,Color Visualization,Detections List Roll-Up,Blur Visualization,Google Gemini,Classification Label Visualization,Camera Focus,Camera Calibration,Morphological Transformation,Trace Visualization,Stitch OCR Detections,Reference Path Visualization,Halo Visualization,Ellipse Visualization,Model Comparison Visualization,Dot Visualization,Mask Visualization,GLM-OCR,Crop Visualization,Background Subtraction,Circle Visualization,CogVLM,Text 
Display,Dimension Collapse,Absolute Static Crop,CSV Formatter,Florence-2 Model,Contrast Equalization,Roboflow Vision Events,Webhook Sink,Icon Visualization,Twilio SMS Notification,MoonshotAI Kimi,Google Gemma
- outputs:
S3 Sink,Email Notification,Keypoint Detection Model,Path Deviation,VLM As Detector,Qwen-VL,SAM 3,Clip Comparison,Morphological Transformation,Twilio SMS/MMS Notification,YOLO-World Model,Line Counter,Time in Zone,Polygon Zone Visualization,MoonshotAI Kimi,Stitch OCR Detections,OpenAI-Compatible LLM,OpenAI,VLM As Detector,Heatmap Visualization,Email Notification,Keypoint Visualization,Seg Preview,Anthropic Claude,Llama 3.2 Vision,Stability AI Image Generation,Google Vision OCR,Label Visualization,SAM 3,Instance Segmentation Model,Path Deviation,Local File Sink,Multi-Label Classification Model,Google Gemini,Motion Detection,Byte Tracker,Background Color Visualization,Instance Segmentation Model,Qwen 3.5 API,Google Gemini,Polygon Visualization,Moondream2,Grid Visualization,SIFT Comparison,Florence-2 Model,Time in Zone,Single-Label Classification Model,VLM As Classifier,Detections Stabilizer,LMM For Classification,Keypoint Detection Model,Image Preprocessing,Roboflow Dataset Upload,Dynamic Zone,Corner Visualization,Stability AI Outpainting,Segment Anything 2 Model,Halo Visualization,Multi-Label Classification Model,Time in Zone,Detections List Roll-Up,Semantic Segmentation Model,Perception Encoder Embedding Model,Distance Measurement,VLM As Classifier,Trace Visualization,Morphological Transformation,Stitch OCR Detections,Reference Path Visualization,Halo Visualization,Model Comparison Visualization,Dot Visualization,Pixel Color Count,Text Display,ByteTrack Tracker,Florence-2 Model,Byte Tracker,Identify Outliers,Icon Visualization,Object Detection Model,Perspective Correction,SAM 3,BoT-SORT Tracker,Stability AI Inpainting,Object Detection Model,Line Counter,QR Code Generator,OpenRouter,Model Monitoring Inference Aggregator,OpenAI,Llama 3.2 Vision,Image Threshold,OC-SORT Tracker,Anthropic Claude,Dynamic Crop,Detections Consensus,Size Measurement,Clip Comparison,Cache Set,Bounding Box Visualization,Depth Estimation,Keypoint Detection Model,CLIP Embedding Model,Relative 
Static Crop,Multi-Label Classification Model,Polygon Visualization,Google Gemma API,Qwen 3.6 API,Template Matching,Single-Label Classification Model,Image Blur,Anthropic Claude,Per-Class Confidence Filter,Triangle Visualization,Object Detection Model,Roboflow Custom Metadata,OpenAI,Slack Notification,Single-Label Classification Model,Stitch Images,Instance Segmentation Model,Buffer,OpenAI,Image Slicer,Line Counter Visualization,Image Slicer,Detections Classes Replacement,Cache Get,LMM,Roboflow Dataset Upload,Color Visualization,Google Gemini,Classification Label Visualization,Detections Stitch,Byte Tracker,Ellipse Visualization,PTZ Tracking (ONVIF),Identify Changes,SORT Tracker,Mask Visualization,GLM-OCR,Crop Visualization,Circle Visualization,CogVLM,Contrast Equalization,Roboflow Vision Events,Webhook Sink,Twilio SMS Notification,MoonshotAI Kimi,Google Gemma
Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds
Clip Comparison in version v2 has.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `classes` (`list_of_values`): List of classes to calculate similarity against each input image.
    - `version` (`string`): Variant of CLIP model.
- output
    - `similarities` (`list_of_values`): List of values of any type.
    - `max_similarity` (`float_zero_to_one`): `float` value in range `[0.0, 1.0]`.
    - `most_similar_class` (`string`): String value.
    - `min_similarity` (`float_zero_to_one`): `float` value in range `[0.0, 1.0]`.
    - `least_similar_class` (`string`): String value.
    - `classification_predictions` (`classification_prediction`): Predictions from classifier.
    - `parent_id` (`parent_id`): Identifier of parent for step output.
    - `root_parent_id` (`parent_id`): Identifier of parent for step output.
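The sketch below shows one plausible relationship between the v2 outputs. It assumes that `similarities` holds one score per entry in `classes`, in the same order, and that the `max_*`/`most_*` and `min_*`/`least_*` fields are the highest and lowest of those scores with their matching class names. The block computes these fields itself; the function `summarize_similarities` is hypothetical and only useful for reasoning about the output shape.

```python
def summarize_similarities(classes, similarities):
    """Derive best/worst class and score from parallel lists (assumed layout)."""
    if not classes or len(classes) != len(similarities):
        raise ValueError("classes and similarities must be non-empty and equal in length")
    indices = range(len(classes))
    best = max(indices, key=lambda i: similarities[i])
    worst = min(indices, key=lambda i: similarities[i])
    return {
        "max_similarity": similarities[best],
        "most_similar_class": classes[best],
        "min_similarity": similarities[worst],
        "least_similar_class": classes[worst],
    }


# Made-up scores for illustration only.
summary = summarize_similarities(
    classes=["car", "truck", "bicycle"],
    similarities=[0.21, 0.64, 0.15],
)
```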
Example JSON definition of step Clip Comparison in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": [
        "a",
        "b",
        "c"
    ],
    "version": "ViT-B-16"
}
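For context, a step definition like the one above normally sits inside a larger workflow definition. The sketch below builds such a definition as a Python dictionary and serializes it to JSON. The surrounding envelope (the `version`, `inputs`, and `outputs` keys, and the `WorkflowImage` and `JsonField` type names) is an assumption about the usual Workflows layout and is not described on this page; the step name `clip`, the class labels, and the output names `label` and `score` are made up for the example.

```python
import json

workflow_definition = {
    # Envelope fields below are assumptions, not taken from this page.
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
    ],
    "steps": [
        {
            "name": "clip",
            "type": "roboflow_core/clip_comparison@v2",
            "images": "$inputs.image",
            "classes": ["car", "truck", "bicycle"],
            "version": "ViT-B-16",
        }
    ],
    "outputs": [
        # Expose two of the step outputs listed under Bindings.
        {"type": "JsonField", "name": "label", "selector": "$steps.clip.most_similar_class"},
        {"type": "JsonField", "name": "score", "selector": "$steps.clip.max_similarity"},
    ],
}

# Serialize to JSON, the format used by the example definition above.
workflow_json = json.dumps(workflow_definition, indent=4)
print(workflow_json)
```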
v1
Class: ClipComparisonBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v1.ClipComparisonBlockV1
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or to check whether an image contains NSFW material.
Type identifier
Use the following identifier in the step "type" field: `roboflow_core/clip_comparison@v1` to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique name of step in workflows. | ❌ |
| `texts` | `List[str]` | List of texts to calculate similarity against each input image. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v1.
- inputs:
Perspective Correction,Stability AI Inpainting,Image Convert Grayscale,Clip Comparison,Morphological Transformation,Qwen-VL,QR Code Generator,OpenRouter,OpenAI,Llama 3.2 Vision,MoonshotAI Kimi,Polygon Zone Visualization,Image Threshold,Anthropic Claude,OpenAI,Dynamic Crop,Size Measurement,Heatmap Visualization,Keypoint Visualization,Llama 3.2 Vision,Anthropic Claude,Stability AI Image Generation,Clip Comparison,Camera Focus,Label Visualization,Contrast Enhancement,Bounding Box Visualization,Depth Estimation,Google Gemini,Image Contours,Relative Static Crop,Motion Detection,Polygon Visualization,Google Gemma API,Background Color Visualization,Qwen 3.6 API,Qwen 3.5 API,Image Blur,Polygon Visualization,Google Gemini,SIFT Comparison,Grid Visualization,Anthropic Claude,Florence-2 Model,Triangle Visualization,OpenAI,Image Stack,Pixelate Visualization,Stitch Images,Buffer,Image Slicer,Image Preprocessing,SIFT,Line Counter Visualization,Image Slicer,Dynamic Zone,Corner Visualization,Stability AI Outpainting,Halo Visualization,Color Visualization,Detections List Roll-Up,Blur Visualization,Google Gemini,Classification Label Visualization,Camera Focus,Camera Calibration,Morphological Transformation,Trace Visualization,Reference Path Visualization,Halo Visualization,Ellipse Visualization,Model Comparison Visualization,Dot Visualization,Mask Visualization,Crop Visualization,Background Subtraction,Circle Visualization,Text Display,Dimension Collapse,Absolute Static Crop,Florence-2 Model,Contrast Equalization,Icon Visualization,MoonshotAI Kimi,Google Gemma
- outputs:
Object Detection Model,Perspective Correction,SAM 3,Email Notification,Keypoint Detection Model,Path Deviation,VLM As Detector,Qwen-VL,SAM 3,Clip Comparison,Object Detection Model,Twilio SMS/MMS Notification,Line Counter,OpenRouter,YOLO-World Model,OpenAI,Llama 3.2 Vision,Line Counter,Time in Zone,Polygon Zone Visualization,MoonshotAI Kimi,Anthropic Claude,OpenAI,VLM As Detector,Detections Consensus,Size Measurement,Email Notification,Keypoint Visualization,Seg Preview,Anthropic Claude,Llama 3.2 Vision,Clip Comparison,Cache Set,Label Visualization,SAM 3,Instance Segmentation Model,Path Deviation,Bounding Box Visualization,Google Gemini,Keypoint Detection Model,Motion Detection,Polygon Visualization,Google Gemma API,Qwen 3.6 API,Instance Segmentation Model,Qwen 3.5 API,Google Gemini,Polygon Visualization,Grid Visualization,Anthropic Claude,Florence-2 Model,Triangle Visualization,Time in Zone,Object Detection Model,OpenAI,VLM As Classifier,Instance Segmentation Model,Buffer,LMM For Classification,Keypoint Detection Model,Roboflow Dataset Upload,Line Counter Visualization,Detections Classes Replacement,Corner Visualization,Halo Visualization,Roboflow Dataset Upload,Time in Zone,Detections List Roll-Up,Color Visualization,Google Gemini,Classification Label Visualization,VLM As Classifier,Trace Visualization,Reference Path Visualization,Halo Visualization,Ellipse Visualization,Dot Visualization,Mask Visualization,Crop Visualization,Circle Visualization,Florence-2 Model,Webhook Sink,MoonshotAI Kimi,Google Gemma
Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds
Clip Comparison in version v1 has.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `texts` (`list_of_values`): List of texts to calculate similarity against each input image.
- output
    - `similarity` (`list_of_values`): List of values of any type.
    - `parent_id` (`parent_id`): Identifier of parent for step output.
    - `root_parent_id` (`parent_id`): Identifier of parent for step output.
    - `prediction_type` (`prediction_type`): String value with type of prediction.
Example JSON definition of step Clip Comparison in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/clip_comparison@v1",
    "images": "$inputs.image",
    "texts": [
        "a",
        "b",
        "c"
    ]
}
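Comparing the two versions on this page, v1 takes `texts` and returns `similarity`, while v2 takes `classes` plus a `version` property and returns `similarities` along with summary fields. The sketch below shows one way a v1 step definition could be rewritten as a v2 step. The function `migrate_step_v1_to_v2` is hypothetical, and using `"ViT-B-16"` as the default is an assumption borrowed from the v2 example, not a documented default.

```python
V1_TYPE = "roboflow_core/clip_comparison@v1"
V2_TYPE = "roboflow_core/clip_comparison@v2"


def migrate_step_v1_to_v2(step_v1, version="ViT-B-16"):
    """Return a v2-style copy of a v1 step definition (illustrative only)."""
    if step_v1.get("type") != V1_TYPE:
        raise ValueError(f"expected a step of type {V1_TYPE}")
    # Carry over everything except the fields that change between versions.
    step_v2 = {k: v for k, v in step_v1.items() if k not in ("type", "texts")}
    step_v2["type"] = V2_TYPE
    step_v2["classes"] = step_v1["texts"]  # `texts` is called `classes` in v2
    step_v2["version"] = version  # CLIP variant, a v2-only property
    return step_v2


step_v1 = {
    "name": "clip",
    "type": V1_TYPE,
    "images": "$inputs.image",
    "texts": ["a", "b", "c"],
}
step_v2 = migrate_step_v1_to_v2(step_v1)
```

Any downstream reference to the v1 `similarity` output would also need to point at `similarities` after such a change, since the output names differ between the two versions.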