Clip Comparison¶
v2¶
Class: ClipComparisonBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v2.ClipComparisonBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or if an image contains NSFW material.
Type identifier¶
Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/clip_comparison@v2
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Unique name of step in workflows. | ❌ |
classes | List[str] | List of classes to calculate similarity against each input image. | ✅ |
version | str | Variant of CLIP model. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
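For example, classes can be bound to a workflow input instead of being hard-coded. The snippet below is a minimal sketch, assuming the workflow declares a WorkflowParameter input named classes (the input name is illustrative, not part of this block's definition):

# Sketch of a v2 step definition that parametrises "classes" at runtime.
# "$inputs.classes" is assumed to reference a WorkflowParameter declared in
# the workflow's "inputs" section (the name "classes" is illustrative).
clip_step = {
    "name": "clip_comparison",
    "type": "roboflow_core/clip_comparison@v2",
    "images": "$inputs.image",
    "classes": "$inputs.classes",  # resolved from workflow inputs at runtime
    "version": "ViT-B-16",
}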
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v2.
- inputs: Circle Visualization, Background Color Visualization, Corner Visualization, Twilio SMS Notification, Slack Notification, VLM as Detector, LMM, Polygon Zone Visualization, Camera Focus, Image Slicer, Image Blur, Dot Visualization, Google Gemini, Roboflow Dataset Upload, Stability AI Inpainting, Pixelate Visualization, OpenAI, Image Convert Grayscale, Absolute Static Crop, Stability AI Image Generation, Webhook Sink, Color Visualization, Image Threshold, Halo Visualization, Polygon Visualization, VLM as Classifier, Dynamic Zone, CogVLM, Camera Calibration, Email Notification, Classification Label Visualization, Single-Label Classification Model, Llama 3.2 Vision, Google Vision OCR, Roboflow Dataset Upload, Ellipse Visualization, Size Measurement, Bounding Box Visualization, Object Detection Model, Line Counter Visualization, Image Preprocessing, Trace Visualization, Label Visualization, Clip Comparison, Local File Sink, Image Slicer, Anthropic Claude, Crop Visualization, OCR Model, Relative Static Crop, Model Comparison Visualization, Stitch OCR Detections, Perspective Correction, OpenAI, Mask Visualization, Clip Comparison, Dynamic Crop, CSV Formatter, Florence-2 Model, Instance Segmentation Model, Keypoint Detection Model, Image Contours, Buffer, SIFT, Reference Path Visualization, Florence-2 Model, Triangle Visualization, Model Monitoring Inference Aggregator, SIFT Comparison, Keypoint Visualization, Multi-Label Classification Model, LMM For Classification, Roboflow Custom Metadata, Grid Visualization, Stitch Images, Dimension Collapse, Blur Visualization
- outputs: Corner Visualization, Slack Notification, VLM as Detector, VLM as Classifier, Polygon Zone Visualization, Image Slicer, Cache Set, Path Deviation, Dot Visualization, Roboflow Dataset Upload, Single-Label Classification Model, Stability AI Image Generation, Halo Visualization, Detections Classes Replacement, CogVLM, Email Notification, Object Detection Model, Single-Label Classification Model, Llama 3.2 Vision, Byte Tracker, Ellipse Visualization, Size Measurement, Pixel Color Count, Bounding Box Visualization, Line Counter Visualization, Image Preprocessing, Label Visualization, Local File Sink, Image Slicer, Anthropic Claude, Crop Visualization, Detections Stitch, Relative Static Crop, Model Comparison Visualization, Path Deviation, Clip Comparison, Time in Zone, Florence-2 Model, Keypoint Detection Model, Buffer, Florence-2 Model, Multi-Label Classification Model, Keypoint Visualization, Identify Changes, Multi-Label Classification Model, Roboflow Custom Metadata, Line Counter, Stitch Images, Circle Visualization, Background Color Visualization, Twilio SMS Notification, LMM, Image Blur, Google Gemini, Stability AI Inpainting, Line Counter, OpenAI, Detections Consensus, Distance Measurement, Webhook Sink, Color Visualization, Image Threshold, Polygon Visualization, VLM as Classifier, Instance Segmentation Model, Classification Label Visualization, Google Vision OCR, Roboflow Dataset Upload, Cache Get, Object Detection Model, Keypoint Detection Model, Trace Visualization, Clip Comparison, Identify Outliers, YOLO-World Model, Perspective Correction, OpenAI, Byte Tracker, Mask Visualization, Time in Zone, Dynamic Crop, Template Matching, Byte Tracker, Instance Segmentation Model, Reference Path Visualization, CLIP Embedding Model, Triangle Visualization, Model Monitoring Inference Aggregator, SIFT Comparison, VLM as Detector, LMM For Classification, Grid Visualization, Segment Anything 2 Model, Detections Stabilizer
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds Clip Comparison in version v2 has.
Bindings
- input
    - images (image): The image to infer on.
    - classes (list_of_values): List of classes to calculate similarity against each input image.
    - version (string): Variant of CLIP model.
- output
    - similarities (list_of_values): List of values of any type.
    - max_similarity (float_zero_to_one): float value in range [0.0, 1.0].
    - most_similar_class (string): String value.
    - min_similarity (float_zero_to_one): float value in range [0.0, 1.0].
    - least_similar_class (string): String value.
    - classification_predictions (classification_prediction): Predictions from classifier.
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
Example JSON definition of step Clip Comparison in version v2:
{
"name": "<your_step_name_here>",
"type": "roboflow_core/clip_comparison@v2",
"images": "$inputs.image",
"classes": [
"a",
"b",
"c"
],
"version": "ViT-B-16"
}
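For context, the sketch below embeds the step in a complete workflow specification and exposes its results via $steps selectors. The WorkflowImage input type, the JsonField output type, and the inference_sdk run_workflow call are assumptions based on the standard Workflows tooling rather than part of this block's definition, so verify them against your inference version.

# Minimal sketch of a full workflow specification that uses the v2 block
# (assumed standard Workflows layout with "inputs", "steps" and "outputs").
WORKFLOW = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
    ],
    "steps": [
        {
            "name": "clip_comparison",
            "type": "roboflow_core/clip_comparison@v2",
            "images": "$inputs.image",
            "classes": ["car", "truck", "motorcycle"],
            "version": "ViT-B-16",
        },
    ],
    "outputs": [
        # Each block output can be exposed through a $steps.<step_name>.<output> selector.
        {"type": "JsonField", "name": "similarities", "selector": "$steps.clip_comparison.similarities"},
        {"type": "JsonField", "name": "top_class", "selector": "$steps.clip_comparison.most_similar_class"},
    ],
}

# Running the specification against an inference server (assumed inference_sdk API).
from inference_sdk import InferenceHTTPClient

client = InferenceHTTPClient(api_url="http://localhost:9001", api_key="<YOUR_API_KEY>")
result = client.run_workflow(
    specification=WORKFLOW,
    images={"image": "path/to/image.jpg"},
)
print(result[0]["top_class"])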
v1¶
Class: ClipComparisonBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.clip_comparison.v1.ClipComparisonBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Use the OpenAI CLIP zero-shot classification model to classify images.
This block accepts an image and a list of text prompts. The block then returns the similarity of each text label to the provided image.
This block is useful for classifying images without having to train a fine-tuned classification model. For example, you could use CLIP to classify the type of vehicle in an image, or if an image contains NSFW material.
Type identifier¶
Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/clip_comparison@v1
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Unique name of step in workflows. | ❌ |
texts | List[str] | List of texts to calculate similarity against each input image. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Clip Comparison in version v1.
- inputs: Circle Visualization, Background Color Visualization, Corner Visualization, Bounding Box Visualization, Line Counter Visualization, Image Preprocessing, Trace Visualization, Label Visualization, Clip Comparison, Polygon Zone Visualization, Camera Focus, Image Slicer, Image Slicer, Image Blur, Anthropic Claude, Crop Visualization, Dot Visualization, Google Gemini, Relative Static Crop, Model Comparison Visualization, Stability AI Inpainting, Dimension Collapse, Pixelate Visualization, Perspective Correction, OpenAI, Image Convert Grayscale, Absolute Static Crop, Mask Visualization, Stability AI Image Generation, Color Visualization, Image Threshold, Clip Comparison, Dynamic Crop, Halo Visualization, Polygon Visualization, Florence-2 Model, Image Contours, Dynamic Zone, Buffer, Camera Calibration, SIFT, Reference Path Visualization, Florence-2 Model, Classification Label Visualization, Triangle Visualization, SIFT Comparison, Llama 3.2 Vision, Keypoint Visualization, Grid Visualization, Ellipse Visualization, Stitch Images, Size Measurement, Blur Visualization
- outputs: Circle Visualization, Corner Visualization, Bounding Box Visualization, Object Detection Model, Line Counter Visualization, VLM as Detector, VLM as Classifier, Keypoint Detection Model, Trace Visualization, Label Visualization, Clip Comparison, Polygon Zone Visualization, Line Counter, Cache Set, Anthropic Claude, Crop Visualization, Path Deviation, Dot Visualization, YOLO-World Model, Google Gemini, Line Counter, Perspective Correction, OpenAI, Detections Consensus, Path Deviation, Mask Visualization, Time in Zone, Webhook Sink, Color Visualization, Clip Comparison, Time in Zone, Halo Visualization, Polygon Visualization, Florence-2 Model, VLM as Classifier, Instance Segmentation Model, Keypoint Detection Model, Buffer, Instance Segmentation Model, Email Notification, Florence-2 Model, Reference Path Visualization, Classification Label Visualization, Triangle Visualization, Object Detection Model, Llama 3.2 Vision, VLM as Detector, LMM For Classification, Grid Visualization, Ellipse Visualization, Size Measurement
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds Clip Comparison in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - texts (list_of_values): List of texts to calculate similarity against each input image.
- output
    - similarity (list_of_values): List of values of any type.
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - prediction_type (prediction_type): String value with type of prediction.
Example JSON definition of step Clip Comparison in version v1:
{
"name": "<your_step_name_here>",
"type": "roboflow_core/clip_comparison@v1",
"images": "$inputs.image",
"texts": [
"a",
"b",
"c"
]
}
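As with v2, outputs of the v1 step can be consumed by downstream steps or exposed as workflow outputs via $steps selectors. A minimal sketch, assuming the standard Workflows specification layout (the JsonField output type is an assumption, and the step name matches the example above):

# Sketch: exposing the v1 block's "similarity" output from the workflow's
# "outputs" section (JsonField type assumed from the standard Workflows spec).
outputs = [
    {
        "type": "JsonField",
        "name": "clip_similarity",
        "selector": "$steps.<your_step_name_here>.similarity",
    },
]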