OpenAI¶
v3¶
Class: OpenAIBlockV3
(there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v3.OpenAIBlockV3
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT-4 with Vision model.
You can specify arbitrary text prompts or predefined ones; the block supports the following prompt types:

- Open Prompt (`unconstrained`) - use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - the model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - the model answers the question you submit in the prompt
- Captioning (short) (`caption`) - the model provides a short description of the image
- Captioning (`detailed-caption`) - the model provides a long description of the image
- Single-Label Classification (`classification`) - the model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - the model classifies the image content as one or more of the provided classes
- Structured Output Generation (`structured-answering`) - the model returns a JSON response with the specified fields
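The task types differ in which extra properties they require (for example, `classification` needs `classes`, while `structured-answering` needs `output_structure`). The relationship can be sketched in Python; the helper name and the exact required-field mapping here are assumptions for illustration, not the block's actual validation logic:

```python
# Illustrative mapping from task type to the extra step properties it is
# expected to need, based on the task-type list above (an assumption, not
# the block's internal schema).
REQUIRED_EXTRAS = {
    "unconstrained": {"prompt"},
    "ocr": set(),
    "visual-question-answering": {"prompt"},
    "caption": set(),
    "detailed-caption": set(),
    "classification": {"classes"},
    "multi-label-classification": {"classes"},
    "structured-answering": {"output_structure"},
}


def missing_fields(step: dict) -> set:
    """Return the required fields absent from a step definition for its task type."""
    required = REQUIRED_EXTRAS[step["task_type"]]
    return {field for field in required if field not in step}
```

For example, `missing_fields({"task_type": "classification"})` reports that `classes` still has to be filled in before the step definition is complete.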
Provide your OpenAI API key, or set the value to `rf_key:account` (or `rf_key:user:<id>`) to proxy requests through Roboflow's API.
Type identifier¶
Use the following identifier in the step "type" field to add the block as a step in your workflow: `roboflow_core/open_ai@v3`
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
task_type | str | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
prompt | str | Text prompt to the OpenAI model. | ✅ |
output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
classes | List[str] | List of classes to be used. | ✅ |
api_key | str | Your OpenAI API key. | ✅ |
model_version | str | Model to be used. | ✅ |
image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
temperature | float | Temperature to sample from the model: a value in the range 0.0-2.0; the higher it is, the more random / "creative" the generations are. | ✅ |
max_concurrent_requests | int | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Please restrict this if you hit OpenAI limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v3.
- inputs:
Image Contours
,Stability AI Inpainting
,Corner Visualization
,CSV Formatter
,Google Gemini
,Line Counter Visualization
,Reference Path Visualization
,Keypoint Detection Model
,Model Monitoring Inference Aggregator
,Florence-2 Model
,Circle Visualization
,OCR Model
,Llama 3.2 Vision
,Relative Static Crop
,Dimension Collapse
,Roboflow Dataset Upload
,Dynamic Zone
,Image Convert Grayscale
,Pixelate Visualization
,Model Comparison Visualization
,Trace Visualization
,LMM
,Twilio SMS Notification
,Roboflow Dataset Upload
,Depth Estimation
,Label Visualization
,Classification Label Visualization
,Blur Visualization
,OpenAI
,Color Visualization
,Bounding Box Visualization
,Anthropic Claude
,Ellipse Visualization
,Instance Segmentation Model
,Polygon Zone Visualization
,Object Detection Model
,Roboflow Custom Metadata
,Gaze Detection
,Image Slicer
,Image Slicer
,Crop Visualization
,Perspective Correction
,Halo Visualization
,Dot Visualization
,Mask Visualization
,Keypoint Visualization
,Local File Sink
,Absolute Static Crop
,Stitch OCR Detections
,Clip Comparison
,Image Blur
,OpenAI
,Cosine Similarity
,VLM as Classifier
,Clip Comparison
,Identify Changes
,Triangle Visualization
,Background Color Visualization
,SIFT Comparison
,Florence-2 Model
,Camera Calibration
,Google Vision OCR
,Image Threshold
,Single-Label Classification Model
,Buffer
,Image Preprocessing
,OpenAI
,CogVLM
,Slack Notification
,VLM as Detector
,Stability AI Image Generation
,SIFT
,Grid Visualization
,Camera Focus
,Stitch Images
,Stability AI Outpainting
,Size Measurement
,Polygon Visualization
,Multi-Label Classification Model
,Webhook Sink
,Dynamic Crop
,Email Notification
,LMM For Classification
- outputs:
Line Counter
,Stability AI Inpainting
,Detections Consensus
,Corner Visualization
,Time in Zone
,Google Gemini
,Line Counter Visualization
,Reference Path Visualization
,Keypoint Detection Model
,Model Monitoring Inference Aggregator
,Florence-2 Model
,YOLO-World Model
,Circle Visualization
,Llama 3.2 Vision
,Roboflow Dataset Upload
,JSON Parser
,PTZ Tracking (ONVIF)
,Detections Classes Replacement
,Object Detection Model
,Model Comparison Visualization
,Trace Visualization
,Detections Stitch
,Path Deviation
,Twilio SMS Notification
,LMM
,Roboflow Dataset Upload
,Label Visualization
,Classification Label Visualization
,Perception Encoder Embedding Model
,OpenAI
,Color Visualization
,Bounding Box Visualization
,Anthropic Claude
,Instance Segmentation Model
,Ellipse Visualization
,Pixel Color Count
,Polygon Zone Visualization
,VLM as Classifier
,Object Detection Model
,Roboflow Custom Metadata
,VLM as Detector
,Cache Get
,Crop Visualization
,Halo Visualization
,Perspective Correction
,Dot Visualization
,Mask Visualization
,Cache Set
,Keypoint Visualization
,Local File Sink
,Line Counter
,Clip Comparison
,Image Blur
,OpenAI
,VLM as Classifier
,Path Deviation
,Clip Comparison
,Instance Segmentation Model
,Distance Measurement
,Triangle Visualization
,Segment Anything 2 Model
,Background Color Visualization
,CLIP Embedding Model
,SIFT Comparison
,Google Vision OCR
,Florence-2 Model
,Image Threshold
,Buffer
,Image Preprocessing
,OpenAI
,CogVLM
,Slack Notification
,VLM as Detector
,Keypoint Detection Model
,Stability AI Image Generation
,Grid Visualization
,Stability AI Outpainting
,Size Measurement
,Polygon Visualization
,Webhook Sink
,Dynamic Crop
,Time in Zone
,Email Notification
,LMM For Classification
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v3 has.
Bindings
- input
  - images (`image`): The image to infer on.
  - prompt (`string`): Text prompt to the OpenAI model.
  - classes (`list_of_values`): List of classes to be used.
  - api_key (`Union[secret, ROBOFLOW_MANAGED_KEY, string]`): Your OpenAI API key.
  - model_version (`string`): Model to be used.
  - image_detail (`string`): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - temperature (`float`): Temperature to sample from the model: a value in the range 0.0-2.0; the higher it is, the more random / "creative" the generations are.
- output
  - output (`Union[string, language_model_output]`): String value if `string`, or LLM / VLM output if `language_model_output`.
  - classes (`list_of_values`): List of values of any type.
Example JSON definition of step OpenAI in version v3:

```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v3",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-4o",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
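To run the step, it has to be embedded in a complete Workflow specification. A minimal sketch follows, assuming the generic specification shape ("version", "inputs", "steps", "outputs") and the usual selector syntax (`$inputs.<name>`, `$steps.<name>.<field>`); the concrete task type, classes, and names are illustrative, not authoritative:

```python
# Sketch of a full Workflow specification embedding the OpenAI v3 step.
# The surrounding spec shape and selectors follow general Workflows
# conventions; treat exact values as placeholders.
workflow_specification = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "open_ai_key"},
    ],
    "steps": [
        {
            "name": "gpt",
            "type": "roboflow_core/open_ai@v3",
            "images": "$inputs.image",        # bind the workflow image input
            "task_type": "classification",
            "classes": ["cat", "dog"],
            "api_key": "$inputs.open_ai_key",  # keep the key out of the spec itself
            "model_version": "gpt-4o",
        }
    ],
    "outputs": [
        # Expose the model's textual answer as the workflow result.
        {"type": "JsonField", "name": "result", "selector": "$steps.gpt.output"}
    ],
}
```

Passing the API key as a workflow parameter (rather than hard-coding it in the specification) keeps the secret out of any stored or shared workflow definition.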
v2¶
Class: OpenAIBlockV2
(there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v2.OpenAIBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT-4 with Vision model.
You can specify arbitrary text prompts or predefined ones; the block supports the following prompt types:

- Open Prompt (`unconstrained`) - use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - the model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - the model answers the question you submit in the prompt
- Captioning (short) (`caption`) - the model provides a short description of the image
- Captioning (`detailed-caption`) - the model provides a long description of the image
- Single-Label Classification (`classification`) - the model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - the model classifies the image content as one or more of the provided classes
- Structured Output Generation (`structured-answering`) - the model returns a JSON response with the specified fields
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
Type identifier¶
Use the following identifier in the step "type" field to add the block as a step in your workflow: `roboflow_core/open_ai@v2`
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
task_type | str | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
prompt | str | Text prompt to the OpenAI model. | ✅ |
output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
classes | List[str] | List of classes to be used. | ✅ |
api_key | str | Your OpenAI API key. | ✅ |
model_version | str | Model to be used. | ✅ |
image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
temperature | float | Temperature to sample from the model: a value in the range 0.0-2.0; the higher it is, the more random / "creative" the generations are. | ✅ |
max_concurrent_requests | int | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Please restrict this if you hit OpenAI limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v2.
- inputs:
Image Contours
,Stability AI Inpainting
,Corner Visualization
,CSV Formatter
,Google Gemini
,Line Counter Visualization
,Reference Path Visualization
,Keypoint Detection Model
,Model Monitoring Inference Aggregator
,Florence-2 Model
,Circle Visualization
,OCR Model
,Llama 3.2 Vision
,Relative Static Crop
,Dimension Collapse
,Roboflow Dataset Upload
,Dynamic Zone
,Image Convert Grayscale
,Pixelate Visualization
,Model Comparison Visualization
,Trace Visualization
,LMM
,Twilio SMS Notification
,Roboflow Dataset Upload
,Depth Estimation
,Label Visualization
,Classification Label Visualization
,Blur Visualization
,OpenAI
,Color Visualization
,Bounding Box Visualization
,Anthropic Claude
,Ellipse Visualization
,Instance Segmentation Model
,Polygon Zone Visualization
,Object Detection Model
,Roboflow Custom Metadata
,Gaze Detection
,Image Slicer
,Image Slicer
,Crop Visualization
,Perspective Correction
,Halo Visualization
,Dot Visualization
,Mask Visualization
,Keypoint Visualization
,Local File Sink
,Absolute Static Crop
,Stitch OCR Detections
,Clip Comparison
,Image Blur
,OpenAI
,Cosine Similarity
,VLM as Classifier
,Clip Comparison
,Identify Changes
,Triangle Visualization
,Background Color Visualization
,SIFT Comparison
,Florence-2 Model
,Camera Calibration
,Google Vision OCR
,Image Threshold
,Single-Label Classification Model
,Buffer
,Image Preprocessing
,OpenAI
,CogVLM
,Slack Notification
,VLM as Detector
,Stability AI Image Generation
,SIFT
,Grid Visualization
,Camera Focus
,Stitch Images
,Stability AI Outpainting
,Size Measurement
,Polygon Visualization
,Multi-Label Classification Model
,Webhook Sink
,Dynamic Crop
,Email Notification
,LMM For Classification
- outputs:
Line Counter
,Stability AI Inpainting
,Detections Consensus
,Corner Visualization
,Time in Zone
,Google Gemini
,Line Counter Visualization
,Reference Path Visualization
,Keypoint Detection Model
,Model Monitoring Inference Aggregator
,Florence-2 Model
,YOLO-World Model
,Circle Visualization
,Llama 3.2 Vision
,Roboflow Dataset Upload
,JSON Parser
,PTZ Tracking (ONVIF)
,Detections Classes Replacement
,Object Detection Model
,Model Comparison Visualization
,Trace Visualization
,Detections Stitch
,Path Deviation
,Twilio SMS Notification
,LMM
,Roboflow Dataset Upload
,Label Visualization
,Classification Label Visualization
,Perception Encoder Embedding Model
,OpenAI
,Color Visualization
,Bounding Box Visualization
,Anthropic Claude
,Instance Segmentation Model
,Ellipse Visualization
,Pixel Color Count
,Polygon Zone Visualization
,VLM as Classifier
,Object Detection Model
,Roboflow Custom Metadata
,VLM as Detector
,Cache Get
,Crop Visualization
,Halo Visualization
,Perspective Correction
,Dot Visualization
,Mask Visualization
,Cache Set
,Keypoint Visualization
,Local File Sink
,Line Counter
,Clip Comparison
,Image Blur
,OpenAI
,VLM as Classifier
,Path Deviation
,Clip Comparison
,Instance Segmentation Model
,Distance Measurement
,Triangle Visualization
,Segment Anything 2 Model
,Background Color Visualization
,CLIP Embedding Model
,SIFT Comparison
,Google Vision OCR
,Florence-2 Model
,Image Threshold
,Buffer
,Image Preprocessing
,OpenAI
,CogVLM
,Slack Notification
,VLM as Detector
,Keypoint Detection Model
,Stability AI Image Generation
,Grid Visualization
,Stability AI Outpainting
,Size Measurement
,Polygon Visualization
,Webhook Sink
,Dynamic Crop
,Time in Zone
,Email Notification
,LMM For Classification
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v2 has.
Bindings
- input
  - images (`image`): The image to infer on.
  - prompt (`string`): Text prompt to the OpenAI model.
  - classes (`list_of_values`): List of classes to be used.
  - api_key (`Union[secret, string]`): Your OpenAI API key.
  - model_version (`string`): Model to be used.
  - image_detail (`string`): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - temperature (`float`): Temperature to sample from the model: a value in the range 0.0-2.0; the higher it is, the more random / "creative" the generations are.
- output
  - output (`Union[string, language_model_output]`): String value if `string`, or LLM / VLM output if `language_model_output`.
  - classes (`list_of_values`): List of values of any type.
Example JSON definition of step OpenAI in version v2:

```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v2",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-4o",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
v1¶
Class: OpenAIBlockV1
(there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v1.OpenAIBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT-4 with Vision model.
You can specify arbitrary text prompts to the OpenAIBlock.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
This model was previously part of the LMM block.
Type identifier¶
Use the following identifier in the step "type" field to add the block as a step in your workflow: `roboflow_core/open_ai@v1`
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | Enter a unique identifier for this step. | ❌ |
prompt | str | Text prompt to the OpenAI model. | ✅ |
openai_api_key | str | Your OpenAI API key. | ✅ |
openai_model | str | Model to be used. | ✅ |
json_output_format | Dict[str, str] | Holds a dictionary that maps the name of each requested output field to its description. | ❌ |
image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v1.
- inputs:
Image Contours
,Stability AI Inpainting
,Corner Visualization
,CSV Formatter
,Google Gemini
,Line Counter Visualization
,Reference Path Visualization
,Keypoint Detection Model
,Model Monitoring Inference Aggregator
,Florence-2 Model
,Circle Visualization
,OCR Model
,Llama 3.2 Vision
,Relative Static Crop
,Roboflow Dataset Upload
,Image Convert Grayscale
,Pixelate Visualization
,Model Comparison Visualization
,Trace Visualization
,LMM
,Twilio SMS Notification
,Roboflow Dataset Upload
,Depth Estimation
,Label Visualization
,Classification Label Visualization
,Blur Visualization
,OpenAI
,Color Visualization
,Bounding Box Visualization
,Anthropic Claude
,Ellipse Visualization
,Instance Segmentation Model
,Polygon Zone Visualization
,Object Detection Model
,Roboflow Custom Metadata
,Image Slicer
,Image Slicer
,Crop Visualization
,Perspective Correction
,Halo Visualization
,Dot Visualization
,Mask Visualization
,Keypoint Visualization
,Local File Sink
,Absolute Static Crop
,Stitch OCR Detections
,Image Blur
,OpenAI
,VLM as Classifier
,Clip Comparison
,Triangle Visualization
,Background Color Visualization
,SIFT Comparison
,Florence-2 Model
,Camera Calibration
,Google Vision OCR
,Image Threshold
,Single-Label Classification Model
,Image Preprocessing
,OpenAI
,CogVLM
,Slack Notification
,VLM as Detector
,Stability AI Image Generation
,SIFT
,Grid Visualization
,Camera Focus
,Stitch Images
,Stability AI Outpainting
,Polygon Visualization
,Multi-Label Classification Model
,Webhook Sink
,Dynamic Crop
,Email Notification
,LMM For Classification
- outputs:
Image Contours
,Line Counter
,Stability AI Inpainting
,Corner Visualization
,CSV Formatter
,Google Gemini
,Model Monitoring Inference Aggregator
,Florence-2 Model
,Expression
,YOLO-World Model
,SIFT Comparison
,Property Definition
,JSON Parser
,PTZ Tracking (ONVIF)
,Image Convert Grayscale
,Byte Tracker
,Pixelate Visualization
,Dominant Color
,Model Comparison Visualization
,Trace Visualization
,Detections Stitch
,Path Deviation
,Twilio SMS Notification
,Velocity
,Bounding Rectangle
,Label Visualization
,Classification Label Visualization
,Blur Visualization
,Moondream2
,Bounding Box Visualization
,Template Matching
,Anthropic Claude
,Pixel Color Count
,Detection Offset
,Polygon Zone Visualization
,VLM as Classifier
,Gaze Detection
,Image Slicer
,Image Slicer
,Cache Get
,Crop Visualization
,Perspective Correction
,Dot Visualization
,QR Code Detection
,Mask Visualization
,Local File Sink
,Absolute Static Crop
,Image Blur
,OpenAI
,Path Deviation
,Detections Merge
,Distance Measurement
,Triangle Visualization
,Background Color Visualization
,CLIP Embedding Model
,SIFT Comparison
,Florence-2 Model
,Camera Calibration
,Image Threshold
,Single-Label Classification Model
,Buffer
,OpenAI
,Rate Limiter
,CogVLM
,Keypoint Detection Model
,SIFT
,Grid Visualization
,Stitch Images
,Identify Outliers
,Stability AI Outpainting
,Size Measurement
,Polygon Visualization
,Barcode Detection
,Byte Tracker
,Delta Filter
,LMM For Classification
,Detections Consensus
,Time in Zone
,Line Counter Visualization
,Reference Path Visualization
,Keypoint Detection Model
,Continue If
,Circle Visualization
,Llama 3.2 Vision
,OCR Model
,Relative Static Crop
,Dimension Collapse
,Roboflow Dataset Upload
,Dynamic Zone
,Detections Classes Replacement
,Multi-Label Classification Model
,Object Detection Model
,LMM
,Roboflow Dataset Upload
,Depth Estimation
,Perception Encoder Embedding Model
,OpenAI
,Color Visualization
,Instance Segmentation Model
,Ellipse Visualization
,Detections Transformation
,Object Detection Model
,Roboflow Custom Metadata
,VLM as Detector
,Halo Visualization
,Cache Set
,Keypoint Visualization
,Detections Stabilizer
,SmolVLM2
,Line Counter
,Stitch OCR Detections
,Clip Comparison
,Cosine Similarity
,VLM as Classifier
,Qwen2.5-VL
,Clip Comparison
,Byte Tracker
,Identify Changes
,Instance Segmentation Model
,Data Aggregator
,Overlap Filter
,Segment Anything 2 Model
,Google Vision OCR
,Image Preprocessing
,Slack Notification
,VLM as Detector
,Stability AI Image Generation
,First Non Empty Or Default
,Camera Focus
,Multi-Label Classification Model
,Webhook Sink
,Dynamic Crop
,Time in Zone
,Email Notification
,Single-Label Classification Model
,Detections Filter
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v1 has.
Bindings
- input
  - images (`image`): The image to infer on.
  - prompt (`string`): Text prompt to the OpenAI model.
  - openai_api_key (`Union[secret, string]`): Your OpenAI API key.
  - openai_model (`string`): Model to be used.
  - image_detail (`string`): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
- output
  - parent_id (`parent_id`): Identifier of parent for step output.
  - root_parent_id (`parent_id`): Identifier of parent for step output.
  - image (`image_metadata`): Dictionary with image metadata required by supervision.
  - structured_output (`dictionary`): Dictionary.
  - raw_output (`string`): String value.
  - * (`*`): Equivalent of any element.
Example JSON definition of step OpenAI in version v1:

```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v1",
  "images": "$inputs.image",
  "prompt": "my prompt",
  "openai_api_key": "xxx-xxx",
  "openai_model": "gpt-4o",
  "json_output_format": {
    "count": "number of cats in the picture"
  },
  "image_detail": "auto",
  "max_tokens": 450
}
```
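Since `raw_output` is a plain string, a common downstream step is parsing it into the structured fields requested via `json_output_format`. A minimal sketch follows, assuming the model may wrap its JSON answer in markdown code fences; this is illustrative post-processing, not the block's internal parsing logic:

```python
import json


def parse_llm_json(raw_output: str) -> dict:
    """Strip optional markdown code fences and parse the remaining JSON object."""
    text = raw_output.strip()
    if text.startswith("```"):
        # Drop the opening fence line (with its optional language tag)
        # and everything from the closing fence onward.
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    return json.loads(text)
```

This handles both a bare JSON answer and one wrapped in a fenced block; anything else (e.g. JSON mixed into prose) would need more tolerant extraction.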