OpenAI¶
v4¶
Class: OpenAIBlockV4 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v4.OpenAIBlockV4
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-5 and GPT-4o).
You can specify arbitrary text prompts or predefined ones; the block supports the following prompt types:

- Open Prompt (`unconstrained`) - Use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - Model answers the question you submit in the prompt
- Captioning (short) (`caption`) - Model provides a short description of the image
- Captioning (`detailed-caption`) - Model provides a long description of the image
- Single-Label Classification (`classification`) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - Model classifies the image content as one or more of the provided classes
- Unprompted Object Detection (`object-detection`) - Model detects and returns the bounding boxes for prominent objects in the image
- Structured Output Generation (`structured-answering`) - Model returns a JSON response with the specified fields
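Each task type implies a different set of required parameters. The sketch below is an illustrative validation helper, not the block's actual source; the mapping of task types to extra parameters is an assumption inferred from the descriptions above:

```python
# Illustrative (assumed) mapping of task_type -> extra parameters that
# prompt type needs, inferred from the list above.
TASK_TYPE_REQUIREMENTS = {
    "unconstrained": {"prompt"},
    "ocr": set(),
    "visual-question-answering": {"prompt"},
    "caption": set(),
    "detailed-caption": set(),
    "classification": {"classes"},
    "multi-label-classification": {"classes"},
    "object-detection": set(),
    "structured-answering": {"output_structure"},
}

def missing_parameters(step: dict) -> set:
    """Return the parameters a step definition still needs for its task_type."""
    required = TASK_TYPE_REQUIREMENTS.get(step.get("task_type"), set())
    return {name for name in required if not step.get(name)}
```

For example, `missing_parameters({"task_type": "classification"})` reports that `classes` is still missing, while an `ocr` step needs nothing beyond the image.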
Provide your OpenAI API key, or set the value to `rf_key:account` (or `rf_key:user:<id>`) to proxy requests through Roboflow's API.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/open_ai@v4` to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Task type to be performed by the model; the value determines the required parameters and the output response. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
| classes | List[str] | List of classes to be used. | ✅ |
| api_key | str | Your OpenAI API key. | ✅ |
| model_version | str | Model to be used. | ✅ |
| reasoning_effort | str | Controls reasoning. Reducing it can result in faster responses and fewer tokens. GPT-5.1 and higher models default to 'none' (no reasoning) and support 'none', 'low', 'medium', 'high'; GPT-5.2 also supports 'xhigh'. GPT-5 models default to 'medium' and support 'minimal', 'low', 'medium', 'high'. | ✅ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. If not specified, the model uses its default limit. Minimum value is 16. | ❌ |
| temperature | float | Temperature to sample from the model - value in range 0.0-2.0; the higher the value, the more random / "creative" the generations are. | ✅ |
| max_concurrent_requests | int | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI rate limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v4.
- inputs:
Triangle Visualization,Morphological Transformation,Roboflow Dataset Upload,Ellipse Visualization,LMM,Florence-2 Model,Blur Visualization,Halo Visualization,Anthropic Claude,Google Gemini,Camera Focus,Llama 3.2 Vision,Motion Detection,Model Comparison Visualization,Keypoint Visualization,Pixelate Visualization,Size Measurement,Image Slicer,Stitch OCR Detections,Roboflow Dataset Upload,Line Counter Visualization,Label Visualization,SIFT Comparison,QR Code Generator,Dynamic Zone,Email Notification,Clip Comparison,Buffer,Slack Notification,Corner Visualization,Image Slicer,Florence-2 Model,CSV Formatter,EasyOCR,Object Detection Model,Anthropic Claude,OpenAI,Google Gemini,Bounding Box Visualization,Keypoint Detection Model,Anthropic Claude,Background Subtraction,Background Color Visualization,Image Convert Grayscale,Camera Calibration,Polygon Visualization,Image Blur,VLM As Classifier,Relative Static Crop,Clip Comparison,Heatmap Visualization,CogVLM,Mask Visualization,Image Preprocessing,Twilio SMS Notification,VLM As Detector,OpenAI,OCR Model,SIFT,Stitch Images,Stability AI Outpainting,Stitch OCR Detections,Dynamic Crop,Model Monitoring Inference Aggregator,Circle Visualization,Color Visualization,Trace Visualization,OpenAI,Dimension Collapse,Icon Visualization,Dot Visualization,Cosine Similarity,Email Notification,Instance Segmentation Model,Camera Focus,Twilio SMS/MMS Notification,Depth Estimation,Contrast Equalization,LMM For Classification,Roboflow Custom Metadata,Grid Visualization,Text Display,Reference Path Visualization,Image Threshold,Perspective Correction,Image Contours,Polygon Zone Visualization,Multi-Label Classification Model,Polygon Visualization,Local File Sink,Identify Changes,Halo Visualization,Google Vision OCR,Stability AI Inpainting,Crop Visualization,Google Gemini,Detections List Roll-Up,Webhook Sink,Absolute Static Crop,Classification Label Visualization,OpenAI,Single-Label Classification Model,Gaze Detection,Stability AI Image Generation - 
outputs:
Triangle Visualization,Detections Stitch,Detections Classes Replacement,Ellipse Visualization,Florence-2 Model,Anthropic Claude,Google Gemini,Motion Detection,Keypoint Visualization,Size Measurement,Line Counter,Keypoint Detection Model,Distance Measurement,Clip Comparison,SAM 3,Object Detection Model,Anthropic Claude,Google Gemini,Perception Encoder Embedding Model,Pixel Color Count,Background Color Visualization,Time in Zone,Image Preprocessing,VLM As Detector,PTZ Tracking (ONVIF).md),Stability AI Outpainting,Moondream2,Stitch OCR Detections,Trace Visualization,OpenAI,YOLO-World Model,Icon Visualization,Dot Visualization,Cache Set,Time in Zone,Email Notification,Instance Segmentation Model,Path Deviation,Contrast Equalization,Segment Anything 2 Model,Text Display,Reference Path Visualization,Image Threshold,Perspective Correction,Local File Sink,Google Vision OCR,Crop Visualization,Google Gemini,Detections List Roll-Up,Webhook Sink,Classification Label Visualization,VLM As Classifier,OpenAI,Seg Preview,Stability AI Image Generation,Roboflow Dataset Upload,Morphological Transformation,LMM,Cache Get,CLIP Embedding Model,Halo Visualization,Llama 3.2 Vision,Model Comparison Visualization,VLM As Detector,Stitch OCR Detections,Roboflow Dataset Upload,Line Counter Visualization,Label Visualization,QR Code Generator,SIFT Comparison,Email Notification,Buffer,Slack Notification,Object Detection Model,Path Deviation,Florence-2 Model,Corner Visualization,SAM 3,OpenAI,Bounding Box Visualization,Keypoint Detection Model,Anthropic Claude,Polygon Visualization,Image Blur,VLM As Classifier,Clip Comparison,Heatmap Visualization,CogVLM,Mask Visualization,Twilio SMS Notification,Instance Segmentation Model,OpenAI,Dynamic Crop,Model Monitoring Inference Aggregator,Circle Visualization,Color Visualization,Twilio SMS/MMS Notification,Depth Estimation,Roboflow Custom Metadata,LMM For Classification,Grid Visualization,Polygon Zone Visualization,Polygon Visualization,Halo 
Visualization,JSON Parser,SAM 3,Stability AI Inpainting,Time in Zone,Detections Consensus,Line Counter
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds OpenAI in version v4 has.
Bindings
- input
  - `images` (image): The image to infer on.
  - `prompt` (string): Text prompt to the OpenAI model.
  - `classes` (list_of_values): List of classes to be used.
  - `api_key` (Union[secret, string, ROBOFLOW_MANAGED_KEY]): Your OpenAI API key.
  - `model_version` (string): Model to be used.
  - `reasoning_effort` (string): Controls reasoning. Reducing it can result in faster responses and fewer tokens. GPT-5.1 and higher models default to 'none' (no reasoning) and support 'none', 'low', 'medium', 'high'; GPT-5.2 also supports 'xhigh'. GPT-5 models default to 'medium' and support 'minimal', 'low', 'medium', 'high'.
  - `image_detail` (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - `temperature` (float): Temperature to sample from the model - value in range 0.0-2.0; the higher the value, the more random / "creative" the generations are.
- output
  - `output` (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
  - `classes` (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v4
{
"name": "<your_step_name_here>",
"type": "roboflow_core/open_ai@v4",
"images": "$inputs.image",
"task_type": "<block_does_not_provide_example>",
"prompt": "my prompt",
"output_structure": {
"my_key": "description"
},
"classes": [
"class-a",
"class-b"
],
"api_key": "xxx-xxx",
"model_version": "gpt-5.1",
"reasoning_effort": "<block_does_not_provide_example>",
"image_detail": "auto",
"max_tokens": "<block_does_not_provide_example>",
"temperature": "<block_does_not_provide_example>",
"max_concurrent_requests": "<block_does_not_provide_example>"
}
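To see the step in context, here is a hedged sketch of a complete minimal workflow specification embedding the v4 step. The surrounding `version`/`inputs`/`steps`/`outputs` layout and the `WorkflowImage`/`WorkflowParameter`/`JsonField` type names follow the usual Workflows specification shape and may need adjusting for your deployment:

```python
import json

# Hedged sketch: a minimal workflow specification embedding the v4 step.
# The step fields come from the example above; the surrounding structure
# is the usual Workflows specification shape, assumed here.
specification = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "open_ai_key"},
    ],
    "steps": [
        {
            "name": "gpt",
            "type": "roboflow_core/open_ai@v4",
            "images": "$inputs.image",
            "task_type": "classification",
            "classes": ["cat", "dog"],
            "api_key": "$inputs.open_ai_key",
            "model_version": "gpt-5.1",
        }
    ],
    "outputs": [
        {"type": "JsonField", "name": "result", "selector": "$steps.gpt.output"}
    ],
}

serialized = json.dumps(specification, indent=2)
```

Passing the API key as a `WorkflowParameter` (rather than hard-coding it in the step) keeps the secret out of the stored specification.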
v3¶
Class: OpenAIBlockV3 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v3.OpenAIBlockV3
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-5 and GPT-4o).
You can specify arbitrary text prompts or predefined ones; the block supports the following prompt types:

- Open Prompt (`unconstrained`) - Use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - Model answers the question you submit in the prompt
- Captioning (short) (`caption`) - Model provides a short description of the image
- Captioning (`detailed-caption`) - Model provides a long description of the image
- Single-Label Classification (`classification`) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - Model classifies the image content as one or more of the provided classes
- Structured Output Generation (`structured-answering`) - Model returns a JSON response with the specified fields
Provide your OpenAI API key, or set the value to `rf_key:account` (or `rf_key:user:<id>`) to proxy requests through Roboflow's API.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/open_ai@v3` to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Task type to be performed by the model; the value determines the required parameters and the output response. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
| classes | List[str] | List of classes to be used. | ✅ |
| api_key | str | Your OpenAI API key. | ✅ |
| model_version | str | Model to be used. | ✅ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
| temperature | float | Temperature to sample from the model - value in range 0.0-2.0; the higher the value, the more random / "creative" the generations are. | ✅ |
| max_concurrent_requests | int | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI rate limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v3.
- inputs:
Triangle Visualization,Morphological Transformation,Roboflow Dataset Upload,Ellipse Visualization,LMM,Florence-2 Model,Blur Visualization,Halo Visualization,Anthropic Claude,Google Gemini,Camera Focus,Llama 3.2 Vision,Motion Detection,Model Comparison Visualization,Keypoint Visualization,Pixelate Visualization,Size Measurement,Image Slicer,Stitch OCR Detections,Roboflow Dataset Upload,Line Counter Visualization,Label Visualization,SIFT Comparison,QR Code Generator,Dynamic Zone,Email Notification,Clip Comparison,Buffer,Slack Notification,Corner Visualization,Image Slicer,Florence-2 Model,CSV Formatter,EasyOCR,Object Detection Model,Anthropic Claude,OpenAI,Google Gemini,Bounding Box Visualization,Keypoint Detection Model,Anthropic Claude,Background Subtraction,Background Color Visualization,Image Convert Grayscale,Camera Calibration,Polygon Visualization,Image Blur,VLM As Classifier,Relative Static Crop,Clip Comparison,Heatmap Visualization,CogVLM,Mask Visualization,Image Preprocessing,Twilio SMS Notification,VLM As Detector,OpenAI,OCR Model,SIFT,Stitch Images,Stability AI Outpainting,Stitch OCR Detections,Dynamic Crop,Model Monitoring Inference Aggregator,Circle Visualization,Color Visualization,Trace Visualization,OpenAI,Dimension Collapse,Icon Visualization,Dot Visualization,Cosine Similarity,Email Notification,Instance Segmentation Model,Camera Focus,Twilio SMS/MMS Notification,Depth Estimation,Contrast Equalization,LMM For Classification,Roboflow Custom Metadata,Grid Visualization,Text Display,Reference Path Visualization,Image Threshold,Perspective Correction,Image Contours,Polygon Zone Visualization,Multi-Label Classification Model,Polygon Visualization,Local File Sink,Identify Changes,Halo Visualization,Google Vision OCR,Stability AI Inpainting,Crop Visualization,Google Gemini,Detections List Roll-Up,Webhook Sink,Absolute Static Crop,Classification Label Visualization,OpenAI,Single-Label Classification Model,Gaze Detection,Stability AI Image Generation - 
outputs:
Triangle Visualization,Detections Stitch,Detections Classes Replacement,Ellipse Visualization,Florence-2 Model,Anthropic Claude,Google Gemini,Motion Detection,Keypoint Visualization,Size Measurement,Line Counter,Keypoint Detection Model,Distance Measurement,Clip Comparison,SAM 3,Object Detection Model,Anthropic Claude,Google Gemini,Perception Encoder Embedding Model,Pixel Color Count,Background Color Visualization,Time in Zone,Image Preprocessing,VLM As Detector,PTZ Tracking (ONVIF).md),Stability AI Outpainting,Moondream2,Stitch OCR Detections,Trace Visualization,OpenAI,YOLO-World Model,Icon Visualization,Dot Visualization,Cache Set,Time in Zone,Email Notification,Instance Segmentation Model,Path Deviation,Contrast Equalization,Segment Anything 2 Model,Text Display,Reference Path Visualization,Image Threshold,Perspective Correction,Local File Sink,Google Vision OCR,Crop Visualization,Google Gemini,Detections List Roll-Up,Webhook Sink,Classification Label Visualization,VLM As Classifier,OpenAI,Seg Preview,Stability AI Image Generation,Roboflow Dataset Upload,Morphological Transformation,LMM,Cache Get,CLIP Embedding Model,Halo Visualization,Llama 3.2 Vision,Model Comparison Visualization,VLM As Detector,Stitch OCR Detections,Roboflow Dataset Upload,Line Counter Visualization,Label Visualization,QR Code Generator,SIFT Comparison,Email Notification,Buffer,Slack Notification,Object Detection Model,Path Deviation,Florence-2 Model,Corner Visualization,SAM 3,OpenAI,Bounding Box Visualization,Keypoint Detection Model,Anthropic Claude,Polygon Visualization,Image Blur,VLM As Classifier,Clip Comparison,Heatmap Visualization,CogVLM,Mask Visualization,Twilio SMS Notification,Instance Segmentation Model,OpenAI,Dynamic Crop,Model Monitoring Inference Aggregator,Circle Visualization,Color Visualization,Twilio SMS/MMS Notification,Depth Estimation,Roboflow Custom Metadata,LMM For Classification,Grid Visualization,Polygon Zone Visualization,Polygon Visualization,Halo 
Visualization,JSON Parser,SAM 3,Stability AI Inpainting,Time in Zone,Detections Consensus,Line Counter
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds OpenAI in version v3 has.
Bindings
- input
  - `images` (image): The image to infer on.
  - `prompt` (string): Text prompt to the OpenAI model.
  - `classes` (list_of_values): List of classes to be used.
  - `api_key` (Union[secret, string, ROBOFLOW_MANAGED_KEY]): Your OpenAI API key.
  - `model_version` (string): Model to be used.
  - `image_detail` (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - `temperature` (float): Temperature to sample from the model - value in range 0.0-2.0; the higher the value, the more random / "creative" the generations are.
- output
  - `output` (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
  - `classes` (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v3
{
"name": "<your_step_name_here>",
"type": "roboflow_core/open_ai@v3",
"images": "$inputs.image",
"task_type": "<block_does_not_provide_example>",
"prompt": "my prompt",
"output_structure": {
"my_key": "description"
},
"classes": [
"class-a",
"class-b"
],
"api_key": "xxx-xxx",
"model_version": "gpt-5",
"image_detail": "auto",
"max_tokens": "<block_does_not_provide_example>",
"temperature": "<block_does_not_provide_example>",
"max_concurrent_requests": "<block_does_not_provide_example>"
}
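The `language_model_output` kind is a raw string that downstream blocks such as JSON Parser normally consume. For `structured-answering`, parsing it by hand might look like the sketch below; the code-fence stripping is an assumption about how models sometimes wrap JSON, not documented block behavior:

```python
import json
import re

def parse_structured_answer(raw: str) -> dict:
    """Illustrative parse of a structured-answering response.

    Models sometimes wrap JSON in a markdown code fence, so strip one if
    present before decoding. (In a workflow you would typically chain the
    JSON Parser block instead of doing this by hand.)
    """
    fenced = re.search(r"```(?:json)?\s*(.*?)\s*```", raw, re.DOTALL)
    payload = fenced.group(1) if fenced else raw
    return json.loads(payload)

answer = parse_structured_answer('```json\n{"my_key": "two cats"}\n```')
```

The keys of the resulting dictionary are the ones you declared in `output_structure`.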
v2¶
Class: OpenAIBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v2.OpenAIBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-4o and GPT-5).
You can specify arbitrary text prompts or predefined ones; the block supports the following prompt types:

- Open Prompt (`unconstrained`) - Use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - Model answers the question you submit in the prompt
- Captioning (short) (`caption`) - Model provides a short description of the image
- Captioning (`detailed-caption`) - Model provides a long description of the image
- Single-Label Classification (`classification`) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - Model classifies the image content as one or more of the provided classes
- Structured Output Generation (`structured-answering`) - Model returns a JSON response with the specified fields
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/open_ai@v2` to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Task type to be performed by the model; the value determines the required parameters and the output response. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
| classes | List[str] | List of classes to be used. | ✅ |
| api_key | str | Your OpenAI API key. | ✅ |
| model_version | str | Model to be used. | ✅ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
| temperature | float | Temperature to sample from the model - value in range 0.0-2.0; the higher the value, the more random / "creative" the generations are. | ✅ |
| max_concurrent_requests | int | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI rate limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v2.
- inputs:
Triangle Visualization,Morphological Transformation,Roboflow Dataset Upload,Ellipse Visualization,LMM,Florence-2 Model,Blur Visualization,Halo Visualization,Anthropic Claude,Google Gemini,Camera Focus,Llama 3.2 Vision,Motion Detection,Model Comparison Visualization,Keypoint Visualization,Pixelate Visualization,Size Measurement,Image Slicer,Stitch OCR Detections,Roboflow Dataset Upload,Line Counter Visualization,Label Visualization,SIFT Comparison,QR Code Generator,Dynamic Zone,Email Notification,Clip Comparison,Buffer,Slack Notification,Corner Visualization,Image Slicer,Florence-2 Model,CSV Formatter,EasyOCR,Object Detection Model,Anthropic Claude,OpenAI,Google Gemini,Bounding Box Visualization,Keypoint Detection Model,Anthropic Claude,Background Subtraction,Background Color Visualization,Image Convert Grayscale,Camera Calibration,Polygon Visualization,Image Blur,VLM As Classifier,Relative Static Crop,Clip Comparison,Heatmap Visualization,CogVLM,Mask Visualization,Image Preprocessing,Twilio SMS Notification,VLM As Detector,OpenAI,OCR Model,SIFT,Stitch Images,Stability AI Outpainting,Stitch OCR Detections,Dynamic Crop,Model Monitoring Inference Aggregator,Circle Visualization,Color Visualization,Trace Visualization,OpenAI,Dimension Collapse,Icon Visualization,Dot Visualization,Cosine Similarity,Email Notification,Instance Segmentation Model,Camera Focus,Twilio SMS/MMS Notification,Depth Estimation,Contrast Equalization,LMM For Classification,Roboflow Custom Metadata,Grid Visualization,Text Display,Reference Path Visualization,Image Threshold,Perspective Correction,Image Contours,Polygon Zone Visualization,Multi-Label Classification Model,Polygon Visualization,Local File Sink,Identify Changes,Halo Visualization,Google Vision OCR,Stability AI Inpainting,Crop Visualization,Google Gemini,Detections List Roll-Up,Webhook Sink,Absolute Static Crop,Classification Label Visualization,OpenAI,Single-Label Classification Model,Gaze Detection,Stability AI Image Generation - 
outputs:
Triangle Visualization,Detections Stitch,Detections Classes Replacement,Ellipse Visualization,Florence-2 Model,Anthropic Claude,Google Gemini,Motion Detection,Keypoint Visualization,Size Measurement,Line Counter,Keypoint Detection Model,Distance Measurement,Clip Comparison,SAM 3,Object Detection Model,Anthropic Claude,Google Gemini,Perception Encoder Embedding Model,Pixel Color Count,Background Color Visualization,Time in Zone,Image Preprocessing,VLM As Detector,PTZ Tracking (ONVIF).md),Stability AI Outpainting,Moondream2,Stitch OCR Detections,Trace Visualization,OpenAI,YOLO-World Model,Icon Visualization,Dot Visualization,Cache Set,Time in Zone,Email Notification,Instance Segmentation Model,Path Deviation,Contrast Equalization,Segment Anything 2 Model,Text Display,Reference Path Visualization,Image Threshold,Perspective Correction,Local File Sink,Google Vision OCR,Crop Visualization,Google Gemini,Detections List Roll-Up,Webhook Sink,Classification Label Visualization,VLM As Classifier,OpenAI,Seg Preview,Stability AI Image Generation,Roboflow Dataset Upload,Morphological Transformation,LMM,Cache Get,CLIP Embedding Model,Halo Visualization,Llama 3.2 Vision,Model Comparison Visualization,VLM As Detector,Stitch OCR Detections,Roboflow Dataset Upload,Line Counter Visualization,Label Visualization,QR Code Generator,SIFT Comparison,Email Notification,Buffer,Slack Notification,Object Detection Model,Path Deviation,Florence-2 Model,Corner Visualization,SAM 3,OpenAI,Bounding Box Visualization,Keypoint Detection Model,Anthropic Claude,Polygon Visualization,Image Blur,VLM As Classifier,Clip Comparison,Heatmap Visualization,CogVLM,Mask Visualization,Twilio SMS Notification,Instance Segmentation Model,OpenAI,Dynamic Crop,Model Monitoring Inference Aggregator,Circle Visualization,Color Visualization,Twilio SMS/MMS Notification,Depth Estimation,Roboflow Custom Metadata,LMM For Classification,Grid Visualization,Polygon Zone Visualization,Polygon Visualization,Halo 
Visualization,JSON Parser,SAM 3,Stability AI Inpainting,Time in Zone,Detections Consensus,Line Counter
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds OpenAI in version v2 has.
Bindings
- input
  - `images` (image): The image to infer on.
  - `prompt` (string): Text prompt to the OpenAI model.
  - `classes` (list_of_values): List of classes to be used.
  - `api_key` (Union[secret, string]): Your OpenAI API key.
  - `model_version` (string): Model to be used.
  - `image_detail` (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - `temperature` (float): Temperature to sample from the model - value in range 0.0-2.0; the higher the value, the more random / "creative" the generations are.
- output
  - `output` (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
  - `classes` (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v2
{
"name": "<your_step_name_here>",
"type": "roboflow_core/open_ai@v2",
"images": "$inputs.image",
"task_type": "<block_does_not_provide_example>",
"prompt": "my prompt",
"output_structure": {
"my_key": "description"
},
"classes": [
"class-a",
"class-b"
],
"api_key": "xxx-xxx",
"model_version": "gpt-4o",
"image_detail": "auto",
"max_tokens": "<block_does_not_provide_example>",
"temperature": "<block_does_not_provide_example>",
"max_concurrent_requests": "<block_does_not_provide_example>"
}
v1¶
Class: OpenAIBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v1.OpenAIBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT-4 with Vision model.
You can specify arbitrary text prompts to the OpenAIBlock.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
This model was previously part of the LMM block.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/open_ai@v1` to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| openai_api_key | str | Your OpenAI API key. | ✅ |
| openai_model | str | Model to be used. | ✅ |
| json_output_format | Dict[str, str] | Dictionary that maps the name of each requested output field to its description. | ❌ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v1.
- inputs:
Triangle Visualization,Morphological Transformation,Roboflow Dataset Upload,Ellipse Visualization,LMM,Florence-2 Model,Blur Visualization,Halo Visualization,Anthropic Claude,Google Gemini,Camera Focus,Llama 3.2 Vision,Model Comparison Visualization,Keypoint Visualization,Pixelate Visualization,Image Slicer,Stitch OCR Detections,Roboflow Dataset Upload,Line Counter Visualization,Label Visualization,SIFT Comparison,QR Code Generator,Email Notification,Slack Notification,Corner Visualization,Image Slicer,Florence-2 Model,CSV Formatter,EasyOCR,Object Detection Model,Anthropic Claude,OpenAI,Google Gemini,Bounding Box Visualization,Keypoint Detection Model,Anthropic Claude,Background Subtraction,Background Color Visualization,Image Convert Grayscale,Camera Calibration,Polygon Visualization,Image Blur,VLM As Classifier,Relative Static Crop,Clip Comparison,Heatmap Visualization,CogVLM,Mask Visualization,Image Preprocessing,Twilio SMS Notification,VLM As Detector,OpenAI,OCR Model,SIFT,Stitch Images,Stability AI Outpainting,Stitch OCR Detections,Dynamic Crop,Model Monitoring Inference Aggregator,Circle Visualization,Color Visualization,Trace Visualization,OpenAI,Icon Visualization,Dot Visualization,Email Notification,Instance Segmentation Model,Camera Focus,Twilio SMS/MMS Notification,Depth Estimation,Contrast Equalization,LMM For Classification,Roboflow Custom Metadata,Grid Visualization,Text Display,Reference Path Visualization,Image Threshold,Perspective Correction,Image Contours,Polygon Zone Visualization,Multi-Label Classification Model,Polygon Visualization,Local File Sink,Halo Visualization,Google Vision OCR,Stability AI Inpainting,Crop Visualization,Google Gemini,Webhook Sink,Absolute Static Crop,Classification Label Visualization,OpenAI,Single-Label Classification Model,Stability AI Image Generation - outputs:
Triangle Visualization,Detections Stitch,Detections Classes Replacement,Ellipse Visualization,Google Gemini,SIFT Comparison,Keypoint Detection Model,Dynamic Zone,Distance Measurement,CSV Formatter,Detections Combine,Perception Encoder Embedding Model,Image Convert Grayscale,Image Preprocessing,Byte Tracker,SIFT,Stability AI Outpainting,Moondream2,Single-Label Classification Model,Stitch OCR Detections,YOLO-World Model,Cache Set,Delta Filter,Email Notification,Instance Segmentation Model,Path Deviation,Camera Focus,Contrast Equalization,Detections Filter,Reference Path Visualization,Perspective Correction,Multi-Label Classification Model,Local File Sink,Identify Changes,Byte Tracker,Google Gemini,Barcode Detection,Webhook Sink,VLM As Classifier,OpenAI,Seg Preview,Roboflow Dataset Upload,Morphological Transformation,LMM,CLIP Embedding Model,Llama 3.2 Vision,Stitch OCR Detections,Line Counter Visualization,Label Visualization,QR Code Generator,SIFT Comparison,Email Notification,Buffer,Slack Notification,Detections Stabilizer,Object Detection Model,Path Deviation,SAM 3,Anthropic Claude,Background Subtraction,Image Blur,Relative Static Crop,Detections Merge,CogVLM,Mask Visualization,Twilio SMS Notification,Instance Segmentation Model,OpenAI,OCR Model,Dynamic Crop,Byte Tracker,First Non Empty Or Default,Dimension Collapse,Expression,Twilio SMS/MMS Notification,Depth Estimation,Dominant Color,SAM 3,Stability AI Inpainting,Identify Outliers,Detections Consensus,Template Matching,Absolute Static Crop,Line Counter,Rate Limiter,Florence-2 Model,Blur Visualization,Anthropic Claude,Motion Detection,Keypoint Visualization,Qwen3-VL,Pixelate Visualization,Size Measurement,Image Slicer,Line Counter,Clip Comparison,SAM 3,Image Slicer,EasyOCR,Object Detection Model,Anthropic Claude,Google Gemini,Pixel Color Count,Background Color Visualization,Property Definition,Camera Calibration,Time in Zone,VLM As Detector,PTZ Tracking (ONVIF).md),Detections Transformation,Data 
Aggregator,Detection Offset,Trace Visualization,OpenAI,Qwen2.5-VL,Icon Visualization,Dot Visualization,Time in Zone,Segment Anything 2 Model,Text Display,Image Threshold,Image Contours,Detection Event Log,Google Vision OCR,Crop Visualization,Detections List Roll-Up,Classification Label Visualization,Gaze Detection,Stability AI Image Generation,Cache Get,Halo Visualization,Camera Focus,Model Comparison Visualization,VLM As Detector,Roboflow Dataset Upload,SmolVLM2,Florence-2 Model,Corner Visualization,QR Code Detection,OpenAI,Bounding Box Visualization,Keypoint Detection Model,Polygon Visualization,VLM As Classifier,Clip Comparison,Heatmap Visualization,Stitch Images,Continue If,Model Monitoring Inference Aggregator,Circle Visualization,Color Visualization,Cosine Similarity,Velocity,Roboflow Custom Metadata,LMM For Classification,Bounding Rectangle,Grid Visualization,Polygon Zone Visualization,Polygon Visualization,Halo Visualization,JSON Parser,Time in Zone,Overlap Filter,Multi-Label Classification Model,Single-Label Classification Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds OpenAI in version v1 has.
Bindings
- input
  - `images` (image): The image to infer on.
  - `prompt` (string): Text prompt to the OpenAI model.
  - `openai_api_key` (Union[secret, string]): Your OpenAI API key.
  - `openai_model` (string): Model to be used.
  - `image_detail` (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
- output
  - `parent_id` (parent_id): Identifier of parent for step output.
  - `root_parent_id` (parent_id): Identifier of parent for step output.
  - `image` (image_metadata): Dictionary with image metadata required by supervision.
  - `structured_output` (dictionary): Dictionary.
  - `raw_output` (string): String value.
  - `*` (*): Equivalent of any element.
Example JSON definition of step OpenAI in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/open_ai@v1",
"images": "$inputs.image",
"prompt": "my prompt",
"openai_api_key": "xxx-xxx",
"openai_model": "gpt-4o",
"json_output_format": {
"count": "number of cats in the picture"
},
"image_detail": "auto",
"max_tokens": 450
}
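A hedged sketch of consuming the v1 outputs: `structured_output` presumably carries the parsed dictionary when `json_output_format` is provided, while `raw_output` always carries the text. `read_v1_result` is an illustrative helper, not part of the block:

```python
# Illustrative consumer of the v1 step outputs listed in the bindings above.
def read_v1_result(result: dict):
    """Prefer the structured dictionary, fall back to the raw string."""
    structured = result.get("structured_output")
    if structured:
        return structured
    return result.get("raw_output", "")

# A shape like the one the example above (json_output_format with "count")
# would presumably produce:
example = {"raw_output": '{"count": 2}', "structured_output": {"count": 2}}
value = read_v1_result(example)
```

When no `json_output_format` is configured, only `raw_output` is meaningful, so the helper returns the plain string instead.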