# OpenAI
## v4
Class: OpenAIBlockV4 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v4.OpenAIBlockV4
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-5 and GPT-4o).
You can specify arbitrary text prompts or use predefined ones; the block supports the following prompt types:
- Open Prompt (unconstrained) - Use any prompt to generate a raw response
- Text Recognition (OCR) (ocr) - Model recognizes text in the image
- Visual Question Answering (visual-question-answering) - Model answers the question you submit in the prompt
- Captioning (short) (caption) - Model provides a short description of the image
- Captioning (detailed-caption) - Model provides a long description of the image
- Single-Label Classification (classification) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (multi-label-classification) - Model classifies the image content as one or more of the provided classes
- Unprompted Object Detection (object-detection) - Model detects and returns the bounding boxes for prominent objects in the image
- Structured Output Generation (structured-answering) - Model returns a JSON response with the specified fields
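The pairing between task types and the extra properties they need can be sketched in Python. The mapping below is inferred from the task descriptions above, not taken from the block's source code, so treat it as illustrative:

```python
# Illustrative mapping of task_type -> extra properties it needs,
# inferred from the task descriptions (not from the block's source).
TASK_REQUIREMENTS = {
    "unconstrained": {"prompt"},
    "ocr": set(),
    "visual-question-answering": {"prompt"},
    "caption": set(),
    "detailed-caption": set(),
    "classification": {"classes"},
    "multi-label-classification": {"classes"},
    "object-detection": set(),
    "structured-answering": {"output_structure"},
}

def missing_fields(step: dict) -> set:
    """Return required fields absent from a step definition."""
    required = TASK_REQUIREMENTS.get(step.get("task_type"), set())
    return {field for field in required if field not in step}
```

For example, a step with task_type "classification" but no classes would be flagged as incomplete by this check.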
Provide your OpenAI API key or set the value to rf_key:account (or
rf_key:user:<id>) to proxy requests through Roboflow's API.
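The accepted forms of the api_key value described above can be sketched as a small classifier; the helper function is hypothetical and only encodes the rf_key conventions stated in the text:

```python
import re

def classify_api_key(value: str) -> str:
    """Classify an api_key value per the conventions described above
    (hypothetical helper, not part of the block's API)."""
    if value == "rf_key:account":
        return "roboflow-proxy-account"   # proxied via the workspace account
    if re.fullmatch(r"rf_key:user:.+", value):
        return "roboflow-proxy-user"      # proxied via a specific user id
    return "direct-openai-key"            # sent to OpenAI directly
```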
### Type identifier
Use the following identifier in the step "type" field: roboflow_core/open_ai@v4 to add the block as a step in your workflow.
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
| classes | List[str] | List of classes to be used. | ✅ |
| api_key | str | Your OpenAI API key. | ✅ |
| model_version | str | Model to be used. | ✅ |
| reasoning_effort | str | Controls reasoning. Reducing it can result in faster responses and fewer tokens. GPT-5.1 and higher models default to 'none' (no reasoning) and support 'none', 'low', 'medium', 'high'; GPT-5.2 also supports 'xhigh'. GPT-5 models default to 'medium' and support 'minimal', 'low', 'medium', 'high'. | ✅ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. If not specified, the model uses its default limit. Minimum value is 16. | ❌ |
| temperature | float | Temperature to sample from the model - a value in the range 0.0-2.0; the higher it is, the more random / "creative" the generations are. | ✅ |
| max_concurrent_requests | int | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI rate limits. | ❌ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
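For properties marked ✅, the value may be either a literal or a runtime selector such as $inputs.prompt. A minimal sketch of that convention; the is_dynamic helper is illustrative, not the engine's actual selector parser:

```python
def is_dynamic(value) -> bool:
    """Heuristic check for a workflow selector such as '$inputs.prompt'
    or '$steps.some_step.output' (illustrative, not the engine's parser)."""
    return isinstance(value, str) and value.startswith("$")

step = {
    "type": "roboflow_core/open_ai@v4",
    "name": "gpt",
    "images": "$inputs.image",     # dynamic: bound at runtime
    "prompt": "$inputs.prompt",    # dynamic: supplied per run
    "task_type": "unconstrained",  # static: fixed in the definition
}
dynamic = {key for key, value in step.items() if is_dynamic(value)}
```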
### Available Connections
Compatible Blocks
Check what blocks you can connect to OpenAI in version v4.
- inputs:
Anthropic Claude,Mask Visualization,Classification Label Visualization,Instance Segmentation Model,Webhook Sink,Multi-Label Classification Model,Email Notification,Dynamic Zone,QR Code Generator,Dynamic Crop,VLM As Detector,Google Gemini,LMM,Image Blur,Corner Visualization,Image Convert Grayscale,Stability AI Outpainting,Halo Visualization,Stability AI Inpainting,Object Detection Model,Image Contours,Trace Visualization,Google Vision OCR,Morphological Transformation,Triangle Visualization,Clip Comparison,Relative Static Crop,CSV Formatter,Text Display,Stitch Images,Google Gemini,Camera Calibration,Grid Visualization,Local File Sink,Slack Notification,VLM As Classifier,Roboflow Dataset Upload,Camera Focus,Color Visualization,Dot Visualization,Image Slicer,Polygon Visualization,Anthropic Claude,LMM For Classification,Line Counter Visualization,Llama 3.2 Vision,Keypoint Detection Model,Buffer,Contrast Equalization,Identify Changes,SIFT Comparison,Dimension Collapse,Camera Focus,Background Subtraction,Image Slicer,Circle Visualization,Halo Visualization,Florence-2 Model,Blur Visualization,Label Visualization,Twilio SMS/MMS Notification,Clip Comparison,Email Notification,Ellipse Visualization,OpenAI,SIFT,Image Preprocessing,Model Monitoring Inference Aggregator,Single-Label Classification Model,Detections List Roll-Up,OpenAI,Image Threshold,Background Color Visualization,Model Comparison Visualization,Depth Estimation,OpenAI,Motion Detection,Size Measurement,Cosine Similarity,CogVLM,Absolute Static Crop,Roboflow Custom Metadata,Gaze Detection,EasyOCR,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Pixelate Visualization,Stability AI Image Generation,Reference Path Visualization,Keypoint Visualization,Polygon Visualization,Twilio SMS Notification,Bounding Box Visualization,Polygon Zone Visualization,OCR Model,Icon Visualization,Crop Visualization,Stitch OCR Detections,Google Gemini,OpenAI,Florence-2 Model,Roboflow Dataset Upload - outputs:
Mask Visualization,Classification Label Visualization,Instance Segmentation Model,Detections Consensus,Webhook Sink,Email Notification,QR Code Generator,VLM As Detector,LMM,SAM 3,Corner Visualization,Stability AI Outpainting,Segment Anything 2 Model,Halo Visualization,JSON Parser,Object Detection Model,Trace Visualization,Google Vision OCR,Instance Segmentation Model,Clip Comparison,Text Display,Google Gemini,Slack Notification,Local File Sink,VLM As Classifier,Roboflow Dataset Upload,PTZ Tracking (ONVIF).md),Color Visualization,Dot Visualization,Polygon Visualization,Object Detection Model,Anthropic Claude,Buffer,Contrast Equalization,Detections Classes Replacement,Perception Encoder Embedding Model,Moondream2,Halo Visualization,Florence-2 Model,Twilio SMS/MMS Notification,Label Visualization,Ellipse Visualization,OpenAI,Model Monitoring Inference Aggregator,Detections List Roll-Up,OpenAI,Image Threshold,Model Comparison Visualization,Background Color Visualization,Size Measurement,OpenAI,Keypoint Detection Model,Twilio SMS Notification,Polygon Visualization,SAM 3,Bounding Box Visualization,Icon Visualization,Time in Zone,Google Gemini,Florence-2 Model,Roboflow Dataset Upload,Anthropic Claude,Dynamic Crop,CLIP Embedding Model,VLM As Detector,Google Gemini,Path Deviation,Image Blur,Line Counter,Cache Set,Stability AI Inpainting,Path Deviation,Morphological Transformation,Triangle Visualization,Detections Stitch,Grid Visualization,Llama 3.2 Vision,Line Counter Visualization,LMM For Classification,Keypoint Detection Model,Distance Measurement,SIFT Comparison,Time in Zone,Circle Visualization,Seg Preview,Clip Comparison,Email Notification,Image Preprocessing,SAM 3,Depth Estimation,Cache Get,Line Counter,Time in Zone,CogVLM,Roboflow Custom Metadata,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Stability AI Image Generation,Reference Path Visualization,Keypoint Visualization,VLM As Classifier,Polygon Zone Visualization,YOLO-World Model,Stitch OCR 
Detections,Crop Visualization,Pixel Color Count,Motion Detection,OpenAI
### Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v4 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Text prompt to the OpenAI model.
  - classes (list_of_values): List of classes to be used.
  - api_key (Union[secret, string, ROBOFLOW_MANAGED_KEY]): Your OpenAI API key.
  - model_version (string): Model to be used.
  - reasoning_effort (string): Controls reasoning. Reducing it can result in faster responses and fewer tokens. GPT-5.1 and higher models default to 'none' (no reasoning) and support 'none', 'low', 'medium', 'high'; GPT-5.2 also supports 'xhigh'. GPT-5 models default to 'medium' and support 'minimal', 'low', 'medium', 'high'.
  - image_detail (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - temperature (float): Temperature to sample from the model - a value in the range 0.0-2.0; the higher it is, the more random / "creative" the generations are.
- output
  - output (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
  - classes (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v4
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/open_ai@v4",
    "images": "$inputs.image",
    "task_type": "<block_does_not_provide_example>",
    "prompt": "my prompt",
    "output_structure": {
        "my_key": "description"
    },
    "classes": [
        "class-a",
        "class-b"
    ],
    "api_key": "xxx-xxx",
    "model_version": "gpt-5.1",
    "reasoning_effort": "<block_does_not_provide_example>",
    "image_detail": "auto",
    "max_tokens": "<block_does_not_provide_example>",
    "temperature": "<block_does_not_provide_example>",
    "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
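A step like the one above lives inside a full workflow specification. The sketch below shows one plausible wrapping for the structured-answering task; the surrounding spec fields (WorkflowImage, WorkflowParameter, JsonField) follow Roboflow's general Workflows schema, so double-check them against the Workflows documentation before relying on them:

```python
# Hedged sketch of a workflow specification embedding the OpenAI v4 step.
# Field names outside the step itself are assumptions about the Workflows
# schema, not guaranteed by this page.
workflow = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "api_key"},
    ],
    "steps": [
        {
            "type": "roboflow_core/open_ai@v4",
            "name": "gpt",
            "images": "$inputs.image",
            "task_type": "structured-answering",
            "output_structure": {"count": "number of people in the image"},
            "api_key": "$inputs.api_key",
            "model_version": "gpt-5.1",
        }
    ],
    "outputs": [
        {"type": "JsonField", "name": "result", "selector": "$steps.gpt.output"}
    ],
}

# Sanity check: every $inputs selector the step uses must be declared.
declared = {f"$inputs.{inp['name']}" for inp in workflow["inputs"]}
used = {v for v in workflow["steps"][0].values()
        if isinstance(v, str) and v.startswith("$inputs.")}
assert used <= declared
```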
## v3
Class: OpenAIBlockV3 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v3.OpenAIBlockV3
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-5 and GPT-4o).
You can specify arbitrary text prompts or use predefined ones; the block supports the following prompt types:
- Open Prompt (unconstrained) - Use any prompt to generate a raw response
- Text Recognition (OCR) (ocr) - Model recognizes text in the image
- Visual Question Answering (visual-question-answering) - Model answers the question you submit in the prompt
- Captioning (short) (caption) - Model provides a short description of the image
- Captioning (detailed-caption) - Model provides a long description of the image
- Single-Label Classification (classification) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (multi-label-classification) - Model classifies the image content as one or more of the provided classes
- Structured Output Generation (structured-answering) - Model returns a JSON response with the specified fields
Provide your OpenAI API key or set the value to rf_key:account (or
rf_key:user:<id>) to proxy requests through Roboflow's API.
### Type identifier
Use the following identifier in the step "type" field: roboflow_core/open_ai@v3 to add the block as a step in your workflow.
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
| classes | List[str] | List of classes to be used. | ✅ |
| api_key | str | Your OpenAI API key. | ✅ |
| model_version | str | Model to be used. | ✅ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
| temperature | float | Temperature to sample from the model - a value in the range 0.0-2.0; the higher it is, the more random / "creative" the generations are. | ✅ |
| max_concurrent_requests | int | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI rate limits. | ❌ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections
Compatible Blocks
Check what blocks you can connect to OpenAI in version v3.
- inputs:
Anthropic Claude,Mask Visualization,Classification Label Visualization,Instance Segmentation Model,Webhook Sink,Multi-Label Classification Model,Email Notification,Dynamic Zone,QR Code Generator,Dynamic Crop,VLM As Detector,Google Gemini,LMM,Image Blur,Corner Visualization,Image Convert Grayscale,Stability AI Outpainting,Halo Visualization,Stability AI Inpainting,Object Detection Model,Image Contours,Trace Visualization,Google Vision OCR,Morphological Transformation,Triangle Visualization,Clip Comparison,Relative Static Crop,CSV Formatter,Text Display,Stitch Images,Google Gemini,Camera Calibration,Grid Visualization,Local File Sink,Slack Notification,VLM As Classifier,Roboflow Dataset Upload,Camera Focus,Color Visualization,Dot Visualization,Image Slicer,Polygon Visualization,Anthropic Claude,LMM For Classification,Line Counter Visualization,Llama 3.2 Vision,Keypoint Detection Model,Buffer,Contrast Equalization,Identify Changes,SIFT Comparison,Dimension Collapse,Camera Focus,Background Subtraction,Image Slicer,Circle Visualization,Halo Visualization,Florence-2 Model,Blur Visualization,Label Visualization,Twilio SMS/MMS Notification,Clip Comparison,Email Notification,Ellipse Visualization,OpenAI,SIFT,Image Preprocessing,Model Monitoring Inference Aggregator,Single-Label Classification Model,Detections List Roll-Up,OpenAI,Image Threshold,Background Color Visualization,Model Comparison Visualization,Depth Estimation,OpenAI,Motion Detection,Size Measurement,Cosine Similarity,CogVLM,Absolute Static Crop,Roboflow Custom Metadata,Gaze Detection,EasyOCR,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Pixelate Visualization,Stability AI Image Generation,Reference Path Visualization,Keypoint Visualization,Polygon Visualization,Twilio SMS Notification,Bounding Box Visualization,Polygon Zone Visualization,OCR Model,Icon Visualization,Crop Visualization,Stitch OCR Detections,Google Gemini,OpenAI,Florence-2 Model,Roboflow Dataset Upload - outputs:
Mask Visualization,Classification Label Visualization,Instance Segmentation Model,Detections Consensus,Webhook Sink,Email Notification,QR Code Generator,VLM As Detector,LMM,SAM 3,Corner Visualization,Stability AI Outpainting,Segment Anything 2 Model,Halo Visualization,JSON Parser,Object Detection Model,Trace Visualization,Google Vision OCR,Instance Segmentation Model,Clip Comparison,Text Display,Google Gemini,Slack Notification,Local File Sink,VLM As Classifier,Roboflow Dataset Upload,PTZ Tracking (ONVIF).md),Color Visualization,Dot Visualization,Polygon Visualization,Object Detection Model,Anthropic Claude,Buffer,Contrast Equalization,Detections Classes Replacement,Perception Encoder Embedding Model,Moondream2,Halo Visualization,Florence-2 Model,Twilio SMS/MMS Notification,Label Visualization,Ellipse Visualization,OpenAI,Model Monitoring Inference Aggregator,Detections List Roll-Up,OpenAI,Image Threshold,Model Comparison Visualization,Background Color Visualization,Size Measurement,OpenAI,Keypoint Detection Model,Twilio SMS Notification,Polygon Visualization,SAM 3,Bounding Box Visualization,Icon Visualization,Time in Zone,Google Gemini,Florence-2 Model,Roboflow Dataset Upload,Anthropic Claude,Dynamic Crop,CLIP Embedding Model,VLM As Detector,Google Gemini,Path Deviation,Image Blur,Line Counter,Cache Set,Stability AI Inpainting,Path Deviation,Morphological Transformation,Triangle Visualization,Detections Stitch,Grid Visualization,Llama 3.2 Vision,Line Counter Visualization,LMM For Classification,Keypoint Detection Model,Distance Measurement,SIFT Comparison,Time in Zone,Circle Visualization,Seg Preview,Clip Comparison,Email Notification,Image Preprocessing,SAM 3,Depth Estimation,Cache Get,Line Counter,Time in Zone,CogVLM,Roboflow Custom Metadata,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Stability AI Image Generation,Reference Path Visualization,Keypoint Visualization,VLM As Classifier,Polygon Zone Visualization,YOLO-World Model,Stitch OCR 
Detections,Crop Visualization,Pixel Color Count,Motion Detection,OpenAI
### Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v3 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Text prompt to the OpenAI model.
  - classes (list_of_values): List of classes to be used.
  - api_key (Union[secret, string, ROBOFLOW_MANAGED_KEY]): Your OpenAI API key.
  - model_version (string): Model to be used.
  - image_detail (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - temperature (float): Temperature to sample from the model - a value in the range 0.0-2.0; the higher it is, the more random / "creative" the generations are.
- output
  - output (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
  - classes (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v3
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/open_ai@v3",
    "images": "$inputs.image",
    "task_type": "<block_does_not_provide_example>",
    "prompt": "my prompt",
    "output_structure": {
        "my_key": "description"
    },
    "classes": [
        "class-a",
        "class-b"
    ],
    "api_key": "xxx-xxx",
    "model_version": "gpt-5",
    "image_detail": "auto",
    "max_tokens": "<block_does_not_provide_example>",
    "temperature": "<block_does_not_provide_example>",
    "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
## v2
Class: OpenAIBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v2.OpenAIBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-4o and GPT-5).
You can specify arbitrary text prompts or use predefined ones; the block supports the following prompt types:
- Open Prompt (unconstrained) - Use any prompt to generate a raw response
- Text Recognition (OCR) (ocr) - Model recognizes text in the image
- Visual Question Answering (visual-question-answering) - Model answers the question you submit in the prompt
- Captioning (short) (caption) - Model provides a short description of the image
- Captioning (detailed-caption) - Model provides a long description of the image
- Single-Label Classification (classification) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (multi-label-classification) - Model classifies the image content as one or more of the provided classes
- Structured Output Generation (structured-answering) - Model returns a JSON response with the specified fields
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
### Type identifier
Use the following identifier in the step "type" field: roboflow_core/open_ai@v2 to add the block as a step in your workflow.
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
| classes | List[str] | List of classes to be used. | ✅ |
| api_key | str | Your OpenAI API key. | ✅ |
| model_version | str | Model to be used. | ✅ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
| temperature | float | Temperature to sample from the model - a value in the range 0.0-2.0; the higher it is, the more random / "creative" the generations are. | ✅ |
| max_concurrent_requests | int | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI rate limits. | ❌ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections
Compatible Blocks
Check what blocks you can connect to OpenAI in version v2.
- inputs:
Anthropic Claude,Mask Visualization,Classification Label Visualization,Instance Segmentation Model,Webhook Sink,Multi-Label Classification Model,Email Notification,Dynamic Zone,QR Code Generator,Dynamic Crop,VLM As Detector,Google Gemini,LMM,Image Blur,Corner Visualization,Image Convert Grayscale,Stability AI Outpainting,Halo Visualization,Stability AI Inpainting,Object Detection Model,Image Contours,Trace Visualization,Google Vision OCR,Morphological Transformation,Triangle Visualization,Clip Comparison,Relative Static Crop,CSV Formatter,Text Display,Stitch Images,Google Gemini,Camera Calibration,Grid Visualization,Local File Sink,Slack Notification,VLM As Classifier,Roboflow Dataset Upload,Camera Focus,Color Visualization,Dot Visualization,Image Slicer,Polygon Visualization,Anthropic Claude,LMM For Classification,Line Counter Visualization,Llama 3.2 Vision,Keypoint Detection Model,Buffer,Contrast Equalization,Identify Changes,SIFT Comparison,Dimension Collapse,Camera Focus,Background Subtraction,Image Slicer,Circle Visualization,Halo Visualization,Florence-2 Model,Blur Visualization,Label Visualization,Twilio SMS/MMS Notification,Clip Comparison,Email Notification,Ellipse Visualization,OpenAI,SIFT,Image Preprocessing,Model Monitoring Inference Aggregator,Single-Label Classification Model,Detections List Roll-Up,OpenAI,Image Threshold,Background Color Visualization,Model Comparison Visualization,Depth Estimation,OpenAI,Motion Detection,Size Measurement,Cosine Similarity,CogVLM,Absolute Static Crop,Roboflow Custom Metadata,Gaze Detection,EasyOCR,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Pixelate Visualization,Stability AI Image Generation,Reference Path Visualization,Keypoint Visualization,Polygon Visualization,Twilio SMS Notification,Bounding Box Visualization,Polygon Zone Visualization,OCR Model,Icon Visualization,Crop Visualization,Stitch OCR Detections,Google Gemini,OpenAI,Florence-2 Model,Roboflow Dataset Upload - outputs:
Mask Visualization,Classification Label Visualization,Instance Segmentation Model,Detections Consensus,Webhook Sink,Email Notification,QR Code Generator,VLM As Detector,LMM,SAM 3,Corner Visualization,Stability AI Outpainting,Segment Anything 2 Model,Halo Visualization,JSON Parser,Object Detection Model,Trace Visualization,Google Vision OCR,Instance Segmentation Model,Clip Comparison,Text Display,Google Gemini,Slack Notification,Local File Sink,VLM As Classifier,Roboflow Dataset Upload,PTZ Tracking (ONVIF).md),Color Visualization,Dot Visualization,Polygon Visualization,Object Detection Model,Anthropic Claude,Buffer,Contrast Equalization,Detections Classes Replacement,Perception Encoder Embedding Model,Moondream2,Halo Visualization,Florence-2 Model,Twilio SMS/MMS Notification,Label Visualization,Ellipse Visualization,OpenAI,Model Monitoring Inference Aggregator,Detections List Roll-Up,OpenAI,Image Threshold,Model Comparison Visualization,Background Color Visualization,Size Measurement,OpenAI,Keypoint Detection Model,Twilio SMS Notification,Polygon Visualization,SAM 3,Bounding Box Visualization,Icon Visualization,Time in Zone,Google Gemini,Florence-2 Model,Roboflow Dataset Upload,Anthropic Claude,Dynamic Crop,CLIP Embedding Model,VLM As Detector,Google Gemini,Path Deviation,Image Blur,Line Counter,Cache Set,Stability AI Inpainting,Path Deviation,Morphological Transformation,Triangle Visualization,Detections Stitch,Grid Visualization,Llama 3.2 Vision,Line Counter Visualization,LMM For Classification,Keypoint Detection Model,Distance Measurement,SIFT Comparison,Time in Zone,Circle Visualization,Seg Preview,Clip Comparison,Email Notification,Image Preprocessing,SAM 3,Depth Estimation,Cache Get,Line Counter,Time in Zone,CogVLM,Roboflow Custom Metadata,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Stability AI Image Generation,Reference Path Visualization,Keypoint Visualization,VLM As Classifier,Polygon Zone Visualization,YOLO-World Model,Stitch OCR 
Detections,Crop Visualization,Pixel Color Count,Motion Detection,OpenAI
### Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v2 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Text prompt to the OpenAI model.
  - classes (list_of_values): List of classes to be used.
  - api_key (Union[secret, string]): Your OpenAI API key.
  - model_version (string): Model to be used.
  - image_detail (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - temperature (float): Temperature to sample from the model - a value in the range 0.0-2.0; the higher it is, the more random / "creative" the generations are.
- output
  - output (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
  - classes (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v2
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/open_ai@v2",
    "images": "$inputs.image",
    "task_type": "<block_does_not_provide_example>",
    "prompt": "my prompt",
    "output_structure": {
        "my_key": "description"
    },
    "classes": [
        "class-a",
        "class-b"
    ],
    "api_key": "xxx-xxx",
    "model_version": "gpt-4o",
    "image_detail": "auto",
    "max_tokens": "<block_does_not_provide_example>",
    "temperature": "<block_does_not_provide_example>",
    "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
## v1
Class: OpenAIBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v1.OpenAIBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT-4 with Vision model.
You can specify arbitrary text prompts to the OpenAIBlock.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
This model was previously part of the LMM block.
### Type identifier
Use the following identifier in the step "type" field: roboflow_core/open_ai@v1 to add the block as a step in your workflow.
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| openai_api_key | str | Your OpenAI API key. | ✅ |
| openai_model | str | Model to be used. | ✅ |
| json_output_format | Dict[str, str] | Dictionary that maps the name of each requested output field to its description. | ❌ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections
Compatible Blocks
Check what blocks you can connect to OpenAI in version v1.
- inputs:
Anthropic Claude,Mask Visualization,Classification Label Visualization,Instance Segmentation Model,Webhook Sink,Multi-Label Classification Model,Email Notification,QR Code Generator,Dynamic Crop,VLM As Detector,Google Gemini,LMM,Image Blur,Corner Visualization,Image Convert Grayscale,Stability AI Outpainting,Halo Visualization,Stability AI Inpainting,Object Detection Model,Image Contours,Trace Visualization,Google Vision OCR,Morphological Transformation,Triangle Visualization,Relative Static Crop,CSV Formatter,Text Display,Stitch Images,Google Gemini,Camera Calibration,Grid Visualization,Local File Sink,Slack Notification,VLM As Classifier,Roboflow Dataset Upload,Camera Focus,Color Visualization,Dot Visualization,Image Slicer,Polygon Visualization,Anthropic Claude,LMM For Classification,Line Counter Visualization,Llama 3.2 Vision,Keypoint Detection Model,Contrast Equalization,SIFT Comparison,Camera Focus,Background Subtraction,Image Slicer,Circle Visualization,Halo Visualization,Florence-2 Model,Blur Visualization,Label Visualization,Twilio SMS/MMS Notification,Clip Comparison,Email Notification,Ellipse Visualization,OpenAI,SIFT,Image Preprocessing,Model Monitoring Inference Aggregator,Single-Label Classification Model,OpenAI,Image Threshold,Background Color Visualization,Model Comparison Visualization,Depth Estimation,OpenAI,CogVLM,Absolute Static Crop,Roboflow Custom Metadata,EasyOCR,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Pixelate Visualization,Stability AI Image Generation,Reference Path Visualization,Keypoint Visualization,Polygon Visualization,Twilio SMS Notification,Bounding Box Visualization,Polygon Zone Visualization,OCR Model,Icon Visualization,Crop Visualization,Stitch OCR Detections,Google Gemini,OpenAI,Florence-2 Model,Roboflow Dataset Upload - outputs:
Detections Consensus,Detections Merge,Webhook Sink,Multi-Label Classification Model,VLM As Detector,Multi-Label Classification Model,SAM 3,Corner Visualization,Image Convert Grayscale,Segment Anything 2 Model,Halo Visualization,Object Detection Model,Trace Visualization,Google Vision OCR,Instance Segmentation Model,CSV Formatter,Google Gemini,Slack Notification,VLM As Classifier,Color Visualization,Dot Visualization,Polygon Visualization,Buffer,Contrast Equalization,Identify Changes,Dimension Collapse,Velocity,Continue If,Moondream2,SIFT Comparison,Halo Visualization,Florence-2 Model,Blur Visualization,Twilio SMS/MMS Notification,Detections List Roll-Up,OpenAI,Model Comparison Visualization,Keypoint Detection Model,Gaze Detection,Twilio SMS Notification,SAM 3,OCR Model,Google Gemini,Florence-2 Model,Roboflow Dataset Upload,CLIP Embedding Model,Image Blur,Cache Set,SmolVLM2,Template Matching,Image Contours,Path Deviation,Morphological Transformation,Detections Stitch,Property Definition,Camera Calibration,Camera Focus,Detections Combine,Llama 3.2 Vision,Line Counter Visualization,LMM For Classification,SIFT Comparison,Camera Focus,Dominant Color,Time in Zone,Background Subtraction,Image Slicer,Identify Outliers,Qwen3-VL,Byte Tracker,SAM 3,Depth Estimation,Cosine Similarity,Line Counter,CogVLM,EasyOCR,Stitch OCR Detections,Qwen2.5-VL,Keypoint Visualization,Detection Event Log,Stitch OCR Detections,Crop Visualization,Pixel Color Count,OpenAI,Barcode Detection,Mask Visualization,Classification Label Visualization,Instance Segmentation Model,Email Notification,QR Code Generator,LMM,Detection Offset,Stability AI Outpainting,JSON Parser,Single-Label Classification Model,Clip Comparison,Text Display,Stitch Images,Local File Sink,Roboflow Dataset Upload,PTZ Tracking (ONVIF).md),Object Detection Model,Anthropic Claude,Byte Tracker,Detections Classes Replacement,Perception Encoder Embedding Model,First Non Empty Or Default,Expression,Label Visualization,Ellipse 
Visualization,OpenAI,SIFT,Model Monitoring Inference Aggregator,Single-Label Classification Model,Image Threshold,Background Color Visualization,Size Measurement,OpenAI,Polygon Visualization,Bounding Box Visualization,Overlap Filter,Icon Visualization,Time in Zone,Anthropic Claude,Dynamic Zone,Dynamic Crop,VLM As Detector,Google Gemini,Path Deviation,Line Counter,Byte Tracker,Stability AI Inpainting,Triangle Visualization,Bounding Rectangle,Relative Static Crop,Detections Filter,Grid Visualization,Detections Stabilizer,Delta Filter,Image Slicer,Keypoint Detection Model,Distance Measurement,Circle Visualization,Seg Preview,Clip Comparison,Email Notification,QR Code Detection,Image Preprocessing,Cache Get,Time in Zone,Absolute Static Crop,Roboflow Custom Metadata,Perspective Correction,Anthropic Claude,Pixelate Visualization,Data Aggregator,Stability AI Image Generation,Reference Path Visualization,VLM As Classifier,Polygon Zone Visualization,YOLO-World Model,Motion Detection,Rate Limiter,Detections Transformation
### Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Text prompt to the OpenAI model.
  - openai_api_key (Union[secret, string]): Your OpenAI API key.
  - openai_model (string): Model to be used.
  - image_detail (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
- output
  - parent_id (parent_id): Identifier of the parent for step output.
  - root_parent_id (parent_id): Identifier of the root parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary.
  - raw_output (string): String value.
  - * (*): Equivalent of any element.
Example JSON definition of step OpenAI in version v1
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/open_ai@v1",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "openai_api_key": "xxx-xxx",
    "openai_model": "gpt-4o",
    "json_output_format": {
        "count": "number of cats in the picture"
    },
    "image_detail": "auto",
    "max_tokens": 450
}
```
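When migrating a v1 step to a later version, the property names change (openai_api_key becomes api_key, openai_model becomes model_version, json_output_format becomes output_structure, per the tables above), and a task_type must be chosen since v1 has no equivalent. A hedged migration sketch; the rename table is inferred from this page, not from an official migration tool:

```python
# Renames inferred from comparing the v1 and v2+ property tables above.
V1_TO_V2_RENAMES = {
    "openai_api_key": "api_key",
    "openai_model": "model_version",
    "json_output_format": "output_structure",
}

def upgrade_v1_step(step: dict, task_type: str) -> dict:
    """Rewrite a v1 step definition into v2 form. The caller supplies
    task_type, because v1 steps carry no equivalent field."""
    upgraded = {V1_TO_V2_RENAMES.get(key, key): value
                for key, value in step.items()}
    upgraded["type"] = "roboflow_core/open_ai@v2"
    upgraded["task_type"] = task_type
    return upgraded
```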