OpenAI¶
v4¶
Class: OpenAIBlockV4 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v4.OpenAIBlockV4
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-5 and GPT-4o).
You can specify arbitrary text prompts or use predefined ones; the block supports the following prompt types:
- Open Prompt (`unconstrained`) - Use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - Model answers the question you submit in the prompt
- Captioning (short) (`caption`) - Model provides a short description of the image
- Captioning (`detailed-caption`) - Model provides a long description of the image
- Single-Label Classification (`classification`) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - Model classifies the image content as one or more of the provided classes
- Unprompted Object Detection (`object-detection`) - Model detects and returns the bounding boxes for prominent objects in the image
- Structured Output Generation (`structured-answering`) - Model returns a JSON response with the specified fields
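Since the chosen task type determines which other properties are required, it can be handy to sanity-check a step definition before running the workflow. The sketch below is purely illustrative: the mapping is inferred from the task descriptions above, not taken from the block's source, and the helper name is hypothetical.

```python
# Illustrative only: which extra step properties each task_type appears to
# require, inferred from the task descriptions in this document.
TASK_TYPE_REQUIREMENTS = {
    "unconstrained": {"prompt"},
    "ocr": set(),
    "visual-question-answering": {"prompt"},
    "caption": set(),
    "detailed-caption": set(),
    "classification": {"classes"},
    "multi-label-classification": {"classes"},
    "object-detection": set(),  # "unprompted" detection: no classes needed
    "structured-answering": {"output_structure"},
}


def missing_properties(step: dict) -> set:
    """Return required properties absent from a step definition (hypothetical helper)."""
    required = TASK_TYPE_REQUIREMENTS.get(step.get("task_type"), set())
    return {prop for prop in required if prop not in step}
```

For example, a `classification` step without a `classes` list would be flagged before the workflow is submitted.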
Provide your OpenAI API key, or set the value to `rf_key:account` (or `rf_key:user:<id>`) to proxy requests through Roboflow's API.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/open_ai@v4` to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. The value determines required parameters and the output response. | ❌ |
| `prompt` | `str` | Text prompt to the OpenAI model. | ✅ |
| `output_structure` | `Dict[str, str]` | Dictionary with the structure of the expected JSON response. | ❌ |
| `classes` | `List[str]` | List of classes to be used. | ✅ |
| `api_key` | `str` | Your OpenAI API key. | ✅ |
| `model_version` | `str` | Model to be used. | ✅ |
| `reasoning_effort` | `str` | Controls reasoning. Reducing it can result in faster responses and fewer tokens. GPT-5.1 and higher models default to 'none' (no reasoning) and support 'none', 'low', 'medium', and 'high'; GPT-5.2 also supports 'xhigh'. GPT-5 models default to 'medium' and support 'minimal', 'low', 'medium', and 'high'. | ✅ |
| `image_detail` | `str` | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| `max_tokens` | `int` | Maximum number of tokens the model can generate in its response. If not specified, the model uses its default limit. Minimum value is 16. | ❌ |
| `temperature` | `float` | Temperature to sample from the model. Value in range 0.0-2.0; the higher it is, the more random / "creative" the generations are. | ✅ |
| `max_concurrent_requests` | `int` | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Please restrict it if you hit OpenAI rate limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v4.
- inputs:
Florence-2 Model,Trace Visualization,Roboflow Dataset Upload,Classification Label Visualization,Stitch Images,Image Slicer,Ellipse Visualization,Clip Comparison,Crop Visualization,Grid Visualization,Morphological Transformation,Triangle Visualization,Reference Path Visualization,Roboflow Dataset Upload,Google Gemini,LMM,Stitch OCR Detections,Twilio SMS/MMS Notification,Dimension Collapse,Image Slicer,Local File Sink,VLM As Classifier,Icon Visualization,QR Code Generator,Stability AI Outpainting,OpenAI,Florence-2 Model,Google Vision OCR,Camera Focus,Pixelate Visualization,Model Comparison Visualization,Gaze Detection,Image Preprocessing,Cosine Similarity,Background Color Visualization,Clip Comparison,Color Visualization,Twilio SMS Notification,Polygon Zone Visualization,OpenAI,Halo Visualization,Background Subtraction,Keypoint Detection Model,Keypoint Visualization,Instance Segmentation Model,Contrast Equalization,EasyOCR,Image Blur,Polygon Visualization,Anthropic Claude,SIFT,Google Gemini,Webhook Sink,Perspective Correction,Object Detection Model,Circle Visualization,Blur Visualization,Dot Visualization,Camera Calibration,Heatmap Visualization,Image Threshold,Multi-Label Classification Model,Relative Static Crop,Google Gemini,Text Display,Email Notification,OpenAI,Single-Label Classification Model,Anthropic Claude,Depth Estimation,Mask Visualization,CSV Formatter,Stability AI Image Generation,Dynamic Zone,Buffer,Size Measurement,Halo Visualization,Absolute Static Crop,OCR Model,Label Visualization,Stability AI Inpainting,Motion Detection,Anthropic Claude,Corner Visualization,Image Convert Grayscale,Stitch OCR Detections,Roboflow Custom Metadata,SIFT Comparison,Polygon Visualization,CogVLM,Detections List Roll-Up,VLM As Detector,Line Counter Visualization,Bounding Box Visualization,Llama 3.2 Vision,Camera Focus,Email Notification,Slack Notification,Identify Changes,Dynamic Crop,Image Contours,Model Monitoring Inference Aggregator,LMM For Classification,OpenAI - 
outputs:
Florence-2 Model,Trace Visualization,Roboflow Dataset Upload,Classification Label Visualization,Line Counter,Clip Comparison,Ellipse Visualization,Triangle Visualization,Morphological Transformation,Path Deviation,LMM,Local File Sink,VLM As Classifier,Icon Visualization,QR Code Generator,Stability AI Outpainting,OpenAI,Moondream2,Keypoint Detection Model,Florence-2 Model,Object Detection Model,Background Color Visualization,Clip Comparison,Time in Zone,Keypoint Detection Model,Keypoint Visualization,Perception Encoder Embedding Model,Image Blur,Anthropic Claude,Polygon Visualization,Webhook Sink,Object Detection Model,Cache Get,YOLO-World Model,Heatmap Visualization,Image Threshold,Google Gemini,Text Display,OpenAI,Instance Segmentation Model,Anthropic Claude,Time in Zone,Path Deviation,Detections Consensus,Stability AI Inpainting,Roboflow Custom Metadata,Polygon Visualization,CogVLM,Bounding Box Visualization,CLIP Embedding Model,Llama 3.2 Vision,Email Notification,Dynamic Crop,Time in Zone,LMM For Classification,Buffer,Seg Preview,Segment Anything 2 Model,Line Counter,SAM 3,Distance Measurement,Crop Visualization,Roboflow Dataset Upload,Grid Visualization,Google Gemini,Twilio SMS/MMS Notification,Stitch OCR Detections,Reference Path Visualization,Detections Classes Replacement,Google Vision OCR,Pixel Color Count,Model Comparison Visualization,Image Preprocessing,Twilio SMS Notification,Color Visualization,Polygon Zone Visualization,OpenAI,Halo Visualization,Instance Segmentation Model,Contrast Equalization,Google Gemini,Perspective Correction,Circle Visualization,Dot Visualization,Email Notification,Depth Estimation,VLM As Detector,Mask Visualization,Stability AI Image Generation,Size Measurement,Halo Visualization,Detections Stitch,Label Visualization,Motion Detection,Anthropic Claude,Stitch OCR Detections,Corner Visualization,Cache Set,SIFT Comparison,Detections List Roll-Up,SAM 3,VLM As Detector,Line Counter Visualization,SAM 3,VLM As Classifier,JSON 
Parser,PTZ Tracking (ONVIF),Slack Notification,Model Monitoring Inference Aggregator,OpenAI
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
OpenAI in version v4 has.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `prompt` (`string`): Text prompt to the OpenAI model.
    - `classes` (`list_of_values`): List of classes to be used.
    - `api_key` (`Union[secret, string, ROBOFLOW_MANAGED_KEY]`): Your OpenAI API key.
    - `model_version` (`string`): Model to be used.
    - `reasoning_effort` (`string`): Controls reasoning. Reducing it can result in faster responses and fewer tokens. GPT-5.1 and higher models default to 'none' and support 'none', 'low', 'medium', and 'high'; GPT-5.2 also supports 'xhigh'. GPT-5 models default to 'medium' and support 'minimal', 'low', 'medium', and 'high'.
    - `image_detail` (`string`): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
    - `temperature` (`float`): Temperature to sample from the model. Value in range 0.0-2.0; the higher it is, the more random / "creative" the generations are.
- output
    - `output` (`Union[string, language_model_output]`): String value if `string`, or LLM / VLM output if `language_model_output`.
    - `classes` (`list_of_values`): List of values of any type.
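Because `output` is a plain string for most task types but JSON text for `structured-answering`, downstream code often needs a tolerant parser. A minimal sketch, assuming the output arrives as either a string or an already-decoded structure (the helper name and fallback behaviour are assumptions, not part of the inference library):

```python
import json


def parse_block_output(output):
    """Best-effort parse of the block's `output` field (hypothetical helper).

    structured-answering tasks return JSON text; other tasks return plain
    strings (captions, OCR results, answers).
    """
    if isinstance(output, (dict, list)):
        # Already decoded upstream; pass through unchanged.
        return output
    try:
        return json.loads(output)
    except (TypeError, ValueError):
        # Not valid JSON: treat it as a plain-text answer.
        return output
```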
Example JSON definition of step OpenAI in version v4
```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v4",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-5.1",
  "reasoning_effort": "<block_does_not_provide_example>",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
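A step like the one above only becomes runnable inside a complete workflow definition. The Python sketch below assembles one; it is a non-authoritative example in which the `version`/`inputs`/`steps`/`outputs` layout and the `WorkflowImage`, `WorkflowParameter`, and `JsonField` entry types are assumed from the general Workflows specification, and the step values mirror a classification configuration:

```python
import json

# Sketch of a full workflow definition embedding an OpenAI v4 step.
# The surrounding structure is an assumption based on the Workflows
# specification, not taken verbatim from this block's documentation.
workflow_definition = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "api_key"},
    ],
    "steps": [
        {
            "name": "gpt",
            "type": "roboflow_core/open_ai@v4",
            "images": "$inputs.image",
            "task_type": "classification",
            "prompt": "my prompt",
            "classes": ["class-a", "class-b"],
            "api_key": "$inputs.api_key",
            "model_version": "gpt-5.1",
        }
    ],
    "outputs": [
        {"type": "JsonField", "name": "result", "selector": "$steps.gpt.output"}
    ],
}

# Serialize for submission to a Workflows execution endpoint.
print(json.dumps(workflow_definition, indent=2))
```

Passing the API key as a workflow parameter (`$inputs.api_key`) keeps secrets out of the stored definition.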
v3¶
Class: OpenAIBlockV3 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v3.OpenAIBlockV3
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-5 and GPT-4o).
You can specify arbitrary text prompts or use predefined ones; the block supports the following prompt types:
- Open Prompt (`unconstrained`) - Use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - Model answers the question you submit in the prompt
- Captioning (short) (`caption`) - Model provides a short description of the image
- Captioning (`detailed-caption`) - Model provides a long description of the image
- Single-Label Classification (`classification`) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - Model classifies the image content as one or more of the provided classes
- Structured Output Generation (`structured-answering`) - Model returns a JSON response with the specified fields
Provide your OpenAI API key, or set the value to `rf_key:account` (or `rf_key:user:<id>`) to proxy requests through Roboflow's API.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/open_ai@v3` to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. The value determines required parameters and the output response. | ❌ |
| `prompt` | `str` | Text prompt to the OpenAI model. | ✅ |
| `output_structure` | `Dict[str, str]` | Dictionary with the structure of the expected JSON response. | ❌ |
| `classes` | `List[str]` | List of classes to be used. | ✅ |
| `api_key` | `str` | Your OpenAI API key. | ✅ |
| `model_version` | `str` | Model to be used. | ✅ |
| `image_detail` | `str` | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| `max_tokens` | `int` | Maximum number of tokens the model can generate in its response. | ❌ |
| `temperature` | `float` | Temperature to sample from the model. Value in range 0.0-2.0; the higher it is, the more random / "creative" the generations are. | ✅ |
| `max_concurrent_requests` | `int` | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Please restrict it if you hit OpenAI rate limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v3.
- inputs:
Florence-2 Model,Trace Visualization,Roboflow Dataset Upload,Classification Label Visualization,Stitch Images,Image Slicer,Ellipse Visualization,Clip Comparison,Crop Visualization,Grid Visualization,Morphological Transformation,Triangle Visualization,Reference Path Visualization,Roboflow Dataset Upload,Google Gemini,LMM,Stitch OCR Detections,Twilio SMS/MMS Notification,Dimension Collapse,Image Slicer,Local File Sink,VLM As Classifier,Icon Visualization,QR Code Generator,Stability AI Outpainting,OpenAI,Florence-2 Model,Google Vision OCR,Camera Focus,Pixelate Visualization,Model Comparison Visualization,Gaze Detection,Image Preprocessing,Cosine Similarity,Background Color Visualization,Clip Comparison,Color Visualization,Twilio SMS Notification,Polygon Zone Visualization,OpenAI,Halo Visualization,Background Subtraction,Keypoint Detection Model,Keypoint Visualization,Instance Segmentation Model,Contrast Equalization,EasyOCR,Image Blur,Polygon Visualization,Anthropic Claude,SIFT,Google Gemini,Webhook Sink,Perspective Correction,Object Detection Model,Circle Visualization,Blur Visualization,Dot Visualization,Camera Calibration,Heatmap Visualization,Image Threshold,Multi-Label Classification Model,Relative Static Crop,Google Gemini,Text Display,Email Notification,OpenAI,Single-Label Classification Model,Anthropic Claude,Depth Estimation,Mask Visualization,CSV Formatter,Stability AI Image Generation,Dynamic Zone,Buffer,Size Measurement,Halo Visualization,Absolute Static Crop,OCR Model,Label Visualization,Stability AI Inpainting,Motion Detection,Anthropic Claude,Corner Visualization,Image Convert Grayscale,Stitch OCR Detections,Roboflow Custom Metadata,SIFT Comparison,Polygon Visualization,CogVLM,Detections List Roll-Up,VLM As Detector,Line Counter Visualization,Bounding Box Visualization,Llama 3.2 Vision,Camera Focus,Email Notification,Slack Notification,Identify Changes,Dynamic Crop,Image Contours,Model Monitoring Inference Aggregator,LMM For Classification,OpenAI - 
outputs:
Florence-2 Model,Trace Visualization,Roboflow Dataset Upload,Classification Label Visualization,Line Counter,Clip Comparison,Ellipse Visualization,Triangle Visualization,Morphological Transformation,Path Deviation,LMM,Local File Sink,VLM As Classifier,Icon Visualization,QR Code Generator,Stability AI Outpainting,OpenAI,Moondream2,Keypoint Detection Model,Florence-2 Model,Object Detection Model,Background Color Visualization,Clip Comparison,Time in Zone,Keypoint Detection Model,Keypoint Visualization,Perception Encoder Embedding Model,Image Blur,Anthropic Claude,Polygon Visualization,Webhook Sink,Object Detection Model,Cache Get,YOLO-World Model,Heatmap Visualization,Image Threshold,Google Gemini,Text Display,OpenAI,Instance Segmentation Model,Anthropic Claude,Time in Zone,Path Deviation,Detections Consensus,Stability AI Inpainting,Roboflow Custom Metadata,Polygon Visualization,CogVLM,Bounding Box Visualization,CLIP Embedding Model,Llama 3.2 Vision,Email Notification,Dynamic Crop,Time in Zone,LMM For Classification,Buffer,Seg Preview,Segment Anything 2 Model,Line Counter,SAM 3,Distance Measurement,Crop Visualization,Roboflow Dataset Upload,Grid Visualization,Google Gemini,Twilio SMS/MMS Notification,Stitch OCR Detections,Reference Path Visualization,Detections Classes Replacement,Google Vision OCR,Pixel Color Count,Model Comparison Visualization,Image Preprocessing,Twilio SMS Notification,Color Visualization,Polygon Zone Visualization,OpenAI,Halo Visualization,Instance Segmentation Model,Contrast Equalization,Google Gemini,Perspective Correction,Circle Visualization,Dot Visualization,Email Notification,Depth Estimation,VLM As Detector,Mask Visualization,Stability AI Image Generation,Size Measurement,Halo Visualization,Detections Stitch,Label Visualization,Motion Detection,Anthropic Claude,Stitch OCR Detections,Corner Visualization,Cache Set,SIFT Comparison,Detections List Roll-Up,SAM 3,VLM As Detector,Line Counter Visualization,SAM 3,VLM As Classifier,JSON 
Parser,PTZ Tracking (ONVIF),Slack Notification,Model Monitoring Inference Aggregator,OpenAI
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
OpenAI in version v3 has.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `prompt` (`string`): Text prompt to the OpenAI model.
    - `classes` (`list_of_values`): List of classes to be used.
    - `api_key` (`Union[secret, string, ROBOFLOW_MANAGED_KEY]`): Your OpenAI API key.
    - `model_version` (`string`): Model to be used.
    - `image_detail` (`string`): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
    - `temperature` (`float`): Temperature to sample from the model. Value in range 0.0-2.0; the higher it is, the more random / "creative" the generations are.
- output
    - `output` (`Union[string, language_model_output]`): String value if `string`, or LLM / VLM output if `language_model_output`.
    - `classes` (`list_of_values`): List of values of any type.
Example JSON definition of step OpenAI in version v3
```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v3",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-5",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
v2¶
Class: OpenAIBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v2.OpenAIBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-4o and GPT-5).
You can specify arbitrary text prompts or use predefined ones; the block supports the following prompt types:
- Open Prompt (`unconstrained`) - Use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - Model answers the question you submit in the prompt
- Captioning (short) (`caption`) - Model provides a short description of the image
- Captioning (`detailed-caption`) - Model provides a long description of the image
- Single-Label Classification (`classification`) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - Model classifies the image content as one or more of the provided classes
- Structured Output Generation (`structured-answering`) - Model returns a JSON response with the specified fields
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/open_ai@v2` to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. The value determines required parameters and the output response. | ❌ |
| `prompt` | `str` | Text prompt to the OpenAI model. | ✅ |
| `output_structure` | `Dict[str, str]` | Dictionary with the structure of the expected JSON response. | ❌ |
| `classes` | `List[str]` | List of classes to be used. | ✅ |
| `api_key` | `str` | Your OpenAI API key. | ✅ |
| `model_version` | `str` | Model to be used. | ✅ |
| `image_detail` | `str` | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| `max_tokens` | `int` | Maximum number of tokens the model can generate in its response. | ❌ |
| `temperature` | `float` | Temperature to sample from the model. Value in range 0.0-2.0; the higher it is, the more random / "creative" the generations are. | ✅ |
| `max_concurrent_requests` | `int` | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Please restrict it if you hit OpenAI rate limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v2.
- inputs:
Florence-2 Model,Trace Visualization,Roboflow Dataset Upload,Classification Label Visualization,Stitch Images,Image Slicer,Ellipse Visualization,Clip Comparison,Crop Visualization,Grid Visualization,Morphological Transformation,Triangle Visualization,Reference Path Visualization,Roboflow Dataset Upload,Google Gemini,LMM,Stitch OCR Detections,Twilio SMS/MMS Notification,Dimension Collapse,Image Slicer,Local File Sink,VLM As Classifier,Icon Visualization,QR Code Generator,Stability AI Outpainting,OpenAI,Florence-2 Model,Google Vision OCR,Camera Focus,Pixelate Visualization,Model Comparison Visualization,Gaze Detection,Image Preprocessing,Cosine Similarity,Background Color Visualization,Clip Comparison,Color Visualization,Twilio SMS Notification,Polygon Zone Visualization,OpenAI,Halo Visualization,Background Subtraction,Keypoint Detection Model,Keypoint Visualization,Instance Segmentation Model,Contrast Equalization,EasyOCR,Image Blur,Polygon Visualization,Anthropic Claude,SIFT,Google Gemini,Webhook Sink,Perspective Correction,Object Detection Model,Circle Visualization,Blur Visualization,Dot Visualization,Camera Calibration,Heatmap Visualization,Image Threshold,Multi-Label Classification Model,Relative Static Crop,Google Gemini,Text Display,Email Notification,OpenAI,Single-Label Classification Model,Anthropic Claude,Depth Estimation,Mask Visualization,CSV Formatter,Stability AI Image Generation,Dynamic Zone,Buffer,Size Measurement,Halo Visualization,Absolute Static Crop,OCR Model,Label Visualization,Stability AI Inpainting,Motion Detection,Anthropic Claude,Corner Visualization,Image Convert Grayscale,Stitch OCR Detections,Roboflow Custom Metadata,SIFT Comparison,Polygon Visualization,CogVLM,Detections List Roll-Up,VLM As Detector,Line Counter Visualization,Bounding Box Visualization,Llama 3.2 Vision,Camera Focus,Email Notification,Slack Notification,Identify Changes,Dynamic Crop,Image Contours,Model Monitoring Inference Aggregator,LMM For Classification,OpenAI - 
outputs:
Florence-2 Model,Trace Visualization,Roboflow Dataset Upload,Classification Label Visualization,Line Counter,Clip Comparison,Ellipse Visualization,Triangle Visualization,Morphological Transformation,Path Deviation,LMM,Local File Sink,VLM As Classifier,Icon Visualization,QR Code Generator,Stability AI Outpainting,OpenAI,Moondream2,Keypoint Detection Model,Florence-2 Model,Object Detection Model,Background Color Visualization,Clip Comparison,Time in Zone,Keypoint Detection Model,Keypoint Visualization,Perception Encoder Embedding Model,Image Blur,Anthropic Claude,Polygon Visualization,Webhook Sink,Object Detection Model,Cache Get,YOLO-World Model,Heatmap Visualization,Image Threshold,Google Gemini,Text Display,OpenAI,Instance Segmentation Model,Anthropic Claude,Time in Zone,Path Deviation,Detections Consensus,Stability AI Inpainting,Roboflow Custom Metadata,Polygon Visualization,CogVLM,Bounding Box Visualization,CLIP Embedding Model,Llama 3.2 Vision,Email Notification,Dynamic Crop,Time in Zone,LMM For Classification,Buffer,Seg Preview,Segment Anything 2 Model,Line Counter,SAM 3,Distance Measurement,Crop Visualization,Roboflow Dataset Upload,Grid Visualization,Google Gemini,Twilio SMS/MMS Notification,Stitch OCR Detections,Reference Path Visualization,Detections Classes Replacement,Google Vision OCR,Pixel Color Count,Model Comparison Visualization,Image Preprocessing,Twilio SMS Notification,Color Visualization,Polygon Zone Visualization,OpenAI,Halo Visualization,Instance Segmentation Model,Contrast Equalization,Google Gemini,Perspective Correction,Circle Visualization,Dot Visualization,Email Notification,Depth Estimation,VLM As Detector,Mask Visualization,Stability AI Image Generation,Size Measurement,Halo Visualization,Detections Stitch,Label Visualization,Motion Detection,Anthropic Claude,Stitch OCR Detections,Corner Visualization,Cache Set,SIFT Comparison,Detections List Roll-Up,SAM 3,VLM As Detector,Line Counter Visualization,SAM 3,VLM As Classifier,JSON 
Parser,PTZ Tracking (ONVIF),Slack Notification,Model Monitoring Inference Aggregator,OpenAI
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
OpenAI in version v2 has.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `prompt` (`string`): Text prompt to the OpenAI model.
    - `classes` (`list_of_values`): List of classes to be used.
    - `api_key` (`Union[secret, string]`): Your OpenAI API key.
    - `model_version` (`string`): Model to be used.
    - `image_detail` (`string`): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
    - `temperature` (`float`): Temperature to sample from the model. Value in range 0.0-2.0; the higher it is, the more random / "creative" the generations are.
- output
    - `output` (`Union[string, language_model_output]`): String value if `string`, or LLM / VLM output if `language_model_output`.
    - `classes` (`list_of_values`): List of values of any type.
Example JSON definition of step OpenAI in version v2
```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v2",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-4o",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
v1¶
Class: OpenAIBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v1.OpenAIBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT-4 with Vision model.
You can specify arbitrary text prompts to the OpenAIBlock.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
This model was previously part of the LMM block.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/open_ai@v1` to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `prompt` | `str` | Text prompt to the OpenAI model. | ✅ |
| `openai_api_key` | `str` | Your OpenAI API key. | ✅ |
| `openai_model` | `str` | Model to be used. | ✅ |
| `json_output_format` | `Dict[str, str]` | Dictionary that maps the name of each requested output field to its description. | ❌ |
| `image_detail` | `str` | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| `max_tokens` | `int` | Maximum number of tokens the model can generate in its response. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v1.
- inputs:
Florence-2 Model,Trace Visualization,Roboflow Dataset Upload,Classification Label Visualization,Stitch Images,Image Slicer,Ellipse Visualization,Crop Visualization,Grid Visualization,Morphological Transformation,Triangle Visualization,Reference Path Visualization,Roboflow Dataset Upload,Google Gemini,LMM,Stitch OCR Detections,Twilio SMS/MMS Notification,Image Slicer,Local File Sink,VLM As Classifier,Icon Visualization,QR Code Generator,Stability AI Outpainting,OpenAI,Florence-2 Model,Google Vision OCR,Camera Focus,Pixelate Visualization,Model Comparison Visualization,Image Preprocessing,Background Color Visualization,Clip Comparison,Color Visualization,Twilio SMS Notification,Polygon Zone Visualization,OpenAI,Halo Visualization,Background Subtraction,Keypoint Detection Model,Keypoint Visualization,Instance Segmentation Model,Contrast Equalization,EasyOCR,Image Blur,Polygon Visualization,Anthropic Claude,SIFT,Google Gemini,Webhook Sink,Perspective Correction,Object Detection Model,Circle Visualization,Blur Visualization,Dot Visualization,Camera Calibration,Heatmap Visualization,Image Threshold,Multi-Label Classification Model,Relative Static Crop,Google Gemini,Text Display,Email Notification,OpenAI,Single-Label Classification Model,Anthropic Claude,Depth Estimation,Mask Visualization,CSV Formatter,Stability AI Image Generation,Halo Visualization,Absolute Static Crop,OCR Model,Label Visualization,Stability AI Inpainting,Anthropic Claude,Corner Visualization,Image Convert Grayscale,Stitch OCR Detections,Roboflow Custom Metadata,SIFT Comparison,Polygon Visualization,CogVLM,VLM As Detector,Line Counter Visualization,Bounding Box Visualization,Llama 3.2 Vision,Camera Focus,Email Notification,Slack Notification,Dynamic Crop,Image Contours,Model Monitoring Inference Aggregator,LMM For Classification,OpenAI - outputs:
Florence-2 Model,Trace Visualization,Roboflow Dataset Upload,Single-Label Classification Model,Qwen3-VL,Triangle Visualization,Morphological Transformation,Path Deviation,LMM,SmolVLM2,Local File Sink,Stability AI Outpainting,Gaze Detection,Background Color Visualization,Time in Zone,Background Subtraction,Keypoint Detection Model,Keypoint Visualization,Perception Encoder Embedding Model,Image Blur,Anthropic Claude,Polygon Visualization,SIFT,Webhook Sink,Dominant Color,Property Definition,Multi-Label Classification Model,Google Gemini,Qwen2.5-VL,Single-Label Classification Model,Anthropic Claude,CSV Formatter,Path Deviation,Detections Consensus,Stability AI Inpainting,QR Code Detection,Polygon Visualization,CogVLM,Velocity,Email Notification,Time in Zone,Image Contours,Stitch Images,Image Slicer,Line Counter,Byte Tracker,SAM 3,Distance Measurement,Roboflow Dataset Upload,Grid Visualization,Stitch OCR Detections,Multi-Label Classification Model,Data Aggregator,Detections Classes Replacement,Detection Offset,Detections Transformation,Pixel Color Count,Camera Focus,Image Preprocessing,Twilio SMS Notification,Polygon Zone Visualization,Contrast Equalization,Mask Area Measurement,Blur Visualization,Relative Static Crop,VLM As Detector,Mask Visualization,Dynamic Zone,Size Measurement,Halo Visualization,Detections Stitch,OCR Model,Label Visualization,Anthropic Claude,Corner Visualization,SIFT Comparison,VLM As Detector,Line Counter Visualization,JSON Parser,Identify Changes,Model Monitoring Inference Aggregator,Byte Tracker,OpenAI,Detections Combine,Delta Filter,Classification Label Visualization,Line Counter,Clip Comparison,Ellipse Visualization,Detections Stabilizer,First Non Empty Or Default,Dimension Collapse,Barcode Detection,VLM As Classifier,Icon Visualization,QR Code Generator,OpenAI,Moondream2,Keypoint Detection Model,Florence-2 Model,Pixelate Visualization,Object Detection Model,Cosine Similarity,Clip Comparison,Overlap Filter,EasyOCR,Object Detection Model,Cache 
Get,YOLO-World Model,Heatmap Visualization,Image Threshold,Text Display,Detection Event Log,OpenAI,Instance Segmentation Model,Continue If,Time in Zone,Rate Limiter,Roboflow Custom Metadata,Bounding Box Visualization,CLIP Embedding Model,Llama 3.2 Vision,Identify Outliers,Camera Focus,Dynamic Crop,LMM For Classification,Buffer,Seg Preview,Segment Anything 2 Model,Bounding Rectangle,Crop Visualization,Google Gemini,Twilio SMS/MMS Notification,Reference Path Visualization,Image Slicer,Google Vision OCR,Model Comparison Visualization,Template Matching,Color Visualization,OpenAI,Halo Visualization,Instance Segmentation Model,Google Gemini,Perspective Correction,Circle Visualization,Dot Visualization,Camera Calibration,Email Notification,Depth Estimation,Stability AI Image Generation,Detections Filter,Byte Tracker,Absolute Static Crop,Detections Merge,Motion Detection,Stitch OCR Detections,Cache Set,Image Convert Grayscale,SIFT Comparison,Expression,Detections List Roll-Up,SAM 3,SAM 3,VLM As Classifier,PTZ Tracking (ONVIF),Slack Notification
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
OpenAI in version v1 has.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `prompt` (`string`): Text prompt to the OpenAI model.
    - `openai_api_key` (`Union[secret, string]`): Your OpenAI API key.
    - `openai_model` (`string`): Model to be used.
    - `image_detail` (`string`): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
- output
    - `parent_id` (`parent_id`): Identifier of parent for step output.
    - `root_parent_id` (`parent_id`): Identifier of parent for step output.
    - `image` (`image_metadata`): Dictionary with image metadata required by supervision.
    - `structured_output` (`dictionary`): Dictionary.
    - `raw_output` (`string`): String value.
    - `*` (`*`): Equivalent of any element.
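Unlike later versions, v1 exposes separate `structured_output` and `raw_output` fields rather than a single `output`. A hypothetical convenience wrapper for consuming them (the function name and fallback behaviour are assumptions, not part of the inference library) might look like:

```python
def read_v1_result(step_output: dict, expect_structured: bool):
    """Pick the most useful field from an OpenAI v1 step output (hypothetical helper).

    When a json_output_format was configured, prefer the parsed
    `structured_output` dictionary; otherwise fall back to `raw_output`.
    """
    if expect_structured and step_output.get("structured_output"):
        return step_output["structured_output"]
    return step_output.get("raw_output", "")
```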
Example JSON definition of step OpenAI in version v1
```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v1",
  "images": "$inputs.image",
  "prompt": "my prompt",
  "openai_api_key": "xxx-xxx",
  "openai_model": "gpt-4o",
  "json_output_format": {
    "count": "number of cats in the picture"
  },
  "image_detail": "auto",
  "max_tokens": 450
}
```