OpenAI¶
v4¶
Class: OpenAIBlockV4 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v4.OpenAIBlockV4
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-5 and GPT-4o).
You can specify arbitrary text prompts or use predefined ones; the block supports the following task types:

- Open Prompt (`unconstrained`) - Use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - Model answers the question you submit in the prompt
- Captioning (short) (`caption`) - Model provides a short description of the image
- Captioning (`detailed-caption`) - Model provides a long description of the image
- Single-Label Classification (`classification`) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - Model classifies the image content as one or more of the provided classes
- Unprompted Object Detection (`object-detection`) - Model detects and returns the bounding boxes for prominent objects in the image
- Structured Output Generation (`structured-answering`) - Model returns a JSON response with the specified fields
Provide your OpenAI API key, or set the value to `rf_key:account` (or `rf_key:user:<id>`) to proxy requests through Roboflow's API.
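Which extra parameters a step needs depends on the chosen task type: classification tasks use `classes`, structured output uses `output_structure`, and open-ended tasks use `prompt`. The sketch below is illustrative only — the requirements are inferred from the task descriptions above, not taken from the block's actual validation code:

```python
# Illustrative mapping of task type -> extra parameters the step still needs.
# (Assumption for demonstration; not the block's real validation logic.)
REQUIRED_EXTRAS = {
    "unconstrained": {"prompt"},
    "ocr": set(),
    "visual-question-answering": {"prompt"},
    "caption": set(),
    "detailed-caption": set(),
    "classification": {"classes"},
    "multi-label-classification": {"classes"},
    "object-detection": set(),  # unprompted - no classes required
    "structured-answering": {"output_structure"},
}


def missing_parameters(step: dict) -> set:
    """Return the extra parameters the chosen task type still needs."""
    required = REQUIRED_EXTRAS.get(step.get("task_type"), set())
    return {name for name in required if not step.get(name)}
```

For example, a step with `"task_type": "classification"` but no `classes` would be reported as incomplete.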
Type identifier¶
Use the following identifier in the step's "type" field to add the block as a step in your workflow: `roboflow_core/open_ai@v4`.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
| classes | List[str] | List of classes to be used. | ✅ |
| api_key | str | Your OpenAI API key. | ✅ |
| model_version | str | Model to be used. | ✅ |
| reasoning_effort | str | Controls reasoning. Reducing it can result in faster responses and fewer tokens. GPT-5.1 and higher models default to 'none' (no reasoning) and support 'none', 'low', 'medium', and 'high'; GPT-5.2 also supports 'xhigh'. GPT-5 models default to 'medium' and support 'minimal', 'low', 'medium', and 'high'. | ✅ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. If not specified, the model uses its default limit. Minimum value is 16. | ❌ |
| temperature | float | Temperature to sample from the model, in the range 0.0-2.0; the higher the value, the more random / "creative" the generations. | ✅ |
| max_concurrent_requests | int | Number of concurrent requests that can be executed by the block when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
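Properties marked ✅ accept selectors instead of literal values: `$inputs.<name>` binds a workflow input and `$steps.<step>.<field>` binds another step's output. The regex below is a rough illustrative approximation of that selector shape, not the Execution Engine's actual parser:

```python
import re

# Illustrative approximation of Workflows selector syntax:
#   "$inputs.<name>"          - binds a workflow input
#   "$steps.<step>.<field>"   - binds another step's output
SELECTOR = re.compile(
    r"^\$(inputs\.[A-Za-z0-9_]+|steps\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+)$"
)


def is_selector(value) -> bool:
    """Return True if the value looks like a dynamic binding selector."""
    return isinstance(value, str) and SELECTOR.match(value) is not None
```

So `"prompt": "$inputs.prompt"` would be resolved at runtime, while `"prompt": "my prompt"` is used literally.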
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v4.
- inputs:
QR Code Generator,Image Convert Grayscale,Google Gemini,Dynamic Crop,Blur Visualization,SIFT,Bounding Box Visualization,Stability AI Outpainting,Identify Changes,Camera Focus,Slack Notification,Keypoint Visualization,Trace Visualization,Polygon Visualization,Ellipse Visualization,Model Comparison Visualization,OpenAI,Anthropic Claude,Dimension Collapse,Local File Sink,Triangle Visualization,Polygon Zone Visualization,Halo Visualization,LMM,Stability AI Image Generation,Florence-2 Model,Circle Visualization,Email Notification,Google Vision OCR,Google Gemini,Motion Detection,Clip Comparison,Cosine Similarity,Camera Focus,Anthropic Claude,Object Detection Model,Instance Segmentation Model,Perspective Correction,CSV Formatter,Reference Path Visualization,Corner Visualization,Color Visualization,Twilio SMS/MMS Notification,VLM as Classifier,Image Slicer,OpenAI,Stitch OCR Detections,Camera Calibration,Image Blur,Buffer,VLM as Detector,Dot Visualization,Roboflow Custom Metadata,Image Threshold,Model Monitoring Inference Aggregator,Morphological Transformation,Label Visualization,Background Color Visualization,Classification Label Visualization,OCR Model,Roboflow Dataset Upload,Mask Visualization,Dynamic Zone,Detections List Roll-Up,Pixelate Visualization,Absolute Static Crop,Keypoint Detection Model,Size Measurement,Webhook Sink,Grid Visualization,Contrast Equalization,Image Preprocessing,Google Gemini,Relative Static Crop,Stability AI Inpainting,Image Contours,Line Counter Visualization,Stitch Images,OpenAI,Crop Visualization,OpenAI,Llama 3.2 Vision,Icon Visualization,Clip Comparison,SIFT Comparison,Gaze Detection,Depth Estimation,Twilio SMS Notification,Single-Label Classification Model,Florence-2 Model,Background Subtraction,LMM For Classification,Multi-Label Classification Model,EasyOCR,CogVLM,Image Slicer,Roboflow Dataset Upload,Email Notification
- outputs:
QR Code Generator,Google Gemini,Bounding Box Visualization,Stability AI Outpainting,Trace Visualization,Instance Segmentation Model,Pixel Color Count,Ellipse Visualization,OpenAI,Model Comparison Visualization,Triangle Visualization,SAM 3,Distance Measurement,Stability AI Image Generation,Path Deviation,VLM as Detector,CLIP Embedding Model,Florence-2 Model,Email Notification,Google Gemini,VLM as Classifier,Corner Visualization,Color Visualization,Twilio SMS/MMS Notification,OpenAI,Line Counter,Buffer,SAM 3,Dot Visualization,Roboflow Custom Metadata,Image Threshold,Model Monitoring Inference Aggregator,Time in Zone,Classification Label Visualization,Roboflow Dataset Upload,Mask Visualization,Detections List Roll-Up,Line Counter,Keypoint Detection Model,Size Measurement,Detections Consensus,Webhook Sink,Stability AI Inpainting,Line Counter Visualization,Crop Visualization,OpenAI,Llama 3.2 Vision,Icon Visualization,Clip Comparison,Time in Zone,LMM For Classification,SAM 3,Object Detection Model,CogVLM,Roboflow Dataset Upload,Cache Get,Detections Stitch,Seg Preview,YOLO-World Model,Dynamic Crop,Slack Notification,Keypoint Visualization,Polygon Visualization,Cache Set,Anthropic Claude,Local File Sink,Polygon Zone Visualization,Halo Visualization,LMM,Time in Zone,Circle Visualization,Google Vision OCR,Motion Detection,Clip Comparison,Detections Classes Replacement,Anthropic Claude,Object Detection Model,Instance Segmentation Model,Perspective Correction,Perception Encoder Embedding Model,Reference Path Visualization,Stitch OCR Detections,Image Blur,VLM as Detector,Morphological Transformation,Label Visualization,Background Color Visualization,Keypoint Detection Model,Path Deviation,PTZ Tracking (ONVIF),Moondream2,Image Preprocessing,Contrast Equalization,Grid Visualization,Google Gemini,OpenAI,JSON Parser,SIFT Comparison,Depth Estimation,Twilio SMS Notification,VLM as Classifier,Florence-2 Model,Segment Anything 2 Model,Email Notification
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v4 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Text prompt to the OpenAI model.
  - classes (list_of_values): List of classes to be used.
  - api_key (Union[string, ROBOFLOW_MANAGED_KEY, secret]): Your OpenAI API key.
  - model_version (string): Model to be used.
  - reasoning_effort (string): Controls reasoning. Reducing it can result in faster responses and fewer tokens. GPT-5.1 and higher models default to 'none' (no reasoning) and support 'none', 'low', 'medium', and 'high'; GPT-5.2 also supports 'xhigh'. GPT-5 models default to 'medium' and support 'minimal', 'low', 'medium', and 'high'.
  - image_detail (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - temperature (float): Temperature to sample from the model, in the range 0.0-2.0; the higher the value, the more random / "creative" the generations.
- output
  - output (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
  - classes (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v4
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v4",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-5.1",
  "reasoning_effort": "<block_does_not_provide_example>",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
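A step like the one above sits inside a complete workflow definition alongside declared inputs and outputs. The sketch below follows the general Workflows specification shape ("version", "inputs", "steps", "outputs"); treat the exact schema as an assumption and consult the Workflows documentation for the authoritative format:

```python
import json

# Hedged sketch of a full workflow definition embedding the v4 step.
# The surrounding field names are assumptions based on the general
# Workflows format, not a schema guaranteed by this page.
workflow = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "open_ai_key"},
    ],
    "steps": [
        {
            "name": "gpt",
            "type": "roboflow_core/open_ai@v4",
            "images": "$inputs.image",          # bound workflow input
            "task_type": "classification",
            "classes": ["cat", "dog"],
            "api_key": "$inputs.open_ai_key",   # key passed at runtime
            "model_version": "gpt-5.1",
        }
    ],
    "outputs": [
        {"type": "JsonField", "name": "result", "selector": "$steps.gpt.output"}
    ],
}

definition = json.dumps(workflow, indent=2)
```

Passing the API key as a workflow parameter (rather than hard-coding it in the definition) keeps the secret out of the stored workflow.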
v3¶
Class: OpenAIBlockV3 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v3.OpenAIBlockV3
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-5 and GPT-4o).
You can specify arbitrary text prompts or use predefined ones; the block supports the following task types:

- Open Prompt (`unconstrained`) - Use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - Model answers the question you submit in the prompt
- Captioning (short) (`caption`) - Model provides a short description of the image
- Captioning (`detailed-caption`) - Model provides a long description of the image
- Single-Label Classification (`classification`) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - Model classifies the image content as one or more of the provided classes
- Structured Output Generation (`structured-answering`) - Model returns a JSON response with the specified fields
Provide your OpenAI API key, or set the value to `rf_key:account` (or `rf_key:user:<id>`) to proxy requests through Roboflow's API.
Type identifier¶
Use the following identifier in the step's "type" field to add the block as a step in your workflow: `roboflow_core/open_ai@v3`.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
| classes | List[str] | List of classes to be used. | ✅ |
| api_key | str | Your OpenAI API key. | ✅ |
| model_version | str | Model to be used. | ✅ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
| temperature | float | Temperature to sample from the model, in the range 0.0-2.0; the higher the value, the more random / "creative" the generations. | ✅ |
| max_concurrent_requests | int | Number of concurrent requests that can be executed by the block when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v3.
- inputs:
QR Code Generator,Image Convert Grayscale,Google Gemini,Dynamic Crop,Blur Visualization,SIFT,Bounding Box Visualization,Stability AI Outpainting,Identify Changes,Camera Focus,Slack Notification,Keypoint Visualization,Trace Visualization,Polygon Visualization,Ellipse Visualization,Model Comparison Visualization,OpenAI,Anthropic Claude,Dimension Collapse,Local File Sink,Triangle Visualization,Polygon Zone Visualization,Halo Visualization,LMM,Stability AI Image Generation,Florence-2 Model,Circle Visualization,Email Notification,Google Vision OCR,Google Gemini,Motion Detection,Clip Comparison,Cosine Similarity,Camera Focus,Anthropic Claude,Object Detection Model,Instance Segmentation Model,Perspective Correction,CSV Formatter,Reference Path Visualization,Corner Visualization,Color Visualization,Twilio SMS/MMS Notification,VLM as Classifier,Image Slicer,OpenAI,Stitch OCR Detections,Camera Calibration,Image Blur,Buffer,VLM as Detector,Dot Visualization,Roboflow Custom Metadata,Image Threshold,Model Monitoring Inference Aggregator,Morphological Transformation,Label Visualization,Background Color Visualization,Classification Label Visualization,OCR Model,Roboflow Dataset Upload,Mask Visualization,Dynamic Zone,Detections List Roll-Up,Pixelate Visualization,Absolute Static Crop,Keypoint Detection Model,Size Measurement,Webhook Sink,Grid Visualization,Contrast Equalization,Image Preprocessing,Google Gemini,Relative Static Crop,Stability AI Inpainting,Image Contours,Line Counter Visualization,Stitch Images,OpenAI,Crop Visualization,OpenAI,Llama 3.2 Vision,Icon Visualization,Clip Comparison,SIFT Comparison,Gaze Detection,Depth Estimation,Twilio SMS Notification,Single-Label Classification Model,Florence-2 Model,Background Subtraction,LMM For Classification,Multi-Label Classification Model,EasyOCR,CogVLM,Image Slicer,Roboflow Dataset Upload,Email Notification
- outputs:
QR Code Generator,Google Gemini,Bounding Box Visualization,Stability AI Outpainting,Trace Visualization,Instance Segmentation Model,Pixel Color Count,Ellipse Visualization,OpenAI,Model Comparison Visualization,Triangle Visualization,SAM 3,Distance Measurement,Stability AI Image Generation,Path Deviation,VLM as Detector,CLIP Embedding Model,Florence-2 Model,Email Notification,Google Gemini,VLM as Classifier,Corner Visualization,Color Visualization,Twilio SMS/MMS Notification,OpenAI,Line Counter,Buffer,SAM 3,Dot Visualization,Roboflow Custom Metadata,Image Threshold,Model Monitoring Inference Aggregator,Time in Zone,Classification Label Visualization,Roboflow Dataset Upload,Mask Visualization,Detections List Roll-Up,Line Counter,Keypoint Detection Model,Size Measurement,Detections Consensus,Webhook Sink,Stability AI Inpainting,Line Counter Visualization,Crop Visualization,OpenAI,Llama 3.2 Vision,Icon Visualization,Clip Comparison,Time in Zone,LMM For Classification,SAM 3,Object Detection Model,CogVLM,Roboflow Dataset Upload,Cache Get,Detections Stitch,Seg Preview,YOLO-World Model,Dynamic Crop,Slack Notification,Keypoint Visualization,Polygon Visualization,Cache Set,Anthropic Claude,Local File Sink,Polygon Zone Visualization,Halo Visualization,LMM,Time in Zone,Circle Visualization,Google Vision OCR,Motion Detection,Clip Comparison,Detections Classes Replacement,Anthropic Claude,Object Detection Model,Instance Segmentation Model,Perspective Correction,Perception Encoder Embedding Model,Reference Path Visualization,Stitch OCR Detections,Image Blur,VLM as Detector,Morphological Transformation,Label Visualization,Background Color Visualization,Keypoint Detection Model,Path Deviation,PTZ Tracking (ONVIF),Moondream2,Image Preprocessing,Contrast Equalization,Grid Visualization,Google Gemini,OpenAI,JSON Parser,SIFT Comparison,Depth Estimation,Twilio SMS Notification,VLM as Classifier,Florence-2 Model,Segment Anything 2 Model,Email Notification
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v3 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Text prompt to the OpenAI model.
  - classes (list_of_values): List of classes to be used.
  - api_key (Union[string, ROBOFLOW_MANAGED_KEY, secret]): Your OpenAI API key.
  - model_version (string): Model to be used.
  - image_detail (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - temperature (float): Temperature to sample from the model, in the range 0.0-2.0; the higher the value, the more random / "creative" the generations.
- output
  - output (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
  - classes (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v3
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v3",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-5",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
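For the `structured-answering` task, the model's JSON reply arrives in the step's `output` field as a string. A downstream consumer (in real workflows the JSON Parser block plays this role) turns it back into a dictionary; the helper below is an illustrative sketch of that recovery, not the block's code:

```python
import json


def parse_structured_answer(raw_output: str) -> dict:
    """Recover the dict promised by output_structure from the model's reply.

    Falls back to an error payload when the model did not return valid JSON,
    which vision-language models occasionally do.
    """
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return {"error": "model did not return valid JSON", "raw": raw_output}
    if not isinstance(parsed, dict):
        return {"error": "model returned JSON that is not an object", "raw": raw_output}
    return parsed
```

Guarding the parse matters because a malformed reply would otherwise crash every step consuming the output.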
v2¶
Class: OpenAIBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v2.OpenAIBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-4o and GPT-5).
You can specify arbitrary text prompts or use predefined ones; the block supports the following task types:

- Open Prompt (`unconstrained`) - Use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - Model answers the question you submit in the prompt
- Captioning (short) (`caption`) - Model provides a short description of the image
- Captioning (`detailed-caption`) - Model provides a long description of the image
- Single-Label Classification (`classification`) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - Model classifies the image content as one or more of the provided classes
- Structured Output Generation (`structured-answering`) - Model returns a JSON response with the specified fields
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
Type identifier¶
Use the following identifier in the step's "type" field to add the block as a step in your workflow: `roboflow_core/open_ai@v2`.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| task_type | str | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
| classes | List[str] | List of classes to be used. | ✅ |
| api_key | str | Your OpenAI API key. | ✅ |
| model_version | str | Model to be used. | ✅ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
| temperature | float | Temperature to sample from the model, in the range 0.0-2.0; the higher the value, the more random / "creative" the generations. | ✅ |
| max_concurrent_requests | int | Number of concurrent requests that can be executed by the block when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v2.
- inputs:
QR Code Generator,Image Convert Grayscale,Google Gemini,Dynamic Crop,Blur Visualization,SIFT,Bounding Box Visualization,Stability AI Outpainting,Identify Changes,Camera Focus,Slack Notification,Keypoint Visualization,Trace Visualization,Polygon Visualization,Ellipse Visualization,Model Comparison Visualization,OpenAI,Anthropic Claude,Dimension Collapse,Local File Sink,Triangle Visualization,Polygon Zone Visualization,Halo Visualization,LMM,Stability AI Image Generation,Florence-2 Model,Circle Visualization,Email Notification,Google Vision OCR,Google Gemini,Motion Detection,Clip Comparison,Cosine Similarity,Camera Focus,Anthropic Claude,Object Detection Model,Instance Segmentation Model,Perspective Correction,CSV Formatter,Reference Path Visualization,Corner Visualization,Color Visualization,Twilio SMS/MMS Notification,VLM as Classifier,Image Slicer,OpenAI,Stitch OCR Detections,Camera Calibration,Image Blur,Buffer,VLM as Detector,Dot Visualization,Roboflow Custom Metadata,Image Threshold,Model Monitoring Inference Aggregator,Morphological Transformation,Label Visualization,Background Color Visualization,Classification Label Visualization,OCR Model,Roboflow Dataset Upload,Mask Visualization,Dynamic Zone,Detections List Roll-Up,Pixelate Visualization,Absolute Static Crop,Keypoint Detection Model,Size Measurement,Webhook Sink,Grid Visualization,Contrast Equalization,Image Preprocessing,Google Gemini,Relative Static Crop,Stability AI Inpainting,Image Contours,Line Counter Visualization,Stitch Images,OpenAI,Crop Visualization,OpenAI,Llama 3.2 Vision,Icon Visualization,Clip Comparison,SIFT Comparison,Gaze Detection,Depth Estimation,Twilio SMS Notification,Single-Label Classification Model,Florence-2 Model,Background Subtraction,LMM For Classification,Multi-Label Classification Model,EasyOCR,CogVLM,Image Slicer,Roboflow Dataset Upload,Email Notification
- outputs:
QR Code Generator,Google Gemini,Bounding Box Visualization,Stability AI Outpainting,Trace Visualization,Instance Segmentation Model,Pixel Color Count,Ellipse Visualization,OpenAI,Model Comparison Visualization,Triangle Visualization,SAM 3,Distance Measurement,Stability AI Image Generation,Path Deviation,VLM as Detector,CLIP Embedding Model,Florence-2 Model,Email Notification,Google Gemini,VLM as Classifier,Corner Visualization,Color Visualization,Twilio SMS/MMS Notification,OpenAI,Line Counter,Buffer,SAM 3,Dot Visualization,Roboflow Custom Metadata,Image Threshold,Model Monitoring Inference Aggregator,Time in Zone,Classification Label Visualization,Roboflow Dataset Upload,Mask Visualization,Detections List Roll-Up,Line Counter,Keypoint Detection Model,Size Measurement,Detections Consensus,Webhook Sink,Stability AI Inpainting,Line Counter Visualization,Crop Visualization,OpenAI,Llama 3.2 Vision,Icon Visualization,Clip Comparison,Time in Zone,LMM For Classification,SAM 3,Object Detection Model,CogVLM,Roboflow Dataset Upload,Cache Get,Detections Stitch,Seg Preview,YOLO-World Model,Dynamic Crop,Slack Notification,Keypoint Visualization,Polygon Visualization,Cache Set,Anthropic Claude,Local File Sink,Polygon Zone Visualization,Halo Visualization,LMM,Time in Zone,Circle Visualization,Google Vision OCR,Motion Detection,Clip Comparison,Detections Classes Replacement,Anthropic Claude,Object Detection Model,Instance Segmentation Model,Perspective Correction,Perception Encoder Embedding Model,Reference Path Visualization,Stitch OCR Detections,Image Blur,VLM as Detector,Morphological Transformation,Label Visualization,Background Color Visualization,Keypoint Detection Model,Path Deviation,PTZ Tracking (ONVIF),Moondream2,Image Preprocessing,Contrast Equalization,Grid Visualization,Google Gemini,OpenAI,JSON Parser,SIFT Comparison,Depth Estimation,Twilio SMS Notification,VLM as Classifier,Florence-2 Model,Segment Anything 2 Model,Email Notification
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v2 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Text prompt to the OpenAI model.
  - classes (list_of_values): List of classes to be used.
  - api_key (Union[string, secret]): Your OpenAI API key.
  - model_version (string): Model to be used.
  - image_detail (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
  - temperature (float): Temperature to sample from the model, in the range 0.0-2.0; the higher the value, the more random / "creative" the generations.
- output
  - output (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
  - classes (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v2
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v2",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-4o",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
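The temperature property is constrained to the range 0.0-2.0 across all versions of this block. A minimal guard in the spirit of that constraint (illustrative only, not the block's actual pydantic validation):

```python
def validate_temperature(value: float) -> float:
    """Reject sampling temperatures outside the documented 0.0-2.0 range."""
    if not 0.0 <= value <= 2.0:
        raise ValueError(f"temperature must be in [0.0, 2.0], got {value}")
    return value
```

Lower values make the generations more deterministic; values near 2.0 make them more random, which is rarely useful for OCR or classification tasks.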
v1¶
Class: OpenAIBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v1.OpenAIBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT-4 with Vision model.
You can specify arbitrary text prompts to the OpenAIBlock.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
This model was previously part of the LMM block.
Type identifier¶
Use the following identifier in the step's "type" field to add the block as a step in your workflow: `roboflow_core/open_ai@v1`.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| prompt | str | Text prompt to the OpenAI model. | ✅ |
| openai_api_key | str | Your OpenAI API key. | ✅ |
| openai_model | str | Model to be used. | ✅ |
| json_output_format | Dict[str, str] | Dictionary that maps the name of each requested output field to its description. | ❌ |
| image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
| max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to OpenAI in version v1.
- inputs:
QR Code Generator,Image Convert Grayscale,Google Gemini,Dynamic Crop,Blur Visualization,SIFT,Bounding Box Visualization,Stability AI Outpainting,Camera Focus,Slack Notification,Keypoint Visualization,Trace Visualization,Polygon Visualization,Ellipse Visualization,Model Comparison Visualization,OpenAI,Anthropic Claude,Local File Sink,Triangle Visualization,Polygon Zone Visualization,Halo Visualization,LMM,Stability AI Image Generation,Florence-2 Model,Circle Visualization,Email Notification,Google Vision OCR,Google Gemini,Clip Comparison,Camera Focus,Anthropic Claude,Object Detection Model,Instance Segmentation Model,Perspective Correction,CSV Formatter,Reference Path Visualization,Corner Visualization,Color Visualization,Twilio SMS/MMS Notification,VLM as Classifier,Image Slicer,OpenAI,Stitch OCR Detections,Camera Calibration,Image Blur,VLM as Detector,Dot Visualization,Roboflow Custom Metadata,Image Threshold,Model Monitoring Inference Aggregator,Morphological Transformation,Label Visualization,Background Color Visualization,Classification Label Visualization,OCR Model,Roboflow Dataset Upload,Mask Visualization,Pixelate Visualization,Absolute Static Crop,Keypoint Detection Model,Webhook Sink,Grid Visualization,Contrast Equalization,Image Preprocessing,Google Gemini,Relative Static Crop,Stability AI Inpainting,Image Contours,Line Counter Visualization,Stitch Images,OpenAI,Crop Visualization,OpenAI,Llama 3.2 Vision,Icon Visualization,SIFT Comparison,Depth Estimation,Twilio SMS Notification,Single-Label Classification Model,Florence-2 Model,Background Subtraction,LMM For Classification,Multi-Label Classification Model,EasyOCR,CogVLM,Image Slicer,Roboflow Dataset Upload,Email Notification
- outputs:
QR Code Generator,Qwen3-VL,Google Gemini,Image Convert Grayscale,QR Code Detection,Velocity,SIFT Comparison,Detection Offset,Blur Visualization,SIFT,Bounding Box Visualization,Stability AI Outpainting,Trace Visualization,Instance Segmentation Model,Pixel Color Count,Ellipse Visualization,OpenAI,Model Comparison Visualization,Dimension Collapse,Triangle Visualization,SAM 3,Distance Measurement,Byte Tracker,Stability AI Image Generation,Path Deviation,VLM as Detector,CLIP Embedding Model,Florence-2 Model,Single-Label Classification Model,Email Notification,Google Gemini,Continue If,Camera Focus,VLM as Classifier,Corner Visualization,Color Visualization,Twilio SMS/MMS Notification,Multi-Label Classification Model,Image Slicer,OpenAI,Line Counter,Buffer,Property Definition,Detections Transformation,SAM 3,Dot Visualization,Roboflow Custom Metadata,Data Aggregator,Image Threshold,Model Monitoring Inference Aggregator,Time in Zone,Classification Label Visualization,Byte Tracker,OCR Model,Roboflow Dataset Upload,Mask Visualization,Detections List Roll-Up,Pixelate Visualization,Line Counter,Keypoint Detection Model,Detections Consensus,Size Measurement,Webhook Sink,Stability AI Inpainting,Template Matching,Line Counter Visualization,Crop Visualization,OpenAI,Llama 3.2 Vision,Icon Visualization,Clip Comparison,Time in Zone,Background Subtraction,LMM For Classification,SAM 3,Delta Filter,Object Detection Model,CogVLM,Roboflow Dataset Upload,Overlap Filter,Cache Get,Detections Stitch,Detections Combine,Seg Preview,Byte Tracker,YOLO-World Model,Dynamic Crop,First Non Empty Or Default,Identify Changes,Camera Focus,Slack Notification,Keypoint Visualization,Polygon Visualization,Cache Set,Dominant Color,Anthropic Claude,Local File Sink,Qwen2.5-VL,Polygon Zone Visualization,Halo Visualization,LMM,Time in Zone,Circle Visualization,Detections Stabilizer,Google Vision OCR,Motion Detection,Clip Comparison,Cosine Similarity,Detections Classes Replacement,Anthropic Claude,Detections Filter,Object Detection Model,Instance Segmentation Model,Perspective Correction,Perception Encoder Embedding Model,EasyOCR,Reference Path Visualization,CSV Formatter,Stitch OCR Detections,Image Blur,Camera Calibration,SmolVLM2,VLM as Detector,Rate Limiter,Expression,Morphological Transformation,Label Visualization,Background Color Visualization,Keypoint Detection Model,Path Deviation,Dynamic Zone,PTZ Tracking (ONVIF),Absolute Static Crop,Moondream2,Image Preprocessing,Contrast Equalization,Grid Visualization,Google Gemini,Relative Static Crop,OpenAI,Image Contours,Barcode Detection,Stitch Images,JSON Parser,Bounding Rectangle,SIFT Comparison,Gaze Detection,Depth Estimation,Twilio SMS Notification,Single-Label Classification Model,VLM as Classifier,Florence-2 Model,Detections Merge,Identify Outliers,Multi-Label Classification Model,Segment Anything 2 Model,Image Slicer,Email Notification
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v1 has.
Bindings
- input
  - images (image): The image to infer on.
  - prompt (string): Text prompt to the OpenAI model.
  - openai_api_key (Union[string, secret]): Your OpenAI API key.
  - openai_model (string): Model to be used.
  - image_detail (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
- output
  - parent_id (parent_id): Identifier of parent for step output.
  - root_parent_id (parent_id): Identifier of parent for step output.
  - image (image_metadata): Dictionary with image metadata required by supervision.
  - structured_output (dictionary): Dictionary.
  - raw_output (string): String value.
  - * (*): Equivalent of any element.
Example JSON definition of step OpenAI in version v1
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v1",
  "images": "$inputs.image",
  "prompt": "my prompt",
  "openai_api_key": "xxx-xxx",
  "openai_model": "gpt-4o",
  "json_output_format": {
    "count": "number of cats in the picture"
  },
  "image_detail": "auto",
  "max_tokens": 450
}
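Under the hood, a block like this sends the prompt and image to OpenAI's Chat Completions vision API. The sketch below assembles the kind of user message that API expects, using OpenAI's documented `image_url` content part with a `detail` level; the helper function itself is an illustrative assumption, not the block's actual code:

```python
import base64

# Illustrative sketch: build a Chat Completions "user" message carrying the
# text prompt plus a base64-encoded image with the configured detail level.
# The content-part field names follow OpenAI's vision message format.
def build_vision_message(prompt: str, image_bytes: bytes,
                         image_detail: str = "auto") -> dict:
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{encoded}",
                    "detail": image_detail,  # "auto", "low", or "high"
                },
            },
        ],
    }
```

With `json_output_format` set, the prompt would additionally instruct the model to answer as JSON, and the reply is surfaced through the `structured_output` field; without it, the reply lands in `raw_output`.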