# OpenAI

## v4
Class: OpenAIBlockV4 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v4.OpenAIBlockV4
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-5 and GPT-4o).
You can specify arbitrary text prompts or use predefined ones; the block supports the following prompt types:

- Open Prompt (`unconstrained`): use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`): the model recognizes text in the image
- Visual Question Answering (`visual-question-answering`): the model answers the question you submit in the prompt
- Captioning (short) (`caption`): the model provides a short description of the image
- Captioning (`detailed-caption`): the model provides a long description of the image
- Single-Label Classification (`classification`): the model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`): the model classifies the image content as one or more of the provided classes
- Unprompted Object Detection (`object-detection`): the model detects and returns bounding boxes for prominent objects in the image
- Structured Output Generation (`structured-answering`): the model returns a JSON response with the specified fields
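As an illustration of the mapping above, here is a small sketch of which task-specific fields each task type appears to require. The requirements are inferred from the task descriptions, not read from the block's actual schema:

```python
# Task-specific fields each task type appears to require, inferred from the
# descriptions above (illustrative only -- not the block's actual schema):
EXTRA_FIELDS = {
    "unconstrained": {"prompt"},
    "ocr": set(),
    "visual-question-answering": {"prompt"},
    "caption": set(),
    "detailed-caption": set(),
    "classification": {"classes"},
    "multi-label-classification": {"classes"},
    "object-detection": set(),
    "structured-answering": {"output_structure"},
}

def missing_fields(task_type: str, step: dict) -> set:
    """Return which task-specific fields a step definition still lacks."""
    return EXTRA_FIELDS[task_type] - set(step)
```

For example, a `classification` step defined without a `classes` list would be reported as incomplete by this helper.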
Provide your OpenAI API key or set the value to rf_key:account (or
rf_key:user:<id>) to proxy requests through Roboflow's API.
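The three accepted `api_key` forms described above can be sketched as follows; the classifier helper and its return labels are illustrative, not the block's own validation logic:

```python
import re

# Distinguish the three api_key forms this block accepts, per the text above.
# The helper name and labels are illustrative; the user id form uses a
# placeholder pattern after "rf_key:user:".
def classify_api_key(value: str) -> str:
    if value == "rf_key:account":
        return "roboflow-account-proxy"
    if re.fullmatch(r"rf_key:user:.+", value):
        return "roboflow-user-proxy"
    return "direct-openai-key"
```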
### Type identifier

Use the following identifier in the step "type" field: `roboflow_core/open_ai@v4` to add the block as a step in your workflow.
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique identifier for this step. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
| `prompt` | `str` | Text prompt to the OpenAI model. | ✅ |
| `output_structure` | `Dict[str, str]` | Dictionary describing the structure of the expected JSON response. | ❌ |
| `classes` | `List[str]` | List of classes to be used. | ✅ |
| `api_key` | `str` | Your OpenAI API key. | ✅ |
| `model_version` | `str` | Model to be used. | ✅ |
| `reasoning_effort` | `str` | Controls reasoning effort; reducing it can result in faster responses and fewer tokens. GPT-5.1 and higher models default to 'none' (no reasoning) and support 'none', 'low', 'medium', and 'high'; GPT-5.2 also supports 'xhigh'. GPT-5 models default to 'medium' and support 'minimal', 'low', 'medium', and 'high'. | ✅ |
| `image_detail` | `str` | Detail level of the input image; 'high' indicates a high-resolution image that should be processed at high fidelity. | ✅ |
| `max_tokens` | `int` | Maximum number of tokens the model can generate in its response. If not specified, the model uses its default limit. Minimum value is 16. | ❌ |
| `temperature` | `float` | Sampling temperature, in range 0.0-2.0; the higher the value, the more random/"creative" the generations. | ✅ |
| `max_concurrent_requests` | `int` | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI rate limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections
Compatible Blocks
Check what blocks you can connect to OpenAI in version v4.
- inputs:
Roboflow Dataset Upload,Line Counter Visualization,OCR Model,Image Slicer,Gaze Detection,Instance Segmentation Model,Color Visualization,Ellipse Visualization,Polygon Visualization,Relative Static Crop,Webhook Sink,Trace Visualization,Stitch OCR Detections,Camera Focus,Qwen 3.5 API,OpenAI,Buffer,Size Measurement,Image Threshold,Heatmap Visualization,Florence-2 Model,Halo Visualization,GLM-OCR,Dot Visualization,S3 Sink,Twilio SMS Notification,Model Monitoring Inference Aggregator,Google Gemini,Roboflow Dataset Upload,Dynamic Zone,Clip Comparison,Pixelate Visualization,Twilio SMS/MMS Notification,Polygon Zone Visualization,Motion Detection,Blur Visualization,Background Subtraction,Text Display,CSV Formatter,Stability AI Image Generation,Perspective Correction,Anthropic Claude,Bounding Box Visualization,Depth Estimation,Stability AI Inpainting,Polygon Visualization,SIFT,Roboflow Vision Events,Google Gemini,Label Visualization,Grid Visualization,Qwen3.5-VL,Contrast Equalization,Triangle Visualization,Halo Visualization,Circle Visualization,Mask Visualization,OpenAI,MoonshotAI Kimi,Llama 3.2 Vision,Email Notification,Slack Notification,Object Detection Model,Stability AI Outpainting,Email Notification,Google Gemma API,Google Vision OCR,Image Preprocessing,Google Gemini,EasyOCR,Cosine Similarity,OpenAI,Anthropic Claude,Model Comparison Visualization,Roboflow Custom Metadata,Single-Label Classification Model,VLM As Classifier,Detections List Roll-Up,Stitch Images,Qwen 3.6 API,SIFT Comparison,Morphological Transformation,CogVLM,Crop Visualization,Camera Calibration,Florence-2 Model,Icon Visualization,Local File Sink,Image Contours,Reference Path Visualization,Dimension Collapse,Anthropic Claude,Clip Comparison,VLM As Detector,LMM,Identify Changes,Classification Label Visualization,Image Slicer,Absolute Static Crop,Image Blur,Multi-Label Classification Model,Image Convert Grayscale,OpenAI,Corner Visualization,Dynamic Crop,Keypoint Visualization,QR Code Generator,Camera 
Focus,LMM For Classification,Morphological Transformation,Keypoint Detection Model,Contrast Enhancement,Background Color Visualization,Stitch OCR Detections - outputs:
Roboflow Dataset Upload,Line Counter Visualization,Distance Measurement,Instance Segmentation Model,Color Visualization,Multi-Label Classification Model,Ellipse Visualization,Polygon Visualization,Single-Label Classification Model,Detections Consensus,Detections Classes Replacement,Cache Set,Webhook Sink,Trace Visualization,Stitch OCR Detections,Qwen 3.5 API,Object Detection Model,OpenAI,Buffer,SAM 3,Size Measurement,Image Threshold,Heatmap Visualization,Florence-2 Model,Halo Visualization,Path Deviation,GLM-OCR,Dot Visualization,S3 Sink,Path Deviation,Semantic Segmentation Model,Twilio SMS Notification,Seg Preview,Model Monitoring Inference Aggregator,Google Gemini,Roboflow Dataset Upload,Clip Comparison,VLM As Classifier,Line Counter,Twilio SMS/MMS Notification,Polygon Zone Visualization,Motion Detection,Text Display,Stability AI Image Generation,Perspective Correction,Anthropic Claude,Line Counter,Bounding Box Visualization,Depth Estimation,Stability AI Inpainting,Polygon Visualization,Roboflow Vision Events,VLM As Detector,Google Gemini,Label Visualization,Grid Visualization,Contrast Equalization,Triangle Visualization,Halo Visualization,Circle Visualization,Segment Anything 2 Model,Mask Visualization,OpenAI,MoonshotAI Kimi,Llama 3.2 Vision,Email Notification,Slack Notification,CLIP Embedding Model,Detections Stitch,Object Detection Model,Email Notification,Google Gemma API,Stability AI Outpainting,Google Vision OCR,Google Gemini,Image Preprocessing,Object Detection Model,OpenAI,Anthropic Claude,Time in Zone,Model Comparison Visualization,Roboflow Custom Metadata,YOLO-World Model,Perception Encoder Embedding Model,Instance Segmentation Model,VLM As Classifier,Detections List Roll-Up,Qwen 3.6 API,SIFT Comparison,Morphological Transformation,Instance Segmentation Model,CogVLM,Crop Visualization,Florence-2 Model,Time in Zone,SAM 3,Local File Sink,Icon Visualization,JSON Parser,Keypoint Detection Model,Time in Zone,Reference Path Visualization,Anthropic Claude,Clip 
Comparison,VLM As Detector,LMM,Pixel Color Count,Classification Label Visualization,Image Blur,SAM 3,OpenAI,Corner Visualization,Keypoint Detection Model,Dynamic Crop,Keypoint Visualization,Moondream2,QR Code Generator,LMM For Classification,Morphological Transformation,Keypoint Detection Model,Background Color Visualization,PTZ Tracking (ONVIF),Stitch OCR Detections,Cache Get
### Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v4 has.

Bindings

- input:
    - `images` (image): The image to infer on.
    - `prompt` (string): Text prompt to the OpenAI model.
    - `classes` (list_of_values): List of classes to be used.
    - `api_key` (Union[secret, string, ROBOFLOW_MANAGED_KEY]): Your OpenAI API key.
    - `model_version` (string): Model to be used.
    - `reasoning_effort` (string): Controls reasoning effort; reducing it can result in faster responses and fewer tokens. GPT-5.1 and higher models default to 'none' (no reasoning) and support 'none', 'low', 'medium', and 'high'; GPT-5.2 also supports 'xhigh'. GPT-5 models default to 'medium' and support 'minimal', 'low', 'medium', and 'high'.
    - `image_detail` (string): Detail level of the input image; 'high' indicates a high-resolution image that should be processed at high fidelity.
    - `temperature` (float): Sampling temperature, in range 0.0-2.0; the higher the value, the more random/"creative" the generations.
- output:
    - `output` (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
    - `classes` (list_of_values): List of values of any type.
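When `task_type` is `structured-answering`, the `output` field carries the JSON response described above. A minimal parsing sketch follows; the raw payload shown is a made-up example, not an actual model response:

```python
import json

# Made-up example of what the block's `output` might contain for a
# "structured-answering" task with output_structure {"my_key": "description"}:
raw_output = '{"my_key": "a red forklift near the loading dock"}'

parsed = json.loads(raw_output)
description = parsed["my_key"]
```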
Example JSON definition of step OpenAI in version v4:

```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v4",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-5.1",
  "reasoning_effort": "<block_does_not_provide_example>",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
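To place the step in context, a full workflow specification embedding it might look like the sketch below. The surrounding `version`/`inputs`/`steps`/`outputs` keys follow the general Workflows specification; treat the exact field values as illustrative rather than validated:

```python
import json

# Sketch of a workflow specification wrapping the OpenAI v4 step for a
# classification task. Step name and field values are illustrative.
workflow = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "name": "gpt",
            "type": "roboflow_core/open_ai@v4",
            "images": "$inputs.image",
            "task_type": "classification",
            "prompt": "Classify this image",
            "classes": ["class-a", "class-b"],
            "api_key": "rf_key:account",  # proxy requests through Roboflow's API
            "model_version": "gpt-5.1",
        }
    ],
    "outputs": [
        {"type": "JsonField", "name": "result", "selector": "$steps.gpt.output"}
    ],
}

spec_json = json.dumps(workflow, indent=2)
```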
## v3
Class: OpenAIBlockV3 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v3.OpenAIBlockV3
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-5 and GPT-4o).
You can specify arbitrary text prompts or use predefined ones; the block supports the following prompt types:

- Open Prompt (`unconstrained`): use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`): the model recognizes text in the image
- Visual Question Answering (`visual-question-answering`): the model answers the question you submit in the prompt
- Captioning (short) (`caption`): the model provides a short description of the image
- Captioning (`detailed-caption`): the model provides a long description of the image
- Single-Label Classification (`classification`): the model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`): the model classifies the image content as one or more of the provided classes
- Structured Output Generation (`structured-answering`): the model returns a JSON response with the specified fields
Provide your OpenAI API key or set the value to rf_key:account (or
rf_key:user:<id>) to proxy requests through Roboflow's API.
### Type identifier

Use the following identifier in the step "type" field: `roboflow_core/open_ai@v3` to add the block as a step in your workflow.
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique identifier for this step. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
| `prompt` | `str` | Text prompt to the OpenAI model. | ✅ |
| `output_structure` | `Dict[str, str]` | Dictionary describing the structure of the expected JSON response. | ❌ |
| `classes` | `List[str]` | List of classes to be used. | ✅ |
| `api_key` | `str` | Your OpenAI API key. | ✅ |
| `model_version` | `str` | Model to be used. | ✅ |
| `image_detail` | `str` | Detail level of the input image; 'high' indicates a high-resolution image that should be processed at high fidelity. | ✅ |
| `max_tokens` | `int` | Maximum number of tokens the model can generate in its response. | ❌ |
| `temperature` | `float` | Sampling temperature, in range 0.0-2.0; the higher the value, the more random/"creative" the generations. | ✅ |
| `max_concurrent_requests` | `int` | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI rate limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections
Compatible Blocks
Check what blocks you can connect to OpenAI in version v3.
- inputs:
Roboflow Dataset Upload,Line Counter Visualization,OCR Model,Image Slicer,Gaze Detection,Instance Segmentation Model,Color Visualization,Ellipse Visualization,Polygon Visualization,Relative Static Crop,Webhook Sink,Trace Visualization,Stitch OCR Detections,Camera Focus,Qwen 3.5 API,OpenAI,Buffer,Size Measurement,Image Threshold,Heatmap Visualization,Florence-2 Model,Halo Visualization,GLM-OCR,Dot Visualization,S3 Sink,Twilio SMS Notification,Model Monitoring Inference Aggregator,Google Gemini,Roboflow Dataset Upload,Dynamic Zone,Clip Comparison,Pixelate Visualization,Twilio SMS/MMS Notification,Polygon Zone Visualization,Motion Detection,Blur Visualization,Background Subtraction,Text Display,CSV Formatter,Stability AI Image Generation,Perspective Correction,Anthropic Claude,Bounding Box Visualization,Depth Estimation,Stability AI Inpainting,Polygon Visualization,SIFT,Roboflow Vision Events,Google Gemini,Label Visualization,Grid Visualization,Qwen3.5-VL,Contrast Equalization,Triangle Visualization,Halo Visualization,Circle Visualization,Mask Visualization,OpenAI,MoonshotAI Kimi,Llama 3.2 Vision,Email Notification,Slack Notification,Object Detection Model,Stability AI Outpainting,Email Notification,Google Gemma API,Google Vision OCR,Image Preprocessing,Google Gemini,EasyOCR,Cosine Similarity,OpenAI,Anthropic Claude,Model Comparison Visualization,Roboflow Custom Metadata,Single-Label Classification Model,VLM As Classifier,Detections List Roll-Up,Stitch Images,Qwen 3.6 API,SIFT Comparison,Morphological Transformation,CogVLM,Crop Visualization,Camera Calibration,Florence-2 Model,Icon Visualization,Local File Sink,Image Contours,Reference Path Visualization,Dimension Collapse,Anthropic Claude,Clip Comparison,VLM As Detector,LMM,Identify Changes,Classification Label Visualization,Image Slicer,Absolute Static Crop,Image Blur,Multi-Label Classification Model,Image Convert Grayscale,OpenAI,Corner Visualization,Dynamic Crop,Keypoint Visualization,QR Code Generator,Camera 
Focus,LMM For Classification,Morphological Transformation,Keypoint Detection Model,Contrast Enhancement,Background Color Visualization,Stitch OCR Detections - outputs:
Roboflow Dataset Upload,Line Counter Visualization,Distance Measurement,Instance Segmentation Model,Color Visualization,Multi-Label Classification Model,Ellipse Visualization,Polygon Visualization,Single-Label Classification Model,Detections Consensus,Detections Classes Replacement,Cache Set,Webhook Sink,Trace Visualization,Stitch OCR Detections,Qwen 3.5 API,Object Detection Model,OpenAI,Buffer,SAM 3,Size Measurement,Image Threshold,Heatmap Visualization,Florence-2 Model,Halo Visualization,Path Deviation,GLM-OCR,Dot Visualization,S3 Sink,Path Deviation,Semantic Segmentation Model,Twilio SMS Notification,Seg Preview,Model Monitoring Inference Aggregator,Google Gemini,Roboflow Dataset Upload,Clip Comparison,VLM As Classifier,Line Counter,Twilio SMS/MMS Notification,Polygon Zone Visualization,Motion Detection,Text Display,Stability AI Image Generation,Perspective Correction,Anthropic Claude,Line Counter,Bounding Box Visualization,Depth Estimation,Stability AI Inpainting,Polygon Visualization,Roboflow Vision Events,VLM As Detector,Google Gemini,Label Visualization,Grid Visualization,Contrast Equalization,Triangle Visualization,Halo Visualization,Circle Visualization,Segment Anything 2 Model,Mask Visualization,OpenAI,MoonshotAI Kimi,Llama 3.2 Vision,Email Notification,Slack Notification,CLIP Embedding Model,Detections Stitch,Object Detection Model,Email Notification,Google Gemma API,Stability AI Outpainting,Google Vision OCR,Google Gemini,Image Preprocessing,Object Detection Model,OpenAI,Anthropic Claude,Time in Zone,Model Comparison Visualization,Roboflow Custom Metadata,YOLO-World Model,Perception Encoder Embedding Model,Instance Segmentation Model,VLM As Classifier,Detections List Roll-Up,Qwen 3.6 API,SIFT Comparison,Morphological Transformation,Instance Segmentation Model,CogVLM,Crop Visualization,Florence-2 Model,Time in Zone,SAM 3,Local File Sink,Icon Visualization,JSON Parser,Keypoint Detection Model,Time in Zone,Reference Path Visualization,Anthropic Claude,Clip 
Comparison,VLM As Detector,LMM,Pixel Color Count,Classification Label Visualization,Image Blur,SAM 3,OpenAI,Corner Visualization,Keypoint Detection Model,Dynamic Crop,Keypoint Visualization,Moondream2,QR Code Generator,LMM For Classification,Morphological Transformation,Keypoint Detection Model,Background Color Visualization,PTZ Tracking (ONVIF),Stitch OCR Detections,Cache Get
### Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v3 has.

Bindings

- input:
    - `images` (image): The image to infer on.
    - `prompt` (string): Text prompt to the OpenAI model.
    - `classes` (list_of_values): List of classes to be used.
    - `api_key` (Union[secret, string, ROBOFLOW_MANAGED_KEY]): Your OpenAI API key.
    - `model_version` (string): Model to be used.
    - `image_detail` (string): Detail level of the input image; 'high' indicates a high-resolution image that should be processed at high fidelity.
    - `temperature` (float): Sampling temperature, in range 0.0-2.0; the higher the value, the more random/"creative" the generations.
- output:
    - `output` (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
    - `classes` (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v3:

```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v3",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-5",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
## v2
Class: OpenAIBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v2.OpenAIBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT models with vision capabilities (including GPT-4o and GPT-5).
You can specify arbitrary text prompts or use predefined ones; the block supports the following prompt types:

- Open Prompt (`unconstrained`): use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`): the model recognizes text in the image
- Visual Question Answering (`visual-question-answering`): the model answers the question you submit in the prompt
- Captioning (short) (`caption`): the model provides a short description of the image
- Captioning (`detailed-caption`): the model provides a long description of the image
- Single-Label Classification (`classification`): the model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`): the model classifies the image content as one or more of the provided classes
- Structured Output Generation (`structured-answering`): the model returns a JSON response with the specified fields
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
### Type identifier

Use the following identifier in the step "type" field: `roboflow_core/open_ai@v2` to add the block as a step in your workflow.
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique identifier for this step. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌ |
| `prompt` | `str` | Text prompt to the OpenAI model. | ✅ |
| `output_structure` | `Dict[str, str]` | Dictionary describing the structure of the expected JSON response. | ❌ |
| `classes` | `List[str]` | List of classes to be used. | ✅ |
| `api_key` | `str` | Your OpenAI API key. | ✅ |
| `model_version` | `str` | Model to be used. | ✅ |
| `image_detail` | `str` | Detail level of the input image; 'high' indicates a high-resolution image that should be processed at high fidelity. | ✅ |
| `max_tokens` | `int` | Maximum number of tokens the model can generate in its response. | ❌ |
| `temperature` | `float` | Sampling temperature, in range 0.0-2.0; the higher the value, the more random/"creative" the generations. | ✅ |
| `max_concurrent_requests` | `int` | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit OpenAI rate limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections
Compatible Blocks
Check what blocks you can connect to OpenAI in version v2.
- inputs:
Roboflow Dataset Upload,Line Counter Visualization,OCR Model,Image Slicer,Gaze Detection,Instance Segmentation Model,Color Visualization,Ellipse Visualization,Polygon Visualization,Relative Static Crop,Webhook Sink,Trace Visualization,Stitch OCR Detections,Camera Focus,Qwen 3.5 API,OpenAI,Buffer,Size Measurement,Image Threshold,Heatmap Visualization,Florence-2 Model,Halo Visualization,GLM-OCR,Dot Visualization,S3 Sink,Twilio SMS Notification,Model Monitoring Inference Aggregator,Google Gemini,Roboflow Dataset Upload,Dynamic Zone,Clip Comparison,Pixelate Visualization,Twilio SMS/MMS Notification,Polygon Zone Visualization,Motion Detection,Blur Visualization,Background Subtraction,Text Display,CSV Formatter,Stability AI Image Generation,Perspective Correction,Anthropic Claude,Bounding Box Visualization,Depth Estimation,Stability AI Inpainting,Polygon Visualization,SIFT,Roboflow Vision Events,Google Gemini,Label Visualization,Grid Visualization,Qwen3.5-VL,Contrast Equalization,Triangle Visualization,Halo Visualization,Circle Visualization,Mask Visualization,OpenAI,MoonshotAI Kimi,Llama 3.2 Vision,Email Notification,Slack Notification,Object Detection Model,Stability AI Outpainting,Email Notification,Google Gemma API,Google Vision OCR,Image Preprocessing,Google Gemini,EasyOCR,Cosine Similarity,OpenAI,Anthropic Claude,Model Comparison Visualization,Roboflow Custom Metadata,Single-Label Classification Model,VLM As Classifier,Detections List Roll-Up,Stitch Images,Qwen 3.6 API,SIFT Comparison,Morphological Transformation,CogVLM,Crop Visualization,Camera Calibration,Florence-2 Model,Icon Visualization,Local File Sink,Image Contours,Reference Path Visualization,Dimension Collapse,Anthropic Claude,Clip Comparison,VLM As Detector,LMM,Identify Changes,Classification Label Visualization,Image Slicer,Absolute Static Crop,Image Blur,Multi-Label Classification Model,Image Convert Grayscale,OpenAI,Corner Visualization,Dynamic Crop,Keypoint Visualization,QR Code Generator,Camera 
Focus,LMM For Classification,Morphological Transformation,Keypoint Detection Model,Contrast Enhancement,Background Color Visualization,Stitch OCR Detections - outputs:
Roboflow Dataset Upload,Line Counter Visualization,Distance Measurement,Instance Segmentation Model,Color Visualization,Multi-Label Classification Model,Ellipse Visualization,Polygon Visualization,Single-Label Classification Model,Detections Consensus,Detections Classes Replacement,Cache Set,Webhook Sink,Trace Visualization,Stitch OCR Detections,Qwen 3.5 API,Object Detection Model,OpenAI,Buffer,SAM 3,Size Measurement,Image Threshold,Heatmap Visualization,Florence-2 Model,Halo Visualization,Path Deviation,GLM-OCR,Dot Visualization,S3 Sink,Path Deviation,Semantic Segmentation Model,Twilio SMS Notification,Seg Preview,Model Monitoring Inference Aggregator,Google Gemini,Roboflow Dataset Upload,Clip Comparison,VLM As Classifier,Line Counter,Twilio SMS/MMS Notification,Polygon Zone Visualization,Motion Detection,Text Display,Stability AI Image Generation,Perspective Correction,Anthropic Claude,Line Counter,Bounding Box Visualization,Depth Estimation,Stability AI Inpainting,Polygon Visualization,Roboflow Vision Events,VLM As Detector,Google Gemini,Label Visualization,Grid Visualization,Contrast Equalization,Triangle Visualization,Halo Visualization,Circle Visualization,Segment Anything 2 Model,Mask Visualization,OpenAI,MoonshotAI Kimi,Llama 3.2 Vision,Email Notification,Slack Notification,CLIP Embedding Model,Detections Stitch,Object Detection Model,Email Notification,Google Gemma API,Stability AI Outpainting,Google Vision OCR,Google Gemini,Image Preprocessing,Object Detection Model,OpenAI,Anthropic Claude,Time in Zone,Model Comparison Visualization,Roboflow Custom Metadata,YOLO-World Model,Perception Encoder Embedding Model,Instance Segmentation Model,VLM As Classifier,Detections List Roll-Up,Qwen 3.6 API,SIFT Comparison,Morphological Transformation,Instance Segmentation Model,CogVLM,Crop Visualization,Florence-2 Model,Time in Zone,SAM 3,Local File Sink,Icon Visualization,JSON Parser,Keypoint Detection Model,Time in Zone,Reference Path Visualization,Anthropic Claude,Clip 
Comparison,VLM As Detector,LMM,Pixel Color Count,Classification Label Visualization,Image Blur,SAM 3,OpenAI,Corner Visualization,Keypoint Detection Model,Dynamic Crop,Keypoint Visualization,Moondream2,QR Code Generator,LMM For Classification,Morphological Transformation,Keypoint Detection Model,Background Color Visualization,PTZ Tracking (ONVIF),Stitch OCR Detections,Cache Get
### Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v2 has.

Bindings

- input:
    - `images` (image): The image to infer on.
    - `prompt` (string): Text prompt to the OpenAI model.
    - `classes` (list_of_values): List of classes to be used.
    - `api_key` (Union[secret, string]): Your OpenAI API key.
    - `model_version` (string): Model to be used.
    - `image_detail` (string): Detail level of the input image; 'high' indicates a high-resolution image that should be processed at high fidelity.
    - `temperature` (float): Sampling temperature, in range 0.0-2.0; the higher the value, the more random/"creative" the generations.
- output:
    - `output` (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
    - `classes` (list_of_values): List of values of any type.
Example JSON definition of step OpenAI in version v2:

```json
{
  "name": "<your_step_name_here>",
  "type": "roboflow_core/open_ai@v2",
  "images": "$inputs.image",
  "task_type": "<block_does_not_provide_example>",
  "prompt": "my prompt",
  "output_structure": {
    "my_key": "description"
  },
  "classes": [
    "class-a",
    "class-b"
  ],
  "api_key": "xxx-xxx",
  "model_version": "gpt-4o",
  "image_detail": "auto",
  "max_tokens": "<block_does_not_provide_example>",
  "temperature": "<block_does_not_provide_example>",
  "max_concurrent_requests": "<block_does_not_provide_example>"
}
```
## v1
Class: OpenAIBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.openai.v1.OpenAIBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Ask a question to OpenAI's GPT-4 with Vision model.
You can specify arbitrary text prompts to the OpenAIBlock.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
This model was previously part of the LMM block.
### Type identifier

Use the following identifier in the step "type" field: `roboflow_core/open_ai@v1` to add the block as a step in your workflow.
### Properties

| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Unique identifier for this step. | ❌ |
| `prompt` | `str` | Text prompt to the OpenAI model. | ✅ |
| `openai_api_key` | `str` | Your OpenAI API key. | ✅ |
| `openai_model` | `str` | Model to be used. | ✅ |
| `json_output_format` | `Dict[str, str]` | Dictionary that maps the name of each requested output field to its description. | ❌ |
| `image_detail` | `str` | Detail level of the input image; 'high' indicates a high-resolution image that should be processed at high fidelity. | ✅ |
| `max_tokens` | `int` | Maximum number of tokens the model can generate in its response. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
### Available Connections
Compatible Blocks
Check what blocks you can connect to OpenAI in version v1.
- inputs:
Roboflow Dataset Upload,Line Counter Visualization,OCR Model,Image Slicer,Instance Segmentation Model,Color Visualization,Ellipse Visualization,Polygon Visualization,Relative Static Crop,Webhook Sink,Trace Visualization,Stitch OCR Detections,Camera Focus,Qwen 3.5 API,OpenAI,Image Threshold,Heatmap Visualization,Florence-2 Model,Halo Visualization,GLM-OCR,Dot Visualization,S3 Sink,Twilio SMS Notification,Model Monitoring Inference Aggregator,Google Gemini,Roboflow Dataset Upload,Pixelate Visualization,Twilio SMS/MMS Notification,Polygon Zone Visualization,Blur Visualization,Background Subtraction,Text Display,CSV Formatter,Stability AI Image Generation,Perspective Correction,Anthropic Claude,Bounding Box Visualization,Depth Estimation,Stability AI Inpainting,Polygon Visualization,SIFT,Roboflow Vision Events,Google Gemini,Label Visualization,Grid Visualization,Qwen3.5-VL,Contrast Equalization,Triangle Visualization,Halo Visualization,Circle Visualization,Mask Visualization,OpenAI,MoonshotAI Kimi,Llama 3.2 Vision,Email Notification,Slack Notification,Object Detection Model,Stability AI Outpainting,Email Notification,Google Gemma API,Google Vision OCR,Image Preprocessing,Google Gemini,EasyOCR,OpenAI,Anthropic Claude,Model Comparison Visualization,Roboflow Custom Metadata,Single-Label Classification Model,VLM As Classifier,Stitch Images,Qwen 3.6 API,SIFT Comparison,Morphological Transformation,CogVLM,Crop Visualization,Camera Calibration,Florence-2 Model,Icon Visualization,Local File Sink,Image Contours,Reference Path Visualization,Anthropic Claude,Clip Comparison,VLM As Detector,LMM,Classification Label Visualization,Image Slicer,Absolute Static Crop,Image Blur,Multi-Label Classification Model,Image Convert Grayscale,OpenAI,Corner Visualization,Dynamic Crop,Keypoint Visualization,QR Code Generator,Camera Focus,LMM For Classification,Morphological Transformation,Keypoint Detection Model,Contrast Enhancement,Background Color Visualization,Stitch OCR Detections - outputs:
Input and Output Bindings¶
The available connections depend on the block's binding kinds. The bindings that
OpenAI in version v4 accepts and produces are listed below.
Bindings
- input
    - `images` (`image`): The image to infer on.
    - `prompt` (`string`): Text prompt to the OpenAI model.
    - `openai_api_key` (`Union[secret, string]`): Your OpenAI API key.
    - `openai_model` (`string`): Model to be used.
    - `image_detail` (`string`): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
- output
    - `parent_id` (`parent_id`): Identifier of parent for step output.
    - `root_parent_id` (`parent_id`): Identifier of parent for step output.
    - `image` (`image_metadata`): Dictionary with image metadata required by supervision.
    - `structured_output` (`dictionary`): Dictionary.
    - `raw_output` (`string`): String value.
    - `*` (`*`): Equivalent of any element.
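Downstream steps and workflow-level outputs reference these fields with `$steps.<step_name>.<field>` selectors. A minimal sketch of building such selectors in Python (the step name `open_ai` and the `JsonField` output wrapper are illustrative assumptions, not taken from this page):

```python
# Hypothetical step name used for illustration.
step_name = "open_ai"

# Each selector points at one of the output bindings listed above.
selectors = {
    field: f"$steps.{step_name}.{field}"
    for field in ("structured_output", "raw_output", "image")
}

# A workflow-level output entry exposing the raw model response
# ("JsonField" is an assumed wrapper type, shown for illustration only).
workflow_output = {
    "type": "JsonField",
    "name": "model_answer",
    "selector": selectors["raw_output"],
}
```

The same selector pattern can feed another step's input instead of a workflow output.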
Example JSON definition of step OpenAI in version v4
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/open_ai@v4",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "openai_api_key": "xxx-xxx",
    "openai_model": "gpt-4o",
    "output_structure": {
        "count": "number of cats in the picture"
    },
    "image_detail": "auto",
    "max_tokens": 450
}
```
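A step definition like the one above is typically embedded in a full workflow specification. The sketch below builds such a specification in Python; the top-level wrapper keys (`version`, `inputs`, `steps`, `outputs`) and the input/output entry types are assumptions based on common Roboflow workflow definitions, not details stated on this page:

```python
import json

# Step definition mirroring the JSON example above; the API key is taken
# from a workflow parameter so the secret stays out of the saved spec.
openai_step = {
    "name": "open_ai",
    "type": "roboflow_core/open_ai@v4",
    "images": "$inputs.image",
    "prompt": "my prompt",
    "openai_api_key": "$inputs.openai_api_key",
    "openai_model": "gpt-4o",
    "output_structure": {"count": "number of cats in the picture"},
    "image_detail": "auto",
    "max_tokens": 450,
}

# Assumed workflow wrapper shape (illustrative, verify against your runner).
workflow_specification = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "openai_api_key"},
    ],
    "steps": [openai_step],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.open_ai.structured_output",
        }
    ],
}

# Serialize for storage or submission to a workflows runner.
spec_json = json.dumps(workflow_specification, indent=2)
```

Passing the key as `$inputs.openai_api_key` (rather than inlining it) keeps the specification safe to commit or share.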