OpenAI¶
Version v2¶
Ask a question to OpenAI's GPT-4 with Vision model.
You can specify arbitrary text prompts or predefined ones; the block supports the following prompt types:
- unconstrained - any arbitrary prompt you like
- ocr - predefined prompt to recognise text from an image
- visual-question-answering - your prompt is expected to provide a question and will be wrapped into a structure suited for the VQA task
- caption - predefined prompt to generate a short caption of the image
- detailed-caption - predefined prompt to generate an elaborated caption of the image
- classification - predefined prompt to generate multi-class classification output (that can be parsed with the VLM as Classifier block)
- multi-label-classification - predefined prompt to generate multi-label classification output (that can be parsed with the VLM as Classifier block)
- structured-answering - your input defines expected JSON output fields that can be parsed with the JSON Parser block
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
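For example, a minimal sketch of a step configured for visual-question-answering might look like the following (step name, input bindings and question are illustrative):

```json
{
  "name": "open_ai_vqa",
  "type": "roboflow_core/open_ai@v2",
  "images": "$inputs.image",
  "task_type": "visual-question-answering",
  "prompt": "How many people are wearing helmets?",
  "api_key": "$inputs.open_ai_api_key"
}
```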
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/open_ai@v2 to add the block as a step in your workflow.
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | The unique name of this step. | ❌ |
task_type | str | Task type to be performed by the model. The value of this parameter determines the set of fields that are required. For unconstrained and visual-question-answering, the prompt parameter must be provided. For structured-answering, output_structure must be provided. For classification and multi-label-classification, classes must be filled. ocr, caption and detailed-caption do not require any additional parameter. | ❌ |
prompt | str | Text prompt to the OpenAI model. | ✅ |
output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌ |
classes | List[str] | List of classes to be used. | ✅ |
api_key | str | Your OpenAI API key. | ✅ |
model_version | str | Model to be used. | ✅ |
image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
temperature | float | Temperature to sample from the model - a value in range 0.0-2.0; the higher the value, the more random / "creative" the generations are. | ✅ |
max_concurrent_requests | int | Number of concurrent requests that can be executed by the block when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Please restrict this if you hit OpenAI limits. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
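For instance, properties marked ✅ can be bound to workflow inputs rather than hard-coded (input names below are illustrative):

```json
{
  "prompt": "$inputs.prompt",
  "api_key": "$inputs.open_ai_api_key",
  "temperature": "$inputs.temperature"
}
```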
Available Connections¶
Check what blocks you can connect to OpenAI in version v2.
- inputs: Label Visualization, Crop Visualization, Mask Visualization, Blur Visualization, Image Contours, Bounding Box Visualization, Image Convert Grayscale, Camera Focus, Dot Visualization, Color Visualization, Corner Visualization, Circle Visualization, Perspective Correction, Image Slicer, Triangle Visualization, Relative Static Crop, Absolute Static Crop, Halo Visualization, Background Color Visualization, SIFT, Pixelate Visualization, Polygon Visualization, Dynamic Crop, Image Blur, Ellipse Visualization, Image Threshold
- outputs: JSON Parser, VLM as Detector, Roboflow Custom Metadata, VLM as Classifier, Perspective Correction
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v2 has.
Bindings
- input
    - images (image): The image to infer on.
    - prompt (string): Text prompt to the OpenAI model.
    - classes (list_of_values): List of classes to be used.
    - api_key (string): Your OpenAI API key.
    - model_version (string): Model to be used.
    - image_detail (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
    - temperature (float): Temperature to sample from the model - a value in range 0.0-2.0; the higher the value, the more random / "creative" the generations are.
- output
    - output (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
    - classes (list_of_values): List of values of any type.
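As a sketch, a downstream step or a workflow output could consume these bindings via step selectors (the step name open_ai is illustrative):

```json
{
  "type": "JsonField",
  "name": "model_answer",
  "selector": "$steps.open_ai.output"
}
```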
Example JSON definition of step OpenAI in version v2:

```json
{
"name": "<your_step_name_here>",
"type": "roboflow_core/open_ai@v2",
"images": "$inputs.image",
"task_type": "<block_does_not_provide_example>",
"prompt": "my prompt",
"output_structure": {
"my_key": "description"
},
"classes": [
"class-a",
"class-b"
],
"api_key": "xxx-xxx",
"model_version": "gpt-4o",
"image_detail": "auto",
"max_tokens": "<block_does_not_provide_example>",
"temperature": "<block_does_not_provide_example>",
"max_concurrent_requests": "<block_does_not_provide_example>"
}
```
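For structured-answering, the block's output is typically chained into a JSON Parser block. A minimal sketch, assuming a step named open_ai and the JSON Parser block's raw_json / expected_fields properties (verify these names against that block's documentation):

```json
{
  "name": "parser",
  "type": "roboflow_core/json_parser@v1",
  "raw_json": "$steps.open_ai.output",
  "expected_fields": ["my_key"]
}
```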
Version v1¶
Ask a question to OpenAI's GPT-4 with Vision model.
You can specify arbitrary text prompts to the OpenAI block.
You need to provide your OpenAI API key to use the GPT-4 with Vision model.
This model was previously part of the LMM block.
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/open_ai@v1 to add the block as a step in your workflow.
Properties¶
Name | Type | Description | Refs |
---|---|---|---|
name | str | The unique name of this step. | ❌ |
prompt | str | Text prompt to the OpenAI model. | ✅ |
openai_api_key | str | Your OpenAI API key. | ✅ |
openai_model | str | Model to be used. | ✅ |
json_output_format | Dict[str, str] | Holds a dictionary that maps the name of a requested output field to its description. | ❌ |
image_detail | str | Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity. | ✅ |
max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Check what blocks you can connect to OpenAI in version v1.
- inputs: Label Visualization, Crop Visualization, Mask Visualization, Blur Visualization, Image Contours, Bounding Box Visualization, Image Convert Grayscale, Camera Focus, Dot Visualization, Color Visualization, Corner Visualization, Circle Visualization, Perspective Correction, Image Slicer, Triangle Visualization, Relative Static Crop, Absolute Static Crop, Halo Visualization, Background Color Visualization, SIFT, Pixelate Visualization, Polygon Visualization, Dynamic Crop, Image Blur, Ellipse Visualization, Image Threshold
- outputs: Template Matching, Segment Anything 2 Model, Blur Visualization, Image Convert Grayscale, VLM as Detector, Circle Visualization, CogVLM, Absolute Static Crop, LMM For Classification, Background Color Visualization, YOLO-World Model, Detections Transformation, Polygon Visualization, Image Threshold, OCR Model, Detections Classes Replacement, Image Contours, Bounding Box Visualization, Object Detection Model, Roboflow Custom Metadata, Dimension Collapse, Clip Comparison, First Non Empty Or Default, Detections Filter, Instance Segmentation Model, Halo Visualization, SIFT, Roboflow Dataset Upload, Google Gemini, Dynamic Crop, Image Slicer, Detection Offset, Roboflow Dataset Upload, Detections Stitch, Multi-Label Classification Model, Barcode Detection, Camera Focus, Dot Visualization, Corner Visualization, OpenAI, Triangle Visualization, Relative Static Crop, Pixel Color Count, Pixelate Visualization, JSON Parser, Dynamic Zone, Detections Consensus, Label Visualization, Crop Visualization, Mask Visualization, LMM, Property Definition, QR Code Detection, Color Visualization, VLM as Classifier, Perspective Correction, OpenAI, Keypoint Detection Model, Clip Comparison, Anthropic Claude, Dominant Color, Continue If, Expression, SIFT Comparison, Image Blur, Ellipse Visualization, Single-Label Classification Model
The available connections depend on the block's binding kinds. Check what binding kinds OpenAI in version v1 has.
Bindings
- input
    - images (image): The image to infer on.
    - prompt (string): Text prompt to the OpenAI model.
    - openai_api_key (string): Your OpenAI API key.
    - openai_model (string): Model to be used.
    - image_detail (string): Indicates the image's quality, with 'high' suggesting it is of high resolution and should be processed or displayed with high fidelity.
- output
    - parent_id (parent_id): Identifier of parent for step output.
    - root_parent_id (parent_id): Identifier of parent for step output.
    - image (image_metadata): Dictionary with image metadata required by supervision.
    - structured_output (dictionary): Dictionary.
    - raw_output (string): String value.
    - * (*): Equivalent of any element.
Example JSON definition of step OpenAI in version v1:

```json
{
"name": "<your_step_name_here>",
"type": "roboflow_core/open_ai@v1",
"images": "$inputs.image",
"prompt": "my prompt",
"openai_api_key": "xxx-xxx",
"openai_model": "gpt-4o",
"json_output_format": {
"count": "number of cats in the picture"
},
"image_detail": "auto",
"max_tokens": 450
}
```
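A downstream step or workflow output can then reference this step's results with selectors such as the following sketch (which output field to pick depends on whether json_output_format was set):

```json
{
  "type": "JsonField",
  "name": "cats_count",
  "selector": "$steps.<your_step_name_here>.structured_output"
}
```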