Google Gemini
Class: GoogleGeminiBlockV1
Source: inference.core.workflows.core_steps.models.foundation.google_gemini.v1.GoogleGeminiBlockV1
Ask a question to Google's Gemini model with vision capabilities.
You can specify an arbitrary text prompt or use one of the predefined ones. The block supports the following task types:

- Open Prompt (`unconstrained`) - Use any prompt to generate a raw response
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image
- Visual Question Answering (`visual-question-answering`) - Model answers the question you submit in the prompt
- Captioning (short) (`caption`) - Model provides a short description of the image
- Captioning (`detailed-caption`) - Model provides a long description of the image
- Single-Label Classification (`classification`) - Model classifies the image content as one of the provided classes
- Multi-Label Classification (`multi-label-classification`) - Model classifies the image content as one or more of the provided classes
- Unprompted Object Detection (`object-detection`) - Model detects and returns the bounding boxes for prominent objects in the image
- Structured Output Generation (`structured-answering`) - Model returns a JSON response with the specified fields
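For instance, a minimal step configuration for the `classification` task type might look like the sketch below (field names come from the Properties table in this page; the step name, class names, and input selectors are illustrative):

```json
{
    "name": "gemini_classifier",
    "type": "roboflow_core/google_gemini@v1",
    "images": "$inputs.image",
    "task_type": "classification",
    "classes": ["cat", "dog", "other"],
    "api_key": "$inputs.google_api_key",
    "model_version": "gemini-2.5-pro"
}
```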
You need to provide your Google AI API key to use the Gemini model.
WARNING! This block uses the /v1beta API of the Google Gemini model - the implementation may change in the future without guarantee of backward compatibility.
Type identifier
Use the following identifier in the step "type" field: roboflow_core/google_gemini@v1 to add the block as a step in your workflow.
Properties

Name | Type | Description | Refs
---|---|---|---
name | str | Enter a unique identifier for this step. | ❌
task_type | str | Task type to be performed by the model. The value determines the required parameters and the output response. | ❌
prompt | str | Text prompt to the Gemini model. | ✅
output_structure | Dict[str, str] | Dictionary with the structure of the expected JSON response. | ❌
classes | List[str] | List of classes to be used. | ✅
api_key | str | Your Google AI API key. | ✅
model_version | str | Model to be used. | ✅
max_tokens | int | Maximum number of tokens the model can generate in its response. | ❌
temperature | float | Temperature to sample from the model - a value in range 0.0-2.0; the higher, the more random / "creative" the generations are. | ✅
max_concurrent_requests | int | Number of concurrent requests the block can execute when a batch of input images is provided. If not given, the block defaults to the value configured globally in the Workflows Execution Engine. Restrict this if you hit Google Gemini API limits. | ❌
The Refs column marks whether a property can be parametrised with dynamic values available in workflow runtime. See Bindings for more info.
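As an illustration, a property marked ✅ can reference a workflow input instead of a literal value. In the hypothetical sketch below, `prompt` and `api_key` are wired to workflow inputs named `vlm_prompt` and `google_api_key` (both names are assumptions for the example):

```json
{
    "name": "gemini_vqa",
    "type": "roboflow_core/google_gemini@v1",
    "images": "$inputs.image",
    "task_type": "visual-question-answering",
    "prompt": "$inputs.vlm_prompt",
    "api_key": "$inputs.google_api_key",
    "model_version": "gemini-2.5-pro"
}
```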
Available Connections
Compatible Blocks
Check what blocks you can connect to Google Gemini in version v1.
- inputs: Stability AI Outpainting, Size Measurement, CogVLM, Image Slicer, Pixelate Visualization, Webhook Sink, Clip Comparison, Image Threshold, Blur Visualization, VLM as Classifier, Twilio SMS Notification, Image Slicer, Image Blur, Camera Calibration, Morphological Transformation, OpenAI, Polygon Visualization, Dot Visualization, Florence-2 Model, Single-Label Classification Model, Identify Changes, Background Color Visualization, VLM as Detector, Stability AI Image Generation, Multi-Label Classification Model, Roboflow Dataset Upload, Corner Visualization, Halo Visualization, CSV Formatter, Mask Visualization, Trace Visualization, OCR Model, Model Monitoring Inference Aggregator, Camera Focus, Color Visualization, Clip Comparison, LMM, Cosine Similarity, Ellipse Visualization, Model Comparison Visualization, Triangle Visualization, Image Preprocessing, Anthropic Claude, Roboflow Custom Metadata, SIFT, Depth Estimation, Email Notification, Gaze Detection, Dynamic Crop, Line Counter Visualization, Crop Visualization, Image Contours, Dimension Collapse, Google Gemini, Grid Visualization, Contrast Equalization, Instance Segmentation Model, Google Vision OCR, OpenAI, SIFT Comparison, Classification Label Visualization, Roboflow Dataset Upload, Stitch Images, Keypoint Visualization, Absolute Static Crop, OpenAI, Perspective Correction, Polygon Zone Visualization, Keypoint Detection Model, Llama 3.2 Vision, Dynamic Zone, Stitch OCR Detections, Image Convert Grayscale, QR Code Generator, Local File Sink, Circle Visualization, Slack Notification, Icon Visualization, Object Detection Model, Stability AI Inpainting, Florence-2 Model, Bounding Box Visualization, LMM For Classification, EasyOCR, Label Visualization, Reference Path Visualization, Relative Static Crop, Buffer
- outputs: Stability AI Outpainting, VLM as Classifier, Size Measurement, CogVLM, Clip Comparison, Webhook Sink, Image Threshold, Cache Get, VLM as Classifier, Twilio SMS Notification, Moondream2, Image Blur, Detections Consensus, Morphological Transformation, OpenAI, Polygon Visualization, Dot Visualization, Florence-2 Model, Time in Zone, PTZ Tracking (ONVIF), Background Color Visualization, Distance Measurement, VLM as Detector, Stability AI Image Generation, Roboflow Dataset Upload, Corner Visualization, Line Counter, Cache Set, Halo Visualization, Mask Visualization, Detections Classes Replacement, Trace Visualization, Model Monitoring Inference Aggregator, Color Visualization, Clip Comparison, LMM, Detections Stitch, Instance Segmentation Model, Ellipse Visualization, Model Comparison Visualization, Object Detection Model, Time in Zone, Triangle Visualization, Image Preprocessing, Anthropic Claude, Roboflow Custom Metadata, Keypoint Detection Model, Email Notification, Line Counter Visualization, Dynamic Crop, VLM as Detector, Crop Visualization, Segment Anything 2 Model, Contrast Equalization, Perception Encoder Embedding Model, Grid Visualization, Instance Segmentation Model, Google Vision OCR, Time in Zone, OpenAI, SIFT Comparison, Roboflow Dataset Upload, CLIP Embedding Model, Classification Label Visualization, Keypoint Visualization, OpenAI, Perspective Correction, Pixel Color Count, Path Deviation, Polygon Zone Visualization, Keypoint Detection Model, Llama 3.2 Vision, Stitch OCR Detections, YOLO-World Model, QR Code Generator, Local File Sink, Slack Notification, Circle Visualization, Icon Visualization, Object Detection Model, Stability AI Inpainting, Florence-2 Model, Bounding Box Visualization, LMM For Classification, Path Deviation, JSON Parser, Label Visualization, Reference Path Visualization, Google Gemini, Line Counter, Buffer
Input and Output Bindings
The available connections depend on the block's binding kinds. Check what binding kinds Google Gemini in version v1 has.
Bindings

- input
  - images (image): The image to infer on.
  - prompt (string): Text prompt to the Gemini model.
  - classes (list_of_values): List of classes to be used.
  - api_key (Union[secret, string]): Your Google AI API key.
  - model_version (string): Model to be used.
  - temperature (float): Temperature to sample from the model - a value in range 0.0-2.0; the higher, the more random / "creative" the generations are.
- output
  - output (Union[string, language_model_output]): String value if string, or LLM / VLM output if language_model_output.
  - classes (list_of_values): List of values of any type.
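The pairing between a task type and its required parameters can be sketched as a small helper. Note this is a hypothetical convenience function, not part of the inference library; it only assembles a step definition dictionary and checks the dependencies described above (`classes` for the classification tasks, `output_structure` for `structured-answering`).

```python
# Hypothetical helper (not part of the inference library) that assembles a
# Google Gemini step definition and validates task-type/parameter pairing.

TASKS_REQUIRING_CLASSES = {"classification", "multi-label-classification"}
TASKS_REQUIRING_STRUCTURE = {"structured-answering"}


def build_gemini_step(name, task_type, prompt=None, classes=None,
                      output_structure=None, model_version="gemini-2.5-pro"):
    # Classification tasks need a class list; structured answering needs
    # the expected JSON structure - mirroring the Properties table above.
    if task_type in TASKS_REQUIRING_CLASSES and not classes:
        raise ValueError(f"task_type '{task_type}' requires 'classes'")
    if task_type in TASKS_REQUIRING_STRUCTURE and not output_structure:
        raise ValueError(f"task_type '{task_type}' requires 'output_structure'")
    step = {
        "name": name,
        "type": "roboflow_core/google_gemini@v1",
        "images": "$inputs.image",
        "task_type": task_type,
        "api_key": "$inputs.google_api_key",  # assumed workflow input name
        "model_version": model_version,
    }
    # Optional fields are only emitted when provided.
    if prompt is not None:
        step["prompt"] = prompt
    if classes is not None:
        step["classes"] = classes
    if output_structure is not None:
        step["output_structure"] = output_structure
    return step
```

Used this way, a missing required parameter fails fast at workflow-construction time rather than at inference time.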
Example JSON definition of step Google Gemini in version v1:
{
"name": "<your_step_name_here>",
"type": "roboflow_core/google_gemini@v1",
"images": "$inputs.image",
"task_type": "<block_does_not_provide_example>",
"prompt": "my prompt",
"output_structure": {
"my_key": "description"
},
"classes": [
"class-a",
"class-b"
],
"api_key": "xxx-xxx",
"model_version": "gemini-2.5-pro",
"max_tokens": "<block_does_not_provide_example>",
"temperature": "<block_does_not_provide_example>",
"max_concurrent_requests": "<block_does_not_provide_example>"
}