Florence-2 Model¶
v2¶
Class: Florence2BlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.florence2.v2.Florence2BlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Dedicated inference server required (GPU recommended) - you may want to use dedicated deployment
This Workflow block introduces Florence-2, a Visual Language Model (VLM) capable of performing a wide range of tasks, including:
- Object Detection
- Instance Segmentation
- Image Captioning
- Optical Character Recognition (OCR)
- and more...
Below is a comprehensive list of tasks supported by the model, along with descriptions of how to utilize their outputs within the Workflows ecosystem:
Task Descriptions:
- Custom Prompt (`custom`) - Use a free-form prompt to generate a response. Useful with fine-tuned models.
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image.
- Text Detection & Recognition (OCR) (`ocr-with-text-detection`) - Model detects text regions in the image, then performs OCR on each detected region.
- Captioning (short) (`caption`) - Model provides a short description of the image.
- Captioning (`detailed-caption`) - Model provides a long description of the image.
- Captioning (long) (`more-detailed-caption`) - Model provides a very long description of the image.
- Unprompted Object Detection (`object-detection`) - Model detects and returns bounding boxes for prominent objects in the image.
- Object Detection (`open-vocabulary-object-detection`) - Model detects and returns bounding boxes for the provided classes.
- Detection & Captioning (`object-detection-and-caption`) - Model detects prominent objects and captions them.
- Prompted Object Detection (`phrase-grounded-object-detection`) - Based on the text prompt, model detects objects matching the descriptions.
- Prompted Instance Segmentation (`phrase-grounded-instance-segmentation`) - Based on the text prompt, model segments objects matching the descriptions.
- Segment Bounding Box (`detection-grounded-instance-segmentation`) - Model segments the object in the provided bounding box into a polygon.
- Classification of Bounding Box (`detection-grounded-classification`) - Model classifies the object inside the provided bounding box.
- Captioning of Bounding Box (`detection-grounded-caption`) - Model captions the object in the provided bounding box.
- Text Recognition (OCR) for Bounding Box (`detection-grounded-ocr`) - Model performs OCR on the text inside the provided bounding box.
- Regions of Interest Proposal (`region-proposal`) - Model proposes Regions of Interest (bounding boxes) in the image.
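To illustrate how a detection-grounded task differs from a prompted one, the sketch below configures `detection-grounded-instance-segmentation` with a statically provided bounding box instead of a text prompt. All field values here are hypothetical placeholders:

```json
{
  "name": "segment_box",
  "type": "roboflow_core/florence_2@v2",
  "images": "$inputs.image",
  "task_type": "detection-grounded-instance-segmentation",
  "grounding_detection": [50, 100, 300, 400],
  "model_id": "florence-2-base"
}
```

The four numbers follow the `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` convention described for `grounding_detection` below.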
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/florence_2@v2` to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. The value determines required parameters and the output response. | ❌ |
| `prompt` | `str` | Text prompt to the Florence-2 model. | ✅ |
| `classes` | `List[str]` | List of classes to be used. | ✅ |
| `grounding_detection` | `Optional[Union[List[float], List[int]]]` | Detection to ground the Florence-2 model. May be a statically provided bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` or the result of an object-detection model. In the latter case, one box will be selected based on `grounding_selection_mode`. | ✅ |
| `grounding_selection_mode` | `str` | Strategy for selecting a single bounding box from `grounding_detection` when a model prediction with multiple detections is provided. | ❌ |
| `model_id` | `str` | Model to be used. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
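For instance, a property marked ✅ such as `prompt` can be bound to a workflow input instead of holding a static value. The snippet below is an illustrative sketch; the input name `prompt` is hypothetical:

```json
{
  "name": "florence",
  "type": "roboflow_core/florence_2@v2",
  "images": "$inputs.image",
  "task_type": "phrase-grounded-object-detection",
  "prompt": "$inputs.prompt",
  "model_id": "florence-2-base"
}
```

With this binding, callers supply the prompt text at runtime rather than editing the workflow definition.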
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Florence-2 Model in version v2.
- inputs:
Detections Consensus, Llama 3.2 Vision, Path Deviation, Blur Visualization, SAM 3, Dimension Collapse, Perspective Correction, Polygon Zone Visualization, Bounding Box Visualization, QR Code Generator, Pixelate Visualization, Trace Visualization, Roboflow Custom Metadata, Velocity, Detections Transformation, Segment Anything 2 Model, Image Threshold, Polygon Visualization, Dynamic Crop, Icon Visualization, Image Slicer, Stability AI Outpainting, Model Comparison Visualization, Dynamic Zone, Clip Comparison, LMM, OpenAI, Classification Label Visualization, Stitch Images, Florence-2 Model, Mask Visualization, Single-Label Classification Model, Size Measurement, Relative Static Crop, Absolute Static Crop, SIFT Comparison, SAM 3, Time in Zone, Moondream2, Google Gemini, Circle Visualization, Florence-2 Model, LMM For Classification, Ellipse Visualization, Image Convert Grayscale, Time in Zone, Object Detection Model, OCR Model, Image Preprocessing, Color Visualization, Image Blur, Stability AI Image Generation, Anthropic Claude, Google Vision OCR, Keypoint Visualization, Camera Calibration, Local File Sink, VLM as Detector, EasyOCR, Image Slicer, Email Notification, VLM as Detector, Keypoint Detection Model, Line Counter, Detections Combine, Detections Filter, Roboflow Dataset Upload, Byte Tracker, Background Color Visualization, Triangle Visualization, Slack Notification, Keypoint Detection Model, Overlap Filter, Gaze Detection, Halo Visualization, Object Detection Model, Corner Visualization, Google Gemini, Detections Stabilizer, Model Monitoring Inference Aggregator, Roboflow Dataset Upload, Dot Visualization, Image Contours, Multi-Label Classification Model, Twilio SMS Notification, Detections Merge, Byte Tracker, Instance Segmentation Model, VLM as Classifier, Seg Preview, CSV Formatter, Reference Path Visualization, Morphological Transformation, Motion Detection, OpenAI, Byte Tracker, Webhook Sink, PTZ Tracking (ONVIF), Detections Classes Replacement, Instance Segmentation Model, Detections Stitch, Contrast Equalization, Camera Focus, YOLO-World Model, Stitch OCR Detections, Stability AI Inpainting, CogVLM, Clip Comparison, Line Counter Visualization, Template Matching, Path Deviation, Email Notification, Crop Visualization, Grid Visualization, OpenAI, Buffer, Bounding Rectangle, SIFT, Depth Estimation, Background Subtraction, Label Visualization, Anthropic Claude, SAM 3, Time in Zone, Detection Offset, OpenAI
- outputs:
Llama 3.2 Vision, SAM 3, Polygon Zone Visualization, Distance Measurement, Trace Visualization, Roboflow Custom Metadata, Image Threshold, Icon Visualization, Stability AI Outpainting, Model Comparison Visualization, Clip Comparison, Cache Get, Size Measurement, Florence-2 Model, SAM 3, SIFT Comparison, Moondream2, Florence-2 Model, LMM For Classification, Anthropic Claude, Image Blur, Stability AI Image Generation, Local File Sink, VLM as Detector, Keypoint Detection Model, Background Color Visualization, Keypoint Detection Model, Google Gemini, Model Monitoring Inference Aggregator, Roboflow Dataset Upload, Instance Segmentation Model, VLM as Classifier, Morphological Transformation, Motion Detection, OpenAI, YOLO-World Model, JSON Parser, Clip Comparison, CogVLM, Path Deviation, CLIP Embedding Model, Crop Visualization, Grid Visualization, Buffer, SAM 3, Anthropic Claude, Time in Zone, OpenAI, Line Counter, Path Deviation, Detections Consensus, Perception Encoder Embedding Model, Perspective Correction, Bounding Box Visualization, QR Code Generator, Segment Anything 2 Model, Polygon Visualization, Dynamic Crop, LMM, OpenAI, Classification Label Visualization, Mask Visualization, Time in Zone, Google Gemini, Circle Visualization, Time in Zone, Ellipse Visualization, Object Detection Model, Image Preprocessing, Color Visualization, Google Vision OCR, Keypoint Visualization, Line Counter, Email Notification, VLM as Detector, Roboflow Dataset Upload, Triangle Visualization, Slack Notification, Halo Visualization, Object Detection Model, Corner Visualization, Dot Visualization, Twilio SMS Notification, Seg Preview, Reference Path Visualization, Webhook Sink, PTZ Tracking (ONVIF), Detections Classes Replacement, Instance Segmentation Model, Detections Stitch, Contrast Equalization, Stitch OCR Detections, Stability AI Inpainting, Line Counter Visualization, Cache Set, Email Notification, VLM as Classifier, OpenAI, Label Visualization, Pixel Color Count
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
Florence-2 Model in version v2 has.
Bindings
- input
  - `images` (`image`): The image to infer on.
  - `prompt` (`string`): Text prompt to the Florence-2 model.
  - `classes` (`list_of_values`): List of classes to be used.
  - `grounding_detection` (`Union[list_of_values, keypoint_detection_prediction, object_detection_prediction, instance_segmentation_prediction]`): Detection to ground the Florence-2 model. May be a statically provided bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` or the result of an object-detection model. In the latter case, one box will be selected based on `grounding_selection_mode`.
  - `model_id` (`roboflow_model_id`): Model to be used.
- output
  - `raw_output` (`Union[string, language_model_output]`): String value if `string`, or LLM/VLM output if `language_model_output`.
  - `parsed_output` (`dictionary`): Dictionary.
  - `classes` (`list_of_values`): List of values of any type.
Example JSON definition of step Florence-2 Model in version v2
{
"name": "<your_step_name_here>",
"type": "roboflow_core/florence_2@v2",
"images": "$inputs.image",
"task_type": "<block_does_not_provide_example>",
"prompt": "my prompt",
"classes": [
"class-a",
"class-b"
],
"grounding_detection": "$steps.detection.predictions",
"grounding_selection_mode": "first",
"model_id": "florence-2-base"
}
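For context, the step above could be embedded in a complete Workflow definition roughly as follows. This is a minimal sketch assuming the standard Workflows definition schema (`WorkflowImage` input and `JsonField` output declarations); names like `model` and `caption` are placeholders:

```json
{
  "version": "1.0",
  "inputs": [
    {"type": "WorkflowImage", "name": "image"}
  ],
  "steps": [
    {
      "name": "model",
      "type": "roboflow_core/florence_2@v2",
      "images": "$inputs.image",
      "task_type": "caption",
      "model_id": "florence-2-base"
    }
  ],
  "outputs": [
    {"type": "JsonField", "name": "caption", "selector": "$steps.model.raw_output"}
  ]
}
```

The `$steps.model.raw_output` selector wires the block's `raw_output` binding into the workflow's response.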
v1¶
Class: Florence2BlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.florence2.v1.Florence2BlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Dedicated inference server required (GPU recommended) - you may want to use dedicated deployment
This Workflow block introduces Florence-2, a Visual Language Model (VLM) capable of performing a wide range of tasks, including:
- Object Detection
- Instance Segmentation
- Image Captioning
- Optical Character Recognition (OCR)
- and more...
Below is a comprehensive list of tasks supported by the model, along with descriptions of how to utilize their outputs within the Workflows ecosystem:
Task Descriptions:
- Custom Prompt (`custom`) - Use a free-form prompt to generate a response. Useful with fine-tuned models.
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image.
- Text Detection & Recognition (OCR) (`ocr-with-text-detection`) - Model detects text regions in the image, then performs OCR on each detected region.
- Captioning (short) (`caption`) - Model provides a short description of the image.
- Captioning (`detailed-caption`) - Model provides a long description of the image.
- Captioning (long) (`more-detailed-caption`) - Model provides a very long description of the image.
- Unprompted Object Detection (`object-detection`) - Model detects and returns bounding boxes for prominent objects in the image.
- Object Detection (`open-vocabulary-object-detection`) - Model detects and returns bounding boxes for the provided classes.
- Detection & Captioning (`object-detection-and-caption`) - Model detects prominent objects and captions them.
- Prompted Object Detection (`phrase-grounded-object-detection`) - Based on the text prompt, model detects objects matching the descriptions.
- Prompted Instance Segmentation (`phrase-grounded-instance-segmentation`) - Based on the text prompt, model segments objects matching the descriptions.
- Segment Bounding Box (`detection-grounded-instance-segmentation`) - Model segments the object in the provided bounding box into a polygon.
- Classification of Bounding Box (`detection-grounded-classification`) - Model classifies the object inside the provided bounding box.
- Captioning of Bounding Box (`detection-grounded-caption`) - Model captions the object in the provided bounding box.
- Text Recognition (OCR) for Bounding Box (`detection-grounded-ocr`) - Model performs OCR on the text inside the provided bounding box.
- Regions of Interest Proposal (`region-proposal`) - Model proposes Regions of Interest (bounding boxes) in the image.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/florence_2@v1` to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. The value determines required parameters and the output response. | ❌ |
| `prompt` | `str` | Text prompt to the Florence-2 model. | ✅ |
| `classes` | `List[str]` | List of classes to be used. | ✅ |
| `grounding_detection` | `Optional[Union[List[float], List[int]]]` | Detection to ground the Florence-2 model. May be a statically provided bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` or the result of an object-detection model. In the latter case, one box will be selected based on `grounding_selection_mode`. | ✅ |
| `grounding_selection_mode` | `str` | Strategy for selecting a single bounding box from `grounding_detection` when a model prediction with multiple detections is provided. | ❌ |
| `model_version` | `str` | Model to be used. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Florence-2 Model in version v1.
- inputs:
Llama 3.2 Vision, SAM 3, Polygon Zone Visualization, Pixelate Visualization, Trace Visualization, Roboflow Custom Metadata, Detections Transformation, Image Threshold, Image Slicer, Icon Visualization, Stability AI Outpainting, Model Comparison Visualization, Dynamic Zone, Clip Comparison, Stitch Images, Florence-2 Model, Size Measurement, Single-Label Classification Model, Relative Static Crop, Absolute Static Crop, SIFT Comparison, SAM 3, Moondream2, Florence-2 Model, LMM For Classification, Anthropic Claude, Image Blur, Stability AI Image Generation, Camera Calibration, Local File Sink, VLM as Detector, Keypoint Detection Model, Detections Filter, Gaze Detection, Background Color Visualization, Keypoint Detection Model, Google Gemini, Model Monitoring Inference Aggregator, Roboflow Dataset Upload, Byte Tracker, Instance Segmentation Model, VLM as Classifier, CSV Formatter, Morphological Transformation, Motion Detection, OpenAI, YOLO-World Model, CogVLM, Clip Comparison, Path Deviation, Crop Visualization, Grid Visualization, Buffer, Bounding Rectangle, SIFT, SAM 3, Anthropic Claude, Time in Zone, Detection Offset, OpenAI, Detections Consensus, Path Deviation, Blur Visualization, Dimension Collapse, Perspective Correction, Bounding Box Visualization, QR Code Generator, Velocity, Segment Anything 2 Model, Polygon Visualization, Dynamic Crop, LMM, OpenAI, Classification Label Visualization, Mask Visualization, Time in Zone, Google Gemini, Circle Visualization, Ellipse Visualization, Image Convert Grayscale, Time in Zone, Object Detection Model, OCR Model, Image Preprocessing, Color Visualization, Google Vision OCR, Keypoint Visualization, EasyOCR, Image Slicer, Email Notification, VLM as Detector, Line Counter, Detections Combine, Byte Tracker, Roboflow Dataset Upload, Overlap Filter, Triangle Visualization, Slack Notification, Halo Visualization, Object Detection Model, Corner Visualization, Detections Stabilizer, Dot Visualization, Image Contours, Multi-Label Classification Model, Twilio SMS Notification, Detections Merge, Seg Preview, Reference Path Visualization, Byte Tracker, Webhook Sink, PTZ Tracking (ONVIF), Detections Classes Replacement, Instance Segmentation Model, Detections Stitch, Contrast Equalization, Camera Focus, Stitch OCR Detections, Stability AI Inpainting, Line Counter Visualization, Template Matching, Email Notification, OpenAI, Depth Estimation, Background Subtraction, Label Visualization
- outputs:
Llama 3.2 Vision, SAM 3, Polygon Zone Visualization, Distance Measurement, Trace Visualization, Roboflow Custom Metadata, Image Threshold, Icon Visualization, Stability AI Outpainting, Model Comparison Visualization, Clip Comparison, Cache Get, Size Measurement, Florence-2 Model, SAM 3, SIFT Comparison, Moondream2, Florence-2 Model, LMM For Classification, Anthropic Claude, Image Blur, Stability AI Image Generation, Local File Sink, VLM as Detector, Keypoint Detection Model, Background Color Visualization, Keypoint Detection Model, Google Gemini, Model Monitoring Inference Aggregator, Roboflow Dataset Upload, Instance Segmentation Model, VLM as Classifier, Morphological Transformation, Motion Detection, OpenAI, YOLO-World Model, JSON Parser, Clip Comparison, CogVLM, Path Deviation, CLIP Embedding Model, Crop Visualization, Grid Visualization, Buffer, SAM 3, Anthropic Claude, Time in Zone, OpenAI, Line Counter, Path Deviation, Detections Consensus, Perception Encoder Embedding Model, Perspective Correction, Bounding Box Visualization, QR Code Generator, Segment Anything 2 Model, Polygon Visualization, Dynamic Crop, LMM, OpenAI, Classification Label Visualization, Mask Visualization, Time in Zone, Google Gemini, Circle Visualization, Time in Zone, Ellipse Visualization, Object Detection Model, Image Preprocessing, Color Visualization, Google Vision OCR, Keypoint Visualization, Line Counter, Email Notification, VLM as Detector, Roboflow Dataset Upload, Triangle Visualization, Slack Notification, Halo Visualization, Object Detection Model, Corner Visualization, Dot Visualization, Twilio SMS Notification, Seg Preview, Reference Path Visualization, Webhook Sink, PTZ Tracking (ONVIF), Detections Classes Replacement, Instance Segmentation Model, Detections Stitch, Contrast Equalization, Stitch OCR Detections, Stability AI Inpainting, Line Counter Visualization, Cache Set, Email Notification, VLM as Classifier, OpenAI, Label Visualization, Pixel Color Count
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
Florence-2 Model in version v1 has.
Bindings
- input
  - `images` (`image`): The image to infer on.
  - `prompt` (`string`): Text prompt to the Florence-2 model.
  - `classes` (`list_of_values`): List of classes to be used.
  - `grounding_detection` (`Union[list_of_values, keypoint_detection_prediction, object_detection_prediction, instance_segmentation_prediction]`): Detection to ground the Florence-2 model. May be a statically provided bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` or the result of an object-detection model. In the latter case, one box will be selected based on `grounding_selection_mode`.
  - `model_version` (`string`): Model to be used.
- output
  - `raw_output` (`Union[string, language_model_output]`): String value if `string`, or LLM/VLM output if `language_model_output`.
  - `parsed_output` (`dictionary`): Dictionary.
  - `classes` (`list_of_values`): List of values of any type.
Example JSON definition of step Florence-2 Model in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/florence_2@v1",
"images": "$inputs.image",
"task_type": "<block_does_not_provide_example>",
"prompt": "my prompt",
"classes": [
"class-a",
"class-b"
],
"grounding_detection": "$steps.detection.predictions",
"grounding_selection_mode": "first",
"model_version": "florence-2-base"
}