Florence-2 Model¶
v2¶
Class: Florence2BlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.florence2.v2.Florence2BlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Dedicated inference server required (GPU recommended) - you may want to use dedicated deployment
This Workflow block introduces Florence-2, a Visual Language Model (VLM) capable of performing a wide range of tasks, including:
- Object Detection
- Instance Segmentation
- Image Captioning
- Optical Character Recognition (OCR)
- and more...
Below is a comprehensive list of tasks supported by the model, along with descriptions of how to utilize their outputs within the Workflows ecosystem:
Task Descriptions:
- Custom Prompt (`custom`) - Use a free-form prompt to generate a response. Useful with fine-tuned models.
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image.
- Text Detection & Recognition (OCR) (`ocr-with-text-detection`) - Model detects text regions in the image, then performs OCR on each detected region.
- Captioning (short) (`caption`) - Model provides a short description of the image.
- Captioning (`detailed-caption`) - Model provides a long description of the image.
- Captioning (long) (`more-detailed-caption`) - Model provides a very long description of the image.
- Unprompted Object Detection (`object-detection`) - Model detects and returns bounding boxes for prominent objects in the image.
- Object Detection (`open-vocabulary-object-detection`) - Model detects and returns bounding boxes for the provided classes.
- Detection & Captioning (`object-detection-and-caption`) - Model detects prominent objects and captions them.
- Prompted Object Detection (`phrase-grounded-object-detection`) - Based on the text prompt, model detects objects matching the descriptions.
- Prompted Instance Segmentation (`phrase-grounded-instance-segmentation`) - Based on the text prompt, model segments objects matching the descriptions.
- Segment Bounding Box (`detection-grounded-instance-segmentation`) - Model segments the object in the provided bounding box into a polygon.
- Classification of Bounding Box (`detection-grounded-classification`) - Model classifies the object inside the provided bounding box.
- Captioning of Bounding Box (`detection-grounded-caption`) - Model captions the object in the provided bounding box.
- Text Recognition (OCR) for Bounding Box (`detection-grounded-ocr`) - Model performs OCR on the text inside the provided bounding box.
- Regions of Interest Proposal (`region-proposal`) - Model proposes Regions of Interest (bounding boxes) in the image.
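To illustrate how a detection-grounded task differs from a prompted one, the sketch below configures `detection-grounded-instance-segmentation` with a statically provided bounding box instead of a text prompt. All field values here are hypothetical placeholders:

```json
{
  "name": "segment_box",
  "type": "roboflow_core/florence_2@v2",
  "images": "$inputs.image",
  "task_type": "detection-grounded-instance-segmentation",
  "grounding_detection": [50, 100, 300, 400],
  "model_id": "florence-2-base"
}
```

The four numbers follow the `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` convention described for `grounding_detection` below.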
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/florence_2@v2` to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. The value determines required parameters and the output response. | ❌ |
| `prompt` | `str` | Text prompt to the Florence-2 model. | ✅ |
| `classes` | `List[str]` | List of classes to be used. | ✅ |
| `grounding_detection` | `Optional[Union[List[float], List[int]]]` | Detection to ground the Florence-2 model. May be a statically provided bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` or the result of an object-detection model. In the latter case, one box will be selected based on `grounding_selection_mode`. | ✅ |
| `grounding_selection_mode` | `str` | Strategy for selecting a single bounding box from `grounding_detection` when a model prediction with multiple detections is provided. | ❌ |
| `model_id` | `str` | Model to be used. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
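For instance, a property marked ✅ such as `prompt` can be bound to a workflow input instead of holding a static value. The snippet below is an illustrative sketch; the input name `prompt` is hypothetical:

```json
{
  "name": "florence",
  "type": "roboflow_core/florence_2@v2",
  "images": "$inputs.image",
  "task_type": "phrase-grounded-object-detection",
  "prompt": "$inputs.prompt",
  "model_id": "florence-2-base"
}
```

With this binding, callers supply the prompt text at runtime rather than editing the workflow definition.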
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Florence-2 Model in version v2.
- inputs:
Detections Consensus, Llama 3.2 Vision, Path Deviation, Blur Visualization, SAM 3, Dimension Collapse, Perspective Correction, Polygon Zone Visualization, Bounding Box Visualization, QR Code Generator, Pixelate Visualization, Trace Visualization, Roboflow Custom Metadata, Velocity, Detections Transformation, Segment Anything 2 Model, Image Threshold, Polygon Visualization, Dynamic Crop, Icon Visualization, Image Slicer, Stability AI Outpainting, Model Comparison Visualization, Dynamic Zone, Clip Comparison, LMM, OpenAI, Classification Label Visualization, Stitch Images, Florence-2 Model, Mask Visualization, Single-Label Classification Model, Size Measurement, Relative Static Crop, Absolute Static Crop, SIFT Comparison, SAM 3, Time in Zone, Moondream2, Google Gemini, Circle Visualization, Florence-2 Model, LMM For Classification, Ellipse Visualization, Image Convert Grayscale, Time in Zone, Object Detection Model, OCR Model, Image Preprocessing, Color Visualization, Image Blur, Stability AI Image Generation, Anthropic Claude, Google Vision OCR, Keypoint Visualization, Camera Calibration, Local File Sink, VLM as Detector, EasyOCR, Image Slicer, Email Notification, VLM as Detector, Keypoint Detection Model, Line Counter, Detections Combine, Detections Filter, Roboflow Dataset Upload, Byte Tracker, Background Color Visualization, Triangle Visualization, Slack Notification, Keypoint Detection Model, Overlap Filter, Gaze Detection, Halo Visualization, Object Detection Model, Corner Visualization, Google Gemini, Detections Stabilizer, Model Monitoring Inference Aggregator, Roboflow Dataset Upload, Dot Visualization, Image Contours, Multi-Label Classification Model, Twilio SMS Notification, Detections Merge, Byte Tracker, Instance Segmentation Model, VLM as Classifier, Seg Preview, CSV Formatter, Reference Path Visualization, Morphological Transformation, Motion Detection, OpenAI, Byte Tracker, Webhook Sink, PTZ Tracking (ONVIF), Detections Classes Replacement, Instance Segmentation Model, Detections Stitch, Contrast Equalization, Camera Focus, YOLO-World Model, Stitch OCR Detections, Stability AI Inpainting, CogVLM, Clip Comparison, Line Counter Visualization, Template Matching, Path Deviation, Email Notification, Crop Visualization, Grid Visualization, OpenAI, Buffer, Bounding Rectangle, SIFT, Depth Estimation, Background Subtraction, Label Visualization, Anthropic Claude, SAM 3, Time in Zone, Detection Offset, OpenAI
- outputs:
Llama 3.2 Vision, SAM 3, Polygon Zone Visualization, Distance Measurement, Trace Visualization, Roboflow Custom Metadata, Image Threshold, Icon Visualization, Stability AI Outpainting, Model Comparison Visualization, Clip Comparison, Cache Get, Size Measurement, Florence-2 Model, SAM 3, SIFT Comparison, Moondream2, Florence-2 Model, LMM For Classification, Anthropic Claude, Image Blur, Stability AI Image Generation, Local File Sink, VLM as Detector, Keypoint Detection Model, Background Color Visualization, Keypoint Detection Model, Google Gemini, Model Monitoring Inference Aggregator, Roboflow Dataset Upload, Instance Segmentation Model, VLM as Classifier, Morphological Transformation, Motion Detection, OpenAI, YOLO-World Model, JSON Parser, Clip Comparison, CogVLM, Path Deviation, CLIP Embedding Model, Crop Visualization, Grid Visualization, Buffer, SAM 3, Anthropic Claude, Time in Zone, OpenAI, Line Counter, Path Deviation, Detections Consensus, Perception Encoder Embedding Model, Perspective Correction, Bounding Box Visualization, QR Code Generator, Segment Anything 2 Model, Polygon Visualization, Dynamic Crop, LMM, OpenAI, Classification Label Visualization, Mask Visualization, Time in Zone, Google Gemini, Circle Visualization, Time in Zone, Ellipse Visualization, Object Detection Model, Image Preprocessing, Color Visualization, Google Vision OCR, Keypoint Visualization, Line Counter, Email Notification, VLM as Detector, Roboflow Dataset Upload, Triangle Visualization, Slack Notification, Halo Visualization, Object Detection Model, Corner Visualization, Dot Visualization, Twilio SMS Notification, Seg Preview, Reference Path Visualization, Webhook Sink, PTZ Tracking (ONVIF), Detections Classes Replacement, Instance Segmentation Model, Detections Stitch, Contrast Equalization, Stitch OCR Detections, Stability AI Inpainting, Line Counter Visualization, Cache Set, Email Notification, VLM as Classifier, OpenAI, Label Visualization, Pixel Color Count
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
Florence-2 Model in version v2 has.
Bindings
- input
  - `images` (`image`): The image to infer on.
  - `prompt` (`string`): Text prompt to the Florence-2 model.
  - `classes` (`list_of_values`): List of classes to be used.
  - `grounding_detection` (`Union[list_of_values, keypoint_detection_prediction, object_detection_prediction, instance_segmentation_prediction]`): Detection to ground the Florence-2 model. May be a statically provided bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` or the result of an object-detection model. In the latter case, one box will be selected based on `grounding_selection_mode`.
  - `model_id` (`roboflow_model_id`): Model to be used.
- output
  - `raw_output` (`Union[string, language_model_output]`): String value if `string`, or LLM/VLM output if `language_model_output`.
  - `parsed_output` (`dictionary`): Dictionary.
  - `classes` (`list_of_values`): List of values of any type.
Example JSON definition of step Florence-2 Model in version v2
{
"name": "<your_step_name_here>",
"type": "roboflow_core/florence_2@v2",
"images": "$inputs.image",
"task_type": "<block_does_not_provide_example>",
"prompt": "my prompt",
"classes": [
"class-a",
"class-b"
],
"grounding_detection": "$steps.detection.predictions",
"grounding_selection_mode": "first",
"model_id": "florence-2-base"
}
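For context, the step above could be embedded in a complete Workflow definition roughly as follows. This is a minimal sketch assuming the standard Workflows definition schema (`WorkflowImage` input and `JsonField` output declarations); names like `model` and `caption` are placeholders:

```json
{
  "version": "1.0",
  "inputs": [
    {"type": "WorkflowImage", "name": "image"}
  ],
  "steps": [
    {
      "name": "model",
      "type": "roboflow_core/florence_2@v2",
      "images": "$inputs.image",
      "task_type": "caption",
      "model_id": "florence-2-base"
    }
  ],
  "outputs": [
    {"type": "JsonField", "name": "caption", "selector": "$steps.model.raw_output"}
  ]
}
```

The `$steps.model.raw_output` selector wires the block's `raw_output` binding into the workflow's response.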
v1¶
Class: Florence2BlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.models.foundation.florence2.v1.Florence2BlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Dedicated inference server required (GPU recommended) - you may want to use dedicated deployment
This Workflow block introduces Florence-2, a Visual Language Model (VLM) capable of performing a wide range of tasks, including:
- Object Detection
- Instance Segmentation
- Image Captioning
- Optical Character Recognition (OCR)
- and more...
Below is a comprehensive list of tasks supported by the model, along with descriptions of how to utilize their outputs within the Workflows ecosystem:
Task Descriptions:
- Custom Prompt (`custom`) - Use a free-form prompt to generate a response. Useful with fine-tuned models.
- Text Recognition (OCR) (`ocr`) - Model recognizes text in the image.
- Text Detection & Recognition (OCR) (`ocr-with-text-detection`) - Model detects text regions in the image, then performs OCR on each detected region.
- Captioning (short) (`caption`) - Model provides a short description of the image.
- Captioning (`detailed-caption`) - Model provides a long description of the image.
- Captioning (long) (`more-detailed-caption`) - Model provides a very long description of the image.
- Unprompted Object Detection (`object-detection`) - Model detects and returns bounding boxes for prominent objects in the image.
- Object Detection (`open-vocabulary-object-detection`) - Model detects and returns bounding boxes for the provided classes.
- Detection & Captioning (`object-detection-and-caption`) - Model detects prominent objects and captions them.
- Prompted Object Detection (`phrase-grounded-object-detection`) - Based on the text prompt, model detects objects matching the descriptions.
- Prompted Instance Segmentation (`phrase-grounded-instance-segmentation`) - Based on the text prompt, model segments objects matching the descriptions.
- Segment Bounding Box (`detection-grounded-instance-segmentation`) - Model segments the object in the provided bounding box into a polygon.
- Classification of Bounding Box (`detection-grounded-classification`) - Model classifies the object inside the provided bounding box.
- Captioning of Bounding Box (`detection-grounded-caption`) - Model captions the object in the provided bounding box.
- Text Recognition (OCR) for Bounding Box (`detection-grounded-ocr`) - Model performs OCR on the text inside the provided bounding box.
- Regions of Interest Proposal (`region-proposal`) - Model proposes Regions of Interest (bounding boxes) in the image.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/florence_2@v1` to add the block
as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `task_type` | `str` | Task type to be performed by the model. The value determines required parameters and the output response. | ❌ |
| `prompt` | `str` | Text prompt to the Florence-2 model. | ✅ |
| `classes` | `List[str]` | List of classes to be used. | ✅ |
| `grounding_detection` | `Optional[Union[List[float], List[int]]]` | Detection to ground the Florence-2 model. May be a statically provided bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` or the result of an object-detection model. In the latter case, one box will be selected based on `grounding_selection_mode`. | ✅ |
| `grounding_selection_mode` | `str` | Strategy for selecting a single bounding box from `grounding_detection` when a model prediction with multiple detections is provided. | ❌ |
| `model_version` | `str` | Model to be used. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Florence-2 Model in version v1.
- inputs:
Llama 3.2 Vision, SAM 3, Polygon Zone Visualization, Pixelate Visualization, Trace Visualization, Roboflow Custom Metadata, Detections Transformation, Image Threshold, Image Slicer, Icon Visualization, Stability AI Outpainting, Model Comparison Visualization, Dynamic Zone, Clip Comparison, Stitch Images, Florence-2 Model, Size Measurement, Single-Label Classification Model, Relative Static Crop, Absolute Static Crop, SIFT Comparison, SAM 3, Moondream2, Florence-2 Model, LMM For Classification, Anthropic Claude, Image Blur, Stability AI Image Generation, Camera Calibration, Local File Sink, VLM as Detector, Keypoint Detection Model, Detections Filter, Gaze Detection, Background Color Visualization, Keypoint Detection Model, Google Gemini, Model Monitoring Inference Aggregator, Roboflow Dataset Upload, Byte Tracker, Instance Segmentation Model, VLM as Classifier, CSV Formatter, Morphological Transformation, Motion Detection, OpenAI, YOLO-World Model, CogVLM, Clip Comparison, Path Deviation, Crop Visualization, Grid Visualization, Buffer, Bounding Rectangle, SIFT, SAM 3, Anthropic Claude, Time in Zone, Detection Offset, OpenAI, Detections Consensus, Path Deviation, Blur Visualization, Dimension Collapse, Perspective Correction, Bounding Box Visualization, QR Code Generator, Velocity, Segment Anything 2 Model, Polygon Visualization, Dynamic Crop, LMM, OpenAI, Classification Label Visualization, Mask Visualization, Time in Zone, Google Gemini, Circle Visualization, Ellipse Visualization, Image Convert Grayscale, Time in Zone, Object Detection Model, OCR Model, Image Preprocessing, Color Visualization, Google Vision OCR, Keypoint Visualization, EasyOCR, Image Slicer, Email Notification, VLM as Detector, Line Counter, Detections Combine, Byte Tracker, Roboflow Dataset Upload, Overlap Filter, Triangle Visualization, Slack Notification, Halo Visualization, Object Detection Model, Corner Visualization, Detections Stabilizer, Dot Visualization, Image Contours, Multi-Label Classification Model, Twilio SMS Notification, Detections Merge, Seg Preview, Reference Path Visualization, Byte Tracker, Webhook Sink, PTZ Tracking (ONVIF), Detections Classes Replacement, Instance Segmentation Model, Detections Stitch, Contrast Equalization, Camera Focus, Stitch OCR Detections, Stability AI Inpainting, Line Counter Visualization, Template Matching, Email Notification, OpenAI, Depth Estimation, Background Subtraction, Label Visualization
- outputs:
Llama 3.2 Vision, SAM 3, Polygon Zone Visualization, Distance Measurement, Trace Visualization, Roboflow Custom Metadata, Image Threshold, Icon Visualization, Stability AI Outpainting, Model Comparison Visualization, Clip Comparison, Cache Get, Size Measurement, Florence-2 Model, SAM 3, SIFT Comparison, Moondream2, Florence-2 Model, LMM For Classification, Anthropic Claude, Image Blur, Stability AI Image Generation, Local File Sink, VLM as Detector, Keypoint Detection Model, Background Color Visualization, Keypoint Detection Model, Google Gemini, Model Monitoring Inference Aggregator, Roboflow Dataset Upload, Instance Segmentation Model, VLM as Classifier, Morphological Transformation, Motion Detection, OpenAI, YOLO-World Model, JSON Parser, Clip Comparison, CogVLM, Path Deviation, CLIP Embedding Model, Crop Visualization, Grid Visualization, Buffer, SAM 3, Anthropic Claude, Time in Zone, OpenAI, Line Counter, Path Deviation, Detections Consensus, Perception Encoder Embedding Model, Perspective Correction, Bounding Box Visualization, QR Code Generator, Segment Anything 2 Model, Polygon Visualization, Dynamic Crop, LMM, OpenAI, Classification Label Visualization, Mask Visualization, Time in Zone, Google Gemini, Circle Visualization, Time in Zone, Ellipse Visualization, Object Detection Model, Image Preprocessing, Color Visualization, Google Vision OCR, Keypoint Visualization, Line Counter, Email Notification, VLM as Detector, Roboflow Dataset Upload, Triangle Visualization, Slack Notification, Halo Visualization, Object Detection Model, Corner Visualization, Dot Visualization, Twilio SMS Notification, Seg Preview, Reference Path Visualization, Webhook Sink, PTZ Tracking (ONVIF), Detections Classes Replacement, Instance Segmentation Model, Detections Stitch, Contrast Equalization, Stitch OCR Detections, Stability AI Inpainting, Line Counter Visualization, Cache Set, Email Notification, VLM as Classifier, OpenAI, Label Visualization, Pixel Color Count
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
Florence-2 Model in version v1 has.
Bindings
- input
  - `images` (`image`): The image to infer on.
  - `prompt` (`string`): Text prompt to the Florence-2 model.
  - `classes` (`list_of_values`): List of classes to be used.
  - `grounding_detection` (`Union[list_of_values, keypoint_detection_prediction, object_detection_prediction, instance_segmentation_prediction]`): Detection to ground the Florence-2 model. May be a statically provided bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` or the result of an object-detection model. In the latter case, one box will be selected based on `grounding_selection_mode`.
  - `model_version` (`string`): Model to be used.
- output
  - `raw_output` (`Union[string, language_model_output]`): String value if `string`, or LLM/VLM output if `language_model_output`.
  - `parsed_output` (`dictionary`): Dictionary.
  - `classes` (`list_of_values`): List of values of any type.
Example JSON definition of step Florence-2 Model in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/florence_2@v1",
"images": "$inputs.image",
"task_type": "<block_does_not_provide_example>",
"prompt": "my prompt",
"classes": [
"class-a",
"class-b"
],
"grounding_detection": "$steps.detection.predictions",
"grounding_selection_mode": "first",
"model_version": "florence-2-base"
}