Florence-2 Model

v2

Class: Florence2BlockV2 (there are multiple versions of this block)

Source: inference.core.workflows.core_steps.models.foundation.florence2.v2.Florence2BlockV2

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning

Dedicated inference server required (GPU recommended) - you may want to use a dedicated deployment

This Workflow block introduces Florence-2, a Visual Language Model (VLM) capable of performing a wide range of tasks, including:

Object Detection
Instance Segmentation
Image Captioning
Optical Character Recognition (OCR)
and more...

Below is a comprehensive list of tasks supported by the model, along with descriptions on how to utilize their outputs within the Workflows ecosystem:

Task Descriptions:

Custom Prompt (custom) - Use a free-form prompt to generate a response. Useful with fine-tuned models.
Text Recognition (OCR) (ocr) - Model recognizes text in the image
Text Detection & Recognition (OCR) (ocr-with-text-detection) - Model detects text regions in the image, and then performs OCR on each detected region
Captioning (short) (caption) - Model provides a short description of the image
Captioning (detailed-caption) - Model provides a long description of the image
Captioning (long) (more-detailed-caption) - Model provides a very long description of the image
Unprompted Object Detection (object-detection) - Model detects and returns the bounding boxes for prominent objects in the image
Object Detection (open-vocabulary-object-detection) - Model detects and returns the bounding boxes for the provided classes
Detection & Captioning (object-detection-and-caption) - Model detects prominent objects and captions them
Prompted Object Detection (phrase-grounded-object-detection) - Based on the textual prompt, model detects objects matching the descriptions
Prompted Instance Segmentation (phrase-grounded-instance-segmentation) - Based on the textual prompt, model segments objects matching the descriptions
Segment Bounding Box (detection-grounded-instance-segmentation) - Model segments the object in the provided bounding box into a polygon
Classification of Bounding Box (detection-grounded-classification) - Model classifies the object inside the provided bounding box
Captioning of Bounding Box (detection-grounded-caption) - Model captions the object in the provided bounding box
Text Recognition (OCR) for Bounding Box (detection-grounded-ocr) - Model performs OCR on the text inside the provided bounding box
Regions of Interest proposal (region-proposal) - Model proposes Regions of Interest (Bounding Boxes) in the image
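
The task descriptions above suggest which optional properties each task relies on: tasks that work "based on the textual prompt" (and the custom prompt task) need `prompt`, the task that detects "the provided classes" needs `classes`, and the tasks that operate on "the provided bounding box" need `grounding_detection`. The Python sketch below encodes that reading. The groupings and the helper names `required_fields` and `missing_fields` are assumptions inferred from the descriptions on this page, not taken from the block's implementation.

```python
# Hypothetical helper: the groupings below are inferred from the task
# descriptions on this page, not copied from the block's source code.
TASKS_NEEDING_PROMPT = {
    "custom",
    "phrase-grounded-object-detection",
    "phrase-grounded-instance-segmentation",
}
TASKS_NEEDING_CLASSES = {"open-vocabulary-object-detection"}
TASKS_NEEDING_GROUNDING = {
    "detection-grounded-instance-segmentation",
    "detection-grounded-classification",
    "detection-grounded-caption",
    "detection-grounded-ocr",
}


def required_fields(task_type):
    """Return the optional step properties a task appears to rely on."""
    fields = set()
    if task_type in TASKS_NEEDING_PROMPT:
        fields.add("prompt")
    if task_type in TASKS_NEEDING_CLASSES:
        fields.add("classes")
    if task_type in TASKS_NEEDING_GROUNDING:
        fields.add("grounding_detection")
    return fields


def missing_fields(step):
    """Return the inferred required properties that the step does not set."""
    needed = required_fields(step["task_type"])
    return {name for name in needed if step.get(name) is None}


# A step definition that forgets the bounding box for a grounded task.
step = {
    "type": "roboflow_core/florence_2@v2",
    "name": "ocr_in_box",
    "images": "$inputs.image",
    "task_type": "detection-grounded-ocr",
}
print(missing_fields(step))
```

A check like this can be run before submitting a workflow definition, so that a missing `prompt`, `classes`, or `grounding_detection` is caught early rather than at execution time.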

Type identifier

Use the following identifier in the step "type" field: roboflow_core/florence_2@v2 to add the block as a step in your workflow.

Properties

Name	Type	Description	Refs
`name`	`str`	Enter a unique identifier for this step.	❌
`task_type`	`str`	Task type to be performed by model. Value determines required parameters and output response.	❌
`prompt`	`str`	Text prompt to the Florence-2 model.	✅
`classes`	`List[str]`	List of classes to be used.	✅
`grounding_detection`	`Optional[List[float], List[int]]`	Detection to ground Florence-2 model. May be statically provided bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` or result of object-detection model. If the latter is true, one box will be selected based on `grounding_selection_mode`.	✅
`grounding_selection_mode`	`str`	.	❌
`model_id`	`str`	Model to be used.	✅

The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
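
To illustrate the Refs column: a property marked ✅ can hold either a literal value or a selector string that is resolved at runtime. The selector forms `$inputs.<name>` and `$steps.<step>.<output>` used below follow the ones in this page's example definition. The input name `prompt`, the step names, and the helper `is_selector` are hypothetical and serve only as a sketch.

```python
def is_selector(value):
    """True if the value looks like a reference resolved at workflow runtime."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


# Properties marked ❌ in the table above: these are expected to stay literal.
NON_REF_PROPERTIES = {"name", "task_type", "grounding_selection_mode"}

# Literal prompt, fixed when the workflow is defined.
static_step = {
    "type": "roboflow_core/florence_2@v2",
    "name": "find_cars",
    "images": "$inputs.image",
    "task_type": "phrase-grounded-object-detection",
    "prompt": "a red car",
    "model_id": "florence-2-base",
}

# Same step, but the prompt is taken from a (hypothetical) workflow input.
dynamic_step = dict(static_step, name="find_anything", prompt="$inputs.prompt")

for step in (static_step, dynamic_step):
    kind = "selector" if is_selector(step["prompt"]) else "literal"
    print(step["name"], "uses a", kind, "prompt")
```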

Runtime compatibility

hard (runtime: self_hosted_cpu; execution: local): Requires a GPU; run_locally() loads a model that needs CUDA.

Available Connections

Compatible Blocks

Check what blocks you can connect to Florence-2 Model in version v2.

Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds Florence-2 Model in version v2 has.

Bindings

input
- images (image): The image to infer on.
- prompt (string): Text prompt to the Florence-2 model.
- classes (list_of_values): List of classes to be used.
- grounding_detection (Union[instance_segmentation_prediction, object_detection_prediction, list_of_values, keypoint_detection_prediction]): Detection to ground Florence-2 model. May be statically provided bounding box [left_top_x, left_top_y, right_bottom_x, right_bottom_y] or result of object-detection model. If the latter is true, one box will be selected based on grounding_selection_mode.
- model_id (roboflow_model_id): Model to be used.
output
- raw_output (Union[string, language_model_output]): String value if string or LLM / VLM output if language_model_output.
- parsed_output (dictionary): Dictionary.
- classes (list_of_values): List of values of any type.
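
As a sketch of how the output bindings listed above might be consumed, the Python snippet below builds a small workflow definition that exposes `raw_output` and `parsed_output` of a captioning step. The step selector form `$steps.<step>.<output>` mirrors the one used in this page's example definition. The surrounding schema (`version`, `inputs`, `outputs`, the `WorkflowImage` and `JsonField` types) is an assumption about the general Workflows definition format and is not described on this page; the step and output names are hypothetical.

```python
import json

STEP_NAME = "florence"  # hypothetical step name

workflow = {
    # Assumed top-level layout of a Workflows definition.
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            "type": "roboflow_core/florence_2@v2",
            "name": STEP_NAME,
            "images": "$inputs.image",
            "task_type": "caption",
            "model_id": "florence-2-base",
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "caption_raw",
            "selector": f"$steps.{STEP_NAME}.raw_output",
        },
        {
            "type": "JsonField",
            "name": "caption_parsed",
            "selector": f"$steps.{STEP_NAME}.parsed_output",
        },
    ],
}

# Serialise to the JSON form used elsewhere on this page.
print(json.dumps(workflow, indent=4))
```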

Example JSON definition of step Florence-2 Model in version v2

{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/florence_2@v2",
    "images": "$inputs.image",
    "task_type": "<block_does_not_provide_example>",
    "prompt": "my prompt",
    "classes": [
        "class-a",
        "class-b"
    ],
    "grounding_detection": "$steps.detection.predictions",
    "grounding_selection_mode": "first",
    "model_id": "florence-2-base"
}
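
The definition above takes `grounding_detection` from another step. As the Properties table notes, a static bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` can be supplied instead. The Python sketch below builds such a step; the coordinates, the step name, and the `make_grounded_step` helper with its sanity checks are illustrative assumptions, not part of the block.

```python
import json


def make_grounded_step(name, task_type, box):
    """Build a Florence-2 v2 step grounded on a static bounding box.

    `box` is [left_top_x, left_top_y, right_bottom_x, right_bottom_y].
    The checks below are this sketch's own sanity checks.
    """
    if len(box) != 4:
        raise ValueError("box must contain exactly four coordinates")
    left, top, right, bottom = box
    if right <= left or bottom <= top:
        raise ValueError("right/bottom must be greater than left/top")
    return {
        "name": name,
        "type": "roboflow_core/florence_2@v2",
        "images": "$inputs.image",
        "task_type": task_type,
        "grounding_detection": [left, top, right, bottom],
        "model_id": "florence-2-base",
    }


# Read the text inside a fixed region of the image (coordinates are made up).
step = make_grounded_step("ocr_region", "detection-grounded-ocr", [10, 20, 110, 220])
print(json.dumps(step, indent=4))
```

No `grounding_selection_mode` is set here, since the table describes it as selecting one box when the grounding comes from a detection model's result.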

v1

Class: Florence2BlockV1 (there are multiple versions of this block)

Source: inference.core.workflows.core_steps.models.foundation.florence2.v1.Florence2BlockV1

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning

Dedicated inference server required (GPU recommended) - you may want to use a dedicated deployment

This Workflow block introduces Florence-2, a Visual Language Model (VLM) capable of performing a wide range of tasks, including:

Object Detection
Instance Segmentation
Image Captioning
Optical Character Recognition (OCR)
and more...

Below is a comprehensive list of tasks supported by the model, along with descriptions on how to utilize their outputs within the Workflows ecosystem:

Task Descriptions:

Custom Prompt (custom) - Use a free-form prompt to generate a response. Useful with fine-tuned models.
Text Recognition (OCR) (ocr) - Model recognizes text in the image
Text Detection & Recognition (OCR) (ocr-with-text-detection) - Model detects text regions in the image, and then performs OCR on each detected region
Captioning (short) (caption) - Model provides a short description of the image
Captioning (detailed-caption) - Model provides a long description of the image
Captioning (long) (more-detailed-caption) - Model provides a very long description of the image
Unprompted Object Detection (object-detection) - Model detects and returns the bounding boxes for prominent objects in the image
Object Detection (open-vocabulary-object-detection) - Model detects and returns the bounding boxes for the provided classes
Detection & Captioning (object-detection-and-caption) - Model detects prominent objects and captions them
Prompted Object Detection (phrase-grounded-object-detection) - Based on the textual prompt, model detects objects matching the descriptions
Prompted Instance Segmentation (phrase-grounded-instance-segmentation) - Based on the textual prompt, model segments objects matching the descriptions
Segment Bounding Box (detection-grounded-instance-segmentation) - Model segments the object in the provided bounding box into a polygon
Classification of Bounding Box (detection-grounded-classification) - Model classifies the object inside the provided bounding box
Captioning of Bounding Box (detection-grounded-caption) - Model captions the object in the provided bounding box
Text Recognition (OCR) for Bounding Box (detection-grounded-ocr) - Model performs OCR on the text inside the provided bounding box
Regions of Interest proposal (region-proposal) - Model proposes Regions of Interest (Bounding Boxes) in the image

Type identifier

Use the following identifier in the step "type" field: roboflow_core/florence_2@v1 to add the block as a step in your workflow.

Properties

Name	Type	Description	Refs
`name`	`str`	Enter a unique identifier for this step.	❌
`task_type`	`str`	Task type to be performed by model. Value determines required parameters and output response.	❌
`prompt`	`str`	Text prompt to the Florence-2 model.	✅
`classes`	`List[str]`	List of classes to be used.	✅
`grounding_detection`	`Optional[List[float], List[int]]`	Detection to ground Florence-2 model. May be statically provided bounding box `[left_top_x, left_top_y, right_bottom_x, right_bottom_y]` or result of object-detection model. If the latter is true, one box will be selected based on `grounding_selection_mode`.	✅
`grounding_selection_mode`	`str`	.	❌
`model_version`	`str`	Model to be used.	✅

The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.

Runtime compatibility

hard (runtime: self_hosted_cpu; execution: local): Requires a GPU; run_locally() loads a model that needs CUDA.

Available Connections

Compatible Blocks

Check what blocks you can connect to Florence-2 Model in version v1.

Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds Florence-2 Model in version v1 has.

Bindings

input
- images (image): The image to infer on.
- prompt (string): Text prompt to the Florence-2 model.
- classes (list_of_values): List of classes to be used.
- grounding_detection (Union[instance_segmentation_prediction, object_detection_prediction, list_of_values, keypoint_detection_prediction]): Detection to ground Florence-2 model. May be statically provided bounding box [left_top_x, left_top_y, right_bottom_x, right_bottom_y] or result of object-detection model. If the latter is true, one box will be selected based on grounding_selection_mode.
- model_version (string): Model to be used.
output
- raw_output (Union[string, language_model_output]): String value if string or LLM / VLM output if language_model_output.
- parsed_output (dictionary): Dictionary.
- classes (list_of_values): List of values of any type.

Example JSON definition of step Florence-2 Model in version v1

{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/florence_2@v1",
    "images": "$inputs.image",
    "task_type": "<block_does_not_provide_example>",
    "prompt": "my prompt",
    "classes": [
        "class-a",
        "class-b"
    ],
    "grounding_detection": "$steps.detection.predictions",
    "grounding_selection_mode": "first",
    "model_version": "florence-2-base"
}
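
Comparing the two versions documented on this page, the visible differences in a step definition are the type identifier and the model property: `model_version` (a string) in v1 versus `model_id` (bound as `roboflow_model_id`) in v2. The Python sketch below converts a v1 step on that basis. `upgrade_step_v1_to_v2` is a hypothetical helper, and it assumes the v1 value is also a valid v2 model ID, as is the case for `florence-2-base` in both examples.

```python
V1_TYPE = "roboflow_core/florence_2@v1"
V2_TYPE = "roboflow_core/florence_2@v2"


def upgrade_step_v1_to_v2(step):
    """Return a copy of a v1 step rewritten as a v2 step.

    Only the two differences visible on this page are handled:
    the type identifier and the model_version -> model_id rename.
    """
    if step.get("type") != V1_TYPE:
        raise ValueError(f"expected a step of type {V1_TYPE}")
    # Copy everything except the v1-only property, leaving the input untouched.
    upgraded = {key: value for key, value in step.items() if key != "model_version"}
    upgraded["type"] = V2_TYPE
    if "model_version" in step:
        # Assumption: the v1 value is accepted unchanged as a v2 model ID.
        upgraded["model_id"] = step["model_version"]
    return upgraded


v1_step = {
    "name": "captioner",
    "type": V1_TYPE,
    "images": "$inputs.image",
    "task_type": "caption",
    "model_version": "florence-2-base",
}
v2_step = upgrade_step_v1_to_v2(v1_step)
print(v2_step)
```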