Florence-2 Model

v2

Class: Florence2BlockV2 (there are multiple versions of this block)

Source: inference.core.workflows.core_steps.models.foundation.florence2.v2.Florence2BlockV2

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning

Dedicated inference server required (GPU recommended) - you may want to use dedicated deployment

This Workflow block introduces Florence 2, a Visual Language Model (VLM) capable of performing a wide range of tasks, including:

  • Object Detection

  • Instance Segmentation

  • Image Captioning

  • Optical Character Recognition (OCR)

  • and more...

Below is a comprehensive list of tasks supported by the model, along with descriptions on how to utilize their outputs within the Workflows ecosystem:

Task Descriptions:

  • Custom Prompt (custom) - Use free-form prompt to generate a response. Useful with finetuned models.

  • Text Recognition (OCR) (ocr) - Model recognizes text in the image

  • Text Detection & Recognition (OCR) (ocr-with-text-detection) - Model detects text regions in the image, and then performs OCR on each detected region

  • Captioning (short) (caption) - Model provides a short description of the image

  • Captioning (detailed-caption) - Model provides a long description of the image

  • Captioning (long) (more-detailed-caption) - Model provides a very long description of the image

  • Unprompted Object Detection (object-detection) - Model detects and returns the bounding boxes for prominent objects in the image

  • Object Detection (open-vocabulary-object-detection) - Model detects and returns the bounding boxes for the provided classes

  • Detection & Captioning (object-detection-and-caption) - Model detects prominent objects and captions them

  • Prompted Object Detection (phrase-grounded-object-detection) - Based on the textual prompt, model detects objects matching the descriptions

  • Prompted Instance Segmentation (phrase-grounded-instance-segmentation) - Based on the textual prompt, model segments objects matching the descriptions

  • Segment Bounding Box (detection-grounded-instance-segmentation) - Model segments the object in the provided bounding box into a polygon

  • Classification of Bounding Box (detection-grounded-classification) - Model classifies the object inside the provided bounding box

  • Captioning of Bounding Box (detection-grounded-caption) - Model captions the object in the provided bounding box

  • Text Recognition (OCR) for Bounding Box (detection-grounded-ocr) - Model performs OCR on the text inside the provided bounding box

  • Regions of Interest proposal (region-proposal) - Model proposes Regions of Interest (Bounding Boxes) in the image
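The task descriptions above suggest that some task types depend on an extra input: a text prompt, a list of classes, or a grounding bounding box. The sketch below groups the task identifiers on that basis. The grouping is inferred from the wording of the descriptions and is an assumption, not something read from the block's source; the names `PROMPT_TASKS`, `CLASSES_TASKS`, `GROUNDING_TASKS` and `extra_inputs_for` are hypothetical.

```python
from typing import List

# All task_type identifiers listed in the documentation above.
ALL_TASK_TYPES = [
    "custom", "ocr", "ocr-with-text-detection", "caption",
    "detailed-caption", "more-detailed-caption", "object-detection",
    "open-vocabulary-object-detection", "object-detection-and-caption",
    "phrase-grounded-object-detection",
    "phrase-grounded-instance-segmentation",
    "detection-grounded-instance-segmentation",
    "detection-grounded-classification", "detection-grounded-caption",
    "detection-grounded-ocr", "region-proposal",
]

# Assumption: grouping inferred from the task descriptions only.
PROMPT_TASKS = {
    "custom",
    "phrase-grounded-object-detection",
    "phrase-grounded-instance-segmentation",
}
CLASSES_TASKS = {"open-vocabulary-object-detection"}
GROUNDING_TASKS = {
    "detection-grounded-instance-segmentation",
    "detection-grounded-classification",
    "detection-grounded-caption",
    "detection-grounded-ocr",
}


def extra_inputs_for(task_type: str) -> List[str]:
    """Return the step properties a task type appears to rely on."""
    if task_type not in ALL_TASK_TYPES:
        raise ValueError(f"Unknown task_type: {task_type!r}")
    needed = []
    if task_type in PROMPT_TASKS:
        needed.append("prompt")
    if task_type in CLASSES_TASKS:
        needed.append("classes")
    if task_type in GROUNDING_TASKS:
        needed.append("grounding_detection")
    return needed


for task in ALL_TASK_TYPES:
    print(f"{task}: {extra_inputs_for(task) or 'image only'}")
```

A lookup like this can help catch a missing `prompt` or `classes` value before a workflow is submitted, but the block's own validation remains the authority.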

Type identifier

Use the following identifier in the step "type" field: roboflow_core/florence_2@v2 to add the block as a step in your workflow.

Properties

Name Type Description Refs
name str Enter a unique identifier for this step.
task_type str Task type to be performed by model. Value determines required parameters and output response.
prompt str Text prompt to the Florence-2 model.
classes List[str] List of classes to be used.
grounding_detection Optional[List[float], List[int]] Detection to ground Florence-2 model. May be a statically provided bounding box [left_top_x, left_top_y, right_bottom_x, right_bottom_y] or the result of an object-detection model. In the latter case, one box will be selected based on grounding_selection_mode.
grounding_selection_mode str .
model_id str Model to be used.

The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
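The grounding_detection property accepts a static bounding box in the order [left_top_x, left_top_y, right_bottom_x, right_bottom_y]. Below is a minimal sketch of a client-side check for that layout before it is placed in a step definition. The helper name `validate_static_box` is hypothetical, and the requirement that the bottom-right corner lies strictly beyond the top-left corner is an added sanity check, not a rule stated by the block.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def validate_static_box(box: Sequence[Number]) -> List[Number]:
    """Check a [left_top_x, left_top_y, right_bottom_x, right_bottom_y] box."""
    if len(box) != 4:
        raise ValueError("Bounding box must have exactly 4 coordinates")
    for value in box:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"Coordinate {value!r} is not a number")
    left, top, right, bottom = box
    # Assumption: a usable box has positive width and height.
    if right <= left or bottom <= top:
        raise ValueError("Bottom-right corner must lie beyond top-left corner")
    return [left, top, right, bottom]


grounding_detection = validate_static_box([10, 20, 110, 220])
print(grounding_detection)
```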

Runtime compatibility

hard — runtime self_hosted_cpu; execution local
Requires a GPU; run_locally() loads a model that needs CUDA.

Available Connections

Compatible Blocks

Check what blocks you can connect to Florence-2 Model in version v2.

Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds Florence-2 Model in version v2 has.

Bindings
Example JSON definition of step Florence-2 Model in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/florence_2@v2",
    "images": "$inputs.image",
    "task_type": "<block_does_not_provide_example>",
    "prompt": "my prompt",
    "classes": [
        "class-a",
        "class-b"
    ],
    "grounding_detection": "$steps.detection.predictions",
    "grounding_selection_mode": "first",
    "model_id": "florence-2-base"
}
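The JSON example above sets every property at once, although a given task type likely uses only some of them. The following sketch builds a v2 step definition in Python and leaves out any optional property that was not supplied. The function name `florence2_v2_step` is hypothetical, and whether the block accepts a step with the unused properties omitted is an assumption to verify against the block's schema.

```python
import json
from typing import Any, Dict, List, Optional


def florence2_v2_step(
    name: str,
    task_type: str,
    *,
    images: str = "$inputs.image",
    prompt: Optional[str] = None,
    classes: Optional[List[str]] = None,
    grounding_detection: Any = None,
    grounding_selection_mode: Optional[str] = None,
    model_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a step dict shaped like the JSON example above."""
    step: Dict[str, Any] = {
        "name": name,
        "type": "roboflow_core/florence_2@v2",
        "images": images,
        "task_type": task_type,
    }
    optional = {
        "prompt": prompt,
        "classes": classes,
        "grounding_detection": grounding_detection,
        "grounding_selection_mode": grounding_selection_mode,
        "model_id": model_id,
    }
    # Assumption: properties left as None can simply be omitted.
    step.update({key: value for key, value in optional.items() if value is not None})
    return step


step = florence2_v2_step(
    "florence_detector",
    "open-vocabulary-object-detection",
    classes=["class-a", "class-b"],
    model_id="florence-2-base",
)
print(json.dumps(step, indent=4))
```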

v1

Class: Florence2BlockV1 (there are multiple versions of this block)

Source: inference.core.workflows.core_steps.models.foundation.florence2.v1.Florence2BlockV1

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning

Dedicated inference server required (GPU recommended) - you may want to use dedicated deployment

This Workflow block introduces Florence 2, a Visual Language Model (VLM) capable of performing a wide range of tasks, including:

  • Object Detection

  • Instance Segmentation

  • Image Captioning

  • Optical Character Recognition (OCR)

  • and more...

Below is a comprehensive list of tasks supported by the model, along with descriptions on how to utilize their outputs within the Workflows ecosystem:

Task Descriptions:

  • Custom Prompt (custom) - Use free-form prompt to generate a response. Useful with finetuned models.

  • Text Recognition (OCR) (ocr) - Model recognizes text in the image

  • Text Detection & Recognition (OCR) (ocr-with-text-detection) - Model detects text regions in the image, and then performs OCR on each detected region

  • Captioning (short) (caption) - Model provides a short description of the image

  • Captioning (detailed-caption) - Model provides a long description of the image

  • Captioning (long) (more-detailed-caption) - Model provides a very long description of the image

  • Unprompted Object Detection (object-detection) - Model detects and returns the bounding boxes for prominent objects in the image

  • Object Detection (open-vocabulary-object-detection) - Model detects and returns the bounding boxes for the provided classes

  • Detection & Captioning (object-detection-and-caption) - Model detects prominent objects and captions them

  • Prompted Object Detection (phrase-grounded-object-detection) - Based on the textual prompt, model detects objects matching the descriptions

  • Prompted Instance Segmentation (phrase-grounded-instance-segmentation) - Based on the textual prompt, model segments objects matching the descriptions

  • Segment Bounding Box (detection-grounded-instance-segmentation) - Model segments the object in the provided bounding box into a polygon

  • Classification of Bounding Box (detection-grounded-classification) - Model classifies the object inside the provided bounding box

  • Captioning of Bounding Box (detection-grounded-caption) - Model captions the object in the provided bounding box

  • Text Recognition (OCR) for Bounding Box (detection-grounded-ocr) - Model performs OCR on the text inside the provided bounding box

  • Regions of Interest proposal (region-proposal) - Model proposes Regions of Interest (Bounding Boxes) in the image

Type identifier

Use the following identifier in the step "type" field: roboflow_core/florence_2@v1 to add the block as a step in your workflow.

Properties

Name Type Description Refs
name str Enter a unique identifier for this step.
task_type str Task type to be performed by model. Value determines required parameters and output response.
prompt str Text prompt to the Florence-2 model.
classes List[str] List of classes to be used.
grounding_detection Optional[List[float], List[int]] Detection to ground Florence-2 model. May be a statically provided bounding box [left_top_x, left_top_y, right_bottom_x, right_bottom_y] or the result of an object-detection model. In the latter case, one box will be selected based on grounding_selection_mode.
grounding_selection_mode str .
model_version str Model to be used.

The Refs column indicates whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.

Runtime compatibility

hard — runtime self_hosted_cpu; execution local
Requires a GPU; run_locally() loads a model that needs CUDA.

Available Connections

Compatible Blocks

Check what blocks you can connect to Florence-2 Model in version v1.

Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds Florence-2 Model in version v1 has.

Bindings
Example JSON definition of step Florence-2 Model in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/florence_2@v1",
    "images": "$inputs.image",
    "task_type": "<block_does_not_provide_example>",
    "prompt": "my prompt",
    "classes": [
        "class-a",
        "class-b"
    ],
    "grounding_detection": "$steps.detection.predictions",
    "grounding_selection_mode": "first",
    "model_version": "florence-2-base"
}
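Comparing the two example definitions on this page, the v1 and v2 steps differ in the "type" identifier and in the name of the model property (model_version in v1, model_id in v2). The sketch below rewrites a v1 step dict into the v2 shape on that basis. The helper name `v1_step_to_v2` is hypothetical, and it assumes the property values carry over unchanged between versions, which this page does not confirm.

```python
from typing import Any, Dict

V1_TYPE = "roboflow_core/florence_2@v1"
V2_TYPE = "roboflow_core/florence_2@v2"


def v1_step_to_v2(step: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a v1 step definition reshaped as a v2 step."""
    if step.get("type") != V1_TYPE:
        raise ValueError(f"Expected a step of type {V1_TYPE!r}")
    converted = dict(step)  # shallow copy; the input dict is not modified
    converted["type"] = V2_TYPE
    # Assumption: the v1 model_version value is also valid as a v2 model_id.
    if "model_version" in converted:
        converted["model_id"] = converted.pop("model_version")
    return converted


v1_step = {
    "name": "florence",
    "type": V1_TYPE,
    "images": "$inputs.image",
    "task_type": "caption",
    "model_version": "florence-2-base",
}
v2_step = v1_step_to_v2(v1_step)
print(v2_step)
```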