VLM as Detector¶

v2¶

Class: VLMAsDetectorBlockV2 (there are multiple versions of this block)

Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v2.VLMAsDetectorBlockV2

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning

The block expects string input that would be produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). Input is parsed to object-detection prediction and returned as block output.

Accepted formats:

valid JSON strings
JSON documents wrapped with Markdown tags

Example

{"my": "json"}

Details regarding block behavior:

error_status is set True whenever parsing cannot be completed
in case of multiple markdown blocks with raw JSON content - only first will be parsed

Type identifier¶

Use the following identifier in step "type" field: roboflow_core/vlm_as_detector@v2to add the block as as step in your workflow.

Properties¶

Name	Type	Description	Refs
`name`	`str`	Enter a unique identifier for this step..	❌
`classes`	`List[str]`	List of all classes used by the model, required to generate mapping between class name and class id..	✅
`model_type`	`str`	Type of the model that generated prediction.	❌
`task_type`	`str`	Task type to performed by model..	❌

The Refs column marks possibility to parametrise the property with dynamic values available in workflow runtime. See Bindings for more info.

Available Connections¶

Compatible Blocks

Check what blocks you can connect to VLM as Detector in version v2.

Input and Output Bindings¶

The available connections depend on its binding kinds. Check what binding kinds VLM as Detector in version v2 has.

Bindings

input
- image (image): The image which was the base to generate VLM prediction.
- vlm_output (language_model_output): The string with raw classification prediction to parse..
- classes (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id..
output
- error_status (boolean): Boolean flag.
- predictions (object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object.
- inference_id (inference_id): Inference identifier.

Example JSON definition of step VLM as Detector in version v2

{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_detector@v2",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ],
    "model_type": [
        "google-gemini",
        "anthropic-claude",
        "florence-2"
    ],
    "task_type": "<block_does_not_provide_example>"
}

v1¶

Class: VLMAsDetectorBlockV1 (there are multiple versions of this block)

Source: inference.core.workflows.core_steps.formatters.vlm_as_detector.v1.VLMAsDetectorBlockV1

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning

The block expects string input that would be produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). Input is parsed to object-detection prediction and returned as block output.

Accepted formats:

valid JSON strings
JSON documents wrapped with Markdown tags

Example

{"my": "json"}

Details regarding block behavior:

error_status is set True whenever parsing cannot be completed
in case of multiple markdown blocks with raw JSON content - only first will be parsed

Type identifier¶

Use the following identifier in step "type" field: roboflow_core/vlm_as_detector@v1to add the block as as step in your workflow.

Properties¶

Name	Type	Description	Refs
`name`	`str`	Enter a unique identifier for this step..	❌
`classes`	`List[str]`	List of all classes used by the model, required to generate mapping between class name and class id..	✅
`model_type`	`str`	Type of the model that generated prediction.	❌
`task_type`	`str`	Task type to performed by model..	❌

The Refs column marks possibility to parametrise the property with dynamic values available in workflow runtime. See Bindings for more info.

Available Connections¶

Compatible Blocks

Check what blocks you can connect to VLM as Detector in version v1.

Input and Output Bindings¶

The available connections depend on its binding kinds. Check what binding kinds VLM as Detector in version v1 has.

Bindings

input
- image (image): The image which was the base to generate VLM prediction.
- vlm_output (language_model_output): The string with raw classification prediction to parse..
- classes (list_of_values): List of all classes used by the model, required to generate mapping between class name and class id..
output
- error_status (boolean): Boolean flag.
- predictions (object_detection_prediction): Prediction with detected bounding boxes in form of sv.Detections(...) object.
- inference_id (string): String value.

Example JSON definition of step VLM as Detector in version v1

{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_detector@v1",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ],
    "model_type": [
        "google-gemini",
        "anthropic-claude",
        "florence-2"
    ],
    "task_type": "<block_does_not_provide_example>"
}