VLM as Detector

Version v1

This block expects a string input of the kind produced by blocks exposing Large Language Models (LLMs) and Visual Language Models (VLMs). The input is parsed into an object-detection prediction and returned as the block output.

Accepted formats:

  • valid JSON strings

  • JSON documents wrapped with Markdown tags

Example

{"my": "json"}

Details regarding block behavior:

  • error_status is set to True whenever parsing cannot be completed

  • if the input contains multiple Markdown blocks with raw JSON content, only the first one is parsed
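The parsing behavior described above can be sketched as follows. This is a minimal illustration, not the block's actual implementation; the function name and the exact fence-matching regex are assumptions:

```python
import json
import re

def parse_vlm_output(raw: str):
    """Parse a VLM/LLM output string into JSON, tolerating Markdown fences.

    Returns (parsed, error_status): error_status is True when parsing fails.
    """
    # Find fenced blocks like ```json ... ```; per the block's documented
    # behavior, only the first block is considered.
    blocks = re.findall(r"```(?:json)?\s*(.*?)```", raw, flags=re.DOTALL)
    candidate = blocks[0] if blocks else raw
    try:
        return json.loads(candidate), False
    except json.JSONDecodeError:
        return None, True
```

For example, both a bare JSON string and a Markdown-wrapped document parse to the same object, while unparsable input flips the error flag instead of raising.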

Type identifier

Use the following identifier in the step "type" field: roboflow_core/vlm_as_detector@v1 to add the block as a step in your workflow.

Properties

| Name | Type | Description | Refs |
| --- | --- | --- | --- |
| name | str | The unique name of this step. | |
| classes | List[str] | List of all classes used by the model, required to generate a mapping between class name and class id. | |
| model_type | str | Type of the model that generated the prediction. | |
| task_type | str | Task type to be performed by the model. | |

The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.

Available Connections

Check what blocks you can connect to VLM as Detector in version v1.

The available connections depend on its binding kinds. Check what binding kinds VLM as Detector in version v1 has.

Bindings
  • input

    • image (image): The image which was the base to generate the VLM prediction.
    • vlm_output (language_model_output): The string with the raw prediction to parse.
    • classes (list_of_values): List of all classes used by the model, required to generate a mapping between class name and class id.
  • output
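The classes binding drives a mapping from class name to class id. A minimal sketch of such a mapping, under the assumption that ids simply follow list order (the docs do not state the exact scheme):

```python
# Hypothetical illustration: derive a class-name -> class-id mapping from the
# `classes` input, assuming ids are assigned by position in the list.
classes = ["class_a", "class_b"]
class_name_to_id = {name: idx for idx, name in enumerate(classes)}
```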

Example JSON definition of step VLM as Detector in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/vlm_as_detector@v1",
    "image": "$inputs.image",
    "vlm_output": [
        "$steps.lmm.output"
    ],
    "classes": [
        "$steps.lmm.classes",
        "$inputs.classes",
        [
            "class_a",
            "class_b"
        ]
    ],
    "model_type": [
        "google-gemini",
        "anthropic-claude",
        "florence-2"
    ],
    "task_type": "<block_does_not_provide_example>"
}