
Example Workflows - Workflows with Visual Language Models

Below you can find example workflows to use as inspiration when building your own apps.

Prompting Anthropic Claude with arbitrary prompt

In this example, the Anthropic Claude model is prompted with arbitrary text supplied by the user.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/anthropic_claude@v1",
            "name": "claude",
            "images": "$inputs.image",
            "task_type": "unconstrained",
            "prompt": "Give me dominant color of the image",
            "api_key": "$inputs.api_key"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.claude.output"
        }
    ]
}
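
The definitions on this page can be executed against a running inference server. Below is a minimal sketch of how that might look with the inference_sdk Python client - the server URL, file name, and the exact run_workflow signature are assumptions here, so adjust them to your setup.

import json

from inference_sdk import InferenceHTTPClient

# Load the workflow definition shown above (assumed to be saved locally).
with open("claude_unconstrained_workflow.json") as f:
    workflow_definition = json.load(f)

# Assumes an inference server is running locally on port 9001.
client = InferenceHTTPClient(api_url="http://localhost:9001")

result = client.run_workflow(
    specification=workflow_definition,
    images={"image": "path/to/image.jpg"},
    parameters={"api_key": "<YOUR-ANTHROPIC-API-KEY>"},
)

# One entry per input image; the key matches the output name declared above.
print(result[0]["result"])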

Using Anthropic Claude as OCR model

In this example, the Anthropic Claude model is used as an OCR system. The user only selects the task type and does not need to provide a prompt.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/anthropic_claude@v1",
            "name": "claude",
            "images": "$inputs.image",
            "task_type": "ocr",
            "api_key": "$inputs.api_key"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.claude.output"
        }
    ]
}

Using Anthropic Claude as Visual Question Answering system

In this example, the Anthropic Claude model is used as a Visual Question Answering (VQA) system. The user provides the question via the prompt parameter.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "prompt"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/anthropic_claude@v1",
            "name": "claude",
            "images": "$inputs.image",
            "task_type": "visual-question-answering",
            "prompt": "$inputs.prompt",
            "api_key": "$inputs.api_key"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.claude.output"
        }
    ]
}

Using Anthropic Claude as Image Captioning system

In this example, the Anthropic Claude model is used as an image captioning system.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/anthropic_claude@v1",
            "name": "claude",
            "images": "$inputs.image",
            "task_type": "caption",
            "api_key": "$inputs.api_key",
            "temperature": 1.0
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.claude.output"
        }
    ]
}

Using Anthropic Claude as multi-class classifier

In this example, the Anthropic Claude model is used as a multi-class classifier. The model output is parsed by the dedicated roboflow_core/vlm_as_classifier@v1 block, which turns the output text into a fully-fledged classification prediction that other blocks compatible with classification predictions can consume - in this case, we extract the top_class property.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/anthropic_claude@v1",
            "name": "claude",
            "images": "$inputs.image",
            "task_type": "classification",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_classifier@v1",
            "name": "parser",
            "image": "$inputs.image",
            "vlm_output": "$steps.claude.output",
            "classes": "$steps.claude.classes"
        },
        {
            "type": "roboflow_core/property_definition@v1",
            "name": "top_class",
            "operations": [
                {
                    "type": "ClassificationPropertyExtract",
                    "property_name": "top_class"
                }
            ],
            "data": "$steps.parser.predictions"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "claude_result",
            "selector": "$steps.claude.output"
        },
        {
            "type": "JsonField",
            "name": "top_class",
            "selector": "$steps.top_class.output"
        },
        {
            "type": "JsonField",
            "name": "parsed_prediction",
            "selector": "$steps.parser.*"
        }
    ]
}
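
Since classes is declared as a workflow parameter, the label set can be changed per request without touching the definition. A short, hedged sketch of running this workflow with the inference_sdk client (file names, labels, and the run_workflow signature are assumptions):

import json

from inference_sdk import InferenceHTTPClient

with open("claude_classifier_workflow.json") as f:  # the definition shown above
    workflow_definition = json.load(f)

client = InferenceHTTPClient(api_url="http://localhost:9001")  # assumed local server

result = client.run_workflow(
    specification=workflow_definition,
    images={"image": "pet.jpg"},
    parameters={
        "api_key": "<YOUR-ANTHROPIC-API-KEY>",
        "classes": ["cat", "dog"],
    },
)

print(result[0]["claude_result"])      # raw text produced by Claude
print(result[0]["top_class"])          # e.g. "dog"
print(result[0]["parsed_prediction"])  # full parsed classification prediction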

Using Anthropic Claude as multi-label classifier

In this example, the Anthropic Claude model is used as a multi-label classifier. The model output is parsed by the dedicated roboflow_core/vlm_as_classifier@v1 block, which turns the output text into a fully-fledged classification prediction that other blocks compatible with classification predictions can consume - in this case, we extract the top_class property.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/anthropic_claude@v1",
            "name": "claude",
            "images": "$inputs.image",
            "task_type": "multi-label-classification",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_classifier@v1",
            "name": "parser",
            "image": "$inputs.image",
            "vlm_output": "$steps.claude.output",
            "classes": "$steps.claude.classes"
        },
        {
            "type": "roboflow_core/property_definition@v1",
            "name": "top_class",
            "operations": [
                {
                    "type": "ClassificationPropertyExtract",
                    "property_name": "top_class"
                }
            ],
            "data": "$steps.parser.predictions"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.top_class.output"
        },
        {
            "type": "JsonField",
            "name": "parsed_prediction",
            "selector": "$steps.parser.*"
        }
    ]
}

Using Anthropic Claude to provide structured JSON

In this example, the Anthropic Claude model is expected to provide structured JSON output, which is then parsed by the dedicated roboflow_core/json_parser@v1 block. That block transforms the string into a dictionary and exposes its keys to other blocks for further processing. In this case, the parsed output is transformed using the roboflow_core/property_definition@v1 block.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/anthropic_claude@v1",
            "name": "claude",
            "images": "$inputs.image",
            "task_type": "structured-answering",
            "output_structure": {
                "dogs_count": "count of dogs instances in the image",
                "cats_count": "count of cats instances in the image"
            },
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/json_parser@v1",
            "name": "parser",
            "raw_json": "$steps.claude.output",
            "expected_fields": [
                "dogs_count",
                "cats_count"
            ]
        },
        {
            "type": "roboflow_core/property_definition@v1",
            "name": "property_definition",
            "operations": [
                {
                    "type": "ToString"
                }
            ],
            "data": "$steps.parser.dogs_count"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.property_definition.output"
        }
    ]
}
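
Conceptually, the roboflow_core/json_parser@v1 step does the equivalent of loading the model's text answer as JSON and verifying the declared fields. The snippet below is only a rough illustration of that idea in plain Python, with a made-up model answer - it is not the block's actual implementation.

import json

# Hypothetical raw answer returned by the structured-answering task.
raw_output = '{"dogs_count": 2, "cats_count": 0}'

parsed = json.loads(raw_output)
missing = {"dogs_count", "cats_count"} - parsed.keys()
if missing:
    raise ValueError(f"Model response lacks expected fields: {missing}")

# The property_definition step then applies the ToString operation.
result = str(parsed["dogs_count"])  # -> "2"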

Using Anthropic Claude as object-detection model

In this example, the Anthropic Claude model is expected to provide detection output, which is then parsed by the dedicated roboflow_core/vlm_as_detector@v1 block. That block transforms the string into sv.Detections, which can later be used by other blocks processing object-detection predictions.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/anthropic_claude@v1",
            "name": "claude",
            "images": "$inputs.image",
            "task_type": "object-detection",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_detector@v1",
            "name": "parser",
            "vlm_output": "$steps.claude.output",
            "image": "$inputs.image",
            "classes": "$steps.claude.classes",
            "model_type": "anthropic-claude",
            "task_type": "object-detection"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "claude_result",
            "selector": "$steps.claude.output"
        },
        {
            "type": "JsonField",
            "name": "parsed_prediction",
            "selector": "$steps.parser.predictions"
        }
    ]
}
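
Within the Execution Engine, parsed_prediction is an sv.Detections object, so any downstream block (or custom Python code) that understands supervision detections can consume it. As a rough illustration of what such an object carries - the values below are made up, not real model output:

import numpy as np
import supervision as sv

# Illustrative only: one bounding box in xyxy format, with confidence,
# class id and class name attached, mimicking the parser's output shape.
detections = sv.Detections(
    xyxy=np.array([[10.0, 20.0, 110.0, 220.0]]),
    confidence=np.array([0.9]),
    class_id=np.array([0]),
    data={"class_name": np.array(["dog"])},
)
print(len(detections), detections.xyxy)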

Using Anthropic Claude as secondary classifier

In this example, the Anthropic Claude model is used as a secondary classifier - first, a YOLO model detects dogs, then each detected dog is cropped and classified with the VLM, and finally the detection classes are replaced so that the bounding boxes carry dog-breed labels.

Breeds that we classify: russell-terrier, wirehaired-pointing-griffon, beagle

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes",
            "default_value": [
                "russell-terrier",
                "wirehaired-pointing-griffon",
                "beagle"
            ]
        }
    ],
    "steps": [
        {
            "type": "ObjectDetectionModel",
            "name": "general_detection",
            "image": "$inputs.image",
            "model_id": "yolov8n-640",
            "class_filter": [
                "dog"
            ]
        },
        {
            "type": "Crop",
            "name": "cropping",
            "image": "$inputs.image",
            "predictions": "$steps.general_detection.predictions"
        },
        {
            "type": "roboflow_core/anthropic_claude@v1",
            "name": "claude",
            "images": "$steps.cropping.crops",
            "task_type": "classification",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_classifier@v1",
            "name": "parser",
            "image": "$steps.cropping.crops",
            "vlm_output": "$steps.claude.output",
            "classes": "$steps.claude.classes"
        },
        {
            "type": "roboflow_core/detections_classes_replacement@v1",
            "name": "classes_replacement",
            "object_detection_predictions": "$steps.general_detection.predictions",
            "classification_predictions": "$steps.parser.predictions"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "predictions",
            "selector": "$steps.classes_replacement.predictions"
        }
    ]
}

Florence 2 - grounded classification

THIS EXAMPLE CAN ONLY BE RUN LOCALLY OR USING DEDICATED DEPLOYMENT

In this example, we use an object detection model to find regions of interest in the input image, which are then classified by the Florence 2 model. With Workflows, it is possible to pass grounding_detection as an input to all of the tasks named detection-grounded-*.

The grounding detection can be either an input parameter or the output of a detection model. In the latter case, you should choose a grounding_selection_mode - since Florence 2 only supports a single bounding box as grounding, when multiple detections are provided the block selects one of them based on this parameter.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "InferenceImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "confidence",
            "default_value": 0.4
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/roboflow_object_detection_model@v1",
            "name": "model_1",
            "images": "$inputs.image",
            "model_id": "yolov8n-640",
            "confidence": "$inputs.confidence"
        },
        {
            "type": "roboflow_core/florence_2@v1",
            "name": "model",
            "images": "$inputs.image",
            "task_type": "detection-grounded-classification",
            "grounding_detection": "$steps.model_1.predictions",
            "grounding_selection_mode": "most-confident"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "model_predictions",
            "coordinates_system": "own",
            "selector": "$steps.model.*"
        }
    ]
}
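
As a rough mental model of the "most-confident" selection mode (not the block's actual code): among all detections produced by model_1, only the single box with the highest confidence score is handed to Florence 2.

import numpy as np
import supervision as sv

def select_most_confident(detections: sv.Detections) -> sv.Detections:
    # Keep only the single bounding box with the highest confidence score.
    if len(detections) == 0 or detections.confidence is None:
        return detections
    return detections[[int(np.argmax(detections.confidence))]]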

Florence 2 - grounded segmentation

THIS EXAMPLE CAN ONLY BE RUN LOCALLY OR USING DEDICATED DEPLOYMENT

In this example, we use an object detection model to find regions of interest in the input image and run segmentation of the selected region with Florence 2. With Workflows, it is possible to pass grounding_detection as an input to all of the tasks named detection-grounded-*.

The grounding detection can be either an input parameter or the output of a detection model. In the latter case, you should choose a grounding_selection_mode - since Florence 2 only supports a single bounding box as grounding, when multiple detections are provided the block selects one of them based on this parameter.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "InferenceImage",
            "name": "image"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/roboflow_object_detection_model@v1",
            "name": "model_1",
            "images": "$inputs.image",
            "model_id": "yolov8n-640"
        },
        {
            "type": "roboflow_core/florence_2@v1",
            "name": "model",
            "images": "$inputs.image",
            "task_type": "detection-grounded-instance-segmentation",
            "grounding_detection": "$steps.model_1.predictions",
            "grounding_selection_mode": "most-confident"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "model_predictions",
            "coordinates_system": "own",
            "selector": "$steps.model.*"
        }
    ]
}

Florence 2 - grounded captioning

THIS EXAMPLE CAN ONLY BE RUN LOCALLY OR USING DEDICATED DEPLOYMENT

In this example, we use an object detection model to find regions of interest in the input image and run captioning of the selected region with Florence 2. With Workflows, it is possible to pass grounding_detection as an input to all of the tasks named detection-grounded-*.

The grounding detection can be either an input parameter or the output of a detection model. In the latter case, you should choose a grounding_selection_mode - since Florence 2 only supports a single bounding box as grounding, when multiple detections are provided the block selects one of them based on this parameter.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "InferenceImage",
            "name": "image"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/roboflow_object_detection_model@v1",
            "name": "model_1",
            "images": "$inputs.image",
            "model_id": "yolov8n-640"
        },
        {
            "type": "roboflow_core/florence_2@v1",
            "name": "model",
            "images": "$inputs.image",
            "task_type": "detection-grounded-instance-segmentation",
            "grounding_detection": "$steps.model_1.predictions",
            "grounding_selection_mode": "most-confident"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "model_predictions",
            "coordinates_system": "own",
            "selector": "$steps.model.*"
        }
    ]
}

Florence 2 - object detection

THIS EXAMPLE CAN ONLY BE RUN LOCALLY OR USING DEDICATED DEPLOYMENT

In this example, we use Florence 2 as a zero-shot object detection model, specifically performing open-vocabulary detection. The input parameter classes provides the list of objects the model should find. Beware that Florence 2 tends to look for every class in your list - so if you select a class that is not visible in the image, you can expect either a large bounding box covering the whole image, or multiple bounding boxes over a single detected instance, with auxiliary boxes carrying meaningless labels for the objects you specified in the class list.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "InferenceImage",
            "name": "image"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/roboflow_object_detection_model@v1",
            "name": "model_1",
            "images": "$inputs.image",
            "model_id": "yolov8n-640"
        },
        {
            "type": "roboflow_core/florence_2@v1",
            "name": "model",
            "images": "$inputs.image",
            "task_type": "detection-grounded-instance-segmentation",
            "grounding_detection": "$steps.model_1.predictions",
            "grounding_selection_mode": "most-confident"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "model_predictions",
            "coordinates_system": "own",
            "selector": "$steps.model.*"
        }
    ]
}

Prompting Google's Gemini with arbitrary prompt

In this example, Google's Gemini model is prompted with arbitrary text supplied by the user.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/google_gemini@v1",
            "name": "gemini",
            "images": "$inputs.image",
            "task_type": "unconstrained",
            "prompt": "Give me dominant color of the image",
            "api_key": "$inputs.api_key"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.gemini.output"
        }
    ]
}

Using Google's Gemini as OCR model

In this example, Google's Gemini model is used as an OCR system. The user only selects the task type and does not need to provide a prompt.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/google_gemini@v1",
            "name": "gemini",
            "images": "$inputs.image",
            "task_type": "ocr",
            "api_key": "$inputs.api_key"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.gemini.output"
        }
    ]
}

Using Google's Gemini as Visual Question Answering system

In this example, Google's Gemini model is used as a Visual Question Answering (VQA) system. The user provides the question via the prompt parameter.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "prompt"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/google_gemini@v1",
            "name": "gemini",
            "images": "$inputs.image",
            "task_type": "visual-question-answering",
            "prompt": "$inputs.prompt",
            "api_key": "$inputs.api_key"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.gemini.output"
        }
    ]
}

Using Google's Gemini as Image Captioning system

In this example, Google's Gemini model is used as an image captioning system.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/google_gemini@v1",
            "name": "gemini",
            "images": "$inputs.image",
            "task_type": "caption",
            "api_key": "$inputs.api_key",
            "temperature": 1.0
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.gemini.output"
        }
    ]
}

Using Google's Gemini as multi-class classifier

In this example, Google's Gemini model is used as a multi-class classifier. The model output is parsed by the dedicated roboflow_core/vlm_as_classifier@v1 block, which turns the output text into a fully-fledged classification prediction that other blocks compatible with classification predictions can consume - in this case, we extract the top_class property.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/google_gemini@v1",
            "name": "gemini",
            "images": "$inputs.image",
            "task_type": "classification",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_classifier@v1",
            "name": "parser",
            "image": "$inputs.image",
            "vlm_output": "$steps.gemini.output",
            "classes": "$steps.gemini.classes"
        },
        {
            "type": "roboflow_core/property_definition@v1",
            "name": "top_class",
            "operations": [
                {
                    "type": "ClassificationPropertyExtract",
                    "property_name": "top_class"
                }
            ],
            "data": "$steps.parser.predictions"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "gemini_result",
            "selector": "$steps.gemini.output"
        },
        {
            "type": "JsonField",
            "name": "top_class",
            "selector": "$steps.top_class.output"
        },
        {
            "type": "JsonField",
            "name": "parsed_prediction",
            "selector": "$steps.parser.*"
        }
    ]
}

Using Google's Gemini as multi-label classifier

In this example, Google's Gemini model is used as a multi-label classifier. The model output is parsed by the dedicated roboflow_core/vlm_as_classifier@v1 block, which turns the output text into a fully-fledged classification prediction that other blocks compatible with classification predictions can consume - in this case, we extract the top_class property.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/google_gemini@v1",
            "name": "gemini",
            "images": "$inputs.image",
            "task_type": "multi-label-classification",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_classifier@v1",
            "name": "parser",
            "image": "$inputs.image",
            "vlm_output": "$steps.gemini.output",
            "classes": "$steps.gemini.classes"
        },
        {
            "type": "roboflow_core/property_definition@v1",
            "name": "top_class",
            "operations": [
                {
                    "type": "ClassificationPropertyExtract",
                    "property_name": "top_class"
                }
            ],
            "data": "$steps.parser.predictions"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.top_class.output"
        },
        {
            "type": "JsonField",
            "name": "parsed_prediction",
            "selector": "$steps.parser.*"
        }
    ]
}

Using Google's Gemini to provide structured JSON

In this example, Google's Gemini model is expected to provide structured JSON output, which is then parsed by the dedicated roboflow_core/json_parser@v1 block. That block transforms the string into a dictionary and exposes its keys to other blocks for further processing. In this case, the parsed output is transformed using the roboflow_core/property_definition@v1 block.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/google_gemini@v1",
            "name": "gemini",
            "images": "$inputs.image",
            "task_type": "structured-answering",
            "output_structure": {
                "dogs_count": "count of dogs instances in the image",
                "cats_count": "count of cats instances in the image"
            },
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/json_parser@v1",
            "name": "parser",
            "raw_json": "$steps.gemini.output",
            "expected_fields": [
                "dogs_count",
                "cats_count"
            ]
        },
        {
            "type": "roboflow_core/property_definition@v1",
            "name": "property_definition",
            "operations": [
                {
                    "type": "ToString"
                }
            ],
            "data": "$steps.parser.dogs_count"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.property_definition.output"
        }
    ]
}

Using Google's Gemini as object-detection model

In this example, Google's Gemini model is expected to provide detection output, which is then parsed by the dedicated roboflow_core/vlm_as_detector@v1 block. That block transforms the string into sv.Detections, which can later be used by other blocks processing object-detection predictions.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/google_gemini@v1",
            "name": "gemini",
            "images": "$inputs.image",
            "task_type": "object-detection",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_detector@v1",
            "name": "parser",
            "vlm_output": "$steps.gemini.output",
            "image": "$inputs.image",
            "classes": "$steps.gemini.classes",
            "model_type": "google-gemini",
            "task_type": "object-detection"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "gemini_result",
            "selector": "$steps.gemini.output"
        },
        {
            "type": "JsonField",
            "name": "parsed_prediction",
            "selector": "$steps.parser.predictions"
        }
    ]
}

Using Google's Gemini as secondary classifier

In this example, Google's Gemini model is used as a secondary classifier - first, a YOLO model detects dogs, then each detected dog is cropped and classified with the VLM, and finally the detection classes are replaced so that the bounding boxes carry dog-breed labels.

Breeds that we classify: russell-terrier, wirehaired-pointing-griffon, beagle

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes",
            "default_value": [
                "russell-terrier",
                "wirehaired-pointing-griffon",
                "beagle"
            ]
        }
    ],
    "steps": [
        {
            "type": "ObjectDetectionModel",
            "name": "general_detection",
            "image": "$inputs.image",
            "model_id": "yolov8n-640",
            "class_filter": [
                "dog"
            ]
        },
        {
            "type": "Crop",
            "name": "cropping",
            "image": "$inputs.image",
            "predictions": "$steps.general_detection.predictions"
        },
        {
            "type": "roboflow_core/google_gemini@v1",
            "name": "gemini",
            "images": "$steps.cropping.crops",
            "task_type": "classification",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_classifier@v1",
            "name": "parser",
            "image": "$steps.cropping.crops",
            "vlm_output": "$steps.gemini.output",
            "classes": "$steps.gemini.classes"
        },
        {
            "type": "roboflow_core/detections_classes_replacement@v1",
            "name": "classes_replacement",
            "object_detection_predictions": "$steps.general_detection.predictions",
            "classification_predictions": "$steps.parser.predictions"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "predictions",
            "selector": "$steps.classes_replacement.predictions"
        }
    ]
}

Prompting GPT with arbitrary prompt

In this example, the GPT model is prompted with arbitrary text supplied by the user via the prompt parameter.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "prompt"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/open_ai@v2",
            "name": "gpt",
            "images": "$inputs.image",
            "task_type": "unconstrained",
            "prompt": "$inputs.prompt",
            "api_key": "$inputs.api_key"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.gpt.output"
        }
    ]
}
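
Here both the prompt and the api_key are workflow parameters, so a different question can be asked on every request. A brief, hedged sketch of running this definition with the inference_sdk client (file name, server URL, and the run_workflow signature are assumptions):

import json

from inference_sdk import InferenceHTTPClient

with open("gpt_unconstrained_workflow.json") as f:  # the definition shown above
    workflow_definition = json.load(f)

client = InferenceHTTPClient(api_url="http://localhost:9001")  # assumed local server

result = client.run_workflow(
    specification=workflow_definition,
    images={"image": "scene.jpg"},
    parameters={
        "api_key": "<YOUR-OPENAI-API-KEY>",
        "prompt": "Describe the weather conditions visible in the image",
    },
)
print(result[0]["result"])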

Using GPT as OCR model

In this example, the GPT model is used as an OCR system. The user only selects the task type and does not need to provide a prompt. This definition additionally pins model_version to gpt-4o-mini.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/open_ai@v2",
            "name": "gpt",
            "images": "$inputs.image",
            "task_type": "ocr",
            "api_key": "$inputs.api_key",
            "model_version": "gpt-4o-mini"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.gpt.output"
        }
    ]
}

Using GPT as Visual Question Answering system

In this example, the GPT model is used as a Visual Question Answering (VQA) system. The user provides the question via the prompt parameter.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "prompt"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/open_ai@v2",
            "name": "gpt",
            "images": "$inputs.image",
            "task_type": "visual-question-answering",
            "prompt": "$inputs.prompt",
            "api_key": "$inputs.api_key"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.gpt.output"
        }
    ]
}

Using GPT as Image Captioning system

In this example, the GPT model is used as an image captioning system.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/open_ai@v2",
            "name": "gpt",
            "images": "$inputs.image",
            "task_type": "caption",
            "api_key": "$inputs.api_key"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.gpt.output"
        }
    ]
}

Using GPT as multi-class classifier

In this example, the GPT model is used as a multi-class classifier. The model output is parsed by the dedicated roboflow_core/vlm_as_classifier@v1 block, which turns GPT's output text into a fully-fledged classification prediction that other blocks compatible with classification predictions can consume - in this case, we extract the top_class property.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/open_ai@v2",
            "name": "gpt",
            "images": "$inputs.image",
            "task_type": "classification",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_classifier@v1",
            "name": "parser",
            "image": "$inputs.image",
            "vlm_output": "$steps.gpt.output",
            "classes": "$steps.gpt.classes"
        },
        {
            "type": "roboflow_core/property_definition@v1",
            "name": "top_class",
            "operations": [
                {
                    "type": "ClassificationPropertyExtract",
                    "property_name": "top_class"
                }
            ],
            "data": "$steps.parser.predictions"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "gpt_result",
            "selector": "$steps.gpt.output"
        },
        {
            "type": "JsonField",
            "name": "top_class",
            "selector": "$steps.top_class.output"
        },
        {
            "type": "JsonField",
            "name": "parsed_prediction",
            "selector": "$steps.parser.*"
        }
    ]
}

Using GPT as multi-label classifier

In this example, the GPT model is used as a multi-label classifier. The model output is parsed by the dedicated roboflow_core/vlm_as_classifier@v1 block, which turns GPT's output text into a fully-fledged classification prediction that other blocks compatible with classification predictions can consume - in this case, we extract the top_class property.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/open_ai@v2",
            "name": "gpt",
            "images": "$inputs.image",
            "task_type": "multi-label-classification",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_classifier@v1",
            "name": "parser",
            "image": "$inputs.image",
            "vlm_output": "$steps.gpt.output",
            "classes": "$steps.gpt.classes"
        },
        {
            "type": "roboflow_core/property_definition@v1",
            "name": "top_class",
            "operations": [
                {
                    "type": "ClassificationPropertyExtract",
                    "property_name": "top_class"
                }
            ],
            "data": "$steps.parser.predictions"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.top_class.output"
        },
        {
            "type": "JsonField",
            "name": "parsed_prediction",
            "selector": "$steps.parser.*"
        }
    ]
}

Using GPT to provide structured JSON

In this example, the GPT model is expected to provide structured JSON output, which is then parsed by the dedicated roboflow_core/json_parser@v1 block. That block transforms the string into a dictionary and exposes its keys to other blocks for further processing. In this case, the parsed output is transformed using the roboflow_core/property_definition@v1 block.

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        }
    ],
    "steps": [
        {
            "type": "roboflow_core/open_ai@v2",
            "name": "gpt",
            "images": "$inputs.image",
            "task_type": "structured-answering",
            "output_structure": {
                "dogs_count": "count of dogs instances in the image",
                "cats_count": "count of cats instances in the image"
            },
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/json_parser@v1",
            "name": "parser",
            "raw_json": "$steps.gpt.output",
            "expected_fields": [
                "dogs_count",
                "cats_count"
            ]
        },
        {
            "type": "roboflow_core/property_definition@v1",
            "name": "property_definition",
            "operations": [
                {
                    "type": "ToString"
                }
            ],
            "data": "$steps.parser.dogs_count"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "result",
            "selector": "$steps.property_definition.output"
        }
    ]
}

Using GPT as secondary classifier

In this example, the GPT model is used as a secondary classifier - first, a YOLO model detects dogs, then each detected dog is cropped and classified with the VLM, and finally the detection classes are replaced so that the bounding boxes carry dog-breed labels.

Breeds that we classify: russell-terrier, wirehaired-pointing-griffon, beagle

Workflow definition
{
    "version": "1.0",
    "inputs": [
        {
            "type": "WorkflowImage",
            "name": "image"
        },
        {
            "type": "WorkflowParameter",
            "name": "api_key"
        },
        {
            "type": "WorkflowParameter",
            "name": "classes",
            "default_value": [
                "russell-terrier",
                "wirehaired-pointing-griffon",
                "beagle"
            ]
        }
    ],
    "steps": [
        {
            "type": "ObjectDetectionModel",
            "name": "general_detection",
            "image": "$inputs.image",
            "model_id": "yolov8n-640",
            "class_filter": [
                "dog"
            ]
        },
        {
            "type": "Crop",
            "name": "cropping",
            "image": "$inputs.image",
            "predictions": "$steps.general_detection.predictions"
        },
        {
            "type": "roboflow_core/open_ai@v2",
            "name": "gpt",
            "images": "$steps.cropping.crops",
            "task_type": "classification",
            "classes": "$inputs.classes",
            "api_key": "$inputs.api_key"
        },
        {
            "type": "roboflow_core/vlm_as_classifier@v1",
            "name": "parser",
            "image": "$steps.cropping.crops",
            "vlm_output": "$steps.gpt.output",
            "classes": "$steps.gpt.classes"
        },
        {
            "type": "roboflow_core/detections_classes_replacement@v1",
            "name": "classes_replacement",
            "object_detection_predictions": "$steps.general_detection.predictions",
            "classification_predictions": "$steps.parser.predictions"
        }
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "predictions",
            "selector": "$steps.classes_replacement.predictions"
        }
    ]
}