
semantic_segmentation_prediction Kind

Prediction with per-pixel class label and confidence for semantic segmentation

Data representation

This kind has different internal and external representations. The external representation is relevant for integrating with your workflow, whereas the internal one is an implementation detail that matters when developing Workflows blocks.

External

The external data representation is relevant for Workflows clients: it dictates the input and output format of the data.

Type: dict

Internal

The internal data representation is relevant for Workflows block creators: it is the type the Execution Engine provides at runtime to any block that consumes input of this kind.

Type: sv.Detections

Details

This kind represents a single semantic segmentation prediction as an sv.Detections(...) object with one detection per predicted class. Each detection carries an RLE-encoded mask covering all pixels assigned to that class.

Why RLE and not polygons:

Semantic segmentation assigns a class label to every pixel in the image, so a single class can appear in multiple spatially disconnected regions (e.g., two separate "person" regions on opposite sides of the frame). Polygon-based serialization relies on cv2.findContours() and retains only the first contiguous contour, silently discarding all others. For non-contiguous masks this causes irreversible data loss. RLE (Run-Length Encoding, in the COCO format) is a pixel-level encoding that represents the complete mask regardless of its spatial topology, which makes it the correct serialization format for semantic segmentation masks.
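To illustrate why run-length encoding is lossless for disconnected regions, here is a minimal sketch of uncompressed COCO-style RLE in plain NumPy. Pixels are scanned in column-major order, and "counts" alternates between runs of 0s and 1s, always starting with a (possibly empty) run of 0s. The helper names rle_encode and rle_decode are hypothetical, and the sketch produces integer-list counts rather than the compressed string counts that pycocotools emits and that this kind stores in "rle_mask". The example mask has two disconnected regions, and both survive the round trip.

```python
import numpy as np


def rle_encode(mask: np.ndarray) -> dict:
    """Encode a boolean (H, W) mask as uncompressed COCO-style RLE (hypothetical helper)."""
    height, width = mask.shape
    # COCO RLE scans pixels column by column (Fortran order)
    flat = np.asarray(mask, dtype=bool).ravel(order="F")
    counts = []
    current = False  # the first run always counts background (0) pixels
    run = 0
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)  # close the current run (may be 0 at the very start)
            current = value
            run = 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def rle_decode(rle: dict) -> np.ndarray:
    """Decode uncompressed COCO-style RLE back into a boolean (H, W) mask."""
    height, width = rle["size"]
    flat = np.zeros(height * width, dtype=bool)
    position = 0
    is_foreground = False  # runs alternate: 0s, 1s, 0s, ...
    for count in rle["counts"]:
        if is_foreground:
            flat[position:position + count] = True
        position += count
        is_foreground = not is_foreground
    return flat.reshape((height, width), order="F")


# Two "person" regions on opposite sides of a tiny 4x6 frame
mask = np.zeros((4, 6), dtype=bool)
mask[0:2, 0:2] = True  # top-left region
mask[2:4, 4:6] = True  # bottom-right region, not connected to the first
rle = rle_encode(mask)
print(rle["counts"])  # [0, 2, 2, 2, 12, 2, 2, 2]
assert np.array_equal(rle_decode(rle), mask)  # both regions survive the round trip
```

A single-contour polygon could only describe one of the two regions; the run lengths describe every pixel.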

Internal representation: sv.Detections with:

- xyxy: tight bounding box enclosing all pixels of the class
- class_id: integer class ID
- confidence: mean confidence over all pixels of the class
- data["class_name"]: class label string
- data["rle_mask"]: numpy object array of COCO RLE dicts {"size": [H, W], "counts": "..."}
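The per-class fields listed above can be illustrated with a minimal NumPy sketch that derives one record per class from a per-pixel class-id map and a per-pixel confidence map. It assumes both maps have shape (H, W) and returns plain dicts instead of an sv.Detections object; the function name summarize_semantic_mask and the background_id parameter are hypothetical, and RLE encoding of the masks is left out.

```python
import numpy as np


def summarize_semantic_mask(class_map, confidence_map, class_names, background_id=None):
    """Build one record per class present in a (H, W) class-id map (hypothetical helper)."""
    records = []
    for class_id in np.unique(class_map):  # sorted, one entry per class
        class_id = int(class_id)
        if class_id == background_id:
            continue
        mask = class_map == class_id
        ys, xs = np.nonzero(mask)
        records.append({
            # tight box around all pixels of the class, even if they are disjoint;
            # x_max/y_max are inclusive pixel indices here (an assumption)
            "xyxy": [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())],
            "class_id": class_id,
            # mean per-pixel confidence over the pixels of this class
            "confidence": float(confidence_map[mask].mean()),
            "class_name": class_names[class_id],
            "mask": mask,
        })
    return records


class_map = np.array([
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [2, 2, 0, 1],
])
confidence_map = np.array([
    [0.9, 0.8, 0.9, 0.6],
    [0.9, 0.9, 0.9, 0.9],
    [0.5, 0.7, 0.9, 0.7],
])
records = summarize_semantic_mask(
    class_map, confidence_map, ["background", "person", "car"], background_id=0
)
for record in records:
    print(record["class_name"], record["xyxy"], round(record["confidence"], 3))
# person [1, 0, 3, 2] 0.7
# car [0, 2, 1, 2] 0.6
```

Note that the three "person" pixels are not connected, yet they produce a single record whose box spans all of them, matching the one-detection-per-class layout described above.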

Serialized format (one entry per class in "predictions"):

{
    "image": {"width": 640, "height": 480},
    "predictions": [
        {
            "x": 320.0, "y": 240.0, "width": 200.0, "height": 180.0,
            "confidence": 0.92,
            "class_id": 1,
            "class": "person",
            "detection_id": "a1b2c3d4-...",
            "rle_mask": {"size": [480, 640], "counts": "XYZ..."}
        }
    ]
}
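A client consuming this payload might convert each entry back to corner coordinates and sanity-check the mask size. The sketch below assumes that "x" and "y" give the box center (the usual Roboflow detection convention, not stated in this document) and uses hypothetical helper names. Note that "rle_mask"["size"] is ordered [height, width], so it must be compared against [image height, image width], not [width, height].

```python
payload = {
    "image": {"width": 640, "height": 480},
    "predictions": [
        {
            "x": 320.0, "y": 240.0, "width": 200.0, "height": 180.0,
            "confidence": 0.92,
            "class_id": 1,
            "class": "person",
            "detection_id": "a1b2c3d4-...",
            "rle_mask": {"size": [480, 640], "counts": "XYZ..."},
        }
    ],
}


def entry_to_xyxy(entry: dict) -> tuple:
    """Convert a serialized entry to (x_min, y_min, x_max, y_max).

    Assumes (x, y) is the box center (hypothetical helper).
    """
    half_w = entry["width"] / 2
    half_h = entry["height"] / 2
    return (entry["x"] - half_w, entry["y"] - half_h,
            entry["x"] + half_w, entry["y"] + half_h)


def rle_sizes_match_image(payload: dict) -> bool:
    """Check that every rle_mask["size"] equals [image height, image width]."""
    expected = [payload["image"]["height"], payload["image"]["width"]]
    return all(p["rle_mask"]["size"] == expected for p in payload["predictions"])


for entry in payload["predictions"]:
    print(entry["class"], entry_to_xyxy(entry))  # person (220.0, 150.0, 420.0, 330.0)
print(rle_sizes_match_image(payload))  # True
```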

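Decoding RLE masks (requires pycocotools; result is the serialized payload above, e.g. parsed with json.loads):

import pycocotools.mask as mask_utils

for prediction in result["predictions"]:
    rle = prediction["rle_mask"]
    # decode() returns a uint8 array; convert it to a boolean mask of shape (H, W)
    binary_mask = mask_utils.decode(rle).astype(bool)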
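Because each class carries its own mask, a client that wants a single per-pixel label map can merge the decoded masks. The sketch below is a hypothetical helper in plain NumPy: it takes (class_id, binary_mask) pairs, for example prediction["class_id"] paired with mask_utils.decode(prediction["rle_mask"]).astype(bool), and fills unlabelled pixels with an assumed background value of -1. The usage example builds the masks directly so it runs without pycocotools.

```python
import numpy as np


def masks_to_label_map(class_masks, height, width, background_id=-1):
    """Merge per-class boolean (H, W) masks into one (H, W) class-id map (hypothetical helper).

    class_masks: iterable of (class_id, binary_mask) pairs.
    """
    label_map = np.full((height, width), background_id, dtype=np.int32)
    for class_id, mask in class_masks:
        if mask.shape != (height, width):
            raise ValueError(f"mask shape {mask.shape} != {(height, width)}")
        # semantic masks should not overlap; if they do, later entries win
        label_map[mask] = class_id
    return label_map


person = np.zeros((3, 4), dtype=bool)
person[0, 1] = True
person[0, 3] = True  # a second, disconnected "person" pixel
car = np.zeros((3, 4), dtype=bool)
car[2, :2] = True
label_map = masks_to_label_map([(1, person), (2, car)], height=3, width=4)
print(label_map)
```

Taking height and width explicitly (e.g. from payload["image"]) lets the helper reject masks whose size does not match the image.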