
Stitch OCR Detections

Class: StitchOCRDetectionsBlockV1

Source: inference.core.workflows.core_steps.transformations.stitch_ocr_detections.v1.StitchOCRDetectionsBlockV1

Combine individual OCR detection results (words, characters, or text regions) into coherent text strings. The block organizes detections spatially according to the reading direction, groups them into lines, sorts them within each line, and concatenates their text in proper reading order to reconstruct readable text from OCR model outputs.

How This Block Works

This block reconstructs readable text from individual OCR detections by organizing them spatially and concatenating text in proper reading order. The block:

  1. Receives OCR detection predictions containing individual text detections with bounding boxes and class names (text content)
  2. Prepares coordinates based on reading direction:
     • For vertical reading directions, swaps x and y coordinates to enable vertical line processing
     • For horizontal reading directions, uses coordinates as-is
  3. Groups detections into lines:
     • Groups detections based on vertical position (or horizontal position for vertical text) using the tolerance parameter
     • Detections within the tolerance distance are considered part of the same line
     • Higher tolerance values group detections that are further apart, useful for text with variable line spacing
  4. Sorts lines based on reading direction:
     • For left-to-right and vertical top-to-bottom: sorts lines from top to bottom
     • For right-to-left and vertical bottom-to-top: sorts lines in reverse order (bottom to top)
  5. Sorts detections within each line:
     • For left-to-right and vertical top-to-bottom: sorts detections by horizontal position (left to right, or top to bottom for vertical text)
     • For right-to-left and vertical bottom-to-top: sorts detections in reverse order (right to left, or bottom to top for vertical text)
  6. Concatenates text in reading order:
     • Extracts class names (text content) from detections in sorted order
     • Adds line separators (newline for horizontal text, space for vertical text) between lines
     • Optionally inserts a delimiter between each text element if specified
     • Produces a single coherent text string with proper reading order
  7. Handles automatic reading direction detection (if "auto" is selected):
     • Analyzes the average width and height of detection bounding boxes
     • If average width > average height: detects horizontal text (left-to-right)
     • If average height >= average width: detects vertical text (top-to-bottom)
  8. Returns the stitched text string:
     • Outputs a single text string under the ocr_text key
     • Text is formatted with proper line breaks and spacing according to reading direction

The block enables reconstruction of multi-line text from individual OCR detections, maintaining proper reading order for different languages and writing systems. It handles both horizontal (left-to-right, right-to-left) and vertical (top-to-bottom, bottom-to-top) text orientations, making it useful for processing text in various languages and formats.
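To make the steps above concrete, here is a minimal Python sketch of the same logic. It is an illustrative reimplementation, not the block's source code: the stitch_detections function, its Detection tuple format, and the sample words are invented for this example, and the real block reads its input from object detection predictions rather than plain tuples.

from typing import List, Tuple

# A detection for this sketch: ((x_min, y_min, x_max, y_max), recognized_text).
Detection = Tuple[Tuple[float, float, float, float], str]


def stitch_detections(
    detections: List[Detection],
    reading_direction: str = "left_to_right",
    tolerance: int = 10,
    delimiter: str = "",
) -> str:
    if not detections:
        return ""

    # "auto": compare average box width and height to pick an orientation.
    if reading_direction == "auto":
        avg_w = sum(x2 - x1 for (x1, _, x2, _), _ in detections) / len(detections)
        avg_h = sum(y2 - y1 for (_, y1, _, y2), _ in detections) / len(detections)
        reading_direction = "left_to_right" if avg_w > avg_h else "vertical_top_to_bottom"

    vertical = reading_direction.startswith("vertical")

    # For vertical text the roles of x and y are swapped: lines are grouped
    # along x and read along y, mirroring the horizontal case.
    def line_axis(det: Detection) -> float:
        (x1, y1, _, _), _ = det
        return x1 if vertical else y1

    def within_line_axis(det: Detection) -> float:
        (x1, y1, _, _), _ = det
        return y1 if vertical else x1

    # Group detections into lines: a detection joins the current line if its
    # line-axis position is within `tolerance` of that line's first detection.
    lines: List[List[Detection]] = []
    for det in sorted(detections, key=line_axis):
        if lines and abs(line_axis(det) - line_axis(lines[-1][0])) <= tolerance:
            lines[-1].append(det)
        else:
            lines.append([det])

    # Right-to-left and bottom-to-top directions reverse both the line order
    # and the order of detections within each line.
    reverse = reading_direction in ("right_to_left", "vertical_bottom_to_top")
    if reverse:
        lines.reverse()

    line_separator = " " if vertical else "\n"
    stitched_lines = [
        delimiter.join(text for _, text in sorted(line, key=within_line_axis, reverse=reverse))
        for line in lines
    ]
    return line_separator.join(stitched_lines)


# Two lines of word detections, read left to right with a space delimiter:
words = [
    ((0, 0, 40, 20), "HELLO"),
    ((50, 2, 100, 22), "WORLD"),
    ((0, 40, 60, 60), "SECOND"),
    ((70, 41, 110, 61), "LINE"),
]
print(stitch_detections(words, tolerance=10, delimiter=" "))
# HELLO WORLD
# SECOND LINE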

Common Use Cases

  • Text Reconstruction: Convert individual word or character detections from OCR models into readable text blocks (e.g., reconstruct documents from word detections, combine character detections into words, stitch OCR results into paragraphs), enabling text reconstruction workflows
  • Multi-Line Text Processing: Reconstruct multi-line text from OCR results with proper line breaks and formatting (e.g., extract paragraphs from OCR results, reconstruct formatted text, process multi-line documents), enabling multi-line text workflows
  • Multi-Language OCR: Process OCR results from different languages and writing systems (e.g., process Arabic right-to-left text, handle vertical Chinese/Japanese text, support multiple reading directions), enabling multi-language OCR workflows
  • Document Processing: Extract and reconstruct text from documents and images (e.g., extract text from scanned documents, process invoice text, extract text from forms), enabling document processing workflows
  • Text Extraction and Formatting: Extract text from images and format it for downstream use (e.g., extract text for database storage, format text for API responses, prepare text for analysis), enabling text extraction workflows
  • OCR Result Post-Processing: Post-process OCR model outputs to produce usable text strings (e.g., format OCR outputs, organize OCR results, prepare text for downstream blocks), enabling OCR post-processing workflows

Connecting to Other Blocks

This block receives OCR detection predictions and produces stitched text strings:

  • After OCR model blocks to convert detection results into readable text (e.g., OCR model to text string, OCR detections to formatted text, OCR results to text output), enabling OCR-to-text workflows
  • Before data storage blocks to store extracted text (e.g., store OCR text in databases, save extracted text, log OCR results), enabling text storage workflows
  • Before notification blocks to send extracted text in notifications (e.g., send OCR text in alerts, include extracted text in messages, notify with OCR results), enabling text notification workflows
  • Before text processing blocks to process stitched text (e.g., process text with NLP models, analyze extracted text, apply text transformations), enabling text processing workflows
  • Before API output blocks to provide text in API responses (e.g., return OCR text in API, format text for responses, provide extracted text output), enabling text output workflows
  • In workflow outputs to provide stitched text as final output (e.g., text extraction workflows, OCR output workflows, document processing workflows), enabling text output workflows
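As a sketch of how the block typically sits in a workflow, the Python dict below defines a workflow that runs a word-level object detection model and stitches its predictions into text. This is an illustrative example rather than an official one: the step names, the model_id, and the choice of detection block are placeholders you would replace with your own.

# Sketch of a workflow definition (Python dict form). The model_id and step
# names are placeholders; any block producing object detection predictions
# with text content as class names can feed the stitching step.
OCR_STITCHING_WORKFLOW = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
    ],
    "steps": [
        {
            "type": "roboflow_core/roboflow_object_detection_model@v2",
            "name": "ocr_model",
            "images": "$inputs.image",
            "model_id": "your-word-detection-model/1",  # placeholder model
        },
        {
            "type": "roboflow_core/stitch_ocr_detections@v1",
            "name": "stitch_text",
            "predictions": "$steps.ocr_model.predictions",
            "reading_direction": "left_to_right",
            "tolerance": 10,
            "delimiter": " ",
        },
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "ocr_text",
            "selector": "$steps.stitch_text.ocr_text",
        },
    ],
}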

Requirements

This block requires OCR detection predictions (object detection format) with bounding boxes and class names containing text content. The tolerance parameter must be greater than zero and controls the vertical (or horizontal for vertical text) distance threshold for grouping detections into lines. The reading_direction parameter supports five modes: "left_to_right" (standard horizontal), "right_to_left" (Arabic-style), "vertical_top_to_bottom" (vertical), "vertical_bottom_to_top" (vertical reversed), and "auto" (automatic detection based on bounding box dimensions). The delimiter parameter is optional and inserts a delimiter between each text element (empty string by default, meaning no delimiter). The block outputs a single text string under the ocr_text key.
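If you produce the predictions yourself (for example in a custom Python block) rather than taking them from a model step, they need to follow that object detection format. A rough sketch, assuming the supervision library's Detections container and its class_name data key (verify the exact field names against the predictions your workflow actually produces):

import numpy as np
import supervision as sv

# Three word-level detections; the recognized text is carried as class names.
predictions = sv.Detections(
    xyxy=np.array(
        [
            [0.0, 0.0, 40.0, 20.0],    # "HELLO"
            [50.0, 2.0, 100.0, 22.0],  # "WORLD"
            [0.0, 40.0, 60.0, 60.0],   # "AGAIN"
        ]
    ),
    class_id=np.array([0, 1, 2]),
    data={"class_name": np.array(["HELLO", "WORLD", "AGAIN"])},
)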

Type identifier

Use the following identifier in step "type" field: roboflow_core/stitch_ocr_detections@v1 to add the block as a step in your workflow.

Properties

Name Type Description Refs
name str Enter a unique identifier for this step.
reading_direction str Direction to read and organize text detections. 'left_to_right': Standard horizontal reading (English, most languages). 'right_to_left': Right-to-left reading (Arabic, Hebrew). 'vertical_top_to_bottom': Vertical reading from top to bottom (Traditional Chinese, Japanese). 'vertical_bottom_to_top': Vertical reading from bottom to top (rare vertical formats). 'auto': Automatically detects reading direction based on average bounding box dimensions (width > height = horizontal, height >= width = vertical). Determines how detections are grouped into lines and sorted within lines.
tolerance int Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero.
delimiter str Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas.

The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.

Available Connections

Compatible Blocks

Check what blocks you can connect to Stitch OCR Detections in version v1.

Input and Output Bindings

The available connections depend on the block's binding kinds. Check what binding kinds Stitch OCR Detections in version v1 has.

Bindings
  • input

    • predictions (object_detection_prediction): OCR detection predictions from an OCR model. Should contain bounding boxes and class names with text content. Each detection represents a word, character, or text region that will be stitched together into coherent text. Supports object detection format with bounding boxes (xyxy) and class names in the data dictionary.
    • tolerance (integer): Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero.
    • delimiter (string): Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas.
  • output

    • ocr_text (string): The stitched text string, reconstructed from the OCR detections in reading order.
Example JSON definition of step Stitch OCR Detections in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/stitch_ocr_detections@v1",
    "predictions": "$steps.ocr_model.predictions",
    "reading_direction": "left_to_right",
    "tolerance": 10,
    "delimiter": ""
}
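As noted in the Bindings section, predictions, tolerance, and delimiter can also be supplied via selectors rather than literal values. A sketch of such a parametrised step, written as a Python dict (the $inputs names used here are placeholders):

# The same step with tolerance and delimiter bound to workflow parameters.
# The corresponding workflow inputs might be declared as, for example:
#   {"type": "WorkflowParameter", "name": "line_tolerance", "default_value": 10}
#   {"type": "WorkflowParameter", "name": "word_delimiter", "default_value": ""}
stitch_step = {
    "name": "stitch_text",
    "type": "roboflow_core/stitch_ocr_detections@v1",
    "predictions": "$steps.ocr_model.predictions",
    "reading_direction": "auto",
    "tolerance": "$inputs.line_tolerance",
    "delimiter": "$inputs.word_delimiter",
}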