Stitch OCR Detections
Class: StitchOCRDetectionsBlockV1
Combine individual OCR detection results (words, characters, or text regions) into coherent text strings by organizing detections spatially according to reading direction, grouping detections into lines, sorting them within lines, and concatenating text in proper reading order to reconstruct readable text from OCR model outputs.
How This Block Works
This block reconstructs readable text from individual OCR detections by organizing them spatially and concatenating text in proper reading order. The block:
- Receives OCR detection predictions containing individual text detections with bounding boxes and class names (text content)
- Prepares coordinates based on reading direction:
    - For vertical reading directions, swaps x and y coordinates to enable vertical line processing
    - For horizontal reading directions, uses coordinates as-is
- Groups detections into lines:
    - Groups detections based on vertical position (or horizontal position for vertical text) using the tolerance parameter
    - Detections within the tolerance distance are considered part of the same line
    - Higher tolerance values group detections that are further apart, useful for text with variable line spacing
- Sorts lines based on reading direction:
    - For left-to-right and vertical top-to-bottom: sorts lines from top to bottom
    - For right-to-left and vertical bottom-to-top: sorts lines in reverse order (bottom to top)
- Sorts detections within each line:
    - For left-to-right and vertical top-to-bottom: sorts detections by horizontal position (left to right, or top to bottom for vertical)
    - For right-to-left and vertical bottom-to-top: sorts detections in reverse order (right to left, or bottom to top for vertical)
- Concatenates text in reading order:
    - Extracts class names (text content) from detections in sorted order
    - Adds line separators (newline for horizontal text, space for vertical text) between lines
    - Optionally inserts a delimiter between each text element if specified
    - Produces a single coherent text string with proper reading order
- Handles automatic reading direction detection (if "auto" is selected):
    - Analyzes average width and height of detection bounding boxes
    - If average width > average height: detects horizontal text (left-to-right)
    - If average height >= average width: detects vertical text (top-to-bottom)
- Returns the stitched text string:
    - Outputs a single text string under the ocr_text key
    - Text is formatted with proper line breaks and spacing according to reading direction
The block enables reconstruction of multi-line text from individual OCR detections, maintaining proper reading order for different languages and writing systems. It handles both horizontal (left-to-right, right-to-left) and vertical (top-to-bottom, bottom-to-top) text orientations, making it useful for processing text in various languages and formats.
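The steps described in this section can be sketched in plain Python. This is a minimal illustration that mirrors the description, not the block's actual source: the `stitch` function, the tuple layout of a detection, the rule of comparing each top edge with the first detection of the current line, and the way the delimiter is applied only within a line are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical stand-in for one OCR detection: (x_min, y_min, x_max, y_max, text).
Detection = Tuple[float, float, float, float, str]

VERTICAL = ("vertical_top_to_bottom", "vertical_bottom_to_top")
REVERSED = ("right_to_left", "vertical_bottom_to_top")


def stitch(
    detections: List[Detection],
    reading_direction: str = "left_to_right",
    tolerance: int = 10,
    delimiter: str = "",
) -> str:
    """Group detections into lines, sort them, and join their text."""
    if tolerance <= 0:
        raise ValueError("tolerance must be greater than zero")
    if not detections:
        return ""
    vertical = reading_direction in VERTICAL
    reverse = reading_direction in REVERSED

    # Step 1: for vertical text, swap x and y so columns can be treated as lines.
    boxes = [
        (y1, x1, y2, x2, text) if vertical else (x1, y1, x2, y2, text)
        for x1, y1, x2, y2, text in detections
    ]

    # Step 2: group into lines. A detection joins the current line when its
    # top edge is within `tolerance` of the line's first detection
    # (an assumed grouping rule; the real block may group differently).
    boxes.sort(key=lambda b: b[1])
    lines: List[List[tuple]] = []
    for box in boxes:
        if lines and abs(box[1] - lines[-1][0][1]) <= tolerance:
            lines[-1].append(box)
        else:
            lines.append([box])

    # Step 3: order the lines, reversed for right-to-left and bottom-to-top.
    if reverse:
        lines.reverse()

    # Step 4: order detections within each line along the reading axis.
    for line in lines:
        line.sort(key=lambda b: b[0], reverse=reverse)

    # Step 5: join text within a line with `delimiter`, and join lines with a
    # newline (horizontal text) or a space (vertical text).
    separator = " " if vertical else "\n"
    return separator.join(delimiter.join(b[4] for b in line) for line in lines)


words = [
    (60, 12, 110, 30, "world"),
    (10, 10, 50, 30, "Hello"),
    (10, 50, 40, 70, "foo"),
]
print(stitch(words, delimiter=" "))
```

With the sample `words`, the first two detections differ by 2 pixels vertically and land on one line, while the third starts a new line, so the call prints `Hello world` followed by `foo` on the next line.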
Common Use Cases
- Text Reconstruction: Convert individual word or character detections from OCR models into readable text blocks (e.g., reconstruct documents from word detections, combine character detections into words, stitch OCR results into paragraphs)
- Multi-Line Text Processing: Reconstruct multi-line text from OCR results with proper line breaks and formatting (e.g., extract paragraphs from OCR results, reconstruct formatted text, process multi-line documents)
- Multi-Language OCR: Process OCR results from different languages and writing systems (e.g., process Arabic right-to-left text, handle vertical Chinese/Japanese text, support multiple reading directions)
- Document Processing: Extract and reconstruct text from documents and images (e.g., extract text from scanned documents, process invoice text, extract text from forms)
- Text Extraction and Formatting: Extract text from images and format it for downstream use (e.g., extract text for database storage, format text for API responses, prepare text for analysis)
- OCR Result Post-Processing: Post-process OCR model outputs to produce usable text strings (e.g., format OCR outputs, organize OCR results, prepare text for downstream blocks)
Connecting to Other Blocks
This block receives OCR detection predictions and produces stitched text strings:
- After OCR model blocks to convert detection results into readable text (e.g., OCR model to text string, OCR detections to formatted text, OCR results to text output)
- Before data storage blocks to store extracted text (e.g., store OCR text in databases, save extracted text, log OCR results)
- Before notification blocks to send extracted text in notifications (e.g., send OCR text in alerts, include extracted text in messages, notify with OCR results)
- Before text processing blocks to process stitched text (e.g., process text with NLP models, analyze extracted text, apply text transformations)
- Before API output blocks to provide text in API responses (e.g., return OCR text in API, format text for responses, provide extracted text output)
- In workflow outputs to provide stitched text as final output (e.g., text extraction workflows, OCR output workflows, document processing workflows)
Requirements
This block requires OCR detection predictions (object detection format) with bounding boxes and class names containing text content. The tolerance parameter must be greater than zero and controls the vertical (or horizontal for vertical text) distance threshold for grouping detections into lines. The reading_direction parameter supports five modes: "left_to_right" (standard horizontal), "right_to_left" (Arabic-style), "vertical_top_to_bottom" (vertical), "vertical_bottom_to_top" (vertical reversed), and "auto" (automatic detection based on bounding box dimensions). The delimiter parameter is optional and inserts a delimiter between each text element (empty string by default, meaning no delimiter). The block outputs a single text string under the ocr_text key.
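The parameter rules above, including the "auto" mode, can be illustrated with a short sketch. The helper name `resolve_reading_direction`, the `(x_min, y_min, x_max, y_max)` box layout, and the fallback to "left_to_right" for an empty input are assumptions made for the example; only the width-versus-height rule and the tolerance constraint come from the description.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]

SUPPORTED_DIRECTIONS = {
    "left_to_right",
    "right_to_left",
    "vertical_top_to_bottom",
    "vertical_bottom_to_top",
    "auto",
}


def resolve_reading_direction(
    boxes: List[Box], reading_direction: str = "auto", tolerance: int = 10
) -> str:
    """Validate the parameters and turn "auto" into a concrete direction."""
    if tolerance <= 0:
        raise ValueError("tolerance must be greater than zero")
    if reading_direction not in SUPPORTED_DIRECTIONS:
        raise ValueError(f"unsupported reading_direction: {reading_direction!r}")
    if reading_direction != "auto":
        return reading_direction
    if not boxes:
        # Assumption: with nothing to measure, default to horizontal text.
        return "left_to_right"
    avg_width = sum(x2 - x1 for x1, _, x2, _ in boxes) / len(boxes)
    avg_height = sum(y2 - y1 for _, y1, _, y2 in boxes) / len(boxes)
    # Wider-than-tall boxes suggest horizontal text; ties count as vertical.
    return "left_to_right" if avg_width > avg_height else "vertical_top_to_bottom"


print(resolve_reading_direction([(0, 0, 40, 10), (50, 0, 95, 12)]))
```

Note that "auto" only ever resolves to "left_to_right" or "vertical_top_to_bottom" under this rule, so right-to-left and bottom-to-top text need an explicit setting.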
Type identifier
Use the following identifier in the step "type" field: roboflow_core/stitch_ocr_detections@v1 to add the block as
a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| reading_direction | str | Direction to read and organize text detections. 'left_to_right': Standard horizontal reading (English, most languages). 'right_to_left': Right-to-left reading (Arabic, Hebrew). 'vertical_top_to_bottom': Vertical reading from top to bottom (Traditional Chinese, Japanese). 'vertical_bottom_to_top': Vertical reading from bottom to top (rare vertical formats). 'auto': Automatically detects reading direction based on average bounding box dimensions (width > height = horizontal, height >= width = vertical). Determines how detections are grouped into lines and sorted within lines. | ❌ |
| tolerance | int | Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero. | ✅ |
| delimiter | str | Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
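As a sketch of what parametrising a property might look like, the step below binds tolerance and delimiter to runtime inputs while keeping reading_direction a literal, since it is not marked as parametrisable. The input names `line_tolerance` and `word_delimiter` are hypothetical, and the `WorkflowParameter` input entries and `$inputs.` selector form are assumed from the general Workflows definition format rather than stated on this page.

```python
# Hypothetical runtime inputs; the names are illustrative only.
workflow_inputs = [
    {"type": "WorkflowParameter", "name": "line_tolerance", "default_value": 10},
    {"type": "WorkflowParameter", "name": "word_delimiter", "default_value": " "},
]

step = {
    "name": "stitch",
    "type": "roboflow_core/stitch_ocr_detections@v1",
    "predictions": "$steps.ocr_model.predictions",
    "reading_direction": "left_to_right",   # not parametrisable: literal value
    "tolerance": "$inputs.line_tolerance",  # parametrisable: bound at runtime
    "delimiter": "$inputs.word_delimiter",  # parametrisable: bound at runtime
}


def dynamic_fields(step_definition: dict) -> list:
    """Return the step fields whose values are runtime input selectors."""
    return sorted(
        key
        for key, value in step_definition.items()
        if isinstance(value, str) and value.startswith("$inputs.")
    )


print(dynamic_fields(step))
```

The helper is only a convenience for inspecting the definition; it reports the two fields that will be resolved from inputs when the workflow runs.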
Available Connections
Compatible Blocks
Check what blocks you can connect to Stitch OCR Detections in version v1.
- inputs: Clip Comparison, Florence-2 Model, Google Gemini, LMM, Instance Segmentation Model, Motion Detection, Email Notification, Detections Stitch, Roboflow Custom Metadata, Anthropic Claude, Multi-Label Classification Model, Detections Merge, Detection Offset, Pixel Color Count, LMM For Classification, Keypoint Detection Model, Stitch OCR Detections, Overlap Filter, OpenAI, Time in Zone, Detection Event Log, Roboflow Dataset Upload, Detections Transformation, YOLO-World Model, CogVLM, Local File Sink, VLM as Detector, Byte Tracker, Dynamic Crop, Moondream2, OCR Model, PTZ Tracking (ONVIF), Twilio SMS Notification, Path Deviation, Perspective Correction, Twilio SMS/MMS Notification, EasyOCR, Detections List Roll-Up, Line Counter, Object Detection Model, Detections Consensus, Webhook Sink, Single-Label Classification Model, SIFT Comparison, Slack Notification, Image Contours, VLM as Classifier, Google Vision OCR, Llama 3.2 Vision, Detections Combine, Detections Classes Replacement, Template Matching, Velocity, Model Monitoring Inference Aggregator, Detections Stabilizer, Distance Measurement, CSV Formatter, Detections Filter
- outputs: Instance Segmentation Model, Clip Comparison, Florence-2 Model, Morphological Transformation, Google Gemini, LMM, Email Notification, Polygon Zone Visualization, Detections Stitch, Keypoint Visualization, Roboflow Custom Metadata, Anthropic Claude, Pixel Color Count, Image Threshold, LMM For Classification, Reference Path Visualization, Stitch OCR Detections, Stability AI Image Generation, Stability AI Outpainting, Time in Zone, OpenAI, Roboflow Dataset Upload, Depth Estimation, YOLO-World Model, CogVLM, Image Preprocessing, Local File Sink, SAM 3, Dynamic Crop, Perception Encoder Embedding Model, Moondream2, Dot Visualization, Triangle Visualization, Seg Preview, Crop Visualization, PTZ Tracking (ONVIF), Twilio SMS Notification, Path Deviation, Twilio SMS/MMS Notification, Perspective Correction, Line Counter, Trace Visualization, QR Code Generator, CLIP Embedding Model, Webhook Sink, Bounding Box Visualization, Cache Set, Contrast Equalization, Halo Visualization, Model Comparison Visualization, Slack Notification, Label Visualization, Circle Visualization, Cache Get, Image Blur, Background Color Visualization, Mask Visualization, Size Measurement, Google Vision OCR, Llama 3.2 Vision, Corner Visualization, Color Visualization, Classification Label Visualization, Segment Anything 2 Model, Detections Classes Replacement, Line Counter Visualization, Icon Visualization, Ellipse Visualization, Model Monitoring Inference Aggregator, Polygon Visualization, SIFT Comparison, Stability AI Inpainting, Distance Measurement, Text Display
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds
Stitch OCR Detections in version v1 has.
Bindings
- input
    - predictions (object_detection_prediction): OCR detection predictions from an OCR model. Should contain bounding boxes and class names with text content. Each detection represents a word, character, or text region that will be stitched together into coherent text. Supports object detection format with bounding boxes (xyxy) and class names in the data dictionary.
    - tolerance (integer): Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero.
    - delimiter (string): Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas.
- output
    - ocr_text (string): String value.
Example JSON definition of step Stitch OCR Detections in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/stitch_ocr_detections@v1",
    "predictions": "$steps.ocr_model.predictions",
    "reading_direction": "left_to_right",
    "tolerance": 10,
    "delimiter": ""
}
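To show how the stitched text might be consumed, the sketch below builds the same step definition in Python and derives the selector a later step or workflow output would use to read ocr_text. The helper names `stitch_step` and `output_selector` are hypothetical, the step name `stitch` is illustrative, and the `JsonField` output entry is assumed from the general Workflows definition format rather than taken from this page.

```python
import json


def stitch_step(
    name: str,
    predictions_selector: str,
    reading_direction: str = "left_to_right",
    tolerance: int = 10,
    delimiter: str = "",
) -> dict:
    """Build a Stitch OCR Detections step definition as a plain dict."""
    return {
        "name": name,
        "type": "roboflow_core/stitch_ocr_detections@v1",
        "predictions": predictions_selector,
        "reading_direction": reading_direction,
        "tolerance": tolerance,
        "delimiter": delimiter,
    }


def output_selector(step_name: str) -> str:
    """Selector that points at the step's single output, ocr_text."""
    return f"$steps.{step_name}.ocr_text"


step = stitch_step("stitch", "$steps.ocr_model.predictions", delimiter=" ")

# Assumed shape of a workflow output entry exposing the stitched text.
workflow_output = {
    "type": "JsonField",
    "name": "text",
    "selector": output_selector(step["name"]),
}

print(json.dumps(step, indent=4))
print(workflow_output["selector"])
```

Building the definition in code keeps the step name and the output selector in sync, which helps when several stitch steps with different reading directions are used side by side.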