Stitch OCR Detections¶
v2¶
Class: StitchOCRDetectionsBlockV2 (there are multiple versions of this block)
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Combine individual OCR detection results (words, characters, or text regions) into coherent text strings by organizing detections spatially, grouping them into lines, and concatenating text in proper reading order.
Stitching Algorithms¶
This block supports three algorithms for reconstructing text from OCR detections:
Tolerance-based (default)¶
Groups detections into lines using a fixed pixel tolerance. Detections within the tolerance distance vertically (or horizontally for vertical text) are grouped into the same line, then sorted by position within each line.
- Best for: Consistent font sizes and well-aligned horizontal/vertical text
- Parameters:
`tolerance` (pixel threshold for line grouping)
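To make the grouping concrete, here is a minimal Python sketch of tolerance-based line grouping. This is an illustrative simplification, not the block's actual implementation; detections are modelled as hypothetical `(x, y, text)` tuples.

```python
# Sketch of tolerance-based stitching (illustration only, not the block's
# source). Each detection is (x, y, text); `tolerance` is the vertical
# pixel threshold for grouping detections into the same line.

def stitch_tolerance(detections, tolerance=10):
    """Group detections into lines by y, then read each line left to right."""
    lines = []  # each entry: [y_anchor, [detections]]
    for det in sorted(detections, key=lambda d: d[1]):  # sort by y
        for line in lines:
            if abs(det[1] - line[0]) <= tolerance:
                line[1].append(det)
                break
        else:
            lines.append([det[1], [det]])
    out = []
    for _, dets in lines:
        dets.sort(key=lambda d: d[0])  # sort by x within the line
        out.append("".join(d[2] for d in dets))
    return "\n".join(out)


# Two detections 2 px apart vertically share a line; one 30 px away does not:
print(stitch_tolerance([(0, 0, "H"), (10, 2, "i"), (0, 30, "yo")]))  # Hi\nyo
```

Because the threshold is a fixed pixel value, results depend on image resolution and font size, which is why the Otsu variant below normalizes gaps instead.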
Otsu Thresholding¶
Uses Otsu's method on normalized gap distances to automatically find the optimal threshold separating character gaps from word gaps. Gaps are normalized by local character width, making it resolution-invariant.
- Best for: Variable font sizes, automatic word boundary detection
- Parameters:
`otsu_threshold_multiplier` (adjusts threshold sensitivity)
- Key feature: Detects bimodal distributions to distinguish single words from multi-word text
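The idea of applying Otsu's method to 1-D gap values can be sketched as follows. This is an illustration of the documented behaviour under assumed details (the `otsu_threshold` helper, bin count, and sample gap values are hypothetical), not the block's source.

```python
# Illustrative 1-D Otsu thresholding over normalized gap values: find the
# threshold maximising between-class variance, then treat gaps above it
# as word boundaries. Assumed behaviour, not the block's actual code.
import numpy as np

def otsu_threshold(values, bins=64):
    """Return the split point that maximises between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    hist = hist.astype(float) / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = centers[0], -1.0
    for i in range(1, bins):
        w0, w1 = hist[:i].sum(), hist[i:].sum()  # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (hist[:i] * centers[:i]).sum() / w0  # class means
        mu1 = (hist[i:] * centers[i:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2          # between-class variance
        if var > best_var:
            best_var, best_t = var, centers[i]
    return best_t

# Character gaps cluster near 0.1, word gaps near 1.0 (width-normalized),
# so the threshold lands in the valley between the two modes:
gaps = [0.1, 0.12, 0.09, 1.0, 0.11, 0.95, 0.1]
t = otsu_threshold(gaps)
word_breaks = [g > t for g in gaps]
```

Scaling `t` by `otsu_threshold_multiplier` would then shift the split point: multipliers above 1.0 require wider gaps before a word break is inserted.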
Collimate (Skewed Text)¶
Uses greedy parent-child traversal to follow text flow. Starting from the first detection, it finds subsequent detections that "follow" in reading order (similar alignment + correct direction), building lines through traversal rather than bucketing.
- Best for: Skewed, curved, or non-axis-aligned text
- Parameters:
`collimate_tolerance` (alignment tolerance in pixels)
- Note: Does not detect word boundaries; use the `delimiter` parameter if spacing is needed
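A greedy parent-child traversal of the kind described above might look like the following sketch for horizontal text. This is a hypothetical illustration (the function name, tuple format, and "nearest detection to the right" rule are assumptions), not the block's actual algorithm.

```python
# Illustrative greedy traversal for skewed text: start a line at the
# top-left-most unused detection and repeatedly follow the nearest
# detection to the right whose y deviates by at most `tol` pixels from
# the current one. Assumed behaviour, not the block's source.

def stitch_collimate(detections, tol=10):
    remaining = sorted(detections, key=lambda d: (d[1], d[0]))
    lines = []
    while remaining:
        current = remaining.pop(0)  # seed a new line
        line = [current]
        while True:
            candidates = [d for d in remaining
                          if d[0] > current[0] and abs(d[1] - current[1]) <= tol]
            if not candidates:
                break
            nxt = min(candidates, key=lambda d: d[0])  # nearest to the right
            remaining.remove(nxt)
            line.append(nxt)
            current = nxt  # follow the text flow, tracking the skew
        lines.append("".join(d[2] for d in line))
    return "\n".join(lines)


# A line drifting downward by 5 px per detection stays one line here,
# while fixed-tolerance bucketing with tol=8 would split it:
print(stitch_collimate([(0, 0, "a"), (20, 5, "b"), (40, 10, "c"), (0, 40, "d")], tol=8))
```

Because the tolerance is applied between neighbours rather than against a fixed line anchor, the traversal can follow text whose baseline drifts across the image.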
Reading Directions¶
All algorithms support multiple reading directions:
- left_to_right: Standard horizontal (English, most languages)
- right_to_left: Right-to-left (Arabic, Hebrew)
- vertical_top_to_bottom: Vertical top-to-bottom (Traditional Chinese, Japanese)
- vertical_bottom_to_top: Vertical bottom-to-top
- auto: Automatically detect based on bounding box dimensions
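The `auto` heuristic documented above (average box width vs. height) can be sketched as follows; the function name and box format are illustrative assumptions.

```python
# Sketch of the documented "auto" reading-direction heuristic: if the
# average bounding box is wider than tall, assume horizontal text;
# otherwise assume vertical. Illustration only, not the block's source.

def detect_reading_direction(boxes):
    """boxes: list of (x1, y1, x2, y2). Returns a reading_direction value."""
    widths = [x2 - x1 for x1, _, x2, _ in boxes]
    heights = [y2 - y1 for _, y1, _, y2 in boxes]
    avg_w = sum(widths) / len(widths)
    avg_h = sum(heights) / len(heights)
    # width > height => horizontal; height >= width => vertical (per docs)
    return "left_to_right" if avg_w > avg_h else "vertical_top_to_bottom"
```

Note that `auto` only distinguishes horizontal from vertical; it cannot infer right-to-left or bottom-to-top order, which must be set explicitly.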
Common Use Cases¶
- Document OCR: Reconstruct paragraphs and lines from character/word detections
- Multi-language support: Handle different reading directions and writing systems
- Skewed text processing: Use collimate algorithm for tilted or curved text
- Word detection: Use Otsu algorithm to automatically insert spaces between words
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/stitch_ocr_detections@v2` to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `stitching_algorithm` | `str` | Algorithm for grouping detections into words/lines. 'tolerance': uses fixed pixel tolerance for line grouping (original algorithm); good for consistent font sizes and line spacing. 'otsu': uses Otsu's method on normalized gaps to find natural breaks between words; resolution-invariant and works well with bimodal gap distributions. 'collimate': uses greedy parent-child traversal to group detections; good for skewed or curved text where bucket-based approaches fail. | ❌ |
| `reading_direction` | `str` | Direction to read and organize text detections. 'left_to_right': standard horizontal reading (English, most languages). 'right_to_left': right-to-left reading (Arabic, Hebrew). 'vertical_top_to_bottom': vertical reading from top to bottom (Traditional Chinese, Japanese). 'vertical_bottom_to_top': vertical reading from bottom to top (rare vertical formats). 'auto': automatically detects reading direction based on average bounding box dimensions (width > height = horizontal, height >= width = vertical). Determines how detections are grouped into lines and sorted within lines. | ❌ |
| `tolerance` | `int` | Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text); lower values create more lines (useful for tightly spaced text). Must be greater than zero. | ✅ |
| `delimiter` | `str` | Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter; text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas. | ✅ |
| `otsu_threshold_multiplier` | `float` | Multiplier applied to the Otsu-computed threshold when using the 'otsu' stitching algorithm. Values > 1.0 make word breaks less frequent (more conservative, fewer splits); values < 1.0 make them more frequent (more aggressive, more splits). Default is 1.0 (use the Otsu threshold as-is). Try 1.3-1.5 if words are being incorrectly split, or 0.7-0.9 if words are being incorrectly merged. | ✅ |
| `collimate_tolerance` | `int` | Pixel tolerance for the 'collimate' stitching algorithm. Controls how much vertical (for horizontal text) or horizontal (for vertical text) deviation is allowed when determining whether a detection follows another in reading order. Higher values handle more skewed text but may incorrectly merge separate lines. Default is 10 pixels. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Stitch OCR Detections in version v2.
- inputs:
Anthropic Claude,Detections Consensus,Detections Merge,Instance Segmentation Model,Webhook Sink,Multi-Label Classification Model,Email Notification,Dynamic Crop,VLM As Detector,VLM As Detector,Google Gemini,LMM,Path Deviation,Detection Offset,Line Counter,Byte Tracker,Object Detection Model,Template Matching,Image Contours,Path Deviation,Google Vision OCR,Detections Stitch,CSV Formatter,Detections Filter,Google Gemini,Local File Sink,Slack Notification,Detections Stabilizer,VLM As Classifier,PTZ Tracking (ONVIF),Roboflow Dataset Upload,Camera Focus,Detections Combine,Object Detection Model,Anthropic Claude,LMM For Classification,Llama 3.2 Vision,Keypoint Detection Model,Byte Tracker,Distance Measurement,Identify Changes,Detections Classes Replacement,SIFT Comparison,Camera Focus,Time in Zone,Velocity,Moondream2,SIFT Comparison,Florence-2 Model,Florence-2 Model,Twilio SMS/MMS Notification,Clip Comparison,Email Notification,OpenAI,Byte Tracker,Model Monitoring Inference Aggregator,Single-Label Classification Model,Detections List Roll-Up,OpenAI,OpenAI,Cosine Similarity,Time in Zone,Line Counter,CogVLM,Roboflow Custom Metadata,Gaze Detection,EasyOCR,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Google Gemini,Twilio SMS Notification,Detection Event Log,OCR Model,YOLO-World Model,Overlap Filter,Time in Zone,Pixel Color Count,Stitch OCR Detections,Motion Detection,OpenAI,Detections Transformation,Roboflow Dataset Upload
- outputs:
Anthropic Claude,Mask Visualization,Classification Label Visualization,Instance Segmentation Model,Webhook Sink,Email Notification,QR Code Generator,Dynamic Crop,CLIP Embedding Model,Google Gemini,LMM,SAM 3,Path Deviation,Image Blur,Corner Visualization,Line Counter,Stability AI Outpainting,Cache Set,Segment Anything 2 Model,Halo Visualization,Stability AI Inpainting,Path Deviation,Trace Visualization,Google Vision OCR,Morphological Transformation,Triangle Visualization,Instance Segmentation Model,Detections Stitch,Text Display,Google Gemini,Slack Notification,Local File Sink,Roboflow Dataset Upload,PTZ Tracking (ONVIF),Color Visualization,Dot Visualization,Polygon Visualization,Anthropic Claude,Llama 3.2 Vision,Line Counter Visualization,LMM For Classification,Contrast Equalization,Distance Measurement,Detections Classes Replacement,SIFT Comparison,Perception Encoder Embedding Model,Time in Zone,Circle Visualization,Moondream2,Seg Preview,Halo Visualization,Florence-2 Model,Twilio SMS/MMS Notification,Label Visualization,Clip Comparison,Email Notification,Ellipse Visualization,OpenAI,Image Preprocessing,Model Monitoring Inference Aggregator,SAM 3,OpenAI,Image Threshold,Model Comparison Visualization,Background Color Visualization,Size Measurement,OpenAI,Depth Estimation,Cache Get,Line Counter,Time in Zone,CogVLM,Roboflow Custom Metadata,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Stability AI Image Generation,Reference Path Visualization,Keypoint Visualization,Twilio SMS Notification,Polygon Visualization,SAM 3,Bounding Box Visualization,Polygon Zone Visualization,YOLO-World Model,Icon Visualization,Time in Zone,Stitch OCR Detections,Crop Visualization,Google Gemini,Pixel Color Count,OpenAI,Florence-2 Model,Roboflow Dataset Upload
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
Stitch OCR Detections in version v2 has.
Bindings
- input:
  - `predictions` (object_detection_prediction): OCR detection predictions from an OCR model. Should contain bounding boxes and class names with text content. Each detection represents a word, character, or text region that will be stitched together into coherent text. Supports object detection format with bounding boxes (xyxy) and class names in the data dictionary.
  - `tolerance` (integer): Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text); lower values create more lines (useful for tightly spaced text). Must be greater than zero.
  - `delimiter` (string): Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter; text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas.
  - `otsu_threshold_multiplier` (float): Multiplier applied to the Otsu-computed threshold when using the 'otsu' stitching algorithm. Values > 1.0 make word breaks less frequent (more conservative, fewer splits); values < 1.0 make them more frequent (more aggressive, more splits). Default is 1.0 (use the Otsu threshold as-is). Try 1.3-1.5 if words are being incorrectly split, or 0.7-0.9 if words are being incorrectly merged.
  - `collimate_tolerance` (integer): Pixel tolerance for the 'collimate' stitching algorithm. Controls how much vertical (for horizontal text) or horizontal (for vertical text) deviation is allowed when determining whether a detection follows another in reading order. Higher values handle more skewed text but may incorrectly merge separate lines. Default is 10 pixels.
- output:
  - `ocr_text` (string): String value.
Example JSON definition of step Stitch OCR Detections in version v2
{
"name": "<your_step_name_here>",
"type": "roboflow_core/stitch_ocr_detections@v2",
"stitching_algorithm": "tolerance",
"predictions": "$steps.ocr_model.predictions",
"reading_direction": "left_to_right",
"tolerance": 10,
"delimiter": "",
"otsu_threshold_multiplier": 1.0,
"collimate_tolerance": 5
}
v1¶
Class: StitchOCRDetectionsBlockV1 (there are multiple versions of this block)
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Combine individual OCR detection results (words, characters, or text regions) into coherent text strings by organizing detections spatially according to reading direction, grouping detections into lines, sorting them within lines, and concatenating text in proper reading order to reconstruct readable text from OCR model outputs.
How This Block Works¶
This block reconstructs readable text from individual OCR detections by organizing them spatially and concatenating text in proper reading order. The block:
- Receives OCR detection predictions containing individual text detections with bounding boxes and class names (text content)
- Prepares coordinates based on reading direction:
- For vertical reading directions, swaps x and y coordinates to enable vertical line processing
- For horizontal reading directions, uses coordinates as-is
- Groups detections into lines:
- Groups detections based on vertical position (or horizontal position for vertical text) using the tolerance parameter
- Detections within the tolerance distance are considered part of the same line
- Higher tolerance values group detections that are further apart, useful for text with variable line spacing
- Sorts lines based on reading direction:
- For left-to-right and vertical top-to-bottom: sorts lines from top to bottom
- For right-to-left and vertical bottom-to-top: sorts lines in reverse order (bottom to top)
- Sorts detections within each line:
- For left-to-right and vertical top-to-bottom: sorts detections by horizontal position (left to right, or top to bottom for vertical)
- For right-to-left and vertical bottom-to-top: sorts detections in reverse order (right to left, or bottom to top for vertical)
- Concatenates text in reading order:
- Extracts class names (text content) from detections in sorted order
- Adds line separators (newline for horizontal text, space for vertical text) between lines
- Optionally inserts a delimiter between each text element if specified
- Produces a single coherent text string with proper reading order
- Handles automatic reading direction detection (if "auto" is selected):
- Analyzes average width and height of detection bounding boxes
- If average width > average height: detects horizontal text (left-to-right)
- If average height >= average width: detects vertical text (top-to-bottom)
- Returns the stitched text string:
- Outputs a single text string under the `ocr_text` key
- Text is formatted with proper line breaks and spacing according to reading direction
The block enables reconstruction of multi-line text from individual OCR detections, maintaining proper reading order for different languages and writing systems. It handles both horizontal (left-to-right, right-to-left) and vertical (top-to-bottom, bottom-to-top) text orientations, making it useful for processing text in various languages and formats.
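The numbered steps above can be sketched end to end in a few lines of Python. This is an illustration of the documented behaviour under assumed details (the function name and `(x, y, text)` detection format are hypothetical), not the block's actual implementation.

```python
# Illustrative end-to-end stitch: swap axes for vertical text, group into
# lines by tolerance, order lines and detections per reading direction,
# then join with the delimiter. Not the block's actual source.

def stitch_ocr(detections, reading_direction="left_to_right",
               tolerance=10, delimiter=""):
    if reading_direction.startswith("vertical"):
        detections = [(y, x, t) for x, y, t in detections]  # swap x and y
    reverse = reading_direction in ("right_to_left", "vertical_bottom_to_top")
    # Step: group detections into lines by the cross-axis coordinate
    lines = []
    for det in sorted(detections, key=lambda d: d[1]):
        if lines and abs(det[1] - lines[-1][0][1]) <= tolerance:
            lines[-1].append(det)
        else:
            lines.append([det])
    # Step: sort lines and detections within lines per reading direction
    if reverse:
        lines.reverse()
    line_sep = " " if reading_direction.startswith("vertical") else "\n"
    rendered = []
    for line in lines:
        line.sort(key=lambda d: d[0], reverse=reverse)
        rendered.append(delimiter.join(t for _, _, t in line))
    return line_sep.join(rendered)


# Two detections on one line plus a third 30 px lower:
print(stitch_ocr([(0, 0, "he"), (20, 2, "llo"), (0, 30, "world")]))  # hello\nworld
```

Swapping x and y up front lets the same line-grouping and sorting code serve both horizontal and vertical reading directions, which mirrors the coordinate-preparation step described above.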
Common Use Cases¶
- Text Reconstruction: Convert individual word or character detections from OCR models into readable text blocks (e.g., reconstruct documents from word detections, combine character detections into words, stitch OCR results into paragraphs), enabling text reconstruction workflows
- Multi-Line Text Processing: Reconstruct multi-line text from OCR results with proper line breaks and formatting (e.g., extract paragraphs from OCR results, reconstruct formatted text, process multi-line documents), enabling multi-line text workflows
- Multi-Language OCR: Process OCR results from different languages and writing systems (e.g., process Arabic right-to-left text, handle vertical Chinese/Japanese text, support multiple reading directions), enabling multi-language OCR workflows
- Document Processing: Extract and reconstruct text from documents and images (e.g., extract text from scanned documents, process invoice text, extract text from forms), enabling document processing workflows
- Text Extraction and Formatting: Extract text from images and format it for downstream use (e.g., extract text for database storage, format text for API responses, prepare text for analysis), enabling text extraction workflows
- OCR Result Post-Processing: Post-process OCR model outputs to produce usable text strings (e.g., format OCR outputs, organize OCR results, prepare text for downstream blocks), enabling OCR post-processing workflows
Connecting to Other Blocks¶
This block receives OCR detection predictions and produces stitched text strings:
- After OCR model blocks to convert detection results into readable text (e.g., OCR model to text string, OCR detections to formatted text, OCR results to text output), enabling OCR-to-text workflows
- Before data storage blocks to store extracted text (e.g., store OCR text in databases, save extracted text, log OCR results), enabling text storage workflows
- Before notification blocks to send extracted text in notifications (e.g., send OCR text in alerts, include extracted text in messages, notify with OCR results), enabling text notification workflows
- Before text processing blocks to process stitched text (e.g., process text with NLP models, analyze extracted text, apply text transformations), enabling text processing workflows
- Before API output blocks to provide text in API responses (e.g., return OCR text in API, format text for responses, provide extracted text output), enabling text output workflows
- In workflow outputs to provide stitched text as final output (e.g., text extraction workflows, OCR output workflows, document processing workflows), enabling text output workflows
Requirements¶
This block requires OCR detection predictions (object detection format) with bounding boxes and class names containing text content. The tolerance parameter must be greater than zero and controls the vertical (or horizontal for vertical text) distance threshold for grouping detections into lines. The reading_direction parameter supports five modes: "left_to_right" (standard horizontal), "right_to_left" (Arabic-style), "vertical_top_to_bottom" (vertical), "vertical_bottom_to_top" (vertical reversed), and "auto" (automatic detection based on bounding box dimensions). The delimiter parameter is optional and inserts a delimiter between each text element (empty string by default, meaning no delimiter). The block outputs a single text string under the ocr_text key.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/stitch_ocr_detections@v1` to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | `str` | Enter a unique identifier for this step. | ❌ |
| `reading_direction` | `str` | Direction to read and organize text detections. 'left_to_right': standard horizontal reading (English, most languages). 'right_to_left': right-to-left reading (Arabic, Hebrew). 'vertical_top_to_bottom': vertical reading from top to bottom (Traditional Chinese, Japanese). 'vertical_bottom_to_top': vertical reading from bottom to top (rare vertical formats). 'auto': automatically detects reading direction based on average bounding box dimensions (width > height = horizontal, height >= width = vertical). Determines how detections are grouped into lines and sorted within lines. | ❌ |
| `tolerance` | `int` | Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text); lower values create more lines (useful for tightly spaced text). Must be greater than zero. | ✅ |
| `delimiter` | `str` | Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter; text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Stitch OCR Detections in version v1.
- inputs:
Anthropic Claude,Detections Consensus,Detections Merge,Instance Segmentation Model,Webhook Sink,Multi-Label Classification Model,Email Notification,Dynamic Crop,VLM As Detector,VLM As Detector,Google Gemini,LMM,Path Deviation,Detection Offset,Line Counter,Byte Tracker,Object Detection Model,Template Matching,Image Contours,Path Deviation,Google Vision OCR,Detections Stitch,CSV Formatter,Detections Filter,Google Gemini,Local File Sink,Slack Notification,Detections Stabilizer,VLM As Classifier,PTZ Tracking (ONVIF),Roboflow Dataset Upload,Detections Combine,Object Detection Model,Anthropic Claude,LMM For Classification,Llama 3.2 Vision,Keypoint Detection Model,Byte Tracker,Distance Measurement,Detections Classes Replacement,SIFT Comparison,Time in Zone,Velocity,Moondream2,SIFT Comparison,Florence-2 Model,Florence-2 Model,Twilio SMS/MMS Notification,Clip Comparison,Email Notification,OpenAI,Byte Tracker,Model Monitoring Inference Aggregator,Single-Label Classification Model,Detections List Roll-Up,OpenAI,OpenAI,Time in Zone,Line Counter,CogVLM,Roboflow Custom Metadata,EasyOCR,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Google Gemini,Twilio SMS Notification,Detection Event Log,OCR Model,YOLO-World Model,Overlap Filter,Time in Zone,Pixel Color Count,Stitch OCR Detections,Motion Detection,OpenAI,Detections Transformation,Roboflow Dataset Upload
- outputs:
Anthropic Claude,Mask Visualization,Classification Label Visualization,Instance Segmentation Model,Webhook Sink,Email Notification,QR Code Generator,Dynamic Crop,CLIP Embedding Model,Google Gemini,LMM,SAM 3,Path Deviation,Image Blur,Corner Visualization,Line Counter,Stability AI Outpainting,Cache Set,Segment Anything 2 Model,Halo Visualization,Stability AI Inpainting,Path Deviation,Trace Visualization,Google Vision OCR,Morphological Transformation,Triangle Visualization,Instance Segmentation Model,Detections Stitch,Text Display,Google Gemini,Slack Notification,Local File Sink,Roboflow Dataset Upload,PTZ Tracking (ONVIF),Color Visualization,Dot Visualization,Polygon Visualization,Anthropic Claude,Llama 3.2 Vision,Line Counter Visualization,LMM For Classification,Contrast Equalization,Distance Measurement,Detections Classes Replacement,SIFT Comparison,Perception Encoder Embedding Model,Time in Zone,Circle Visualization,Moondream2,Seg Preview,Halo Visualization,Florence-2 Model,Twilio SMS/MMS Notification,Label Visualization,Clip Comparison,Email Notification,Ellipse Visualization,OpenAI,Image Preprocessing,Model Monitoring Inference Aggregator,SAM 3,OpenAI,Image Threshold,Model Comparison Visualization,Background Color Visualization,Size Measurement,OpenAI,Depth Estimation,Cache Get,Line Counter,Time in Zone,CogVLM,Roboflow Custom Metadata,Stitch OCR Detections,Perspective Correction,Anthropic Claude,Stability AI Image Generation,Reference Path Visualization,Keypoint Visualization,Twilio SMS Notification,Polygon Visualization,SAM 3,Bounding Box Visualization,Polygon Zone Visualization,YOLO-World Model,Icon Visualization,Time in Zone,Stitch OCR Detections,Crop Visualization,Google Gemini,Pixel Color Count,OpenAI,Florence-2 Model,Roboflow Dataset Upload
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
Stitch OCR Detections in version v1 has.
Bindings
- input:
  - `predictions` (object_detection_prediction): OCR detection predictions from an OCR model. Should contain bounding boxes and class names with text content. Each detection represents a word, character, or text region that will be stitched together into coherent text. Supports object detection format with bounding boxes (xyxy) and class names in the data dictionary.
  - `tolerance` (integer): Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text); lower values create more lines (useful for tightly spaced text). Must be greater than zero.
  - `delimiter` (string): Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter; text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas.
- output:
  - `ocr_text` (string): String value.
Example JSON definition of step Stitch OCR Detections in version v1
{
"name": "<your_step_name_here>",
"type": "roboflow_core/stitch_ocr_detections@v1",
"predictions": "$steps.ocr_model.predictions",
"reading_direction": "left_to_right",
"tolerance": 10,
"delimiter": ""
}