Stitch OCR Detections¶
v2¶
Class: StitchOCRDetectionsBlockV2 (there are multiple versions of this block)
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Combine individual OCR detection results (words, characters, or text regions) into coherent text strings by organizing detections spatially, grouping them into lines, and concatenating text in proper reading order.
Stitching Algorithms¶
This block supports three algorithms for reconstructing text from OCR detections:
Tolerance-based (default)¶
Groups detections into lines using a fixed pixel tolerance. Detections within the tolerance distance vertically (or horizontally for vertical text) are grouped into the same line, then sorted by position within each line.
- Best for: Consistent font sizes and well-aligned horizontal/vertical text
- Parameters: `tolerance` (pixel threshold for line grouping)
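The grouping step can be pictured with a minimal sketch. This is illustrative Python, not the block's source code: the detection structure (a dict with an `xyxy` box and a `text` field) and the function name are assumptions made for the example, and it handles only left-to-right text.

```python
# Illustrative sketch of tolerance-based line grouping for left-to-right text
# (not the block's source code). Detections are assumed to be dicts with an
# "xyxy" box and a "text" field; the function name is invented for the example.
def stitch_by_tolerance(detections, tolerance=10, delimiter=""):
    dets = sorted(detections, key=lambda d: (d["xyxy"][1] + d["xyxy"][3]) / 2)
    lines, current, line_y = [], [], None
    for det in dets:
        cy = (det["xyxy"][1] + det["xyxy"][3]) / 2
        if line_y is None or abs(cy - line_y) <= tolerance:
            # Same line: within `tolerance` pixels of the line's first detection.
            current.append(det)
            line_y = cy if line_y is None else line_y
        else:
            lines.append(current)
            current, line_y = [det], cy
    if current:
        lines.append(current)
    # Read each line left to right, join lines with newlines.
    return "\n".join(
        delimiter.join(d["text"] for d in sorted(line, key=lambda d: d["xyxy"][0]))
        for line in lines
    )
```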
Otsu Thresholding¶
Uses Otsu's method on normalized gap distances to automatically find the optimal threshold separating character gaps from word gaps. Gaps are normalized by local character width, making it resolution-invariant.
- Best for: Variable font sizes, automatic word boundary detection
- Parameters: `otsu_threshold_multiplier` (adjusts threshold sensitivity)
- Key feature: Detects bimodal distributions to distinguish single words from multi-word text
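A minimal sketch of this idea is shown below, assuming detections are given as plain `xyxy` boxes and text strings. It is not the block's implementation and it omits the bimodality check; it only shows how Otsu's method can pick a split between character gaps and word gaps on normalized distances.

```python
# Illustrative sketch of Otsu-style word splitting (not the block's source code).
# Gaps between neighbouring boxes are normalized by local character width, then
# the threshold that maximizes between-class variance separates intra-word gaps
# from inter-word gaps.
import numpy as np

def split_words_otsu(boxes, texts, threshold_multiplier=1.0):
    order = np.argsort([b[0] for b in boxes])            # left-to-right order
    boxes = [boxes[i] for i in order]
    texts = [texts[i] for i in order]
    widths = np.array([b[2] - b[0] for b in boxes], dtype=float)
    gaps = np.array([boxes[i + 1][0] - boxes[i][2] for i in range(len(boxes) - 1)])
    if len(gaps) == 0:
        return ["".join(texts)]
    # Normalize gaps by the average width of the two neighbouring characters.
    norm = gaps / np.maximum((widths[:-1] + widths[1:]) / 2, 1e-6)
    best_t, best_var = np.unique(norm)[0], -1.0
    for t in np.unique(norm):
        lo, hi = norm[norm <= t], norm[norm > t]
        if len(lo) == 0 or len(hi) == 0:
            continue
        w0, w1 = len(lo) / len(norm), len(hi) / len(norm)
        var = w0 * w1 * (lo.mean() - hi.mean()) ** 2
        if var > best_var:
            best_var, best_t = var, t
    threshold = best_t * threshold_multiplier
    words, current = [], [texts[0]]
    for gap, text in zip(norm, texts[1:]):
        if gap > threshold:          # inter-word gap: start a new word
            words.append("".join(current))
            current = [text]
        else:                        # intra-word gap: keep appending characters
            current.append(text)
    words.append("".join(current))
    return words
```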
Collimate (Skewed Text)¶
Uses greedy parent-child traversal to follow text flow. Starting from the first detection, it finds subsequent detections that "follow" in reading order (similar alignment + correct direction), building lines through traversal rather than bucketing.
- Best for: Skewed, curved, or non-axis-aligned text
- Parameters: `collimate_tolerance` (alignment tolerance in pixels)
- Note: Does not detect word boundaries; use the `delimiter` parameter if spacing is needed
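A rough sketch of the greedy traversal for left-to-right text is shown below. It is illustrative rather than the block's actual code; the detection structure and the function name are assumptions made for the example.

```python
# Illustrative sketch of greedy parent-child traversal (not the block's source
# code). Each detection looks for the nearest unvisited detection that lies
# roughly to its right and within `collimate_tolerance` pixels of vertical
# deviation, so lines are built by following the text flow step by step.
def stitch_by_collimation(detections, collimate_tolerance=10, delimiter=""):
    remaining = sorted(detections, key=lambda d: (d["xyxy"][1], d["xyxy"][0]))
    lines = []
    while remaining:
        # Start a new line at the top-left-most unvisited detection.
        current = remaining.pop(0)
        line = [current]
        while True:
            cy = (current["xyxy"][1] + current["xyxy"][3]) / 2
            candidates = [
                d for d in remaining
                if d["xyxy"][0] >= current["xyxy"][2] - 1  # roughly to the right
                and abs((d["xyxy"][1] + d["xyxy"][3]) / 2 - cy) <= collimate_tolerance
            ]
            if not candidates:
                break
            nxt = min(candidates, key=lambda d: d["xyxy"][0])
            remaining.remove(nxt)
            line.append(nxt)
            current = nxt  # follow the flow, so skew is tolerated step by step
        lines.append(line)
    return "\n".join(delimiter.join(d["text"] for d in line) for line in lines)
```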
Reading Directions¶
All algorithms support multiple reading directions:
- left_to_right: Standard horizontal (English, most languages)
- right_to_left: Right-to-left (Arabic, Hebrew)
- vertical_top_to_bottom: Vertical top-to-bottom (Traditional Chinese, Japanese)
- vertical_bottom_to_top: Vertical bottom-to-top
- auto: Automatically detect based on bounding box dimensions
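The `auto` rule described above (compare average box width with average box height) can be sketched in a few lines of Python; this is an assumed helper written for illustration, not the block's source.

```python
# Illustrative sketch of the "auto" heuristic: wider-than-tall boxes suggest
# horizontal text, otherwise assume vertical text (not the block's source code).
def detect_reading_direction(boxes):
    widths = [x2 - x1 for x1, y1, x2, y2 in boxes]
    heights = [y2 - y1 for x1, y1, x2, y2 in boxes]
    avg_w = sum(widths) / len(widths)
    avg_h = sum(heights) / len(heights)
    return "left_to_right" if avg_w > avg_h else "vertical_top_to_bottom"
```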
Common Use Cases¶
- Document OCR: Reconstruct paragraphs and lines from character/word detections
- Multi-language support: Handle different reading directions and writing systems
- Skewed text processing: Use collimate algorithm for tilted or curved text
- Word detection: Use Otsu algorithm to automatically insert spaces between words
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/stitch_ocr_detections@v2` to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | str | Enter a unique identifier for this step. | ❌ |
| `stitching_algorithm` | str | Algorithm for grouping detections into words/lines. 'tolerance': Uses fixed pixel tolerance for line grouping (original algorithm). Good for consistent font sizes and line spacing. 'otsu': Uses Otsu's method on normalized gaps to find natural breaks between words. Resolution-invariant and works well with bimodal gap distributions. 'collimate': Uses greedy parent-child traversal to group detections. Good for skewed or curved text where bucket-based approaches fail. | ❌ |
| `reading_direction` | str | Direction to read and organize text detections. 'left_to_right': Standard horizontal reading (English, most languages). 'right_to_left': Right-to-left reading (Arabic, Hebrew). 'vertical_top_to_bottom': Vertical reading from top to bottom (Traditional Chinese, Japanese). 'vertical_bottom_to_top': Vertical reading from bottom to top (rare vertical formats). 'auto': Automatically detects reading direction based on average bounding box dimensions (width > height = horizontal, height >= width = vertical). Determines how detections are grouped into lines and sorted within lines. | ❌ |
| `tolerance` | int | Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero. | ✅ |
| `delimiter` | str | Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas. | ✅ |
| `otsu_threshold_multiplier` | float | Multiplier applied to the Otsu-computed threshold when using the 'otsu' stitching algorithm. Values > 1.0 make word breaks less frequent (more conservative, fewer splits), values < 1.0 make word breaks more frequent (more aggressive, more splits). Default is 1.0 (use Otsu threshold as-is). Try 1.3-1.5 if words are being incorrectly split, or 0.7-0.9 if words are being incorrectly merged. | ✅ |
| `collimate_tolerance` | int | Pixel tolerance for the 'collimate' stitching algorithm. Controls how much vertical (for horizontal text) or horizontal (for vertical text) deviation is allowed when determining if a detection follows another in reading order. Higher values handle more skewed text but may incorrectly merge separate lines. Default is 10 pixels. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
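For example, a property marked ✅ can be bound to a workflow input rather than set to a literal value. The step definition below is a hypothetical sketch: the `line_tolerance` input name is invented for illustration and would have to be declared among the workflow's inputs.

```python
# Hypothetical step definition showing a parametrised property. Properties
# marked ✅ in the Refs column accept selectors like "$inputs.<name>";
# properties marked ❌ must be literal values.
step = {
    "name": "ocr_stitcher",
    "type": "roboflow_core/stitch_ocr_detections@v2",
    "predictions": "$steps.ocr_model.predictions",
    "stitching_algorithm": "otsu",            # literal: not parametrisable
    "tolerance": "$inputs.line_tolerance",    # bound to a workflow input at runtime
    "delimiter": " ",
}
```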
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Stitch OCR Detections in version v2.
- inputs:
CSV Formatter,SIFT Comparison,Florence-2 Model,OCR Model,Google Gemini,Single-Label Classification Model,Webhook Sink,Camera Focus,Detections Filter,Detection Offset,Line Counter,Distance Measurement,Object Detection Model,Detection Event Log,Detections Stabilizer,OpenAI,OpenAI,Detections List Roll-Up,Slack Notification,Path Deviation,Roboflow Dataset Upload,YOLO-World Model,LMM For Classification,VLM as Classifier,Pixel Color Count,Twilio SMS/MMS Notification,Model Monitoring Inference Aggregator,Object Detection Model,Clip Comparison,Detections Merge,Roboflow Dataset Upload,Anthropic Claude,Time in Zone,Detections Classes Replacement,Template Matching,Line Counter,OpenAI,Byte Tracker,Email Notification,Image Contours,Google Gemini,Stitch OCR Detections,Roboflow Custom Metadata,Google Vision OCR,VLM as Detector,Detections Consensus,Camera Focus,Detections Combine,Byte Tracker,Multi-Label Classification Model,PTZ Tracking (ONVIF),LMM,Anthropic Claude,Time in Zone,Stitch OCR Detections,Dynamic Crop,Anthropic Claude,Keypoint Detection Model,Detections Transformation,Florence-2 Model,Time in Zone,Detections Stitch,Google Gemini,Byte Tracker,Moondream2,Overlap Filter,Twilio SMS Notification,Gaze Detection,Instance Segmentation Model,Identify Changes,Perspective Correction,Email Notification,Motion Detection,Cosine Similarity,SIFT Comparison,Path Deviation,EasyOCR,Local File Sink,CogVLM,OpenAI,Velocity,Llama 3.2 Vision,VLM as Detector
- outputs:
Corner Visualization,Label Visualization,Image Blur,Florence-2 Model,SIFT Comparison,Google Gemini,Ellipse Visualization,Halo Visualization,Webhook Sink,Contrast Equalization,Stability AI Outpainting,Perception Encoder Embedding Model,Model Comparison Visualization,Line Counter,Distance Measurement,Polygon Visualization,Stability AI Inpainting,Reference Path Visualization,OpenAI,OpenAI,Slack Notification,Circle Visualization,Stability AI Image Generation,Roboflow Dataset Upload,Path Deviation,Icon Visualization,LMM For Classification,YOLO-World Model,Pixel Color Count,Cache Get,Twilio SMS/MMS Notification,Model Monitoring Inference Aggregator,Color Visualization,Clip Comparison,SAM 3,Mask Visualization,Roboflow Dataset Upload,Anthropic Claude,Time in Zone,Detections Classes Replacement,Line Counter,OpenAI,CLIP Embedding Model,Instance Segmentation Model,Email Notification,Google Gemini,Stitch OCR Detections,Text Display,Roboflow Custom Metadata,Triangle Visualization,Google Vision OCR,SAM 3,Classification Label Visualization,Image Threshold,LMM,PTZ Tracking (ONVIF),Dot Visualization,Anthropic Claude,Time in Zone,Stitch OCR Detections,Background Color Visualization,Seg Preview,Polygon Zone Visualization,Keypoint Visualization,Anthropic Claude,Dynamic Crop,Trace Visualization,SAM 3,Crop Visualization,Line Counter Visualization,Florence-2 Model,Time in Zone,Google Gemini,Detections Stitch,Segment Anything 2 Model,Moondream2,Twilio SMS Notification,Image Preprocessing,Instance Segmentation Model,Perspective Correction,Email Notification,Halo Visualization,Path Deviation,Local File Sink,Depth Estimation,CogVLM,Morphological Transformation,Polygon Visualization,OpenAI,QR Code Generator,Cache Set,Llama 3.2 Vision,Bounding Box Visualization,Size Measurement
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds Stitch OCR Detections in version v2 has.
Bindings
- input
  - `predictions` (object_detection_prediction): OCR detection predictions from an OCR model. Should contain bounding boxes and class names with text content. Each detection represents a word, character, or text region that will be stitched together into coherent text. Supports object detection format with bounding boxes (xyxy) and class names in the data dictionary.
  - `tolerance` (integer): Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero.
  - `delimiter` (string): Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas.
  - `otsu_threshold_multiplier` (float): Multiplier applied to the Otsu-computed threshold when using the 'otsu' stitching algorithm. Values > 1.0 make word breaks less frequent (more conservative, fewer splits), values < 1.0 make word breaks more frequent (more aggressive, more splits). Default is 1.0 (use Otsu threshold as-is). Try 1.3-1.5 if words are being incorrectly split, or 0.7-0.9 if words are being incorrectly merged.
  - `collimate_tolerance` (integer): Pixel tolerance for the 'collimate' stitching algorithm. Controls how much vertical (for horizontal text) or horizontal (for vertical text) deviation is allowed when determining if a detection follows another in reading order. Higher values handle more skewed text but may incorrectly merge separate lines. Default is 10 pixels.
- output
  - `ocr_text` (string): String value.
Example JSON definition of step Stitch OCR Detections in version v2
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/stitch_ocr_detections@v2",
    "stitching_algorithm": "tolerance",
    "predictions": "$steps.ocr_model.predictions",
    "reading_direction": "left_to_right",
    "tolerance": 10,
    "delimiter": "",
    "otsu_threshold_multiplier": 1.0,
    "collimate_tolerance": 5
}
```
v1¶
Class: StitchOCRDetectionsBlockV1 (there are multiple versions of this block)
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Combine individual OCR detection results (words, characters, or text regions) into coherent text strings by organizing detections spatially according to reading direction, grouping detections into lines, sorting them within lines, and concatenating text in proper reading order to reconstruct readable text from OCR model outputs.
How This Block Works¶
This block reconstructs readable text from individual OCR detections by organizing them spatially and concatenating text in proper reading order. The block:
- Receives OCR detection predictions containing individual text detections with bounding boxes and class names (text content)
- Prepares coordinates based on reading direction:
- For vertical reading directions, swaps x and y coordinates to enable vertical line processing
- For horizontal reading directions, uses coordinates as-is
- Groups detections into lines:
- Groups detections based on vertical position (or horizontal position for vertical text) using the tolerance parameter
- Detections within the tolerance distance are considered part of the same line
- Higher tolerance values group detections that are further apart, useful for text with variable line spacing
- Sorts lines based on reading direction:
- For left-to-right and vertical top-to-bottom: sorts lines from top to bottom
- For right-to-left and vertical bottom-to-top: sorts lines in reverse order (bottom to top)
- Sorts detections within each line:
- For left-to-right and vertical top-to-bottom: sorts detections by horizontal position (left to right, or top to bottom for vertical)
- For right-to-left and vertical bottom-to-top: sorts detections in reverse order (right to left, or bottom to top for vertical)
- Concatenates text in reading order:
- Extracts class names (text content) from detections in sorted order
- Adds line separators (newline for horizontal text, space for vertical text) between lines
- Optionally inserts a delimiter between each text element if specified
- Produces a single coherent text string with proper reading order
- Handles automatic reading direction detection (if "auto" is selected):
- Analyzes average width and height of detection bounding boxes
- If average width > average height: detects horizontal text (left-to-right)
- If average height >= average width: detects vertical text (top-to-bottom)
- Returns the stitched text string:
- Outputs a single text string under the `ocr_text` key
- Text is formatted with proper line breaks and spacing according to reading direction
The block enables reconstruction of multi-line text from individual OCR detections, maintaining proper reading order for different languages and writing systems. It handles both horizontal (left-to-right, right-to-left) and vertical (top-to-bottom, bottom-to-top) text orientations, making it useful for processing text in various languages and formats.
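A condensed sketch of this procedure is shown below. It is illustrative Python, not the block's source code; it assumes `auto` has already been resolved to a concrete direction and that detections are dicts with `xyxy` boxes and `text` fields.

```python
# Condensed sketch of the v1 procedure described above (not the block's source
# code): pick the relevant axes for the reading direction, group detections
# into lines with `tolerance`, sort lines and elements, then join the text.
def stitch_v1(detections, reading_direction="left_to_right", tolerance=10, delimiter=""):
    vertical = reading_direction.startswith("vertical")
    reverse = reading_direction in ("right_to_left", "vertical_bottom_to_top")

    def main_axis(d):   # axis along which text flows within a line
        x1, y1, x2, y2 = d["xyxy"]
        return (y1 + y2) / 2 if vertical else (x1 + x2) / 2

    def cross_axis(d):  # axis used to decide line membership
        x1, y1, x2, y2 = d["xyxy"]
        return (x1 + x2) / 2 if vertical else (y1 + y2) / 2

    # 1. Group into lines by cross-axis proximity (within `tolerance` pixels).
    lines = []
    for det in sorted(detections, key=cross_axis):
        if lines and abs(cross_axis(det) - cross_axis(lines[-1][0])) <= tolerance:
            lines[-1].append(det)
        else:
            lines.append([det])
    # 2. Order lines and elements according to the reading direction.
    lines.sort(key=lambda line: cross_axis(line[0]), reverse=reverse)
    for line in lines:
        line.sort(key=main_axis, reverse=reverse)
    # 3. Join: newline between horizontal lines, space between vertical columns.
    separator = " " if vertical else "\n"
    return separator.join(delimiter.join(d["text"] for d in line) for line in lines)
```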
Common Use Cases¶
- Text Reconstruction: Convert individual word or character detections from OCR models into readable text blocks (e.g., reconstruct documents from word detections, combine character detections into words, stitch OCR results into paragraphs), enabling text reconstruction workflows
- Multi-Line Text Processing: Reconstruct multi-line text from OCR results with proper line breaks and formatting (e.g., extract paragraphs from OCR results, reconstruct formatted text, process multi-line documents), enabling multi-line text workflows
- Multi-Language OCR: Process OCR results from different languages and writing systems (e.g., process Arabic right-to-left text, handle vertical Chinese/Japanese text, support multiple reading directions), enabling multi-language OCR workflows
- Document Processing: Extract and reconstruct text from documents and images (e.g., extract text from scanned documents, process invoice text, extract text from forms), enabling document processing workflows
- Text Extraction and Formatting: Extract text from images and format it for downstream use (e.g., extract text for database storage, format text for API responses, prepare text for analysis), enabling text extraction workflows
- OCR Result Post-Processing: Post-process OCR model outputs to produce usable text strings (e.g., format OCR outputs, organize OCR results, prepare text for downstream blocks), enabling OCR post-processing workflows
Connecting to Other Blocks¶
This block receives OCR detection predictions and produces stitched text strings:
- After OCR model blocks to convert detection results into readable text (e.g., OCR model to text string, OCR detections to formatted text, OCR results to text output), enabling OCR-to-text workflows
- Before data storage blocks to store extracted text (e.g., store OCR text in databases, save extracted text, log OCR results), enabling text storage workflows
- Before notification blocks to send extracted text in notifications (e.g., send OCR text in alerts, include extracted text in messages, notify with OCR results), enabling text notification workflows
- Before text processing blocks to process stitched text (e.g., process text with NLP models, analyze extracted text, apply text transformations), enabling text processing workflows
- Before API output blocks to provide text in API responses (e.g., return OCR text in API, format text for responses, provide extracted text output), enabling text output workflows
- In workflow outputs to provide stitched text as final output (e.g., text extraction workflows, OCR output workflows, document processing workflows), enabling text output workflows
Requirements¶
This block requires OCR detection predictions (object detection format) with bounding boxes and class names containing text content. The tolerance parameter must be greater than zero and controls the vertical (or horizontal for vertical text) distance threshold for grouping detections into lines. The reading_direction parameter supports five modes: "left_to_right" (standard horizontal), "right_to_left" (Arabic-style), "vertical_top_to_bottom" (vertical), "vertical_bottom_to_top" (vertical reversed), and "auto" (automatic detection based on bounding box dimensions). The delimiter parameter is optional and inserts a delimiter between each text element (empty string by default, meaning no delimiter). The block outputs a single text string under the ocr_text key.
Type identifier¶
Use the following identifier in the step "type" field: `roboflow_core/stitch_ocr_detections@v1` to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| `name` | str | Enter a unique identifier for this step. | ❌ |
| `reading_direction` | str | Direction to read and organize text detections. 'left_to_right': Standard horizontal reading (English, most languages). 'right_to_left': Right-to-left reading (Arabic, Hebrew). 'vertical_top_to_bottom': Vertical reading from top to bottom (Traditional Chinese, Japanese). 'vertical_bottom_to_top': Vertical reading from bottom to top (rare vertical formats). 'auto': Automatically detects reading direction based on average bounding box dimensions (width > height = horizontal, height >= width = vertical). Determines how detections are grouped into lines and sorted within lines. | ❌ |
| `tolerance` | int | Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero. | ✅ |
| `delimiter` | str | Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Stitch OCR Detections in version v1.
- inputs:
CSV Formatter,SIFT Comparison,Florence-2 Model,OCR Model,Google Gemini,Single-Label Classification Model,Webhook Sink,Detections Filter,Detection Offset,Line Counter,Distance Measurement,Object Detection Model,Detection Event Log,Detections Stabilizer,OpenAI,OpenAI,Detections List Roll-Up,Slack Notification,Path Deviation,Roboflow Dataset Upload,YOLO-World Model,LMM For Classification,VLM as Classifier,Pixel Color Count,Twilio SMS/MMS Notification,Model Monitoring Inference Aggregator,Object Detection Model,Clip Comparison,Detections Merge,Roboflow Dataset Upload,Anthropic Claude,Time in Zone,Detections Classes Replacement,Template Matching,Line Counter,OpenAI,Byte Tracker,Email Notification,Image Contours,Google Gemini,Stitch OCR Detections,Roboflow Custom Metadata,Google Vision OCR,VLM as Detector,Detections Consensus,Detections Combine,Byte Tracker,Multi-Label Classification Model,PTZ Tracking (ONVIF),LMM,Anthropic Claude,Time in Zone,Stitch OCR Detections,Dynamic Crop,Anthropic Claude,Keypoint Detection Model,Detections Transformation,Florence-2 Model,Time in Zone,Detections Stitch,Google Gemini,Byte Tracker,Moondream2,Overlap Filter,Twilio SMS Notification,Instance Segmentation Model,Perspective Correction,Email Notification,Motion Detection,SIFT Comparison,Path Deviation,EasyOCR,Local File Sink,CogVLM,OpenAI,Velocity,Llama 3.2 Vision,VLM as Detector
- outputs:
Corner Visualization,Label Visualization,Image Blur,Florence-2 Model,SIFT Comparison,Google Gemini,Ellipse Visualization,Halo Visualization,Webhook Sink,Contrast Equalization,Stability AI Outpainting,Perception Encoder Embedding Model,Model Comparison Visualization,Line Counter,Distance Measurement,Polygon Visualization,Stability AI Inpainting,Reference Path Visualization,OpenAI,OpenAI,Slack Notification,Circle Visualization,Stability AI Image Generation,Roboflow Dataset Upload,Path Deviation,Icon Visualization,LMM For Classification,YOLO-World Model,Pixel Color Count,Cache Get,Twilio SMS/MMS Notification,Model Monitoring Inference Aggregator,Color Visualization,Clip Comparison,SAM 3,Mask Visualization,Roboflow Dataset Upload,Anthropic Claude,Time in Zone,Detections Classes Replacement,Line Counter,OpenAI,CLIP Embedding Model,Instance Segmentation Model,Email Notification,Google Gemini,Stitch OCR Detections,Text Display,Roboflow Custom Metadata,Triangle Visualization,Google Vision OCR,SAM 3,Classification Label Visualization,Image Threshold,LMM,PTZ Tracking (ONVIF),Dot Visualization,Anthropic Claude,Time in Zone,Stitch OCR Detections,Background Color Visualization,Seg Preview,Polygon Zone Visualization,Keypoint Visualization,Anthropic Claude,Dynamic Crop,Trace Visualization,SAM 3,Crop Visualization,Line Counter Visualization,Florence-2 Model,Time in Zone,Google Gemini,Detections Stitch,Segment Anything 2 Model,Moondream2,Twilio SMS Notification,Image Preprocessing,Instance Segmentation Model,Perspective Correction,Email Notification,Halo Visualization,Path Deviation,Local File Sink,Depth Estimation,CogVLM,Morphological Transformation,Polygon Visualization,OpenAI,QR Code Generator,Cache Set,Llama 3.2 Vision,Bounding Box Visualization,Size Measurement
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds Stitch OCR Detections in version v1 has.
Bindings
- input
  - `predictions` (object_detection_prediction): OCR detection predictions from an OCR model. Should contain bounding boxes and class names with text content. Each detection represents a word, character, or text region that will be stitched together into coherent text. Supports object detection format with bounding boxes (xyxy) and class names in the data dictionary.
  - `tolerance` (integer): Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero.
  - `delimiter` (string): Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas.
- output
  - `ocr_text` (string): String value.
Example JSON definition of step Stitch OCR Detections in version v1
```json
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/stitch_ocr_detections@v1",
    "predictions": "$steps.ocr_model.predictions",
    "reading_direction": "left_to_right",
    "tolerance": 10,
    "delimiter": ""
}
```