Stitch OCR Detections¶
v2¶
Class: StitchOCRDetectionsBlockV2 (there are multiple versions of this block)
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Combine individual OCR detection results (words, characters, or text regions) into coherent text strings by organizing detections spatially, grouping them into lines, and concatenating text in proper reading order.
Stitching Algorithms¶
This block supports three algorithms for reconstructing text from OCR detections:
Tolerance-based (default)¶
Groups detections into lines using a fixed pixel tolerance. Detections within the tolerance distance vertically (or horizontally for vertical text) are grouped into the same line, then sorted by position within each line.
- Best for: Consistent font sizes and well-aligned horizontal/vertical text
- Parameters: tolerance (pixel threshold for line grouping)
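The bucketing idea can be illustrated with a minimal sketch. It assumes detections are plain `(x1, y1, x2, y2, text)` tuples and that a detection joins a line when its top edge is within `tolerance` pixels of that line's first box; the tuple layout and the helper name `group_into_lines` are hypothetical and are not the block's actual implementation.

```python
from typing import List, Tuple

# Hypothetical detection layout: (x1, y1, x2, y2, text)
Box = Tuple[int, int, int, int, str]


def group_into_lines(boxes: List[Box], tolerance: int = 10) -> List[List[Box]]:
    """Bucket boxes into lines when their top edges differ by <= tolerance."""
    if tolerance <= 0:
        raise ValueError("tolerance must be greater than zero")
    lines: List[List[Box]] = []
    for box in sorted(boxes, key=lambda b: b[1]):  # scan from top to bottom
        for line in lines:
            # Compare against the first (anchor) box of an existing line.
            if abs(box[1] - line[0][1]) <= tolerance:
                line.append(box)
                break
        else:
            lines.append([box])  # no line was close enough: start a new one
    # Order the detections of each line from left to right.
    return [sorted(line, key=lambda b: b[0]) for line in lines]


boxes = [
    (120, 12, 160, 30, "world"),
    (10, 10, 50, 30, "hello"),
    (10, 52, 40, 70, "foo"),
    (60, 48, 95, 70, "bar"),
]
lines = group_into_lines(boxes, tolerance=10)
text = "\n".join(" ".join(b[4] for b in line) for line in lines)
print(text)
```

With `tolerance=10` the four boxes fall into two lines. Lowering the value to 1 would put every box on its own line, which is the "lower values create more lines" behaviour described for the parameter.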
Otsu Thresholding¶
Uses Otsu's method on normalized gap distances to automatically find the optimal threshold separating character gaps from word gaps. Gaps are normalized by local character width, which makes the method resolution-invariant.
- Best for: Variable font sizes, automatic word boundary detection
- Parameters: otsu_threshold_multiplier (adjust threshold sensitivity)
- Key feature: Detects bimodal distributions to distinguish single words from multi-word text
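A rough sketch of the idea for a single line of character detections follows. It is not the block's code: the `(x1, y1, x2, y2, text)` tuple layout, the helper names, and the `min_spread` check used as a stand-in for bimodality detection are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical detection layout: (x1, y1, x2, y2, text)
Box = Tuple[float, float, float, float, str]


def otsu_threshold(values: List[float]) -> float:
    """Return the split that maximizes between-class variance (1-D Otsu)."""
    ordered = sorted(values)
    best_split, best_score = ordered[0], -1.0
    for i in range(1, len(ordered)):
        low, high = ordered[:i], ordered[i:]
        weight_low = len(low) / len(ordered)
        weight_high = len(high) / len(ordered)
        mean_low = sum(low) / len(low)
        mean_high = sum(high) / len(high)
        score = weight_low * weight_high * (mean_low - mean_high) ** 2
        if score > best_score:
            best_score = score
            best_split = (ordered[i - 1] + ordered[i]) / 2
    return best_split


def stitch_with_otsu(line: List[Box], multiplier: float = 1.0,
                     min_spread: float = 0.2) -> str:
    """Join one line of detections, inserting spaces at gaps above the threshold."""
    line = sorted(line, key=lambda b: b[0])
    texts = [b[4] for b in line]
    gaps = []
    for left, right in zip(line, line[1:]):
        # Normalize each gap by the mean width of its two neighbours.
        local_width = max(((left[2] - left[0]) + (right[2] - right[0])) / 2, 1e-6)
        gaps.append(max(right[0] - left[2], 0) / local_width)
    # Assumed stand-in for bimodality detection: near-uniform gaps mean one word.
    if len(gaps) < 2 or max(gaps) - min(gaps) < min_spread:
        return "".join(texts)
    threshold = otsu_threshold(gaps) * multiplier
    pieces = [texts[0]]
    for gap, text in zip(gaps, texts[1:]):
        pieces.append((" " if gap > threshold else "") + text)
    return "".join(pieces)


chars = [
    (0, 0, 10, 20, "H"), (12, 0, 22, 20, "I"),
    (32, 0, 42, 20, "Y"), (44, 0, 54, 20, "O"), (56, 0, 66, 20, "U"),
]
print(stitch_with_otsu(chars))                  # HI YOU
print(stitch_with_otsu(chars, multiplier=2.0))  # HIYOU
```

Because the gaps are divided by character width, scaling every coordinate by the same factor leaves the result unchanged. Raising the multiplier pushes the threshold above the widest gap in this example, so no word break is inserted.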
Collimate (Skewed Text)¶
Uses greedy parent-child traversal to follow text flow. Starting from the first detection, it finds subsequent detections that "follow" in reading order (similar alignment + correct direction), building lines through traversal rather than bucketing.
- Best for: Skewed, curved, or non-axis-aligned text
- Parameters: collimate_tolerance (alignment tolerance in pixels)
- Note: Does not detect word boundaries; use the delimiter parameter if spacing is needed
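The traversal can be sketched as follows for left-to-right text. This is an illustrative approximation only: the `(x1, y1, x2, y2, text)` tuple layout, the use of vertical box centers as the alignment measure, and the function name `collimate` are assumptions, not the block's internals.

```python
from typing import List, Tuple

# Hypothetical detection layout: (x1, y1, x2, y2, text)
Box = Tuple[int, int, int, int, str]


def center_y(box: Box) -> float:
    return (box[1] + box[3]) / 2


def collimate(boxes: List[Box], tolerance: int = 10, delimiter: str = "") -> str:
    """Build lines by repeatedly hopping from a detection to its nearest follower."""
    remaining = sorted(boxes, key=lambda b: (b[0], b[1]))
    lines: List[List[Box]] = []
    while remaining:
        current = remaining.pop(0)  # leftmost unused detection starts a line
        line = [current]
        while True:
            # A follower lies to the right and is aligned with the CURRENT box,
            # not with the start of the line, so gradual drift is tolerated.
            followers = [
                b for b in remaining
                if b[0] > current[0]
                and abs(center_y(b) - center_y(current)) <= tolerance
            ]
            if not followers:
                break
            current = min(followers, key=lambda b: b[0])
            remaining.remove(current)
            line.append(current)
        lines.append(line)
    lines.sort(key=lambda l: l[0][1])  # order lines by where they start
    return "\n".join(delimiter.join(b[4] for b in line) for line in lines)


# Two skewed lines: each character sits 8 px lower than the previous one.
skewed = [
    (0, 0, 20, 20, "A"), (30, 8, 50, 28, "B"),
    (60, 16, 80, 36, "C"), (90, 24, 110, 44, "D"),
    (0, 40, 20, 60, "E"), (30, 48, 50, 68, "F"),
    (60, 56, 80, 76, "G"), (90, 64, 110, 84, "H"),
]
print(collimate(skewed, tolerance=10))
```

In this example the last character of the first line sits 24 px below the first one, so bucketing against a fixed 10 px tolerance would split the line, while the step-by-step traversal keeps it together. Passing `delimiter=" "` inserts a space between every element, since no word boundaries are inferred.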
Reading Directions¶
All algorithms support multiple reading directions:
- left_to_right: Standard horizontal (English, most languages)
- right_to_left: Right-to-left (Arabic, Hebrew)
- vertical_top_to_bottom: Vertical top-to-bottom (Traditional Chinese, Japanese)
- vertical_bottom_to_top: Vertical bottom-to-top
- auto: Automatically detect based on bounding box dimensions
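The `auto` rule, as documented for the reading_direction property (average width > average height means horizontal, otherwise vertical), can be sketched in a few lines. The `(x1, y1, x2, y2)` tuple layout, the function name, and the fallback for an empty input are assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical bounding box layout: (x1, y1, x2, y2)
XYXY = Tuple[float, float, float, float]


def detect_reading_direction(boxes: List[XYXY]) -> str:
    """Pick a direction from the average bounding box width and height."""
    if not boxes:
        return "left_to_right"  # assumed fallback when there is nothing to measure
    avg_width = sum(x2 - x1 for x1, _, x2, _ in boxes) / len(boxes)
    avg_height = sum(y2 - y1 for _, y1, _, y2 in boxes) / len(boxes)
    # width > height -> horizontal; height >= width -> vertical
    return "left_to_right" if avg_width > avg_height else "vertical_top_to_bottom"


wide_words = [(0, 0, 80, 20), (90, 0, 150, 20)]
tall_glyphs = [(0, 0, 20, 30), (0, 35, 20, 65)]
print(detect_reading_direction(wide_words))
print(detect_reading_direction(tall_glyphs))
```

Note that square boxes resolve to vertical under this rule, so character-level detections of horizontal text may need an explicit reading direction.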
Common Use Cases¶
- Document OCR: Reconstruct paragraphs and lines from character/word detections
- Multi-language support: Handle different reading directions and writing systems
- Skewed text processing: Use collimate algorithm for tilted or curved text
- Word detection: Use Otsu algorithm to automatically insert spaces between words
Type identifier¶
Use the identifier roboflow_core/stitch_ocr_detections@v2 in the step "type" field to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| stitching_algorithm | str | Algorithm for grouping detections into words/lines. 'tolerance': Uses fixed pixel tolerance for line grouping (original algorithm). Good for consistent font sizes and line spacing. 'otsu': Uses Otsu's method on normalized gaps to find natural breaks between words. Resolution-invariant and works well with bimodal gap distributions. 'collimate': Uses greedy parent-child traversal to group detections. Good for skewed or curved text where bucket-based approaches fail. | ❌ |
| reading_direction | str | Direction to read and organize text detections. 'left_to_right': Standard horizontal reading (English, most languages). 'right_to_left': Right-to-left reading (Arabic, Hebrew). 'vertical_top_to_bottom': Vertical reading from top to bottom (Traditional Chinese, Japanese). 'vertical_bottom_to_top': Vertical reading from bottom to top (rare vertical formats). 'auto': Automatically detects reading direction based on average bounding box dimensions (width > height = horizontal, height >= width = vertical). Determines how detections are grouped into lines and sorted within lines. | ❌ |
| tolerance | int | Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero. | ✅ |
| delimiter | str | Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas. | ✅ |
| otsu_threshold_multiplier | float | Multiplier applied to the Otsu-computed threshold when using the 'otsu' stitching algorithm. Values > 1.0 make word breaks less frequent (more conservative, fewer splits), values < 1.0 make word breaks more frequent (more aggressive, more splits). Default is 1.0 (use Otsu threshold as-is). Try 1.3-1.5 if words are being incorrectly split, or 0.7-0.9 if words are being incorrectly merged. | ✅ |
| collimate_tolerance | int | Pixel tolerance for the 'collimate' stitching algorithm. Controls how much vertical (for horizontal text) or horizontal (for vertical text) deviation is allowed when determining if a detection follows another in reading order. Higher values handle more skewed text but may incorrectly merge separate lines. Default is 10 pixels. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Stitch OCR Detections in version v2.
- inputs: BoT-SORT Tracker, SIFT Comparison, OpenAI-Compatible LLM, Object Detection Model, Motion Detection, Camera Focus, Qwen-VL, Cosine Similarity, Object Detection Model, Stitch OCR Detections, Velocity, Google Gemma API, Roboflow Vision Events, Qwen 3.5 API, OC-SORT Tracker, Path Deviation, Slack Notification, Anthropic Claude, Qwen 3.6 API, Time in Zone, OpenAI, Webhook Sink, Email Notification, Byte Tracker, Detections Consensus, Google Gemma, Detection Event Log, YOLO-World Model, Path Deviation, CogVLM, Llama 3.2 Vision, Byte Tracker, Qwen3.5-VL, Dynamic Crop, Camera Focus, Google Vision OCR, GLM-OCR, Google Gemini, Llama 3.2 Vision, Distance Measurement, OpenRouter, SORT Tracker, Twilio SMS Notification, Detections Stabilizer, Moondream2, Model Monitoring Inference Aggregator, Clip Comparison, Anthropic Claude, Detections Stitch, OpenAI, Time in Zone, Detections List Roll-Up, Multi-Label Classification Model, Gaze Detection, Template Matching, Google Gemini, PTZ Tracking (ONVIF), EasyOCR, Florence-2 Model, MoonshotAI Kimi, Time in Zone, Instance Segmentation Model, MoonshotAI Kimi, Line Counter, Roboflow Dataset Upload, Anthropic Claude, Per-Class Confidence Filter, Roboflow Custom Metadata, Line Counter, Detections Classes Replacement, Florence-2 Model, Local File Sink, Detection Offset, Overlap Filter, Detections Merge, Image Contours, Single-Label Classification Model, OpenAI, Detections Filter, Google Gemini, ByteTrack Tracker, VLM As Classifier, VLM As Detector, Email Notification, OpenAI, Pixel Color Count, Object Detection Model, Detections Transformation, LMM, LMM For Classification, Mask Area Measurement, Byte Tracker, Stitch OCR Detections, OCR Model, Detections Combine, Keypoint Detection Model, SIFT Comparison, VLM As Detector, Image Stack, Identify Changes, Twilio SMS/MMS Notification, Roboflow Dataset Upload, CSV Formatter, S3 Sink, Perspective Correction
- outputs: Cache Set, Stability AI Outpainting, OpenAI-Compatible LLM, Morphological Transformation, Object Detection Model, CLIP Embedding Model, SAM 3, Mask Visualization, Stability AI Image Generation, Qwen-VL, Seg Preview, Corner Visualization, Ellipse Visualization, Image Preprocessing, Roboflow Vision Events, Crop Visualization, Stitch OCR Detections, Heatmap Visualization, Google Gemma API, Qwen 3.5 API, Trace Visualization, Path Deviation, Perception Encoder Embedding Model, Background Color Visualization, Slack Notification, Anthropic Claude, Qwen 3.6 API, Time in Zone, OpenAI, Webhook Sink, Email Notification, Color Visualization, Bounding Box Visualization, Keypoint Visualization, Model Comparison Visualization, Google Gemma, YOLO-World Model, CogVLM, Path Deviation, Polygon Zone Visualization, Llama 3.2 Vision, Instance Segmentation Model, Dynamic Crop, Polygon Visualization, Instance Segmentation Model, QR Code Generator, GLM-OCR, Google Vision OCR, Google Gemini, OpenRouter, SAM 3, Distance Measurement, Single-Label Classification Model, Llama 3.2 Vision, Twilio SMS Notification, Model Monitoring Inference Aggregator, Clip Comparison, Image Blur, Moondream2, Anthropic Claude, Cache Get, Detections Stitch, OpenAI, Depth Estimation, Time in Zone, Segment Anything 2 Model, Instance Segmentation Model, Google Gemini, Classification Label Visualization, PTZ Tracking (ONVIF), MoonshotAI Kimi, Florence-2 Model, Contrast Equalization, Image Threshold, Time in Zone, Instance Segmentation Model, MoonshotAI Kimi, Line Counter, Dot Visualization, Polygon Visualization, Roboflow Dataset Upload, Anthropic Claude, Halo Visualization, Stability AI Inpainting, Roboflow Custom Metadata, Keypoint Detection Model, Semantic Segmentation Model, Line Counter, Detections Classes Replacement, Florence-2 Model, Local File Sink, Label Visualization, Icon Visualization, OpenAI, Google Gemini, SAM 3, Email Notification, Halo Visualization, Size Measurement, OpenAI, Multi-Label Classification Model, Pixel Color Count, LMM, LMM For Classification, Text Display, Reference Path Visualization, Circle Visualization, Line Counter Visualization, Stitch OCR Detections, SIFT Comparison, Twilio SMS/MMS Notification, Morphological Transformation, Roboflow Dataset Upload, S3 Sink, Triangle Visualization, Perspective Correction
Input and Output Bindings¶
The available connections depend on the block's binding kinds. The binding kinds of
Stitch OCR Detections in version v2 are listed below.
Bindings
- input
    - predictions (object_detection_prediction): OCR detection predictions from an OCR model. Should contain bounding boxes and class names with text content. Each detection represents a word, character, or text region that will be stitched together into coherent text. Supports object detection format with bounding boxes (xyxy) and class names in the data dictionary.
    - tolerance (integer): Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero.
    - delimiter (string): Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas.
    - otsu_threshold_multiplier (float): Multiplier applied to the Otsu-computed threshold when using the 'otsu' stitching algorithm. Values > 1.0 make word breaks less frequent (more conservative, fewer splits), values < 1.0 make word breaks more frequent (more aggressive, more splits). Default is 1.0 (use Otsu threshold as-is). Try 1.3-1.5 if words are being incorrectly split, or 0.7-0.9 if words are being incorrectly merged.
    - collimate_tolerance (integer): Pixel tolerance for the 'collimate' stitching algorithm. Controls how much vertical (for horizontal text) or horizontal (for vertical text) deviation is allowed when determining if a detection follows another in reading order. Higher values handle more skewed text but may incorrectly merge separate lines. Default is 10 pixels.
- output
    - ocr_text (string): String value.
Example JSON definition of step Stitch OCR Detections in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/stitch_ocr_detections@v2",
    "stitching_algorithm": "tolerance",
    "predictions": "$steps.ocr_model.predictions",
    "reading_direction": "left_to_right",
    "tolerance": 10,
    "delimiter": "",
    "otsu_threshold_multiplier": 1.0,
    "collimate_tolerance": 5
}
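When step definitions are generated programmatically, a small helper can check the documented constraints before the JSON is produced. The sketch below is a hypothetical convenience function, not part of any Roboflow package: `build_stitch_step` and its checks are assumptions, the `reading_direction` and `tolerance` defaults are taken from the example above, and the remaining defaults follow the property descriptions.

```python
import json
from typing import Any, Dict

ALGORITHMS = {"tolerance", "otsu", "collimate"}
DIRECTIONS = {
    "left_to_right", "right_to_left",
    "vertical_top_to_bottom", "vertical_bottom_to_top", "auto",
}


def build_stitch_step(
    name: str,
    predictions: str,
    stitching_algorithm: str = "tolerance",
    reading_direction: str = "left_to_right",
    tolerance: int = 10,
    delimiter: str = "",
    otsu_threshold_multiplier: float = 1.0,
    collimate_tolerance: int = 10,
) -> Dict[str, Any]:
    """Return a v2 step definition after checking the documented constraints."""
    if stitching_algorithm not in ALGORITHMS:
        raise ValueError(f"unknown stitching_algorithm: {stitching_algorithm!r}")
    if reading_direction not in DIRECTIONS:
        raise ValueError(f"unknown reading_direction: {reading_direction!r}")
    if tolerance <= 0:
        raise ValueError("tolerance must be greater than zero")
    return {
        "name": name,
        "type": "roboflow_core/stitch_ocr_detections@v2",
        "stitching_algorithm": stitching_algorithm,
        "predictions": predictions,
        "reading_direction": reading_direction,
        "tolerance": tolerance,
        "delimiter": delimiter,
        "otsu_threshold_multiplier": otsu_threshold_multiplier,
        "collimate_tolerance": collimate_tolerance,
    }


step = build_stitch_step(
    "stitch_text",
    "$steps.ocr_model.predictions",
    stitching_algorithm="otsu",
    otsu_threshold_multiplier=1.3,
)
print(json.dumps(step, indent=4))
```

The selector `$steps.ocr_model.predictions` assumes an upstream step named `ocr_model`, as in the example definition.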
v1¶
Class: StitchOCRDetectionsBlockV1 (there are multiple versions of this block)
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Combine individual OCR detection results (words, characters, or text regions) into coherent text strings by organizing detections spatially according to reading direction, grouping detections into lines, sorting them within lines, and concatenating text in proper reading order to reconstruct readable text from OCR model outputs.
How This Block Works¶
This block reconstructs readable text from individual OCR detections by organizing them spatially and concatenating text in proper reading order. The block:
- Receives OCR detection predictions containing individual text detections with bounding boxes and class names (text content)
- Prepares coordinates based on reading direction:
    - For vertical reading directions, swaps x and y coordinates to enable vertical line processing
    - For horizontal reading directions, uses coordinates as-is
- Groups detections into lines:
    - Groups detections based on vertical position (or horizontal position for vertical text) using the tolerance parameter
    - Detections within the tolerance distance are considered part of the same line
    - Higher tolerance values group detections that are further apart, useful for text with variable line spacing
- Sorts lines based on reading direction:
    - For left-to-right and vertical top-to-bottom: sorts lines from top to bottom
    - For right-to-left and vertical bottom-to-top: sorts lines in reverse order (bottom to top)
- Sorts detections within each line:
    - For left-to-right and vertical top-to-bottom: sorts detections by horizontal position (left to right, or top to bottom for vertical)
    - For right-to-left and vertical bottom-to-top: sorts detections in reverse order (right to left, or bottom to top for vertical)
- Concatenates text in reading order:
    - Extracts class names (text content) from detections in sorted order
    - Adds line separators (newline for horizontal text, space for vertical text) between lines
    - Optionally inserts a delimiter between each text element if specified
    - Produces a single coherent text string with proper reading order
- Handles automatic reading direction detection (if "auto" is selected):
    - Analyzes average width and height of detection bounding boxes
    - If average width > average height: detects horizontal text (left-to-right)
    - If average height >= average width: detects vertical text (top-to-bottom)
- Returns the stitched text string:
    - Outputs a single text string under the ocr_text key
    - Text is formatted with proper line breaks and spacing according to reading direction
The block enables reconstruction of multi-line text from individual OCR detections, maintaining proper reading order for different languages and writing systems. It handles both horizontal (left-to-right, right-to-left) and vertical (top-to-bottom, bottom-to-top) text orientations, making it useful for processing text in various languages and formats.
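The steps above can be condensed into a short end-to-end sketch that shows how the reading direction changes coordinate handling, sort order, and the line separator. It assumes `(x1, y1, x2, y2, text)` tuples, anchors each line on its first box, and expects "auto" to have been resolved to a concrete direction beforehand; the function name `stitch` and these details are illustrative assumptions rather than the block's source.

```python
from typing import List, Tuple

# Hypothetical detection layout: (x1, y1, x2, y2, text)
Detection = Tuple[int, int, int, int, str]


def stitch(detections: List[Detection], reading_direction: str = "left_to_right",
           tolerance: int = 10, delimiter: str = "") -> str:
    if not detections:
        return ""
    vertical = reading_direction.startswith("vertical")
    reverse = reading_direction in ("right_to_left", "vertical_bottom_to_top")
    # Swap x and y for vertical text so columns can be processed like rows.
    boxes = [
        (y1, x1, y2, x2, text) if vertical else (x1, y1, x2, y2, text)
        for x1, y1, x2, y2, text in detections
    ]
    # Group into lines by distance to the first box of each line.
    lines: List[List[Detection]] = []
    for box in sorted(boxes, key=lambda b: b[1]):
        for line in lines:
            if abs(box[1] - line[0][1]) <= tolerance:
                line.append(box)
                break
        else:
            lines.append([box])
    # Sort lines, then detections within each line, honouring the direction.
    lines.sort(key=lambda l: l[0][1], reverse=reverse)
    separator = " " if vertical else "\n"
    return separator.join(
        delimiter.join(b[4] for b in sorted(line, key=lambda b: b[0], reverse=reverse))
        for line in lines
    )


horizontal = [
    (120, 12, 160, 30, "world"), (10, 10, 50, 30, "hello"),
    (10, 52, 40, 70, "foo"), (60, 48, 95, 70, "bar"),
]
columns = [
    (10, 10, 30, 30, "A"), (10, 40, 30, 60, "B"),
    (60, 12, 80, 32, "C"), (60, 42, 80, 62, "D"),
]
print(stitch(horizontal, delimiter=" "))
print(stitch(horizontal, "right_to_left", delimiter=" "))
print(stitch(columns, "vertical_top_to_bottom"))
```

For the right-to-left case both the line order and the order within each line are reversed, following the sorting rules listed above.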
Common Use Cases¶
- Text Reconstruction: Convert individual word or character detections from OCR models into readable text blocks (e.g., reconstruct documents from word detections, combine character detections into words, stitch OCR results into paragraphs)
- Multi-Line Text Processing: Reconstruct multi-line text from OCR results with proper line breaks and formatting (e.g., extract paragraphs from OCR results, reconstruct formatted text, process multi-line documents)
- Multi-Language OCR: Process OCR results from different languages and writing systems (e.g., process Arabic right-to-left text, handle vertical Chinese/Japanese text, support multiple reading directions)
- Document Processing: Extract and reconstruct text from documents and images (e.g., extract text from scanned documents, process invoice text, extract text from forms)
- Text Extraction and Formatting: Extract text from images and format it for downstream use (e.g., extract text for database storage, format text for API responses, prepare text for analysis)
- OCR Result Post-Processing: Post-process OCR model outputs to produce usable text strings (e.g., format OCR outputs, organize OCR results, prepare text for downstream blocks)
Connecting to Other Blocks¶
This block receives OCR detection predictions and produces stitched text strings:
- After OCR model blocks to convert detection results into readable text (e.g., OCR model to text string, OCR detections to formatted text, OCR results to text output)
- Before data storage blocks to store extracted text (e.g., store OCR text in databases, save extracted text, log OCR results)
- Before notification blocks to send extracted text in notifications (e.g., send OCR text in alerts, include extracted text in messages, notify with OCR results)
- Before text processing blocks to process stitched text (e.g., process text with NLP models, analyze extracted text, apply text transformations)
- Before API output blocks to provide text in API responses (e.g., return OCR text in API, format text for responses, provide extracted text output)
- In workflow outputs to provide stitched text as final output (e.g., text extraction workflows, OCR output workflows, document processing workflows)
Requirements¶
This block requires OCR detection predictions (object detection format) with bounding boxes and class names containing text content. The tolerance parameter must be greater than zero and controls the vertical (or horizontal for vertical text) distance threshold for grouping detections into lines. The reading_direction parameter supports five modes: "left_to_right" (standard horizontal), "right_to_left" (Arabic-style), "vertical_top_to_bottom" (vertical), "vertical_bottom_to_top" (vertical reversed), and "auto" (automatic detection based on bounding box dimensions). The delimiter parameter is optional and inserts a delimiter between each text element (empty string by default, meaning no delimiter). The block outputs a single text string under the ocr_text key.
Type identifier¶
Use the identifier roboflow_core/stitch_ocr_detections@v1 in the step "type" field to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| reading_direction | str | Direction to read and organize text detections. 'left_to_right': Standard horizontal reading (English, most languages). 'right_to_left': Right-to-left reading (Arabic, Hebrew). 'vertical_top_to_bottom': Vertical reading from top to bottom (Traditional Chinese, Japanese). 'vertical_bottom_to_top': Vertical reading from bottom to top (rare vertical formats). 'auto': Automatically detects reading direction based on average bounding box dimensions (width > height = horizontal, height >= width = vertical). Determines how detections are grouped into lines and sorted within lines. | ❌ |
| tolerance | int | Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero. | ✅ |
| delimiter | str | Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Stitch OCR Detections in version v1.
- inputs: BoT-SORT Tracker, SIFT Comparison, OpenAI-Compatible LLM, Object Detection Model, Motion Detection, Qwen-VL, Object Detection Model, Stitch OCR Detections, Velocity, Google Gemma API, Roboflow Vision Events, Qwen 3.5 API, OC-SORT Tracker, Path Deviation, Slack Notification, Anthropic Claude, Qwen 3.6 API, Time in Zone, OpenAI, Webhook Sink, Email Notification, Byte Tracker, Detections Consensus, Google Gemma, Detection Event Log, YOLO-World Model, Path Deviation, CogVLM, Llama 3.2 Vision, Byte Tracker, Qwen3.5-VL, Dynamic Crop, Google Vision OCR, GLM-OCR, Google Gemini, Llama 3.2 Vision, Distance Measurement, OpenRouter, SORT Tracker, Twilio SMS Notification, Detections Stabilizer, Moondream2, Model Monitoring Inference Aggregator, Clip Comparison, Anthropic Claude, Detections Stitch, OpenAI, Time in Zone, Detections List Roll-Up, Multi-Label Classification Model, Template Matching, Google Gemini, PTZ Tracking (ONVIF), EasyOCR, Florence-2 Model, MoonshotAI Kimi, Time in Zone, Instance Segmentation Model, MoonshotAI Kimi, Line Counter, Roboflow Dataset Upload, Anthropic Claude, Per-Class Confidence Filter, Roboflow Custom Metadata, Line Counter, Detections Classes Replacement, Florence-2 Model, Local File Sink, Detection Offset, Overlap Filter, Detections Merge, Image Contours, Single-Label Classification Model, OpenAI, Detections Filter, Google Gemini, ByteTrack Tracker, VLM As Classifier, VLM As Detector, Email Notification, OpenAI, Pixel Color Count, Object Detection Model, Detections Transformation, LMM, LMM For Classification, Mask Area Measurement, Byte Tracker, Stitch OCR Detections, OCR Model, Detections Combine, Keypoint Detection Model, SIFT Comparison, VLM As Detector, Image Stack, Twilio SMS/MMS Notification, Roboflow Dataset Upload, CSV Formatter, S3 Sink, Perspective Correction
- outputs: Cache Set, Stability AI Outpainting, OpenAI-Compatible LLM, Morphological Transformation, Object Detection Model, CLIP Embedding Model, SAM 3, Mask Visualization, Stability AI Image Generation, Qwen-VL, Seg Preview, Corner Visualization, Ellipse Visualization, Image Preprocessing, Roboflow Vision Events, Crop Visualization, Stitch OCR Detections, Heatmap Visualization, Google Gemma API, Qwen 3.5 API, Trace Visualization, Path Deviation, Perception Encoder Embedding Model, Background Color Visualization, Slack Notification, Anthropic Claude, Qwen 3.6 API, Time in Zone, OpenAI, Webhook Sink, Email Notification, Color Visualization, Bounding Box Visualization, Keypoint Visualization, Model Comparison Visualization, Google Gemma, YOLO-World Model, CogVLM, Path Deviation, Polygon Zone Visualization, Llama 3.2 Vision, Instance Segmentation Model, Dynamic Crop, Polygon Visualization, Instance Segmentation Model, QR Code Generator, GLM-OCR, Google Vision OCR, Google Gemini, OpenRouter, SAM 3, Distance Measurement, Single-Label Classification Model, Llama 3.2 Vision, Twilio SMS Notification, Model Monitoring Inference Aggregator, Clip Comparison, Image Blur, Moondream2, Anthropic Claude, Cache Get, Detections Stitch, OpenAI, Depth Estimation, Time in Zone, Segment Anything 2 Model, Instance Segmentation Model, Google Gemini, Classification Label Visualization, PTZ Tracking (ONVIF), MoonshotAI Kimi, Florence-2 Model, Contrast Equalization, Image Threshold, Time in Zone, Instance Segmentation Model, MoonshotAI Kimi, Line Counter, Dot Visualization, Polygon Visualization, Roboflow Dataset Upload, Anthropic Claude, Halo Visualization, Stability AI Inpainting, Roboflow Custom Metadata, Keypoint Detection Model, Semantic Segmentation Model, Line Counter, Detections Classes Replacement, Florence-2 Model, Local File Sink, Label Visualization, Icon Visualization, OpenAI, Google Gemini, SAM 3, Email Notification, Halo Visualization, Size Measurement, OpenAI, Multi-Label Classification Model, Pixel Color Count, LMM, LMM For Classification, Text Display, Reference Path Visualization, Circle Visualization, Line Counter Visualization, Stitch OCR Detections, SIFT Comparison, Twilio SMS/MMS Notification, Morphological Transformation, Roboflow Dataset Upload, S3 Sink, Triangle Visualization, Perspective Correction
Input and Output Bindings¶
The available connections depend on the block's binding kinds. The binding kinds of
Stitch OCR Detections in version v1 are listed below.
Bindings
- input
    - predictions (object_detection_prediction): OCR detection predictions from an OCR model. Should contain bounding boxes and class names with text content. Each detection represents a word, character, or text region that will be stitched together into coherent text. Supports object detection format with bounding boxes (xyxy) and class names in the data dictionary.
    - tolerance (integer): Vertical (or horizontal for vertical text) distance threshold in pixels for grouping detections into the same line. Detections within this tolerance distance are grouped into the same line. Higher values group detections that are further apart (useful for text with variable line spacing or slanted text). Lower values create more lines (useful for tightly spaced text). Must be greater than zero.
    - delimiter (string): Optional delimiter string to insert between each text element (word/character) when stitching. Empty string (default) means no delimiter - text elements are concatenated directly. Useful for adding spaces between words, commas between elements, or custom separators. Example: use ' ' (space) to add spaces between words, or ',' to add commas.
- output
    - ocr_text (string): String value.
Example JSON definition of step Stitch OCR Detections in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/stitch_ocr_detections@v1",
    "predictions": "$steps.ocr_model.predictions",
    "reading_direction": "left_to_right",
    "tolerance": 10,
    "delimiter": ""
}