Stitch Images¶
Class: StitchImagesBlockV1
Source: inference.core.workflows.core_steps.transformations.stitch_images.v1.StitchImagesBlockV1
Stitch two overlapping images into a single panoramic image using SIFT (Scale Invariant Feature Transform) feature matching and homography-based image alignment. The block automatically detects features common to both images, calculates the geometric transformation between them, and combines them into a single panoramic composition of the overlapping scene.
How This Block Works¶
This block stitches two overlapping images together by detecting common features, calculating geometric transformations, and aligning the images into a single panoramic result. The block:
- Receives two input images (image1 and image2) that contain overlapping regions with sufficient detail for feature matching
- Detects keypoints and computes descriptors using SIFT (Scale Invariant Feature Transform) for both images:
    - Identifies distinctive feature points (keypoints) in each image that are invariant to scale and rotation
    - Computes feature descriptors (128-dimensional vectors) describing the visual characteristics around each keypoint
- Matches keypoints between the two images using brute force matching:
    - Finds the best matching descriptors for each keypoint in image1 among all keypoints in image2
    - Uses k-nearest neighbor matching (configurable via count_of_best_matches_per_query_descriptor) to find multiple potential matches per query keypoint
- Filters good matches using Lowe's ratio test:
    - Compares the distance to the best match with the distance to the second-best match
    - Keeps matches where the best match distance is less than 0.75 times the second-best match distance (reduces false matches)
- Determines image ordering based on keypoint positions (identifies which image should be placed first based on spatial distribution of matched features)
- Calculates homography transformation matrix using RANSAC (Random Sample Consensus):
    - Finds a perspective transformation matrix that maps points from one image to the other
    - Uses RANSAC to robustly estimate the transformation while filtering out outlier matches
    - Configurable maximum reprojection error (max_allowed_reprojection_error) controls which point pairs are considered inliers
- Calculates canvas size and translation:
    - Determines the size needed to contain both images after transformation
    - Calculates translation needed to ensure both images fit within the canvas boundaries
- Warps the second image using the homography transformation:
    - Applies perspective transformation to align the second image with the first
    - Combines homography matrix with translation matrix for correct positioning
- Stitches images together:
    - Places the first image onto the warped second image canvas
    - Creates the final stitched panoramic image containing both input images aligned and blended
- Returns the stitched image, or None if stitching fails (e.g., insufficient matches, transformation calculation failure)
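The matching and filtering steps in the list above can be sketched in plain Python. This is a minimal illustration, not the block's implementation: the helper names `knn_match` and `ratio_test` are hypothetical, and toy 2-D tuples stand in for the 128-dimensional SIFT descriptors. Only the 0.75 ratio and the default k of 2 come from the description above.

```python
import math


def knn_match(query_descriptors, train_descriptors, k=2):
    """Brute-force k-nearest-neighbour matching on Euclidean distance.

    Returns, for each query descriptor, a list of up to k
    (distance, train_index) pairs sorted from nearest to farthest.
    """
    if k <= 0:
        raise ValueError("k must be greater than 0")
    matches = []
    for query in query_descriptors:
        distances = sorted(
            (math.dist(query, train), idx)
            for idx, train in enumerate(train_descriptors)
        )
        matches.append(distances[:k])
    return matches


def ratio_test(knn_matches, ratio=0.75):
    """Keep (query_index, train_index) pairs that pass Lowe's ratio test."""
    good = []
    for query_idx, candidates in enumerate(knn_matches):
        if len(candidates) < 2:
            continue  # the test needs a second-best match to compare against
        (best_dist, best_idx), (second_dist, _) = candidates[0], candidates[1]
        if best_dist < ratio * second_dist:
            good.append((query_idx, best_idx))
    return good


# Toy 2-D descriptors (illustrative data, not real SIFT output).
image1_descriptors = [(0.0, 0.0), (5.0, 5.0)]
image2_descriptors = [(0.1, 0.0), (5.0, 4.0), (5.0, 6.0), (9.0, 9.0)]

good_matches = ratio_test(knn_match(image1_descriptors, image2_descriptors, k=2))
print(good_matches)
```

In this toy data the first descriptor has one clearly closest neighbour and passes the test, while the second is equally close to two candidates and is rejected as ambiguous, which is the kind of false match the ratio test is meant to remove.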
The block uses SIFT for robust feature detection, which works well on images containing sufficient detail and texture. The RANSAC-based homography calculation handles perspective distortions and keeps the alignment robust even when some matches are incorrect. The reprojection error threshold controls the sensitivity of the alignment: lower values require more precise matches, while higher values (useful for low-detail images) allow more tolerance for matching variations.
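To make the effect of the reprojection error threshold concrete, the following sketch classifies point pairs as inliers for a given homography. It is an illustration under stated assumptions: the helper names `project` and `count_inliers`, the translation-only homography, and the point pairs are all hypothetical, and the code shows only the inlier check, not the RANSAC estimation itself.

```python
import math


def project(H, point):
    """Apply a 3x3 homography (nested lists) to a 2-D point."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def count_inliers(H, point_pairs, max_error):
    """Count pairs whose reprojection error is at most max_error pixels."""
    inliers = 0
    for src, dst in point_pairs:
        if math.dist(project(H, src), dst) <= max_error:
            inliers += 1
    return inliers


# Hypothetical homography: a pure translation of 100 px along x.
H = [
    [1.0, 0.0, 100.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# (point in one image, matched point in the other image)
pairs = [
    ((0.0, 0.0), (100.0, 0.0)),     # reprojection error 0 px
    ((10.0, 20.0), (112.0, 20.0)),  # reprojection error 2 px
    ((50.0, 50.0), (150.0, 56.0)),  # reprojection error 6 px
    ((30.0, 40.0), (200.0, 90.0)),  # gross outlier
]

strict = count_inliers(H, pairs, max_error=3.0)
lenient = count_inliers(H, pairs, max_error=8.0)
print(strict, lenient)
```

With the threshold at 3 pixels only the two most precise pairs count as inliers; raising it to 8 pixels admits the pair that is off by 6 pixels, while the gross outlier is rejected either way.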
Common Use Cases¶
- Panoramic Image Creation: Stitch overlapping images together to create wide panoramic views (e.g., create panoramic photos from overlapping camera shots, stitch together images from rotating cameras, combine multiple overlapping images into panoramas), enabling panoramic image generation workflows
- Wide-Area Scene Reconstruction: Combine multiple overlapping views of a scene into a single comprehensive image (e.g., reconstruct wide scenes from multiple camera angles, combine overlapping surveillance camera views, stitch together images from multiple viewpoints), enabling wide-area scene visualization
- Multi-Image Mosaicking: Create image mosaics from overlapping image tiles or sections (e.g., stitch together image tiles for large-scale mapping, combine overlapping satellite image sections, create mosaics from overlapping image captures), enabling image mosaic creation workflows
- Scene Documentation: Combine multiple overlapping images to document large scenes or areas (e.g., document large spaces with multiple overlapping photos, combine overlapping views for scene documentation, stitch together images for comprehensive scene capture), enabling comprehensive scene documentation
- Video Frame Stitching: Stitch together overlapping frames from video sequences (e.g., create panoramic views from video frames, combine overlapping frames from moving cameras, stitch together consecutive video frames), enabling video-based panoramic workflows
- Multi-Camera View Combination: Combine overlapping views from multiple cameras into a single unified view (e.g., stitch together overlapping camera feeds, combine multi-camera views for monitoring, merge overlapping camera perspectives), enabling multi-camera view integration workflows
Connecting to Other Blocks¶
This block receives two images and produces a single stitched image. It can be used:
- After image input blocks or image preprocessing blocks to stitch preprocessed images together (e.g., stitch images after preprocessing, combine images after enhancement, merge images after filtering), enabling image stitching workflows
- After crop blocks to stitch together cropped image regions from different sources (e.g., stitch cropped regions from different images, combine cropped sections from multiple sources, merge cropped regions into panoramas), enabling cropped region stitching workflows
- After transformation blocks to stitch images that have been transformed or adjusted (e.g., stitch images after perspective correction, combine images after geometric transformations, merge images after adjustments), enabling transformed image stitching workflows
- Before detection or analysis blocks that benefit from panoramic views (e.g., detect objects in stitched panoramic images, analyze wide-area stitched scenes, process comprehensive stitched views), enabling panoramic analysis workflows
- Before visualization blocks to display stitched panoramic images (e.g., visualize stitched panoramas, display wide-area stitched views, show comprehensive stitched scenes), enabling panoramic visualization outputs
- In multi-stage image processing workflows where images need to be stitched before further processing (e.g., stitch images before detection, combine images before analysis, merge images for comprehensive processing), enabling multi-stage panoramic processing workflows
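As a rough sketch of the "before detection" pattern above, the workflow definition below feeds the stitched image into a detection step. Only the Stitch Images step fields and the `stitched_image` output come from this page. The surrounding layout (`version`, `inputs`, `outputs`, `WorkflowImage`, `JsonField`), the object detection step identifier and its `images`, `model_id`, and `predictions` names, and all step and input names are assumptions made for illustration and should be checked against the documentation of those blocks.

```python
import json

# Hypothetical workflow definition built as a Python dict.
workflow_definition = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image1"},
        {"type": "WorkflowImage", "name": "image2"},
    ],
    "steps": [
        {
            # Fields taken from the Stitch Images step definition.
            "type": "roboflow_core/stitch_images@v1",
            "name": "stitch",
            "image1": "$inputs.image1",
            "image2": "$inputs.image2",
            "max_allowed_reprojection_error": 3,
            "count_of_best_matches_per_query_descriptor": 2,
        },
        {
            # Assumed identifier and field names for a downstream detection step.
            "type": "roboflow_core/roboflow_object_detection_model@v1",
            "name": "detector",
            "images": "$steps.stitch.stitched_image",
            "model_id": "<your_model_id_here>",
        },
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "panorama",
            "selector": "$steps.stitch.stitched_image",
        },
        {
            "type": "JsonField",
            "name": "predictions",
            "selector": "$steps.detector.predictions",
        },
    ],
}

print(json.dumps(workflow_definition, indent=2))
```

Because the block returns None when stitching fails, downstream steps in a workflow like this should be expected to receive no image in that case.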
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/stitch_images@v1 to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| max_allowed_reprojection_error | float | Maximum allowed reprojection error (in pixels) to treat a point pair as an inlier during RANSAC homography calculation. This corresponds to cv.findHomography's ransacReprojThreshold parameter. Lower values require more precise matches (stricter alignment) but may fail with noisy matches. Higher values allow more tolerance for matching variations (more lenient alignment) and can improve results for low-detail images or images with imperfect feature matches. Default is 3 pixels. Increase this value (e.g., 5-10) for images with less detail or when stitching fails with default settings. | ✅ |
| count_of_best_matches_per_query_descriptor | int | Number of best matches to find per query descriptor during keypoint matching. This corresponds to cv.BFMatcher.knnMatch's k parameter. Must be greater than 0. The block finds the k nearest neighbor matches for each keypoint descriptor in image1 among all descriptors in image2, then uses Lowe's ratio test to filter good matches (comparing best match distance with second-best match distance). Higher values provide more candidate matches but increase computation. Default is 2 (finds 2 best matches per descriptor). Typical values range from 2-5. Use higher values if you need more match candidates for difficult images. | ✅ |
The Refs column indicates whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
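A parametrised property can be sketched as follows: the step field holds a selector string instead of a literal value. The helper `build_stitch_step`, the input name `best_matches`, and the `WorkflowParameter` declaration with `default_value` are assumptions made for illustration. The step fields, defaults, and the "greater than 0" constraint come from the table above.

```python
import json


def build_stitch_step(name, image1, image2, max_error=3, best_matches=2):
    """Build a Stitch Images step dict.

    max_error and best_matches may be literal values or selector strings
    such as "$inputs.best_matches" that are resolved at workflow runtime.
    """
    # Only literal integers can be validated here; selectors are resolved later.
    if isinstance(best_matches, int) and best_matches <= 0:
        raise ValueError(
            "count_of_best_matches_per_query_descriptor must be greater than 0"
        )
    return {
        "name": name,
        "type": "roboflow_core/stitch_images@v1",
        "image1": image1,
        "image2": image2,
        "max_allowed_reprojection_error": max_error,
        "count_of_best_matches_per_query_descriptor": best_matches,
    }


# Assumed declaration of a runtime parameter in the workflow "inputs" list.
best_matches_input = {
    "type": "WorkflowParameter",
    "name": "best_matches",
    "default_value": 2,
}

# Static configuration: literal values baked into the step.
static_step = build_stitch_step("stitch_static", "$inputs.image1", "$inputs.image2")

# Dynamic configuration: the property is bound to a runtime input.
dynamic_step = build_stitch_step(
    "stitch_dynamic",
    "$inputs.image1",
    "$inputs.image2",
    best_matches="$inputs.best_matches",
)

print(json.dumps(dynamic_step, indent=2))
```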
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Stitch Images in version v1.
- inputs:
Image Slicer, Polygon Zone Visualization, Line Counter, Contrast Enhancement, Stability AI Image Generation, Image Threshold, Line Counter Visualization, Trace Visualization, Distance Measurement, Image Stack, Camera Calibration, QR Code Generator, Detection Event Log, Icon Visualization, SIFT Comparison, Morphological Transformation, Color Visualization, Perspective Correction, Corner Visualization, Halo Visualization, Image Blur, Morphological Transformation, Camera Focus, Halo Visualization, Stability AI Inpainting, Classification Label Visualization, Grid Visualization, Background Color Visualization, Mask Visualization, Ellipse Visualization, Reference Path Visualization, Image Slicer, Label Visualization, Identify Outliers, Text Display, SIFT Comparison, Dot Visualization, Polygon Visualization, Identify Changes, Crop Visualization, Dynamic Crop, Absolute Static Crop, Circle Visualization, Image Preprocessing, Template Matching, Relative Static Crop, Camera Focus, Heatmap Visualization, Blur Visualization, Depth Estimation, Stability AI Outpainting, Clip Comparison, Background Subtraction, Keypoint Visualization, Detections Consensus, Bounding Box Visualization, Stitch Images, Image Convert Grayscale, Contrast Equalization, Line Counter, Roboflow Visual Search, Triangle Visualization, Pixelate Visualization, SIFT, Image Contours, Polygon Visualization, Pixel Color Count, Model Comparison Visualization
- outputs:
VLM As Classifier, MoonshotAI Kimi, Stability AI Image Generation, Trace Visualization, Qwen2.5-VL, Image Stack, Anthropic Claude, Icon Visualization, SIFT Comparison, Morphological Transformation, Color Visualization, SmolVLM2, LMM For Classification, Single-Label Classification Model, Perspective Correction, Corner Visualization, Clip Comparison, Halo Visualization, Qwen-VL, Keypoint Detection Model, Halo Visualization, Object Detection Model, Google Gemma, Background Color Visualization, Ellipse Visualization, Email Notification, Twilio SMS/MMS Notification, Text Display, Polygon Visualization, Crop Visualization, Absolute Static Crop, Image Preprocessing, Template Matching, Relative Static Crop, OpenRouter, OpenAI, Florence-2 Model, VLM As Detector, OpenAI, Motion Detection, Heatmap Visualization, OCR Model, Perception Encoder Embedding Model, Blur Visualization, Barcode Detection, Depth Estimation, Instance Segmentation Model, Stability AI Outpainting, Anthropic Claude, YOLO-World Model, Google Gemini, Clip Comparison, Google Gemini, Background Subtraction, Keypoint Visualization, Buffer, Stitch Images, Florence-2 Model, Contrast Equalization, Mask Edge Snap, OpenAI, Qwen3-VL, Moondream2, VLM As Detector, Google Gemini, Triangle Visualization, CLIP Embedding Model, Detections Stabilizer, SIFT, Multi-Label Classification Model, Image Contours, Keypoint Detection Model, VLM As Classifier, Pixel Color Count, GLM-OCR, Image Slicer, Polygon Zone Visualization, Contrast Enhancement, Google Gemma API, Time in Zone, Semantic Segmentation Model, Image Threshold, Line Counter Visualization, Semantic Segmentation Model, Multi-Label Classification Model, Camera Calibration, ByteTrack Tracker, Google Vision OCR, Image Blur, Morphological Transformation, Camera Focus, Roboflow Vision Events, Stability AI Inpainting, Classification Label Visualization, SAM2 Video Tracker, Event Writer, Qwen3.5-VL, Mask Visualization, Llama 3.2 Vision, Dominant Color, Reference Path Visualization, Image Slicer, Label Visualization, Byte Tracker, Dot Visualization, Dynamic Crop, Detections Stitch, Circle Visualization, Llama 3.2 Vision, BoT-SORT Tracker, SAM3 Video Tracker, Camera Focus, Gaze Detection, Segment Anything 2 Model, MoonshotAI Kimi, Single-Label Classification Model, QR Code Detection, Qwen3.5, CogVLM, Object Detection Model, SAM 3 Interactive, Qwen 3.6 API, Bounding Box Visualization, Multi-Label Classification Model, LMM, OpenAI, SAM 3, Image Convert Grayscale, Instance Segmentation Model, EasyOCR, Roboflow Visual Search, Roboflow Dataset Upload, SAM 3, Instance Segmentation Model, Keypoint Detection Model, Pixelate Visualization, Roboflow Dataset Upload, SORT Tracker, Instance Segmentation Model, Track Class Lock, Qwen 3.5 API, Object Detection Model, Anthropic Claude, Polygon Visualization, OC-SORT Tracker, SAM 3, Model Comparison Visualization, Single-Label Classification Model, Seg Preview
Input and Output Bindings¶
The available connections depend on its binding kinds. Check what binding kinds
Stitch Images in version v1 has.
Bindings
- input
    - image1 (image): First input image to stitch. Should contain overlapping regions with image2 and sufficient detail/texture for SIFT feature detection. The images must have overlapping content for successful stitching. The block will determine the optimal positioning and alignment of this image relative to image2 during stitching. Images with rich texture and detail work best for SIFT-based feature matching.
    - image2 (image): Second input image to stitch. Should contain overlapping regions with image1 and sufficient detail/texture for SIFT feature detection. The images must have overlapping content for successful stitching. The block will warp and align this image to match image1's perspective during stitching. Images with rich texture and detail work best for SIFT-based feature matching.
    - max_allowed_reprojection_error (float_zero_to_one): Maximum allowed reprojection error (in pixels) to treat a point pair as an inlier during RANSAC homography calculation. This corresponds to cv.findHomography's ransacReprojThreshold parameter. Lower values require more precise matches (stricter alignment) but may fail with noisy matches. Higher values allow more tolerance for matching variations (more lenient alignment) and can improve results for low-detail images or images with imperfect feature matches. Default is 3 pixels. Increase this value (e.g., 5-10) for images with less detail or when stitching fails with default settings.
    - count_of_best_matches_per_query_descriptor (integer): Number of best matches to find per query descriptor during keypoint matching. This corresponds to cv.BFMatcher.knnMatch's k parameter. Must be greater than 0. The block finds the k nearest neighbor matches for each keypoint descriptor in image1 among all descriptors in image2. Then uses Lowe's ratio test to filter good matches (comparing best match distance with second-best match distance). Higher values provide more candidate matches but increase computation. Default is 2 (finds 2 best matches per descriptor). Typical values range from 2-5. Use higher values if you need more match candidates for difficult images.
- output
    - stitched_image (image): Image in workflows.
Example JSON definition of step Stitch Images in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/stitch_images@v1",
    "image1": "$inputs.image1",
    "image2": "$inputs.image2",
    "max_allowed_reprojection_error": 3,
    "count_of_best_matches_per_query_descriptor": 2
}