Image Stack¶
Class: ImageStackBlockV1
Source: inference.core.workflows.core_steps.fusion.image_stack.v1.ImageStackBlockV1
Accumulate compressed video frames into a fixed-size stack, returning the most recent N frames as JPEG-encoded binary blobs. Designed for shared-hosting safety: frames are always JPEG-compressed and downsampled to fit within resolution limits, preventing out-of-memory conditions.
How This Block Works¶
- Receives a video frame (WorkflowImageData) each workflow cycle.
- Downsamples the frame if it exceeds the configured resolution limits (default 1920x1080), preserving aspect ratio.
- JPEG-encodes the frame at quality 75 and stores the resulting bytes.
- Maintains a per-camera FIFO buffer (deque) of up to stack_size compressed frames. When the buffer is full the oldest frame is automatically evicted.
- If stack_size changes between calls (e.g. via a dynamic selector), the buffer is resized and existing frames are preserved up to the new limit.
- If the clear input is True the buffer is flushed before the current frame is added.
- Outputs the list of JPEG byte blobs (newest first) and the current frame count.
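The buffer behaviour described in the list above can be sketched with the standard library alone. This is a minimal illustration, not the block's source: the class name FrameStackSketch and its push method are hypothetical, and plain byte strings stand in for real JPEG-encoded frames.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class FrameStackSketch:
    """Illustrative per-camera FIFO of compressed frames (hypothetical, not the block's code)."""

    def __init__(self) -> None:
        # One buffer per camera, keyed by a video identifier string.
        self._buffers: Dict[str, Deque[bytes]] = {}

    def push(
        self, video_id: str, jpeg_bytes: bytes, stack_size: int, clear: bool = False
    ) -> Tuple[List[bytes], int]:
        buffer = self._buffers.get(video_id)
        if buffer is None:
            buffer = deque(maxlen=stack_size)
        elif buffer.maxlen != stack_size:
            # Rebuilding with a new maxlen keeps the newest frames that still fit.
            buffer = deque(buffer, maxlen=stack_size)
        if clear:
            # Flush before the current frame is added.
            buffer.clear()
        self._buffers[video_id] = buffer
        buffer.append(jpeg_bytes)  # a full deque drops its oldest item here
        frames = list(reversed(buffer))  # newest first
        return frames, len(frames)


stack = FrameStackSketch()
for i in range(4):
    frames, count = stack.push("cam-0", f"frame-{i}".encode(), stack_size=3)
```

After the loop, the sketch holds the three most recent frames for cam-0, newest first. A deque with maxlen gives eviction for free, which is why it suits a rolling window like this one.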
Common Use Cases¶
- Action / activity recognition: accumulate a clip of N frames and pass them to a vision-language model (e.g. Google Gemini, Qwen) that can reason over multiple images to classify actions, detect events, or describe what is happening in a scene.
- Time-lapse snapshots: collect the last N frames for periodic visual comparison.
- Event buffering: keep a rolling window of frames around an event of interest.
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/image_stack@v1 to add the block as
a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| stack_size | int | Maximum number of frames to keep in the stack (1-64). When the stack is full the oldest frame is evicted. | ✅ |
| resolution_width | int | Maximum frame width in pixels (64-1920). Frames wider than this are downsampled preserving aspect ratio. | ✅ |
| resolution_height | int | Maximum frame height in pixels (64-1080). Frames taller than this are downsampled preserving aspect ratio. | ✅ |
| clear | bool | When True the entire frame buffer is flushed before the current frame is added. Useful for resetting state on scene changes. | ✅ |
The Refs column marks whether the property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
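To make the interaction of resolution_width and resolution_height concrete, the following sketch computes a target size that fits inside both limits while keeping the aspect ratio. The helper name fit_within and its rounding rule are assumptions for illustration; the block's own rounding may differ.

```python
from typing import Tuple


def fit_within(
    width: int, height: int, max_width: int = 1920, max_height: int = 1080
) -> Tuple[int, int]:
    """Return a size that fits inside the limits while keeping aspect ratio."""
    # The tighter of the two limits decides the scale; 1.0 means "never upscale".
    scale = min(max_width / width, max_height / height, 1.0)
    if scale == 1.0:
        return width, height
    # Rounding to the nearest pixel is an assumption of this sketch.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

For example, a 3840x2160 frame under the default limits comes out as 1920x1080, while a 1280x720 frame is returned unchanged.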
Runtime compatibility¶
- soft: runtime hosted_serverless, dedicated_deployment; execution remote; input video. The frame stack is stored in process memory per video_identifier. With remote step execution on stateless or multi-replica HTTP runtimes, successive frames may be served by different worker processes, so the stack resets or contains only a partial frame history. Use local step execution in an InferencePipeline for stable cross-frame results.
- soft: input image. The block depends on temporal context from video or repeated-frame workflows. With a still image/photo, there is no meaningful history to track, compare, aggregate, or visualize, so the block provides little or no benefit.
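The first limitation can be illustrated with a toy simulation. It assumes requests rotate round-robin across worker processes, which is a simplification (real load balancers vary), and the function name simulate is hypothetical. Each worker keeps its own private buffer, as in-process memory would.

```python
from collections import deque
from itertools import cycle
from typing import Deque, List


def simulate(num_workers: int, num_frames: int, stack_size: int) -> List[List[int]]:
    """Return the stack contents observed after each frame index is submitted."""
    # Each worker process has its own private buffer; nothing is shared.
    workers: List[Deque[int]] = [deque(maxlen=stack_size) for _ in range(num_workers)]
    observed = []
    for frame_index, worker in zip(range(num_frames), cycle(workers)):
        worker.append(frame_index)
        observed.append(list(worker))
    return observed


single = simulate(num_workers=1, num_frames=6, stack_size=4)
replicated = simulate(num_workers=3, num_frames=6, stack_size=4)
```

With one worker the last stack holds frames 2 to 5, a contiguous history. With three workers the last stack holds only frames 2 and 5, which is the partial history the note above warns about.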
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Image Stack in version v1.
- inputs:
Circle Visualization,Roboflow Asset Library Attributes,Twilio SMS Notification,Image Blur,S3 Sink,Email Notification,Reference Path Visualization,Camera Focus,SIFT Comparison,PTZ Tracking (ONVIF),Event Writer,Identify Changes,Slack Notification,Halo Visualization,VLM As Classifier,Image Stack,Dot Visualization,Image Slicer,Label Visualization,Background Color Visualization,Email Notification,Pixelate Visualization,Heatmap Visualization,JSON Parser,Stitch Images,Identify Outliers,Morphological Transformation,Blur Visualization,Trace Visualization,Detection Event Log,Camera Focus,Corner Visualization,Pixel Color Count,Model Comparison Visualization,MQTT Writer,SIFT Comparison,Webhook Sink,Model Monitoring Inference Aggregator,Image Threshold,Image Contours,Local File Sink,Motion Detection,Polygon Visualization,Polygon Visualization,SIFT,Stability AI Image Generation,Classification Label Visualization,Line Counter Visualization,Line Counter,Relative Static Crop,Grid Visualization,Image Preprocessing,Keypoint Visualization,Template Matching,OPC UA Writer Sink,Icon Visualization,Color Visualization,Dynamic Zone,Triangle Visualization,QR Code Generator,Contrast Enhancement,Roboflow Dataset Upload,Absolute Static Crop,Dynamic Crop,Stability AI Inpainting,Background Subtraction,Bounding Box Visualization,Polygon Zone Visualization,Stability AI Outpainting,Crop Visualization,Image Convert Grayscale,Mask Visualization,Halo Visualization,Image Slicer,Distance Measurement,Perspective Correction,Twilio SMS/MMS Notification,Text Display,Morphological Transformation,Roboflow Vision Events,Microsoft SQL Server Sink,VLM As Classifier,Roboflow Dataset Upload,VLM As Detector,Depth Estimation,Detections Consensus,Roboflow Custom Metadata,Contrast Equalization,Camera Calibration,Ellipse Visualization,VLM As Detector,Line Counter
- outputs:
Cache Set,MoonshotAI Kimi,Roboflow Asset Library Attributes,Path Deviation,Image Blur,Keypoint Detection Model,Reference Path Visualization,PTZ Tracking (ONVIF),SIFT Comparison,Event Writer,Slack Notification,SAM2 Video Tracker,Halo Visualization,VLM As Classifier,Image Stack,Clip Comparison,Qwen 3.6 API,Google Gemma,Object Detection Model,Dot Visualization,Label Visualization,Llama 3.2 Vision,Email Notification,Pixelate Visualization,Google Gemini,Anthropic Claude,Track Class Lock,OpenAI,Trace Visualization,Llama 3.2 Vision,ByteTrack Tracker,Clip Comparison,OpenAI,Buffer,MQTT Writer,Webhook Sink,SIFT Comparison,Image Contours,Motion Detection,Google Gemini,MoonshotAI Kimi,Polygon Visualization,Classification Label Visualization,Instance Segmentation Model,Keypoint Detection Model,Keypoint Visualization,Instance Segmentation Model,Icon Visualization,Seg Preview,Stability AI Inpainting,Bounding Box Visualization,Polygon Zone Visualization,BoT-SORT Tracker,Stability AI Outpainting,Crop Visualization,Byte Tracker,Mask Visualization,Halo Visualization,Detection Offset,SORT Tracker,PLC EthernetIP,Anthropic Claude,Text Display,Morphological Transformation,VLM As Classifier,Roboflow Dataset Upload,VLM As Detector,Detections Consensus,Object Detection Model,Ellipse Visualization,Keypoint Detection Model,SAM3 Video Tracker,Time in Zone,SAM 3,Size Measurement,Circle Visualization,Path Deviation,Twilio SMS Notification,Email Notification,Identify Changes,Byte Tracker,SAM 3,Image Slicer,LMM For Classification,Dominant Color,Heatmap Visualization,Google Gemma API,Stitch Images,Identify Outliers,Time in Zone,Morphological Transformation,YOLO-World Model,Blur Visualization,Stitch OCR Detections,Detections List Roll-Up,Florence-2 Model,Google Gemini,Corner Visualization,OpenRouter,Detections Stabilizer,Pixel Color Count,SAM 3,Byte Tracker,Image Threshold,Instance Segmentation Model,Polygon Visualization,Time in Zone,Mask Edge Snap,Line Counter,Line Counter Visualization,Grid Visualization,Image Preprocessing,Stitch OCR Detections,Anthropic Claude,OPC UA Writer Sink,Color Visualization,Dynamic Zone,Triangle Visualization,QR Code Generator,Qwen 3.5 API,Roboflow Dataset Upload,Absolute Static Crop,Background Subtraction,OC-SORT Tracker,OpenAI,Image Slicer,Qwen-VL,Florence-2 Model,Perspective Correction,Twilio SMS/MMS Notification,Roboflow Vision Events,Instance Segmentation Model,Detections Classes Replacement,VLM As Detector,Line Counter,Object Detection Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check what binding kinds
Image Stack in version v1 has.
Bindings
- input
  - image (image): Video frame to add to the stack.
  - stack_size (integer): Maximum number of frames to keep in the stack (1-64). When the stack is full the oldest frame is evicted.
  - resolution_width (integer): Maximum frame width in pixels (64-1920). Frames wider than this are downsampled preserving aspect ratio.
  - resolution_height (integer): Maximum frame height in pixels (64-1080). Frames taller than this are downsampled preserving aspect ratio.
  - clear (boolean): When True the entire frame buffer is flushed before the current frame is added. Useful for resetting state on scene changes.
- output
  - frames (list_of_values): List of values of any type.
  - frames_count (integer): Integer value.
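Since frames is described as a newest-first list of JPEG byte blobs, a downstream consumer may want them in chronological order and in a text-safe form. The sketch below is one way to do that with the standard library. The helper name to_chronological_data_urls is hypothetical, and whether a given consumer expects base64 data URLs is an assumption.

```python
import base64
from typing import List

JPEG_SOI = b"\xff\xd8"  # every JPEG stream starts with this Start Of Image marker


def to_chronological_data_urls(frames: List[bytes]) -> List[str]:
    """Convert newest-first JPEG blobs into oldest-first base64 data URLs."""
    urls = []
    for blob in reversed(frames):  # the block emits newest first
        if not blob.startswith(JPEG_SOI):
            raise ValueError("expected JPEG-encoded bytes")
        encoded = base64.b64encode(blob).decode("ascii")
        urls.append("data:image/jpeg;base64," + encoded)
    return urls
```

Checking the two-byte Start Of Image marker is a cheap sanity test that each entry really is JPEG data before it is passed on; it does not validate the rest of the stream.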
Example JSON definition of step Image Stack in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/image_stack@v1",
    "image": "$inputs.image",
    "stack_size": 5,
    "resolution_width": 640,
    "resolution_height": 480,
    "clear": false
}
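For context, the step above might sit inside a complete workflow definition like the one sketched here as a Python dictionary. The step name frame_stack, the input names, and the surrounding inputs/steps/outputs layout are assumptions for illustration; only the step fields themselves come from the example above. The sketch wires stack_size and clear to runtime parameters, which the Refs column marks as allowed.

```python
import json

# Hypothetical names; the inputs/steps/outputs layout is an assumed
# Workflows definition shape, not confirmed by this page.
workflow_definition = {
    "version": "1.0",
    "inputs": [
        {"type": "WorkflowImage", "name": "image"},
        {"type": "WorkflowParameter", "name": "stack_size", "default_value": 5},
        {"type": "WorkflowParameter", "name": "clear", "default_value": False},
    ],
    "steps": [
        {
            "name": "frame_stack",
            "type": "roboflow_core/image_stack@v1",
            "image": "$inputs.image",
            "stack_size": "$inputs.stack_size",  # dynamic value instead of a literal
            "resolution_width": 640,
            "resolution_height": 480,
            "clear": "$inputs.clear",
        }
    ],
    "outputs": [
        {"type": "JsonField", "name": "frames", "selector": "$steps.frame_stack.frames"},
        {
            "type": "JsonField",
            "name": "frames_count",
            "selector": "$steps.frame_stack.frames_count",
        },
    ],
}

# Serialise to JSON to inspect the definition or save it to a file.
print(json.dumps(workflow_definition, indent=2))
```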