Roboflow Dataset Upload
v2
Class: RoboflowDatasetUploadBlockV2 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.sinks.roboflow.dataset_upload.v2.RoboflowDatasetUploadBlockV2
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Upload images and model predictions to a Roboflow dataset for active learning, model improvement, and data collection, with configurable usage quotas, probabilistic sampling, batch organization, image compression, and optional annotation persistence.
How This Block Works
This block uploads workflow images and predictions to your Roboflow dataset for storage, labeling, and model training. The block:
- Takes images and optional model predictions (object detection, instance segmentation, keypoint detection, or classification) as input
- Validates the Roboflow API key is available (required for uploading)
- Applies probabilistic sampling based on the data_percentage setting, randomly selecting a percentage of inputs to upload (e.g., 50% uploads half the data, 100% uploads everything)
- Checks usage quotas (minutely, hourly, daily limits) to ensure uploads stay within configured rate limits for active learning strategies
- Prepares images by resizing if they exceed maximum size (maintaining aspect ratio) and compressing to specified quality level
- Generates labeling batch names based on the prefix and batch creation frequency (never, daily, weekly, or monthly), organizing uploaded data into batches
- Optionally persists model predictions as annotations if persist_predictions is enabled, allowing predictions to serve as pre-labels for review and correction
- Attaches registration tags to images for organization and filtering in the Roboflow platform
- Registers the image (and annotations if enabled) to the specified Roboflow project via the Roboflow API
- Executes synchronously or asynchronously based on the fire_and_forget setting, allowing non-blocking uploads for faster workflow execution
- Returns error status and messages indicating upload success, failure, or sampling skip
The block supports active learning workflows by implementing usage quotas that prevent excessive data collection, helping focus on collecting valuable training data within rate limits. The probabilistic sampling feature (new in v2) allows you to randomly sample a percentage of data for upload, enabling cost-effective data collection strategies where you want to collect representative samples rather than all data. Images are organized into labeling batches that can be automatically recreated on a schedule (daily, weekly, monthly), making it easier to manage and review collected data over time. The block can operate in fire-and-forget mode for asynchronous execution, allowing workflows to continue processing without waiting for uploads to complete, or synchronously for debugging and error handling.
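The multi-level quota described above can be pictured with a small in-memory counter. This is only a sketch under stated assumptions: the `UsageQuota` class, its `try_consume` method, and the fixed-window counting scheme are hypothetical names and choices made for illustration, and the block's real quota storage is not described on this page.

```python
import time
from collections import defaultdict
from typing import Optional

# Window lengths in seconds for the three quota levels named in the docs.
WINDOWS = {"minutely": 60, "hourly": 3600, "daily": 86400}


class UsageQuota:
    """Hypothetical fixed-window quota tracker (illustration only)."""

    def __init__(self, minutely: int, hourly: int, daily: int) -> None:
        self.limits = {"minutely": minutely, "hourly": hourly, "daily": daily}
        # (quota_name, level, window_index) -> uploads counted in that window
        self.counters = defaultdict(int)

    def try_consume(self, quota_name: str, now: Optional[float] = None) -> bool:
        """Count one upload and return True only if every level has room."""
        now = time.time() if now is None else now
        keys = {
            level: (quota_name, level, int(now // seconds))
            for level, seconds in WINDOWS.items()
        }
        if any(self.counters[keys[level]] >= self.limits[level] for level in WINDOWS):
            return False  # at least one limit is exhausted: skip this upload
        for key in keys.values():
            self.counters[key] += 1
        return True


quota = UsageQuota(minutely=2, hourly=10, daily=10)
# Three attempts in the same minute: the third exceeds the minutely limit.
results = [quota.try_consume("quota-for-data-sampling-1", now=0.0) for _ in range(3)]
# A different quota name keeps its own counters.
other = quota.try_consume("another-quota", now=0.0)
# One minute later the minutely window has rolled over.
later = quota.try_consume("quota-for-data-sampling-1", now=61.0)
```

Keying the counters by quota name mirrors the usage_quota_name property: two upload steps that share a name would share limits, while distinct names stay independent.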
Version Differences (v2 vs v1)
New Features in v2:
- Probabilistic Data Sampling: Added the data_percentage parameter (0-100%) that enables random sampling of data for upload. This allows you to upload only a percentage of workflow inputs (e.g., 25% samples one in four images on average), reducing storage and annotation costs while still collecting representative data. When sampling skips an upload, the block returns a message indicating the skip.
- Improved Default Settings:
    - max_image_size default increased from (512, 512) to (1920, 1080) for higher resolution data collection
    - compression_level default increased from 75 to 95 for better image quality preservation
Behavior Changes:
- By default, data_percentage is set to 100, so v2 uploads every input, as v1 does, unless sampling is explicitly configured (the new image defaults listed above still apply)
- The block now applies probabilistic sampling before quota checking and image preparation, allowing efficient filtering before resource-intensive operations
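The ordering described above (sample first, then check quotas) can be sketched as follows. The function names `should_upload` and `gate`, the `quota_has_room` callable, and the returned status strings are assumptions made for illustration; the block's actual messages and internals are not shown on this page.

```python
import random
from typing import Callable


def should_upload(data_percentage: float, rng: random.Random) -> bool:
    """Return True for roughly `data_percentage` percent of calls."""
    if not 0.0 <= data_percentage <= 100.0:
        raise ValueError("data_percentage must be between 0.0 and 100.0")
    # rng.random() is in [0.0, 1.0), so 100.0 always passes and 0.0 never does.
    return rng.random() < data_percentage / 100.0


def gate(data_percentage: float, quota_has_room: Callable[[], bool], rng: random.Random) -> str:
    """Apply sampling before the (potentially costlier) quota check."""
    if not should_upload(data_percentage, rng):
        return "skipped: sampling"  # hypothetical message text
    if not quota_has_room():
        return "skipped: quota"  # hypothetical message text
    return "upload"


rng = random.Random(0)  # seeded only so the sketch is reproducible
kept = sum(should_upload(25.0, rng) for _ in range(10_000))

quota_checks = []


def fake_quota() -> bool:
    quota_checks.append(1)  # record that the quota was consulted
    return True


# With data_percentage=0.0 every input is dropped before the quota is touched.
statuses = {gate(0.0, fake_quota, rng) for _ in range(100)}
```

Because the sampling test is a single random draw, inputs that are sampled out never reach quota bookkeeping or image preparation.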
Requirements
API Key Required: This block requires a valid Roboflow API key to upload data. The API key must be configured in your environment or workflow configuration. Visit https://docs.roboflow.com/api-reference/authentication#retrieve-an-api-key to learn how to retrieve an API key.
Common Use Cases
- Active Learning Data Collection: Collect images and predictions from production environments where models struggle or are uncertain (e.g., low-confidence detections, edge cases), enabling iterative model improvement by gathering challenging examples for retraining
- Probabilistic Data Sampling: Use data_percentage to randomly sample a subset of data for upload (e.g., upload 20% of all detections, 50% of low-confidence cases), enabling cost-effective data collection strategies that reduce storage and annotation overhead while maintaining dataset diversity
- Production Data Logging: Continuously upload production inference data to Roboflow datasets for monitoring, analysis, and future model training, creating a growing dataset from real-world deployments
- Pre-Labeled Data Collection: Upload images with model predictions as pre-labels (when persist_predictions is enabled), accelerating annotation workflows by providing initial labels that can be reviewed and corrected rather than starting from scratch
- Stratified Data Sampling: Combine probabilistic sampling with rate limiting and quotas to selectively collect data based on specific criteria (e.g., sample 30% of detections that pass filters), ensuring diverse and balanced dataset collection without overwhelming storage or annotation resources
- Batch-Based Labeling Workflows: Organize uploaded data into batches with automatic recreation schedules (daily, weekly, monthly), making it easier to manage labeling tasks, track progress, and organize data collection efforts over time
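To make the batch recreation schedules concrete, here is a minimal sketch of how a batch name could be derived from a prefix and a frequency. The `batch_name` function is hypothetical, and the exact timestamp layout for weekly and monthly batches (here: the first day of the period) is an assumption; only the daily pattern matches the example name 'workflows_data_collector_2024_01_15' given in the property descriptions.

```python
from datetime import date, timedelta


def batch_name(prefix: str, frequency: str, today: date) -> str:
    """Build a labeling batch name for the period that contains `today`."""
    if frequency == "never":
        return prefix  # every upload lands in the same batch
    if frequency == "daily":
        start = today
    elif frequency == "weekly":
        start = today - timedelta(days=today.weekday())  # Monday of this week
    elif frequency == "monthly":
        start = today.replace(day=1)
    else:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    return f"{prefix}_{start:%Y_%m_%d}"


wednesday = date(2024, 1, 17)
names = {
    freq: batch_name("workflows_data_collector", freq, wednesday)
    for freq in ("never", "daily", "weekly", "monthly")
}
```

All uploads made within the same period resolve to the same name, so they end up grouped together for labeling.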
Connecting to Other Blocks
This block receives data from workflow steps and uploads it to Roboflow:
- After detection or analysis blocks (e.g., Object Detection Model, Instance Segmentation Model, Classification Model, Keypoint Detection Model) to upload images along with their predictions, enabling active learning by collecting inference data with model outputs for annotation and retraining
- After filtering or analytics blocks (e.g., Detections Filter, Continue If, Overlap Filter) to selectively upload only specific types of data (e.g., low-confidence detections, overlapping objects, specific classes), focusing data collection on valuable edge cases or interesting scenarios
- After rate limiter blocks (e.g., Rate Limiter) to throttle upload frequency and stay within usage quotas, ensuring controlled data collection that respects rate limits and prevents excessive storage usage
- Image inputs or preprocessing blocks to upload raw images or processed images (e.g., crops, transformed images) without predictions, enabling collection of image data for future labeling or analysis
- Conditional workflows using flow control blocks (e.g., Continue If) to upload data only when certain conditions are met (e.g., upload only when detection count exceeds threshold, upload only errors or failures), enabling selective data collection based on workflow state
- Batch processing workflows where multiple images or predictions are generated, allowing bulk upload of workflow outputs to Roboflow datasets with probabilistic sampling for organized and cost-effective data collection
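As an illustration of the first connection pattern, the sketch below builds a workflow definition in which a detection step feeds this block. The overall layout (`version`, `inputs`, `steps`, `outputs`), the `WorkflowImage` and `JsonField` types, the detection block identifier, and the `model_id` value are assumptions drawn from general Workflows usage rather than from this page; the upload step uses only properties documented here. The `unresolved_step_references` helper is hypothetical.

```python
import json

workflow_definition = {
    "version": "1.0",
    "inputs": [{"type": "WorkflowImage", "name": "image"}],
    "steps": [
        {
            # Assumed identifier and model for a detection step.
            "type": "roboflow_core/roboflow_object_detection_model@v1",
            "name": "object_detection_model",
            "images": "$inputs.image",
            "model_id": "yolov8n-640",
        },
        {
            "type": "roboflow_core/roboflow_dataset_upload@v2",
            "name": "dataset_upload",
            "images": "$inputs.image",
            "predictions": "$steps.object_detection_model.predictions",
            "target_project": "my_dataset",
            "usage_quota_name": "quota-for-data-sampling-1",
            "data_percentage": 20.0,  # upload roughly one input in five
            "persist_predictions": True,
            "fire_and_forget": True,
        },
    ],
    "outputs": [
        {
            "type": "JsonField",
            "name": "predictions",
            "selector": "$steps.object_detection_model.predictions",
        }
    ],
}


def unresolved_step_references(definition: dict) -> list:
    """List '$steps.<name>' references that point at no defined step."""
    known = {step["name"] for step in definition["steps"]}
    missing = []
    for step in definition["steps"]:
        for value in step.values():
            if isinstance(value, str) and value.startswith("$steps."):
                referenced = value.split(".")[1]
                if referenced not in known:
                    missing.append(value)
    return missing


serialized = json.dumps(workflow_definition, indent=2)
```

Checking references before submitting a definition catches a renamed or missing upstream step early.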
Type identifier
Use the following identifier in step "type" field: roboflow_core/roboflow_dataset_upload@v2 to add the block as a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| target_project | str | Roboflow project identifier where uploaded images and annotations will be saved. Must be a valid project in your Roboflow workspace. The project name can be specified directly or referenced from workflow inputs. | ✅ |
| data_percentage | float | Percentage of input data (0.0 to 100.0) to randomly sample for upload. This enables probabilistic data collection where only a subset of inputs are uploaded, reducing storage and annotation costs. For example, 25.0 uploads approximately 25% of images (one in four on average), 50.0 uploads half, and 100.0 uploads everything (no sampling). Random sampling occurs before quota checking and image processing, making it efficient for large-scale data collection workflows. | ✅ |
| minutely_usage_limit | int | Maximum number of image uploads allowed per minute for this quota. Part of the usage quota system that enforces rate limits for active learning data collection. Uploads exceeding this limit are skipped to prevent excessive data collection. Works together with hourly_usage_limit and daily_usage_limit to provide multi-level rate limiting. Note: This quota is checked after probabilistic sampling via data_percentage. | ❌ |
| hourly_usage_limit | int | Maximum number of image uploads allowed per hour for this quota. Part of the usage quota system that enforces rate limits for active learning data collection. Uploads exceeding this limit are skipped to prevent excessive data collection. Works together with minutely_usage_limit and daily_usage_limit to provide multi-level rate limiting. Note: This quota is checked after probabilistic sampling via data_percentage. | ❌ |
| daily_usage_limit | int | Maximum number of image uploads allowed per day for this quota. Part of the usage quota system that enforces rate limits for active learning data collection. Uploads exceeding this limit are skipped to prevent excessive data collection. Works together with minutely_usage_limit and hourly_usage_limit to provide multi-level rate limiting. Note: This quota is checked after probabilistic sampling via data_percentage. | ❌ |
| usage_quota_name | str | Unique identifier for tracking usage quotas (minutely, hourly, daily limits). Used internally to manage rate limiting across multiple upload operations. Each unique quota name maintains separate counters, allowing different upload strategies or data collection workflows to have independent rate limits. | ❌ |
| max_image_size | Tuple[int, int] | Maximum dimensions (width, height) for uploaded images. Images exceeding these dimensions are automatically resized while preserving aspect ratio before uploading. Default is (1920, 1080) for higher resolution data collection. Use smaller sizes (e.g., (512, 512)) for efficient storage and faster uploads, or keep the default for preserving image quality. | ❌ |
| compression_level | int | JPEG compression quality level for uploaded images, ranging from 1 (highest compression, smallest file size, lower quality) to 100 (no compression, largest file size, highest quality). Default is 95 for better image quality preservation. Higher values preserve more image quality but increase storage and bandwidth usage. Typical values range from 70-95 for balanced quality and size. | ❌ |
| registration_tags | List[str] | List of tags to attach to uploaded images for organization and filtering in Roboflow. Tags can be static strings (e.g., 'location-florida', 'camera-1') or dynamic values from workflow inputs. Tags help organize collected data, filter images in Roboflow, and add metadata for dataset management. Can be an empty list if no tags are needed. | ✅ |
| persist_predictions | bool | If True, model predictions are saved as annotations (pre-labels) in the Roboflow dataset alongside images. This enables predictions to serve as starting points for annotation, allowing reviewers to correct or approve labels rather than creating them from scratch. If False, only images are uploaded without annotations. Enabling this accelerates annotation workflows by providing initial labels. | ✅ |
| disable_sink | bool | If True, the block execution is disabled and no uploads occur. This allows temporarily disabling data collection without removing the block from workflows, useful for testing, debugging, or conditional data collection. When disabled, returns a message indicating the sink was disabled. Default is False (uploads enabled). | ✅ |
| fire_and_forget | bool | If True, uploads execute asynchronously (fire-and-forget mode), allowing the workflow to continue immediately without waiting for upload completion. This improves workflow performance but prevents error handling. If False, uploads execute synchronously, blocking workflow execution until completion and allowing proper error handling and status reporting. Use async mode (True) for production workflows where speed is prioritized, and sync mode (False) for debugging or when error handling is critical. | ✅ |
| labeling_batch_prefix | str | Prefix used to generate labeling batch names for organizing uploaded images in Roboflow. Combined with the batch recreation frequency and timestamps to create batch names like 'workflows_data_collector_2024_01_15'. Batches help organize collected data for labeling, making it easier to manage and review uploaded images in groups. Can be customized to match your organization scheme. | ✅ |
| labeling_batches_recreation_frequency | str | Frequency at which new labeling batches are automatically created for uploaded images. Options: 'never' (all images go to the same batch), 'daily' (new batch each day), 'weekly' (new batch each week), 'monthly' (new batch each month). Batch timestamps are appended to the labeling_batch_prefix to create unique batch names. Automatically organizing uploads into time-based batches simplifies dataset management and makes it easier to track and review collected data over time. | ❌ |
| image_name | str | Optional custom name for the uploaded image. This is useful when you want to preserve the original filename or use a meaningful identifier (e.g., serial number, timestamp) for the image in the Roboflow dataset. The name should not include file extension. If not provided, a UUID will be generated automatically. | ✅ |
| metadata | Dict[str, Union[bool, float, int, str]] | Optional key-value metadata to attach to uploaded images. Metadata is stored as user_metadata on the image in Roboflow and can be used for filtering and organization. Values can be static strings, numbers, booleans, or references to workflow inputs/steps. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
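The Refs column can be turned into a quick pre-flight check on a step definition. The sketch below is a hypothetical helper, not part of the library: `STATIC_ONLY` simply lists the properties marked ❌ in the v2 table, and treating any string that starts with `$inputs.` or `$steps.` as a selector is an assumption based on the example definitions on this page.

```python
# Properties marked with ❌ in the Refs column of the v2 table.
STATIC_ONLY = {
    "name",
    "minutely_usage_limit",
    "hourly_usage_limit",
    "daily_usage_limit",
    "usage_quota_name",
    "max_image_size",
    "compression_level",
    "labeling_batches_recreation_frequency",
}


def is_selector(value: object) -> bool:
    """Treat '$inputs.…' and '$steps.…' strings as dynamic references."""
    return isinstance(value, str) and value.startswith(("$inputs.", "$steps."))


def misplaced_selectors(step: dict) -> list:
    """Names of static-only properties that were given a selector."""
    return sorted(
        key for key, value in step.items() if key in STATIC_ONLY and is_selector(value)
    )


step = {
    "name": "dataset_upload",
    "target_project": "$inputs.project",       # allowed: Refs is ✅
    "data_percentage": "$inputs.sample_rate",  # allowed: Refs is ✅
    "compression_level": "$inputs.quality",    # not allowed: Refs is ❌
    "daily_usage_limit": 10,
}
problems = misplaced_selectors(step)
```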
Runtime compatibility
- requires_internet: not suitable for air-gapped / offline deployments. This block depends on a service that is not reachable from fully offline / air-gapped deployments.
Available Connections
Compatible Blocks
Check what blocks you can connect to Roboflow Dataset Upload in version v2.
- inputs: Cache Set, MoonshotAI Kimi, Path Deviation, Image Blur, SmolVLM2, Overlap Filter, Reference Path Visualization, PTZ Tracking (ONVIF), Event Writer, SAM2 Video Tracker, Clip Comparison, Object Detection Model, Email Notification, Pixelate Visualization, Qwen3-VL, JSON Parser, Anthropic Claude, Cache Get, OpenAI, Trace Visualization, Llama 3.2 Vision, Detection Event Log, ByteTrack Tracker, Camera Focus, GLM-OCR, PLC ModbusTCP, Buffer, QR Code Detection, SIFT Comparison, CSV Formatter, Webhook Sink, Image Contours, Motion Detection, Local File Sink, Google Gemini, Dimension Collapse, First Non Empty Or Default, Keypoint Visualization, Template Matching, Instance Segmentation Model, Icon Visualization, Seg Preview, Polygon Zone Visualization, Detections Transformation, Stability AI Outpainting, BoT-SORT Tracker, Byte Tracker, Halo Visualization, Expression, Distance Measurement, Morphological Transformation, VLM As Detector, Detections Consensus, Ellipse Visualization, Time in Zone, SAM 3, Size Measurement, Twilio SMS Notification, Email Notification, S3 Sink, Environment Secrets Store, Single-Label Classification Model, SAM 3, LMM For Classification, Dominant Color, Mask Area Measurement, Switch Case, OpenAI, Stitch Images, Identify Outliers, Time in Zone, Current Time, Stitch OCR Detections, Detections List Roll-Up, Pixel Color Count, Model Monitoring Inference Aggregator, Google Vision OCR, Image Threshold, Byte Tracker, Instance Segmentation Model, LMM, Polygon Visualization, Stability AI Image Generation, Line Counter Visualization, CogVLM, Relative Static Crop, Grid Visualization, Property Definition, Gaze Detection, Stitch OCR Detections, OPC UA Writer Sink, Color Visualization, Contrast Enhancement, Roboflow Dataset Upload, Absolute Static Crop, OC-SORT Tracker, OpenAI, Perspective Correction, Twilio SMS/MMS Notification, Roboflow Vision Events, Microsoft SQL Server Sink, Cosine Similarity, Perception Encoder Embedding Model, Instance Segmentation Model, Roboflow Custom Metadata, Camera Calibration, Keypoint Detection Model, Line Counter, Roboflow Asset Library Attributes, SIFT Comparison, Slack Notification, Halo Visualization, VLM As Classifier, Image Stack, CLIP Embedding Model, Google Gemma, Qwen 3.6 API, Bounding Rectangle, Dot Visualization, Label Visualization, Background Color Visualization, Llama 3.2 Vision, SAM 3 Interactive, Rate Limiter, Velocity, OpenAI-Compatible LLM, Google Gemini, Track Class Lock, Clip Comparison, OpenAI, Qwen3.5, MQTT Writer, MoonshotAI Kimi, Polygon Visualization, SIFT, Classification Label Visualization, Multi-Label Classification Model, Instance Segmentation Model, Keypoint Detection Model, Dynamic Crop, Stability AI Inpainting, Bounding Box Visualization, Multi-Label Classification Model, Crop Visualization, Continue If, Image Convert Grayscale, Mask Visualization, Delta Filter, Detections Stitch, Detection Offset, SORT Tracker, Barcode Detection, PLC EthernetIP, Text Display, Anthropic Claude, VLM As Classifier, Inner Workflow, Overlap Analysis, Roboflow Dataset Upload, Object Detection Model, Detections Filter, Detections Merge, Keypoint Detection Model, SAM3 Video Tracker, Circle Visualization, Semantic Segmentation Model, Path Deviation, Camera Focus, Identify Changes, Byte Tracker, Image Slicer, OCR Model, Heatmap Visualization, Google Gemma API, Morphological Transformation, Single-Label Classification Model, EasyOCR, YOLO-World Model, Blur Visualization, Moondream2, Florence-2 Model, Google Gemini, Corner Visualization, OpenRouter, Detections Stabilizer, Model Comparison Visualization, SAM 3, Single-Label Classification Model, Segment Anything 2 Model, Time in Zone, Mask Edge Snap, Line Counter, Qwen3.5-VL, Per-Class Confidence Filter, Image Preprocessing, Anthropic Claude, Dynamic Zone, Detections Combine, Triangle Visualization, Data Aggregator, QR Code Generator, Qwen 3.5 API, Background Subtraction, Multi-Label Classification Model, Image Slicer, Semantic Segmentation Model, Qwen-VL, Florence-2 Model, Depth Estimation, Contrast Equalization, Detections Classes Replacement, VLM As Detector, Qwen2.5-VL, Object Detection Model
- outputs: Cache Set, Roboflow Asset Library Attributes, MoonshotAI Kimi, Path Deviation, Image Blur, Reference Path Visualization, PTZ Tracking (ONVIF), Event Writer, Slack Notification, Halo Visualization, CLIP Embedding Model, Image Stack, Google Gemma, Qwen 3.6 API, Object Detection Model, Dot Visualization, Label Visualization, Background Color Visualization, Llama 3.2 Vision, Email Notification, SAM 3 Interactive, Pixelate Visualization, OpenAI-Compatible LLM, Google Gemini, Anthropic Claude, Cache Get, OpenAI, Trace Visualization, Llama 3.2 Vision, OpenAI, Clip Comparison, GLM-OCR, MQTT Writer, Webhook Sink, SIFT Comparison, Motion Detection, Local File Sink, Google Gemini, MoonshotAI Kimi, Polygon Visualization, Classification Label Visualization, Multi-Label Classification Model, Instance Segmentation Model, Keypoint Detection Model, Keypoint Visualization, Template Matching, Instance Segmentation Model, Icon Visualization, Seg Preview, Dynamic Crop, Stability AI Inpainting, BoT-SORT Tracker, Bounding Box Visualization, Multi-Label Classification Model, Polygon Zone Visualization, Crop Visualization, Stability AI Outpainting, Mask Visualization, Halo Visualization, Detections Stitch, Distance Measurement, Text Display, Anthropic Claude, Morphological Transformation, Line Counter, Roboflow Dataset Upload, Detections Consensus, Object Detection Model, Ellipse Visualization, Keypoint Detection Model, SAM3 Video Tracker, Time in Zone, SAM 3, Size Measurement, Circle Visualization, Semantic Segmentation Model, Twilio SMS Notification, Path Deviation, Email Notification, S3 Sink, Single-Label Classification Model, SAM 3, LMM For Classification, Heatmap Visualization, Google Gemma API, OpenAI, Time in Zone, Morphological Transformation, Single-Label Classification Model, YOLO-World Model, Current Time, Blur Visualization, Stitch OCR Detections, Moondream2, Florence-2 Model, Google Gemini, Corner Visualization, OpenRouter, Pixel Color Count, Model Comparison Visualization, SAM 3, Model Monitoring Inference Aggregator, Google Vision OCR, Image Threshold, Instance Segmentation Model, Single-Label Classification Model, LMM, Polygon Visualization, Segment Anything 2 Model, Time in Zone, Stability AI Image Generation, Line Counter Visualization, Line Counter, CogVLM, Qwen3.5-VL, Image Preprocessing, Gaze Detection, Stitch OCR Detections, Anthropic Claude, OPC UA Writer Sink, Color Visualization, Dynamic Zone, Triangle Visualization, QR Code Generator, Roboflow Dataset Upload, Qwen 3.5 API, Multi-Label Classification Model, OpenAI, Qwen-VL, Florence-2 Model, Perspective Correction, Roboflow Vision Events, Twilio SMS/MMS Notification, Microsoft SQL Server Sink, Perception Encoder Embedding Model, Instance Segmentation Model, Depth Estimation, Roboflow Custom Metadata, Contrast Equalization, Camera Calibration, Detections Classes Replacement, Keypoint Detection Model, Object Detection Model
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds Roboflow Dataset Upload in version v2 has.
Bindings
- input
    - images (image): Image(s) to upload to the Roboflow dataset. Can be a single image or batch of images from workflow inputs or processing steps. Images are randomly sampled based on data_percentage, resized if they exceed max_image_size, and compressed before uploading. Supports batch processing.
    - target_project (roboflow_project): Roboflow project identifier where uploaded images and annotations will be saved. Must be a valid project in your Roboflow workspace. The project name can be specified directly or referenced from workflow inputs.
    - predictions (Union[classification_prediction, keypoint_detection_prediction, instance_segmentation_prediction, object_detection_prediction]): Optional model predictions to upload alongside images. Predictions are saved as annotations (pre-labels) in the Roboflow dataset when persist_predictions is enabled, allowing predictions to serve as starting points for annotation review and correction. Supports object detection, instance segmentation, keypoint detection, and classification predictions. If None, only images are uploaded.
    - data_percentage (float): Percentage of input data (0.0 to 100.0) to randomly sample for upload. This enables probabilistic data collection where only a subset of inputs are uploaded, reducing storage and annotation costs. For example, 25.0 uploads approximately 25% of images (one in four on average), 50.0 uploads half, and 100.0 uploads everything (no sampling). Random sampling occurs before quota checking and image processing, making it efficient for large-scale data collection workflows.
    - registration_tags (Union[string, list_of_values]): List of tags to attach to uploaded images for organization and filtering in Roboflow. Tags can be static strings (e.g., 'location-florida', 'camera-1') or dynamic values from workflow inputs. Tags help organize collected data, filter images in Roboflow, and add metadata for dataset management. Can be an empty list if no tags are needed.
    - persist_predictions (boolean): If True, model predictions are saved as annotations (pre-labels) in the Roboflow dataset alongside images. This enables predictions to serve as starting points for annotation, allowing reviewers to correct or approve labels rather than creating them from scratch. If False, only images are uploaded without annotations. Enabling this accelerates annotation workflows by providing initial labels.
    - disable_sink (boolean): If True, the block execution is disabled and no uploads occur. This allows temporarily disabling data collection without removing the block from workflows, useful for testing, debugging, or conditional data collection. When disabled, returns a message indicating the sink was disabled. Default is False (uploads enabled).
    - fire_and_forget (boolean): If True, uploads execute asynchronously (fire-and-forget mode), allowing the workflow to continue immediately without waiting for upload completion. This improves workflow performance but prevents error handling. If False, uploads execute synchronously, blocking workflow execution until completion and allowing proper error handling and status reporting. Use async mode (True) for production workflows where speed is prioritized, and sync mode (False) for debugging or when error handling is critical.
    - labeling_batch_prefix (string): Prefix used to generate labeling batch names for organizing uploaded images in Roboflow. Combined with the batch recreation frequency and timestamps to create batch names like 'workflows_data_collector_2024_01_15'. Batches help organize collected data for labeling, making it easier to manage and review uploaded images in groups. Can be customized to match your organization scheme.
    - image_name (string): Optional custom name for the uploaded image. This is useful when you want to preserve the original filename or use a meaningful identifier (e.g., serial number, timestamp) for the image in the Roboflow dataset. The name should not include file extension. If not provided, a UUID will be generated automatically.
    - metadata (*): Optional key-value metadata to attach to uploaded images. Metadata is stored as user_metadata on the image in Roboflow and can be used for filtering and organization. Values can be static strings, numbers, booleans, or references to workflow inputs/steps.
- output
Example JSON definition of step Roboflow Dataset Upload in version v2

    {
        "name": "<your_step_name_here>",
        "type": "roboflow_core/roboflow_dataset_upload@v2",
        "images": "$inputs.image",
        "target_project": "my_dataset",
        "predictions": "$steps.object_detection_model.predictions",
        "data_percentage": 100,
        "minutely_usage_limit": 10,
        "hourly_usage_limit": 10,
        "daily_usage_limit": 10,
        "usage_quota_name": "quota-for-data-sampling-1",
        "max_image_size": [
            1920,
            1080
        ],
        "compression_level": 95,
        "registration_tags": [
            "location-florida",
            "factory-name",
            "$inputs.dynamic_tag"
        ],
        "persist_predictions": true,
        "disable_sink": true,
        "fire_and_forget": "<block_does_not_provide_example>",
        "labeling_batch_prefix": "my_labeling_batch_name",
        "labeling_batches_recreation_frequency": "never",
        "image_name": "serial_12345",
        "metadata": {
            "camera_id": "cam_01",
            "location": "$inputs.location"
        }
    }

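Two values in the example above need attention before the step is used: fire_and_forget holds a placeholder string rather than a boolean, and disable_sink is true, which per the property description means no uploads occur. The sketch below shows one way to adjust a copy of the step; `finalize_step` is a hypothetical helper and only a subset of the example's keys is reproduced.

```python
import json

# A subset of the example definition, parsed from JSON text.
example_step = json.loads(
    """
    {
        "name": "dataset_upload",
        "type": "roboflow_core/roboflow_dataset_upload@v2",
        "images": "$inputs.image",
        "target_project": "my_dataset",
        "disable_sink": true,
        "fire_and_forget": "<block_does_not_provide_example>"
    }
    """
)


def finalize_step(step: dict, fire_and_forget: bool, disable_sink: bool = False) -> dict:
    """Return a copy of the step with real booleans for both switches."""
    finalized = dict(step)
    finalized["fire_and_forget"] = fire_and_forget
    finalized["disable_sink"] = disable_sink
    return finalized


# Synchronous mode, so upload errors surface while testing.
step = finalize_step(example_step, fire_and_forget=False)
```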
v1
Class: RoboflowDatasetUploadBlockV1 (there are multiple versions of this block)
Source: inference.core.workflows.core_steps.sinks.roboflow.dataset_upload.v1.RoboflowDatasetUploadBlockV1
Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning
Upload images and model predictions to a Roboflow dataset for active learning, model improvement, and data collection, with configurable usage quotas, batch organization, image compression, and optional annotation persistence.
How This Block Works
This block uploads workflow images and predictions to your Roboflow dataset for storage, labeling, and model training. The block:
- Takes images and optional model predictions (object detection, instance segmentation, keypoint detection, or classification) as input
- Validates the Roboflow API key is available (required for uploading)
- Checks usage quotas (minutely, hourly, daily limits) to ensure uploads stay within configured rate limits for active learning strategies
- Prepares images by resizing if they exceed maximum size (maintaining aspect ratio) and compressing to specified quality level
- Generates labeling batch names based on the prefix and batch creation frequency (never, daily, weekly, or monthly), organizing uploaded data into batches
- Optionally persists model predictions as annotations if persist_predictions is enabled, allowing predictions to serve as pre-labels for review and correction
- Attaches registration tags to images for organization and filtering in the Roboflow platform
- Registers the image (and annotations if enabled) to the specified Roboflow project via the Roboflow API
- Executes synchronously or asynchronously based on the fire_and_forget setting, allowing non-blocking uploads for faster workflow execution
- Returns error status and messages indicating upload success or failure
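The resizing step in the list above (shrink only when the image exceeds the maximum size, keeping the aspect ratio) comes down to a single scale factor. The sketch below is an illustration under stated assumptions: `fit_within` is a hypothetical helper, and the block's own rounding may differ.

```python
from typing import Tuple


def fit_within(size: Tuple[int, int], max_size: Tuple[int, int]) -> Tuple[int, int]:
    """Scale (width, height) down to fit inside max_size, keeping the ratio."""
    width, height = size
    max_width, max_height = max_size
    # The tighter of the two constraints decides the scale.
    scale = min(max_width / width, max_height / height)
    if scale >= 1.0:
        return size  # already small enough: never upscale
    return max(1, round(width * scale)), max(1, round(height * scale))


uhd = fit_within((3840, 2160), (1920, 1080))    # same ratio, halves both sides
photo = fit_within((4000, 3000), (1920, 1080))  # height is the limiting side
small = fit_within((640, 480), (1920, 1080))    # left untouched
```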
The block supports active learning workflows by implementing usage quotas that prevent excessive data collection, helping focus on collecting valuable training data within rate limits. Images are organized into labeling batches that can be automatically recreated on a schedule (daily, weekly, monthly), making it easier to manage and review collected data over time. The block can operate in fire-and-forget mode for asynchronous execution, allowing workflows to continue processing without waiting for uploads to complete, or synchronously for debugging and error handling.
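The trade-off between the two execution modes can be sketched with a thread pool. Everything here is hypothetical: `run_upload`, the `(error_status, message)` return shape, and the message strings are assumptions for illustration, not the block's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

executor = ThreadPoolExecutor(max_workers=2)


def run_upload(
    upload: Callable[[], Tuple[bool, str]], fire_and_forget: bool
) -> Tuple[bool, str]:
    """Run `upload` in the background or inline, returning (error_status, message)."""
    if fire_and_forget:
        # The caller gets an immediate answer; the real outcome is never reported back.
        executor.submit(upload)
        return False, "upload scheduled in the background"
    # Synchronous mode: the caller waits and sees the real outcome.
    return upload()


def failing_upload() -> Tuple[bool, str]:
    return True, "simulated failure"


sync_result = run_upload(failing_upload, fire_and_forget=False)
async_result = run_upload(failing_upload, fire_and_forget=True)
executor.shutdown(wait=True)  # let background work finish before exiting
```

The same failing upload is reported as an error in synchronous mode but looks successful in fire-and-forget mode, which is why synchronous execution is the safer choice while debugging.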
Requirements
API Key Required: This block requires a valid Roboflow API key to upload data. The API key must be configured in your environment or workflow configuration. Visit https://docs.roboflow.com/api-reference/authentication#retrieve-an-api-key to learn how to retrieve an API key.
Common Use Cases
- Active Learning Data Collection: Collect images and predictions from production environments where models struggle or are uncertain (e.g., low-confidence detections, edge cases), enabling iterative model improvement by gathering challenging examples for retraining
- Production Data Logging: Continuously upload production inference data to Roboflow datasets for monitoring, analysis, and future model training, creating a growing dataset from real-world deployments
- Pre-Labeled Data Collection: Upload images with model predictions as pre-labels (when persist_predictions is enabled), accelerating annotation workflows by providing initial labels that can be reviewed and corrected rather than starting from scratch
- Stratified Data Sampling: Use rate limiting and quotas to selectively collect data based on specific criteria (e.g., combine with Rate Limiter or Continue If blocks), ensuring diverse and balanced dataset collection without overwhelming storage or annotation resources
- Batch-Based Labeling Workflows: Organize uploaded data into batches with automatic recreation schedules (daily, weekly, monthly), making it easier to manage labeling tasks, track progress, and organize data collection efforts over time
- Tagged Data Organization: Attach metadata tags to uploaded images (e.g., location, camera ID, time period, model version), enabling filtering and organization of collected data in Roboflow for better dataset management and analysis
Connecting to Other Blocks
This block receives data from workflow steps and uploads it to Roboflow:
- After detection or analysis blocks (e.g., Object Detection Model, Instance Segmentation Model, Classification Model, Keypoint Detection Model) to upload images along with their predictions, enabling active learning by collecting inference data with model outputs for annotation and retraining
- After filtering or analytics blocks (e.g., Detections Filter, Continue If, Overlap Filter) to selectively upload only specific types of data (e.g., low-confidence detections, overlapping objects, specific classes), focusing data collection on valuable edge cases or interesting scenarios
- After rate limiter blocks (e.g., Rate Limiter) to throttle upload frequency and stay within usage quotas, ensuring controlled data collection that respects rate limits and prevents excessive storage usage
- Image inputs or preprocessing blocks to upload raw images or processed images (e.g., crops, transformed images) without predictions, enabling collection of image data for future labeling or analysis
- Conditional workflows using flow control blocks (e.g., Continue If) to upload data only when certain conditions are met (e.g., upload only when detection count exceeds threshold, upload only errors or failures), enabling selective data collection based on workflow state
- Batch processing workflows where multiple images or predictions are generated, allowing bulk upload of workflow outputs to Roboflow datasets for organized data collection and management
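As a rough illustration of the first connection pattern above, the sketch below builds a workflow definition in which an object detection step feeds this block. It is a sketch under stated assumptions, not a verified definition: the `build_upload_workflow` helper, the step names, the `yolov8n-640` model ID, the detection step's type identifier, the `message` output name, and the overall `version` / `inputs` / `steps` / `outputs` layout are illustrative assumptions. Only the upload step's `type` value and property names are taken from this page.

```python
import json


def build_upload_workflow(target_project: str) -> dict:
    """Return a workflow definition: detect objects, then upload image + predictions.

    Hypothetical helper for illustration; check the field names against your
    own workflow definition before relying on it.
    """
    return {
        "version": "1.0",  # assumed definition version
        "inputs": [{"type": "WorkflowImage", "name": "image"}],
        "steps": [
            {
                # Assumed identifier and placeholder model ID for the detection step.
                "type": "roboflow_core/roboflow_object_detection_model@v2",
                "name": "detector",
                "images": "$inputs.image",
                "model_id": "yolov8n-640",
            },
            {
                # Identifier and property names as documented on this page.
                "type": "roboflow_core/roboflow_dataset_upload@v1",
                "name": "uploader",
                "image": "$inputs.image",
                "predictions": "$steps.detector.predictions",
                "target_project": target_project,
                "usage_quota_name": "quota-for-data-sampling-1",
                "persist_predictions": True,
                "fire_and_forget": True,
            },
        ],
        "outputs": [
            {
                "type": "JsonField",
                "name": "upload_message",
                # Assumed output name of the upload step.
                "selector": "$steps.uploader.message",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_upload_workflow("my_project"), indent=2))
```

Passing the detector's `predictions` selector into the upload step is what lets the predictions be stored as pre-labels when `persist_predictions` is enabled; dropping that key would correspond to the image-only pattern in the list above.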
Type identifier¶
Use the following identifier in the step "type" field: roboflow_core/roboflow_dataset_upload@v1 to add the block as a step in your workflow.
Properties¶
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Enter a unique identifier for this step. | ❌ |
| target_project | str | Roboflow project identifier where uploaded images and annotations will be saved. Must be a valid project in your Roboflow workspace. The project name can be specified directly or referenced from workflow inputs. | ✅ |
| minutely_usage_limit | int | Maximum number of image uploads allowed per minute for this quota. Part of the usage quota system that enforces rate limits for active learning data collection. Uploads exceeding this limit are skipped to prevent excessive data collection. Works together with hourly_usage_limit and daily_usage_limit to provide multi-level rate limiting. | ❌ |
| hourly_usage_limit | int | Maximum number of image uploads allowed per hour for this quota. Part of the usage quota system that enforces rate limits for active learning data collection. Uploads exceeding this limit are skipped to prevent excessive data collection. Works together with minutely_usage_limit and daily_usage_limit to provide multi-level rate limiting. | ❌ |
| daily_usage_limit | int | Maximum number of image uploads allowed per day for this quota. Part of the usage quota system that enforces rate limits for active learning data collection. Uploads exceeding this limit are skipped to prevent excessive data collection. Works together with minutely_usage_limit and hourly_usage_limit to provide multi-level rate limiting. | ❌ |
| usage_quota_name | str | Unique identifier for tracking usage quotas (minutely, hourly, daily limits). Used internally to manage rate limiting across multiple upload operations. Each unique quota name maintains separate counters, allowing different upload strategies or data collection workflows to have independent rate limits. | ❌ |
| max_image_size | Tuple[int, int] | Maximum dimensions (width, height) for uploaded images. Images exceeding these dimensions are automatically resized while preserving aspect ratio before uploading. Smaller sizes reduce storage and bandwidth but may lose image quality. Use larger sizes (e.g., (1920, 1080)) for high-resolution data collection, or smaller sizes (e.g., (512, 512)) for efficient storage and faster uploads. | ❌ |
| compression_level | int | JPEG compression quality level for uploaded images, ranging from 1 (highest compression, smallest file size, lower quality) to 100 (no compression, largest file size, highest quality). Higher values preserve more image quality but increase storage and bandwidth usage. Typical values range from 70-90 for balanced quality and size. Default of 75 provides good quality with reasonable file sizes. | ❌ |
| registration_tags | List[str] | List of tags to attach to uploaded images for organization and filtering in Roboflow. Tags can be static strings (e.g., 'location-florida', 'camera-1') or dynamic values from workflow inputs. Tags help organize collected data, filter images in Roboflow, and add metadata for dataset management. Can be an empty list if no tags are needed. | ✅ |
| persist_predictions | bool | If True, model predictions are saved as annotations (pre-labels) in the Roboflow dataset alongside images. This enables predictions to serve as starting points for annotation, allowing reviewers to correct or approve labels rather than creating them from scratch. If False, only images are uploaded without annotations. Enabling this accelerates annotation workflows by providing initial labels. | ❌ |
| disable_sink | bool | If True, the block execution is disabled and no uploads occur. This allows temporarily disabling data collection without removing the block from workflows, useful for testing, debugging, or conditional data collection. When disabled, returns a message indicating the sink was disabled. Default is False (uploads enabled). | ✅ |
| fire_and_forget | bool | If True, uploads execute asynchronously (fire-and-forget mode), allowing the workflow to continue immediately without waiting for upload completion. This improves workflow performance but prevents error handling. If False, uploads execute synchronously, blocking workflow execution until completion and allowing proper error handling and status reporting. Use async mode (True) for production workflows where speed is prioritized, and sync mode (False) for debugging or when error handling is critical. | ✅ |
| labeling_batch_prefix | str | Prefix used to generate labeling batch names for organizing uploaded images in Roboflow. Combined with the batch recreation frequency and timestamps to create batch names like 'workflows_data_collector_2024_01_15'. Batches help organize collected data for labeling, making it easier to manage and review uploaded images in groups. Can be customized to match your organization scheme. | ✅ |
| labeling_batches_recreation_frequency | str | Frequency at which new labeling batches are automatically created for uploaded images. Options: 'never' (all images go to the same batch), 'daily' (new batch each day), 'weekly' (new batch each week), 'monthly' (new batch each month). Batch timestamps are appended to the labeling_batch_prefix to create unique batch names. Automatically organizing uploads into time-based batches simplifies dataset management and makes it easier to track and review collected data over time. | ❌ |
| image_name | str | Optional custom name for the uploaded image. If provided, this name will be used instead of an auto-generated UUID. This is useful when you want to preserve the original filename or use a meaningful identifier (e.g., serial number, timestamp) for the image in the Roboflow dataset. The name should not include file extension. If not provided, a UUID will be generated automatically. | ✅ |
The Refs column indicates whether a property can be parametrised with dynamic values available
at workflow runtime. See Bindings for more info.
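To make the `max_image_size` behaviour concrete, the following sketch computes the dimensions an image would have after being scaled down to fit within a (width, height) limit while preserving its aspect ratio. It is only an illustration of the arithmetic, not the block's actual implementation: the `fit_within` helper is hypothetical, and the rounding rule is an assumption.

```python
from typing import Tuple


def fit_within(size: Tuple[int, int], max_size: Tuple[int, int]) -> Tuple[int, int]:
    """Scale (width, height) down so it fits inside max_size, keeping aspect ratio.

    Images that already fit are returned unchanged (no upscaling).
    """
    width, height = size
    max_width, max_height = max_size
    if width <= max_width and height <= max_height:
        return size
    # Use the tighter of the two constraints so both dimensions fit.
    scale = min(max_width / width, max_height / height)
    # Rounding to the nearest pixel is an assumption of this sketch.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within((1920, 1080), (512, 512)))  # (512, 288)
    print(fit_within((300, 200), (512, 512)))    # (300, 200)
```

Note that a 16:9 frame limited to (512, 512) ends up 512 pixels wide but only 288 pixels tall, so a square limit does not produce square uploads.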
Runtime compatibility¶
- requires_internet (air-gapped / offline deployments): This block depends on a service that is not reachable from fully offline / air-gapped deployments.
Available Connections¶
Compatible Blocks
Check what blocks you can connect to Roboflow Dataset Upload in version v1.
- inputs:
MoonshotAI Kimi,Roboflow Asset Library Attributes,Path Deviation,Image Blur,Overlap Filter,Reference Path Visualization,PTZ Tracking (ONVIF),SIFT Comparison,Event Writer,Slack Notification,SAM2 Video Tracker,Halo Visualization,VLM As Classifier,Image Stack,Clip Comparison,Google Gemma,Bounding Rectangle,Object Detection Model,Dot Visualization,Qwen 3.6 API,Label Visualization,Background Color Visualization,Llama 3.2 Vision,Email Notification,SAM 3 Interactive,Velocity,Pixelate Visualization,OpenAI-Compatible LLM,Google Gemini,JSON Parser,Track Class Lock,Anthropic Claude,OpenAI,Trace Visualization,Llama 3.2 Vision,Detection Event Log,ByteTrack Tracker,Clip Comparison,Camera Focus,OpenAI,GLM-OCR,PLC ModbusTCP,Buffer,MQTT Writer,SIFT Comparison,CSV Formatter,Webhook Sink,Image Contours,Motion Detection,Local File Sink,Google Gemini,MoonshotAI Kimi,Polygon Visualization,Dimension Collapse,SIFT,Classification Label Visualization,Multi-Label Classification Model,Instance Segmentation Model,Keypoint Detection Model,Keypoint Visualization,Template Matching,Instance Segmentation Model,Icon Visualization,Seg Preview,Dynamic Crop,Stability AI Inpainting,Bounding Box Visualization,Polygon Zone Visualization,Stability AI Outpainting,Multi-Label Classification Model,Crop Visualization,Detections Transformation,BoT-SORT Tracker,Image Convert Grayscale,Byte Tracker,Mask Visualization,Halo Visualization,Detections Stitch,Detection Offset,SORT Tracker,PLC EthernetIP,Text Display,Morphological Transformation,Anthropic Claude,VLM As Classifier,Roboflow Dataset Upload,VLM As Detector,Detections Consensus,Object Detection Model,Detections Filter,Ellipse Visualization,Detections Merge,Keypoint Detection Model,SAM3 Video Tracker,Time in Zone,SAM 3,Size Measurement,Circle Visualization,Path Deviation,Twilio SMS Notification,Email Notification,S3 Sink,Camera Focus,Identify Changes,Single-Label Classification Model,Byte Tracker,SAM 3,Image Slicer,LMM For Classification,OCR Model,Mask Area Measurement,Heatmap Visualization,Google Gemma API,OpenAI,Stitch Images,Identify Outliers,Time in Zone,Morphological Transformation,Single-Label Classification Model,EasyOCR,YOLO-World Model,Current Time,Blur Visualization,Stitch OCR Detections,Moondream2,Detections List Roll-Up,Florence-2 Model,Google Gemini,Corner Visualization,OpenRouter,Detections Stabilizer,Model Comparison Visualization,SAM 3,Model Monitoring Inference Aggregator,Google Vision OCR,Image Threshold,Byte Tracker,Instance Segmentation Model,Single-Label Classification Model,LMM,Polygon Visualization,Segment Anything 2 Model,Time in Zone,Stability AI Image Generation,Line Counter Visualization,Mask Edge Snap,Line Counter,CogVLM,Relative Static Crop,Qwen3.5-VL,Per-Class Confidence Filter,Grid Visualization,Image Preprocessing,Gaze Detection,Stitch OCR Detections,Anthropic Claude,OPC UA Writer Sink,Color Visualization,Dynamic Zone,Detections Combine,Triangle Visualization,QR Code Generator,Contrast Enhancement,Qwen 3.5 API,Absolute Static Crop,Roboflow Dataset Upload,Background Subtraction,Multi-Label Classification Model,OC-SORT Tracker,OpenAI,Image Slicer,Qwen-VL,Florence-2 Model,Perspective Correction,Twilio SMS/MMS Notification,Roboflow Vision Events,Microsoft SQL Server Sink,Instance Segmentation Model,Depth Estimation,Roboflow Custom Metadata,Contrast Equalization,Camera Calibration,Detections Classes Replacement,VLM As Detector,Keypoint Detection Model,Object Detection Model
- outputs:
Cache Set,Roboflow Asset Library Attributes,MoonshotAI Kimi,Path Deviation,Image Blur,Reference Path Visualization,PTZ Tracking (ONVIF),Event Writer,Slack Notification,Halo Visualization,CLIP Embedding Model,Image Stack,Google Gemma,Qwen 3.6 API,Object Detection Model,Dot Visualization,Label Visualization,Background Color Visualization,Llama 3.2 Vision,Email Notification,SAM 3 Interactive,Pixelate Visualization,OpenAI-Compatible LLM,Google Gemini,Anthropic Claude,Cache Get,OpenAI,Trace Visualization,Llama 3.2 Vision,OpenAI,Clip Comparison,GLM-OCR,MQTT Writer,Webhook Sink,SIFT Comparison,Motion Detection,Local File Sink,Google Gemini,MoonshotAI Kimi,Polygon Visualization,Classification Label Visualization,Multi-Label Classification Model,Instance Segmentation Model,Keypoint Detection Model,Keypoint Visualization,Template Matching,Instance Segmentation Model,Icon Visualization,Seg Preview,Dynamic Crop,Stability AI Inpainting,BoT-SORT Tracker,Bounding Box Visualization,Multi-Label Classification Model,Polygon Zone Visualization,Crop Visualization,Stability AI Outpainting,Mask Visualization,Halo Visualization,Detections Stitch,Distance Measurement,Text Display,Anthropic Claude,Morphological Transformation,Line Counter,Roboflow Dataset Upload,Detections Consensus,Object Detection Model,Ellipse Visualization,Keypoint Detection Model,SAM3 Video Tracker,Time in Zone,SAM 3,Size Measurement,Circle Visualization,Semantic Segmentation Model,Twilio SMS Notification,Path Deviation,Email Notification,S3 Sink,Single-Label Classification Model,SAM 3,LMM For Classification,Heatmap Visualization,Google Gemma API,OpenAI,Time in Zone,Morphological Transformation,Single-Label Classification Model,YOLO-World Model,Current Time,Blur Visualization,Stitch OCR Detections,Moondream2,Florence-2 Model,Google Gemini,Corner Visualization,OpenRouter,Pixel Color Count,Model Comparison Visualization,SAM 3,Model Monitoring Inference Aggregator,Google Vision OCR,Image Threshold,Instance Segmentation Model,Single-Label Classification Model,LMM,Polygon Visualization,Segment Anything 2 Model,Time in Zone,Stability AI Image Generation,Line Counter Visualization,Line Counter,CogVLM,Qwen3.5-VL,Image Preprocessing,Gaze Detection,Stitch OCR Detections,Anthropic Claude,OPC UA Writer Sink,Color Visualization,Dynamic Zone,Triangle Visualization,QR Code Generator,Roboflow Dataset Upload,Qwen 3.5 API,Multi-Label Classification Model,OpenAI,Qwen-VL,Florence-2 Model,Perspective Correction,Roboflow Vision Events,Twilio SMS/MMS Notification,Microsoft SQL Server Sink,Perception Encoder Embedding Model,Instance Segmentation Model,Depth Estimation,Roboflow Custom Metadata,Contrast Equalization,Camera Calibration,Detections Classes Replacement,Keypoint Detection Model,Object Detection Model
Input and Output Bindings¶
The available connections depend on the block's binding kinds. Check which binding kinds
Roboflow Dataset Upload in version v1 has.
Bindings
- input
    - image (image): Image(s) to upload to the Roboflow dataset. Can be a single image or batch of images from workflow inputs or processing steps. Images are resized if they exceed max_image_size and compressed before uploading. Supports batch processing.
    - predictions (Union[classification_prediction, keypoint_detection_prediction, instance_segmentation_prediction, object_detection_prediction]): Optional model predictions to upload alongside images. Predictions are saved as annotations (pre-labels) in the Roboflow dataset when persist_predictions is enabled, allowing predictions to serve as starting points for annotation review and correction. Supports object detection, instance segmentation, keypoint detection, and classification predictions. If None, only images are uploaded.
    - target_project (roboflow_project): Roboflow project identifier where uploaded images and annotations will be saved. Must be a valid project in your Roboflow workspace. The project name can be specified directly or referenced from workflow inputs.
    - registration_tags (Union[string, list_of_values]): List of tags to attach to uploaded images for organization and filtering in Roboflow. Tags can be static strings (e.g., 'location-florida', 'camera-1') or dynamic values from workflow inputs. Tags help organize collected data, filter images in Roboflow, and add metadata for dataset management. Can be an empty list if no tags are needed.
    - disable_sink (boolean): If True, the block execution is disabled and no uploads occur. This allows temporarily disabling data collection without removing the block from workflows, useful for testing, debugging, or conditional data collection. When disabled, returns a message indicating the sink was disabled. Default is False (uploads enabled).
    - fire_and_forget (boolean): If True, uploads execute asynchronously (fire-and-forget mode), allowing the workflow to continue immediately without waiting for upload completion. This improves workflow performance but prevents error handling. If False, uploads execute synchronously, blocking workflow execution until completion and allowing proper error handling and status reporting. Use async mode (True) for production workflows where speed is prioritized, and sync mode (False) for debugging or when error handling is critical.
    - labeling_batch_prefix (string): Prefix used to generate labeling batch names for organizing uploaded images in Roboflow. Combined with the batch recreation frequency and timestamps to create batch names like 'workflows_data_collector_2024_01_15'. Batches help organize collected data for labeling, making it easier to manage and review uploaded images in groups. Can be customized to match your organization scheme.
    - image_name (string): Optional custom name for the uploaded image. If provided, this name will be used instead of an auto-generated UUID. This is useful when you want to preserve the original filename or use a meaningful identifier (e.g., serial number, timestamp) for the image in the Roboflow dataset. The name should not include file extension. If not provided, a UUID will be generated automatically.
- output
Example JSON definition of step Roboflow Dataset Upload in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/roboflow_dataset_upload@v1",
    "image": "$inputs.image",
    "predictions": "$steps.object_detection_model.predictions",
    "target_project": "my_project",
    "minutely_usage_limit": 10,
    "hourly_usage_limit": 10,
    "daily_usage_limit": 10,
    "usage_quota_name": "quota-for-data-sampling-1",
    "max_image_size": [
        512,
        512
    ],
    "compression_level": 75,
    "registration_tags": [
        "location-florida",
        "factory-name",
        "$inputs.dynamic_tag"
    ],
    "persist_predictions": true,
    "disable_sink": true,
    "fire_and_forget": true,
    "labeling_batch_prefix": "my_labeling_batch_name",
    "labeling_batches_recreation_frequency": "never",
    "image_name": "serial_12345"
}
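The example above sets all three usage limits to 10, so the daily limit is the one that ultimately caps uploads. The sketch below shows one way such multi-level quotas could interact. It is a simplified in-memory model under stated assumptions, not the block's real quota tracking: the `UsageQuota` class, its sliding-window counting, and the `try_consume` method are all hypothetical.

```python
import time
from collections import deque
from typing import Deque, Optional


class UsageQuota:
    """In-memory sliding-window limiter with per-minute, per-hour and per-day caps."""

    # Window lengths in seconds.
    WINDOWS = {"minutely": 60, "hourly": 3600, "daily": 86400}

    def __init__(self, minutely: int, hourly: int, daily: int) -> None:
        self.limits = {"minutely": minutely, "hourly": hourly, "daily": daily}
        self.events: Deque[float] = deque()  # timestamps of accepted uploads

    def try_consume(self, now: Optional[float] = None) -> bool:
        """Return True and record the upload if every limit still has room."""
        now = time.time() if now is None else now
        # Forget uploads that fall outside the longest (daily) window.
        while self.events and now - self.events[0] >= self.WINDOWS["daily"]:
            self.events.popleft()
        for name, window in self.WINDOWS.items():
            used = sum(1 for t in self.events if now - t < window)
            if used >= self.limits[name]:
                return False  # this upload would be skipped
        self.events.append(now)
        return True


if __name__ == "__main__":
    quota = UsageQuota(minutely=10, hourly=10, daily=10)
    accepted = sum(quota.try_consume(now=1000.0 + i) for i in range(15))
    print(accepted)  # 10
    # One hour later the minute window is empty, but the daily cap still blocks.
    print(quota.try_consume(now=1000.0 + 3600))  # False
```

With equal limits, raising only `minutely_usage_limit` would change nothing in this model; the hourly and daily values need to be larger for the finer-grained limits to matter.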