Byte Tracker

v3

Class: ByteTrackerBlockV3 (there are multiple versions of this block)

Source: inference.core.workflows.core_steps.transformations.byte_tracker.v3.ByteTrackerBlockV3

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning

Track objects across video frames using the ByteTrack algorithm. The block maintains consistent object identities, handles occlusions and temporary disappearances, associates detections with existing tracks, assigns unique track IDs, and categorizes instances as new or previously seen, enabling object behavior analysis, movement tracking, first-appearance detection, and video analytics workflows.

How This Block Works

This block maintains object tracking across sequential video frames by associating detections from each frame with existing tracks and creating new tracks for new objects, while also categorizing instances based on whether they've been seen before. The block:

  1. Receives detection predictions for the current frame and an image with embedded video metadata
  2. Extracts video metadata from the image (including frame rate and video identifier):
    • Accesses video_metadata from the WorkflowImageData object
    • Extracts fps (frames per second) for tracker configuration
    • Extracts video_identifier to maintain separate tracking state for different videos
    • Handles missing fps gracefully (defaults to 0 and logs a warning instead of failing)
  3. Initializes or retrieves a ByteTrack tracker for the video:
    • Creates a new tracker instance for each unique video (identified by video_identifier)
    • Stores trackers in memory to maintain tracking state across frames
    • Configures the tracker with the frame rate from metadata and user-specified parameters
    • Reuses the existing tracker for subsequent frames of the same video
  4. Initializes or retrieves an instance cache for the video (illustrated in the sketch after this list):
    • Creates a cache to track which track IDs have been seen before
    • Maintains a separate cache for each video using video_identifier
    • Configures the cache size using the instances_cache_size parameter
    • Uses a FIFO (First-In-First-Out) strategy to manage cache capacity
  5. Merges multiple detection batches if provided:
    • Combines detections from multiple sources into a single detection set
    • Ensures all detections are processed together for consistent tracking
  6. Updates tracks using the ByteTrack algorithm:
    • Track Association: Matches current-frame detections to existing tracks using IoU (Intersection over Union) matching
    • Track Activation: Creates new tracks for detections with confidence above track_activation_threshold that don't match existing tracks
    • Track Matching: Associates detections to tracks when IoU exceeds minimum_matching_threshold
    • Track Persistence: Maintains unmatched tracks for up to lost_track_buffer frames to handle temporary occlusions
    • Track Validation: Only outputs tracks that have been present for at least minimum_consecutive_frames consecutive frames
  7. Categorizes tracked instances as new or already seen:
    • For each tracked detection with a track_id, checks the instance cache
    • New Instances: Track IDs not found in the cache are marked as new (first appearance)
    • Already Seen Instances: Track IDs found in the cache are marked as already seen (reappearance)
    • Updates the cache with new track IDs, managing cache size with FIFO eviction
  8. Handles tracking challenges:
    • Occlusions: Maintains tracks when objects are temporarily hidden (using lost_track_buffer frames)
    • Missed Detections: Keeps tracks alive through frames with missing detections
    • False Positives: Filters out tracks that don't persist long enough (minimum_consecutive_frames)
    • Track Fragmentation: Reduces track splits by maintaining a buffer for lost objects
  9. Assigns unique track IDs to each object:
    • Each tracked object receives a consistent track_id that persists across frames
    • Track IDs are assigned when tracks are activated and maintained throughout the video
    • Enables tracking individual objects across the entire video sequence
  10. Returns three sets of tracked detections:
    • tracked_detections: All tracked detections with track IDs (same as v2)
    • new_instances: Detections whose track IDs are appearing for the first time (each track ID appears only once, when first generated)
    • already_seen_instances: Detections whose track IDs have been seen before (these track IDs appear each time the tracker associates them with detections)
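Steps 4 and 7 can be pictured with a short sketch. This is illustrative only, not the block's actual code: trackers, instance_caches, and categorize are hypothetical names for the per-video state the block keeps keyed by video_identifier, and CACHE_SIZE mirrors the instances_cache_size default.

from collections import OrderedDict

trackers = {}          # video_identifier -> ByteTrack tracker instance
instance_caches = {}   # video_identifier -> OrderedDict of seen track IDs

CACHE_SIZE = 16384     # mirrors the instances_cache_size default

def categorize(video_id, tracked_ids):
    """Split the frame's track IDs into first appearances and reappearances."""
    cache = instance_caches.setdefault(video_id, OrderedDict())
    new_ids, seen_ids = [], []
    for track_id in tracked_ids:
        if track_id in cache:
            seen_ids.append(track_id)      # reappearance: ID already cached
        else:
            new_ids.append(track_id)       # first appearance: remember it
            cache[track_id] = True
            if len(cache) > CACHE_SIZE:    # FIFO eviction of the oldest ID
                cache.popitem(last=False)
    return new_ids, seen_ids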

ByteTrack is an efficient multi-object tracking algorithm that performs tracking-by-detection, associating detections across frames without requiring appearance features. It uses a two-stage association strategy: first matching high-confidence detections to tracks, then matching low-confidence detections to remaining tracks and lost tracks. The algorithm maintains a buffer for lost tracks, allowing it to recover tracks when objects temporarily disappear due to occlusions or detection failures.

The instance categorization feature enables detection of first appearances (new objects entering the scene) versus reappearances (objects returning after occlusion or leaving the frame), which is useful for counting, behavior analysis, and event detection.

The configurable parameters allow fine-tuning of tracking behavior: track_activation_threshold controls when new tracks are created (higher = more conservative), lost_track_buffer controls occlusion handling (higher = better occlusion recovery), minimum_matching_threshold controls association quality (higher = stricter matching), minimum_consecutive_frames filters short-lived false tracks (higher = fewer false tracks), and instances_cache_size controls how many track IDs to remember for new/seen categorization (higher = longer memory).
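For intuition, the same parameters map directly onto supervision's standalone ByteTrack implementation, which the inference implementation builds on. A minimal sketch, assuming an arbitrary per-frame detection source (get_detections is a hypothetical placeholder for any model call returning sv.Detections):

import supervision as sv

tracker = sv.ByteTrack(
    track_activation_threshold=0.25,  # confidence needed to start a track
    lost_track_buffer=30,             # frames to keep unmatched tracks alive
    minimum_matching_threshold=0.8,   # IoU required to associate a detection
    minimum_consecutive_frames=1,     # frames before a track is emitted
    frame_rate=30,                    # fps, taken from video metadata
)

for frame in sv.get_video_frames_generator("video.mp4"):
    detections = get_detections(frame)                    # returns sv.Detections
    tracked = tracker.update_with_detections(detections)
    print(tracked.tracker_id)                             # stable IDs across frames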

Common Use Cases

  • Video Analytics: Track objects across video frames for behavior analysis and movement patterns (e.g., track people movement in videos, monitor vehicle paths, analyze object trajectories), enabling video analytics workflows
  • First Appearance Detection: Identify new objects entering the scene for counting and event detection (e.g., detect new people entering area, identify new vehicles appearing, track first-time appearances), enabling new instance detection workflows
  • Traffic Monitoring: Track vehicles and objects in traffic scenes with appearance tracking (e.g., track vehicles across frames, monitor vehicle paths, count unique vehicles with consistent IDs, detect new vehicles entering scene), enabling traffic monitoring workflows
  • Surveillance Systems: Maintain object identities and detect new entries for security monitoring (e.g., track individuals in surveillance footage, detect new people entering area, monitor object movements, maintain object identities), enabling surveillance tracking workflows
  • Retail Analytics: Track customers and products with entry detection for retail insights (e.g., track customer paths, detect new customers entering store, monitor shopping behavior, analyze foot traffic patterns), enabling retail analytics workflows
  • Object Counting: Accurately count unique objects by tracking first appearances (e.g., count unique visitors by tracking new instances, count vehicles entering intersection, track unique object appearances), enabling accurate counting workflows

Connecting to Other Blocks

This block receives an image with video metadata and detection predictions, and produces tracked_detections, new_instances, and already_seen_instances:

  • After object detection, instance segmentation, or keypoint detection blocks to track detected objects across video frames (e.g., track detected objects in video, add track IDs to detections, maintain object identities across frames), enabling detection-to-tracking workflows
  • Using new_instances output to detect and process first appearances (e.g., count new objects, trigger actions on first appearance, detect new entries, initialize tracking for new objects), enabling new instance detection workflows
  • Using already_seen_instances output to process reappearances and returning objects (e.g., handle returning objects, process reappearances, filter for existing objects), enabling reappearance handling workflows
  • Before video analysis blocks that require consistent object identities (e.g., analyze tracked object behavior, process object trajectories, work with tracked object data), enabling tracking-to-analysis workflows
  • Before visualization blocks to display tracked objects with consistent colors or labels (e.g., visualize tracked objects, display track IDs, show object paths, highlight new instances), enabling tracking visualization workflows
  • Before logic blocks like Continue If to make decisions based on track information or instance status (e.g., continue if object is new, filter based on track IDs, make decisions using tracking data, handle new vs returning objects), enabling tracking-based decision workflows

Version Differences

Enhanced from v2:

  • Instance Categorization: Adds two new outputs (new_instances and already_seen_instances) that categorize tracked objects based on whether their track IDs have been seen before, enabling first-appearance detection and reappearance tracking
  • Instance Cache: Introduces an instance cache system that remembers previously seen track IDs across frames, allowing distinction between new objects entering the scene and objects reappearing after occlusion or leaving frame
  • Keypoint Detection Support: Adds support for keypoint detection predictions in addition to object detection and instance segmentation, expanding tracking capabilities to keypoint-based detection models
  • Configurable Cache Size: Adds instances_cache_size parameter to control how many track IDs are remembered in the cache, balancing memory usage with tracking history length
  • Enhanced Outputs: Returns three outputs instead of one - tracked_detections (all tracked objects), new_instances (first appearances), and already_seen_instances (reappearances)

Requirements

This block requires detection predictions (object detection, instance segmentation, or keypoint detection) and an image with embedded video metadata containing frame rate (fps) and video identifier information. The image's video_metadata should include a valid fps value for optimal tracking performance, though the block will continue with fps=0 if missing. The block maintains tracking state and instance cache across frames for each video, so it should be used in video workflows where frames are processed sequentially. For optimal tracking performance, detections should be provided consistently across frames. The algorithm works best with stable detection performance and handles temporary detection gaps through the lost_track_buffer mechanism. The instance cache maintains a history of seen track IDs with FIFO eviction when the cache size limit is reached.
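If you construct frames manually rather than relying on a video source that attaches metadata automatically, the video metadata can be populated along these lines. This is a hedged sketch: the entity and field names below reflect inference's workflow entities at the time of writing, so verify them against your installed version.

from datetime import datetime

import numpy as np

from inference.core.workflows.execution_engine.entities.base import (
    ImageParentMetadata,
    VideoMetadata,
    WorkflowImageData,
)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # stand-in for a real frame

metadata = VideoMetadata(
    video_identifier="camera_1",    # keeps tracker state separate per video
    frame_number=42,
    frame_timestamp=datetime.now(),
    fps=30,                         # used to configure the tracker
    comes_from_video_file=True,
)
image = WorkflowImageData(
    parent_metadata=ImageParentMetadata(parent_id="camera_1"),
    numpy_image=frame,
    video_metadata=metadata,
)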

Type identifier

Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/byte_tracker@v3

Properties

  • name (str): Enter a unique identifier for this step.
  • track_activation_threshold (float): Confidence threshold for activating new tracks from detections. Must be between 0.0 and 1.0. Default is 0.25. Only detections with confidence above this threshold can create new tracks. Increasing this threshold (e.g., 0.3-0.5) improves tracking accuracy and stability by only creating tracks from high-confidence detections, but might miss true detections with lower confidence. Decreasing this threshold (e.g., 0.15-0.2) increases tracking completeness by accepting lower-confidence detections, but risks introducing noise and instability from false positives. Adjust based on detection model performance: use lower values if detections are reliable, higher values if false positives are common.
  • lost_track_buffer (int): Number of frames to maintain a track when it's lost (no matching detections). Must be a positive integer. Default is 30 frames. When an object temporarily disappears (due to occlusion, missed detection, or leaving the frame), the track is maintained for this many frames before being considered lost. Increasing this value (e.g., 50-100) enhances occlusion handling and significantly reduces track fragmentation or disappearance caused by brief detection gaps, but increases memory usage. Decreasing this value (e.g., 10-20) reduces memory usage but may cause tracks to disappear during short occlusions. Adjust based on occlusion frequency: use higher values for frequent occlusions, lower values for stable tracking scenarios.
  • minimum_matching_threshold (float): IoU (Intersection over Union) threshold for matching detections to existing tracks. Must be between 0.0 and 1.0. Default is 0.8. Detections are associated with tracks when their bounding box IoU exceeds this threshold. Increasing this threshold (e.g., 0.85-0.95) improves tracking accuracy by requiring stronger spatial overlap, but risks track fragmentation when objects move quickly or detection boxes vary. Decreasing this threshold (e.g., 0.6-0.75) improves tracking completeness by accepting looser matches, but risks false positive associations and track drift. Adjust based on object movement speed and detection stability: use higher values for stable objects, lower values for fast-moving objects.
  • minimum_consecutive_frames (int): Minimum number of consecutive frames an object must be tracked before the track is considered valid and output. Must be a positive integer. Default is 1 (all tracks are immediately valid). Only tracks that persist for at least this many consecutive frames are included in the output. Increasing this value (e.g., 3-5) prevents the creation of accidental tracks from false detections or double detections, filtering out short-lived spurious tracks, but risks missing shorter legitimate tracks. Keeping the value at 1 includes all tracks immediately, maximizing completeness but potentially including false tracks. Adjust based on false positive rate: use higher values if false detections are common, lower values if detections are reliable.
  • instances_cache_size (int): Maximum number of track IDs to remember in the instance cache for determining if instances are new or already seen. Must be a positive integer. Default is 16384. The cache uses FIFO (First-In-First-Out) eviction: when the cache is full, the oldest track ID is removed to make room for new ones. Increasing this value (e.g., 32768-65536) maintains a longer history of seen track IDs, allowing detection of reappearances after longer gaps, but uses more memory. Decreasing this value (e.g., 8192) reduces memory usage but may lose history of track IDs that appeared earlier, causing reappearing objects to be classified as new. Adjust based on video length and object reappearance patterns: use higher values for long videos or frequent reappearances, lower values for short videos or rare reappearances.

Some of these properties can be parametrised with dynamic values available at workflow runtime; they appear as input bindings below. See Bindings for more info.

Available Connections

Compatible Blocks

Check what blocks you can connect to Byte Tracker in version v3.

Input and Output Bindings

The available connections depend on its binding kinds. Check what binding kinds Byte Tracker in version v3 has.

Bindings
  • input

    • image (image): Input image containing embedded video metadata (fps and video_identifier) required for ByteTrack initialization and tracking state management. The block extracts video_metadata from the WorkflowImageData object. The fps value is used to configure the tracker, and the video_identifier is used to maintain separate tracking state and instance cache for different videos. If fps is missing or invalid, the block defaults to 0 and logs a warning but continues operation. If processing multiple videos, each video should have a unique video_identifier in its metadata to maintain separate tracking states and caches. The block maintains persistent trackers and instance caches across frames for each video using the video_identifier.
    • detections (Union[instance_segmentation_prediction, keypoint_detection_prediction, object_detection_prediction]): Detection predictions (object detection, instance segmentation, or keypoint detection) for the current video frame to be tracked. The block associates these detections with existing tracks or creates new tracks. Detections should be provided for each frame in sequence to maintain consistent tracking. If multiple detection batches are provided, they will be merged before tracking. The detections must include bounding boxes and class names (and keypoints for keypoint detection). After tracking, the output will include the same detections enhanced with track_id information, allowing identification of the same object across frames.
    • track_activation_threshold (float_zero_to_one): Confidence threshold for activating new tracks from detections. Must be between 0.0 and 1.0. Default is 0.25. Only detections with confidence above this threshold can create new tracks. Increasing this threshold (e.g., 0.3-0.5) improves tracking accuracy and stability by only creating tracks from high-confidence detections, but might miss true detections with lower confidence. Decreasing this threshold (e.g., 0.15-0.2) increases tracking completeness by accepting lower-confidence detections, but risks introducing noise and instability from false positives. Adjust based on detection model performance: use lower values if detections are reliable, higher values if false positives are common.
    • lost_track_buffer (integer): Number of frames to maintain a track when it's lost (no matching detections). Must be a positive integer. Default is 30 frames. When an object temporarily disappears (due to occlusion, missed detection, or leaving the frame), the track is maintained for this many frames before being considered lost. Increasing this value (e.g., 50-100) enhances occlusion handling and significantly reduces track fragmentation or disappearance caused by brief detection gaps, but increases memory usage. Decreasing this value (e.g., 10-20) reduces memory usage but may cause tracks to disappear during short occlusions. Adjust based on occlusion frequency: use higher values for frequent occlusions, lower values for stable tracking scenarios.
    • minimum_matching_threshold (float_zero_to_one): IoU (Intersection over Union) threshold for matching detections to existing tracks. Must be between 0.0 and 1.0. Default is 0.8. Detections are associated with tracks when their bounding box IoU exceeds this threshold. Increasing this threshold (e.g., 0.85-0.95) improves tracking accuracy by requiring stronger spatial overlap, but risks track fragmentation when objects move quickly or detection boxes vary. Decreasing this threshold (e.g., 0.6-0.75) improves tracking completeness by accepting looser matches, but risks false positive associations and track drift. Adjust based on object movement speed and detection stability: use higher values for stable objects, lower values for fast-moving objects.
    • minimum_consecutive_frames (integer): Minimum number of consecutive frames an object must be tracked before the track is considered valid and output. Must be a positive integer. Default is 1 (all tracks are immediately valid). Only tracks that persist for at least this many consecutive frames are included in the output. Increasing this value (e.g., 3-5) prevents the creation of accidental tracks from false detections or double detections, filtering out short-lived spurious tracks, but risks missing shorter legitimate tracks. Keeping the value at 1 includes all tracks immediately, maximizing completeness but potentially including false tracks. Adjust based on false positive rate: use higher values if false detections are common, lower values if detections are reliable.
  • output

    • tracked_detections: All tracked detections from the current frame with assigned track IDs
    • new_instances: Tracked detections whose track IDs appear for the first time
    • already_seen_instances: Tracked detections whose track IDs have been seen before

Example JSON definition of step Byte Tracker in version v3
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/byte_tracker@v3",
    "image": "<block_does_not_provide_example>",
    "detections": "$steps.object_detection_model.predictions",
    "track_activation_threshold": 0.25,
    "lost_track_buffer": 30,
    "minimum_matching_threshold": 0.8,
    "minimum_consecutive_frames": 1,
    "instances_cache_size": "<block_does_not_provide_example>"
}
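To see the step in context, here is a hedged end-to-end sketch that runs the workflow on a video with inference's InferencePipeline, which feeds frames sequentially with video metadata attached. The detection step, model_id, output names, and API key are illustrative placeholders:

from inference import InferencePipeline

workflow = {
    "version": "1.0",
    "inputs": [{"type": "InferenceImage", "name": "image"}],
    "steps": [
        {
            # Any detection step works; this model_id is a placeholder.
            "type": "roboflow_core/roboflow_object_detection_model@v2",
            "name": "object_detection_model",
            "image": "$inputs.image",
            "model_id": "yolov8n-640",
        },
        {
            "type": "roboflow_core/byte_tracker@v3",
            "name": "tracker",
            "image": "$inputs.image",
            "detections": "$steps.object_detection_model.predictions",
        },
    ],
    "outputs": [
        {"type": "JsonField", "name": "tracked", "selector": "$steps.tracker.tracked_detections"},
        {"type": "JsonField", "name": "new", "selector": "$steps.tracker.new_instances"},
    ],
}

pipeline = InferencePipeline.init_with_workflow(
    video_reference="video.mp4",
    workflow_specification=workflow,
    api_key="<YOUR_API_KEY>",  # required when hosted models are used
    on_prediction=lambda result, frame: print(result["new"]),
)
pipeline.start()
pipeline.join()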

v2

Class: ByteTrackerBlockV2 (there are multiple versions of this block)

Source: inference.core.workflows.core_steps.transformations.byte_tracker.v2.ByteTrackerBlockV2

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning

Track objects across video frames using the ByteTrack algorithm. The block maintains consistent object identities, handles occlusions and temporary disappearances, associates detections with existing tracks, and assigns unique track IDs, enabling object behavior analysis, movement tracking, and video analytics workflows.

How This Block Works

This block maintains object tracking across sequential video frames by associating detections from each frame with existing tracks and creating new tracks for new objects. The block:

  1. Receives detection predictions for the current frame and an image with embedded video metadata
  2. Extracts video metadata from the image (including frame rate and video identifier):
    • Accesses video_metadata from the WorkflowImageData object
    • Extracts fps (frames per second) for tracker configuration
    • Extracts video_identifier to maintain separate tracking state for different videos
    • Handles missing fps gracefully (defaults to 0 and logs a warning instead of failing)
  3. Initializes or retrieves a ByteTrack tracker for the video:
    • Creates a new tracker instance for each unique video (identified by video_identifier)
    • Stores trackers in memory to maintain tracking state across frames
    • Configures the tracker with the frame rate from metadata and user-specified parameters
    • Reuses the existing tracker for subsequent frames of the same video
  4. Merges multiple detection batches if provided (see the sketch after this list):
    • Combines detections from multiple sources into a single detection set
    • Ensures all detections are processed together for consistent tracking
  5. Updates tracks using the ByteTrack algorithm:
    • Track Association: Matches current-frame detections to existing tracks using IoU (Intersection over Union) matching
    • Track Activation: Creates new tracks for detections with confidence above track_activation_threshold that don't match existing tracks
    • Track Matching: Associates detections to tracks when IoU exceeds minimum_matching_threshold
    • Track Persistence: Maintains unmatched tracks for up to lost_track_buffer frames to handle temporary occlusions
    • Track Validation: Only outputs tracks that have been present for at least minimum_consecutive_frames consecutive frames
  6. Handles tracking challenges:
    • Occlusions: Maintains tracks when objects are temporarily hidden (using lost_track_buffer frames)
    • Missed Detections: Keeps tracks alive through frames with missing detections
    • False Positives: Filters out tracks that don't persist long enough (minimum_consecutive_frames)
    • Track Fragmentation: Reduces track splits by maintaining a buffer for lost objects
  7. Assigns unique track IDs to each object:
    • Each tracked object receives a consistent track_id that persists across frames
    • Track IDs are assigned when tracks are activated and maintained throughout the video
    • Enables tracking individual objects across the entire video sequence
  8. Returns tracked detections with track IDs:
    • Outputs detection predictions enhanced with track_id information
    • Each detection includes its assigned track_id for identifying the same object across frames
    • Maintains all original detection properties (bounding boxes, confidence, class names) plus tracking information
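The batch-merge behavior in step 4 above is analogous to supervision's merge utility. A minimal sketch with stand-in detection batches:

import supervision as sv

tracker = sv.ByteTrack()

# Stand-ins for two detection batches covering the same frame, e.g. from
# two different models; in a real workflow these come from upstream steps.
detections_from_model_a = sv.Detections.empty()
detections_from_model_b = sv.Detections.empty()

# Merge first so all detections are associated with tracks together.
merged = sv.Detections.merge([detections_from_model_a, detections_from_model_b])
tracked = tracker.update_with_detections(merged)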

ByteTrack is an efficient multi-object tracking algorithm that performs tracking-by-detection, associating detections across frames without requiring appearance features. It uses a two-stage association strategy: first matching high-confidence detections to tracks, then matching low-confidence detections to remaining tracks and lost tracks. The algorithm maintains a buffer for lost tracks, allowing it to recover tracks when objects temporarily disappear due to occlusions or detection failures.

The configurable parameters allow fine-tuning of tracking behavior: track_activation_threshold controls when new tracks are created (higher = more conservative), lost_track_buffer controls occlusion handling (higher = better occlusion recovery), minimum_matching_threshold controls association quality (higher = stricter matching), and minimum_consecutive_frames filters short-lived false tracks (higher = fewer false tracks).

Common Use Cases

  • Video Analytics: Track objects across video frames for behavior analysis and movement patterns (e.g., track people movement in videos, monitor vehicle paths, analyze object trajectories), enabling video analytics workflows
  • Traffic Monitoring: Track vehicles and objects in traffic scenes for traffic analysis (e.g., track vehicles across frames, monitor vehicle paths, count vehicles with consistent IDs), enabling traffic monitoring workflows
  • Surveillance Systems: Maintain object identities across video frames for security monitoring (e.g., track individuals in surveillance footage, monitor object movements, maintain object identities), enabling surveillance tracking workflows
  • Sports Analysis: Track players and objects in sports videos for performance analysis (e.g., track player movements, analyze player trajectories, monitor ball positions), enabling sports analysis workflows
  • Retail Analytics: Track customers and products across video frames for retail insights (e.g., track customer paths, monitor shopping behavior, analyze foot traffic patterns), enabling retail analytics workflows
  • Object Behavior Analysis: Track objects to analyze their behavior and interactions over time (e.g., analyze object interactions, study movement patterns, track object relationships), enabling behavior analysis workflows

Connecting to Other Blocks

This block receives an image with video metadata and detection predictions, and produces tracked_detections with track IDs:

  • After object detection or instance segmentation blocks to track detected objects across video frames (e.g., track detected objects in video, add track IDs to detections, maintain object identities across frames), enabling detection-to-tracking workflows
  • Before video analysis blocks that require consistent object identities (e.g., analyze tracked object behavior, process object trajectories, work with tracked object data), enabling tracking-to-analysis workflows
  • Before visualization blocks to display tracked objects with consistent colors or labels (e.g., visualize tracked objects, display track IDs, show object paths), enabling tracking visualization workflows
  • Before logic blocks like Continue If to make decisions based on track information (e.g., continue if object is tracked, filter based on track IDs, make decisions using tracking data), enabling tracking-based decision workflows
  • Before counting or aggregation blocks to count tracked objects accurately (e.g., count unique tracked objects, aggregate track statistics, process track data), enabling tracking-to-counting workflows
  • In video processing pipelines where object tracking is part of a larger video analysis workflow (e.g., track objects in video pipelines, maintain identities in processing chains, enable video analytics), enabling video tracking pipeline workflows

Version Differences

Enhanced from v1:

  • Simplified Input: Uses image input that contains embedded video metadata instead of requiring a separate metadata field, simplifying workflow connections and reducing input complexity
  • Graceful FPS Handling: Handles missing or invalid fps values gracefully by defaulting to 0 and logging a warning instead of raising an error, making the block more resilient to incomplete metadata
  • Improved Integration: Better integration with image-based workflows since video metadata is accessed directly from the image object rather than requiring separate metadata input

Requirements

This block requires detection predictions (object detection or instance segmentation) and an image with embedded video metadata containing frame rate (fps) and video identifier information. The image's video_metadata should include a valid fps value for optimal tracking performance, though the block will continue with fps=0 if missing. The block maintains tracking state across frames for each video, so it should be used in video workflows where frames are processed sequentially. For optimal tracking performance, detections should be provided consistently across frames. The algorithm works best with stable detection performance and handles temporary detection gaps through the lost_track_buffer mechanism.

Type identifier

Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/byte_tracker@v2

Properties

  • name (str): Enter a unique identifier for this step.
  • track_activation_threshold (float): Confidence threshold for activating new tracks from detections. Must be between 0.0 and 1.0. Default is 0.25. Only detections with confidence above this threshold can create new tracks. Increasing this threshold (e.g., 0.3-0.5) improves tracking accuracy and stability by only creating tracks from high-confidence detections, but might miss true detections with lower confidence. Decreasing this threshold (e.g., 0.15-0.2) increases tracking completeness by accepting lower-confidence detections, but risks introducing noise and instability from false positives. Adjust based on detection model performance: use lower values if detections are reliable, higher values if false positives are common.
  • lost_track_buffer (int): Number of frames to maintain a track when it's lost (no matching detections). Must be a positive integer. Default is 30 frames. When an object temporarily disappears (due to occlusion, missed detection, or leaving the frame), the track is maintained for this many frames before being considered lost. Increasing this value (e.g., 50-100) enhances occlusion handling and significantly reduces track fragmentation or disappearance caused by brief detection gaps, but increases memory usage. Decreasing this value (e.g., 10-20) reduces memory usage but may cause tracks to disappear during short occlusions. Adjust based on occlusion frequency: use higher values for frequent occlusions, lower values for stable tracking scenarios.
  • minimum_matching_threshold (float): IoU (Intersection over Union) threshold for matching detections to existing tracks. Must be between 0.0 and 1.0. Default is 0.8. Detections are associated with tracks when their bounding box IoU exceeds this threshold. Increasing this threshold (e.g., 0.85-0.95) improves tracking accuracy by requiring stronger spatial overlap, but risks track fragmentation when objects move quickly or detection boxes vary. Decreasing this threshold (e.g., 0.6-0.75) improves tracking completeness by accepting looser matches, but risks false positive associations and track drift. Adjust based on object movement speed and detection stability: use higher values for stable objects, lower values for fast-moving objects.
  • minimum_consecutive_frames (int): Minimum number of consecutive frames an object must be tracked before the track is considered valid and output. Must be a positive integer. Default is 1 (all tracks are immediately valid). Only tracks that persist for at least this many consecutive frames are included in the output. Increasing this value (e.g., 3-5) prevents the creation of accidental tracks from false detections or double detections, filtering out short-lived spurious tracks, but risks missing shorter legitimate tracks. Keeping the value at 1 includes all tracks immediately, maximizing completeness but potentially including false tracks. Adjust based on false positive rate: use higher values if false detections are common, lower values if detections are reliable.

Some of these properties can be parametrised with dynamic values available at workflow runtime; they appear as input bindings below. See Bindings for more info.

Available Connections

Compatible Blocks

Check what blocks you can connect to Byte Tracker in version v2.

Input and Output Bindings

The available connections depend on its binding kinds. Check what binding kinds Byte Tracker in version v2 has.

Bindings
  • input

    • image (image): Input image containing embedded video metadata (fps and video_identifier) required for ByteTrack initialization and tracking state management. The block extracts video_metadata from the WorkflowImageData object. The fps value is used to configure the tracker, and the video_identifier is used to maintain separate tracking state for different videos. If fps is missing or invalid, the block defaults to 0 and logs a warning but continues operation. If processing multiple videos, each video should have a unique video_identifier in its metadata to maintain separate tracking states. The block maintains persistent trackers across frames for each video using the video_identifier. This version simplifies input by embedding metadata in the image object rather than requiring a separate metadata field.
    • detections (Union[instance_segmentation_prediction, object_detection_prediction]): Detection predictions (object detection or instance segmentation) for the current video frame to be tracked. The block associates these detections with existing tracks or creates new tracks. Detections should be provided for each frame in sequence to maintain consistent tracking. If multiple detection batches are provided, they will be merged before tracking. The detections must include bounding boxes and class names. After tracking, the output will include the same detections enhanced with track_id information, allowing identification of the same object across frames.
    • track_activation_threshold (float_zero_to_one): Confidence threshold for activating new tracks from detections. Must be between 0.0 and 1.0. Default is 0.25. Only detections with confidence above this threshold can create new tracks. Increasing this threshold (e.g., 0.3-0.5) improves tracking accuracy and stability by only creating tracks from high-confidence detections, but might miss true detections with lower confidence. Decreasing this threshold (e.g., 0.15-0.2) increases tracking completeness by accepting lower-confidence detections, but risks introducing noise and instability from false positives. Adjust based on detection model performance: use lower values if detections are reliable, higher values if false positives are common.
    • lost_track_buffer (integer): Number of frames to maintain a track when it's lost (no matching detections). Must be a positive integer. Default is 30 frames. When an object temporarily disappears (due to occlusion, missed detection, or leaving the frame), the track is maintained for this many frames before being considered lost. Increasing this value (e.g., 50-100) enhances occlusion handling and significantly reduces track fragmentation or disappearance caused by brief detection gaps, but increases memory usage. Decreasing this value (e.g., 10-20) reduces memory usage but may cause tracks to disappear during short occlusions. Adjust based on occlusion frequency: use higher values for frequent occlusions, lower values for stable tracking scenarios.
    • minimum_matching_threshold (float_zero_to_one): IoU (Intersection over Union) threshold for matching detections to existing tracks. Must be between 0.0 and 1.0. Default is 0.8. Detections are associated with tracks when their bounding box IoU exceeds this threshold. Increasing this threshold (e.g., 0.85-0.95) improves tracking accuracy by requiring stronger spatial overlap, but risks track fragmentation when objects move quickly or detection boxes vary. Decreasing this threshold (e.g., 0.6-0.75) improves tracking completeness by accepting looser matches, but risks false positive associations and track drift. Adjust based on object movement speed and detection stability: use higher values for stable objects, lower values for fast-moving objects.
    • minimum_consecutive_frames (integer): Minimum number of consecutive frames an object must be tracked before the track is considered valid and output. Must be a positive integer. Default is 1 (all tracks are immediately valid). Only tracks that persist for at least this many consecutive frames are included in the output. Increasing this value (e.g., 3-5) prevents the creation of accidental tracks from false detections or double detections, filtering out short-lived spurious tracks, but risks missing shorter legitimate tracks. Keeping the value at 1 includes all tracks immediately, maximizing completeness but potentially including false tracks. Adjust based on false positive rate: use higher values if false detections are common, lower values if detections are reliable.
  • output

    • tracked_detections: All tracked detections from the current frame with assigned track IDs

Example JSON definition of step Byte Tracker in version v2
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/byte_tracker@v2",
    "image": "<block_does_not_provide_example>",
    "detections": "$steps.object_detection_model.predictions",
    "track_activation_threshold": 0.25,
    "lost_track_buffer": 30,
    "minimum_matching_threshold": 0.8,
    "minimum_consecutive_frames": 1
}

v1

Class: ByteTrackerBlockV1 (there are multiple versions of this block)

Source: inference.core.workflows.core_steps.transformations.byte_tracker.v1.ByteTrackerBlockV1

Warning: This block has multiple versions. Please refer to the specific version for details. You can learn more about how versions work here: Versioning

Track objects across video frames using the ByteTrack algorithm. The block maintains consistent object identities, handles occlusions and temporary disappearances, associates detections with existing tracks, and assigns unique track IDs, enabling object behavior analysis, movement tracking, and video analytics workflows.

How This Block Works

This block maintains object tracking across sequential video frames by associating detections from each frame with existing tracks and creating new tracks for new objects. The block:

  1. Receives detection predictions for the current frame and video metadata (including frame rate and video identifier)
  2. Initializes or retrieves a ByteTrack tracker for the video:
    • Creates a new tracker instance for each unique video (identified by video_identifier)
    • Stores trackers in memory to maintain tracking state across frames
    • Configures the tracker with the frame rate from metadata and user-specified parameters
    • Reuses the existing tracker for subsequent frames of the same video
  3. Merges multiple detection batches if provided:
    • Combines detections from multiple sources into a single detection set
    • Ensures all detections are processed together for consistent tracking
  4. Updates tracks using the ByteTrack algorithm:
    • Track Association: Matches current-frame detections to existing tracks using IoU (Intersection over Union) matching
    • Track Activation: Creates new tracks for detections with confidence above track_activation_threshold that don't match existing tracks
    • Track Matching: Associates detections to tracks when IoU exceeds minimum_matching_threshold
    • Track Persistence: Maintains unmatched tracks for up to lost_track_buffer frames to handle temporary occlusions
    • Track Validation: Only outputs tracks that have been present for at least minimum_consecutive_frames consecutive frames
  5. Handles tracking challenges:
    • Occlusions: Maintains tracks when objects are temporarily hidden (using lost_track_buffer frames)
    • Missed Detections: Keeps tracks alive through frames with missing detections
    • False Positives: Filters out tracks that don't persist long enough (minimum_consecutive_frames)
    • Track Fragmentation: Reduces track splits by maintaining a buffer for lost objects
  6. Assigns unique track IDs to each object:
    • Each tracked object receives a consistent track_id that persists across frames
    • Track IDs are assigned when tracks are activated and maintained throughout the video
    • Enables tracking individual objects across the entire video sequence
  7. Returns tracked detections with track IDs:
    • Outputs detection predictions enhanced with track_id information
    • Each detection includes its assigned track_id for identifying the same object across frames
    • Maintains all original detection properties (bounding boxes, confidence, class names) plus tracking information

ByteTrack is an efficient multi-object tracking algorithm that performs tracking-by-detection, associating detections across frames without requiring appearance features. It uses a two-stage association strategy: first matching high-confidence detections to tracks, then matching low-confidence detections to remaining tracks and lost tracks. The algorithm maintains a buffer for lost tracks, allowing it to recover tracks when objects temporarily disappear due to occlusions or detection failures.

The configurable parameters allow fine-tuning of tracking behavior: track_activation_threshold controls when new tracks are created (higher = more conservative), lost_track_buffer controls occlusion handling (higher = better occlusion recovery), minimum_matching_threshold controls association quality (higher = stricter matching), and minimum_consecutive_frames filters short-lived false tracks (higher = fewer false tracks).
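To make the two-stage association concrete, here is a deliberately simplified sketch. It is not ByteTrack's actual implementation: greedy IoU matching stands in for the Hungarian assignment used in practice, and the Kalman-filter motion prediction that ByteTrack applies before matching is omitted.

import numpy as np

def iou(a, b):
    """IoU of two xyxy boxes given as np.ndarray([x1, y1, x2, y2])."""
    lt = np.maximum(a[:2], b[:2])
    rb = np.minimum(a[2:], b[2:])
    inter = np.prod(np.clip(rb - lt, 0, None))
    union = np.prod(a[2:] - a[:2]) + np.prod(b[2:] - b[:2]) - inter
    return inter / (union + 1e-9)

def greedy_match(tracks, detections, iou_threshold):
    """Greedily pair each track with the best remaining detection."""
    matches, unmatched = [], list(range(len(detections)))
    for t_idx, track in enumerate(tracks):
        if not unmatched:
            break
        best = max(unmatched, key=lambda d: iou(track, detections[d]))
        if iou(track, detections[best]) >= iou_threshold:
            matches.append((t_idx, best))
            unmatched.remove(best)
    return matches

def associate(tracks, boxes, scores, high_conf=0.25, match_iou=0.8):
    """Stage 1: match high-confidence detections to tracks. Stage 2: offer
    the low-confidence remainder to still-unmatched tracks, which is what
    lets BYTE keep tracks alive through weak detections."""
    high = [b for b, s in zip(boxes, scores) if s >= high_conf]
    low = [b for b, s in zip(boxes, scores) if s < high_conf]
    first = greedy_match(tracks, high, match_iou)
    matched_tracks = {t for t, _ in first}
    leftover = [t for i, t in enumerate(tracks) if i not in matched_tracks]
    second = greedy_match(leftover, low, match_iou)  # indices into `leftover`
    return first, second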

Common Use Cases

  • Video Analytics: Track objects across video frames for behavior analysis and movement patterns (e.g., track people movement in videos, monitor vehicle paths, analyze object trajectories), enabling video analytics workflows
  • Traffic Monitoring: Track vehicles and objects in traffic scenes for traffic analysis (e.g., track vehicles across frames, monitor vehicle paths, count vehicles with consistent IDs), enabling traffic monitoring workflows
  • Surveillance Systems: Maintain object identities across video frames for security monitoring (e.g., track individuals in surveillance footage, monitor object movements, maintain object identities), enabling surveillance tracking workflows
  • Sports Analysis: Track players and objects in sports videos for performance analysis (e.g., track player movements, analyze player trajectories, monitor ball positions), enabling sports analysis workflows
  • Retail Analytics: Track customers and products across video frames for retail insights (e.g., track customer paths, monitor shopping behavior, analyze foot traffic patterns), enabling retail analytics workflows
  • Object Behavior Analysis: Track objects to analyze their behavior and interactions over time (e.g., analyze object interactions, study movement patterns, track object relationships), enabling behavior analysis workflows

Connecting to Other Blocks

This block receives detection predictions and video metadata, and produces tracked_detections with track IDs:

  • After object detection or instance segmentation blocks to track detected objects across video frames (e.g., track detected objects in video, add track IDs to detections, maintain object identities across frames), enabling detection-to-tracking workflows
  • Before video analysis blocks that require consistent object identities (e.g., analyze tracked object behavior, process object trajectories, work with tracked object data), enabling tracking-to-analysis workflows
  • Before visualization blocks to display tracked objects with consistent colors or labels (e.g., visualize tracked objects, display track IDs, show object paths), enabling tracking visualization workflows
  • Before logic blocks like Continue If to make decisions based on track information (e.g., continue if object is tracked, filter based on track IDs, make decisions using tracking data), enabling tracking-based decision workflows
  • Before counting or aggregation blocks to count tracked objects accurately (e.g., count unique tracked objects, aggregate track statistics, process track data), enabling tracking-to-counting workflows
  • In video processing pipelines where object tracking is part of a larger video analysis workflow (e.g., track objects in video pipelines, maintain identities in processing chains, enable video analytics), enabling video tracking pipeline workflows

Requirements

This block requires detection predictions (object detection or instance segmentation) and video metadata with frame rate (fps) information. The video metadata must include a valid fps value for ByteTrack initialization. The block maintains tracking state across frames for each video, so it should be used in video workflows where frames are processed sequentially. For optimal tracking performance, detections should be provided consistently across frames. The algorithm works best with stable detection performance and handles temporary detection gaps through the lost_track_buffer mechanism.

Type identifier

Use the following identifier in the step "type" field to add the block as a step in your workflow: roboflow_core/byte_tracker@v1

Properties

  • name (str): Enter a unique identifier for this step.
  • track_activation_threshold (float): Confidence threshold for activating new tracks from detections. Must be between 0.0 and 1.0. Default is 0.25. Only detections with confidence above this threshold can create new tracks. Increasing this threshold (e.g., 0.3-0.5) improves tracking accuracy and stability by only creating tracks from high-confidence detections, but might miss true detections with lower confidence. Decreasing this threshold (e.g., 0.15-0.2) increases tracking completeness by accepting lower-confidence detections, but risks introducing noise and instability from false positives. Adjust based on detection model performance: use lower values if detections are reliable, higher values if false positives are common.
  • lost_track_buffer (int): Number of frames to maintain a track when it's lost (no matching detections). Must be a positive integer. Default is 30 frames. When an object temporarily disappears (due to occlusion, missed detection, or leaving the frame), the track is maintained for this many frames before being considered lost. Increasing this value (e.g., 50-100) enhances occlusion handling and significantly reduces track fragmentation or disappearance caused by brief detection gaps, but increases memory usage. Decreasing this value (e.g., 10-20) reduces memory usage but may cause tracks to disappear during short occlusions. Adjust based on occlusion frequency: use higher values for frequent occlusions, lower values for stable tracking scenarios.
  • minimum_matching_threshold (float): IoU (Intersection over Union) threshold for matching detections to existing tracks. Must be between 0.0 and 1.0. Default is 0.8. Detections are associated with tracks when their bounding box IoU exceeds this threshold. Increasing this threshold (e.g., 0.85-0.95) improves tracking accuracy by requiring stronger spatial overlap, but risks track fragmentation when objects move quickly or detection boxes vary. Decreasing this threshold (e.g., 0.6-0.75) improves tracking completeness by accepting looser matches, but risks false positive associations and track drift. Adjust based on object movement speed and detection stability: use higher values for stable objects, lower values for fast-moving objects.
  • minimum_consecutive_frames (int): Minimum number of consecutive frames an object must be tracked before the track is considered valid and output. Must be a positive integer. Default is 1 (all tracks are immediately valid). Only tracks that persist for at least this many consecutive frames are included in the output. Increasing this value (e.g., 3-5) prevents the creation of accidental tracks from false detections or double detections, filtering out short-lived spurious tracks, but risks missing shorter legitimate tracks. Keeping the value at 1 includes all tracks immediately, maximizing completeness but potentially including false tracks. Adjust based on false positive rate: use higher values if false detections are common, lower values if detections are reliable.

Some of these properties can be parametrised with dynamic values available at workflow runtime; they appear as input bindings below. See Bindings for more info.

Available Connections

Compatible Blocks

Check what blocks you can connect to Byte Tracker in version v1.

Input and Output Bindings

The available connections depend on its binding kinds. Check what binding kinds Byte Tracker in version v1 has.

Bindings
  • input

    • metadata (video_metadata): Video metadata containing frame rate (fps) and video identifier information required for ByteTrack initialization and tracking state management. The fps value is used to configure the tracker, and the video_identifier is used to maintain separate tracking state for different videos. The metadata must include valid fps information, as ByteTrack requires the frame rate to initialize. If processing multiple videos, each video's metadata should have a unique video_identifier to maintain separate tracking states. The block maintains persistent trackers across frames for each video using the video_identifier.
    • detections (Union[instance_segmentation_prediction, object_detection_prediction]): Detection predictions (object detection or instance segmentation) for the current video frame to be tracked. The block associates these detections with existing tracks or creates new tracks. Detections should be provided for each frame in sequence to maintain consistent tracking. If multiple detection batches are provided, they will be merged before tracking. The detections must include bounding boxes and class names. After tracking, the output will include the same detections enhanced with track_id information, allowing identification of the same object across frames.
    • track_activation_threshold (float_zero_to_one): Confidence threshold for activating new tracks from detections. Must be between 0.0 and 1.0. Default is 0.25. Only detections with confidence above this threshold can create new tracks. Increasing this threshold (e.g., 0.3-0.5) improves tracking accuracy and stability by only creating tracks from high-confidence detections, but might miss true detections with lower confidence. Decreasing this threshold (e.g., 0.15-0.2) increases tracking completeness by accepting lower-confidence detections, but risks introducing noise and instability from false positives. Adjust based on detection model performance: use lower values if detections are reliable, higher values if false positives are common.
    • lost_track_buffer (integer): Number of frames to maintain a track when it's lost (no matching detections). Must be a positive integer. Default is 30 frames. When an object temporarily disappears (due to occlusion, missed detection, or leaving the frame), the track is maintained for this many frames before being considered lost. Increasing this value (e.g., 50-100) enhances occlusion handling and significantly reduces track fragmentation or disappearance caused by brief detection gaps, but increases memory usage. Decreasing this value (e.g., 10-20) reduces memory usage but may cause tracks to disappear during short occlusions. Adjust based on occlusion frequency: use higher values for frequent occlusions, lower values for stable tracking scenarios.
    • minimum_matching_threshold (float_zero_to_one): IoU (Intersection over Union) threshold for matching detections to existing tracks. Must be between 0.0 and 1.0. Default is 0.8. Detections are associated with tracks when their bounding box IoU exceeds this threshold. Increasing this threshold (e.g., 0.85-0.95) improves tracking accuracy by requiring stronger spatial overlap, but risks track fragmentation when objects move quickly or detection boxes vary. Decreasing this threshold (e.g., 0.6-0.75) improves tracking completeness by accepting looser matches, but risks false positive associations and track drift. Adjust based on object movement speed and detection stability: use higher values for stable objects, lower values for fast-moving objects.
    • minimum_consecutive_frames (integer): Minimum number of consecutive frames an object must be tracked before the track is considered valid and output. Must be a positive integer. Default is 1 (all tracks are immediately valid). Only tracks that persist for at least this many consecutive frames are included in the output. Increasing this value (e.g., 3-5) prevents the creation of accidental tracks from false detections or double detections, filtering out short-lived spurious tracks, but risks missing shorter legitimate tracks. Keeping the value at 1 includes all tracks immediately, maximizing completeness but potentially including false tracks. Adjust based on false positive rate: use higher values if false detections are common, lower values if detections are reliable.
  • output

    • tracked_detections: All tracked detections from the current frame with assigned track IDs

Example JSON definition of step Byte Tracker in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/byte_tracker@v1",
    "metadata": "<block_does_not_provide_example>",
    "detections": "$steps.object_detection_model.predictions",
    "track_activation_threshold": 0.25,
    "lost_track_buffer": 30,
    "minimum_matching_threshold": 0.8,
    "minimum_consecutive_frames": 1
}