Identify Outliers
Class: IdentifyOutliersBlockV1
Source: inference.core.workflows.core_steps.sampling.identify_outliers.v1.IdentifyOutliersBlockV1
Identify outlier embeddings by comparing the current embedding vector against a sliding window of historical embeddings using von Mises-Fisher (vMF) directional statistics. The block flags anomalies, unusual patterns, and deviations from normal behavior for quality control, anomaly detection, and data sampling workflows.
How This Block Works
This block detects outliers by statistically comparing embedding vectors against historical data using directional statistics. The block:
- Receives an embedding vector representing the current data point's features
- Normalizes the embedding to unit length:
    - Converts the embedding to a unit vector (length = 1) for directional analysis
    - Enables comparison using angular/directional statistics rather than distance-based metrics
    - Handles zero vectors gracefully by skipping normalization
- Tracks sample count and warmup status:
    - Increments the sample counter for each processed embedding
    - Determines whether the block is still in the warmup period (samples < warmup)
    - During warmup, no outliers are identified, so that a baseline can be established
- Maintains a sliding window of historical embeddings:
    - Stores normalized embeddings in a buffer that grows up to window_size
    - When the buffer exceeds window_size, removes the oldest embeddings (FIFO)
    - Creates a rolling history of recent data for statistical comparison
- Fits von Mises-Fisher (vMF) distribution parameters once warmup has completed:
    - Mean Direction (mu): Calculates the average direction of all historical embeddings
    - Concentration Parameter (kappa): Measures how tightly clustered the embeddings are around the mean
    - Uses statistical estimation to model the distribution of embedding directions
    - The vMF distribution is suited to directional data on a hypersphere (unit vectors)
- Computes an alignment score for the current embedding:
    - Calculates the dot product between the current normalized embedding and the mean direction vector
    - Measures how well the current embedding aligns with the typical direction
    - Higher values indicate closer alignment to the norm, lower values indicate deviation
- Calculates the empirical percentile of the current embedding:
    - Computes alignment scores for all historical embeddings against the mean direction
    - Ranks the current embedding's alignment score among the historical scores
    - Determines the percentile position (0.0 = lowest, 1.0 = highest) of the current embedding
- Determines outlier status based on percentile thresholds:
    - Flags as outlier if the percentile is below threshold_percentile (e.g., bottom 5%)
    - Flags as outlier if the percentile is above (1 - threshold_percentile) (e.g., top 5%)
    - Detects both extreme low and extreme high deviations from the norm
- Returns three outputs:
    - is_outlier: Boolean flag indicating if the current embedding is an outlier
    - percentile: Float value (0.0-1.0) representing where the embedding ranks among historical data
    - warming_up: Boolean flag indicating whether the block is still in the warmup period (always False after warmup)
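The steps above can be sketched in plain Python. This is an illustrative re-implementation under stated assumptions, not the block's source code: the class name `OutlierSketch`, the strict `<` comparison in the percentile, and the choice to add the current embedding to the window before scoring are all assumptions. The concentration parameter (kappa) is left out because the percentile step, as described, only needs the mean direction.

```python
import math
from collections import deque


class OutlierSketch:
    """Hypothetical sketch of the listed steps; not the block's implementation."""

    def __init__(self, threshold_percentile=0.05, warmup=3, window_size=32):
        self.threshold = threshold_percentile
        self.warmup = warmup
        # deque(maxlen=None) is unbounded, mirroring window_size=None
        self.window = deque(maxlen=window_size)
        self.samples = 0

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(v * v for v in vec))
        # Zero vectors are returned unchanged instead of dividing by zero
        return list(vec) if norm == 0 else [v / norm for v in vec]

    def step(self, embedding):
        unit = self._normalize(embedding)
        self.samples += 1
        # Assumption: the current embedding joins the window before scoring.
        self.window.append(unit)  # the oldest entry is dropped automatically (FIFO)

        if self.samples < self.warmup:
            return {"is_outlier": False, "percentile": 0.5, "warming_up": True}

        # Mean direction: normalized sum of the stored unit vectors
        dims = len(unit)
        mu = self._normalize([sum(v[i] for v in self.window) for i in range(dims)])

        def align(v):
            # Alignment score: dot product with the mean direction
            return sum(a * b for a, b in zip(v, mu))

        scores = [align(v) for v in self.window]
        current = align(unit)

        # Empirical percentile: fraction of stored scores strictly below the current one
        percentile = sum(s < current for s in scores) / len(scores)
        is_outlier = percentile < self.threshold or percentile > 1 - self.threshold
        return {"is_outlier": is_outlier, "percentile": percentile, "warming_up": False}


detector = OutlierSketch(warmup=3, window_size=32)
for vec in ([1.0, 0.0], [0.9, 0.1], [1.0, -0.1], [-1.0, 0.0]):
    print(detector.step(vec))
```

Because the rule is two-sided, the most typical embedding in a window can also land in the top tail and be flagged, which is consistent with the threshold description above.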
The block uses von Mises-Fisher distribution analysis, which is designed for directional data on a hypersphere (unit vectors). This makes it well-suited for high-dimensional embeddings where direction matters more than magnitude. The sliding window approach ensures the statistical model adapts to recent trends, while the percentile-based detection identifies embeddings that are unusually different from the historical pattern. Lower percentiles indicate embeddings that are less aligned with the mean direction of the history, while higher percentiles indicate embeddings that are unusually well aligned with it. Both tails are flagged once they cross the threshold.
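For reference, the von Mises-Fisher density on the unit hypersphere in d dimensions, the usual mean-direction estimate, and one widely used closed-form approximation for the concentration are written out below. The source does not say which estimator the block uses for kappa, so the approximation is shown as a standard option, not as the block's method.

```latex
f(\mathbf{x};\boldsymbol{\mu},\kappa)
  = C_d(\kappa)\,\exp\!\left(\kappa\,\boldsymbol{\mu}^{\top}\mathbf{x}\right),
\qquad \|\mathbf{x}\| = \|\boldsymbol{\mu}\| = 1,\ \kappa \ge 0

\hat{\boldsymbol{\mu}}
  = \frac{\sum_{i=1}^{n}\mathbf{x}_i}{\left\|\sum_{i=1}^{n}\mathbf{x}_i\right\|},
\qquad
\bar{R} = \frac{1}{n}\left\|\sum_{i=1}^{n}\mathbf{x}_i\right\|,
\qquad
\hat{\kappa} \approx \frac{\bar{R}\,\bigl(d-\bar{R}^{2}\bigr)}{1-\bar{R}^{2}}
```

Here the alignment score of an embedding is the dot product of that embedding with the mean direction, i.e. the term inside the exponential, and a larger kappa means the historical embeddings are more tightly clustered around the mean direction.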
Common Use Cases
- Anomaly Detection: Detect unusual images, objects, or patterns that deviate from normal data (e.g., identify unusual product variations, detect anomalous behavior, flag unexpected patterns)
- Quality Control: Identify defective or unusual items in manufacturing or production (e.g., detect product defects, identify quality issues, flag manufacturing anomalies)
- Data Sampling: Identify interesting or unusual data points for manual review or further analysis (e.g., sample unusual images for labeling, identify edge cases for model improvement, select interesting data for analysis)
- Change Detection: Detect when data patterns change significantly from historical norms (e.g., detect scene changes, identify pattern shifts, flag significant variations)
- Model Monitoring: Monitor model performance by detecting when embeddings deviate from the training distribution (e.g., detect distribution shift, identify out-of-distribution data, monitor model drift)
- Content Filtering: Identify unusual or inappropriate content that differs from expected patterns (e.g., detect unusual content, flag inappropriate material, identify content anomalies)
Connecting to Other Blocks
This block receives embeddings and produces is_outlier, percentile, and warming_up outputs:
- After embedding model blocks (CLIP, Perception Encoder, etc.) to analyze embedding outliers (e.g., identify outliers from CLIP embeddings, analyze Perception Encoder outliers, detect anomalies from embeddings)
- After classification or detection blocks with embeddings to identify unusual predictions (e.g., identify unusual detections, flag anomalous classifications, detect outlier predictions)
- Before logic blocks like Continue If to make decisions based on outlier detection (e.g., continue if outlier detected, filter based on outlier status, trigger actions on anomalies)
- Before notification blocks to alert on outlier detection (e.g., alert on anomalies, notify about unusual data, trigger alerts on outliers)
- Before data storage blocks to record outlier information (e.g., log outlier data, store anomaly statistics, record unusual data points)
- In quality control pipelines where outlier detection is part of quality assurance (e.g., filter outliers in quality pipelines, identify issues in production workflows, detect problems in processing chains)
Requirements
This block requires embeddings as input (typically from embedding model blocks like CLIP or Perception Encoder). The block maintains internal state across workflow executions, accumulating a sliding window of historical embeddings. During the warmup period (the first warmup samples), no outliers are identified and the block returns is_outlier=False and percentile=0.5. After warmup, the block uses at least warmup embeddings (and at most window_size embeddings) to establish statistical baselines. The threshold_percentile parameter (0.0-1.0) controls sensitivity: lower values (e.g., 0.01) flag only extreme outliers, while higher values (e.g., 0.1) also flag more moderate deviations. The block works best when all embeddings come from the same embedding model, and threshold_percentile may need tuning to match the expected variation in your data.
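The effect of threshold_percentile can be illustrated with a few made-up percentile values. The helper `flag` below is hypothetical and only restates the two-sided rule from the text. Whether the block uses strict or inclusive comparisons is an assumption here.

```python
def flag(percentile, threshold_percentile):
    """Two-sided rule from the text: flag either tail of the percentile range."""
    return percentile < threshold_percentile or percentile > 1 - threshold_percentile


# Made-up percentile values, purely for illustration
percentiles = [0.0, 0.03, 0.08, 0.5, 0.93, 0.97, 1.0]

strict = [p for p in percentiles if flag(p, 0.01)]  # only the extremes
loose = [p for p in percentiles if flag(p, 0.10)]   # moderate deviations as well

print("threshold 0.01 flags:", strict)
print("threshold 0.10 flags:", loose)
```

With these sample values the 0.01 threshold flags two of the seven entries and the 0.10 threshold flags six, which shows how quickly the flag rate grows as the threshold is raised.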
Type identifier
Use the following identifier in the step "type" field: roboflow_core/identify_outliers@v1 to add the block as a step in your workflow.
Properties
| Name | Type | Description | Refs |
|---|---|---|---|
| name | str | Unique name of step in workflows. | ❌ |
| threshold_percentile | float | Percentile threshold for outlier detection, range 0.0-1.0. Embeddings below this percentile or above (1 - threshold_percentile) are flagged as outliers. Lower values (e.g., 0.01) detect only extreme outliers - very strict. Higher values (e.g., 0.1) detect more moderate deviations - more sensitive. Default 0.05 means bottom 5% and top 5% are outliers. Adjust based on expected variation in your data. | ✅ |
| warmup | int | Number of initial data points required before outlier detection begins. During warmup, no outliers are identified (is_outlier=False) to allow baseline establishment. Must be at least 2 for statistical analysis. Typical range: 3-100 samples. Higher values provide more stable baselines but delay outlier detection. Lower values enable faster detection but may be less accurate initially. | ✅ |
| window_size | int | Maximum number of historical embeddings to maintain in sliding window. The block keeps the most recent window_size embeddings for statistical comparison. When exceeded, oldest embeddings are removed (FIFO). Larger windows provide more stable statistics but adapt slower to distribution changes. Smaller windows adapt faster but may be less stable. Set to None for unlimited window (uses all historical data). Typical range: 10-100 embeddings. | ✅ |
The Refs column marks whether a property can be parametrised with dynamic values available at workflow runtime. See Bindings for more info.
Runtime compatibility
- soft (runtime: hosted_serverless, dedicated_deployment; execution: remote; input: video): Block keeps per-video state in process memory (keyed by video_metadata.video_identifier). With remote step execution on stateless or multi-replica HTTP runtimes, successive requests may be served by different worker processes, so the state resets between calls and the output is meaningless for tracking / counting / aggregation. Use local step execution in an InferencePipeline for stable cross-frame results.
- soft (input: image): Block depends on temporal context from video or repeated-frame workflows. With a still image/photo, there is no meaningful history to track, compare, aggregate, or visualize, so the block provides little or no benefit.
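The first note can be made concrete with a small sketch of in-process, per-video state. The class `PerVideoHistory` and its method are hypothetical. The only detail taken from the text is that state lives in process memory and is keyed by the video identifier.

```python
from collections import defaultdict, deque


class PerVideoHistory:
    """Hypothetical in-memory store: one sliding window per video identifier."""

    def __init__(self, window_size=32):
        self._buffers = defaultdict(lambda: deque(maxlen=window_size))

    def add(self, video_identifier, embedding):
        buffer = self._buffers[video_identifier]
        buffer.append(embedding)
        return len(buffer)  # how much history this process holds for the video


# Two objects stand in for two worker processes behind a load balancer.
worker_a = PerVideoHistory()
worker_b = PerVideoHistory()

worker_a.add("cam-1", [0.1, 0.9])
worker_a.add("cam-1", [0.2, 0.8])
count_on_a = worker_a.add("cam-1", [0.3, 0.7])  # same worker: history accumulates
count_on_b = worker_b.add("cam-1", [0.3, 0.7])  # other worker: history starts over

print(count_on_a, count_on_b)
```

A request routed to a different worker sees an empty history for the same video, so the block is effectively back in warmup. This is the failure mode the compatibility note describes.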
Available Connections
Compatible Blocks
Check what blocks you can connect to Identify Outliers in version v1.
- inputs: Line Counter, Perception Encoder Embedding Model, Line Counter, Distance Measurement, Image Stack, CLIP Embedding Model, Detection Event Log, Clip Comparison, SIFT Comparison, Detections Consensus, Identify Outliers, SIFT Comparison, Perspective Correction, Identify Changes, Image Contours, Template Matching, Pixel Color Count
- outputs: Stability AI Image Generation, Trace Visualization, Image Stack, Per-Class Confidence Filter, Icon Visualization, SIFT Comparison, Color Visualization, Single-Label Classification Model, Perspective Correction, Corner Visualization, Roboflow Custom Metadata, Halo Visualization, Dynamic Zone, Keypoint Detection Model, Email Notification, Halo Visualization, Object Detection Model, Background Color Visualization, Ellipse Visualization, Email Notification, Twilio SMS/MMS Notification, Text Display, Polygon Visualization, Crop Visualization, Template Matching, Model Monitoring Inference Aggregator, Relative Static Crop, Motion Detection, Heatmap Visualization, Blur Visualization, Instance Segmentation Model, Stability AI Outpainting, YOLO-World Model, Keypoint Visualization, Webhook Sink, Byte Tracker, Stitch Images, Detections List Roll-Up, Google Gemini, Slack Notification, Triangle Visualization, Time in Zone, Detections Stabilizer, Multi-Label Classification Model, Keypoint Detection Model, Roboflow Asset Library Attributes, Polygon Zone Visualization, Image Slicer, Time in Zone, Line Counter Visualization, Semantic Segmentation Model, Multi-Label Classification Model, Camera Calibration, ByteTrack Tracker, Microsoft SQL Server Sink, Twilio SMS Notification, Roboflow Vision Events, PTZ Tracking (ONVIF), Stability AI Inpainting, Classification Label Visualization, Event Writer, Mask Visualization, Byte Tracker, Reference Path Visualization, Image Slicer, Label Visualization, Identify Outliers, Byte Tracker, OPC UA Writer Sink, Dot Visualization, Identify Changes, Dynamic Crop, Detections Stitch, Circle Visualization, BoT-SORT Tracker, Gaze Detection, Segment Anything 2 Model, Single-Label Classification Model, Overlap Analysis, Object Detection Model, SAM 3 Interactive, Detections Consensus, Bounding Box Visualization, Multi-Label Classification Model, Instance Segmentation Model, Roboflow Dataset Upload, SAM 3, Instance Segmentation Model, Pixelate Visualization, Keypoint Detection Model, Instance Segmentation Model, Roboflow Dataset Upload, PLC Writer, SORT Tracker, Track Class Lock, Object Detection Model, Time in Zone, MQTT Writer, Polygon Visualization, OC-SORT Tracker, SAM 3, Model Comparison Visualization, Single-Label Classification Model
Input and Output Bindings
The available connections depend on the block's binding kinds. Check which binding kinds Identify Outliers in version v1 has.
Bindings
- input
    - embedding (embedding): Embedding vector representing the current data point's features. Typically from embedding models like CLIP or Perception Encoder. The embedding is normalized to unit length for directional statistical analysis using von Mises-Fisher distribution. Must be a numerical vector of any dimension.
    - threshold_percentile (float_zero_to_one): Percentile threshold for outlier detection, range 0.0-1.0. Embeddings below this percentile or above (1 - threshold_percentile) are flagged as outliers. Lower values (e.g., 0.01) detect only extreme outliers - very strict. Higher values (e.g., 0.1) detect more moderate deviations - more sensitive. Default 0.05 means bottom 5% and top 5% are outliers. Adjust based on expected variation in your data.
    - warmup (integer): Number of initial data points required before outlier detection begins. During warmup, no outliers are identified (is_outlier=False) to allow baseline establishment. Must be at least 2 for statistical analysis. Typical range: 3-100 samples. Higher values provide more stable baselines but delay outlier detection. Lower values enable faster detection but may be less accurate initially.
    - window_size (integer): Maximum number of historical embeddings to maintain in sliding window. The block keeps the most recent window_size embeddings for statistical comparison. When exceeded, oldest embeddings are removed (FIFO). Larger windows provide more stable statistics but adapt slower to distribution changes. Smaller windows adapt faster but may be less stable. Set to None for unlimited window (uses all historical data). Typical range: 10-100 embeddings.
- output
    - is_outlier (boolean): Boolean flag.
    - percentile (float_zero_to_one): float value in range [0.0, 1.0].
    - warming_up (boolean): Boolean flag.
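A downstream consumer will usually want to treat the three outputs together. The sketch below is one possible way to do that. The function `route` and its return labels are hypothetical. The only facts taken from the text are the output names and that percentile is a 0.5 placeholder during warmup.

```python
def route(result):
    """Map the block's three outputs to a hypothetical downstream action."""
    if result["warming_up"]:
        # percentile is a 0.5 placeholder during warmup, so ignore it
        return "skip"
    if result["is_outlier"]:
        # Distinguish the two tails: poorly aligned vs. unusually well aligned
        return "review-low" if result["percentile"] < 0.5 else "review-high"
    return "pass"


examples = [
    {"is_outlier": False, "percentile": 0.5, "warming_up": True},
    {"is_outlier": True, "percentile": 0.02, "warming_up": False},
    {"is_outlier": True, "percentile": 0.98, "warming_up": False},
    {"is_outlier": False, "percentile": 0.40, "warming_up": False},
]
actions = [route(r) for r in examples]
print(actions)
```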
Example JSON definition of step Identify Outliers in version v1
{
    "name": "<your_step_name_here>",
    "type": "roboflow_core/identify_outliers@v1",
    "embedding": "$steps.clip.embedding",
    "threshold_percentile": 0.05,
    "warmup": 3,
    "window_size": 32
}
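When step definitions are generated programmatically, the documented constraints can be checked before the JSON is written. The helper below is a hypothetical convenience function, not part of the inference package. The 0.05 default comes from the property description, while the warmup and window_size defaults simply reuse the example values above.

```python
import json

BLOCK_TYPE = "roboflow_core/identify_outliers@v1"


def identify_outliers_step(name, embedding_selector, threshold_percentile=0.05,
                           warmup=3, window_size=32):
    """Build a step definition dict after checking the documented ranges."""
    if not 0.0 <= threshold_percentile <= 1.0:
        raise ValueError("threshold_percentile must be within 0.0-1.0")
    if warmup < 2:
        raise ValueError("warmup must be at least 2")
    if window_size is not None and window_size < 1:
        # Assumption: a bounded window needs room for at least one embedding
        raise ValueError("window_size must be positive or None")
    return {
        "name": name,
        "type": BLOCK_TYPE,
        "embedding": embedding_selector,
        "threshold_percentile": threshold_percentile,
        "warmup": warmup,
        "window_size": window_size,
    }


step = identify_outliers_step("outliers", "$steps.clip.embedding")
print(json.dumps(step, indent=4))
```

Passing window_size=None serializes to JSON null, which corresponds to the unlimited window described in the properties table.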