Statistical Approach for Outlier Detection

I developed a simple yet effective algorithm for identifying outliers in a daily-updated database containing thousands of historical item-level records. Without resorting to machine learning or deep learning, the method iterates over the accounting periods, compares each item's cost with the median of its own history, and sorts the results by financial variation in descending order to highlight critical outliers. I then plot the most relevant transaction trends. This solution has become one of the most useful and easily implementable tools in my weekly routine. For this article, I manually created a fictional dataset to simulate the results.

Code Description

I used Google Colab to load an Excel file. The code lets me upload the dataset directly through the notebook interface and displays the name of the selected file.
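A minimal sketch of this step, not the exact code from Figure 1. It assumes the notebook runs in Google Colab, where `files.upload()` returns a dictionary mapping each file name to its bytes, and that pandas (with an Excel engine such as openpyxl) is available. The helper names `first_uploaded_file` and `load_from_colab` are hypothetical.

```python
import io


def first_uploaded_file(uploaded):
    """Return (file name, in-memory buffer) for the first uploaded file.

    `uploaded` is the dict returned by google.colab's files.upload():
    file name -> raw bytes.
    """
    if not uploaded:
        raise ValueError("No file was uploaded.")
    name = next(iter(uploaded))
    return name, io.BytesIO(uploaded[name])


def load_from_colab():
    """Open the Colab upload widget and read the chosen Excel file.

    Only works inside Google Colab; the imports are kept local so the
    helper above can still be used elsewhere.
    """
    from google.colab import files  # available only in Colab
    import pandas as pd  # assumption: pandas is used to read the file

    uploaded = files.upload()
    name, buffer = first_uploaded_file(uploaded)
    print(f"Selected file: {name}")
    return pd.read_excel(buffer)
```

In a Colab cell, `df = load_from_colab()` would open the file picker and return the sheet as a DataFrame.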

Figure 1. File import process.

Next, I sorted the accounting periods in chronological order. For each month, I calculated the median cost per item using only data from earlier periods, then compared these values with the current month's costs. I kept only items with costs above the historical median, computed the absolute variation (current cost minus historical median), and finally sorted the results in descending order to highlight the top 30 deviations, creating a prioritized list of financial outliers.
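The steps above can be sketched in plain Python with only the standard library. This illustrates the logic rather than reproducing the code in Figure 2. It assumes each record is a `(period, item, cost)` tuple, that periods are strings such as `"2024-03"` that sort chronologically, and that several records for the same item in one period are summed. The function name `find_outliers` and the output keys are hypothetical.

```python
from collections import defaultdict
from statistics import median


def find_outliers(records, top_n=30):
    """Rank item costs that exceed the median of their own history."""
    # Total cost per item within each period (assumption: duplicates are summed).
    by_period = defaultdict(dict)
    for period, item, cost in records:
        by_period[period][item] = by_period[period].get(item, 0.0) + cost

    history = defaultdict(list)  # item -> costs from earlier periods only
    results = []
    for period in sorted(by_period):  # chronological order
        current = by_period[period]
        for item, cost in current.items():
            past = history[item]
            if not past:
                continue  # no history yet, nothing to compare against
            hist_median = median(past)
            if cost > hist_median:  # keep only costs above the median
                results.append({
                    "period": period,
                    "item": item,
                    "cost": cost,
                    "historical_median": hist_median,
                    "variation": cost - hist_median,
                })
        # Add the current period to the history only after comparing,
        # so the median never includes the value being tested.
        for item, cost in current.items():
            history[item].append(cost)

    results.sort(key=lambda r: r["variation"], reverse=True)
    return results[:top_n]


# Fictional data for illustration only.
records = [
    ("2024-01", "A", 100), ("2024-02", "A", 110),
    ("2024-03", "A", 105), ("2024-04", "A", 400),
    ("2024-01", "B", 50), ("2024-02", "B", 52),
    ("2024-03", "B", 51), ("2024-04", "B", 55),
]
top = find_outliers(records)
print(top[0])  # item A in 2024-04: cost 400 against a historical median of 105
```

Updating the history after the comparison is the key design choice here: a spike cannot inflate the baseline it is measured against, although it does enter the history for later periods.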

Figure 2. Iterative Median-Based Outlier Detection Algorithm.

Finally, for each of the top 10 outliers, I plotted a time-series chart comparing actual monthly costs against the expanding historical median (recalculated as each new period is added). I used solid lines for observed costs and dashed lines for median values. This visualization shows when and by how much costs exceeded historical patterns, revealing either seasonal trends or abrupt spikes.
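A possible sketch of this chart for a single item, not the code shown in Figure 3. It assumes the median at each period is computed from earlier periods only (so the first period has no median), and that matplotlib is installed. The names `expanding_median` and `plot_item` are hypothetical.

```python
from statistics import median


def expanding_median(costs):
    """Median of all earlier values for each position; NaN for the first."""
    result = []
    for i in range(len(costs)):
        # costs[:i] holds only the periods before position i.
        result.append(median(costs[:i]) if i else float("nan"))
    return result


def plot_item(periods, costs, item_name):
    """Plot observed costs (solid) against the historical median (dashed)."""
    import matplotlib.pyplot as plt  # imported here so the helper above has no dependency

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(periods, costs, linestyle="-", marker="o", label="Actual cost")
    ax.plot(periods, expanding_median(costs), linestyle="--",
            label="Historical median")
    ax.set_title(f"Item {item_name}")
    ax.set_xlabel("Accounting period")
    ax.set_ylabel("Cost")
    ax.tick_params(axis="x", rotation=45)
    ax.legend()
    fig.tight_layout()
    return fig


# Fictional series for illustration only.
periods = ["2024-01", "2024-02", "2024-03", "2024-04"]
costs = [100, 110, 105, 400]
medians = expanding_median(costs)
```

Calling `plot_item(periods, costs, "A")` in a notebook cell would draw the chart. The dashed line stays near 105 while the solid line jumps to 400 in the last period.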

Figure 3. Time series visualization of cost outliers.

The four main time series illustrate the algorithm's effectiveness: each plot shows outliers as sharp deviations from the historical median, highlighting both isolated spikes and anomalous trends. Seeing the problem areas at a glance supports the approach and focuses analysis on the most critical transactions.

Figure 4. Example time series with critical outliers.

Conclusion

This project demonstrates how a simple solution based on medians and progressive iterations can effectively identify financial outliers. With clear logic and minimal processing, historical data can be turned into actionable insights that highlight critical anomalies without requiring sophisticated tools or extensive computational resources.


Written by

Bernardo Ribeiro de Moura

Senior Data Analyst at Unimed Rio Preto, working with predictive models, cost optimization, and data-driven decision-making. Bachelor’s in Chemistry (UNESP), transitioning to Data Science (UNIVESP), combining science and technology to solve real-world problems. Specialized in Google Data Analytics. I write about predictive analysis, data visualization, and statistical modeling. Let’s exchange ideas on Python, SQL, and the impact of data in our daily lives!