Modern Anomaly Detection: Techniques, Challenges, and Ethical Considerations

2 September 2025

Anomaly detection involves identifying unusual patterns or behaviours in data that deviate from what is expected. These deviations often signal critical events such as fraud, system malfunctions, or security breaches. Anomaly detection is widely used in fields such as finance, cybersecurity, manufacturing, and healthcare, with the aim of identifying and acting on these irregularities proactively.

With the increasing digitisation of systems over the past years, enormous volumes of data have been generated and stored. These include time-series data from sensors, textual logs from applications and systems, and high-dimensional inputs such as images or video streams. The growing availability of such data creates significant opportunities for anomaly detection. However, labelled anomalies remain scarce: they are naturally rare, domain-specific, and time-consuming to annotate. This limits the effectiveness of traditional supervised learning approaches and reinforces the need for other viable solutions. Additionally, as these systems become more automated and integrated into decision-making pipelines, concerns about fairness, transparency, and accountability are gaining momentum.

Challenges in Anomaly Detection

Building effective anomaly detection systems presents several challenges. As mentioned, anomalies are typically rare, poorly defined, and may evolve over time, resulting in a lack of labelled examples for training. In many cases, labelled anomalies make up less than 1% of the data, leading to severe class imbalance. As a result, traditional supervised learning methods often struggle to generalise and face an increased risk of overfitting to the limited set of labelled anomalies.

Historically, anomaly detection began with rule-based systems and statistical thresholds. Today, machine learning approaches dominate, broadly divided into supervised, unsupervised, and semi-supervised methods.

Supervised methods, such as decision trees or neural networks, require labelled examples of both normal and anomalous instances. The model learns the underlying patterns from the labelled training data and then applies them to make predictions on unseen test data, as illustrated in the diagram below.

[Diagram: supervised learning — a model trained on labelled data makes predictions on unseen test data]
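As a minimal sketch of this supervised setup (synthetic data, illustrative parameters), the snippet below trains a random forest on a heavily imbalanced labelled set, using class weighting to counteract the ~1% anomaly rate:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic data: normal points around 0, rare anomalies around 4 (~1%)
X_normal = rng.normal(0, 1, size=(990, 2))
X_anom = rng.normal(4, 1, size=(10, 2))
X = np.vstack([X_normal, X_anom])
y = np.array([0] * 990 + [1] * 10)

# class_weight="balanced" reweights the rare anomaly class during training
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                             random_state=0)
clf.fit(X, y)

# A point near the anomaly cluster should be flagged as class 1
print(clf.predict([[0.1, -0.2], [4.2, 3.9]]))
```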

Unsupervised methods, on the other hand, learn the patterns of normal behaviour and flag any observations that differ significantly from those learned patterns as potential anomalies, as illustrated in the diagram below. Unlike supervised approaches, they do not require labelled data: the model learns the structure of the training data and applies it to detect anomalies in unseen data. Techniques such as Isolation Forest, One-Class SVM, and clustering algorithms are commonly used as baselines for these applications. However, these methods often struggle to distinguish truly anomalous behaviour from expected variation. This is especially challenging in complex or noisy datasets, where high variability and randomness make it difficult to establish a reliable definition of normal behaviour. The task becomes even harder when the specific types and patterns of anomalies are unknown beforehand, as the system must identify authentic anomalies without any prior examples to guide it.

[Diagram: unsupervised learning — a model learns normal behaviour from unlabelled data and flags deviating observations]
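A minimal unsupervised baseline along these lines, using scikit-learn's Isolation Forest on synthetic data (the contamination rate here is an assumption the practitioner must supply):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X_train = rng.normal(0, 1, size=(1000, 2))   # unlabelled, mostly normal

# contamination sets the expected anomaly fraction — a modelling assumption
iso = IsolationForest(contamination=0.01, random_state=42).fit(X_train)

X_new = np.array([[0.0, 0.1], [6.0, 6.0]])
print(iso.predict(X_new))   # +1 = normal, -1 = anomaly
```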

Semi-supervised methods integrate aspects of both supervised and unsupervised techniques, leveraging a small amount of labelled data alongside a larger quantity of unlabelled data. Several semi-supervised techniques exist; two are described in detail below.

1. Self-Training

Self-training is one of the simplest semi-supervised techniques. It starts with a small set of labelled data used to train an initial model. The model then generates predictions for the unlabelled data, and predictions it is sufficiently confident in are added to the labelled dataset. The model is retrained on this expanded dataset, and the process repeats iteratively: for a predefined number of N cycles, until no high-confidence predictions remain, or until all records have been labelled.

This approach can be effective when the model’s early predictions are accurate; however, if the model is overly confident in incorrect predictions, those errors can propagate and degrade performance over time. The diagram below illustrates a typical self-training process: the first part shows the initial training on a small labelled dataset, while the second part shows the addition of confidently predicted labels, resulting in an expanded training set for the next iteration.

[Diagram: self-training — initial training on a small labelled set, then iterative addition of confidently predicted labels]
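The loop described above can be sketched as follows. This is an illustrative implementation on synthetic data; the 0.95 confidence threshold and the cap of 10 iterations are arbitrary choices:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two clusters; only a handful of points are labelled (-1 = unlabelled)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(5, 1, (200, 2))])
y_true = np.array([0] * 200 + [1] * 200)
y = np.full(400, -1)
y[:5] = 0          # five labelled normals
y[200:205] = 1     # five labelled anomalies

model = LogisticRegression()
for _ in range(10):                        # at most N iterations
    labelled = y != -1
    model.fit(X[labelled], y[labelled])
    proba = model.predict_proba(X)
    confident = (proba.max(axis=1) > 0.95) & ~labelled
    if not confident.any():                # stop: no confident predictions left
        break
    y[confident] = proba[confident].argmax(axis=1)  # expand the labelled set

print((y != -1).sum(), "points labelled after self-training")
```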

2. Co-Training

Co-training is a semi-supervised learning technique that uses multiple perspectives of the same data. Complementary feature sets are used, each containing distinct and independent information relevant to the target variable.

The method begins with a small set of labelled data, similar to self-training, and a larger set of unlabelled data. The features are split into two separate views, each containing different and complementary information. Two separate models are then trained, one per view, and during each iteration both models predict labels for the unlabelled data. High-confidence predictions from one model are added as labelled samples to the other model’s labelled set, a cross-labelling strategy that allows both models to improve iteratively using each other’s confident predictions. The process continues for N iterations, until no further high-confidence predictions can be made, or until all data has been labelled, as illustrated in the diagram below. Co-training is particularly effective when each feature set provides sufficient and independent information for classification, such as using webpage text and hyperlinks as distinct sources of data.

[Diagram: co-training — two models trained on complementary feature views exchanging high-confidence labels]
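The cross-labelling loop can be sketched as below, again on synthetic data with arbitrary threshold and iteration settings; the two "views" are simply complementary halves of the feature matrix:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
# Four informative features, split into two complementary views
X = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(4, 1, (200, 4))])
views = [X[:, :2], X[:, 2:]]               # view A and view B
y = np.full(400, -1)
y[:5], y[200:205] = 0, 1                   # a few seed labels

labels = [y.copy(), y.copy()]              # each model keeps its own label set
models = [LogisticRegression(), LogisticRegression()]

for _ in range(10):                        # at most N iterations
    moved = False
    for i in (0, 1):
        known = labels[i] != -1
        models[i].fit(views[i][known], labels[i][known])
        proba = models[i].predict_proba(views[i])
        other = 1 - i
        # cross-labelling: model i's confident predictions feed the other model
        confident = (proba.max(axis=1) > 0.95) & (labels[other] == -1)
        if confident.any():
            labels[other][confident] = proba[confident].argmax(axis=1)
            moved = True
    if not moved:                          # stop: no confident predictions left
        break

print(sum((l != -1).sum() for l in labels), "labels across both views")
```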

To complement these core methods, other techniques are often applied in situations where anomalies are scarce and unlabelled:

  • Resampling techniques such as SMOTE (Synthetic Minority Over-sampling Technique) help address class imbalance in supervised anomaly detection, where labelled anomalies are available but underrepresented.
  • Synthetic anomaly generation techniques, such as Generative Adversarial Networks (GANs) or simulation-based methods, can be used to create realistic synthetic anomalies. These are useful for augmenting training data or testing models in rare-event situations, enabling model training or evaluation in low-data scenarios.
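As an illustration of the SMOTE idea (a minimal sketch, not the library implementation), the snippet below generates new minority-class samples by interpolating between each sample and one of its nearest minority-class neighbours:

```python
import numpy as np

def smote_like(X_minority, n_new, k=3, rng=None):
    """Minimal SMOTE-style oversampling: interpolate between a minority
    sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(X_minority)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_minority[:, None] - X_minority[None, :], axis=2)
    np.fill_diagonal(d, np.inf)            # exclude self-matches
    neighbours = np.argsort(d, axis=1)[:, :k]
    new = []
    for _ in range(n_new):
        i = rng.integers(n)
        j = neighbours[i, rng.integers(k)]
        lam = rng.random()                 # interpolation factor in [0, 1)
        new.append(X_minority[i] + lam * (X_minority[j] - X_minority[i]))
    return np.array(new)

anomalies = np.random.default_rng(0).normal(4, 0.5, size=(10, 2))
synthetic = smote_like(anomalies, n_new=40)
print(synthetic.shape)
```

Because new points lie on segments between real anomalies, they stay inside the minority class's region rather than drifting into normal territory.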

While these techniques enhance the robustness of anomaly detection models, it is essential to recognise that such systems are often applied in sensitive areas such as finance, healthcare, and security. This raises several key concerns, chief among them interpretability. While complex models may deliver high performance, their “black-box” nature can make results difficult to explain. To address this, tools such as SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-agnostic Explanations) are commonly used to help uncover how models make their decisions. Other concerns must also be considered, such as bias in training data, lack of transparency, and privacy issues, all of which can result in unfair or harmful outcomes. Ensuring responsible use requires thoughtful model design, rigorous validation, and continuous monitoring.

Looking ahead, anomaly detection is becoming increasingly relevant, with growing attention on real-time detection, explainability, and cross-domain adaptation. In many contexts, such as security or predictive maintenance, detecting anomalies as they happen is critical, and new solutions are emerging with a strong focus on real-time capabilities. At the same time, explainability is gaining importance, particularly in sensitive or regulated fields where it is essential to understand which factors or features contribute most to anomalies. This insight helps stakeholders not only trust the detection results but also take targeted action on the underlying issues. Alongside this, there is growing interest in transfer learning, a technique in which a model trained on one task or dataset is adapted for a different yet related task. In anomaly detection, this enables a model to learn patterns from a domain with abundant labelled data and apply that knowledge to a new domain where labelled anomalies are scarce. By transferring previously learned features, the model can improve detection performance in new environments while reducing the need for extensive manual labelling.
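One lightweight form of this idea can be sketched as follows: reuse a detector fitted on a data-rich source domain and recalibrate only its decision threshold on a small target sample. The data, the shift between domains, and the 5% quantile are all illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Source domain: plenty of data to learn what "normal" looks like
X_source = rng.normal(0, 1, size=(2000, 3))
model = IsolationForest(random_state=7).fit(X_source)

# Target domain: a similar but shifted process, with only a small sample
X_target_small = rng.normal(0.5, 1, size=(50, 3))
# Reuse the source model; recalibrate only the threshold on target scores
scores = model.score_samples(X_target_small)     # lower score = more abnormal
threshold = np.quantile(scores, 0.05)            # flag the lowest 5%

x = np.array([[6.0, 6.0, 6.0]])
print(model.score_samples(x) < threshold)        # a far-off point is flagged
```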

In the future, the focus will be on building faster, more transparent, and adaptive systems capable of handling multivariate data, reacting to it in real-time, and justifying their decisions.


Author

Patrícia Morais
Associate Consultant
