Explainable AI: How to Make Machine Learning Models More Understandable

4 November 2024
Key takeaways

Tools like SHAP, LIME and ELI5 help explain predictions, but they lack contextualised interpretations.

Translating ML results into plain language improves decision-making in critical projects.

The interpretation of ML models is still complex for stakeholders without technical knowledge.

What if it were possible to read the outcomes of a Machine Learning (ML) model simply and within the context of each problem?

“White boxes” and “black boxes” are terms used to describe ML algorithms. “Black boxes” are highly complex models that are challenging to interpret, even for specialists in the field, while “white boxes” are simpler models that data scientists can follow. However, neither is easily understandable to those outside the field, making it difficult for project stakeholders to understand and accept the models’ decisions.

To close this gap, the field of Explainable AI has been developed. It intends to enhance transparency and understanding of model decisions, ultimately leading to greater confidence and security when applying ML models in critical domains. (IBM, n.d.)

To aid in the interpretation of models, more and more Python libraries and techniques are becoming available; currently, SHAP, LIME, and ELI5 are the most widely used. These libraries help us quantify how much each variable contributes to a given prediction. Even with them, however, interpreting each contribution (positive vs. negative value, high vs. low values) and what it means in the context of each problem or business remains challenging.
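To give an intuition for what these libraries quantify, here is a deliberately naive sketch (not the actual SHAP/LIME/ELI5 algorithms, which are considerably more sophisticated): perturb one feature at a time and record how much the model's output moves.

```python
# Illustrative only: a naive per-feature sensitivity check, NOT the real
# SHAP/LIME/ELI5 algorithms. It perturbs one feature at a time and records
# how much the model's prediction shifts in response.
def feature_sensitivities(predict, instance, delta=1.0):
    """predict: callable taking a list of feature values, returning a score."""
    base = predict(instance)
    sensitivities = {}
    for i, value in enumerate(instance):
        perturbed = list(instance)
        perturbed[i] = value + delta
        # A large shift in the output means the model leans heavily on feature i.
        sensitivities[i] = predict(perturbed) - base
    return sensitivities

# Toy linear "model" over two hypothetical features (glucose, BMI):
model = lambda x: 0.5 * x[0] + 0.1 * x[1]
print(feature_sensitivities(model, [96.0, 37.5]))  # approximately {0: 0.5, 1: 0.1}
```

Libraries like LIME refine this idea by sampling many perturbations around the instance and fitting a local surrogate model, which is what produces the weighted contributions discussed below.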

 

But what if these contributions could be translated into language appropriate to each challenge, making them easier to follow for people unfamiliar with ML?

The approach entails merging a Large Language Model (LLM) with the LIME library to comprehend the model’s decisions better and explain each decision in the context of the challenge.

An LLM is a model that uses advanced natural language processing (NLP) algorithms to understand and generate text. These models are trained on large volumes of textual data, such as books and web articles, to learn how language is structured and how words and phrases relate to each other. A famous example of an LLM is the GPT (Generative Pre-trained Transformer). (Souza, 2023)

 

And how is this possible?

By combining the problem context with the results from the LIME library, we can input this information into an LLM prompt. This enables us to translate complex numerical contributions into simpler explanations that fit the specific context of each problem or business.
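The steps above can be sketched as a small prompt-building helper. This is a hypothetical illustration, not the article's actual implementation: the function name and prompt wording are assumptions, and the (condition, weight) pairs mirror the shape of LIME's `Explanation.as_list()` output.

```python
# Hypothetical sketch: turning LIME-style output into an LLM prompt.
# LIME's Explanation.as_list() yields (condition, weight) pairs like these.
def build_prompt(problem_context, prediction_summary, contributions):
    # Render each local contribution as a human-readable line.
    lines = [f"{cond}: contribution {weight:+.3f}" for cond, weight in contributions]
    return (
        f"{problem_context}\n"
        f"Model output: {prediction_summary}\n"
        "Local feature contributions (sign indicates direction of the effect):\n"
        + "\n".join(lines)
        + "\nExplain these results in plain language for a non-technical reader."
    )

prompt = build_prompt(
    "Classification problem: predict a patient's predisposition to diabetes.",
    "63% probability of NOT developing diabetes",
    [("Glucose <= 96.00", -0.193), ("BMI > 37.52", 0.067)],
)
# The assembled prompt can then be sent to any chat LLM, e.g. (needs an API key):
# client.chat.completions.create(model="gpt-3.5-turbo",
#                                messages=[{"role": "user", "content": prompt}])
```

The key design point is that the problem context travels inside the prompt, so the LLM's explanation is grounded in the specific business domain rather than generic ML vocabulary.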

Let’s analyse an example to help you grasp this proposition a little better: a classification problem with an ML model that uses clinical factors such as blood pressure, glucose, insulin, BMI, diabetes pedigree function (the likelihood of acquiring diabetes based on family history), age, number of pregnancies, and skin thickness to predict whether a patient would develop diabetes.

Let’s look at one individual (the patient’s feature values are shown in the original article’s figure):

Why did the ML model predict this person will not get diabetes?

This is how the LIME library tackles it (the original article shows LIME’s explanation plot). It tells us the model assigns this person a 63% chance of not getting diabetes. However, what does the 0.19 contribution for glucose levels below 96 mean? With just these raw values, evaluating the results is challenging.
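Even before an LLM translation, simple post-processing of the raw weights already helps. As an illustration, ranking this patient's contributions (the same values interpreted in the report below) by absolute magnitude surfaces the dominant factors:

```python
# This patient's LIME contributions (values from the explanation discussed in
# the text); negative weights push against a diabetes prediction here.
contributions = {
    "Glucose <= 96.00": -0.193,
    "111.25 < Insulin <= 125.00": 0.073,
    "BMI > 37.52": 0.067,
    "DiabetesPedigreeFunction <= 0.25": -0.061,
    "23.00 < Age <= 31.00": -0.041,
    "3.00 < Pregnancies <= 6.75": -0.014,
    "72.41 < BloodPressure <= 80.00": 0.004,
    "29.00 < SkinThickness <= 29.15": -0.001,
}

# Sort by absolute weight so the most influential factors come first.
ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
for feature, weight in ranked:
    direction = "towards" if weight > 0 else "against"
    print(f"{feature}: {weight:+.3f} ({direction} diabetes)")
```

Ranking alone, though, still leaves the clinical meaning of each interval to the reader, which is exactly the gap the LLM report is meant to fill.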

Using an LLM from OpenAI (the GPT-3.5 model), we obtained a report interpreting the ML model’s output, providing the theoretical context of the problem in the prompt. In this case, the LLM was instructed that this was a classification problem for predicting predisposition to diabetes and that the goal was to interpret the results obtained. The report was as follows:

“The model’s interpretation of these factors is as follows:

– Glucose <= 96.00: This has the highest negative contribution of -0.193, indicating that glucose levels below 96 mg/dL are associated with a lower risk of developing diabetes.

– 111.25 < Insulin <= 125.00: Insulin levels between 111.25 and 125.00 have a positive contribution of 0.073, suggesting that higher insulin levels increase the risk of developing diabetes.

– BMI > 37.52: A BMI greater than 37.52 has a positive contribution of 0.067, indicating that obesity is a risk factor for diabetes.

– DiabetesPedigreeFunction <= 0.25: A lower Diabetes Pedigree Function, with a contribution of -0.061, suggests a lower genetic predisposition to diabetes.

– Age between 23.00 and 31.00: This age range has a negative contribution of -0.041, indicating that younger age is associated with a lower risk of developing diabetes.

– Pregnancies between 3.00 and 6.75: This range has a negative contribution of -0.014, suggesting that having had between 3 and 6.75 pregnancies is associated with a lower risk of developing diabetes.

– 72.41 < BloodPressure <= 80.00: Blood pressure levels between 72.41 and 80.00 have a small positive contribution of 0.004, indicating that higher blood pressure may be a risk factor for diabetes.

– SkinThickness between 29.00 and 29.15: This range has a very small negative contribution of -0.001, suggesting that skin thickness in this range is not a significant risk factor for diabetes.

Overall, the model suggests that this patient has a moderate risk of developing diabetes, with the most significant risk factors being higher glucose levels, higher insulin levels, and obesity. However, the patient’s younger age, lower genetic predisposition, and lower number of pregnancies may be protective factors. It is important to note that these are statistical correlations and should be validated with medical advice. Nevertheless, the model’s contributions provide valuable insights into the factors that may be relevant for predicting diabetes risk in this patient.”

 

Through the LLM’s response, stakeholders can determine whether this prediction is clinically reasonable and comprehend what led the model to anticipate that the patient will not have diabetes.

This simplification in expressing the results eliminates the concept of “black boxes,” resulting in transparency and accountability in model usage. This allows all stakeholders to comprehend and use the results without concern.

Because the LIME library can handle image classifiers, regression, and classification problems, among other problem types, this solution can be applied to a wide range of problems.

In a world where Artificial Intelligence and Machine Learning are increasingly prominent solutions, the ability to interpret “black boxes” facilitates the development of ethical and responsible behaviour and the useful application of these technologies for the good of humanity.

BI4ALL aims to be at the forefront of this path. If you are interested in enhancing your business’s Data Science and Artificial Intelligence capabilities, we are available to discuss your project.

 

Bibliography

IBM. (n.d.). Retrieved from https://www.ibm.com/topics/explainable-ai

Souza, A. (2023, July). Medium. Retrieved from https://medium.com/blog-do-zouza/tudo-o-que-voc%C3%AA-precisa-saber-sobre-llm-large-language-model-a36be85bbf8f

ActiveState. (n.d.). Retrieved from https://www.activestate.com/blog/white-box-vs-black-box-algorithms-in-machine-learning/

Author

Marta Carreira

Data Scientist Consultant
