Multimodal predictive models: A practical approach in medicine and education

22 May 2025
Key takeaways

HAIM improves medical predictions by combining diverse patient data.

MuDoC enhances learning by merging text and visuals.

Multimodal AI still faces technical, privacy, and interpretability challenges.

In the first part of this article, we explored the fundamentals of Multimodal Artificial Intelligence: what it is, how it works, and the main methods of data fusion. We looked at how multimodal systems can integrate different modalities — such as text, image, speech, or biometric signals — to create richer, more accurate, and more contextualised interactions. We also examined the technical challenges and advantages of this approach, which aims to bring AI’s perception and response capabilities closer to the way humans interact with the world.

In this second part, we will explore the role of multimodal AI in healthcare and education. Both sectors are being progressively transformed by AI, which is improving patient care and personalising learning. We will examine two examples to understand the benefits and challenges of implementing multimodal AI in these fields.

 

Use Case 1: Multimodal AI in Healthcare

Healthcare produces vast and diverse data in many formats, such as medical images, clinical notes, laboratory tests, and patient records. Combining these data types offers a more holistic view of a patient's condition. Multimodal AI systems are engineered to ingest and integrate these multiple data sources, resulting in better diagnoses and individualised treatment plans.

One example of how multimodal AI is used in healthcare is the Holistic AI in Medicine (HAIM) framework. HAIM combines different types of data (e.g., structured EHR data, time-series measurements, clinical notes, and medical imaging) to improve predictive modelling in healthcare. By integrating these data sources, HAIM has shown better results across various tasks, including disease identification and patient outcome prediction: multimodal HAIM predictive systems achieved average improvements of 9–28% over single-modality baselines across all evaluated tasks (Integrated multimodal artificial intelligence framework for healthcare applications, npj Digital Medicine).

HAIM combines data from multiple sources to create comprehensive patient profiles. Each profile includes structured data like demographics, lab results, and medication records; time-series data such as vital signs and other chronological measurements; unstructured text like clinical notes and reports; and medical images, including chest X-rays and associated imaging data. Each data type is processed separately to create numerical representations, known as embeddings:​

  • Structured data is normalised and transformed into numerical values.
  • Time-series data is summarised with statistical metrics that capture trends over time.
  • Text data is processed with pre-trained transformer models to produce fixed-size embeddings.
  • Image data is analysed with pre-trained convolutional neural networks to extract feature embeddings.

The individual embeddings from each modality are concatenated to form a comprehensive fusion embedding. This unified representation serves as input for predictive models, such as XGBoost, to perform tasks like disease diagnosis and patient outcome prediction.
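The per-modality embedding and fusion steps described above can be sketched as follows. This is a minimal illustration, not the HAIM implementation: the helper names and the sample patient values are hypothetical, real text and image embeddings would come from pre-trained transformer and CNN models, and the fused vector would then feed a classifier such as XGBoost.

```python
import numpy as np

def embed_structured(values, means, stds):
    """Normalise structured fields (e.g. age, lab results) into z-scores."""
    return (np.asarray(values, dtype=float) - means) / stds

def embed_timeseries(series):
    """Summarise a vital-sign series with statistics that capture its trend."""
    s = np.asarray(series, dtype=float)
    slope = np.polyfit(np.arange(len(s)), s, 1)[0]  # linear trend over time
    return np.array([s.mean(), s.std(), s.min(), s.max(), slope])

def fuse(*embeddings):
    """Concatenate per-modality embeddings into one fusion embedding."""
    return np.concatenate(embeddings)

# Hypothetical patient: two structured fields plus a heart-rate series.
structured = embed_structured([65.0, 7.2], np.array([60.0, 7.0]), np.array([10.0, 0.5]))
vitals = embed_timeseries([80, 82, 85, 90])
fusion = fuse(structured, vitals)  # 2 + 5 = 7-dimensional fusion embedding
print(fusion.shape)
```

In the full framework, the text and image embeddings would simply be two more arguments to `fuse`, which is what makes the design modular: adding a modality only extends the concatenated vector.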

Fig. 1: Integrated multimodal artificial intelligence framework for healthcare applications | npj Digital Medicine

 

Benefits:

  • Integrates diverse data modalities, creating more comprehensive patient profiles.
  • Consistently outperforms single-modality models, with improvements of 9% to 28% in healthcare tasks.
  • Supports various applications, including disease diagnosis and patient outcome prediction.
  • Modular design enables the addition of new data types, enhancing adaptability and scalability in clinical settings.

Challenges:

  • Requires sophisticated preprocessing and normalization to ensure compatibility across diverse data types.
  • Computational complexity can be resource-intensive, raising scalability concerns.
  • Demands stringent privacy and data security measures due to sensitive patient information.
  • Model interpretability remains challenging, affecting clinical trust and adoption.

 

Use Case 2: Multimodal AI in Education

Education is a natural fit for multimodal AI because learning materials often include a mix of text, images, graphs, and diagrams. Traditional educational AI tools have mainly worked with text, but by incorporating other forms of content (visual and interactive elements), multimodal systems can better reflect how humans learn. This results in more engaging and effective educational experiences that are tailored to diverse learning styles.

One of the most promising examples of this approach is the MuDoC system (Multimodal Document-grounded Conversational AI). MuDoC is designed to support learners by combining natural language processing and computer vision to analyse educational materials, including written text and visual elements. When a student asks a question, the system doesn’t just respond with plain text. Instead, it scans the source material, retrieves the relevant section, and provides a response that integrates the necessary text and images from the original document. This helps learners build stronger mental models and verify the AI’s answers directly in the learning materials, building transparency and trust.

Technically, MuDoC uses a language model (like GPT-4o) to process and generate natural language answers. At the same time, it applies computer vision techniques to parse visual content (such as diagrams, figures, and illustrations) embedded in learning documents. The system maps these different content types into a unified representation that allows it to select and combine them contextually. This process results in rich, grounded answers that go beyond what purely text-based AI systems can deliver. It creates a dynamic learning assistant that not only explains but also shows, supporting better understanding of complex subjects.
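The retrieval step in this document-grounded loop can be sketched as a toy example, under stated assumptions: the keyword-overlap scoring stands in for the embedding-based similarity a system like MuDoC would use, and the `Segment` structure and function names are hypothetical rather than MuDoC's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One chunk of a learning document: its text plus any figures
    the vision pipeline extracted from that region."""
    text: str
    figures: list = field(default_factory=list)

def score(query, segment):
    """Keyword-overlap score; a real system would use embedding similarity."""
    return len(set(query.lower().split()) & set(segment.text.lower().split()))

def answer(query, segments):
    """Retrieve the best-matching segment and return its text together with
    its figures, so the reply interleaves prose and the source's visuals."""
    best = max(segments, key=lambda seg: score(query, seg))
    return {"text": best.text, "figures": best.figures}

doc = [
    Segment("Photosynthesis converts light energy into chemical energy.",
            figures=["fig-3: chloroplast diagram"]),
    Segment("Cellular respiration releases energy from glucose."),
]
print(answer("how does photosynthesis convert light energy", doc)["figures"])
```

Because the retrieved figures come straight from the source document, the learner can check the answer against the original material, which is where the transparency benefit comes from.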

Fig 2. [2504.13884] Towards a Multimodal Document-grounded Conversational AI System for Education

 

Benefits:

  • Makes learning more engaging, increasing student interest and involvement.
  • Combines words and pictures to communicate information effectively.
  • Simplifies complex concepts like physics, biology, and math through visuals.
  • Enhances trust with clear visibility of answer sources.
  • Encourages deeper learning by inspiring curiosity.

Challenges:

  • Aligning words and pictures perfectly can be challenging, and mismatched visuals can cause confusion.
  • Ensuring accessibility for all students, including those with visual or learning difficulties, is essential.
  • Managing the simultaneous use of words and pictures requires significant computing power.

 

Conclusion

In summary, multimodal AI is transforming how machines understand and interact with the world by combining data from multiple sources like text, images, speech, and time-series signals.

The HAIM framework leverages this approach to create comprehensive patient profiles in healthcare, achieving performance improvements in tasks such as disease diagnosis and outcome prediction. However, it faces challenges, including the need for sophisticated data preprocessing, high computational demands, stringent privacy measures, and limited model interpretability, all of which are critical for clinical trust and scalability.

Similarly, in education, the MuDoC system uses multimodal AI to enhance student engagement, making learning more accessible and understandable through a combination of words and images. Yet, it must overcome challenges in aligning text and visuals accurately, ensuring accessibility for all learners, and managing high computational requirements.

As seen in the HAIM framework and in the MuDoC system, this approach enables more accurate predictions, deeper insights, and better user experiences. While challenges remain, the potential of multimodal AI to enhance decision-making, personalize experiences, and align more closely with human communication makes it a vital direction for the future of artificial intelligence.

Author

Marta Carreira

Consultant
