2 September 2025

Simplifying Metadata Frameworks in Microsoft Fabric with YAML

Exploring YAML-based configuration, eventhouse logging, and CI/CD in Fabric data pipelines.

 

Why metadata-driven frameworks matter

If you’ve ever worked on a data engineering project, you’ve probably run into “configuration chaos.” Perhaps someone updated a config table without telling anyone. Perhaps a “quick fix” for Project A broke Project B. Or maybe — and let’s be honest here — you ran an UPDATE without a WHERE and suddenly had a very quiet (and very long) evening.

The truth is, managing pipeline configurations at scale is tricky. That’s precisely why metadata-driven frameworks exist.

At their core, they let you move away from hardcoding and towards flexibility. Instead of baking logic into every pipeline, you define what needs to run, where the data lives, where it should land, and how it should get there — all in metadata.

These frameworks can:

  • Parameterise tasks and data movements.
  • Track deltas and log execution history.
  • Define dependencies and error tolerance.

 

The problem with SQL tables

Traditionally, configurations live in SQL tables. In Microsoft Fabric, that usually means a warehouse or SQL database, with pipelines handling the orchestration.

This works… up to a point. But as projects and teams grow, cracks start to show:

  • Configuration tables are updated in production “just this once” (famous last words).
  • Version control is basically non-existent.
  • DevOps integration becomes painful.
  • Debugging config changes turns into detective work.

Soon, what started as a neat solution becomes hard to maintain.

 

A cleaner approach with YAML

This is where YAML comes in. YAML files are:

  • Human-readable → easy to scan, even in big projects.
  • Version-controlled → Git makes tracking changes trivial.
  • PySpark-friendly → parsing is straightforward.

Yes, indentation can be unforgiving (we’ve all been betrayed by an extra space), but the trade-offs are worth it.

In our Fabric projects, we use YAML to replace those config tables, aligning with a medallion architecture — Bronze, Silver, Gold — with distinct lakehouses for each layer. YAML acts as the blueprint that orchestrates how data moves through them.

 

Bronze layer configuration

Here’s a simple YAML definition for a bronze extraction:

 

What’s happening here?

  • Source → AdventureWorks SQL table Person.Person.
  • Destination → Bronze lakehouse, saved as Parquet.
  • Load method → delta loads based on ModifiedDate.
  • Executor → a pipeline (pl_copyFrom_adventureWorks2022).

Think of it as the project’s GPS: it knows where the data starts, where it’s going, and which vehicle (pipeline) is driving it there.
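
As a rough illustration, the bullets above could correspond to a definition like the one in the comment below, followed by a sketch of how a delta load might use it. The key names (`source`, `destination`, `load`, `watermark_column`, `executor`) and the `build_source_query` helper are assumptions made for this example, not the framework's actual schema; only the table, the Parquet format, the `ModifiedDate` column and the pipeline name come from the description above.

```python
# Hypothetical YAML for the bronze task (key names are assumptions):
#
#   bronze.Person:
#     source:
#       system: AdventureWorks
#       table: Person.Person
#     destination:
#       lakehouse: Bronze
#       format: parquet
#     load:
#       method: delta
#       watermark_column: ModifiedDate
#     executor:
#       type: pipeline
#       name: pl_copyFrom_adventureWorks2022
#
# Once parsed with a YAML library (for example PyYAML's yaml.safe_load),
# the value under "bronze.Person" is a plain nested dict:
bronze_person = {
    "source": {"system": "AdventureWorks", "table": "Person.Person"},
    "destination": {"lakehouse": "Bronze", "format": "parquet"},
    "load": {"method": "delta", "watermark_column": "ModifiedDate"},
    "executor": {"type": "pipeline", "name": "pl_copyFrom_adventureWorks2022"},
}


def build_source_query(task, last_watermark=None):
    """Build the extraction query; filter on the watermark column for delta loads."""
    query = f"SELECT * FROM {task['source']['table']}"
    load = task["load"]
    if load["method"] == "delta" and last_watermark is not None:
        # Illustration only: real code should bind the value as a query parameter.
        query += f" WHERE {load['watermark_column']} > '{last_watermark}'"
    return query


first_run = build_source_query(bronze_person)
next_run = build_source_query(bronze_person, "2025-09-01T00:00:00")
print(first_run)
print(next_run)
```

With no stored watermark the first run reads the whole table; later runs only pick up rows modified after the last recorded value.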

 

Silver layer configuration (dependencies)

Now let’s see how this looks in silver:

 

Here we see a few differences:

  • The load type is a merge into the Silver Lakehouse.
  • The artifact is a notebook (NB_Load_Silver).
  • There’s a dependency → it waits for bronze.Person to complete first.

So even in YAML, Bronze comes before Silver — no skipping ahead.
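
A silver definition matching the points above might look like the sketch below. As before, the key names (including `depends_on`) and the `is_ready` helper are assumptions for illustration; the merge load, the notebook name and the dependency on bronze.Person are taken from the text.

```python
# Hypothetical YAML for the silver task (key names are assumptions):
#
#   silver.Person:
#     destination:
#       lakehouse: Silver
#     load:
#       method: merge
#     executor:
#       type: notebook
#       name: NB_Load_Silver
#     depends_on:
#       - bronze.Person
#
# The parsed value under "silver.Person":
silver_person = {
    "destination": {"lakehouse": "Silver"},
    "load": {"method": "merge"},
    "executor": {"type": "notebook", "name": "NB_Load_Silver"},
    "depends_on": ["bronze.Person"],
}


def is_ready(task, completed):
    """A task may start only when every task it depends on has completed."""
    return all(dep in completed for dep in task.get("depends_on", []))


print(is_ready(silver_person, completed=set()))              # False
print(is_ready(silver_person, completed={"bronze.Person"}))  # True
```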

 

How orchestration works

All of this comes together in an orchestration notebook. Its job is simple (in concept):

  1. Read the YAML.
  2. Parse the tasks and dependencies.
  3. Build a DAG (directed acyclic graph).
  4. Execute tasks in the correct order.
  • Pipelines and other non-notebook artifacts are triggered via the Fabric API.
  • The DAG is executed using notebookutils.notebook.runMultiple.
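
The steps above can be sketched with Python's standard `graphlib` module. This is a minimal sketch under stated assumptions: the task dictionary mirrors a hypothetical parsed YAML file, and the `activities`, `name`, `path` and `dependencies` keys reflect my understanding of the DAG layout that runMultiple accepts, which should be checked against the Fabric documentation. It also assumes pipeline tasks are triggered and awaited separately, so only notebooks end up in the DAG.

```python
from graphlib import TopologicalSorter

# Hypothetical parsed YAML: task name -> definition (key names are assumptions).
tasks = {
    "bronze.Person": {
        "executor": {"type": "pipeline", "name": "pl_copyFrom_adventureWorks2022"},
    },
    "silver.Person": {
        "executor": {"type": "notebook", "name": "NB_Load_Silver"},
        "depends_on": ["bronze.Person"],
    },
}


def execution_order(tasks):
    """Return task names so that every task comes after its dependencies."""
    graph = {name: set(task.get("depends_on", [])) for name, task in tasks.items()}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


def notebook_dag(tasks):
    """Build a DAG dict containing only the notebook tasks."""
    notebooks = {n for n, t in tasks.items() if t["executor"]["type"] == "notebook"}
    activities = [
        {
            "name": name,
            "path": tasks[name]["executor"]["name"],
            # Only dependencies that are themselves notebooks can appear in this DAG.
            "dependencies": [
                dep for dep in tasks[name].get("depends_on", []) if dep in notebooks
            ],
        }
        for name in execution_order(tasks)
        if name in notebooks
    ]
    return {"activities": activities}


order = execution_order(tasks)
dag = notebook_dag(tasks)
print(order)
print(dag)
```

Inside a Fabric notebook, the resulting dictionary would then be handed to runMultiple once the pipeline tasks it depends on have finished.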

Essentially, the notebook acts as an air traffic controller for your data pipelines — calmly determining who takes off and when.

 

The benefits in practice

Moving configurations to YAML has clear benefits:

  • Scalability → easy to add new tasks as projects grow.
  • Maintainability → configs are structured, readable, and versioned.
  • Safer deployments → no more mystery updates to config tables in production.

It keeps your data projects clean, traceable, and easier to debug.

 

Logging and error handling

A metadata-driven framework isn’t complete without proper logging and error handling. In our approach, each task execution writes structured logs into a dedicated logging eventhouse. These logs capture:

  • Task name and layer (Bronze, Silver, Gold)
  • Execution timestamps (start, end, duration)
  • Row counts (processed, inserted, updated)
  • Status (success, failure)
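
The fields listed above could be captured in a record like the following. The class and field names are assumptions for illustration, and the row counts are made-up numbers; how the row is actually written to the eventhouse is left out.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TaskLog:
    """One structured log row per task execution (field names are assumptions)."""
    task_name: str
    layer: str                  # "bronze", "silver" or "gold"
    started_at: datetime
    ended_at: datetime
    status: str                 # "success" or "failure"
    rows_processed: int = 0
    rows_inserted: int = 0
    rows_updated: int = 0
    error: Optional[str] = None

    @property
    def duration_seconds(self) -> float:
        return (self.ended_at - self.started_at).total_seconds()

    def to_row(self) -> dict:
        """Flatten to a dict with ISO timestamps, ready to send to a log store."""
        row = asdict(self)
        row["started_at"] = self.started_at.isoformat()
        row["ended_at"] = self.ended_at.isoformat()
        row["duration_seconds"] = self.duration_seconds
        return row


log = TaskLog(
    task_name="bronze.Person",
    layer="bronze",
    started_at=datetime(2025, 9, 2, 8, 0, 0, tzinfo=timezone.utc),
    ended_at=datetime(2025, 9, 2, 8, 0, 42, tzinfo=timezone.utc),
    status="success",
    rows_processed=100,  # made-up numbers for illustration
    rows_inserted=90,
    rows_updated=10,
)
print(log.to_row())
```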

In the event of failure, the framework records the exception details and automatically stops dependent tasks. Because the orchestration builds a DAG, dependencies won’t run if an upstream task fails — ensuring downstream layers aren’t polluted with incomplete data.
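
One way to get this behaviour is to walk the tasks in dependency order and skip anything whose upstream did not succeed. The sketch below is a simplified sequential runner, not the framework's implementation; `run_all`, `fake_run` and the gold.DimPerson task are hypothetical names.

```python
from graphlib import TopologicalSorter


def run_all(dependencies, run_task):
    """Run tasks in dependency order; skip anything downstream of a failure.

    dependencies: task name -> set of upstream task names.
    run_task: callable that executes one task and raises on failure.
    Returns task name -> "success", "failure" or "skipped".
    """
    status = {}
    for name in TopologicalSorter(dependencies).static_order():
        upstream = dependencies.get(name, set())
        if any(status[dep] != "success" for dep in upstream):
            status[name] = "skipped"  # an upstream task failed or was skipped
            continue
        try:
            run_task(name)
            status[name] = "success"
        except Exception as exc:
            status[name] = "failure"
            # A real framework would write the exception details to its log store.
            print(f"{name} failed: {exc}")
    return status


deps = {
    "bronze.Person": set(),
    "silver.Person": {"bronze.Person"},
    "gold.DimPerson": {"silver.Person"},  # hypothetical gold task
}


def fake_run(name):
    """Stand-in for real execution: the silver task fails, the rest succeed."""
    if name == "silver.Person":
        raise RuntimeError("merge failed")


result = run_all(deps, fake_run)
print(result)
```

Here the bronze task succeeds, the silver task fails, and the gold task is never started.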

Error-handling rules can also be defined in YAML (for example, whether to retry a failed step or skip it with a warning). And once the root cause has been fixed, the framework can even resume execution from the exact task that failed, avoiding the need to reprocess everything from scratch.
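
A minimal sketch of both ideas follows, assuming hypothetical YAML keys `on_error` (with values `fail`, `retry` or `skip`) and `max_retries`. Neither key nor either function name comes from the framework itself.

```python
def run_with_policy(name, run_task, on_error="fail", max_retries=0):
    """Run one task and apply its error rule (parameter names are assumptions)."""
    attempts = 0
    while True:
        attempts += 1
        try:
            run_task(name)
            return "success"
        except Exception as exc:
            if on_error == "retry" and attempts <= max_retries:
                continue  # try the same task again
            if on_error == "skip":
                print(f"warning: {name} skipped after error: {exc}")
                return "skipped"
            return "failure"


def tasks_to_resume(order, previous_status):
    """Return the tasks that still need to run, given the last run's statuses."""
    return [name for name in order if previous_status.get(name) != "success"]


calls = {"count": 0}


def flaky(name):
    """Stand-in task that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient error")


retry_result = run_with_policy("silver.Person", flaky, on_error="retry", max_retries=2)
remaining = tasks_to_resume(
    ["bronze.Person", "silver.Person", "gold.DimPerson"],
    {"bronze.Person": "success", "silver.Person": "failure", "gold.DimPerson": "skipped"},
)
print(retry_result)
print(remaining)
```

Resuming then means re-running only the returned tasks, in order, instead of the whole graph.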

 

DevOps deployment

Another key advantage of using YAML is smooth integration with DevOps pipelines. Since all configurations live in files, they:

  • Travel with the code → stored in Git repos, reviewed via pull requests.
  • Deploy consistently → Fabric deployment pipelines can move YAML from Dev to Test to Prod environments with confidence.
  • Support approvals and rollbacks → if something goes wrong, reverting to a previous YAML version is as simple as rolling back a commit.
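
Because the configuration is plain text, a pull-request build can also check it before anything is deployed. The following is a sketch of one such check, assuming the parsed YAML is a dictionary of tasks with a hypothetical `depends_on` key; `validate_tasks` is an illustrative name, not part of Fabric or any DevOps tooling.

```python
from graphlib import CycleError, TopologicalSorter


def validate_tasks(tasks):
    """Return a list of problems found in a parsed task config (empty means valid)."""
    problems = []
    for name, task in tasks.items():
        for dep in task.get("depends_on", []):
            if dep not in tasks:
                problems.append(f"{name}: unknown dependency '{dep}'")
    graph = {name: set(task.get("depends_on", [])) for name, task in tasks.items()}
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError on circular dependencies
    except CycleError as exc:
        problems.append(f"circular dependency: {exc.args[1]}")
    return problems


good = {"bronze.Person": {}, "silver.Person": {"depends_on": ["bronze.Person"]}}
typo = {"silver.Person": {"depends_on": ["bronze.Persn"]}}
cyclic = {"a": {"depends_on": ["b"]}, "b": {"depends_on": ["a"]}}

print(validate_tasks(good))
print(validate_tasks(typo))
print(validate_tasks(cyclic))
```

A build step could fail whenever the returned list is not empty, so a mistyped dependency is caught in review rather than in production.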

This means teams can manage pipeline configurations just like application code – with proper CI/CD practices instead of ad-hoc table edits.

 

Wrapping up

Metadata-driven frameworks have always been designed to simplify data pipeline management. Using YAML in Microsoft Fabric takes that idea further — combining the flexibility of metadata with the safety of version control, structured logging, and modern DevOps deployment practices.

So, the next time someone asks where the configs live, you won’t have to point them to a half-forgotten SQL table. You can show them a YAML file in source control, trace logs in a dedicated eventhouse, and a clean deployment pipeline that moves everything from Dev to Prod without surprises.

This is just the start. In upcoming posts, I’ll go deeper into how logging is structured, how error recovery works, and how DevOps pipelines were set up for YAML deployment. Stay tuned!

Author

Rui Francisco Gonçalves

Senior Specialist
