Simplifying Metadata Frameworks in Microsoft Fabric with YAML

2 September 2025


Exploring YAML-based configuration, eventhouse logging, and CI/CD in Fabric data pipelines.

 

Why metadata-driven frameworks matter

If you’ve ever worked on a data engineering project, you’ve probably run into “configuration chaos.” Perhaps someone updated a config table without telling anyone. Perhaps a “quick fix” for Project A broke Project B. Or maybe — and let’s be honest here — you ran an UPDATE without a WHERE clause and suddenly had a very quiet (and very long) evening.

The truth is, managing pipeline configurations at scale is tricky. That’s precisely why metadata-driven frameworks exist.

At their core, they let you move away from hardcoding and towards flexibility. Instead of baking logic into every pipeline, you define what needs to run, where the data lives, where it should land, and how it should get there — all in metadata.

These frameworks can:

  • Parameterise tasks and data movements.
  • Track deltas and log execution history.
  • Define dependencies and error tolerance.

 

The problem with SQL tables

Traditionally, configurations live in SQL tables. In Microsoft Fabric, that usually means a warehouse or SQL database, with pipelines handling the orchestration.

This works… up to a point. But as projects and teams grow, cracks start to show:

  • Configuration tables are updated in production “just this once” (famous last words).
  • Version control is basically non-existent.
  • DevOps integration becomes painful.
  • Debugging config changes turns into detective work.

Soon, what started as a neat solution becomes hard to maintain.

 

A cleaner approach with YAML

This is where YAML comes in. YAML files are:

  • Human-readable → easy to scan, even in big projects.
  • Version-controlled → Git makes tracking changes trivial.
  • PySpark-friendly → parsing is straightforward.

Yes, indentation can be unforgiving (we’ve all been betrayed by an extra space), but the trade-offs are worth it.

In our Fabric projects, we use YAML to replace those config tables, aligning with a medallion architecture — Bronze, Silver, Gold — with distinct lakehouses for each layer. YAML acts as the blueprint that orchestrates how data moves through them.

 

Bronze layer configuration

Here’s a simple YAML definition for a bronze extraction:
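As a sketch of what such a definition might look like — the key names below are illustrative assumptions, not the framework’s actual schema; only the values come from the example discussed next:

```yaml
# Illustrative bronze extraction task -- key names (tasks, source,
# destination, load, executor) are assumptions, not the real schema.
tasks:
  - name: bronze_person
    layer: bronze
    source:
      type: sql
      database: AdventureWorks2022
      table: Person.Person
    destination:
      lakehouse: LH_Bronze        # hypothetical lakehouse name
      format: parquet
    load:
      method: delta
      watermark_column: ModifiedDate
    executor:
      type: pipeline
      artifact: pl_copyFrom_adventureWorks2022
```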

 

What’s happening here?

  • Source → AdventureWorks SQL table Person.Person.
  • Destination → Bronze lakehouse, saved as Parquet.
  • Load method → delta loads based on ModifiedDate.
  • Executor → a pipeline (pl_copyFrom_adventureWorks2022).

Think of it as the project’s GPS: it knows where the data starts, where it’s going, and which vehicle (pipeline) is driving it there.

 

Silver layer configuration (dependencies)

Now let’s see how this looks in silver:
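Again as an illustrative sketch (key names are assumptions; the notebook name, load type, and dependency come from the points below):

```yaml
# Illustrative silver task -- key names are assumptions.
tasks:
  - name: silver_person
    layer: silver
    destination:
      lakehouse: LH_Silver        # hypothetical lakehouse name
    load:
      method: merge
    executor:
      type: notebook
      artifact: NB_Load_Silver
    depends_on:
      - bronze_person             # waits for the bronze load to finish
```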

 

Here we see a few differences:

  • The load type is a merge into the Silver Lakehouse.
  • The artifact is a notebook (NB_Load_Silver).
  • There’s a dependency → it waits for bronze.Person to complete first.

So even in YAML, Bronze comes before Silver — no skipping ahead.

 

How orchestration works

All of this comes together in an orchestration notebook. Its job is simple (in concept):

  1. Read the YAML.
  2. Parse the tasks and dependencies.
  3. Build a DAG (directed acyclic graph).
  4. Execute tasks in the correct order.

Notebook tasks are executed through the DAG with notebookutils.runMultiple, while pipelines and other non-notebook artifacts are triggered via the Fabric API.
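The dependency-ordering step can be sketched in plain Python. This is a minimal illustration using the standard library’s TopologicalSorter, assuming the YAML has already been parsed into task dictionaries (the names below are the illustrative ones from earlier, not the framework’s actual schema); the real orchestration notebook hands the resulting DAG to notebookutils.runMultiple.

```python
from graphlib import TopologicalSorter

# Tasks as they might look after parsing the YAML (names are illustrative).
tasks = [
    {"name": "bronze_person", "type": "pipeline",
     "artifact": "pl_copyFrom_adventureWorks2022", "depends_on": []},
    {"name": "silver_person", "type": "notebook",
     "artifact": "NB_Load_Silver", "depends_on": ["bronze_person"]},
]

def execution_order(tasks):
    """Return task names in a valid dependency order (upstream tasks first)."""
    graph = {t["name"]: set(t["depends_on"]) for t in tasks}
    return list(TopologicalSorter(graph).static_order())

order = execution_order(tasks)
print(order)  # → ['bronze_person', 'silver_person']
```

Because bronze_person has no predecessors and silver_person depends on it, the sorter always schedules bronze before silver — exactly the “no skipping ahead” guarantee described above.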

Essentially, the notebook acts as an air traffic controller for your data pipelines — calmly determining who takes off and when.

 

The benefits in practice

Moving configurations to YAML has clear benefits:

  • Scalability → easy to add new tasks as projects grow.
  • Maintainability → configs are structured, readable, and versioned.
  • Safer deployments → no more mystery updates to config tables in production.

It keeps your data projects clean, traceable, and easier to debug.

 

Logging and error handling

A metadata-driven framework isn’t complete without proper logging and error handling. In our approach, each task execution writes structured logs into a dedicated logging eventhouse. These logs capture:

  • Task name and layer (Bronze, Silver, Gold)
  • Execution timestamps (start, end, duration)
  • Row counts (processed, inserted, updated)
  • Status (success, failure)

In the event of failure, the framework records the exception details and automatically stops dependent tasks. Because the orchestration builds a DAG, dependencies won’t run if an upstream task fails — ensuring downstream layers aren’t polluted with incomplete data.

Error-handling rules can also be defined in YAML (for example, whether to retry a failed step or skip it with a warning). And once the root cause has been fixed, the framework can even resume execution from the exact task that failed, avoiding the need to reprocess everything from scratch.
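As a sketch, such a rule might sit alongside a task definition like this (key names are illustrative assumptions):

```yaml
# Illustrative error-handling settings -- key names are assumptions.
on_error:
  action: retry          # alternatives might be: skip_with_warning, fail
  max_retries: 3
  retry_delay_seconds: 60
```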

 

DevOps deployment

Another key advantage of using YAML is smooth integration with DevOps pipelines. Since all configurations live in files, they:

  • Travel with the code → stored in Git repos, reviewed via pull requests.
  • Deploy consistently → Fabric deployment pipelines can move YAML from Dev to Test to Prod environments with confidence.
  • Support approvals and rollbacks → if something goes wrong, reverting to a previous YAML version is as simple as rolling back a commit.

This means teams can manage pipeline configurations just like application code – with proper CI/CD practices instead of ad-hoc table edits.

 

Wrapping up

Metadata-driven frameworks have always been designed to simplify data pipeline management. Using YAML in Microsoft Fabric takes that idea further — combining the flexibility of metadata with the safety of version control, structured logging, and modern DevOps deployment practices.

So, the next time someone asks where the configs live, you won’t have to point them to a half-forgotten SQL table. You can show them a YAML file in source control, trace logs in a dedicated eventhouse, and a clean deployment pipeline that moves everything from Dev to Prod without surprises.

This is just the start. In upcoming posts, I’ll go deeper into how logging is structured, how error recovery works, and how DevOps pipelines were set up for YAML deployment. Stay tuned!

Author

Rui Francisco Gonçalves


Senior Specialist
