2 September 2025
Simplifying Metadata Frameworks in Microsoft Fabric with YAML
Exploring YAML-based configuration, eventhouse logging, and CI/CD in Fabric data pipelines.
If you’ve ever worked on a data engineering project, you’ve probably run into “configuration chaos.” Perhaps someone updated a config table without telling anyone. Perhaps a “quick fix” for Project A broke Project B. Or maybe — and let’s be honest here — you ran an UPDATE without a WHERE and suddenly had a very quiet (and very long) evening.
The truth is, managing pipeline configurations at scale is tricky. That’s precisely why metadata-driven frameworks exist.
At their core, they let you move away from hardcoding and towards flexibility. Instead of baking logic into every pipeline, you define what needs to run, where the data lives, where it should land, and how it should get there — all in metadata.
These frameworks can:

- drive many pipelines from a single, generic orchestrator
- onboard a new source or table by adding a few lines of metadata instead of cloning a pipeline
- keep ingestion and transformation logic consistent across projects
Traditionally, configurations live in SQL tables. In Microsoft Fabric, that usually means a warehouse or SQL database, with pipelines reading those tables to drive orchestration.
This works… up to a point. But as projects and teams grow, cracks start to show:

- config tables carry no version history, so changes are hard to trace or roll back
- edits bypass code review, and one team’s “quick fix” can quietly break another team’s pipelines
- keeping Dev, Test, and Prod tables in sync becomes a manual, error-prone chore
Soon, what started as a neat solution becomes hard to maintain.
This is where YAML comes in. YAML files are:

- human-readable, so a reviewer can understand a change at a glance
- plain text, so they version cleanly in Git
- diffable, so every change is traceable in a pull request
Yes, indentation can be unforgiving (we’ve all been betrayed by an extra space), but the trade-offs are worth it.
In our Fabric projects, we use YAML to replace those config tables, aligning with a medallion architecture — Bronze, Silver, Gold — with distinct lakehouses for each layer. YAML acts as the blueprint that orchestrates how data moves through them.
Here’s a simple YAML definition for a bronze extraction:
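A minimal sketch of what such a definition might look like (the field names and values below are illustrative assumptions, not the framework’s exact schema):

```yaml
# Illustrative bronze task definition; field names are assumptions
task: bronze_sales_extract
layer: bronze
source:
  type: sql_server
  connection: conn_sales_db
  table: dbo.Sales
target:
  lakehouse: LH_Bronze
  table: sales_raw
pipeline: pl_generic_ingest
load_type: full
```

Everything a generic ingestion pipeline needs (what to run, where the data lives, and where it should land) is declared here rather than hardcoded.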
What’s happening here?
Think of it as the project’s GPS: it knows where the data starts, where it’s going, and which vehicle (pipeline) is driving it there.
Now let’s see how this looks in silver:
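Again as a sketch with assumed field names, a silver definition might read from the bronze lakehouse and declare a dependency on the bronze task:

```yaml
# Illustrative silver task definition; depends_on is the key addition
task: silver_sales_clean
layer: silver
depends_on:
  - bronze_sales_extract
source:
  lakehouse: LH_Bronze
  table: sales_raw
target:
  lakehouse: LH_Silver
  table: sales_clean
notebook: nb_silver_sales_clean
```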
Here we see a few differences: the source is now the bronze lakehouse rather than an external system, the target is the silver lakehouse, and the task declares an explicit dependency on its bronze predecessor. So even in YAML, Bronze comes before Silver — no skipping ahead.
All of this comes together in an orchestration notebook. Its job is simple (in concept):

- read all the YAML definitions for the project
- build a DAG of tasks from their declared dependencies
- trigger the right pipeline or notebook for each task, in order
- log every run and halt dependent tasks when something fails
Essentially, the notebook acts as an air traffic controller for your data pipelines — calmly determining who takes off and when.
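As a rough sketch of the idea (the names and structure here are mine, not the framework’s actual code), Python’s standard-library graphlib can turn the declared dependencies into a valid run order:

```python
from graphlib import TopologicalSorter

# Hypothetical task definitions, shaped like the dicts yaml.safe_load would
# produce from the config files (inlined here so the sketch is self-contained)
tasks = {
    "bronze_sales_extract": {"depends_on": []},
    "bronze_customers_extract": {"depends_on": []},
    "silver_sales_clean": {"depends_on": ["bronze_sales_extract",
                                          "bronze_customers_extract"]},
    "gold_sales_summary": {"depends_on": ["silver_sales_clean"]},
}

def build_execution_order(tasks: dict) -> list[str]:
    """Build the dependency graph and return one valid execution order."""
    graph = {name: set(spec["depends_on"]) for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())

order = build_execution_order(tasks)
print(order)  # bronze tasks first, then silver, then gold
```

In the real notebook each task in that order would kick off a pipeline or notebook run; the topological sort is what guarantees Bronze always lands before Silver reads it.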
Moving configurations to YAML has clear benefits:

- every change is version-controlled, with an author and a history
- changes arrive through pull requests instead of direct table edits
- the same files promote cleanly from Dev through to Prod
- debugging starts with a readable file, not a query against a config table

It keeps your data projects clean, traceable, and easier to debug.
A metadata-driven framework isn’t complete without proper logging and error handling. In our approach, each task execution writes structured logs into a dedicated logging eventhouse. These logs capture:

- which task ran, and in which layer
- start and end timestamps
- the final status (succeeded, failed, or skipped)
- full exception details when something goes wrong
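As a sketch, one such structured record could be modelled like this (the field names and status values are my assumptions, not the framework’s actual schema):

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class TaskLogRecord:
    """One structured row per task execution; field names are illustrative."""
    task_name: str
    layer: str
    status: str                        # e.g. "Succeeded", "Failed", "Skipped"
    started_at: str
    ended_at: Optional[str] = None
    error_message: Optional[str] = None

    def to_row(self) -> dict:
        """Shape the record for ingestion into the logging eventhouse."""
        return asdict(self)

record = TaskLogRecord(
    task_name="bronze_sales_extract",
    layer="bronze",
    status="Failed",
    started_at=datetime.now(timezone.utc).isoformat(),
    error_message="Login failed for user 'svc_ingest'",
)
print(record.to_row())
```

Keeping each row flat and self-describing like this is what makes the eventhouse easy to query when something fails at 2 a.m.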
In the event of failure, the framework records the exception details and automatically stops dependent tasks. Because the orchestration builds a DAG, dependencies won’t run if an upstream task fails — ensuring downstream layers aren’t polluted with incomplete data.
Error-handling rules can also be defined in YAML (for example, whether to retry a failed step or skip it with a warning). And once the root cause has been fixed, the framework can even resume execution from the exact task that failed, avoiding the need to reprocess everything from scratch.
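For example, a retry policy might sit alongside the task definition (the keys below are hypothetical):

```yaml
# Illustrative error-handling block; key names are assumptions
task: silver_sales_clean
on_failure:
  action: retry            # alternatives might be skip_with_warning or fail_fast
  max_retries: 3
  retry_delay_seconds: 300
```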
Another key advantage of using YAML is smooth integration with DevOps pipelines. Since all configurations live in files, they:

- sit in Git alongside the rest of the codebase
- go through pull requests and code review like any other change
- deploy automatically from Dev to Prod through CI/CD pipelines
This means teams can manage pipeline configurations just like application code – with proper CI/CD practices instead of ad-hoc table edits.
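As a minimal sketch, an Azure DevOps pipeline could lint every config file on each commit before anything is deployed (the folder path and choice of yamllint are assumptions):

```yaml
# Illustrative Azure DevOps pipeline step; the configs/ path is an assumption
trigger:
  branches:
    include: [main]

steps:
  - script: |
      pip install yamllint
      yamllint configs/
    displayName: Lint YAML pipeline configurations
```

A broken indent then fails the build instead of failing a production run.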
Metadata-driven frameworks have always been designed to simplify data pipeline management. Using YAML in Microsoft Fabric takes that idea further — combining the flexibility of metadata with the safety of version control, structured logging, and modern DevOps deployment practices.
So, the next time someone asks where the configs live, you won’t have to point them to a half-forgotten SQL table. You can show them a YAML file in source control, trace logs in a dedicated eventhouse, and a clean deployment pipeline that moves everything from Dev to Prod without surprises.
This is just the start. In upcoming posts, I’ll go deeper into how logging is structured, how error recovery works, and how DevOps pipelines were set up for YAML deployment. Stay tuned!