2 September 2025
Simplifying Metadata Frameworks in Microsoft Fabric with YAML
Exploring YAML-based configuration, eventhouse logging, and CI/CD in Fabric data pipelines.
If you’ve ever worked on a data engineering project, you’ve probably run into “configuration chaos.” Perhaps someone updated a config table without telling anyone. Perhaps a “quick fix” for Project A broke Project B. Or maybe — and let’s be honest here — you ran an UPDATE without a WHERE and suddenly had a very quiet (and very long) evening.
The truth is, managing pipeline configurations at scale is tricky. That’s precisely why metadata-driven frameworks exist.
At their core, they let you move away from hardcoding and towards flexibility. Instead of baking logic into every pipeline, you define what needs to run, where the data lives, where it should land, and how it should get there — all in metadata.
These frameworks can:

- drive many pipelines from a single, generic orchestrator
- onboard a new source or table by adding a few lines of metadata instead of cloning a pipeline
- keep ingestion and transformation logic consistent across projects
Traditionally, configurations live in SQL tables. In Microsoft Fabric, that usually means a warehouse or SQL database, with pipelines reading those tables to drive orchestration.
This works… up to a point. But as projects and teams grow, cracks start to show:

- config tables carry no version history, so changes are hard to trace or roll back
- edits bypass code review, and one team’s “quick fix” can quietly break another team’s pipelines
- keeping Dev, Test, and Prod tables in sync becomes a manual, error-prone chore
Soon, what started as a neat solution becomes hard to maintain.
This is where YAML comes in. YAML files are:

- human-readable, so a reviewer can understand a change at a glance
- plain text, so they version cleanly in Git
- diffable, so every change is traceable in a pull request
Yes, indentation can be unforgiving (we’ve all been betrayed by an extra space), but the trade-offs are worth it.
In our Fabric projects, we use YAML to replace those config tables, aligning with a medallion architecture — Bronze, Silver, Gold — with distinct lakehouses for each layer. YAML acts as the blueprint that orchestrates how data moves through them.
Here’s a simple YAML definition for a bronze extraction:
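A minimal sketch of what such a definition might look like (the field names and values below are illustrative assumptions, not the framework’s exact schema):

```yaml
# Illustrative bronze task definition; field names are assumptions
task: bronze_sales_extract
layer: bronze
source:
  type: sql_server
  connection: conn_sales_db
  table: dbo.Sales
target:
  lakehouse: LH_Bronze
  table: sales_raw
pipeline: pl_generic_ingest
load_type: full
```

Everything a generic ingestion pipeline needs (what to run, where the data lives, and where it should land) is declared here rather than hardcoded.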
What’s happening here?
Think of it as the project’s GPS: it knows where the data starts, where it’s going, and which vehicle (pipeline) is driving it there.
Now let’s see how this looks in silver:
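Again as a sketch with assumed field names, a silver definition might read from the bronze lakehouse and declare a dependency on the bronze task:

```yaml
# Illustrative silver task definition; depends_on is the key addition
task: silver_sales_clean
layer: silver
depends_on:
  - bronze_sales_extract
source:
  lakehouse: LH_Bronze
  table: sales_raw
target:
  lakehouse: LH_Silver
  table: sales_clean
notebook: nb_silver_sales_clean
```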
Here we see a few differences: the source is now the bronze lakehouse rather than an external system, the target is the silver lakehouse, and the task declares an explicit dependency on its bronze predecessor. So even in YAML, Bronze comes before Silver — no skipping ahead.
All of this comes together in an orchestration notebook. Its job is simple (in concept):

- read all the YAML definitions for the project
- build a DAG of tasks from their declared dependencies
- trigger the right pipeline or notebook for each task, in order
- log every run and halt dependent tasks when something fails
Essentially, the notebook acts as an air traffic controller for your data pipelines — calmly determining who takes off and when.
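As a rough sketch of the idea (the names and structure here are mine, not the framework’s actual code), Python’s standard-library graphlib can turn the declared dependencies into a valid run order:

```python
from graphlib import TopologicalSorter

# Hypothetical task definitions, shaped like the dicts yaml.safe_load would
# produce from the config files (inlined here so the sketch is self-contained)
tasks = {
    "bronze_sales_extract": {"depends_on": []},
    "bronze_customers_extract": {"depends_on": []},
    "silver_sales_clean": {"depends_on": ["bronze_sales_extract",
                                          "bronze_customers_extract"]},
    "gold_sales_summary": {"depends_on": ["silver_sales_clean"]},
}

def build_execution_order(tasks: dict) -> list[str]:
    """Build the dependency graph and return one valid execution order."""
    graph = {name: set(spec["depends_on"]) for name, spec in tasks.items()}
    return list(TopologicalSorter(graph).static_order())

order = build_execution_order(tasks)
print(order)  # bronze tasks first, then silver, then gold
```

In the real notebook each task in that order would kick off a pipeline or notebook run; the topological sort is what guarantees Bronze always lands before Silver reads it.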
Moving configurations to YAML has clear benefits:

- every change is version-controlled, with an author and a history
- changes arrive through pull requests instead of direct table edits
- the same files promote cleanly from Dev through to Prod
- debugging starts with a readable file, not a query against a config table

It keeps your data projects clean, traceable, and easier to debug.
A metadata-driven framework isn’t complete without proper logging and error handling. In our approach, each task execution writes structured logs into a dedicated logging eventhouse. These logs capture:

- which task ran, and in which layer
- start and end timestamps
- the final status (succeeded, failed, or skipped)
- full exception details when something goes wrong
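As a sketch, one such structured record could be modelled like this (the field names and status values are my assumptions, not the framework’s actual schema):

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class TaskLogRecord:
    """One structured row per task execution; field names are illustrative."""
    task_name: str
    layer: str
    status: str                        # e.g. "Succeeded", "Failed", "Skipped"
    started_at: str
    ended_at: Optional[str] = None
    error_message: Optional[str] = None

    def to_row(self) -> dict:
        """Shape the record for ingestion into the logging eventhouse."""
        return asdict(self)

record = TaskLogRecord(
    task_name="bronze_sales_extract",
    layer="bronze",
    status="Failed",
    started_at=datetime.now(timezone.utc).isoformat(),
    error_message="Login failed for user 'svc_ingest'",
)
print(record.to_row())
```

Keeping each row flat and self-describing like this is what makes the eventhouse easy to query when something fails at 2 a.m.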
In the event of failure, the framework records the exception details and automatically stops dependent tasks. Because the orchestration builds a DAG, dependencies won’t run if an upstream task fails — ensuring downstream layers aren’t polluted with incomplete data.
Error-handling rules can also be defined in YAML (for example, whether to retry a failed step or skip it with a warning). And once the root cause has been fixed, the framework can even resume execution from the exact task that failed, avoiding the need to reprocess everything from scratch.
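For example, a retry policy might sit alongside the task definition (the keys below are hypothetical):

```yaml
# Illustrative error-handling block; key names are assumptions
task: silver_sales_clean
on_failure:
  action: retry            # alternatives might be skip_with_warning or fail_fast
  max_retries: 3
  retry_delay_seconds: 300
```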
Another key advantage of using YAML is smooth integration with DevOps pipelines. Since all configurations live in files, they:

- sit in Git alongside the rest of the codebase
- go through pull requests and code review like any other change
- deploy automatically from Dev to Prod through CI/CD pipelines
This means teams can manage pipeline configurations just like application code – with proper CI/CD practices instead of ad-hoc table edits.
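As a minimal sketch, an Azure DevOps pipeline could lint every config file on each commit before anything is deployed (the folder path and choice of yamllint are assumptions):

```yaml
# Illustrative Azure DevOps pipeline step; the configs/ path is an assumption
trigger:
  branches:
    include: [main]

steps:
  - script: |
      pip install yamllint
      yamllint configs/
    displayName: Lint YAML pipeline configurations
```

A broken indent then fails the build instead of failing a production run.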
Metadata-driven frameworks have always been designed to simplify data pipeline management. Using YAML in Microsoft Fabric takes that idea further — combining the flexibility of metadata with the safety of version control, structured logging, and modern DevOps deployment practices.
So, the next time someone asks where the configs live, you won’t have to point them to a half-forgotten SQL table. You can show them a YAML file in source control, trace logs in a dedicated eventhouse, and a clean deployment pipeline that moves everything from Dev to Prod without surprises.
This is just the start. In upcoming posts, I’ll go deeper into how logging is structured, how error recovery works, and how DevOps pipelines were set up for YAML deployment. Stay tuned!