What is DBT (Data Build Tool)

12 April 2024

Key takeaways

  • DBT automates data transformation
  • Offers flexibility of implementation
  • Improves the management and documentation of analysis projects

DBT is a transformation workflow. The acronym stands for data build tool.

As such, DBT handles the T in ELT, but it is not actually dbt that performs the transformation. You will need a Postgres database, SQL Server, Redshift, Databricks, Snowflake, BigQuery, or another supported engine, so that dbt can send your transformation processes to be executed by the DB engine.

DBT orchestrates the transformation processes (models), sending them for execution on the DB engine you configured. That means dbt uses SQL, where the SQL flavour depends on the engine you connect it to.

DBT enables data analysts and data engineers to automate the testing and deployment of the data transformation process. This is especially useful because many companies have increasingly complex business logic behind their reporting data. Because a dbt project lives in version control, every change to the underlying logic is recorded, making it easy to trace data and to update or fix the pipeline.

Why DBT

DBT adds much more than just the transformation process itself. Here is a quick list of what you gain when using dbt:

Work with materialisations and adjust them to what you need

In DBT, you work with models that can be materialised as tables, views, incremental models, or ephemeral models. To change the materialisation type, you don’t need to rewrite the DDL as you would when working with plain SQL; you just change the configuration for that model, and DBT takes care of it the next time it runs.
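
A minimal sketch of what this looks like in a model file (model and table names are illustrative):

```sql
-- models/customer_orders.sql
-- the materialisation is declared in config(); switching 'view' to
-- 'table' or 'incremental' is a one-line change, and dbt generates the
-- matching DDL on the next run
{{ config(materialized='view') }}

select
    customer_id,
    count(*) as order_count
from analytics.raw_orders
group by customer_id
```

Changing the materialisation later means editing only the config() line; the model’s SQL stays the same.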
Data lineage from the start

A model is nothing more than an SQL statement, but with dbt, you can use the ref() function to refer to an upstream model. This allows you to determine the execution order of your models, execute your transformations in stages, and extract the complete lineage of your dbt projects.
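
For example, assuming an upstream staging model named stg_orders (illustrative):

```sql
-- models/fct_orders.sql
-- ref() declares the dependency: dbt builds stg_orders first and
-- resolves the reference to the correct schema and relation at run time
select
    order_id,
    customer_id,
    amount
from {{ ref('stg_orders') }}
```

From these ref() calls, dbt derives the dependency graph it uses for execution order and for the lineage view in the generated documentation.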

Document your models

With dbt, your documentation is done where it belongs: close to your models, and version-controlled. Since the project uses the same git repo for versioning, you and your fellow developers will have access to all the documentation created locally.

You can also share it easily: with DBT Core (open source) you can generate the documentation as a static website with dbt docs generate and serve it with dbt docs serve, while DBT Cloud (paid subscription) hosts the documentation for you.

Reuse your code and use control structures

SQL files in dbt can contain Jinja, a lightweight templating language. Using Jinja in SQL provides a way to use control structures, such as if statements and for loops, in your queries. It is very useful for writing repetitive SQL code like this:
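
(A sketch of the kind of example meant here, adapted from the classic dbt payment-methods pattern; table and column names are illustrative.)

```sql
-- models/order_payment_amounts.sql
-- the Jinja for loop expands into one sum(case ...) column per method
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

select
    order_id,
    {% for payment_method in payment_methods %}
    sum(case when payment_method = '{{ payment_method }}' then amount end)
        as {{ payment_method }}_amount,
    {% endfor %}
    sum(amount) as total_amount
from {{ ref('stg_payments') }}
group by order_id
```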
In the above, imagine you had to handle a much longer list of payment methods. With Jinja, you would just add them to the list, and the for loop would take care of the rest.

You can also create macros that you or your colleagues can reuse as needed, eliminating the need to repeat code throughout the project.
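
A sketch of such a macro (the name and logic are illustrative):

```sql
-- macros/cents_to_dollars.sql
-- defined once, callable from any model in the project
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
```

A model would then call it as {{ cents_to_dollars('amount_cents') }}, keeping the conversion logic in a single place.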

Test your models

Tests are tightly integrated into the dbt framework. You can use assertions on your code (generic or custom) to test the results generated by a model and act accordingly, either failing the execution or just warning about it.
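
For instance, a custom (singular) test is just a SQL file under tests/ that selects the rows violating an assertion; dbt fails or warns when the query returns anything (model and column names are illustrative):

```sql
-- tests/assert_no_negative_amounts.sql
-- the test fails (or warns, depending on its severity config) if any
-- rows come back
select *
from {{ ref('fct_orders') }}
where amount < 0
```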

Additionally, you can integrate automated testing packages into dbt (like dbt-expectations), allowing you to expand the scope of tests you can execute on your models.

Use external packages to help your development

DBT’s support for external packages in SQL and Python allows you to build more complex and powerful data pipelines, for example using Python’s pandas library for data transformations:
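
A minimal sketch of a dbt Python model, assuming a Snowpark-backed connection where to_pandas() is available (model and column names are illustrative):

```python
# models/orders_enriched.py
# a dbt Python model: dbt calls model() and materialises the returned
# DataFrame; dbt.ref() hands back the upstream model as a platform
# DataFrame, converted to pandas here via Snowpark's to_pandas()
def model(dbt, session):
    dbt.config(materialized="table")

    orders = dbt.ref("fct_orders").to_pandas()

    # an illustrative pandas transformation
    orders["amount_with_tax"] = orders["amount"] * 1.23

    return orders
```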
Options to use DBT

DBT Core

DBT Core is the open-source version of DBT. You can install it on your local machine using Homebrew (macOS or Linux) or pip, together with the adapter you plan to use, for instance dbt-postgres. You can also install it using Docker or build it from source.

With DBT Core, you manage your dbt infrastructure, versioning, and, depending on your use case, an orchestration platform to run and monitor your jobs on a schedule.

Refer to https://docs.getdbt.com/docs/core/installation for installation instructions.

DBT Cloud

DBT Cloud is a cloud-based service that provides additional functionality on top of DBT Core.

You won’t have to worry about infrastructure, software installation, or managing your database connections and credentials. It provides code versioning, collaboration, and sharing options. You will also have access to automated scheduling and orchestration, as well as advanced analytics and reporting, which allow you to track the performance of your data transformations and identify issues and bottlenecks.

It does come with a cost and may not be necessary for all use cases.

Refer to https://www.getdbt.com/pricing/ for pricing on this offer.

Author

Rui Custódio
Associate Specialist
