Data Build Tool (dbt) is an open source analytics engineering framework that enables teams to transform raw data loaded into a warehouse such as Snowflake, BigQuery, Redshift, or Databricks using SQL-based workflows. dbt is available in two main forms: dbt Core, the free and open source CLI tool, and dbt Cloud, a managed platform that adds scheduling, UI support, collaboration tools, and native integrations. Both options enable teams to bring software engineering best practices, such as version control, automated testing, data lineage tracking, documentation generation, and CI/CD, to the analytics workflow. Its model dependency graphs ensure that transformations are executed in the correct order, while built-in testing and assertions help catch data quality issues early.
By abstracting complex transformations into reusable models and integrating seamlessly with the modern data stack, dbt enables teams to build scalable, trustworthy, and auditable data pipelines. This makes dbt an increasingly popular tool for analytics engineering teams, as it helps them simplify collaborative development, improve visibility into how data flows through the system, and deliver clean, production-grade datasets for analytics and business intelligence.
In this post, we'll show you what dbt is and how it works, best practices for structuring your dbt projects, and how to monitor your dbt pipelines in production.
dbt allows data teams to write modular, testable SQL transformations that run directly in modern data warehouses. Using Jinja templating, dbt enables developers to inject logic such as loops, conditionals, and environment-aware parameters into their SQL models, reducing duplication and making code more reusable across datasets and environments. dbt handles compiling, dependency resolution, and execution, ensuring models run in the correct order. For example, a team might use dbt to transform raw event logs into structured, analytics-ready tables by applying consistent business logic across pipelines.
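For instance, here is a minimal sketch of a Jinja-templated model that loops over a list of payment methods to produce one pivoted column per method; the model and column names (stg_payments, payment_method, amount) are hypothetical:

```sql
-- models/order_payment_amounts.sql (hypothetical model)
-- The Jinja loop expands into one aggregated column per payment method
{% set payment_methods = ['credit_card', 'bank_transfer', 'gift_card'] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount else 0 end)
        as {{ method }}_amount{{ "," if not loop.last else "" }}
    {% endfor %}
from {{ ref('stg_payments') }}
group by order_id
```

Adding a new payment method only requires updating the Jinja list, and dbt recompiles the SQL consistently across every environment.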
To support data quality, dbt includes built-in testing for critical fields, such as checking for null values, uniqueness, or valid categories. These automated tests catch bad data before it reaches dashboards or downstream processes. For example, running a uniqueness test on a field can prevent silent duplication in user-level reporting. Additionally, dbt supports freshness blocks to validate whether a data source has been recently updated, flagging pipelines that may be running on stale data.
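Generic tests such as unique, not_null, and accepted_values are declared in a model's YAML properties, but dbt also supports singular tests written directly in SQL. As a minimal sketch (the stg_users model and user_id column are hypothetical), a singular uniqueness test returns the rows that violate the expectation, and dbt fails the test if any rows come back:

```sql
-- tests/assert_user_id_is_unique.sql (hypothetical singular test)
-- dbt marks this test as failed if the query returns any rows
select
    user_id,
    count(*) as occurrences
from {{ ref('stg_users') }}
group by user_id
having count(*) > 1
```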
dbt also brings visibility to transformation pipelines with automatic dependency tracking and lineage graphs, which can be generated with the command dbt docs generate. These graphs help teams audit data flows, such as tracing a marketing dashboard metric all the way back to raw event ingestion, improving transparency and reducing the surface area for debugging.
Finally, dbt integrates with CI/CD platforms like GitHub Actions, GitLab CI, and Apache Airflow to validate and deploy data pipelines through version control. This makes it easier for data teams to collaborate on shared models, review changes, and enforce testing before deployment, lowering the technical barrier to modern pipeline development and bringing software engineering best practices to analytics workflows.
To get the most out of dbt, it's important to structure your project for clarity, maintainability, and scale. dbt encourages a layered approach to modeling, typically broken into three logical tiers: staging, intermediate, and marts. Each layer serves a distinct purpose in the transformation pipeline, helping teams apply consistent practices as their analytics workflows grow in complexity.
The staging layer is the foundation of any dbt project. These models are lightweight wrappers around raw tables, such as data from Salesforce, Stripe, or internal databases, and serve to clean and prepare source-conformed concepts for downstream usage.
To optimize the staging layer of dbt projects, keep these models simple: rename and recast columns into consistent conventions, perform light cleanup, and avoid joins or business logic, which belong in later layers.
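As a minimal sketch, a staging model might look like the following; the source and column names are hypothetical, and an app_db source is assumed to be declared elsewhere in the project:

```sql
-- models/staging/stg_users.sql (hypothetical staging model)
-- Light cleanup only: rename, cast, and standardize raw columns
select
    id as user_id,
    lower(email) as email,
    cast(created_at as timestamp) as created_at,
    nullif(trim(country_code), '') as country_code
from {{ source('app_db', 'users') }}
```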
The intermediate layer transforms staging data into more business-ready forms by applying joins, filters, and calculated metrics. These models often represent relationships between entities, key transformations, or custom metrics derived from them.
Intermediate models work best when they are narrowly scoped, named after the business logic they encapsulate, and kept out of end-user schemas so that analysts build on marts instead.
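For example, a sketch of an intermediate model that joins staging models and derives an order-level metric might look like this (the stg_orders and stg_payments models and their columns are hypothetical):

```sql
-- models/intermediate/int_orders_with_payments.sql (hypothetical)
-- Join staging models and derive an order-level payment total
with orders as (
    select * from {{ ref('stg_orders') }}
),

payments as (
    select
        order_id,
        sum(amount) as total_paid
    from {{ ref('stg_payments') }}
    group by order_id
)

select
    orders.order_id,
    orders.user_id,
    orders.ordered_at,
    coalesce(payments.total_paid, 0) as total_paid
from orders
left join payments
    on orders.order_id = payments.order_id
```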
The marts layer contains final models that are used directly by analysts, dashboards, and machine learning pipelines. These models should be stable, trusted, and aligned with key business entities.
To keep marts clean and maintainable, teams should build them only from upstream dbt models rather than raw sources, materialize them as tables or incremental models for performance, and document and test them so downstream consumers can rely on them.
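As a rough sketch using the hypothetical models above, a mart might aggregate intermediate results into a business-facing table:

```sql
-- models/marts/fct_user_orders.sql (hypothetical mart)
-- Business-facing rollup of order activity per user
select
    user_id,
    count(order_id) as order_count,
    sum(total_paid) as lifetime_revenue,
    min(ordered_at) as first_order_at,
    max(ordered_at) as most_recent_order_at
from {{ ref('int_orders_with_payments') }}
group by user_id
```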
Beyond structuring models effectively, it's equally important to monitor how your dbt jobs run in production so you can quickly detect and resolve issues. By integrating OpenLineage, an open source framework for metadata and lineage collection, you can automatically capture run-level details from your dbt jobs, such as start and end times, execution status, and upstream/downstream model dependencies and data lineage. This provides a consistent way to detect dbt pipeline failures, monitor job health, and surface anomalies before they impact business-critical dashboards. Datadog actively contributes to OpenLineage's open source ecosystem (e.g., structured dbt log consumption and async HTTP transport), helping extend monitoring and observability capabilities for data pipelines. By adopting OpenLineage alongside dbt and taking advantage of these ecosystem enhancements, you can gain deeper visibility into production runs, enabling proactive alerting, faster root-cause analysis, and more reliable analytics workflows.
In this post, we've explored what dbt is, how it's being used by analytics engineers, and tips for getting the most out of it. Datadog offers integrations with dbt Cloud and dbt Core, the capability to view dbt job execution details in context with their parent Apache Airflow tasks, as well as forthcoming integrations to help you monitor the performance of your dbt runs, visualize model execution, and more.
If you're new to Datadog, sign up for a free trial to get started.