Data Build Tool (DBT) Integration With Snowflake

Author: Rini Shiny M

This blog describes the integration of DBT with Snowflake.

Fig 1. Role of DBT in data analytics

Introduction to DBT

DBT (Data Build Tool), a Python application is a transformation tool, which focuses on the Transformation part in ELT (Extract, Load, Transform) processes. It aims at performing data engineering activities before the data is used for analytics. DBT performs transformation using select statements, which in turn converts into tables and views, thus making the transformation process simple and effective.

DBT comes in two variants

i) DBT CLI — Command Line Interface, which is run by a terminal.
ii)DBT Cloud — A web based application along with an IDE.

Fig 2. Data Build Tool

Key features of DBT

1. Documentation
The documentation offered by DBT is updated as to when models are developed. It is also easily accessible throughout the development. The documentation is generated based on the descriptions provided, dependencies between models, SQL files, Sources, and the tests defined.

2. Data Lineage
The data pipeline in DBT is represented in the form of lineage graphs. This gives proper visibility of the data and how the data maps with the business logic. The complete flow of data from the source to the target system is shown in the graph, thus providing transparency.

3. Version Control
Integration of GIT with DBT has made version control come hand in hand. All the models, tests generated, sources, packages used, and other configurations used in the project are versioned in the connected GIT repository.

4. Testing
dbt comes with prebuilt unique, not null, referential integrity, and accepted value testing. Jinja and SQL can be used to write custom test cases to add more testing features apart from the default tests.

5. Reusability
In DBT, it is possible to develop models that can be reused by using the Jinja framework.

6. Data refreshes within dbt Cloud
Using DBT, it is possible to schedule refresh at the production environment according to the business requirement without the need of an orchestration tool.

Why DBT and Snowflake?

Snowflake is a data warehouse hosted as a Software-as-a-Service (SaaS) which is faster, user friendly, and more flexible than a traditional data warehouse. Snowflake’s data warehouse is developed using a SQL database engine with an architecture specifically designed for the cloud environment. Snowflake is built using multi-cluster, shared data architecture — This makes the Data Storage and Compute layer, the query processing layer separated. DBT is an open source tool that is used to manage the ELT load in Snowflake. DBT can be used with Snowflake for the following features.

Integration of DBT with Snowflake

Step 1: Snowflake account creation
Create an account with Snowflake and note the highlighted details which will be used for database connection with DBT.

Fig 3. Snowflake Worksheet

Step 2: DBT Cloud sign up

Create an account with DBT using Try dbt Cloud Free. Once signed in, create a new project. To set up the created project, refer to the below steps.

Fig 4. DBT Project

Click on begin to proceed. Give an appropriate DBT project name in the next screen and hit continue. In the next screen, select Snowflake from the list of data warehouses.

Step 3: Connecting to Snowflake instance

The following fields are required when creating a Snowflake connection:

Fig 5. Snowflake Connection with DBT

Once the credentials are given, test the connection. On a successful connection, proceed to connect DBT with an empty GIT repository.

References:
https://www.getdbt.com/
https://docs.getdbt.com/docs/introduction