dbt documentation: a comprehensive guide to generating and hosting dbt docs
What is dbt?
dbt is a transformation workflow that helps you get more work done while producing higher-quality results. You can use dbt to modularize and centralize your analytics code, while also providing your data team with guardrails typically found in software engineering workflows.
Read more in the dbt introduction.
dbt documentation
Good documentation for your dbt models will help downstream consumers discover and understand the datasets which you curate for them.
dbt provides a way to generate documentation for your dbt project and render it as a website.
Keeping the dbt documentation up to date and hosting it where users and developers can reach it is a must.
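For a single project, generating the site comes down to one dbt command. The step below is a minimal sketch of how that could look inside a GitHub workflow; the project path is a hypothetical placeholder, and it assumes dbt is already installed and can find a valid profile.

```yaml
# Sketch of a workflow step for ONE dbt project (the path is hypothetical).
- name: "Generate docs for a single dbt project"
  working-directory: projects/my_project  # hypothetical project folder
  run: |
    # Compiles the project and writes index.html, manifest.json and
    # catalog.json into the project's target/ directory by default.
    dbt docs generate
```

Locally, `dbt docs serve` can then be used to preview the generated site in a browser.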
In this comprehensive guide, we will explore how to generate the dbt docs for a project. Sometimes a repository holds more than one dbt project, and we want to keep the documentation for all of them in one place instead of maintaining a separate site for each project.
There is no direct way to generate the docs for multiple projects. To overcome this problem, I have created a GitHub Action that generates the docs for all the projects and produces files ready for hosting on GitHub Pages.
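To make the multi-project problem concrete, here is a rough sketch of doing the same thing by hand in a workflow step. It is not the implementation of the GitHub Action described in this article. It assumes every project sits in its own folder under `projects/`, uses the default `target/` output path, and that dbt can locate a profile (for example through `DBT_PROFILES_DIR`). The `docs/<project>` layout is an assumption of the sketch.

```yaml
# Manual sketch: one docs folder per dbt project (layout is an assumption).
- name: "Generate docs for every dbt project"
  run: |
    set -euo pipefail
    mkdir -p docs
    # Treat every folder that contains a dbt_project.yml as a dbt project
    for project_file in projects/*/dbt_project.yml; do
      project_dir="$(dirname "$project_file")"
      project_name="$(basename "$project_dir")"
      # Run in a subshell so the working directory resets on each iteration
      (cd "$project_dir" && dbt docs generate)
      mkdir -p "docs/$project_name"
      # These three files are what the static docs site needs
      cp "$project_dir/target/index.html" \
         "$project_dir/target/manifest.json" \
         "$project_dir/target/catalog.json" \
         "docs/$project_name/"
    done
```

With this layout each project ends up under its own sub-path of the site, but there is no shared landing page that links them together.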
Generating and deploying documentation
Navigate to the settings of your dbt project's GitHub repository and create an environment to store the environment variables.
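Values stored in an environment become available to a job once the job names that environment. As a rough sketch, plain variables are read through the `vars` context and encrypted secrets through the `secrets` context. The environment name `prod` matches the workflow in this article; storing the password as a secret is an assumption of this sketch, not what the demo does.

```yaml
# Fragment only: the job's steps are omitted.
jobs:
  build:
    runs-on: ubuntu-latest
    environment: prod  # must match the environment created in the repository settings
    env:
      # Plain-text configuration is read from environment variables
      CLICKHOUSE_USER: ${{ vars.CLICKHOUSE_USER }}
      # Assumption: the password is stored as an environment secret instead
      CLICKHOUSE_PASSWORD: ${{ secrets.CLICKHOUSE_PASSWORD }}
```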
Create a new GitHub workflow to generate and deploy the dbt docs to GitHub Pages. We will trigger the workflow on every push to the main branch, and also allow manual runs through workflow_dispatch.
We will use the generate-dbt-docs GitHub Action in the workflow to generate the docs; check its README for all the inputs the action supports. For this demo, we will use two inputs:
- projects_dir: directory in the repository where the dbt projects are located
- docs_dir: directory where the generated dbt docs will be written; the next step uploads this same directory as the Pages artifact
name: "dbt-docs-publish"
on:
  push:
    branches:
      - main
  workflow_dispatch:
jobs:
  build:
    runs-on: ubuntu-latest
    environment: prod # environment name defined in the GitHub repository settings
    # use the variables defined in that environment to set the env vars for the runner
    env:
      DBT_TARGET: ${{ vars.DBT_TARGET }}
      DBT_PROFILES_DIR: ${{ github.workspace }}/projects
      CLICKHOUSE_USER: ${{ vars.CLICKHOUSE_USER }}
      CLICKHOUSE_PASSWORD: ${{ vars.CLICKHOUSE_PASSWORD }}
      CLICKHOUSE_DATABASE: ${{ vars.CLICKHOUSE_DATABASE }}
    steps:
      - name: "Step 01 - Checkout current branch"
        id: step01
        uses: actions/checkout@v3
      - name: "Step 02 - Install dbt"
        id: step02
        run: |
          pip3 install dbt-core dbt-clickhouse
          dbt --version
      - name: "Step 03 - Setup ClickHouse"
        id: step03
        uses: praneeth527/clickhouse-server-action@v1.0.1
        with:
          tag: '23.3.18.15-alpine'
      # https://github.com/marketplace/actions/generate-dbt-docs
      - name: "Step 04 - Generate dbt docs"
        id: step04
        uses: praneeth527/generate-dbt-docs@v1
        with:
          projects_dir: projects
          docs_dir: ${{ github.workspace }}/docs
      - name: "Step 05 - Upload pages to artifact"
        id: step05
        uses: actions/upload-pages-artifact@v3
        with:
          path: ${{ github.workspace }}/docs
  # https://github.com/marketplace/actions/deploy-github-pages-site
  deploy-to-github-pages:
    needs: build
    permissions:
      pages: write
      id-token: write
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
The above workflow has two jobs:
- build: installs the required dbt packages (dbt-core and dbt-clickhouse) and other external dependencies (here, a ClickHouse server), then generates and prepares the dbt docs for upload to GitHub Pages
- deploy-to-github-pages: deploys the artifact uploaded by the previous job to GitHub Pages
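Because the build job sets `DBT_PROFILES_DIR` to the `projects` folder, dbt will look for a `profiles.yml` there. The file below is a minimal sketch of how such a profile could read the variables exported by the workflow. The profile name, the `dev` output name, and the host and port are assumptions for illustration, not values taken from the demo repository.

```yaml
# projects/profiles.yml (hypothetical contents)
my_profile:  # assumption: must match the `profile` key in each dbt_project.yml
  # Falls back to 'dev' when DBT_TARGET is not set
  target: "{{ env_var('DBT_TARGET', 'dev') }}"
  outputs:
    dev:  # assumption: this key must equal the value stored in DBT_TARGET
      type: clickhouse
      host: localhost  # assumption: the ClickHouse step listens on the runner itself
      port: 8123       # assumption: ClickHouse's default HTTP port
      user: "{{ env_var('CLICKHOUSE_USER') }}"
      password: "{{ env_var('CLICKHOUSE_PASSWORD') }}"
      schema: "{{ env_var('CLICKHOUSE_DATABASE') }}"
```

Reading the values through `env_var` keeps credentials out of the repository, so the same profile works locally and in the workflow as long as the variables are set.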
Demo
Repository: https://github.com/praneeth527/dbt-docs-demo
Docs link: https://praneeth527.github.io/dbt-docs-demo/
Summary
Hosting dbt documentation is essential for collaboration and knowledge sharing across teams. We have explored how to use the generate-dbt-docs action to generate docs for multiple projects and host them on GitHub Pages.
Hope this article was helpful. Corrections/suggestions are welcome. Thanks for reading.