#013: How to Automate & Improve Data Quality

Oct 01, 2022

“If code runs and manual checks pass, then we’re good!”

For most of my 8 years on data teams, this was the full extent of our Data Quality checks.

But this leads to sloppy errors that kill user confidence.

Today, I want to share 3 ways you can automate & improve Data Quality by using:

Continuous Integration (CI) workflows
Linters
Task Automations

CI workflows help you automate deployments, testing & docs.

The biggest confidence killer is broken logic or missing data.

And it can take months to regain that trust.

Instead, build workflows to validate changes beforehand.

Soon stakeholders will be focused on new features, not bug fixes.

Example: Use GitHub Actions to deploy & test the changes in every Pull Request.
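
A CI workflow needs a check to run. Here's a minimal sketch, in Python, of the kind of validation a CI job could run against each Pull Request. The names `validate_rows` and `main`, the column names, and the sample rows are all hypothetical; a real setup would read from your warehouse and would probably use a proper testing framework.

```python
def validate_rows(rows, required_columns, key_column):
    """Return a list of human-readable problems found in `rows`."""
    problems = []
    seen_keys = set()
    for index, row in enumerate(rows):
        # Missing data: every required column must hold a non-empty value.
        for column in required_columns:
            if row.get(column) in (None, ""):
                problems.append(f"row {index}: missing value for '{column}'")
        # Broken logic often shows up as duplicated keys after a bad join.
        key = row.get(key_column)
        if key in seen_keys:
            problems.append(f"row {index}: duplicate {key_column} '{key}'")
        seen_keys.add(key)
    return problems


def main():
    """Run the checks on hypothetical sample rows and return an exit code."""
    sample_rows = [
        {"order_id": 1, "amount": 10},
        {"order_id": 2, "amount": 25},
    ]
    problems = validate_rows(sample_rows, ["order_id", "amount"], "order_id")
    for problem in problems:
        print(problem)
    # Non-zero means "fail" to a CI runner.
    return 1 if problems else 0
```

If the script ends with `sys.exit(main())`, any problem produces a non-zero exit code, and the CI run shows up as failed before the change reaches stakeholders.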

Linters establish clear style rules for your code.

Everyone has their own take on the “right” way to code.

But this leads to petty arguments and wasted time.

Linters hard-code styling rules and auto-check that they’re being followed.

The result is more consistent and maintainable code.

Example: SQLFluff (for SQL) or Pylint (for Python)
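
To show what "hard-coding a styling rule" means, here's a toy sketch of a single rule: SQL keywords must be uppercase. This is not how SQLFluff or Pylint work internally. The function `lint_sql` and the short keyword list are made up for illustration, and the check is naive (it would also flag keywords inside strings or comments).

```python
import re

# Hypothetical, deliberately short keyword list for the example.
KEYWORDS = ("select", "from", "where", "join")
KEYWORD_PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)


def lint_sql(sql):
    """Return (line_number, message) pairs for keywords that are not uppercase."""
    violations = []
    for line_number, line in enumerate(sql.splitlines(), start=1):
        for match in KEYWORD_PATTERN.finditer(line):
            word = match.group(0)
            if word != word.upper():
                message = f"keyword '{word}' should be '{word.upper()}'"
                violations.append((line_number, message))
    return violations


for line_number, message in lint_sql("select id\nFROM orders\nwhere id > 1"):
    print(f"line {line_number}: {message}")
```

Once a rule like this lives in code, the tool flags the violation and nobody has to argue about it in review.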

Task automations push, pull and update data on your behalf.

Fair or not, you’re expected to be aware of the full data platform.

But without the right systems, this is an impossible task.

Push notifications and task orchestrators are perfect for this.

Once in place, you’ll feel more in control and can quickly address issues.

Example: Slack notifications from Airflow
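
Here's a rough sketch of what that could look like. It assumes a Slack incoming webhook whose URL is stored in a hypothetical `SLACK_WEBHOOK_URL` environment variable, and a function shaped like an Airflow `on_failure_callback`, which receives a context dictionary. The function names and message format are my own; adjust them to your setup.

```python
import json
import os
import urllib.request


def build_failure_message(dag_id, task_id, error):
    """Build the JSON payload for a Slack incoming webhook."""
    return {"text": f"Task failed: {dag_id}.{task_id}\nError: {error}"}


def send_to_slack(payload, webhook_url):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def notify_slack_on_failure(context):
    """Callback for a failed task; `context` is the dict the orchestrator passes in."""
    task_instance = context["task_instance"]
    payload = build_failure_message(
        task_instance.dag_id,
        task_instance.task_id,
        context.get("exception"),
    )
    # Assumption: the webhook URL lives in an environment variable.
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if webhook_url:
        send_to_slack(payload, webhook_url)
    return payload
```

In Airflow, a function like this can be passed as `on_failure_callback` in a DAG's `default_args`, so every failed task pushes a message to you instead of waiting to be discovered.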

In summary:

Better data quality = Happy stakeholders.

Happy stakeholders = Happy engineers.

Are You Leading a Small (or 1-Person) Data Team?

Get 1:1 guidance to help you build a reliable, modern data architecture at your company rather than feeling overwhelmed trying to do it all on your own.

Learn More: Simple Stack Advising
