#013: How to Automate & Improve Data Quality

newsletter Oct 01, 2022

“If code runs and manual checks pass, then we’re good!”

For most of my 8 years on data teams, this was the extent of Data Quality.

But this leads to sloppy errors that kill user confidence.


Today, I want to share 3 ways you can automate & improve Data Quality by using:

  1. Continuous Integration (CI) workflows

  2. Linters

  3. Task Automations


CI workflows help you automate deployments, testing & docs.

The biggest confidence killer is broken logic or missing data.

And it can take months to regain that trust.

Instead, build workflows that validate changes before they reach production.

Soon stakeholders will be focused on new features, not bug fixes.


Example: Use GitHub Actions to deploy & test every Pull Request.
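
Here is a rough sketch of the kind of check a CI step could run against a changed table. It is not tied to any specific tool. The function names, the column names and the row format (a list of dicts) are all assumptions made for illustration.

```python
def find_problems(rows, required_columns, key_column):
    """Return a list of readable data quality problems (empty if none)."""
    problems = []
    if not rows:
        problems.append("no rows returned")
    seen_keys = set()
    for index, row in enumerate(rows):
        # Missing data: every required column must have a value.
        for column in required_columns:
            if row.get(column) is None:
                problems.append(f"row {index}: missing {column}")
        # Broken logic often shows up as duplicated keys.
        key = row.get(key_column)
        if key is not None:
            if key in seen_keys:
                problems.append(f"row {index}: duplicate {key_column}={key}")
            seen_keys.add(key)
    return problems


def exit_code(problems):
    """A CI step fails when the process exits non-zero."""
    return 1 if problems else 0
```

A workflow step could print the problems and pass `exit_code(problems)` to `sys.exit`, so a failing check blocks the Pull Request.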


Linters establish clear style rules for your code.

Everyone has their own take on the “right” way to code.

But this leads to petty arguments and wasted time.

Linters hard-code styling rules and auto-check that they’re being followed.

The result is more consistent and maintainable code.


Example: SQLFluff (for SQL) or Pylint (for Python)
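
To make the idea concrete, here is a toy linter with two hard-coded rules. It is only a sketch of the concept, not how SQLFluff or Pylint work internally. The rule set and the `lint_sql` name are made up for this example, and it naively matches keywords inside strings and comments too.

```python
import re

# Assumed house rule: these SQL keywords must be upper case.
KEYWORDS = ("select", "from", "where", "join", "group by", "order by")
KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE
)


def lint_sql(sql):
    """Return (line_number, message) pairs for each rule violation."""
    violations = []
    for line_number, line in enumerate(sql.splitlines(), start=1):
        # Rule 1: no trailing whitespace.
        if line != line.rstrip():
            violations.append((line_number, "trailing whitespace"))
        # Rule 2: keywords are upper case.
        for match in KEYWORD_PATTERN.finditer(line):
            word = match.group(0)
            if word != word.upper():
                violations.append(
                    (line_number, f"keyword '{word}' should be upper case")
                )
    return violations
```

The rules live in code, so nobody has to argue about them in review. A real linter adds many more rules, configuration and auto-fixing.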


Task automations push, pull and update data on your behalf.

Fair or not, you’re expected to be aware of the full data platform.

But without the right systems, this is an impossible task.

Push notifications and task orchestrators are perfect for this.

Once in place, you’ll feel more in control and can quickly address issues.


Example: Slack notifications from Airflow
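
One possible way to wire this up is a failure callback that posts to a Slack incoming webhook. This sketch uses only the standard library. The `SLACK_WEBHOOK_URL` environment variable and both function names are assumptions, and the message layout is just one option.

```python
import json
import os
import urllib.request


def build_failure_message(context):
    """Format a Slack message from the context Airflow passes to callbacks."""
    task_instance = context["task_instance"]
    return (
        f"Task failed: {task_instance.dag_id}.{task_instance.task_id}\n"
        f"Error: {context.get('exception')}\n"
        f"Logs: {task_instance.log_url}"
    )


def notify_slack_on_failure(context):
    """Post the failure message to a Slack incoming webhook."""
    # Assumed: the webhook URL is provided through this environment variable.
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    payload = json.dumps({"text": build_failure_message(context)})
    request = urllib.request.Request(
        webhook_url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

In Airflow, a function like this can be passed as `on_failure_callback` in a DAG's `default_args`, so every failed task triggers a notification.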


In summary:

Better data quality = Happy stakeholders.

Happy stakeholders = Happy engineers.
