#013: How to Automate & Improve Data Quality
Oct 01, 2022
“If code runs and manual checks pass, then we’re good!”
For most of my 8 years on data teams, this was the extent of Data Quality.
But this leads to sloppy errors that kill user confidence.
Today, I want to share 3 ways you can automate & improve Data Quality: Continuous Integration (CI) workflows, linters and task automations.
Continuous Integration (CI) workflows
CI workflows help you automate deployments, testing & docs.
The biggest confidence killer is broken logic or missing data.
And it can take months to regain that trust.
Instead, build workflows to validate changes beforehand.
Soon stakeholders will be focused on new features, not bug fixes.
Example: Use GitHub Actions to deploy & test all Pull Request changes.
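To make this concrete, here is a minimal sketch of the kind of validation script a CI job could run against every Pull Request. The column name `order_id`, the helper names and the idea of passing rows in as a list of dicts are all assumptions for illustration; in a real pipeline the rows would come from a query against the build created for the Pull Request.

```python
def check_not_null(rows, column):
    """Return the rows where `column` is missing or None."""
    return [row for row in rows if row.get(column) is None]


def check_unique(rows, column):
    """Return the non-null values of `column` that appear more than once."""
    seen, dupes = set(), []
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # nulls are reported by check_not_null instead
        if value in seen and value not in dupes:
            dupes.append(value)
        seen.add(value)
    return dupes


def run_checks(rows):
    """Run every check and return a list of human-readable failures."""
    failures = []
    # "order_id" is a hypothetical key column, used only for this example.
    if check_not_null(rows, "order_id"):
        failures.append("order_id contains nulls")
    if check_unique(rows, "order_id"):
        failures.append("order_id contains duplicates")
    return failures


def main(rows):
    """Print failures and return an exit code (0 = pass, 1 = fail)."""
    failures = run_checks(rows)
    for failure in failures:
        print(f"FAILED: {failure}")
    return 1 if failures else 0
```

If the script ends by passing the result of `main(rows)` to `sys.exit`, a non-zero exit code fails the CI step, which is what blocks a bad change from being merged.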
Linters establish clear syntax rules for your code.
Everyone has their own take on the “right” way to code.
But this leads to petty arguments and wasted time.
Linters hard-code styling rules and auto-check that they’re being followed.
The result is more consistent and maintainable code.
Example: SQLFluff or Pylint
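To show what "hard-coded styling rules" means in practice, here is a toy SQL linter in a few lines of Python. It is not how SQLFluff or Pylint work internally; the three rules and the `lint_sql` function are made up for illustration, and the keyword check is naive (it would also flag keywords inside string literals).

```python
import re

# Hypothetical rule set: only these keywords are checked for casing.
KEYWORDS = ("select", "from", "where", "join")


def lint_sql(sql):
    """Return a list of (line_number, message) style violations."""
    problems = []
    for number, line in enumerate(sql.splitlines(), start=1):
        # Rule 1: discourage SELECT *.
        if re.search(r"\bselect\s+\*", line, flags=re.IGNORECASE):
            problems.append((number, "avoid SELECT *; list columns explicitly"))
        # Rule 2: keywords must be upper case.
        for keyword in KEYWORDS:
            for match in re.finditer(rf"\b{keyword}\b", line, flags=re.IGNORECASE):
                if match.group(0) != keyword.upper():
                    problems.append(
                        (number, f"keyword '{match.group(0)}' should be upper case")
                    )
        # Rule 3: no trailing whitespace.
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
    return problems
```

Because the rules live in code rather than in someone's head, the same check runs identically for every author, and it can run inside the CI workflow so that style debates never reach code review.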
Task automations push, pull and update data on your behalf.
Fair or not, you’re expected to be aware of the full data platform.
But without the right systems, this is an impossible task.
Push notifications and task orchestrators are perfect for this.
Once in place, you’ll feel more in control and can quickly address issues.
Example: Slack notifications from Airflow
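As a rough sketch of that example, the function below could be used as an Airflow failure callback that posts to a Slack incoming webhook. It assumes the callback receives Airflow's context dictionary with a `task_instance` entry (exposing `dag_id`, `task_id` and `log_url`) and an optional `exception` entry; the function names and the message layout are my own, and the webhook URL is something you would supply. Airflow also ships a Slack provider package, which may be a better fit than hand-rolled HTTP calls.

```python
import json
import urllib.request


def build_failure_message(context):
    """Build the Slack payload from an Airflow-style task context."""
    task_instance = context["task_instance"]
    text = (
        f"Task failed: {task_instance.dag_id}.{task_instance.task_id}\n"
        f"Error: {context.get('exception')}\n"
        f"Logs: {task_instance.log_url}"
    )
    return {"text": text}


def notify_slack(context, webhook_url):
    """POST the failure message to a Slack incoming webhook."""
    body = json.dumps(build_failure_message(context)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    # Returns the HTTP status code so the caller can log it.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

One way to wire this in is to pass a small wrapper that calls `notify_slack` with your webhook URL as the `on_failure_callback` in a DAG's `default_args`, so every task in that DAG reports failures the same way.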
Better data quality = Happy stakeholders.
Happy stakeholders = Happy engineers.