#029: What Makes a Data Stack “Modern”?

Jan 21, 2023

If you work in data, there’s no escaping the term “modern data stack” (MDS).

In fact, any mention of using a “traditional data stack” (TDS) is instantly dismissed.

But what’s the big deal with the MDS and why is it important for engineers to know?

Today, I’m going to break down 3 key differences between the Traditional & Modern data stacks so you can better understand each and form your own opinions:

  1. Tools: All-In-One vs Separate
  2. Hosting: On-Prem vs Cloud
  3. User Focus: IT vs Business

 

Traditional stacks are coupled, modern stacks are modular

While the TDS approach means well, the tools were designed for users in a different time.

Business logic is hidden behind outdated drag-and-drop interfaces and it’s difficult (or impossible) to implement a modern Git workflow.

Over time this leads to a complex, interconnected web of logic that’s hard to debug or update.

If you haven’t had the privilege of working on this type of platform, just trust me - it’s a mess.

To solve this issue, in a modern stack, each step of a pipeline is handled in a separate tool that’s optimized for that component.

This modular approach makes debugging easier, improves individual capabilities and allows for a more dynamic, automated workflow.

Plus, most logic is now written in code (vs drag-and-drop) which allows better version control.
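
To make the modular idea concrete, here’s a minimal sketch in plain Python. It’s not tied to any real tool: the function names, sample rows and in-memory “warehouse” list are all made up for illustration. Each step stands in for a separate tool (ingestion, transformation, loading), and the business rule lives in code you can diff and review in Git.

```python
from typing import Any

Row = dict[str, Any]


def extract() -> list[Row]:
    """Stand-in for an ingestion tool. Returns hard-coded sample rows (hypothetical data)."""
    return [
        {"order_id": 1, "amount": "19.99", "status": "paid"},
        {"order_id": 2, "amount": "5.00", "status": "refunded"},
        {"order_id": 3, "amount": "42.50", "status": "paid"},
    ]


def transform(rows: list[Row]) -> list[Row]:
    """Business logic as code: keep paid orders and cast the amount to a float."""
    return [
        {"order_id": r["order_id"], "amount": float(r["amount"])}
        for r in rows
        if r["status"] == "paid"
    ]


def load(rows: list[Row], target: list[Row]) -> int:
    """Stand-in for a warehouse load. Appends to an in-memory list and returns the row count."""
    target.extend(rows)
    return len(rows)


def run_pipeline(target: list[Row]) -> int:
    """Chain the three steps. Any one of them can be swapped or tested on its own."""
    return load(transform(extract()), target)


if __name__ == "__main__":
    warehouse: list[Row] = []
    loaded = run_pipeline(warehouse)
    print(f"Loaded {loaded} rows")
```

Because each step is its own function, you can debug or replace `transform` without touching `extract` or `load`, which is the same benefit you get from separate tools in a modern stack.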

The downside is that there are now (too) many moving parts, which makes organization and strategy more complex.

 

Traditional stacks stay on-premise, modern stacks move to the cloud.

The rise of cloud computing since the mid-2010s has drastically changed how much and how easily data storage can be scaled.

But this concept applies not just to storage but also to the tools themselves.

Many “modern” tools can be integrated in less than a day as a hosted cloud service (SaaS).

Or you can skip the SaaS license and self-host the open-source version on a cloud provider.

The alternative in a TDS is to host and manage all infrastructure on-premise.

This gives you more ownership and may improve security.

But it comes at the cost of your time spent on maintenance and scheduling upgrades.

With cloud options, maintenance is built in and you can adjust settings through a web interface.

 

Traditional tools scare business users, modern tools excite them.

Data literacy is no longer something just for technical folks.

Most working professionals today not only understand basic data concepts but expect to be included in conversations.

Many will even have direct access to the same tools we as engineers use.

In the past, this was unheard of.

You would never expect a stakeholder to log into SQL Server or Informatica and poke around.

It was a black box of complexity that nobody wanted to touch.

But modern SaaS tools are designed with clean interfaces that abstract away much of the complexity so business users can feel comfortable.

For example, Airbyte, Fivetran, dbt, Snowflake, Prefect, etc. all have nice UIs for anybody to check out.

Of course there are still technical components behind the scenes, but those are only for data engineers to use.

 

While these are 3 high-level differences, there is much more to be covered.

Other topics to check out are data quality, automation and collaboration (collectively DataOps).

For more on this topic in particular, consider checking out this YouTube video.

But hopefully you now have a better understanding of how the TDS & MDS differ when it comes to tools, hosting and user focus.

 

Cheers,

Mike

 


 
