The role of data modeling in the modern data stack

Data Science
January 18, 2023

Modeling behavioral data

Recent years have seen a rebirth of data modeling in the landscape of data science. Transformation is a term that's being thrown around a lot these days. Powerful transformation solutions like dbt and leading ingestion tools like Snowplow have unlocked the power of transforming data for thousands of organizations.

Data practitioners spend a lot of time thinking about how best to model their data, what tools they need and the talent they should hire to perform important transformations. Treating data modeling as an important part of the data stack is gaining popularity. One reason is that it can provide a competitive advantage over businesses that are missing out on this trend.

What is data modeling? 

In essence, data modeling is the process of taking raw data from digital platforms and products, then transforming it into derived tables that can be used for specific purposes, such as reporting or a particular analytics use case.

Data modeling can offer two major benefits:

– It enables you to gather raw data and condense it into usable sets, which can be digested by BI tools.

– It lets you apply business logic to unopinionated data, transforming it into specific, opinionated data that matches your needs.

You can think of these aggregations like 'chunks' of data, carefully selected to ensure the data is used for its designed purpose.
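As a minimal sketch of both benefits, the Python below condenses a handful of raw events into one derived row per user and applies a piece of business logic along the way. The event fields, the `build_user_summary` function, and the rule that a user counts as "engaged" after three or more page views are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical raw, unopinionated events as they might arrive from a tracker.
RAW_EVENTS = [
    {"user_id": "u1", "event": "page_view"},
    {"user_id": "u1", "event": "page_view"},
    {"user_id": "u1", "event": "page_view"},
    {"user_id": "u1", "event": "purchase"},
    {"user_id": "u2", "event": "page_view"},
]

# Assumed business rule for this sketch, not taken from any standard.
ENGAGED_MIN_PAGE_VIEWS = 3


def build_user_summary(events):
    """Condense raw events into one opinionated row per user."""
    # Count each event type per user.
    counts = defaultdict(lambda: defaultdict(int))
    for event in events:
        counts[event["user_id"]][event["event"]] += 1

    # Emit one derived row per user, applying the business rule.
    summary = []
    for user_id in sorted(counts):
        page_views = counts[user_id]["page_view"]
        summary.append({
            "user_id": user_id,
            "page_views": page_views,
            "purchases": counts[user_id]["purchase"],
            "is_engaged": page_views >= ENGAGED_MIN_PAGE_VIEWS,
        })
    return summary


if __name__ == "__main__":
    for row in build_user_summary(RAW_EVENTS):
        print(row)
```

The derived rows are the 'chunk': a BI tool can read `is_engaged` directly, without anyone having to re-derive the rule from raw events.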

The importance of a good data model is that it drives data productivity

Some people say that data is only as good as how much it contributes to your business. How true is this? Let’s look at that idea of "data productivity" on its own and what it means for a business.

One way to define data productivity is as the ratio of operational value gained to data collected: the more of your data that actually feeds decisions, the higher the ratio. For instance:

– When you give your Marketing, Product, and other teams unrestricted access to all the data available, data productivity is high.

– When data isn't flowing freely, when it's coming in from too many sources and is too messy to be useful in decision-making, data productivity is low.

More and more companies are waking up to the opportunities that come from high data productivity. These companies are taking ownership of processes like data modeling and leaving behind one-size-fits-all solutions, or packaged analytics tools.

The rise of the modern data stack and the rise of data modeling are connected

It's no coincidence that concepts like the modern data stack are popping up at the same time as a surge in demand for data transformation solutions.

Many companies are beginning to see the power of owning their data stack from end to end. Automated processes have removed complexity, but they have also removed 'contact' with the organization at key points in the pipeline. When data is passed through a black box, we don't know how it's transformed. And when it comes out, it may not be neatly organized in a way that is useful or productive for our company. For example, for companies that rely on two-sided marketplaces (or SaaS platforms), having aggregated data can be imperative.

If you build your own data model, there are some quick benefits. For example:

  • The data can be transformed into a format that's intelligible to consumers, leading to increased productivity.
  • The information can be viewed at a high level or analyzed in detail for any specific need.
  • Internal teams don't need to rely on analysts to put together data from disparate sources, since it's already in a central system.
  • Data models can apply logic that reflects the unique needs of a business.
  • All employees can work with a single, consistent data source and produce comparable results. This builds a sense of trust and transparency in your organization and gives you all the facts to assess performance.

Building a modern data stack that puts you in control of your data unlocks countless possibilities. It not only means the data you collect is more relevant and can be used by your internal team, it also means you can deploy more powerful use cases with data collected on your own site. This has culminated in the data modeling renaissance.

Data modeling is undergoing a renaissance

Never before have so many new models, new tools, and new discussions come about. It is an exciting time in the field of data modeling, and we are lucky to be alive in this golden age of it.

Analytics engineers help organizations make use of big data by transforming raw data into data that company employees can readily act on. These professionals are critical in the middle layer between ingestion and reporting, as they make sure internal teams have clean and clear access to the information they need to do their jobs. They also need a sound understanding of data engineering: without it, the results could be disastrous for those teams.

Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions.
What is Analytics Engineering, by dbt Labs

The rise of data modeling has provided analytics teams with the ability to take ownership of their 'middle layer' by transforming data in the data warehouse. For many, this is accomplished with SQL, which is often utilized as their primary querying language.
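To make the idea of an in-warehouse SQL transformation concrete, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for a data warehouse. The `raw_events` table, its columns, and the derived `daily_page_views` table are assumptions made up for this example. A real warehouse and a real event schema will look different.

```python
import sqlite3


def make_demo_connection():
    """Create an in-memory database with a few hypothetical raw events."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_events (user_id TEXT, event_name TEXT, event_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?, ?)",
        [
            ("u1", "page_view", "2023-01-17"),
            ("u1", "page_view", "2023-01-17"),
            ("u2", "page_view", "2023-01-17"),
            ("u2", "purchase", "2023-01-17"),
            ("u1", "page_view", "2023-01-18"),
        ],
    )
    return conn


def build_daily_page_views(conn):
    """Transform raw events into a derived table, entirely in SQL."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS daily_page_views;
        CREATE TABLE daily_page_views AS
        SELECT
            event_date,
            COUNT(*) AS page_views,
            COUNT(DISTINCT user_id) AS unique_users
        FROM raw_events
        WHERE event_name = 'page_view'
        GROUP BY event_date;
        """
    )
    # Read the derived table back, one row per day.
    return conn.execute(
        "SELECT event_date, page_views, unique_users "
        "FROM daily_page_views ORDER BY event_date"
    ).fetchall()


if __name__ == "__main__":
    connection = make_demo_connection()
    for row in build_daily_page_views(connection):
        print(row)
```

The raw table is left untouched. The 'middle layer' is the derived table built on top of it, which reporting tools can then query directly.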

A central tool in this data democratization process is dbt, which removes obstacles and lets teams develop models and test them before they're deployed.
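Testing a model before deployment usually means asserting simple properties of its output. dbt ships generic tests such as `unique` and `not_null` for this purpose. The sketch below is not dbt code, just plain Python that illustrates the same two checks. The function names and the sample rows are hypothetical.

```python
def check_not_null(rows, column):
    """Return the rows where `column` is missing or None."""
    return [row for row in rows if row.get(column) is None]


def check_unique(rows, column):
    """Return the non-null values of `column` that appear more than once."""
    seen = set()
    duplicates = set()
    for row in rows:
        value = row.get(column)
        if value is None:
            continue  # null handling is left to check_not_null
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


def run_checks(rows, key_column):
    """Run both checks on a model's output and describe any failures."""
    failures = []
    null_rows = check_not_null(rows, key_column)
    if null_rows:
        failures.append(f"{len(null_rows)} row(s) with null {key_column}")
    duplicate_values = check_unique(rows, key_column)
    if duplicate_values:
        failures.append(f"duplicate {key_column} values: {duplicate_values}")
    return failures


if __name__ == "__main__":
    # Hypothetical model output with one duplicate key and one null key.
    model_rows = [
        {"user_id": "u1", "page_views": 3},
        {"user_id": "u1", "page_views": 5},
        {"user_id": None, "page_views": 1},
    ]
    problems = run_checks(model_rows, "user_id")
    print(problems if problems else "all checks passed")
```

A deployment step could refuse to publish the model whenever `run_checks` returns a non-empty list.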

Tools like dbt make the process of analytics engineering possible. Analytics engineers are the translators of the data age: they take data in its raw format and transform it into 'chunks,' or valuable insights, for marketing and product teams to use. But this can often be tough to achieve.

It isn't always easy to get it right

One of the many advantages of owning the transformation process is that you understand your business and data better than anyone else, so the model can reflect that understanding.

For a project or task of any size, designers and architects can call on us for help. Our work is designed to be easy on the project manager and increase productivity. That's why less data-driven organizations are happy to let a ready-made model do their work for them: they don't want to bother with the headaches of design or prototyping. And while having a ready-made model may seem favorable in the short term, it doesn't work as well in long-term projects. For example, as more and more industries rely on data models for decisions, it can become detrimental to use an external tool when you could instead use one that you've tailored to match your organization's needs. In our next chapter we'll explore how this process should go and what you should remember while designing your own unique data model.