Data Modelling vs Semantic Layer

The Problem with Traditional Data Modelling

In traditional data modeling, analytics-ready datasets were designed primarily for BI tools, often following Kimball star schema principles. These models focused on pre-aggregating data to optimize dashboard performance but led to challenges like:

Inflexibility: Pre-aggregated datasets restricted metric reusability.
Duplication of Business Logic: Metrics were often defined separately in BI tools, leading to inconsistencies.
Performance Issues: Repeated joins and transformations resulted in high query costs and slow reports.

The Role of the Semantic Layer

The semantic layer allows organizations to define metrics, business rules, and aggregations centrally, decoupling transformation from metric definition. This enables:

A single source of truth for business metrics
Consistent calculations across BI and data applications
Faster insights by reducing redundant SQL logic

Key Differences: Traditional vs. Modern Data Modeling

Aspect

Traditional (BI-Ready)

Modern (Metrics-Ready)

Schema Design

Pre-aggregated fact tables

Granular fact tables

Metric Definition

Spread across BI tools

Centralized in semantic layer

Transformation

Done in data modelling tools and BI

Mostly in data modelling, semantic handles metrics

Performance

Heavy BI tool queries

Optimized joins in data modelling, metrics in semantic

Designing Data Models to Support a Semantic Layer

To make data modelling tool work efficiently with a semantic layer, we need to adjust the traditional staging → intermediate → marts model by focusing on:

Reducing Intermediate Models: Avoid unnecessary joins and pre-aggregations.
Creating Granular Fact Tables: Keep data at an atomic level for flexible metric computation.
Standardizing Naming Conventions: Ensure data model names align with semantic layer definitions.

Example: Good vs. Bad Fact Table Design

❌ Bad (Pre-Aggregated, BI-Ready)

SELECT customer_id, DATE_TRUNC('month', order_date) AS month, SUM(order_amount) AS monthly_revenue
FROM stg_orders
GROUP BY customer_id, month;

Issue: This locks the aggregation at a monthly level, making it difficult to compute daily or weekly revenue.

✅ Good (Metrics-Ready, Semantic Compatible)

SELECT order_id, customer_id, order_date, order_amount
FROM stg_orders;

Why? The semantic layer can now dynamically aggregate revenue at different time grains (daily, weekly, monthly).

Building a Semantic Layer That Works with data modelling tool

The semantic layer should be designed to:

Map directly to data model fact & dimension tables
Define business metrics declaratively (not transform data further)
Ensure reusability across multiple BI tools

Example: Defining Metrics in the Semantic Layer

Data Model Table (fct_orders)

SELECT order_id, customer_id, order_date, order_amount FROM stg_orders;

Semantic Layer (metrics.yaml)

metrics: name: total_revenue description: "Total revenue, aggregated at different time grains" calculation: SUM(order_amount) entity: customer_id time_grain: ["day", "week", "month", "year"]

Benefit: The same total_revenue metric can be reused across dashboards and reports without redefining SQL logic.

Best Practices for a Unified Approach

✅ Keep Data Models at the Right Granularity: Avoid pre-aggregations in data model layer; let the semantic layer handle them.

✅ Ensure Referential Integrity: Maintain primary and foreign key relationships in data models.

✅ Validate Business Logic in Data Modelling Tool: Build test cases to enforce data consistency.

✅ Centralize Metric Definitions in the Semantic Layer: Avoid hardcoding business logic in dashboards.

Common Pitfalls and How to Avoid Them

Pitfall

Solution

Pre-aggregating Data in Data Model Layer

Keep fact tables granular; let the semantic layer handle aggregation.

Defining Metrics in BI Instead of Semantic Layer

Centralize metric definitions in a tool-agnostic semantic layer.

Inconsistent Naming Between Data Models & Semantic Models

Standardize naming conventions between data model layer and the semantic layer.

Case Study & Example

Scenario: Tracking Monthly Active Users (MAU)

✅ Data Model (fct_user_activity.sql)

SELECT user_id, event_date, event_type FROM raw_events WHERE event_type IN ('login', 'purchase', 'page_view');

✅ Semantic Model (metrics.yaml)

metrics: name: monthly_active_users description: "Unique users who were active in a given month" calculation: count_distinct entity: user_id time_grain: month filter: event_type IN ('login', 'purchase')

✅ Why this works: Data model tool prepares the raw event data, while the semantic layer defines the MAU metric dynamically.

Summary

By shifting towards a metrics-ready data modelling approach, we:

Separate data transformation (data modelling) from metric definition (semantic layer).
Ensure business metric consistency across tools.
Improve query performance by reducing redundant calculations.
Enable flexibility for future analytics and reporting needs.

Action Items for Data Teams:

Refactor data models to remove pre-aggregations.
Adopt a centralized semantic layer for metric definitions.
Standardize naming conventions across data and semantic models.

PreviousOverview NextCube

Last updated 4 months ago