Data Modelling vs Semantic Layer
The Problem with Traditional Data Modelling
In traditional data modeling, analytics-ready datasets were designed primarily for BI tools, often following Kimball star schema principles. These models focused on pre-aggregating data to optimize dashboard performance but led to challenges like:
Inflexibility: Pre-aggregated datasets restricted metric reusability.
Duplication of Business Logic: Metrics were often defined separately in BI tools, leading to inconsistencies.
Performance Issues: Repeated joins and transformations resulted in high query costs and slow reports.
The Role of the Semantic Layer
The semantic layer allows organizations to define metrics, business rules, and aggregations centrally, decoupling transformation from metric definition. This enables:
A single source of truth for business metrics
Consistent calculations across BI and data applications
Faster insights by reducing redundant SQL logic
Key Differences: Traditional vs. Modern Data Modeling
Schema Design
Pre-aggregated fact tables
Granular fact tables
Metric Definition
Spread across BI tools
Centralized in semantic layer
Transformation
Done in data modelling tools and BI
Mostly in data modelling, semantic handles metrics
Performance
Heavy BI tool queries
Optimized joins in data modelling, metrics in semantic
Designing Data Models to Support a Semantic Layer
To make data modelling tool work efficiently with a semantic layer, we need to adjust the traditional staging → intermediate → marts model by focusing on:
Reducing Intermediate Models: Avoid unnecessary joins and pre-aggregations.
Creating Granular Fact Tables: Keep data at an atomic level for flexible metric computation.
Standardizing Naming Conventions: Ensure data model names align with semantic layer definitions.
Example: Good vs. Bad Fact Table Design
❌ Bad (Pre-Aggregated, BI-Ready)
Issue: This locks the aggregation at a monthly level, making it difficult to compute daily or weekly revenue.
✅ Good (Metrics-Ready, Semantic Compatible)
Why? The semantic layer can now dynamically aggregate revenue at different time grains (daily, weekly, monthly).
Building a Semantic Layer That Works with data modelling tool
The semantic layer should be designed to:
Map directly to data model fact & dimension tables
Define business metrics declaratively (not transform data further)
Ensure reusability across multiple BI tools
Example: Defining Metrics in the Semantic Layer
Data Model Table (fct_orders
)
Semantic Layer (metrics.yaml
)
metrics: name: total_revenue description: "Total revenue, aggregated at different time grains" calculation: SUM(order_amount) entity: customer_id time_grain: ["day", "week", "month", "year"]
Benefit: The same total_revenue metric can be reused across dashboards and reports without redefining SQL logic.
Best Practices for a Unified Approach
✅ Keep Data Models at the Right Granularity: Avoid pre-aggregations in data model layer; let the semantic layer handle them.
✅ Ensure Referential Integrity: Maintain primary and foreign key relationships in data models.
✅ Validate Business Logic in Data Modelling Tool: Build test cases to enforce data consistency.
✅ Centralize Metric Definitions in the Semantic Layer: Avoid hardcoding business logic in dashboards.
Common Pitfalls and How to Avoid Them
Pre-aggregating Data in Data Model Layer
Keep fact tables granular; let the semantic layer handle aggregation.
Defining Metrics in BI Instead of Semantic Layer
Centralize metric definitions in a tool-agnostic semantic layer.
Inconsistent Naming Between Data Models & Semantic Models
Standardize naming conventions between data model layer and the semantic layer.
Case Study & Example
Scenario: Tracking Monthly Active Users (MAU)
✅ Data Model (fct_user_activity.sql
)
✅ Semantic Model (metrics.yaml
)
metrics: name: monthly_active_users description: "Unique users who were active in a given month" calculation: count_distinct entity: user_id time_grain: month filter: event_type IN ('login', 'purchase')
✅ Why this works: Data model tool prepares the raw event data, while the semantic layer defines the MAU metric dynamically.
Summary
By shifting towards a metrics-ready data modelling approach, we:
Separate data transformation (data modelling) from metric definition (semantic layer).
Ensure business metric consistency across tools.
Improve query performance by reducing redundant calculations.
Enable flexibility for future analytics and reporting needs.
Action Items for Data Teams:
Refactor data models to remove pre-aggregations.
Adopt a centralized semantic layer for metric definitions.
Standardize naming conventions across data and semantic models.
Last updated