# Data Modelling vs Semantic Layer

## The Problem with Traditional Data Modelling

In traditional data modeling, analytics-ready datasets were designed primarily for BI tools, often following **Kimball star schema** principles. These models focused on pre-aggregating data to optimize dashboard performance but led to challenges like:

* **Inflexibility**: Pre-aggregated datasets restricted metric reusability.
* **Duplication of Business Logic**: Metrics were often defined separately in BI tools, leading to inconsistencies.
* **Performance Issues**: Repeated joins and transformations resulted in high query costs and slow reports.

## The Role of the Semantic Layer

The **semantic layer** allows organizations to define **metrics, business rules, and aggregations centrally**, decoupling transformation from metric definition. This enables:

* **A single source of truth for business metrics**
* **Consistent calculations across BI and data applications**
* **Faster insights by reducing redundant SQL logic**

## Key Differences: Traditional vs. Modern Data Modeling

| Aspect            | Traditional (BI-Ready)              | Modern (Metrics-Ready)                                 |
| ----------------- | ----------------------------------- | ------------------------------------------------------ |
| Schema Design     | Pre-aggregated fact tables          | Granular fact tables                                   |
| Metric Definition | Spread across BI tools              | Centralized in semantic layer                          |
| Transformation    | Done in data modelling tools and BI | Mostly in data modelling, semantic handles metrics     |
| Performance       | Heavy BI tool queries               | Optimized joins in data modelling, metrics in semantic |

## Designing Data Models to Support a Semantic Layer

To make data modelling tool work efficiently with a semantic layer, we need to adjust the traditional **staging → intermediate → marts** model by focusing on:

1. **Reducing Intermediate Models**: Avoid unnecessary joins and pre-aggregations.
2. **Creating Granular Fact Tables**: Keep data at an atomic level for flexible metric computation.
3. **Standardizing Naming Conventions**: Ensure data model names align with semantic layer definitions.

#### Example: Good vs. Bad Fact Table Design

**❌ Bad (Pre-Aggregated, BI-Ready)**

```sql
SELECT customer_id, DATE_TRUNC('month', order_date) AS month, SUM(order_amount) AS monthly_revenue
FROM stg_orders
GROUP BY customer_id, month;
```

**Issue:** This locks the aggregation at a **monthly level**, making it difficult to compute daily or weekly revenue.

**✅ Good (Metrics-Ready, Semantic Compatible)**

```sql
SELECT order_id, customer_id, order_date, order_amount
FROM stg_orders;
```

**Why?** The semantic layer can now dynamically aggregate revenue at different time grains (daily, weekly, monthly).

## Building a Semantic Layer That Works with data modelling tool

The semantic layer should be designed to:

* **Map directly to data model fact & dimension tables**
* **Define business metrics declaratively (not transform data further)**
* **Ensure reusability across multiple BI tools**

#### Example: Defining Metrics in the Semantic Layer

**Data Model Table (`fct_orders`)**

```sql
SELECT order_id, customer_id, order_date, order_amount FROM stg_orders;
```

**Semantic Layer (`metrics.yaml`)**

* metrics: name: total\_revenue description: "Total revenue, aggregated at different time grains" calculation: SUM(order\_amount) entity: customer\_id time\_grain: \["day", "week", "month", "year"]

**Benefit:** The same **total\_revenue** metric can be reused across dashboards and reports without redefining SQL logic.

## Best Practices for a Unified Approach

✅ **Keep Data Models at the Right Granularity**: Avoid pre-aggregations in data model layer; let the semantic layer handle them.

✅ **Ensure Referential Integrity**: Maintain primary and foreign key relationships in data models.

✅ **Validate Business Logic in Data Modelling Tool**: Build test cases to enforce data consistency.

✅ **Centralize Metric Definitions in the Semantic Layer**: Avoid hardcoding business logic in dashboards.

## Common Pitfalls and How to Avoid Them

| Pitfall                                                   | Solution                                                                        |
| --------------------------------------------------------- | ------------------------------------------------------------------------------- |
| Pre-aggregating Data in Data Model Layer                  | Keep fact tables granular; let the semantic layer handle aggregation.           |
| Defining Metrics in BI Instead of Semantic Layer          | Centralize metric definitions in a tool-agnostic semantic layer.                |
| Inconsistent Naming Between Data Models & Semantic Models | Standardize naming conventions between data model layer and the semantic layer. |

## Case Study & Example

#### Scenario: Tracking Monthly Active Users (MAU)

**✅ Data Model (`fct_user_activity.sql`)**

```sql
SELECT user_id, event_date, event_type FROM raw_events WHERE event_type IN ('login', 'purchase', 'page_view');
```

**✅ Semantic Model (`metrics.yaml`)**

* metrics: name: monthly\_active\_users description: "Unique users who were active in a given month" calculation: count\_distinct entity: user\_id time\_grain: month filter: event\_type IN ('login', 'purchase')

✅ **Why this works:** Data model tool prepares the raw event data, while the semantic layer defines the **MAU metric dynamically**.

## Summary

By shifting towards a **metrics-ready data modelling approach**, we:

* Separate **data transformation (data modelling)** from **metric definition (semantic layer)**.
* Ensure **business metric consistency** across tools.
* Improve **query performance** by reducing redundant calculations.
* Enable **flexibility** for future analytics and reporting needs.

#### Action Items for Data Teams:

* **Refactor data models** to remove pre-aggregations.
* **Adopt a centralized semantic layer** for metric definitions.
* **Standardize naming conventions** across data and semantic models.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.reorc.com/semantic-modeling/data-modelling-vs-semantic-layer.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
