# 3. Create data models

This section introduces the core concepts of data modeling. You'll learn how to create a simple data transformation use case with ReOrc.

## Assets in ReOrc

Assets are the building blocks of your data workflow in ReOrc. You can think of assets as the essential components you'll create and manage throughout your data transformation process. Each type of asset serves a specific purpose, working together to turn your raw data into valuable insights.

In **Data modeling**, you mainly work with these types of assets:

* **Sources**: references to your raw data, allowing you to connect and document your data origins.
* **Models**: queries that process this data, applying transformations to create structured, analytics-ready datasets.
* **Jinja macros and variables**: Jinja templating artefacts that add programming logic to SQL, allowing you to write more dynamic and maintainable transformations.

## Data modeling process

The data modeling process in ReOrc follows a logical flow that helps you build and validate your transformation workflow.

1. **Define your sources**: Start by defining sources that represent your raw data tables. Sources help you describe and document the origins of your data.
2. **Create data models**: With your sources defined, create data models to transform your raw data into useful analytics datasets. Each model represents a specific transformation step that builds on previous steps.
3. **Configure materialization**: Specify how the results of your models will be materialized in your data warehouse.
4. **Add data tests**: Add data tests to the models to ensure they're working correctly. These tests associated with a model are automatically executed every time the model is successfully built.

Your development cycle should include these steps to ensure that transformations work properly before applying them to production data.&#x20;

## Prerequisites

The following walkthrough uses the `jaffle_shop` dataset (a fictional e-commerce store) provided by the dbt Community. You can follow the guide in this repository to generate the data and load it into your target database: [jaffle-shop-generator](https://github.com/dbt-labs/jaffle-shop-generator).

## Walkthrough

Let's go through the data modeling process of ReOrc with hands-on steps.

To begin, from the **Data development** dashboard, open the project that you've created. By default, ReOrc navigates you to the **Design** section, where all transformation activities happen.

<figure><img src="https://786945529-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFTrGKWnjusKCQj11CkuL%2Fuploads%2FpBMarNsJBxTjY6EVnAH5%2FRecurve_project_data_section.png?alt=media&#x26;token=7cc32c3c-6af7-4c36-8cfe-78e3889eb40b" alt=""><figcaption></figcaption></figure>

### Define your sources

Follow these steps:

1. In the **Models** tab, click on the **+** icon and select **Add source**.&#x20;
2. In the opened modal:
   1. Select the connection type.&#x20;
   2. Select the target connection. This is the project connection that you've set up in [2.-create-a-project](https://docs.reorc.com/getting-started/2.-create-a-project "mention")
3. Click **Next**. ReOrc then displays all the tables available from the target connection.

<figure><img src="https://786945529-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFTrGKWnjusKCQj11CkuL%2Fuploads%2FYqM2IyzMSSNa7D23digG%2FRecurve_add_source_select_tables.png?alt=media&#x26;token=de8959e9-b741-4ba0-8143-f343e1172e6a" alt=""><figcaption><p>Select tables from a connection</p></figcaption></figure>

4. Select the desired raw tables or models.&#x20;

   Here we select all tables organized in the `jaffle_shop` schema.

<details>

<summary>jaffle_shop tables</summary>

The `jaffle_shop` schema includes these tables:

* Customers (who place Orders)
* Orders (from those Customers)
* Products (the food and beverages the Orders contain)
* Items (of those Products)
* Supplies (needed for making those Products)
* Stores (where the Orders are placed and fulfilled)

</details>

5. Click **Add source**.

The selected tables will then be added to the **Sources** folder and grouped by the schema name.

<figure><img src="https://786945529-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFTrGKWnjusKCQj11CkuL%2Fuploads%2F3ZNHgjxhVpArYQmI52tf%2FRecurve_added_sources.png?alt=media&#x26;token=a0de20f3-0518-418b-8dfb-def68ff4735d" alt=""><figcaption></figcaption></figure>

### Create data models

To demonstrate the dynamics and modularity of data models, here we're going to create three models:

{% tabs %}
{% tab title="stg\_customers" %}
This stage model standardizes customer data by selecting relevant fields (`customer_id`, `first_name`, and `last_name`) from the raw `customers` table.&#x20;

This uses the Jinja  [`{{ source() }}`](https://docs.reorc.com/data-modeling/jinja-templating) function to reference the raw tables defined in the previous section.

```sql
select 
    id as customer_id,
    name as customer_name

from {{ source('jaffle_shop', 'raw_customers') }}
```

{% endtab %}

{% tab title="stg\_orders" %}
Similar to `stg_customers`, this stage model standardizes order data by selecting and renaming relevant fields (`order_id`, `customer_id`, `order_date`, `status`) from the raw `orders` table.

```sql
select 
    id as order_id,
    customer as customer_id,
    ordered_at as order_date

from {{ source("jaffle_shop", "raw_orders") }}
```

{% endtab %}

{% tab title="customers" %}
This model consolidates customer data with order history.

Here we use the [`{{ ref() }}`](https://docs.reorc.com/data-modeling/jinja-templating) function to make reference to the two staging models.

{% code overflow="wrap" %}

```sql
-- Reference the staged data (stg_customers and stg_orders) to gather customer information.

with customers as (

    select * from {{ ref('stg_customers') }}

),

orders as (

    select * from {{ ref('stg_orders') }}

),

-- Aggregate the order data to calculate metrics per customer

customer_orders as (

    select
        customer_id,

        min(order_date) as first_order_date,
        max(order_date) as most_recent_order_date,
        count(order_id) as number_of_orders

    from orders

    group by 1

),

-- Join aggregated data with the customer information to produce details about each customer and their order history

final as (

    select
        customers.customer_id,
        customers.customer_name,
        customer_orders.first_order_date,
        customer_orders.most_recent_order_date,
        coalesce(customer_orders.number_of_orders, 0) as number_of_orders

    from customers

    left join customer_orders using (customer_id)

)

select * from final
```

{% endcode %}
{% endtab %}
{% endtabs %}

Follow these steps to create each model:

1. In the **Models** tab, click on the **+** icon and select **New SQL model**.
2. Provide the model name and click **Create**.

   The new model is then placed in the **Models** folder.
3. Open the model in the editor and paste in the query.&#x20;

   <figure><img src="https://786945529-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFTrGKWnjusKCQj11CkuL%2Fuploads%2FhbxmHe7sgwMGc4LmA5WD%2FRecurve_query_pasted.png?alt=media&#x26;token=9c820f46-0734-4309-9ca1-17b166fe5812" alt=""><figcaption></figcaption></figure>
4. Click **Save** to confirm the changes.
5. Click **Preview** to view the query output in the **Result** tab.

   <figure><img src="https://786945529-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFTrGKWnjusKCQj11CkuL%2Fuploads%2FniawLrtX2kYKydHZnxOv%2FRecurve_preview_results.png?alt=media&#x26;token=c172e310-cf2b-4caa-a339-2f735f8d8816" alt=""><figcaption></figcaption></figure>
6. Perform the steps above to create the other two models.

Now that we've created two staging models that standardize raw data, and one model that aggregates and consolidates the results, we can view them in **Data linage** to better understand the relationship.

Open a model and toggle on the **Lineage view** option. This will display a DAG (directed acyclic graph) showing the relationship of assets, from raw data to the final downstream model.

Data lineage is achieved through the use of `source()` and `ref()` functions, which automatically track dependencies of assets.

<figure><img src="https://786945529-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFTrGKWnjusKCQj11CkuL%2Fuploads%2FpsIvZUoVwmAM4Gn3rBJP%2FRecurve_customers_data_lineage.png?alt=media&#x26;token=64944e39-840d-4354-baf6-90bf2da86d99" alt=""><figcaption></figcaption></figure>

### Configure materialization

You can specifically configure how a model is materialized within your warehouse.

By default, all models have the **table** materialization.

Follow these steps:

1. Open a model in the editor.
2. Click on ![](https://786945529-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFTrGKWnjusKCQj11CkuL%2Fuploads%2FG92nawkXZhXwxMEsVCya%2FRecurve_info_button.png?alt=media\&token=bf4cf02e-a67b-4898-8975-211410cbfff4).
3. In the **Materialization** field, select a materialization option.

   <figure><img src="https://786945529-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFTrGKWnjusKCQj11CkuL%2Fuploads%2FhqsVLRnpiPJWqaORNnQk%2FRecurve_change_model_materialization.png?alt=media&#x26;token=f3532594-063f-4695-b847-218b847fcdf0" alt=""><figcaption></figcaption></figure>

Continuing with our three example models, we can materialize the staging models as views to ensure they reflect the latest source data and minimize storage costs. On the other hand, the consolidated model can be materialized as table as it is the final model and is queried more frequently.

### Add data tests

{% hint style="info" %}
**Coming soon:** Data tests will be available in the next release.
{% endhint %}

Data tests are simply SQL queries that return failing records, based on the condition that you set. These tests validate the correctness of the transformed data and ensure results from the model meet predefined standards.&#x20;

ReOrc provides a list of built-in tests that you can quickly add to your models.

To add a test to a model, follow these steps:

1. Open a model in the editor.

2. Switch to the **Test cases** tab.

   <figure><img src="https://786945529-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFTrGKWnjusKCQj11CkuL%2Fuploads%2FG6bb6oAyM73FJWxecoaV%2FRecurve_test_case_tab.png?alt=media&#x26;token=765bb74b-b078-482e-b6ad-bbea78f06dde" alt=""><figcaption></figcaption></figure>

3. Click **+Add new**.

4. Select a template and specify the values.

   For example, with the `stg_orders` model, we can add the Empty Value test to verify that no `null` value exists in the date column.&#x20;

   <figure><img src="https://786945529-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FFTrGKWnjusKCQj11CkuL%2Fuploads%2Ft2m4OIUNM7eYPhisuH2S%2FRecurve_not_null_check.png?alt=media&#x26;token=c4ed2f40-188e-40b6-8f1e-e264eca4b31d" alt=""><figcaption></figcaption></figure>

5. Click **Add**.

The new test is added to the model's test case list and is executed every time the model runs in the console or as part of a pipeline.
