Built-in generic tests


ReOrc provides a set of templates for generic tests that cover common use cases. On this page, you can find a description of each test and learn how to incorporate them into your data workflow.

ReOrc's built-in generic tests are built upon the curated tests of the dbt-expectations package. To learn more about the implementation of the tests, visit the dbt-expectations repository.

Built-in test cases involving model columns (column-level tests) will retrieve the columns from the model's schema. Therefore, you must generate the schema before running these test cases. See: Model schema.

Freshness (coming soon)

Freshness

This test checks if the values in a specified time-based column are recent, meaning they fall within a certain interval before the current timestamp.

In Configuration, you need to provide the following:

  • Track data from column: select the target time-based column to check for recency.

  • Date part: the unit of time for the interval.

  • Interval: how far back from the current timestamp the data should be considered recent.

For example, to check that all orders in stg_orders were created within the last 5 months, track data from the ordered_at column, set the date part to month, and set the interval to 5.

Additionally, you can set conditions for rows, for example ordered_at is not null.
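
Because the built-in tests wrap dbt-expectations, the example above corresponds roughly to the following schema YAML sketch (the test name comes from dbt-expectations; ReOrc generates the actual configuration from the UI, so the exact properties may differ):

```yaml
models:
  - name: stg_orders
    columns:
      - name: ordered_at
        tests:
          # pass if ordered_at values fall within the last 5 months
          - dbt_expectations.expect_row_values_to_have_recent_data:
              datepart: month
              interval: 5
              row_condition: "ordered_at is not null"  # optional row filter
```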

Freshness (with group by)

This test checks if the groups of values in a specified time-based column are recent, meaning they fall within a certain interval before the current timestamp.

In Configuration, you need to provide the following:

  • Track data from column: select the target time-based column to check for recency.

  • Date part: the unit of time for the interval.

  • Interval: how far back from the current timestamp the data should be considered recent.

  • Group by: which columns to group by. The test will evaluate each unique combination of the specified columns.

For example, to check that each customer in the stg_orders model has at least one order in the last 24 hours, you can specify:

  • Track data from column: ordered_at

  • Date part: hour

  • Interval: 24

  • Group by: customer_id
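
A rough dbt-expectations equivalent of this grouped check (a sketch, not the exact configuration ReOrc produces):

```yaml
models:
  - name: stg_orders
    tests:
      # each customer_id group must have at least one row in the last 24 hours
      - dbt_expectations.expect_grouped_row_values_to_have_recent_data:
          group_by: [customer_id]
          timestamp_column: ordered_at
          datepart: hour
          interval: 24
```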

Volume

Table Row Between

This test checks if the number of rows in a model falls within a certain range, ensuring that your analytics and reporting are based on a relevant and sufficient dataset.

In Configuration, you need to provide the following:

  • Min: the minimum value of the range.

  • Max: the maximum value of the range.

  • Strictly: whether the min and max boundaries are inclusive or exclusive.

For example, you can check that the number of orders in the stg_orders model is sufficient for analytics by requiring the row count to be between 50 (min) and 200 (max).
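
A sketch of the roughly equivalent dbt-expectations test for this example:

```yaml
models:
  - name: stg_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 50
          max_value: 200
          strictly: false  # false keeps the min/max boundaries inclusive
```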

Table Row Equal

This test checks if the number of rows in a model equals a specified value. This is useful when you expect a specific transformation to produce an exact number of entries.

In Configuration, you need to provide the exact row count value.
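
For illustration, a rough dbt-expectations equivalent with a hypothetical expected count of 100:

```yaml
models:
  - name: stg_orders
    tests:
      - dbt_expectations.expect_table_row_count_to_equal:
          value: 100  # hypothetical expected row count
```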

Completeness

Not null

This test checks that a specified column contains no null values. This is crucial for maintaining data integrity, especially when certain columns are expected to always have valid entries.

In Configuration, you need to select the target column to check for null values.
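
This corresponds to dbt's standard not_null test; a minimal sketch using a hypothetical order_id column:

```yaml
models:
  - name: stg_orders
    columns:
      - name: order_id  # hypothetical column that must always be populated
        tests:
          - not_null
```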

Validity

Numeric value within range

This test checks if each value in a column falls within a defined range. This is particularly useful for identifying outliers or errors in data, ensuring that values do not exceed expected limits.

In Configuration, you need to specify:

  • Track data from column: select the target column to check for valid value.

  • Min: the minimum value of the range.

  • Max: the maximum value of the range.

  • Strictly: whether the min and max boundaries are inclusive or exclusive.

For example, you can check that the orders in the stg_orders model have valid payment amounts by checking whether the order_total column values fall between 50 (min) and 1000 (max).
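
A rough dbt-expectations sketch of this example:

```yaml
models:
  - name: stg_orders
    columns:
      - name: order_total
        tests:
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 50
              max_value: 1000
              strictly: false  # inclusive boundaries
```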

Maximum

This test checks if the maximum value in a specific column falls within a defined range.

In Configuration, you need to specify the following:

  • Track data from column: select the target column to check for valid value.

  • Min: the minimum value of the range.

  • Max: the maximum value of the range.

  • Strictly: whether the min and max boundaries are inclusive or exclusive.

Minimum

This test verifies that the minimum value in a specified column falls within a defined range.

In Configuration, you need to specify the following:

  • Track data from column: select the target column to check for valid value.

  • Min: the minimum value of the range.

  • Max: the maximum value of the range.

  • Strictly: whether the min and max boundaries are inclusive or exclusive.

Median

This test checks if the median value of a specified column — the middle value when data is sorted — falls within a defined range.

In Configuration, you need to specify the following:

  • Track data from column: select the target column to sort and retrieve the median value.

  • Min: the minimum value of the range.

  • Max: the maximum value of the range.

  • Strictly: whether the min and max boundaries are inclusive or exclusive.
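
The Maximum, Minimum, and Median tests map roughly onto the dbt-expectations aggregate _to_be_between tests; a sketch with hypothetical bounds on the order_total column:

```yaml
models:
  - name: stg_orders
    columns:
      - name: order_total
        tests:
          - dbt_expectations.expect_column_max_to_be_between:
              min_value: 500     # hypothetical bounds for the largest order
              max_value: 10000
          - dbt_expectations.expect_column_min_to_be_between:
              min_value: 0       # hypothetical bounds for the smallest order
              max_value: 50
          - dbt_expectations.expect_column_median_to_be_between:
              min_value: 20      # hypothetical bounds for the median order
              max_value: 200
```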

Average

This test checks if the mean (average) value of a specified column falls within a defined range.

In Configuration, you need to specify the following:

  • Track data from column: select the target column to retrieve the average value.

  • Min: the minimum value of the range.

  • Max: the maximum value of the range.

  • Strictly: whether the min and max boundaries are inclusive or exclusive.

For example, you can use the Average test to check for anomalies in customer spending patterns by setting a defined range for order_total column values (between $50 and $1000).
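
The example above sketched as a dbt-expectations test (approximate; ReOrc builds the actual test from the UI configuration):

```yaml
models:
  - name: stg_orders
    columns:
      - name: order_total
        tests:
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 50
              max_value: 1000
```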

Sum

This test checks if the total sum of all values in a specified column falls within a defined range.

In Configuration, you need to specify the following:

  • Track data from column: select the target column to calculate the sum.

  • Min: the minimum value of the range.

  • Max: the maximum value of the range.

  • Strictly: whether the min and max boundaries are inclusive or exclusive.
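
For illustration, a rough dbt-expectations equivalent with hypothetical bounds on the column total:

```yaml
models:
  - name: stg_orders
    columns:
      - name: order_total
        tests:
          - dbt_expectations.expect_column_sum_to_be_between:
              min_value: 10000    # hypothetical lower bound
              max_value: 5000000  # hypothetical upper bound
```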

Compare between two columns

This test checks that, for each row, the values in one column (A) are greater than the values in another column (B). This ensures that the relationship between the two columns holds true across the dataset.

In Configuration, you need to specify the following:

  • Target column A: The column with values to be compared.

  • Equal: whether to allow equality; when enabled, the test passes if column A's values are greater than or equal to column B's values.

  • Target column B: The column with values to be compared against.

For example, by design, the order total (after tax) should always be larger than the subtotal.
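
A rough dbt-expectations sketch of this check, using hypothetical order_total and subtotal columns:

```yaml
models:
  - name: stg_orders
    tests:
      - dbt_expectations.expect_column_pair_values_A_to_be_greater_than_B:
          column_A: order_total   # hypothetical column names
          column_B: subtotal
          or_equal: false  # set to true to allow equality (the "Equal" option)
```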

Text value within enums

This test validates that all distinct values in a specified column are contained within a predefined set of acceptable values. This is crucial for maintaining data integrity and consistency, particularly when dealing with categorical data.

In Configuration, you need to specify the following:

  • Target column: the target column to check for valid values

  • Value list: a list of acceptable values, with each value separated by a new line, that the distinct values in the column must match.
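
For illustration, a rough dbt-expectations sketch with a hypothetical status column and value list:

```yaml
models:
  - name: stg_orders
    columns:
      - name: status  # hypothetical categorical column
        tests:
          - dbt_expectations.expect_column_distinct_values_to_be_in_set:
              value_set: ['placed', 'shipped', 'completed', 'returned']
```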

Matches RegEx

This test checks that column values are strings matching a given regular expression, which is useful for validating formats such as e-mail addresses, phone numbers, or URLs.

In Configuration, you need to specify the following:

  • Target column: the target column to verify values with regular expression.

  • Pattern: the regular expression.

  • Is Raw: true if the regex pattern should be treated as a raw string. This allows you to write regex patterns without worrying about escaping backslashes.

  • Flag: a string of one or more characters that are passed to the regex engine as flags (or parameters). Allowed flags are adapter-specific. A common flag is i, for case-insensitive matching. The default is no flags.

For example, in the stg_items model, we can verify whether the values in the sku column match the correct pattern: ^[A-Za-z]+-\d+$ (a sequence of letters, followed by a dash, and ending with digits).
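
A rough dbt-expectations sketch of the sku example (the flags line is purely illustrative):

```yaml
models:
  - name: stg_items
    columns:
      - name: sku
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: '^[A-Za-z]+-\d+$'
              is_raw: true  # treat the pattern as a raw string (no double escaping)
              flags: i      # optional, adapter-specific; i = case-insensitive
```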

Uniqueness

No duplicates

This test checks that no two rows have the same combination of values in the specified columns, helping to maintain data integrity by preventing duplicate records.

In Configuration, you need to specify the following:

  • Primary keys: the columns to create the combination of values.

  • Set ignore value: determine how combinations with empty values should be handled — either when all values are missing or when any value is missing.

For example, we can check for duplicate products (using a combination of name, sku, and type) within the product model.
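
Sketched as a dbt-expectations test (approximate):

```yaml
models:
  - name: product
    tests:
      - dbt_expectations.expect_compound_columns_to_be_unique:
          column_list: [name, sku, type]
          ignore_row_if: any_value_is_missing  # or all_values_are_missing
```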

Consistency

Data volume within two tables

This test verifies that the number of rows in one table matches the number of rows in another table, which is particularly useful in scenarios where data integrity between related tables is critical.

In Configuration, you need to specify the data model to compare against the current model.

For example, you can use this test to verify that the number of rows in the customers table matches the number of distinct customers in the orders table. This can help ensure that all customers have at least one order recorded.
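
A rough dbt-expectations sketch of this comparison (model names follow the example above; note that the basic form compares total row counts):

```yaml
models:
  - name: customers
    tests:
      - dbt_expectations.expect_table_row_count_to_equal_other_table:
          compare_model: ref("orders")
```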
