Overview


Last updated 15 days ago

The Pipeline module in ReOrc enables you to orchestrate data workflows efficiently. Rather than manually running models to retrieve results, you can define pipelines that automate the entire data transformation process. Pipelines represent your data workflows; with them you can schedule periodic runs, manage dependencies between models, and monitor workflow status in real time.

In ReOrc, a pipeline is another type of asset that can contain multiple models. When you create a pipeline, ReOrc automatically determines the relationships between models, derived from data lineage, and visualizes them as a directed acyclic graph (DAG).
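To illustrate the idea, here is a minimal sketch of how model dependencies taken from lineage form a DAG whose topological order is a valid run order. The model names and the `lineage` mapping are hypothetical examples, not ReOrc's actual internals; the sketch uses Python's standard-library `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each model mapped to the set of upstream
# models it depends on (illustrative names, not from ReOrc).
lineage = {
    "stg_orders": set(),
    "stg_customers": set(),
    "int_order_items": {"stg_orders"},
    "mart_revenue": {"int_order_items", "stg_customers"},
}

# A pipeline run must execute each model only after its upstream
# dependencies; a topological sort of the DAG gives one valid order.
run_order = list(TopologicalSorter(lineage).static_order())
print(run_order)
```

Because the graph is acyclic, every model appears after all of its dependencies, so staging models run first and the final mart runs last.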

The Pipeline module is designed to provide:

  • A high-level view: Pipelines provide an accessible, high-level visualization of the data workflow, allowing data practitioners and stakeholders to easily understand the workflow's objectives and outcomes.

  • End-to-end validation: Pipelines automatically trigger data tests for models and assets during each run as a form of regression analysis. This helps you identify any issues introduced when chaining together several transformations.

  • Simplified troubleshooting: Results and logs are instantly displayed in the Pipeline Health dashboard, where you can inspect details of each task to trace errors and resolve issues efficiently.

Key features

The module currently supports the following features:

  • Modeling pipeline: Create pipelines from data models.

  • Job: Configure jobs and schedule runs from pipelines.