Data lineage

Data lineage tracks how data flows through the transformation process, from sources to models. It shows the dependencies between datasets and the transformations applied to them, so you can see how assets relate to each other and which ones a change will affect.

In ReOrc, data lineage is represented as a directed acyclic graph (DAG). Whenever you edit an asset and reference other assets with the ref() or source() functions, ReOrc automatically records those relationships and reflects them in the DAG view.
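To illustrate how references can turn into a DAG, here is a minimal sketch that scans model SQL for ref() and source() calls and builds an upstream map. It assumes dbt-style Jinja syntax such as {{ ref('model') }} and {{ source('source_name', 'table') }}. The function names, the example models, and the "<source_name>.<table>" node naming are hypothetical and are not ReOrc's actual implementation. A topological sort doubles as the acyclicity check.

```python
import re
from graphlib import CycleError, TopologicalSorter

# Assumed dbt-style Jinja syntax; ReOrc's exact parsing rules may differ.
REF_RE = re.compile(r"\bref\(\s*['\"]([\w.]+)['\"]\s*\)")
SOURCE_RE = re.compile(r"\bsource\(\s*['\"](\w+)['\"]\s*,\s*['\"](\w+)['\"]\s*\)")


def build_lineage(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model name to the set of upstream nodes it references."""
    graph: dict[str, set[str]] = {}
    for name, sql in models.items():
        upstream = set(REF_RE.findall(sql))
        # Hypothetical convention: sources become "<source_name>.<table>" nodes.
        upstream |= {f"{src}.{table}" for src, table in SOURCE_RE.findall(sql)}
        graph[name] = upstream
    return graph


def build_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an upstream-first order; raises CycleError if the graph is not a DAG."""
    return list(TopologicalSorter(graph).static_order())


# Hypothetical example project.
models = {
    "stg_orders": "select * from {{ source('raw', 'orders') }}",
    "stg_customers": "select * from {{ source('raw', 'customers') }}",
    "orders_enriched": (
        "select * from {{ ref('stg_orders') }} o "
        "join {{ ref('stg_customers') }} c on o.customer_id = c.id"
    ),
}
graph = build_lineage(models)
order = build_order(graph)  # sources first, then staging models, then orders_enriched

# A circular reference cannot be represented as a DAG.
try:
    build_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected")
```

Storing each node's parents (rather than its children) matches how ref() is written: a model names what it reads from, so the upstream edges fall directly out of its SQL.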

View lineage

To view data lineage in ReOrc, open an asset in the editor and toggle on the Lineage view option.

Navigation in lineage

Each asset in the lineage is represented as a node, with links showing its relationships to other assets. You can click and drag nodes to reposition them for a clearer view.

Use the toolbar in the bottom-right corner to expand the lineage panel, zoom in and out, and focus on the node of the currently opened asset.

Centric node

By default, the selected asset is the centric node. In large projects with complex transformations, a single asset can have many links. Focusing on one node at a time lets you inspect the asset, review its relationships with upstream and downstream models, and refine the overall transformation plan.

Use the search bar in the top-left corner to search for assets and to adjust how many upstream and downstream layers are shown around the current node.
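Limiting the view to a number of upstream and downstream layers amounts to a depth-limited breadth-first search from the centric node in each direction. The sketch below assumes the lineage is stored as a hypothetical upstream map (node to the set of its parents); the function names, parameters, and example graph are illustrative only and do not describe how ReOrc computes the view.

```python
from collections import deque


def invert(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Build a downstream map (node -> children) from an upstream map (node -> parents)."""
    downstream: dict[str, set[str]] = {}
    for node, parents in graph.items():
        downstream.setdefault(node, set())
        for parent in parents:
            downstream.setdefault(parent, set()).add(node)
    return downstream


def within_layers(adjacency: dict[str, set[str]], start: str, depth: int) -> dict[str, int]:
    """Return nodes reachable from start in at most `depth` hops, mapped to their distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == depth:
            continue  # do not expand past the requested number of layers
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    seen.pop(start)  # the centric node itself is not an upstream/downstream layer
    return seen


def centric_view(graph, center, up=1, down=1):
    """Hypothetical helper: nodes within `up` upstream and `down` downstream layers."""
    return within_layers(graph, center, up), within_layers(invert(graph), center, down)


# Hypothetical example lineage (node -> upstream parents).
graph = {
    "stg_orders": {"raw.orders"},
    "stg_customers": {"raw.customers"},
    "orders_enriched": {"stg_orders", "stg_customers"},
    "revenue_daily": {"orders_enriched"},
    "revenue_monthly": {"revenue_daily"},
}
upstream, downstream = centric_view(graph, "orders_enriched", up=1, down=1)
```

With one layer in each direction, the example returns the two staging models upstream and revenue_daily downstream; raising either limit to 2 adds the raw sources or revenue_monthly respectively.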

Show all lineages

To view the data lineage of all assets in the project, you can toggle on the Show all lineages option.
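A whole-project view usually contains several independent lineages side by side. One common way to arrange such a graph is to place each node in a column one step to the right of its deepest parent, so every link points left to right. The sketch below illustrates that layering under the same hypothetical upstream-map representation; it is an assumed layout strategy, not a description of how ReOrc draws the full lineage.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def layers(graph: dict[str, set[str]]) -> dict[str, int]:
    """Column 0 for nodes without parents; otherwise 1 + the deepest parent's column."""
    level: dict[str, int] = {}
    # static_order() yields every parent before its children.
    for node in TopologicalSorter(graph).static_order():
        parents = graph.get(node, set())
        level[node] = 1 + max((level[p] for p in parents), default=-1)
    return level


def columns(level: dict[str, int]) -> list[list[str]]:
    """Group nodes by column, sorted by name for a stable display order."""
    if not level:
        return []
    cols: dict[int, list[str]] = defaultdict(list)
    for node, lvl in level.items():
        cols[lvl].append(node)
    return [sorted(cols[i]) for i in range(max(level.values()) + 1)]


# Hypothetical project with two unrelated lineages.
graph = {
    "stg_orders": {"raw.orders"},
    "stg_customers": {"raw.customers"},
    "orders_enriched": {"stg_orders", "stg_customers"},
    "revenue_daily": {"orders_enriched"},
    "revenue_monthly": {"revenue_daily"},
    "stg_events": {"raw.events"},
}
layout = columns(layers(graph))
```

Using the deepest parent, rather than any parent, guarantees that a node never shares a column with something it depends on.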
