Advanced pipeline

An advanced pipeline is a special type of pipeline that lets you combine multiple operators and modeling pipelines to create complex data transformation workflows.

Operators are the fundamental units used by the ReOrc orchestrator to perform tasks. In a modeling pipeline, each model is an operator that transforms data by running SQL queries. In an advanced pipeline, however, you can include other types of operators capable of handling broader tasks, such as:

  • Ingesting data from a source to a target data warehouse.

  • Running custom SQL queries for ad-hoc transformations.

  • Sending notifications at various stages of job execution.

Create an advanced pipeline

Follow these steps to create an advanced pipeline:

  1. Navigate to Data Design > Pipelines in your ReOrc project.

  2. Click the + icon and select Create advanced pipeline.

  3. Enter a name for your pipeline and click Confirm.

    An advanced pipeline is created. You’ll now define its nodes and their relationships within the DAG (directed acyclic graph).

  4. Click + Add node. Choose an operator or an existing modeling pipeline.

    Each operator has different configuration requirements. See [documentation link] for details.

  5. Define the relationships (execution order) between nodes by clicking and dragging arrows between them.

Operators

SQL Operator

The SQL Operator executes custom queries directly against your data sources. While SQL models focus on transformation logic and dependencies, the SQL Operator is ideal for cleaning and preparing data before it enters your modeling pipeline.

Configuration:

  • Data Source: the target data source. These are the destinations that you've configured in your ReOrc organization.

  • Database: the specific database where the query will run.

  • SQL Query: the SQL script; it can contain multiple query statements.

For example, you can use it to clean source tables before transforming them in downstream SQL models.
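As a sketch only, the SQL Query field for such a cleanup task might contain something like the following; the table and column names are hypothetical:

```sql
-- Remove rows that would break downstream models
DELETE FROM raw.orders
WHERE order_id IS NULL;

-- Normalize inconsistent country codes in place
UPDATE raw.customers
SET country_code = UPPER(TRIM(country_code))
WHERE country_code IS NOT NULL;
```

Because the field accepts multiple statements, several cleanup steps can run as a single SQL Operator node.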

Transfer Operator

The Transfer Operator allows you to move data from one location to another. For example, you can use it to transfer data from BigQuery to Postgres or from Amazon S3 to Google Cloud Storage.

The operator consists of two tasks: Dump and Load. The Dump task fetches data from a source, while the Load task transfers it to a destination. The configuration, authentication, and parameters vary depending on the source and destination.

For example, we use the Transfer Operator to move data from Google Sheets into our Postgres database, then process the data in a modeling pipeline.
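Conceptually, the Dump and Load tasks follow the pattern in the standalone Python sketch below. This is not the operator's implementation; the sheet export URL, connection string, and table name are placeholders you would replace with your own configuration.

```python
# Conceptual dump-and-load sketch, not the Transfer Operator's actual implementation.
# Assumes a Google Sheet shared for CSV export and a reachable Postgres database.
import pandas as pd
from sqlalchemy import create_engine

# Dump: fetch the data from the source (a sheet exported as CSV).
SHEET_CSV_URL = "https://docs.google.com/spreadsheets/d/<sheet-id>/export?format=csv"  # placeholder
df = pd.read_csv(SHEET_CSV_URL)

# Load: write the data to the destination (a Postgres table).
engine = create_engine("postgresql://user:password@localhost:5432/analytics")  # placeholder
df.to_sql("google_sheet_orders", engine, schema="raw", if_exists="replace", index=False)
```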

Python Operator

The Python Operator allows you to integrate Python code into your data pipelines. This operator is particularly useful in scenarios such as the following (see the sketch after this list):

  • Data Processing: Performing transformations or calculations on datasets.

  • API Interactions: Making API calls to external services or databases.

  • Custom Logic Implementation: Executing business logic that may not fit into standard operators such as the Bash or SQL Operator.
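As an illustration only, a script combining data processing with an API call could look like the sketch below. The function name and webhook endpoint are placeholders, and the exact entry point a Python Operator expects may differ from a plain script.

```python
# Hypothetical example of the kind of script a Python Operator could run;
# the endpoint URL and function name are placeholders, not part of ReOrc.
import json
import urllib.request

def enrich_daily_totals(rows):
    """Aggregate raw rows into per-day totals and post them to an external service."""
    # Data processing: sum amounts per date.
    totals = {}
    for row in rows:
        totals[row["date"]] = totals.get(row["date"], 0) + row["amount"]

    # API interaction: send the aggregated result to a (placeholder) webhook.
    payload = json.dumps({"daily_totals": totals}).encode("utf-8")
    request = urllib.request.Request(
        "https://example.com/webhooks/daily-totals",  # placeholder endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status

if __name__ == "__main__":
    sample = [
        {"date": "2024-01-01", "amount": 120.0},
        {"date": "2024-01-01", "amount": 80.0},
        {"date": "2024-01-02", "amount": 95.5},
    ]
    print(enrich_daily_totals(sample))
```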