Data protection

The Data Protection feature uses data encryption to keep data confidential and to protect its integrity.

This process involves transforming readable data (plaintext) into an unreadable format (ciphertext) using advanced cryptographic algorithms and a secure encryption key. Only authorized parties with the corresponding decryption key can revert the data to its original form.

By encrypting data during storage, transmission, and processing, this feature protects sensitive information from unauthorized access, theft, and tampering.

Some of the use cases include:

  • Analytics querying: unless authorized, a user cannot query privacy-sensitive data, even if they have full access to the database

  • Data querying: an authorized user can run a query that returns the privacy-sensitive data in readable form

  • Access control: the data owner can conveniently grant/revoke access to privacy-sensitive data

In the current version, ReOrc supports data encryption through:

  • Key management: Organization owners or admins can create encryption keys for one of two symmetric algorithms, AES-128 or AES-256.

  • Column-level encryption: Users can apply their database or data warehouse's built-in encryption and decryption functions with the created keys.

Key management

To create an encryption key:

  1. Log in to your ReOrc organization.

  2. Navigate to Data Governance > Data Protection in the sidebar.

  3. Click Create key.

  4. Provide the following details:

    • Key name: This name must be unique and will be used in transformation scripts for encryption and decryption.

    • Description: Optional, but recommended.

    • Encryption algorithm: Choose between AES-128 and AES-256.

  5. Click Create.

AES-128 or AES-256

When choosing between AES-128 and AES-256 for database encryption, consider the trade-off between performance and security:

  • AES-128: Faster encryption and decryption, since it runs 10 rounds per block where AES-256 runs 14 (about 40% more work for AES-256; the measured gap depends on hardware). Suitable for general use cases.

  • AES-256: Provides stronger security and a larger key space, ideal for highly sensitive data, such as financial records or government information.
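The difference between the two options comes down to key length and the number of rounds per block. The sketch below puts numbers on that, using only the Python standard library. ReOrc generates and stores keys for you, so generate_key and compare_variants are hypothetical helpers for illustration, not part of ReOrc.

```python
import secrets

# Key length in bits and rounds per block, as defined by the AES standard.
AES_VARIANTS = {
    "AES-128": {"key_bits": 128, "rounds": 10},
    "AES-256": {"key_bits": 256, "rounds": 14},
}


def generate_key(algorithm: str) -> bytes:
    """Return a random key of the right length (hypothetical helper)."""
    key_bits = AES_VARIANTS[algorithm]["key_bits"]
    return secrets.token_bytes(key_bits // 8)


def compare_variants() -> dict:
    """Summarize how the variants differ in key space and per-block work."""
    small, large = AES_VARIANTS["AES-128"], AES_VARIANTS["AES-256"]
    return {
        # Each extra key bit doubles the number of possible keys.
        "key_space_ratio": 2 ** (large["key_bits"] - small["key_bits"]),
        # Extra rounds AES-256 runs per block, as a percentage of AES-128's.
        "extra_rounds_pct": 100 * (large["rounds"] - small["rounds"]) // small["rounds"],
    }


if __name__ == "__main__":
    for name in AES_VARIANTS:
        print(name, "key length:", len(generate_key(name)), "bytes")
    print(compare_variants())
```

The round count is only a rough proxy for speed. On CPUs with hardware AES support the practical difference is often smaller, so it is worth measuring on your own workload before choosing AES-128 purely for performance.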

Encrypt and decrypt data

ReOrc currently supports data encryption only in the operators of advanced pipelines.

In SQL

You can use the {{ secret('key_name') }} macro to reference your encryption key in SQL snippets. Use your database's encryption functions in a SQL operator as follows:

-- Encrypt raw data using the secret key 'key_name'
SELECT AES_ENCRYPT(raw_column, {{ secret('key_name') }}) AS encrypted_data FROM my_table;

-- Decrypt data using the same key
SELECT AES_DECRYPT(encrypted_column, {{ secret('key_name') }}) AS decrypted_data FROM my_table;

In Python

You can use the aes_encrypt and aes_decrypt utility functions. For example, in a Transfer operator, you can encrypt column data before loading it into the destination:

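class MyTransformer(Transformer):
    # Called once per row before the row is loaded into the destination.
    def transform_impl(self, row: dict, *args, **kwargs):
        # Encrypt the phone number with the ReOrc key named 'key_name'.
        row['phone_number_encrypted'] = aes_encrypt(row['phone_number'], 'key_name')
        return row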

transformer = MyTransformer()
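The reverse direction follows the same pattern. The sketch below assumes that aes_decrypt takes the same (value, key name) arguments as aes_encrypt, which this page does not confirm. So that the example runs outside ReOrc, it defines stand-ins for Transformer, aes_encrypt, and aes_decrypt. Those stand-ins are hypothetical, use a reversible encoding rather than real encryption, and should be dropped inside a real operator.

```python
import base64


# --- Stand-ins so the sketch runs outside ReOrc. NOT real encryption. ---
class Transformer:
    """Hypothetical stand-in for ReOrc's Transformer base class."""

    def transform(self, row: dict) -> dict:
        # Hypothetical driver method: pass a copy of the row to transform_impl.
        return self.transform_impl(dict(row))


def aes_encrypt(value: str, key_name: str) -> str:
    """Placeholder that only mimics the call shape of the ReOrc utility."""
    return base64.b64encode(f"{key_name}:{value}".encode()).decode()


def aes_decrypt(value: str, key_name: str) -> str:
    """Placeholder inverse of the stand-in above (signature is assumed)."""
    decoded = base64.b64decode(value).decode()
    prefix = f"{key_name}:"
    if not decoded.startswith(prefix):
        raise ValueError("value was not produced with this key name")
    return decoded[len(prefix):]


# --- The pattern itself ---
class DecryptPhoneNumber(Transformer):
    def transform_impl(self, row: dict, *args, **kwargs):
        encrypted = row.get("phone_number_encrypted")
        # Leave rows without an encrypted value untouched instead of failing.
        if encrypted is not None:
            row["phone_number"] = aes_decrypt(encrypted, "key_name")
        return row


if __name__ == "__main__":
    encrypted_row = {"phone_number_encrypted": aes_encrypt("555-0100", "key_name")}
    print(DecryptPhoneNumber().transform(encrypted_row))
```

Checking for a missing value before decrypting keeps a single empty row from stopping the whole transfer. Whether to skip, log, or reject such rows is a choice that depends on your pipeline.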
