Redshift setup

Recurve's Redshift connector enables you to seamlessly integrate your Amazon Redshift data warehouse as a destination for your analytics database. This guide walks you through establishing a secure connection using Redshift credentials, from initial setup, data preparation, to your first successful connection.

You'll learn how to:

  • Set up a sample dataset in your Amazon Redshift cluster

  • Establish a connection between Recurve and Redshift

Prerequisites

  • You already belong to a Recurve organization

  • You have an AWS account with administrative access

  • You have an existing Redshift cluster or permissions to create one

Prepare sample dataset

To demonstrate the connetion process, we'll be using the jaffle_shop dataset — a fictional e-commerce store's data provided by the dbt community. This dataset offers a practical example of typical e-commerce data structures that mirror real-world scenarios.

You can use the jafgenarrow-up-right CLI tool generate synthetic data for any specified year range.

The dataset includes these tables:

  • Customers (who place Orders)

  • Orders (from those Customers)

  • Products (the food and beverages the Orders contain)

  • Order Items (of those Products)

  • Supplies (needed for making those Products)

  • Stores (where the Orders are placed and fulfilled)

  • Tweets (Customers sometimes issue Tweets after placing an Order)

Or simply download the generated data below:

file-archive
79MB

Create a Redshift cluster

Redshift cluster is the managed data warehouse solution of AWS. We'll be setting the cluster and S3 storage using AWS CloudFormation.

Follow these steps:

  1. Sign in to your AWS Management Console as a root user or an IAM user depending on your level of access.

  2. Navigate to the CloudFormation section to set up Redshift. AWS CloudFormation lets you create AWS resources using code instead of clicking through the AWS console. You can use a template file (in YAML or JSON) that describes your desired infrastructure. You can just click Create stackarrow-up-right and use the create-dbtworkshop-infrarrow-up-right template provided by the AWS-dbt community.

  3. Click Next for the next three steps. You leave the other options by default.

  4. Submit the information and you should see the stack page displays CREATE_IN_PROGRESS status.

  5. Once the stack creation is done (CREATE_COMPLETE), switch to the Outputs tab to view the credentials. These credentials will be used to access Redshift data warehouse.

  6. Now that Redshift cluster is set up via CloudFormation, you can navigate to the Redshift dashboardarrow-up-right to confirm the change.

  7. Click on the cluster and select query data to access the query editor. Here we choose the Query editor v2. You might be asked to configure account. This is where you can use the credentials generated from stack creation:

    • Authentication (select Database username and password)

    • Database

    • User name

    • Password

  8. Click Create connection.

Load data

Now that the infrastructure is ready, we can start loading the data.

The CloudFormation stack already creates an S3 bucket, where we can upload the CSV files.

Follow these steps:

  1. Navigate to the S3 dashboard and select the dbt-data-lake-xxxx bucket.

  2. Click Upload > Add files and select all the CSV files of the Jaffle Shop dataset. Click Upload.

  3. Once the upload process is done, you can select each object and click Copy S3 URI. This URI will be later used to refer to these S3 objects in the query editor.

  4. Navigate back to the query editor, in a query tab, input and run the following script to create the database and table schemas:

chevron-rightCreate schemashashtag

  1. Next we need to populate the tables with data from the S3 bucket. Open another query tab, input and run the script below. Replace the S3 URIs and IAM role with your own.

chevron-rightPopulate datahashtag
  1. You can verify that the data is loaded successfully by running queries like:

Connect Recurve to Redshift

Last updated