# Redshift setup

Recurve's Redshift connector enables you to seamlessly integrate your Amazon Redshift data warehouse as a destination for your analytics database. This guide walks you through establishing a secure connection using Redshift credentials, from initial setup, data preparation, to your first successful connection.

You'll learn how to:

* Set up a sample dataset in your Amazon Redshift cluster
* Establish a connection between Recurve and Redshift

## Prerequisites

* You already belong to a Recurve organization
* You have an AWS account with administrative access
* You have an existing Redshift cluster or permissions to create one

## Prepare sample dataset

To demonstrate the connetion process, we'll be using the `jaffle_shop` dataset — a fictional e-commerce store's data provided by the dbt community. This dataset offers a practical example of typical e-commerce data structures that mirror real-world scenarios.&#x20;

You can use the [jafgen](https://github.com/dbt-labs/jaffle-shop-generator) CLI tool generate synthetic data for any specified year range.

The dataset includes these tables:

* Customers (who place Orders)
* Orders (from those Customers)
* Products (the food and beverages the Orders contain)
* Order Items (of those Products)
* Supplies (needed for making those Products)
* Stores (where the Orders are placed and fulfilled)
* Tweets (Customers sometimes issue Tweets after placing an Order)

Or simply download the generated data below:

{% file src="/files/oYDfqEhOrOPCXZBLY5sJ" %}

## Create a Redshift cluster

Redshift cluster is the managed data warehouse solution of AWS. We'll be setting the cluster and S3 storage using AWS CloudFormation.

Follow these steps:

1. Sign in to your AWS Management Console as a root user or an IAM user depending on your level of access.
2. Navigate to the CloudFormation section to set up Redshift. AWS CloudFormation lets you create AWS resources using code instead of clicking through the AWS console. You can use a template file (in YAML or JSON) that describes your desired infrastructure. You can just click [**Create stack**](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/new?stackName=dbt-workshop\&templateURL=https://tpch-sample-data.s3.amazonaws.com/create-dbtworkshop-infr) and use the [create-dbtworkshop-infr](https://github.com/aws-samples/aws-modernization-with-dbtlabs/blob/main/resources/cloudformation/create-dbtworkshop-infr) template provided by the AWS-dbt community.

   <figure><img src="/files/lDQkiEU3gO5rIqwbJtBv" alt=""><figcaption></figcaption></figure>
3. Click **Next** for the next three steps. You leave the other options by default.
4. Submit the information and you should see the stack page displays CREATE\_IN\_PROGRESS status.
5. Once the stack creation is done (CREATE\_COMPLETE), switch to the **Outputs** tab to view the credentials. These credentials will be used to access Redshift data warehouse.
6. Now that Redshift cluster is set up via CloudFormation, you can navigate to the [Redshift dashboard](https://console.aws.amazon.com/redshiftv2/home#/dashboard) to confirm the change.&#x20;

   &#x20;

   <figure><img src="/files/ZBX95UxL9SqmC1Qk3YRS" alt=""><figcaption></figcaption></figure>
7. Click on the cluster and select query data to access the query editor. Here we choose the **Query editor v2**. You might be asked to configure account. This is where you can use the credentials generated from stack creation:
   * Authentication (select Database username and password)
   * Database
   * User name
   * Password
8. Click **Create connection**.

## Load data

Now that the infrastructure is ready, we can start loading the data.

The CloudFormation stack already creates an S3 bucket, where we can upload the CSV files.

Follow these steps:

1. Navigate to the S3 dashboard and select the `dbt-data-lake-xxxx` bucket.

2. Click **Upload** > **Add files** and select all the CSV files of [the Jaffle Shop dataset](/getting-started/1.-set-up-a-connection/redshift-setup.md#prepare-sample-dataset). Click **Upload**.

3. Once the upload process is done, you can select each object and click **Copy S3 URI**. This URI will be later used to refer to these S3 objects in the query editor.

   <figure><img src="/files/0CtBXr9jRnNQR20mIIC6" alt=""><figcaption></figcaption></figure>

4. Navigate back to the query editor, in a query tab, input and run the following script to create the database and table schemas:

<details>

<summary>Create schemas</summary>

```sql
create schema if not exists jaffle_shop;

-- create customers table
create table jaffle_shop.customers (
    id varchar(36),
    name varchar(100)
);

-- create items TABLE
create table jaffle_shop.items (
    id varchar(36),
    order_id varchar(36),
    sku varchar(10)
);

-- create stores table
create table jaffle_shop.stores (
    id varchar(36),
    name varchar(100),
    opened_at timestamp,
    tax_rate decimal(5,4)
);

-- create products table
create table jaffle_shop.products (
    sku varchar(50),
    name varchar(100),
    type varchar(50),
    price decimal(10,2),
    description text
);

-- create orders table
create table jaffle_shop.orders (
    id varchar(36),
    customer varchar(36),
    ordered_at timestamp,
    store_id varchar(36),
    subtotal decimal(10,2),
    tax_paid decimal(10,2),
    order_total decimal(10,2)
);

-- create supplies table
create table jaffle_shop.supplies (
    id varchar(36),
    name varchar(100),
    cost decimal(10,2),
    perishable boolean,
    sku varchar(50)
);

-- create tweets table
create table jaffle_shop.tweets (
    id varchar(36),
    user_id varchar(36),
    tweeted_at timestamp,
    content varchar(1000)
);
```

</details>

5. Next we need to populate the tables with data from the S3 bucket. Open another query tab, input and run the script below. Replace the S3 URIs and IAM role with your own.

<details>

<summary>Populate data</summary>

```sql
-- Load customers data
copy jaffle_shop.customers(id, name)
from 's3://dbt-data-lake-xxxx/raw_customers.csv'
iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName'
region 'us-east-1'
delimiter ','
ignoreheader 1
acceptinvchars;

-- Load stores data
copy jaffle_shop.stores(id,name,opened_at,tax_rate)
from 's3://dbt-data-lake-xxxx/raw_stores.csv'
iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName'
region 'us-east-1'
delimiter ','
timeformat 'YYYY-MM-DDTHH:MI:SS'
ignoreheader 1
acceptinvchars;

-- Load products data
copy jaffle_shop.products(sku,name,type,price,description)
from 's3://dbt-data-lake-xxxx/raw_products.csv'
iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName'
region 'us-east-1'
delimiter ','
ignoreheader 1
acceptinvchars
CSV QUOTE '"';

-- Load orders data
copy jaffle_shop.orders(id,customer,ordered_at,store_id,subtotal,tax_paid,order_total)
from 's3://dbt-data-lake-xxxx/raw_orders.csv'
iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName'
region 'us-east-1'
delimiter ','
timeformat 'YYYY-MM-DDTHH:MI:SS'
ignoreheader 1
acceptinvchars;

-- Load items data
copy jaffle_shop.items(id,order_id,sku)
from 's3://dbt-data-lake-xxxx/raw_items.csv'
iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName'
region 'us-east-1'
delimiter ','
ignoreheader 1
acceptinvchars;

-- Load supplies data
copy jaffle_shop.supplies(id,name,cost,perishable,sku)
from 's3://dbt-data-lake-xxxx/raw_supplies.csv'
iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName'
region 'us-east-1'
delimiter ','
ignoreheader 1
acceptinvchars;

-- Load tweets data
copy jaffle_shop.tweets(id,user_id,tweeted_at,content)
from 's3://dbt-data-lake-xxxx/raw_tweets.csv'
iam_role 'arn:aws:iam::XXXXXXXXXX:role/RoleName'
delimiter ','
timeformat 'YYYY-MM-DDTHH:MI:SS'
ignoreheader 1
acceptinvchars
CSV QUOTE '"';
```

</details>

6. You can verify that the data is loaded successfully by running queries like:

```sql
select * from jaffle_shop.customers;
select * from jaffle_shop.stores;
select * from jaffle_shop.items;
```

## Connect Recurve to Redshift


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.reorc.com/getting-started/1.-set-up-a-connection/redshift-setup.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.