Skip to content

Redshift OLAP Team

Quintin Donnelly edited this page Mar 7, 2019 · 6 revisions

Biforce Brand

Goals

  • Establish warehousing solution for the data.
  • Migrate existing date into the data warehouse
  • Visualize data in data warehouse as well as predictions based on the data.
  • Provide notification via email based on the status of data.

Implementation

S3

  • Amazon Simple Storage Service (Amazon S3) is storage for the internet. You can use Amazon S3 to store and retrieve any amount of data at any time, from anywhere on the web.

  • To set up access to the S3 bucket:

  • Within S3:

    • Can copy files to and from S3 with $aws s3 cp <current_location> <target_location>
      • S3 locations must be formatted as 's3://bucket-name/path', local locations can be relative paths
    • Can create folders in the bucket implicitly:
      • When copying a file to s3 or moving a file within s3, add an extra step to the target location path to create a folder with that name i.e. s3://bucket-name/new-folder/file
  • In addition to using AWS CLI, the S3 bucket can be accessed via S3 Browser

RedShift

  • Fast, simple, cost-effective data warehouse that can extend queries to your data lake

  • To set up a connection to Redshift:

    • Any database tool compatible with Redshift is sufficient to connect to the database
    • When configuring a connection, select Redshift as the driver and input the endpoint, username, and password
      • Test the connection, if successful, everything should be ready
  • List of Redshift commands: https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_commands.html

Metabase

  • Metabase is a free and open-source analytics tool that provides data visualization.

  • Metabase also provides various ways to share information to other users.

  • Metabase is only a tool for connecting to a database and visualizing data. It does not support its own data hosting.

  • Features:

    • Metabase is simple to set up (e.g. running Metabase locally only requires downloading and running a jar file provided on their web page).
    • A query builder that uses drop-down boxes, checkboxes, custom expressions, etc. to create "questions" that Metabase will answer. This feature is specially tailored for non-technical users.
    • For more advanced users, Metabase provides an SQL scripting engine to ask questions that are out of the scope of its query builder.
    • A Slack bot and SMTP server capabilities are provided for automatic alerts (pulses).
    • Metabase is able to connect to most of the major RDBMS's, including AWS Redshift.
  • Metabase was used for the following:

    • Querying the data within Redshift
    • Creating a dashboard to visualize the Spark Team's analysis
    • Configuring an email-based alert system
  • For additional information about Metabase, refer to the markdown file in the Redshift-OLAP team folder called "metabase-detailed-guide."

Clone this wiki locally