Skip to content

Data modeling, engineering, and analysis on a group of CSV files about employees at a hypothetical company. Uses SQL to visualize table relationships and query certain information on the combined data.

Notifications You must be signed in to change notification settings

GMyers95/sql_challenge

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 

Repository files navigation

sql_challenge

Overview


This repo is an introduction to SQL and postgreSQL where a database of employees is created from 6 csv files of hypothetical employee information. The structure of the database can be seen below but the referenced tables include an employee information table, a salary table, a department managers, and a department employees table.

A schema sql file for generating the table structure and its corresponding Jupyter Notebook includes SQLAlchemy code for performing queries on the database and showing how the original data can be used in dataframes and matplotlib charts.


Technologies


Setup and Installation

  1. Setup Postgres Database
    • Create database in pgAdmin called sql_challenge
    • Open a Query Tool and paste/open the contents of schemata.sql
    • Run this entire script.
    • Next load the data into the database.
      • First, right click on schema and select Import/Export Data
      • Select Import button
      • Select the file location data/titles.csv
      • Select the 'has headers' button
      • Select the delimiter to be a 'comma'
      • Select Okay. The data should import quickly.
    • Repeat the process above for the following files in this specific order:
      • departments.csv
      • employees.csv
      • salaries.csv
      • dept_emp.csv
      • dept_manager.csv
    • Test that the database loaded correctly by making a couple queries:
      • Select * from salaries;
      • etc....
    • Next, test out specific queries found in queries.sql
  2. Next Setup your Environment with:
    • Python 3.6+
    • pandas
    • numpy
    • sqlalchemy
    • psycopg2
  3. Activate your environment
  4. Clone the repo to your local machine
  5. Start Jupyter Notebook within the environment from the repo folder
  6. Run bonus.ipynb

Data Source

Dataset generated by Trilogy Education Services.


Analysis

  • Analyses can be found in the queries.sql and bonus.ipynb files
  • Demonstrated in the notebook is one method of using SQLAlchemy.

PostgreSQL DB Development and Analsysis

Fig.1 - Screenshot of QuickDatabaseDiagrams.com Schema Creater

Sample Queries from pgAdmin

  1. List the following details of each employee: employee number, last name, first name, gender, and salary.

    Select employees.emp_no, employees.last_name, employees.first_name, employees.gender, salaries.salary
    From employees, salaries Where employees.emp_no = salaries.emp_no

  2. List employees who were hired in 1986.

    Select employees.emp_no, employees.last_name, employees.first_name, employees.hire_date From employees Where Extract(Year FROM hire_date) = 1986

  3. List the manager of each department with the following information: department number, department name, -- the manager's employee number, last name, first name, and start and end employment dates.

    Select departments.dept_no, departments.dept_name, dept_manager.emp_no, employees.last_name, employees.first_name, dept_manager.from_date, dept_manager.to_date
    From departments
    INNER Join dept_manager On departments.dept_no = dept_manager.dept_no
    Inner Join employees On employees.emp_no = dept_manager.emp_no

  4. List the department of each employee with the following information: employee number, last name, -- first name, and department name.

    Select employees.emp_no, employees.last_name, employees.first_name, departments.dept_name
    From employees
    Inner Join dept_emp on dept_emp.emp_no = employees.emp_no
    Inner Join departments on dept_emp.dept_no = departments.dept_no

  5. List all employees whose first name is "Hercules" and last names begin with "B."

    Select employees.first_name, employees.last_name From employees
    Where employees.first_name = 'Hercules' AND
    LEFT(employees.last_name, 1) = 'B'

  6. List all employees in the Sales department, including their employee number, last name, first name, and department name.

    Select employees.emp_no, employees.last_name, employees.first_name, departments.dept_name
    From employees
    Inner Join dept_emp on dept_emp.emp_no = employees.emp_no
    Inner Join departments on dept_emp.dept_no = departments.dept_no
    Where dept_emp.dept_no = 'd007'

  7. List all employees in the Sales and Development departments, including their employee number, last name, first name, and department name.

    Select employees.emp_no, employees.last_name, employees.first_name, departments.dept_name From employees
    Inner Join dept_emp on dept_emp.emp_no = employees.emp_no
    Inner Join departments on dept_emp.dept_no = departments.dept_no
    Where dept_emp.dept_no = 'd007' OR dept_emp.dept_no = 'd005'

  8. In descending order, list the frequency count of employee last names, i.e., how many employees share each last name.

    Select employees.last_name, count(employees.last_name)
    From employees
    Group By employees.last_name
    Order By count(employees.last_name) DESC

About

Data modeling, engineering, and analysis on a group of CSV files about employees at a hypothetical company. Uses SQL to visualize table relationships and query certain information on the combined data.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published