
niyotham/vehicle_data_streaming_pipelie


A vehicle-data streaming data engineering application.

A scalable AWS streaming application for real-time reporting of car movements. This application gathers the latitudes and longitudes of moving cars, then persists the data into AWS S3 and Redshift so that data analysts can monitor car movements and other patterns.

System Architecture

The steps to follow:

  • Find a detailed document of the steps here
  • Write a script to generate real-time vehicle data
  • Write a script or create a Lambda function to load the data into S3
  • Send the data to Redshift
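The data-generation step might be sketched as follows. This is a minimal illustration, not the repo's actual script: the record fields are assumptions, and the stream name reuses the d2d-app-kinesis-stream referenced by the Redshift schema further down.

```python
import json
import random
import time
import uuid


def make_vehicle_record(vehicle_id):
    """Build one simulated vehicle telemetry record (fields are assumptions)."""
    return {
        "vehicle_id": vehicle_id,
        "timestamp": int(time.time()),
        # Random coordinates purely for demonstration purposes.
        "latitude": round(random.uniform(-90.0, 90.0), 6),
        "longitude": round(random.uniform(-180.0, 180.0), 6),
    }


def stream_records(stream_name="d2d-app-kinesis-stream", n=10):
    """Send n simulated records to Kinesis (requires boto3 and AWS credentials)."""
    import boto3  # imported here so record generation stays testable offline

    kinesis = boto3.client("kinesis")
    for _ in range(n):
        record = make_vehicle_record(str(uuid.uuid4()))
        kinesis.put_record(
            StreamName=stream_name,
            Data=json.dumps(record),
            PartitionKey=record["vehicle_id"],
        )
```

Partitioning by vehicle ID keeps each car's events ordered within a shard, which matters if downstream consumers reconstruct trajectories.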

-- external schema for Kinesis

CREATE EXTERNAL SCHEMA streamdataschema
FROM KINESIS
IAM_ROLE 'arn:aws:iam::533267024701:role/redshiftkinesisrole';

-- create materialized view

CREATE MATERIALIZED VIEW devicedataview AS
    SELECT approximate_arrival_timestamp,
           partition_key,
           shard_id,
           sequence_number,
           json_parse(from_varbyte(kinesis_data, 'utf-8')) AS payload
    FROM streamdataschema."d2d-app-kinesis-stream";

-- refresh the view

REFRESH MATERIALIZED VIEW <VIEW_NAME>;

-- select data from the view

SELECT * FROM <VIEW_NAME>;

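The refresh-and-query steps above can also be run from Python with psycopg2. This is a sketch: the connection details are placeholders, and the refresh_sql helper is a hypothetical convenience that validates the view name before interpolating it (identifiers cannot be passed as query parameters).

```python
import re


def refresh_sql(view_name):
    """Build a REFRESH MATERIALIZED VIEW statement, validating the identifier
    first so an arbitrary string cannot be injected into the SQL text."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", view_name):
        raise ValueError("invalid view name: %r" % view_name)
    return "REFRESH MATERIALIZED VIEW %s;" % view_name


def refresh_and_query(view_name="devicedataview"):
    """Refresh the view and fetch a few rows. Requires psycopg2 and real
    Redshift connection details; the values below are placeholders."""
    import psycopg2  # imported here so refresh_sql stays testable offline

    conn = psycopg2.connect(
        host="your-cluster.redshift.amazonaws.com",  # placeholder
        port=5439,
        dbname="dev",
        user="awsuser",
        password="...",  # placeholder
    )
    with conn, conn.cursor() as cur:
        cur.execute(refresh_sql(view_name))
        cur.execute("SELECT * FROM %s LIMIT 10;" % view_name)
        return cur.fetchall()
```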
Set up the CI/CD pipeline:

CodeCommit:
  1. Set up an AWS CodeCommit IAM user with HTTPS Git credentials for AWS CodeCommit.
  2. Create a CodeCommit repo (not an ECR repo!).
  3. Copy the GitHub repo data to AWS CodeCommit.

CodeBuild:
  1. Prepare ECR for CodeBuild.
  2. Set up CodeBuild.
  3. Set up IAM roles and permissions to allow CodeBuild to push Docker images to ECR. Note: the Docker image can be pushed to Docker Hub instead.

Deploy:
  1. Make the ECS service infrastructure available.
  2. Create the deploy stage.
  3. Run the pipeline by making changes in the local repo and pushing to CodeCommit.
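The CodeBuild step above is typically driven by a buildspec.yml at the repo root. A minimal sketch, assuming an ECR_REPO_URI environment variable is configured on the build project (this is an illustration, not the repo's actual buildspec):

```yaml
version: 0.2

phases:
  pre_build:
    commands:
      # Authenticate Docker against the ECR registry.
      - aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $ECR_REPO_URI
  build:
    commands:
      - docker build -t $ECR_REPO_URI:latest .
  post_build:
    commands:
      - docker push $ECR_REPO_URI:latest
```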

Additional dependencies for python_dependancies_cloud9 if you want to use psycopg2:

  1. sudo amazon-linux-extras install python3.8
  2. curl -O https://bootstrap.pypa.io/get-pip.py
  3. python3.8 get-pip.py --user
  4. sudo python3.8 -m pip install psycopg2-binary -t python/
  5. zip -r dependancies.zip python
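With the dependency layer built, the "load data into S3" Lambda from the steps list might look like the following sketch, assuming the function is triggered by the Kinesis stream; the bucket name and key prefix are placeholders.

```python
import base64
import json


def decode_record(record):
    """Kinesis delivers data base64-encoded inside the Lambda event payload."""
    payload = base64.b64decode(record["kinesis"]["data"])
    return json.loads(payload)


def lambda_handler(event, context):
    """Write each Kinesis record in the event batch to S3 as one JSON line.
    Requires boto3 (bundled in the Lambda runtime) and S3 write permissions."""
    import boto3  # imported here so decode_record stays testable offline

    s3 = boto3.client("s3")
    lines = [decode_record(r) for r in event.get("Records", [])]
    body = "\n".join(json.dumps(line) for line in lines)
    s3.put_object(
        Bucket="your-vehicle-data-bucket",  # placeholder
        Key="vehicle-data/batch-%s.json" % context.aws_request_id,
        Body=body.encode("utf-8"),
    )
    return {"records_written": len(lines)}
```

Batching one S3 object per invocation keeps the number of small objects manageable; Kinesis Data Firehose is an alternative if you prefer a fully managed delivery path.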

Final results

  • Build logs from building the images, plus ECS cluster and service logs.
  • Data storage results.

A view of the AWS S3 bucket data

A view of the data loaded into Redshift

  1. Before sending the data to Redshift:
     • Create the external schema
     • Check the materialized view
     • Create the materialized view
  2. After sending the data, view the data in Redshift.
  3. Deployment

Future Work

  • Read the data into a CSV
  • Do visualization
  • Finish the CodePipeline part

Contributor

👤 Niyomukiza Thamar

Acknowledgements

Show your support

Give a ⭐ if you like this project!
