AWS pipeline for web scraping paddle ranking data, with a REST API to consume its services.

JoseRamonMartinez/padel-aws

PADDLE RANKING

Automated serverless web scraping pipeline for the paddle ranking.


Table of Contents
  1. About the Project
  2. Built With
  3. API
  4. Getting Started
  5. Author
  6. License

1. About the Project 📢

The project consists of a serverless web scraping system that retrieves the world ranking of paddle players. It is built as a pipeline of web scraping functions triggered by a cron job. The pipeline stores its results in a NoSQL database and exposes the data through an API.
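One way to picture the cron-triggered pipeline is a fan-out step: a scheduler function splits the ranking into slices and publishes one SNS message per slice, so that several scraping functions can run in parallel. The sketch below is illustrative only. It assumes the ranking is paginated, and the function names, message shape, and topic ARN are hypothetical rather than taken from the project. It only builds the parameter objects (`TopicArn`, `Message`) that an SNS publish call would receive and does not contact AWS.

```javascript
// Sketch only: pagination, message shape, and names are assumptions,
// not the project's actual schema.

// Split the ranking into page ranges so each scraping function handles one slice.
function buildFanOutMessages(totalPages, pagesPerMessage) {
  if (!Number.isInteger(totalPages) || totalPages < 0) {
    throw new RangeError('totalPages must be a non-negative integer');
  }
  if (!Number.isInteger(pagesPerMessage) || pagesPerMessage < 1) {
    throw new RangeError('pagesPerMessage must be a positive integer');
  }
  const messages = [];
  for (let first = 1; first <= totalPages; first += pagesPerMessage) {
    const last = Math.min(first + pagesPerMessage - 1, totalPages);
    messages.push({ firstPage: first, lastPage: last });
  }
  return messages;
}

// Wrap each slice in the parameters an SNS publish call takes:
// the topic ARN and the message body as a string.
function toPublishParams(topicArn, messages) {
  return messages.map((message) => ({
    TopicArn: topicArn,
    Message: JSON.stringify(message),
  }));
}

// Hypothetical topic ARN, used here only to show the resulting shape.
const slices = buildFanOutMessages(10, 4);
const params = toPublishParams(
  'arn:aws:sns:eu-west-1:123456789012:scraping-topic',
  slices
);
console.log(params.length); // 3
```

Each subscriber then scrapes only its own page range, which keeps every function focused on a single task.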

(back to top)

2. Built With 🛠️

Serverless Framework: Framework to implement infrastructure as code (IaC) with cloud providers.

AWS Services:

  • Lambda: Serverless logic functions
  • SNS: Simple Notification Service, used to communicate between services
  • DynamoDB: NoSQL database
  • API Gateway: REST API management
  • CloudWatch Events: Schedules the system trigger.
  • X-Ray: Service to trace and monitor the system services.

GOOD PRACTICES APPLIED

  • Fan-in and fan-out patterns applied with SNS to parallelize the Lambda functions

  • SNS used to apply the first SOLID principle, the Single Responsibility Principle (SRP)

  • TTL created to delete database data before new paddle ranking data is scraped.

  • Secondary index created on the player table to reduce DynamoDB cost for the GET services

  • Service tracing

And more...
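The TTL practice above can be sketched as follows. DynamoDB's TTL feature reads a Number attribute holding a Unix epoch timestamp in seconds and removes items in the background once that time has passed (deletion is not instantaneous). The attribute name `expiresAt`, the helper `withTtl`, and the one-week lifetime are assumptions made for illustration; the project's actual attribute and lifetime are not stated in this README.

```javascript
// Sketch only: the attribute name and lifetime are hypothetical.
// DynamoDB lets you choose which attribute stores the expiry time.
const TTL_ATTRIBUTE = 'expiresAt';

// Return a copy of the item with an expiry timestamp in epoch seconds,
// which is the unit DynamoDB TTL expects.
function withTtl(item, lifetimeSeconds, nowMs = Date.now()) {
  const expiresAt = Math.floor(nowMs / 1000) + lifetimeSeconds;
  return { ...item, [TTL_ATTRIBUTE]: expiresAt };
}

// Example: a player record that should expire one week after 2022-01-01 UTC.
const ONE_WEEK_SECONDS = 7 * 24 * 60 * 60;
const player = { id: 'player-1', position: 1 };
const stored = withTtl(player, ONE_WEEK_SECONDS, Date.UTC(2022, 0, 1));
console.log(stored.expiresAt); // 1641600000
```

Passing `nowMs` explicitly keeps the helper deterministic in tests; in a Lambda handler the default `Date.now()` would normally be enough.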

(back to top)

3. API 🚀

The REST API is fully documented, and you can use it in your own projects through Rapid API
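As a rough illustration of calling the API through Rapid API, the sketch below builds a request description. APIs published there are typically called with the `X-RapidAPI-Key` and `X-RapidAPI-Host` headers. The host `example.p.rapidapi.com`, the `/players` path, and the helper name are placeholders, not the project's real endpoint; check the API documentation for the actual values.

```javascript
// Sketch only: host and path are placeholders, not the real endpoint.
function buildRankingRequest(host, apiKey, path = '/players') {
  const url = new URL(path, `https://${host}`);
  return {
    url: url.toString(),
    options: {
      method: 'GET',
      headers: {
        'X-RapidAPI-Key': apiKey,
        'X-RapidAPI-Host': host,
      },
    },
  };
}

const request = buildRankingRequest('example.p.rapidapi.com', 'YOUR_API_KEY');
console.log(request.url); // https://example.p.rapidapi.com/players

// With a real host and key, the request could then be sent with fetch:
// const response = await fetch(request.url, request.options);
```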

4. Getting Started 🛠️

Follow these steps to install the required tools, set up the project, and deploy it to the AWS cloud.

Pre-requisites

  • npm
    npm install npm@latest -g
  • serverless
    npm install serverless@latest -g

Installation

  1. Clone the repo

    git clone https://github.com/JoseRamonMartinez/padel-aws.git
  2. Install NPM packages

    npm install
  3. Create a config.yml

     playersTableName: <CustomName>
     playersSeeder: [./seeds/<CustomName>.json]
     scrapingTopicName: <CustomName>
     postPlayerTopicName: <CustomName>
     apiUrls:
       players: <CustomName>
     accountIdNumber: <AWS_ACCOUNT_ID_NUMBER>
     stage: <Stage>
     region: <AWS-REGION>
  4. Deploy to AWS

    sls deploy --aws-profile <aws-profile>
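Before deploying, it can help to confirm that config.yml defines every key listed in step 3. The following is a minimal sketch of such a check, not part of the project: the helper name is hypothetical, and it only looks for unindented top-level keys with a regular expression instead of using a full YAML parser.

```javascript
// Top-level keys listed in the config.yml template from step 3.
const REQUIRED_KEYS = [
  'playersTableName',
  'playersSeeder',
  'scrapingTopicName',
  'postPlayerTopicName',
  'apiUrls',
  'accountIdNumber',
  'stage',
  'region',
];

// Return the required keys that do not appear as top-level keys.
// Indented (nested) keys such as apiUrls.players are ignored on purpose.
function missingConfigKeys(yamlText) {
  const present = new Set();
  for (const line of yamlText.split(/\r?\n/)) {
    const match = /^([A-Za-z_][\w-]*)\s*:/.exec(line);
    if (match) present.add(match[1]);
  }
  return REQUIRED_KEYS.filter((key) => !present.has(key));
}

// Example with an incomplete configuration.
const sample = [
  'playersTableName: players',
  'apiUrls:',
  '  players: players',
  'stage: dev',
].join('\n');
console.log(missingConfigKeys(sample));
```

In practice the text would come from reading config.yml from disk, for example with `fs.readFileSync('config.yml', 'utf8')`.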

5. Author ✒️

José Ramón Martínez Riveiro  



6. License 📄

Copyright © 2022, José Ramón Martínez Riveiro.

(back to top)
