The project consists of a serverless web scraping system that retrieves the world ranking of paddle players. The system is composed of a pipeline of web scraping functions that is triggered by a cron job. The pipeline writes its results to a NoSQL database and exposes the data through an API.
Serverless Framework: Framework to implement infrastructure as code (IaC) with cloud providers.
- Lambda: Serverless logic functions
- SNS: Simple Notification Service, used for communication between services
- DynamoDB: NoSQL database
- API Gateway: REST API management
- CloudWatch Events: Schedules the system trigger.
- X-Ray: Service to trace and monitor the system services.
GOOD PRACTICES APPLIED
- Fan-in and fan-out patterns applied to parallelize Lambda functions with SNS
- SNS used to apply the first SOLID principle, the Single Responsibility Principle (SRP)
- TTL set to delete database data before new paddle ranking data is scraped
- Secondary index created on the player table to reduce DynamoDB cost for get services
- Service tracing
And more...
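Two of the practices listed above, fan-out and TTL, can be illustrated with a minimal TypeScript sketch. It is not taken from the project's code: the `ScrapeMessage` shape and the `fanOut` and `ttlEpochSeconds` helpers are hypothetical names, and the sketch assumes the ranking is scraped page by page. The one fixed point is that DynamoDB reads a TTL value as a Unix timestamp in seconds stored in a Number attribute.

```typescript
// Hypothetical message shape: one SNS message per ranking page to scrape.
interface ScrapeMessage {
  page: number;
  totalPages: number;
}

// Fan-out: turn one scheduled trigger into one message per page, so that
// each message can be handled by its own Lambda invocation in parallel.
function fanOut(totalPages: number): ScrapeMessage[] {
  const messages: ScrapeMessage[] = [];
  for (let page = 1; page <= totalPages; page++) {
    messages.push({ page, totalPages });
  }
  return messages;
}

// TTL: compute the expiry as a Unix timestamp in seconds, which is the
// format DynamoDB expects in the table's TTL attribute.
function ttlEpochSeconds(now: Date, lifetimeHours: number): number {
  return Math.floor(now.getTime() / 1000) + lifetimeHours * 3600;
}

// Example values only: 3 pages and a 24 hour lifetime are assumptions.
const messages = fanOut(3);
const expiresAt = ttlEpochSeconds(new Date("2022-01-01T00:00:00Z"), 24);
```

In a real deployment each message would be published to the scraping topic and the expiry written alongside each player item. Both steps are left out here because they depend on the AWS SDK setup.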
The REST API is fully documented, and you can use it in your projects through Rapid API.
Follow these instructions to set up the project and deploy it to the AWS cloud. First, install the tools you need to use the software:
- npm
npm install npm@latest -g
- serverless
npm install serverless@latest -g
- Clone the repo
git clone https://github.com/JoseRamonMartinez/padel-aws.git
- Install NPM packages
npm install
- Create a config.yml
playersTableName: <CustomName>
playersSeeder: [./seeds/<CustomName.json>]
scrapingTopicName: <CustomName>
postPlayerTopicName: <CustomName>
apiUrls:
  players: <CustomName>
accountIdNumber: <AWS_ACCOUNT_ID_NUMBER>
stage: <Stage>
region: <AWS-REGION>
- Deploy to AWS
sls deploy --aws-profile <aws-profile>
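Before deploying, it can help to check that every placeholder in config.yml has been filled in. The following TypeScript sketch is one possible way to do that, not part of the project: `findUnsetKeys` is a hypothetical helper, the key names are taken from the config.yml template above, and the sample values (`players-dev`, `eu-west-1`, and so on) are made up for illustration. It assumes the YAML file has already been parsed into a plain object.

```typescript
// Keys taken from the config.yml template in the setup steps.
const REQUIRED_KEYS = [
  "playersTableName",
  "playersSeeder",
  "scrapingTopicName",
  "postPlayerTopicName",
  "apiUrls",
  "accountIdNumber",
  "stage",
  "region",
] as const;

// Hypothetical helper: returns the required keys that are missing or
// that still hold an unfilled <Placeholder> value.
function findUnsetKeys(config: Record<string, unknown>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = config[key];
    if (value === undefined || value === null) return true;
    return typeof value === "string" && /^<.*>$/.test(value.trim());
  });
}

// Example input with made-up values: one placeholder left unfilled
// and accountIdNumber missing entirely.
const unset = findUnsetKeys({
  playersTableName: "players-dev",
  playersSeeder: ["./seeds/players.json"],
  scrapingTopicName: "scraping-dev",
  postPlayerTopicName: "<CustomName>",
  apiUrls: { players: "players" },
  stage: "dev",
  region: "eu-west-1",
});
// unset is ["postPlayerTopicName", "accountIdNumber"]
```

An empty result means every required key is present and no value still looks like a template placeholder.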
Copyright
© 2022, José Ramón Martínez Riveiro.