This API provides endpoints to search for products, retrieve the top-rated products, and get product reviews from an Amazon database.
Base URL: http://127.0.0.1:8000
Endpoint: /products
Method: GET
Description: Retrieves a list of products based on filters such as brand, model, price range, and rating. Supports pagination.
Parameter | Type | Description | Example |
---|---|---|---|
brand | str | (Optional) Filters products by brand name | Casio |
model | str | (Optional) Filters products by model name | G-Shock |
min_price | float | (Optional) Filters products with minimum price | 100.0 |
max_price | float | (Optional) Filters products with maximum price | 500.0 |
min_rating | float | (Optional) Filters products with minimum rating | 4.0 |
page | int | (Optional) Page number for pagination. Default is 1. | 1 |
limit | int | (Optional) Number of products per page. Default is 10. | 10 |
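To make the filter and pagination semantics concrete, here is a minimal sketch of how the optional parameters could be turned into a parameterized SQL query. This is not the code from api_v1.py: the function name `build_product_query` is hypothetical, matching `brand` against the `title` column is an assumption (no brand column is listed for the table), and the sketch assumes `price` and `overall_rating` can be compared numerically.

```python
from typing import Any, List, Optional, Tuple


def build_product_query(
    brand: Optional[str] = None,
    model: Optional[str] = None,
    min_price: Optional[float] = None,
    max_price: Optional[float] = None,
    min_rating: Optional[float] = None,
    page: int = 1,
    limit: int = 10,
) -> Tuple[str, List[Any]]:
    """Return (sql, params) for psycopg2-style %s placeholders."""
    clauses: List[str] = []
    params: List[Any] = []
    if brand is not None:
        # Assumption: the brand name appears in the product title.
        clauses.append("title ILIKE %s")
        params.append(f"%{brand}%")
    if model is not None:
        clauses.append("model ILIKE %s")
        params.append(f"%{model}%")
    if min_price is not None:
        clauses.append("price >= %s")
        params.append(min_price)
    if max_price is not None:
        clauses.append("price <= %s")
        params.append(max_price)
    if min_rating is not None:
        # Assumption: overall_rating is already numeric at query time.
        clauses.append("overall_rating >= %s")
        params.append(min_rating)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    # Page 1 starts at offset 0, page 2 at offset `limit`, and so on.
    offset = (page - 1) * limit
    sql = f"SELECT * FROM amazon_watches{where} ORDER BY id LIMIT %s OFFSET %s"
    params.extend([limit, offset])
    return sql, params


sql, params = build_product_query(brand="Casio", min_price=100.0, page=2, limit=5)
print(sql)
print(params)
```

Passing the values separately from the SQL text lets the database driver handle quoting, which avoids building the query by string concatenation of user input.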
Returns a list of products matching the criteria.
[
{
"id": 1,
"title": "Casio Men's Watch",
"price": 150.0,
"overall_rating": 4.5,
"total_reviews": 100,
"availability": "In Stock",
"model": "G-Shock",
"material": "Resin",
"item_length": "7 inches",
"length": "7 inches",
"clasp": "Buckle",
"model_number": "GA100-1A1",
"link": "https://www.amazon.com/product/12345"
}
]
GET /products?brand=Casio&min_price=100.0&max_price=300.0&min_rating=4.0&page=1&limit=5
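As a rough illustration, the request above could be issued from Python with only the standard library. The helper names `build_products_url` and `fetch_products` are hypothetical, and the base URL assumes the API is running locally on port 8000.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumed local development server


def build_products_url(base_url=BASE_URL, **filters):
    """Build a /products URL, skipping any filter whose value is None."""
    query = {key: value for key, value in filters.items() if value is not None}
    if not query:
        return f"{base_url}/products"
    return f"{base_url}/products?{urlencode(query)}"


def fetch_products(**filters):
    """Call the endpoint and decode the JSON list (needs a running server)."""
    with urlopen(build_products_url(**filters)) as response:
        return json.load(response)


url = build_products_url(brand="Casio", min_price=100.0, max_price=300.0,
                         min_rating=4.0, page=1, limit=5)
print(url)
```

With the server running, `fetch_products(brand="Casio", limit=5)` should return a list shaped like the sample response.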
Endpoint: /products/top
Method: GET
Description: Retrieves a list of top-rated products based on reviews and ratings.
Parameter | Type | Description | Example |
---|---|---|---|
limit | int | (Optional) Number of top products to retrieve. Default is 10. | 10 |
Returns a list of top products.
[
{
"id": 1,
"title": "Casio Men's Watch",
"price": 150.0,
"overall_rating": 4.5,
"total_reviews": 100,
"availability": "In Stock",
"model": "G-Shock",
"material": "Resin",
"item_length": "7 inches",
"length": "7 inches",
"clasp": "Buckle",
"model_number": "GA100-1A1",
"link": "https://www.amazon.com/product/12345"
}
]
GET /products/top?limit=5
Endpoint: /products/{product_id}/reviews
Method: GET
Description: Retrieves a list of reviews for a specific product.
Path parameters:

Parameter | Type | Description | Example |
---|---|---|---|
product_id | int | ID of the product to retrieve reviews for | 1 |

Query parameters:

Parameter | Type | Description | Example |
---|---|---|---|
page | int | (Optional) Page number for pagination. Default is 1. | 1 |
limit | int | (Optional) Number of reviews per page. Default is 10. | 10 |
Returns a list of reviews for the specified product.
[
{
"reviewer_name": "John Doe",
"review_text": "Great product, very durable and stylish!",
"review_rating": "5.0",
"review_date": "2023-01-15"
},
{
"reviewer_name": "Jane Smith",
"review_text": "Good value for the price, but the strap is a bit uncomfortable.",
"review_rating": "4.0",
"review_date": "2023-02-10"
}
]
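Since reviews are paginated, a client that wants every review has to walk through the pages. The sketch below shows one way to do that. The name `iter_reviews` is hypothetical, and it assumes the endpoint returns fewer than `limit` items (or an empty list) once the last page is reached, which this README does not state explicitly. A fake in-memory fetcher stands in for the HTTP call.

```python
from typing import Callable, Dict, Iterator, List


def iter_reviews(fetch_page: Callable[[int, int], List[Dict]],
                 limit: int = 10, max_pages: int = 100) -> Iterator[Dict]:
    """Yield reviews page by page until a short or empty page is returned."""
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, limit)
        yield from batch
        if len(batch) < limit:
            break  # assumed signal that this was the last page


def fake_fetch(page: int, limit: int) -> List[Dict]:
    """Stand-in for GET /products/{product_id}/reviews?page=...&limit=..."""
    data = [{"reviewer_name": f"Reviewer {i}"} for i in range(1, 6)]
    start = (page - 1) * limit
    return data[start:start + limit]


names = [review["reviewer_name"] for review in iter_reviews(fake_fetch, limit=2)]
print(names)
```

In real use, `fetch_page` would wrap an HTTP request to the reviews endpoint; the `max_pages` cap is only a safety net against an endless loop.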
The table amazon_watches stores product and review information with the following fields:
- id: Product ID
- title: Product title
- price: Product price
- overall_rating: Overall rating (as string, extracted and cast as float)
- total_reviews: Total number of reviews (as string, extracted and cast as integer)
- availability: Product availability status
- model: Product model name
- material: Product material
- item_length: Length of the item
- length: Product length
- clasp: Type of clasp used
- model_number: Model number
- link: URL link to the product page
- Review fields (e.g., reviewer_name_1, review_text_1, review_rating_1, etc.)
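The rating and review count are stored as strings and converted when read. A minimal sketch of that kind of extraction is shown below. The raw formats used here ("4.5 out of 5 stars", "1,234 ratings") are assumptions about what the scraped strings look like, and the function names are hypothetical rather than taken from utility_v1.py.

```python
import re
from typing import Optional


def parse_rating(raw: Optional[str]) -> Optional[float]:
    """Extract the first number from a rating string and cast it to float."""
    if not raw:
        return None
    match = re.search(r"\d+(?:\.\d+)?", raw)
    return float(match.group()) if match else None


def parse_review_count(raw: Optional[str]) -> Optional[int]:
    """Drop every non-digit character (commas, words) and cast to int."""
    if not raw:
        return None
    digits = re.sub(r"\D", "", raw)
    return int(digits) if digits else None


print(parse_rating("4.5 out of 5 stars"))
print(parse_review_count("1,234 ratings"))
```

Returning `None` for missing or unparseable values keeps bad rows from raising errors while the API is serving a request.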
- Python 3.x
- FastAPI
- Uvicorn
- PostgreSQL
Run the following command to start the API:
uvicorn api_v1:app --reload
The following is a rough outline of the step-by-step approach I would take to deploy a FastAPI application and a periodic scraping task on AWS, using Elastic Beanstalk, Amazon RDS for PostgreSQL, and AWS Lambda for scheduling.
- Managed Environment: Elastic Beanstalk handles infrastructure management, load balancing, scaling, and monitoring.
- Scalability: Automatically adjusts based on application traffic.
- Integration: Easily integrates with AWS services like RDS, S3, CloudWatch, and IAM.
Organize the project directory as in this GitHub repo, with the application files inside an "app" folder (or similar) and the Dockerfile in the project root:
project-root/
├── app/
│ ├── api_v1.py # FastAPI app
│ ├── utility_v1.py # Helper functions
│ ├── amazon_watches_v2.py # Periodic scraping
│ └── requirements.txt # Dependencies
└── Dockerfile # Docker configuration for FastAPI
Use a Dockerfile to containerize the FastAPI application:
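# Dockerfile
FROM python:3.9
WORKDIR /app
# requirements.txt lives in app/ in the layout above; copy it first so the dependency layer is cached
COPY app/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the rest of the project, including the app/ folder
COPY . .
# Port that uvicorn listens on
EXPOSE 8000
CMD ["uvicorn", "app.api_v1:app", "--host", "0.0.0.0", "--port", "8000"]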
List the dependencies in requirements.txt. The packages I use are mentioned above.
- Navigate to the Elastic Beanstalk service in the AWS Console.
- Create Application and select Web server environment.
- Configure the environment with the following options:
- Platform: Choose "Docker."
- Application Code: Upload the project-root folder.
- Under Configuration, adjust settings:
- Capacity: Set minimum and maximum instance count for scaling.
- Load Balancer: Ensure it’s set up for auto-scaling.
- Database: Link to an Amazon RDS PostgreSQL database (created in Step 3).
- Navigate to Amazon RDS in the AWS Console.
- Create a new PostgreSQL instance:
- Select the latest PostgreSQL version.
- Choose instance size according to expected load (I usually use db.t3.micro for development).
- Configure security groups to allow the Beanstalk environment to access the RDS instance.
- Note the endpoint, database name, username, and password for database connection in FastAPI.
In api_v1.py, the connection settings are currently loaded from a JSON file. For AWS, we should instead read the database connection string from an environment variable for security and open the connection in the "startup" event:
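import os

import psycopg2
from fastapi import FastAPI

app = FastAPI()
# Connection string comes from the environment instead of a file in the repo
DATABASE_URL = os.getenv("DATABASE_URL")

@app.on_event("startup")
async def startup():
    # Open the connection once at startup and keep it on the app state
    app.state.db = psycopg2.connect(DATABASE_URL)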
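One way to obtain the connection string is sketched below. When a database is attached to an Elastic Beanstalk environment, Beanstalk exposes RDS_HOSTNAME, RDS_PORT, RDS_DB_NAME, RDS_USERNAME, and RDS_PASSWORD to the application. For a standalone RDS instance, you would set DATABASE_URL yourself in the environment properties. The helper name `resolve_database_url` is hypothetical, and the values in the example are placeholders.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote

RDS_KEYS = ("RDS_HOSTNAME", "RDS_PORT", "RDS_DB_NAME", "RDS_USERNAME", "RDS_PASSWORD")


def resolve_database_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Prefer DATABASE_URL; otherwise build a URL from the RDS_* variables."""
    url = env.get("DATABASE_URL")
    if url:
        return url
    if not all(env.get(key) for key in RDS_KEYS):
        return None  # nothing usable is configured
    # Percent-encode credentials so characters like "@" do not break the URL.
    user = quote(env["RDS_USERNAME"], safe="")
    password = quote(env["RDS_PASSWORD"], safe="")
    host, port, name = env["RDS_HOSTNAME"], env["RDS_PORT"], env["RDS_DB_NAME"]
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


example_env = {
    "RDS_HOSTNAME": "db.example.com",  # placeholder values for illustration
    "RDS_PORT": "5432",
    "RDS_DB_NAME": "watches",
    "RDS_USERNAME": "admin",
    "RDS_PASSWORD": "p@ss",
}
print(resolve_database_url(example_env))
```

The resulting string can be passed directly to `psycopg2.connect`, which accepts this connection URL form.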
- Navigate to AWS Lambda in the Console.
- Create a new Lambda function for the scraping task:
- Runtime: Python 3.x
- Permissions: Assign an IAM role allowing S3 access (if you’re storing scraped data in S3).
- Write the scraping logic from amazon_watches_v2.py in the Lambda function and schedule it:
- Use Amazon EventBridge to run the function at intervals (for example, every 30 minutes).
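A minimal sketch of what the Lambda entry point could look like is given below. `scrape_watches` is a hypothetical stand-in for the logic in amazon_watches_v2.py (here it returns an empty list so the example runs on its own), and the handler name `lambda_handler` is just the usual default for Python functions. In EventBridge, a 30-minute interval would be written as the schedule expression `rate(30 minutes)`.

```python
import json


def scrape_watches():
    """Hypothetical placeholder for the scraping logic in amazon_watches_v2.py."""
    return []  # the real function would return the scraped product records


def lambda_handler(event, context):
    """Entry point that the EventBridge schedule invokes."""
    products = scrape_watches()
    # Report how many records were scraped so the run shows up in the logs.
    summary = {"scraped": len(products)}
    print(json.dumps(summary))
    return {"statusCode": 200, "body": json.dumps(summary)}


result = lambda_handler({}, None)
```

Keeping the scraping logic in its own function makes it possible to run and test it locally without going through Lambda.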
- Install the Elastic Beanstalk CLI and configure it:
pip install awsebcli
eb init -p docker my-fastapi-app
- Create an Elastic Beanstalk environment and deploy:
eb create my-fastapi-env
eb deploy
- From the Elastic Beanstalk Console, navigate to the application and click Upload and Deploy.
- Choose the Dockerized application bundle and deploy.
- Set up Amazon Route 53 for custom domain management.
- Use AWS Certificate Manager (ACM) to provision SSL certificates for HTTPS.
- Set up Amazon CloudWatch to monitor metrics like CPU usage, memory, and request latency.
- Enable Auto Scaling within the Elastic Beanstalk environment to automatically adjust the instance count based on demand.
Mashrukh Zayed – Sr Data Scientist at SSL Wireless.