
A recommender system deployed on AWS infrastructure for a Big Data environment


bvegaM/recommender-system-big-data-aws


Development of a recommendation system on AWS for a Big Data environment

Technologies: Python, Spark, Jupyter, AWS (EC2, S3)


Introduction:

Companies today receive large amounts of information and need solutions that can process it and turn it into recommendations that give end users a better experience. Recommender systems address the problem of infoxication, or information overload: they apply information processing and data-cleaning techniques to build a recommendation model that predicts users' tastes and recommends products to them.

Recommender systems follow several approaches; the most popular, and the one that generally obtains the best results, is collaborative filtering. The objective of this work is to build a recommender system based on collaborative filtering, adapted to a Big Data environment and deployed on a cloud provider, AWS. This objective stems from companies' current need for a robust infrastructure that can process large amounts of information for their recommender systems.
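To make the idea of collaborative filtering concrete, the following is a minimal user-based sketch in plain Python: a user's rating for an unseen item is predicted as the similarity-weighted average of the ratings given by other users, with similarity measured as cosine similarity over co-rated items. The toy `RATINGS` data and the `cosine_similarity` and `predict` helpers are hypothetical illustrations, not code from this repository; Spark's own collaborative filtering (MLlib's ALS) uses matrix factorization instead, and the project may use that.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}, for illustration only.
RATINGS = {
    "alice": {"item1": 5.0, "item2": 3.0, "item3": 4.0},
    "bob":   {"item1": 4.0, "item2": 2.0, "item3": 5.0, "item4": 4.0},
    "carol": {"item1": 1.0, "item2": 5.0, "item4": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users, computed over co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None


print(round(predict(RATINGS, "alice", "item4"), 2))
```

Here alice's predicted rating for `item4` leans toward bob's rating of 4 because her past ratings are more similar to bob's than to carol's. A production system would usually also mean-center ratings and restrict the sum to the top-k most similar users.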

Problem:

The exponential growth of information means that today's recommender systems have to process millions of records. Doing so requires high-performance computing to process the data and feed it to the recommender system in a reasonable time, so the core problem is one of computational power. Many recommender systems today still run on infrastructure based on non-distributed processing.

Big Data environment

The problem with non-distributed recommender systems is that they do not make full use of the resources of the hardware they run on. Because the whole workload runs as a single task, processing the information and producing recommendations takes a long time. This costs both time and money for companies that need to process millions of records.

One solution is to use a Big Data framework, in this case Apache Spark. Spark processes large amounts of information in a distributed way: it splits the data into partitions and processes them as parallel tasks, taking advantage of all the resources of the server (and, if needed, of several machines in a cluster).
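The partition-then-combine pattern that lets Spark use all available cores can be simulated in plain Python. This sketch splits hypothetical rating records into partitions, runs a "map" step independently on each one (in Spark, each partition would be a separate task running in parallel), and then merges the partial results in a "reduce" step. The records and helper names are illustrative assumptions; the loop here runs sequentially and only models the data flow, not actual parallel execution.

```python
from collections import Counter
from functools import reduce

# Hypothetical (user, item, rating) records standing in for a large dataset.
RECORDS = [
    ("u1", "i1", 5.0), ("u2", "i1", 3.0), ("u3", "i2", 4.0),
    ("u1", "i2", 2.0), ("u2", "i3", 5.0), ("u3", "i1", 4.0),
    ("u4", "i3", 1.0), ("u4", "i2", 3.0),
]


def partition(records, num_partitions):
    """Split records round-robin into `num_partitions` chunks."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def count_per_item(chunk):
    """Map step: count ratings per item within one partition (one Spark task)."""
    return Counter(item for _, item, _ in chunk)


def merge(a, b):
    """Reduce step: combine the partial counts of two partitions."""
    return a + b


partitions = partition(RECORDS, num_partitions=4)
partial_counts = [count_per_item(p) for p in partitions]  # parallel in Spark
totals = reduce(merge, partial_counts, Counter())
print(dict(sorted(totals.items())))
```

Because each partition is processed without looking at the others, adding cores or machines lets more partitions run at once. In PySpark, the same computation would typically be written with `sc.parallelize(records, 4).map(lambda r: (r[1], 1)).reduceByKey(operator.add)`.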

Cloud computing vs on-premises solutions


Apache Spark needs considerable computational resources such as memory, CPU and disk. One option is to buy a local (on-premises) server with the features required to run Spark for your workload. However, this option carries additional costs such as maintenance, support and security, among others, which makes it less attractive than running Spark on a cloud provider such as AWS.

Solution:

Infrastructure:

[Figure: infrastructure diagram]

Steps:

  • Step 1: Data Gathering

  • Step 2: Data Wrangling

  • Step 3: Train Model

  • Step 4: Evaluate Model
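The four steps above can be sketched end to end on a toy dataset. This is a dependency-free illustration under several assumptions: the raw CSV is hypothetical and inlined (in the deployed project the data would presumably come from storage such as S3); wrangling drops malformed or out-of-range rows and keeps the last rating per user-item pair; the model is a latent-factor matrix factorization fitted with stochastic gradient descent; and evaluation uses RMSE on a held-out split. Spark MLlib offers this kind of model through `pyspark.ml.recommendation.ALS`, which fits the factors with alternating least squares rather than SGD, and `RegressionEvaluator(metricName="rmse")` for evaluation; the repository's notebooks may differ in every detail.

```python
import csv
import io
import math
import random

# Step 1 (Data Gathering): hypothetical raw export, inlined for illustration.
RAW = """user,item,rating
u1,i1,5
u1,i2,3
u1,i3,4
u2,i1,4
u2,i2,2
u2,i3,5
u2,i4,4
u3,i1,1
u3,i2,5
u3,i4,2
u4,i1,5
u4,i3,4
u4,i4,3
u4,i4,bad
,i2,3
u3,i3,9
u1,i2,2
"""


def wrangle(raw):
    """Step 2: parse rows, drop malformed or out-of-range ratings, dedupe."""
    clean = {}
    for row in csv.DictReader(io.StringIO(raw)):
        try:
            rating = float(row["rating"])
        except (TypeError, ValueError):
            continue  # non-numeric or missing rating
        if not row["user"] or not row["item"] or not 1 <= rating <= 5:
            continue  # missing id or rating outside the 1-5 scale
        clean[(row["user"], row["item"])] = rating  # last duplicate wins
    return [(u, i, r) for (u, i), r in clean.items()]


def train(data, k=2, epochs=200, lr=0.02, reg=0.05, seed=0):
    """Step 3: fit rating ~ dot(P[user], Q[item]) with regularized SGD."""
    rng = random.Random(seed)
    P, Q = {}, {}
    for u, i, _ in data:
        P.setdefault(u, [rng.uniform(0, 1) for _ in range(k)])
        Q.setdefault(i, [rng.uniform(0, 1) for _ in range(k)])
    order = list(data)  # shuffle a copy so the caller's list is untouched
    for _ in range(epochs):
        rng.shuffle(order)
        for u, i, r in order:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(k):
                pu_f = pu[f]
                pu[f] += lr * (err * qi[f] - reg * pu_f)
                qi[f] += lr * (err * pu_f - reg * qi[f])
    return P, Q


def predict(P, Q, u, i):
    return sum(a * b for a, b in zip(P[u], Q[i]))


def rmse(P, Q, data):
    """Step 4: RMSE over triples whose user and item were seen in training."""
    se = [(r - predict(P, Q, u, i)) ** 2 for u, i, r in data if u in P and i in Q]
    return math.sqrt(sum(se) / len(se)) if se else float("nan")


data = wrangle(RAW)
random.Random(42).shuffle(data)
cut = int(0.8 * len(data))
train_set, test_set = data[:cut], data[cut:]
P, Q = train(train_set)
print("train RMSE:", round(rmse(P, Q, train_set), 3))
print("test RMSE:", round(rmse(P, Q, test_set), 3))
```

With this little data the test RMSE is not meaningful (and is `nan` if every held-out user or item is unseen); the point is the shape of the pipeline. On Spark, the same four stages would run over partitioned DataFrames, and ALS's `coldStartStrategy="drop"` plays the role of the "seen in training" filter in `rmse`.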

Results:
