ICTA 2024: An Automatic Machine Learning based Customer Segmentation Model with RFM Analysis

Thai Hoc Nguyen*, Xuan Thi Tran* (*equal contribution)

Introduction

The focus of many companies is to provide the best products and services to attract attention in the market. Each customer has different preferences due to variations in age, gender, and other personal factors, and purchasing behavior is a significant indicator of those preferences. To serve customers well, companies must therefore find a way to group customers with similar characteristics into segments. However, segmenting customers based on their direct or indirect interactions with the company can be challenging, because it is difficult to select the key features that best capture those interactions.

The RFM model, built on the three key features Recency (how recently a customer made a purchase), Frequency (how often they purchase), and Monetary value (how much they spend), has been considered an effective technique for revealing valuable insights into customer behavior. Several studies have suggested that applying the K-means algorithm in combination with the RFM model is a promising solution for customer segmentation.
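The RFM + K-means idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's Spark implementation: the `Transaction` record, the helper names, counting each transaction as one purchase for Frequency, and min-max scaling before clustering are all choices made for this example. The clustering step is plain Lloyd's K-means.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:  # hypothetical record type for this sketch
    customer_id: str
    day: date
    amount: float


def rfm_table(transactions, snapshot):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_seen, frequency, monetary = {}, {}, {}
    for t in transactions:
        cid = t.customer_id
        if cid not in last_seen or t.day > last_seen[cid]:
            last_seen[cid] = t.day                          # most recent purchase
        frequency[cid] = frequency.get(cid, 0) + 1          # assumption: one row = one purchase
        monetary[cid] = monetary.get(cid, 0.0) + t.amount   # total spend
    return {
        cid: ((snapshot - last_seen[cid]).days, frequency[cid], monetary[cid])
        for cid in last_seen
    }


def min_max_scale(rows):
    """Rescale each column to [0, 1] so Monetary does not dominate distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = random.Random(seed).sample(points, k)  # k random rows as initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    tx = [
        Transaction("A", date(2024, 1, 10), 100.0),
        Transaction("A", date(2024, 1, 30), 200.0),
        Transaction("B", date(2024, 1, 29), 250.0),
        Transaction("C", date(2023, 6, 1), 20.0),
        Transaction("D", date(2023, 5, 15), 25.0),
    ]
    rfm = rfm_table(tx, snapshot=date(2024, 1, 31))
    ids = list(rfm)
    labels, _ = kmeans(min_max_scale([rfm[c] for c in ids]), k=2)
    print(dict(zip(ids, labels)))
```

Scaling matters here: without it, Monetary (in currency units) would swamp Recency (days) and Frequency (counts) in the Euclidean distance. In practice k is chosen by inspecting several values rather than fixed in advance.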

With the continuous growth of generated data, it is crucial to deploy a machine learning based segmentation model in a Big Data system. Hadoop and Spark are among the most widely used Big Data storage and processing technologies. In this study, we propose an automatic, machine learning based customer segmentation solution built with the Spark application framework, with customer data stored in HDFS (the Hadoop Distributed File System).

Environment Setup

Install Hadoop and Spark

First, install Hadoop and Spark by following the official installation guide for each tool.
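After installing, a quick way to confirm the tools are reachable is to look them up on your PATH. The sketch below assumes the standard command names `hadoop`, `hdfs`, and `spark-submit`; adjust the tuple if your installation differs. It only checks that the executables can be found, not that the services are configured or running.

```python
import shutil

# Command names assumed to be on PATH after a standard Hadoop/Spark install.
REQUIRED_TOOLS = ("hadoop", "hdfs", "spark-submit")


def find_tools(names=REQUIRED_TOOLS):
    """Map each command name to its resolved path, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:13} {path or 'NOT FOUND'}")
```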

Create environment

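Create a virtual environment so that the libraries of different applications do not conflict. You can create it anywhere you like. Use python on Windows or python3 on Linux. The commands below are for Linux; on Windows, activate the environment with Scripts\activate instead of source bin/activate.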

$ python3 -m venv demo-project
$ cd demo-project
$ source bin/activate
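To confirm that the environment is active before installing anything, you can compare the interpreter's prefixes: inside a venv, `sys.prefix` points at the environment directory while `sys.base_prefix` still points at the base installation. The helper name below is just for illustration.

```python
import sys


def in_virtualenv():
    """True when running inside a venv created with `python -m venv`."""
    # venv changes sys.prefix to the environment directory,
    # while sys.base_prefix keeps pointing at the base interpreter.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```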

Download Source Code

Clone the repository from GitHub to your local machine:

$ git clone https://github.com/nthaihoc/segmentation-customer-hadoop-spark-mlops-icta-2024.git

Install Library Dependencies

Install the libraries needed to manage and run the application. Use pip on Windows or pip3 on Linux.

$ cd segmentation-customer-hadoop-spark-mlops-icta-2024
$ pip3 install -r requirement.txt
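If the application later fails with an import error, it can help to check which requirements are actually installed in the active environment. The hypothetical helper below handles only simple lines such as `name==1.2` or `name>=1.0`; it skips comments and pip options (lines starting with `-`) and does not understand URLs or environment markers. Pass it the lines of the project's requirements file, for example `missing_requirements(open(path).read().splitlines())`.

```python
import re
from importlib import metadata


def missing_requirements(lines):
    """Return distribution names from simple requirement lines that are not installed."""
    missing = []
    for line in lines:
        line = line.split("#", 1)[0].strip()   # drop inline and full-line comments
        if not line or line.startswith("-"):   # skip blanks and pip options such as -r/-e
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)                  # the part before any version specifier
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```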

Folder Structure

The most important items in the repository are artifacts, src, and dvc.yaml:

artifacts: contains the trained model and result files
src: contains the application source code
dvc.yaml: a DVC configuration file that defines the pipeline, so it can be built, executed, and managed automatically from the command line

See the DVC documentation for more information about DVC.
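Because the pipeline is launched from the repository root, a small pre-flight check that the expected top-level items exist can save a confusing error later. The helper name and the exact list of items are assumptions based on the folder description above.

```python
from pathlib import Path

# Top-level items described in the Folder Structure section.
EXPECTED = ("artifacts", "src", "dvc.yaml")


def missing_items(root="."):
    """Return the expected items that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_items()
    print("missing:", missing if missing else "nothing, layout looks right")
```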

Pipeline Start

After completing all the steps above, run the following command to execute the pipeline and test the application:

$ dvc repro
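dvc repro does not blindly rerun everything: DVC records hashes of each stage's dependencies and outputs in dvc.lock and re-executes only the stages whose recorded state no longer matches. The sketch below illustrates that content-hash idea in simplified form; it is not DVC's actual implementation, and the function names and file names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def current_hashes(paths):
    """Hash the current contents of each dependency file."""
    return {str(p): md5_of(Path(p).read_bytes()) for p in paths}


def changed_deps(current, recorded):
    """Dependencies whose hash is new or differs from the recorded one."""
    return sorted(p for p, h in current.items() if recorded.get(p) != h)


if __name__ == "__main__":
    recorded = {"data.csv": md5_of(b"old rows"), "train.py": md5_of(b"code")}
    current = {"data.csv": md5_of(b"new rows"), "train.py": md5_of(b"code")}
    # A stage depending on data.csv would rerun; one depending only on train.py would not.
    print(changed_deps(current, recorded))
```

Hashing contents rather than comparing timestamps means that touching a file without changing it does not trigger a rerun.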

Contributing

For any feedback or comments, please feel free to contact me using the details below:

| email_01 | email_02 |

Citation

@INPROCEEDINGS{XuanThiTran2024,
  author={Xuan Thi Tran and Thai Hoc Nguyen},
  booktitle={The 3rd International Conference on Advances in Information and Communication Technology (ICTA2024)}, 
  title={An Automatic Machine Learning based Customer Segmentation Model with RFM Analysis}, 
  year={2024},
  volume={},
  number={},
  pages={},
  keywords={Machine Learning, RFM model, K-means Clustering},
  doi={}}

