
prateekmaj21/Bank-Transaction-Analysis


Bank-Transaction-Analysis

Analyzing bank customer transactions to derive insights about customers.

Data:

https://www.kaggle.com/datasets/shivamb/bank-customer-segmentation

This dataset consists of over 1 million transactions made by more than 800K customers of a bank in India. The data includes information such as customer age (via date of birth), location, gender, account balance at the time of the transaction, transaction details, and transaction amount.
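As a rough illustration of the record structure described above, the sketch below loads a few rows into typed Python records and keeps empty fields as missing values (`None`). It assumes the CSV header uses the column names shown in `SAMPLE_CSV`; check them against the downloaded file. The sample rows are hypothetical, and `Transaction`, `parse_rows`, and `_missing_to_none` are illustrative names, not code from the project's notebook.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sample rows; the header ASSUMES the Kaggle CSV's column names.
SAMPLE_CSV = """TransactionID,CustomerID,CustomerDOB,CustGender,CustLocation,CustAccountBalance,TransactionDate,TransactionTime,TransactionAmount (INR)
T1,C0001,10/1/94,F,MUMBAI,15000.50,2/8/16,143207,25.0
T2,C0002,4/4/57,M,DELHI,2200.00,2/8/16,141858,27999.0
T3,C0003,,F,PUNE,,2/8/16,142712,459.0
"""


@dataclass
class Transaction:
    transaction_id: str
    customer_id: str
    dob: Optional[str]          # raw date string; parsed in a later cleaning step
    gender: Optional[str]
    location: Optional[str]
    balance: Optional[float]
    transaction_date: str
    amount: float               # TransactionTime is read but not kept here


def _missing_to_none(value: str) -> Optional[str]:
    """Treat empty or whitespace-only CSV fields as missing values."""
    value = value.strip()
    return value or None


def parse_rows(text: str) -> List[Transaction]:
    """Read CSV text into Transaction records, keeping missing fields as None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        balance = _missing_to_none(rec["CustAccountBalance"])
        rows.append(Transaction(
            transaction_id=rec["TransactionID"],
            customer_id=rec["CustomerID"],
            dob=_missing_to_none(rec["CustomerDOB"]),
            gender=_missing_to_none(rec["CustGender"]),
            location=_missing_to_none(rec["CustLocation"]),
            balance=float(balance) if balance is not None else None,
            transaction_date=rec["TransactionDate"],
            amount=float(rec["TransactionAmount (INR)"]),
        ))
    return rows


transactions = parse_rows(SAMPLE_CSV)
# Keep only rows where every demographic and balance field is present.
complete = [t for t in transactions
            if None not in (t.dob, t.gender, t.location, t.balance)]
```

Dropping incomplete rows is only one option; imputing values is another. In PySpark, a comparable approach would be `spark.read.csv(path, header=True)` followed by `DataFrame.dropna(subset=[...])`.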

Notebook Run on:

Databricks Community Edition Python Notebook

Project Overview:

  • In this project, we analyzed customer transaction data using PySpark and derived business insights with operations including filtering, projection, group by, joins, ranking functions over window partitions, and moving averages. We used the Matplotlib and Seaborn libraries to produce data visualizations and a comprehensive view of the findings.

  • Data Cleaning and Preprocessing: The importance of thorough data cleaning and preprocessing was evident. Handling missing values, formatting dates, and ensuring data consistency were crucial steps to ensure accurate analysis.

  • Performance Optimization: Efficient use of PySpark operations significantly improved the performance of data processing tasks. Techniques such as partitioning and choosing appropriate aggregations were key to handling the large dataset.

  • Complex Operations in PySpark: Understanding and implementing complex operations such as window functions, moving averages, and ranking functions provided deeper insights into customer data.

  • Visualization Techniques: Visualizing the data with Matplotlib and Seaborn made the results easier to interpret and the findings easier to communicate.

  • Business Insights and Decision Making: Deriving actionable business insights from data analysis is critical.
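The date-formatting step mentioned under Data Cleaning and Preprocessing can be sketched in plain Python. This assumes dates are stored as day/month/two-digit-year strings such as `2/8/16`; `parse_short_date`, `parse_dob`, and `age_at` are hypothetical helpers, not functions from the notebook. Because Python's `%y` maps 00-68 to 2000-2068, a birth year written as `57` parses as 2057, so the sketch moves any birth date that falls after the reference date back one century.

```python
from datetime import date, datetime
from typing import Optional


def parse_short_date(raw: Optional[str]) -> Optional[date]:
    """Parse an ASSUMED d/m/yy string; return None for missing or malformed values."""
    if not raw:
        return None
    try:
        return datetime.strptime(raw.strip(), "%d/%m/%y").date()
    except ValueError:
        return None


def parse_dob(raw: Optional[str], reference: date) -> Optional[date]:
    """Parse a date of birth, shifting two-digit years that land after `reference` back 100 years."""
    dob = parse_short_date(raw)
    if dob is not None and dob > reference:
        try:
            dob = dob.replace(year=dob.year - 100)
        except ValueError:  # 29 Feb whose shifted year is not a leap year
            return None
    return dob


def age_at(dob: date, on: date) -> int:
    """Whole years between `dob` and `on`."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


reference = date(2016, 8, 2)
dob = parse_dob("4/4/57", reference)   # parsed as 2057-04-04, then shifted to 1957-04-04
age = age_at(dob, reference)
```

A real pipeline would more likely use each row's own transaction date as the reference and apply the same logic with PySpark column functions; the plain-Python version just makes the century-shift rule explicit.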
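The group-by and partition-over ranking operations listed above can be illustrated without a Spark cluster. The sketch below totals spend per customer within each location and then assigns a dense rank inside each location, roughly what `F.dense_rank().over(Window.partitionBy(...).orderBy(F.desc(...)))` computes in PySpark. The function name and the `(location, customer_id, amount)` input layout are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def dense_rank_by_location(
    rows: Iterable[Tuple[str, str, float]],
) -> Dict[str, List[Tuple[int, str, float]]]:
    """Rank customers by total spend within each location (dense rank, highest first)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for location, customer, amount in rows:
        totals[(location, customer)] += amount           # GROUP BY location, customer

    partitions: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (location, customer), total in totals.items():
        partitions[location].append((customer, total))  # PARTITION BY location

    ranked: Dict[str, List[Tuple[int, str, float]]] = {}
    for location, members in partitions.items():
        members.sort(key=lambda m: (-m[1], m[0]))        # ORDER BY total DESC, then customer id
        result = []
        rank, previous = 0, None
        for customer, total in members:
            if total != previous:                        # ties share a rank; no gaps after ties
                rank += 1
                previous = total
            result.append((rank, customer, total))
        ranked[location] = result
    return ranked


sample = [
    ("MUMBAI", "C1", 100.0), ("MUMBAI", "C2", 50.0), ("MUMBAI", "C1", 20.0),
    ("MUMBAI", "C3", 120.0), ("DELHI", "C4", 10.0),
]
ranked = dense_rank_by_location(sample)
```

Dense ranking is used here so that tied customers share a rank without leaving gaps; `rank` or `row_number` would be the alternatives if gaps or strict ordering were wanted.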
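A per-customer moving average over the most recent transactions corresponds to a PySpark window such as `Window.partitionBy("CustomerID").orderBy("TransactionDate").rowsBetween(-2, 0)`. The plain-Python sketch below assumes a trailing window of three rows ordered by transaction date; the window size, the function name, and the `(customer_id, date, amount)` tuple layout are illustrative choices, not taken from the project.

```python
from collections import defaultdict, deque
from datetime import date
from typing import Deque, Dict, Iterable, List, Tuple

Row = Tuple[str, date, float]


def trailing_moving_average(rows: Iterable[Row], window: int = 3) -> List[Tuple[str, date, float, float]]:
    """Average of each row and up to `window - 1` preceding rows for the same customer."""
    by_customer: Dict[str, List[Tuple[date, float]]] = defaultdict(list)
    for customer, txn_date, amount in rows:
        by_customer[customer].append((txn_date, amount))   # PARTITION BY customer

    out = []
    for customer in sorted(by_customer):
        recent: Deque[float] = deque(maxlen=window)        # oldest value drops out automatically
        for txn_date, amount in sorted(by_customer[customer]):  # ORDER BY date
            recent.append(amount)
            out.append((customer, txn_date, amount, sum(recent) / len(recent)))
    return out


sample = [
    ("C1", date(2016, 8, 3), 30.0), ("C1", date(2016, 8, 1), 10.0),
    ("C1", date(2016, 8, 4), 40.0), ("C1", date(2016, 8, 2), 20.0),
    ("C2", date(2016, 8, 1), 5.0),
]
result = trailing_moving_average(sample, window=3)
```

The first rows of each customer average over fewer than three values, which matches how a `rowsBetween(-2, 0)` frame behaves at the start of a partition.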

Conclusion:

By leveraging PySpark for data processing and Python for visualization, we were able to derive meaningful insights that support decision-making. This analysis not only highlighted the current state of customer transactions but also provided a foundation for future data-driven strategies.
