Fraud Detection

Credit Card Fraudster by Richard Patterson | Creative Commons Licensed

Goal

Engineer data model using an Entity Relationship Diagram with Quickdbd. Then, use pgAdmin4 and PostgreSQL to query that database and discern indicators of credit card fraud.

Objective

Analyze databse for fraudulent transactions using several tables tied to gether with foreign keys

Table relationships

'one' card_holder --> 'many' cards
'one' card --> 'many' transactions
'many' transactions --> 'one' merchant
'many' nerchants --> 'one' merchant_category

Screenshot of ERD:

Fraud Analysis

Background

Some fraudsters may hack a credit card by making several small transactions, generally less than 2.00 USD, which are typically ignored by cardholders.

How can we isolate the transactions of each cardholder?

This is my query to retrieve transactions grouped by cardholder:

Count the transactions that are less than 2.00 USD per cardholder.

This is my query to count 'small transactions' per cardholder:

Is there any evidence to suggest that a credit card has been hacked?

One person has 26 transactions under 2.00 USD. This leads all cardholders. The credit card company may wish to contact that individual to see if those specific transactions are legitimate.

What are the top 100 highest transactions made between 7:00 am and 9:00 am?

This is my query to filter and sort these transactions:

Do you see any anomalous transactions that could be fraudulent?

The top 9 highest transactions during this time range from 100 USD to 1,894 USD. However, the rest of the 91 returned values are all well below this, from 11.65 USD to 23.13 USD. The bottom 91 'smaller' are suspicious and potentially fraudulent being so much less than the top 9.

Is there a higher number of fraudulent transactions made during this time frame versus the rest of the day?

During the rest of the day (aka, hours not in between 7 and 9 am), the top 100 transactions range from 22.78 USD to 2,249 USD. Therefore, it does seem that relatively small and potentially fraudulent transactions occur in higher proportion during 7am - 9am timeframe. Potential rationale for this includes that fraudsters assume people are sleeping, just getting up/ready for work, or otherwise preoccupied in early in the mornings and not paying attention to their banking.

What are the top 5 merchants prone to being hacked using small transactions?

Visual Data Analysis

Data Analysis Question 1

The two most important customers of the firm may have been hacked. Verify if there are any fraudulent transactions in their history. For privacy reasons, you only know that their cardholder IDs are 2 and 18.

What difference do you observe between the consumption patterns? Does the difference suggest a fraudulent transaction?

Based on the plots it appears fraudulent transactions may exist for card holder 18. There are at least 10 sharp and random spikes in their spending pattern which breaks the otherwise small amount pattern for both cardholders.

Data Analysis Question 2

The CEO of the biggest customer of the firm suspects that someone has used her corporate credit card without authorization in the first quarter of 2018 to pay quite expensive restaurant bills. Again, for privacy reasons, you know only that the cardholder ID in question is 25.

Are there any outliers for cardholder ID 25? How many outliers are there per month? Do you notice any anomalies?

There exists at least one outlier transaction in almost every month (minus February) for Carholder 25, with as many as 3 in some months like April and June. This suggests some level of potentially fraudulent treansactions occuring almost monthly.

Name		Name	Last commit message	Last commit date
Latest commit History 37 Commits
Images		Images
SQL Queries		SQL Queries
ERD.png		ERD.png
README.md		README.md
visual_data_analysis.ipynb		visual_data_analysis.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fraud Detection

Goal

Objective

Table relationships

Fraud Analysis

Background

How can we isolate the transactions of each cardholder?

Count the transactions that are less than 2.00 USD per cardholder.

Is there any evidence to suggest that a credit card has been hacked?

What are the top 100 highest transactions made between 7:00 am and 9:00 am?

Do you see any anomalous transactions that could be fraudulent?

Is there a higher number of fraudulent transactions made during this time frame versus the rest of the day?

What are the top 5 merchants prone to being hacked using small transactions?

Visual Data Analysis

Data Analysis Question 1

What difference do you observe between the consumption patterns? Does the difference suggest a fraudulent transaction?

Data Analysis Question 2

Are there any outliers for cardholder ID 25? How many outliers are there per month? Do you notice any anomalies?

About

Releases

Packages

Languages

joeyh2/fraud-detection

Folders and files

Latest commit

History

Repository files navigation

Fraud Detection

Goal

Objective

Table relationships

Fraud Analysis

Background

How can we isolate the transactions of each cardholder?

Count the transactions that are less than 2.00 USD per cardholder.

Is there any evidence to suggest that a credit card has been hacked?

What are the top 100 highest transactions made between 7:00 am and 9:00 am?

Do you see any anomalous transactions that could be fraudulent?

Is there a higher number of fraudulent transactions made during this time frame versus the rest of the day?

What are the top 5 merchants prone to being hacked using small transactions?

Visual Data Analysis

Data Analysis Question 1

What difference do you observe between the consumption patterns? Does the difference suggest a fraudulent transaction?

Data Analysis Question 2

Are there any outliers for cardholder ID 25? How many outliers are there per month? Do you notice any anomalies?

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages