Apriori-Implementation-in-Python

This Python program generates frequent itemsets and association rules from a given dataset using the Apriori algorithm.

Outputs for different support and confidence values (support, confidence - output):

0.02, 0.35 - 26
0.02, 0.42 - 10
0.03, 0.39 - 6
0.04, 0.35 - 5
0.04, 0.42 - 2
0.05, 0.35 - 0

The output is stored in 2 files:

File1:

Format: Freq Itemset >>> count
Global variable f1
Default value of f1 : FItems.txt

File2:

Format: LHS itemset (count) -> RHS itemset (count) [confidence] 
Global variable f2
Default value of f2 : Rules.txt

The outputs are appended to the files, so if you run the program multiple times, the data will be written multiple times.

Dataset used:

groceries.csv

Rules for using other datasets:

Change the global variable DataFile to the filename

Pre-processing of data

Sorter.py - sorts the transaction data in lexicographical order and strips whitespace and newlines.

It also converts the data into a more convenient format for running the program,
with each line representing a single transaction and the items comma separated.
Each transaction is read from the csv as a list, sorted, and written to a new csv.
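
The pre-processing described above could look roughly like the following. This is a minimal sketch, not the contents of Sorter.py: the function names `sort_transactions` and `to_csv_text` are hypothetical, and the sample data is made up for illustration. It works on in-memory text so it can be run without any files.

```python
import csv
import io


def sort_transactions(raw_text):
    """Return cleaned transactions: items stripped, empties dropped, sorted."""
    cleaned = []
    for row in csv.reader(io.StringIO(raw_text)):
        # Strip whitespace around each item and drop empty fields
        items = sorted(item.strip() for item in row if item.strip())
        if items:  # skip blank lines
            cleaned.append(items)
    return cleaned


def to_csv_text(transactions):
    """Write one transaction per line, items comma separated."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(transactions)
    return out.getvalue()


# Hypothetical sample input, not taken from groceries.csv
raw = "whole milk, bread ,butter\n\nyogurt,coffee\n"
sorted_rows = sort_transactions(raw)
print(to_csv_text(sorted_rows))
```

To process a real file, the same functions could be fed the text read from the input csv, with the result written to a new csv.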

Formulae used and pseudo code of algorithm:

Apriori:-

Generate frequent 1-itemsets - L1()
Generate Ck from Lk-1 - generateCk()
Generate Lk from Ck - generateLk()
Generate rules from frequent itemsets - rulegenerator()

Each of these is described in detail below.

L1(): Find frequent 1-itemsets

Read data from the csv file and store it into a list.
Sort the data if necessary.
Go through all the elements in each transaction and store their counts in a dictionary.
Threshold them, i.e. create a new dictionary containing only the entries of the old dictionary whose support is greater than the support threshold.
The final list is made into a set to avoid repetition.
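
The counting and thresholding steps can be sketched as follows. This is an illustration under assumptions, not the repository's `L1()`: the name `frequent_1_itemsets` is hypothetical, and support is assumed to be the fraction of transactions containing the item, which matches the fractional thresholds listed earlier (e.g. 0.02).

```python
from collections import Counter


def frequent_1_itemsets(transactions, min_support):
    """Return {(item,): count} for items whose support exceeds min_support."""
    counts = Counter()
    for transaction in transactions:
        # set() so an item repeated within one transaction is counted once
        counts.update(set(transaction))
    n = len(transactions)
    # Assumption: support = count / number of transactions
    return {(item,): c for item, c in counts.items() if c / n > min_support}


# Hypothetical toy data
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["butter", "milk"],
    ["bread", "eggs"],
]
l1 = frequent_1_itemsets(transactions, min_support=0.3)
print(l1)
```

Keys are 1-tuples so that itemsets of every size can share one dictionary layout.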

generateCk(Lk_1, flag, data): Generate Ck by joining 2 Lk-1

Traverse through all the itemsets of Lk_1 and, on finding 2 itemsets that are identical
except for the last element, merge them (i.e. take their union) in sorted order and insert the result into Ck.
The final list Ck is made into a set to avoid repetition.
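
A minimal sketch of this join step is shown below. The name `join_step` is hypothetical, and the `flag` and `data` parameters of `generateCk` are left out because their roles are not described here. Itemsets are assumed to be stored as sorted tuples. Only the join is shown; no additional pruning is applied.

```python
def join_step(prev_frequent):
    """Join sorted (k-1)-itemsets that share all but their last element."""
    prev = sorted(prev_frequent)
    candidates = set()  # a set avoids duplicate candidates
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                # Union of the two itemsets, kept in sorted order
                candidates.add(tuple(sorted(set(a) | set(b))))
    return candidates


# Hypothetical frequent 2-itemsets
l2 = {("bread", "butter"), ("bread", "milk"), ("butter", "milk")}
c3 = join_step(l2)
print(c3)
```

For 1-itemsets the shared prefix is empty, so every pair is joined into a candidate 2-itemset.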

generateLk(Ck, data): Ck -> Ct -> L

If an itemset in Ck is contained in a transaction, it is added to the list Ct, and its support count is incremented by 1
each time a transaction contains the itemset. Ct is then thresholded to form L,
using the support counts computed while building Ct. L is stored in a new dictionary
by choosing the itemsets above the threshold from the old dictionary.
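
The Ck -> Ct -> L flow might be sketched like this. The function name `count_and_threshold` is hypothetical, and support is again assumed to be the count divided by the number of transactions.

```python
def count_and_threshold(candidates, transactions, min_support):
    """Count candidates contained in transactions; keep those above min_support."""
    transaction_sets = [set(t) for t in transactions]
    counts = {}  # plays the role of Ct: candidates seen in at least one transaction
    for candidate in candidates:
        items = set(candidate)
        for t in transaction_sets:
            if items <= t:  # candidate is a subset of the transaction
                counts[candidate] = counts.get(candidate, 0) + 1
    n = len(transactions)
    # New dictionary L with only the itemsets above the threshold
    return {c: k for c, k in counts.items() if k / n > min_support}


# Hypothetical toy data
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["butter", "milk"],
    ["bread", "eggs"],
]
c2 = {("bread", "butter"), ("bread", "milk"), ("butter", "milk")}
l2 = count_and_threshold(c2, transactions, min_support=0.3)
print(l2)
```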

rulegenerator(fitems): Generates association rules from the frequent itemsets

For each itemset in the frequent items list, compute its total support.
Then get a list of all possible ways of splitting the itemset into an LHS and an RHS, each with at least 1 element.
Calculate the support for each of these combinations from the dictionary,
and if total_support/combination_support is greater than the min confidence value,
the combination is added as a rule and written to f2.
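
As a sketch of the rule generation described above: the names `generate_rules` and `support_counts` are hypothetical, and the confidence of a rule is assumed to be the itemset's count divided by the count of its LHS. The dictionary is assumed to contain every subset of each frequent itemset, stored as sorted tuples. Writing to f2 is omitted; the rules are returned instead.

```python
from itertools import combinations


def generate_rules(support_counts, min_confidence):
    """Return (lhs, rhs, confidence) for rules that exceed min_confidence."""
    rules = []
    for itemset, total in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                rhs = tuple(i for i in itemset if i not in lhs)
                # Assumption: confidence = count(itemset) / count(LHS)
                confidence = total / support_counts[lhs]
                if confidence > min_confidence:
                    rules.append((lhs, rhs, confidence))
    return rules


# Hypothetical counts for a toy dataset
support_counts = {
    ("bread",): 3,
    ("butter",): 2,
    ("milk",): 3,
    ("bread", "milk"): 2,
    ("butter", "milk"): 2,
}
rules = generate_rules(support_counts, min_confidence=0.7)
print(rules)
```

Here only butter -> milk passes, with confidence 2/2; the other three splits each have confidence 2/3.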

Many list-to-tuple conversions are required, since lists are not hashable and cannot be used as dictionary keys.

Lists should also be converted into sets to avoid repetition, which could otherwise affect the count values significantly.
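
A small illustration of both points, using made-up values:

```python
support = {}
itemset = ["milk", "bread"]

try:
    support[itemset] = 2  # raises TypeError: lists are unhashable
except TypeError:
    key = tuple(sorted(itemset))  # a sorted tuple is a valid dictionary key
    support[key] = 2

# Duplicated candidates collapse to one entry when converted to a set
candidates = [("bread", "milk"), ("bread", "milk"), ("butter", "milk")]
unique_candidates = set(candidates)
print(support, len(unique_candidates))
```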

Niloth-p/Apriori-Implementation-in-Python
