Market Basket Analysis using FP-Growth Algorithm
Market Basket Analysis
· Introduction:
The number of stores, including supermarkets, online stores, and neighbourhood grocery stores, is increasing day by day, and competition between them is growing just as quickly. To attract customers, a store needs to understand their purchasing patterns before it launches any promotional scheme. The process of analyzing customers' shopping trends is called Market Basket Analysis. It helps increase sales in several ways, and it supports decisions about sales strategy and targeted promotions by revealing what customers tend to buy.
Market Basket Analysis helps find associations between products. This makes product placement easier to manage: two products A and B that are frequently bought together can be placed near each other, which encourages a customer who buys A to also buy B. It is also used to manage item pricing and to offer discounts on bundles of items that are frequently bought together, for example "Buy A and B together and get 10% off each".
Market Basket Analysis is a data mining process that focuses on discovering purchasing patterns by extracting association rules from a store's transactional database. Association Rule Mining is one of the data mining techniques that helps find interesting associations in a dataset. Knowing which products are bought together helps the retailer design the store layout (product placement). Good product placement not only reduces the customer's shopping time but also suggests other relevant items that he/she might be interested in buying. The three common measures of association are Support, Confidence, and Lift. Frequent itemsets are generated using algorithms such as Apriori and FP-Growth.
· Association Rules:
Association rule mining is a technique for identifying relationships between different items. It is used to find associations between combinations of items in an itemset. Three terms are important for understanding the association between items:
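Support:
Support measures how frequently a combination of items is bought together. For an itemset (A,B), it is the ratio of the number of transactions that contain both A and B to the total number of transactions.
Mathematical Representation:
$$\mathrm{Support}(A, B) = \frac{\text{number of transactions containing both A and B}}{\text{total number of transactions}}$$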
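Confidence:
Confidence refers to the likelihood that item B is purchased if item A is bought. For the rule A → B, it is the ratio of the number of transactions in which both A and B are bought to the number of transactions in which A is bought.
Mathematical Representation:
$$\mathrm{Confidence}(A \rightarrow B) = \frac{\text{number of transactions containing both A and B}}{\text{number of transactions containing A}}$$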
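Lift:
Lift tells how strong a rule is. It measures how much more likely B is to be sold when A is sold, compared with how often B is sold overall. For the rule A → B, it is the ratio of the Confidence of A → B to the Support of B.
Mathematical Representation:
$$\mathrm{Lift}(A \rightarrow B) = \frac{\mathrm{Confidence}(A \rightarrow B)}{\mathrm{Support}(B)}$$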
If the lift for (A,B) is 2, then a customer who buys A is 2 times as likely to buy B as a customer chosen without regard to A.
Lift = 1 means there is no association between products A and B.
Lift > 1 means the products are more likely to be bought together.
Lift < 1 means the products are less likely to be bought together.
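As a minimal sketch of the three measures, the Python below computes them on a small made-up list of baskets. The item names and the helper functions `support`, `confidence`, and `lift` are illustrative assumptions and do not come from any library or from the project itself.

```python
# Hypothetical toy baskets, used only to illustrate the three measures.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support(A and B) / Support(A) for the rule A -> B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence(A -> B) / Support(B)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

s = support({"bread", "butter"}, transactions)
c = confidence({"bread"}, {"butter"}, transactions)
l = lift({"bread"}, {"butter"}, transactions)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

In this toy data the lift of bread → butter is above 1, so under the interpretation given above the two items tend to be bought together.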
These association rules can help retailers develop better marketing strategies. Cross-selling is one such strategy: it means selling items that are related to, and can be combined with, the item being sold.
Association and recommendation are also different things. Association can be described as “Frequently bought together”, while recommendation can be thought of as “Customers who bought/viewed Item A also bought Item B”. One of the best examples of both is Amazon's website/app: whenever we search for a product, it shows recommendations based on that product.
· Algorithm => FP-Growth:
This algorithm scans the database only twice. It uses a tree structure (FP-tree) to store all the information. Items within each transaction are ordered by decreasing support, with ties broken alphabetically. The algorithm uses a recursive divide-and-conquer approach to mine the frequent itemsets.
How to build an FP-Tree?
§ The root represents null.
§ Each node represents an item, while a path through the nodes represents an itemset, with the item order maintained while forming the tree.
Example:
Let us consider the dataset given below, which contains several transactions.
Transaction ID | Items Purchased
1 | FBAED
2 | BCE
3 | ABDE
4 | ABCE
5 | ABCDE
6 | BCD
Here there are 6 transactions in total and 6 distinct items (A, B, C, D, E, F). Let's take the minimum support as 3, counted as a number of transactions.
To build the FP-Tree, the support of each item is first calculated, and the items are then sorted in decreasing order of support, resulting in the following list:
Item | Support
B | 6
E | 5
A | 4
C | 4
D | 4
F | 1
Here item F will not be considered in building the FP-Tree because it does not satisfy the minimum support value.
New Transaction table:
In this table, infrequent items are removed from each transaction and the remaining items are sorted from most to least frequently purchased. This new table is needed for constructing the FP-Tree.
Transaction ID | Items Purchased
1 | BEAD
2 | BEC
3 | BEAD
4 | BEAC
5 | BEACD
6 | BCD
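The two preparation steps above (counting item support, then reordering each transaction) can be sketched in Python as follows. This is a minimal sketch: variable names such as `rank` and `ordered` are illustrative, and ties in support are assumed to be broken alphabetically, which matches the item order used in the example.

```python
from collections import Counter

# The six example transactions from the first table.
transactions = ["FBAED", "BCE", "ABDE", "ABCE", "ABCDE", "BCD"]
min_support = 3  # absolute count, as in the example

# First database scan: count how many transactions contain each item.
counts = Counter(item for t in transactions for item in set(t))

# Keep frequent items, ordered by decreasing support (ties: alphabetical).
frequent = sorted(
    (item for item, n in counts.items() if n >= min_support),
    key=lambda item: (-counts[item], item),
)
rank = {item: i for i, item in enumerate(frequent)}

# Rewrite each transaction: drop infrequent items, sort by the global order.
ordered = [
    "".join(sorted((i for i in t if i in rank), key=rank.get))
    for t in transactions
]

print(frequent)
print(ordered)
```

Sorting every transaction by one global order is what lets transactions with common frequent items share a prefix in the tree.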
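Constructing FP Tree:
1. After the 1st transaction BEAD (here {} represents NULL):
{} → B:1 → E:1 → A:1 → D:1
2. After the 2nd transaction BEC, the counts on the shared prefix B, E are incremented and C starts a new branch under E:
{} → B:2 → E:2 → A:1 → D:1
{} → B:2 → E:2 → C:1
Doing the same for all the remaining transactions gives the final FP Tree, which has the following paths from the root:
{} → B:6 → E:5 → A:4 → D:2
{} → B:6 → E:5 → A:4 → C:2 → D:1
{} → B:6 → E:5 → C:1
{} → B:6 → C:1 → D:1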
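The insertion procedure described above can be sketched as a small Python class. This is an illustrative sketch, not the project's code: the names `FPNode`, `build_fp_tree`, and `render` are assumptions, and the header table and node links used by full FP-Growth implementations are left out for brevity.

```python
class FPNode:
    """One node of an FP-tree: an item, its count, and its children."""

    def __init__(self, item=None, parent=None):
        self.item = item      # None marks the root ({} / NULL)
        self.count = 0
        self.parent = parent
        self.children = {}    # item -> FPNode

def build_fp_tree(ordered_transactions):
    """Insert each frequency-ordered transaction as a path from the root."""
    root = FPNode()
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            # Reuse the shared prefix when it exists, otherwise branch off.
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root

def render(node, depth=0):
    """Return the tree as a list of indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines

ordered = ["BEAD", "BEC", "BEAD", "BEAC", "BEACD", "BCD"]
tree = build_fp_tree(ordered)
print("\n".join(render(tree)))
```

Each transaction only increments counts along an existing path or adds a new branch, so the tree is built in a single pass over the reordered transactions.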
Conditional FP-Tree:
Items | Conditional Pattern Base | Conditional FP-Tree
D | {(BEA:2),(BEAC:1),(BC:1)} | {(B:4, E:3, A:3)}
C | {(BEA:2),(BE:1),(B:1)} | {(B:4, E:3)}
A | {(BE:4)} | {(B:4, E:4)}
E | {(B:5)} | {(B:5)}
B | - | -
Frequent Pattern Generated:
D : DB(4), DE(3), DA(3), DEB(3), DAB(3), DAE(3), DAEB(3)
C : CB(4), CE(3), CEB(3)
A : AB(4), AE(4), AEB(4)
E : EB(5)
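The divide-and-conquer mining step can be checked with a short Python sketch. Note the simplification: it recurses on conditional pattern bases held as plain lists instead of building compressed conditional FP-trees, so it illustrates the recursion but not the memory savings of the tree. The function name `mine` is hypothetical.

```python
from collections import defaultdict

def mine(pattern_base, min_support, suffix=()):
    """Recursively mine frequent itemsets from (items, count) pairs.

    `items` must already be sorted in the global frequency order.
    Returns a dict mapping each frequent itemset (a tuple) to its count.
    """
    # Support of every item within this (conditional) pattern base.
    counts = defaultdict(int)
    for items, count in pattern_base:
        for item in items:
            counts[item] += count

    results = {}
    for item, total in counts.items():
        if total < min_support:
            continue
        itemset = (item,) + suffix
        results[itemset] = total
        # Conditional pattern base of `item`: the prefix before it on each path.
        conditional = [
            (items[: items.index(item)], count)
            for items, count in pattern_base
            if item in items and items.index(item) > 0
        ]
        results.update(mine(conditional, min_support, itemset))
    return results

ordered = ["BEAD", "BEC", "BEAD", "BEAC", "BEACD", "BCD"]
patterns = mine([(tuple(t), 1) for t in ordered], min_support=3)

for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print("".join(itemset), count)
```

The output also includes the single frequent items (B, E, A, C, D) alongside the multi-item patterns.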
· Implementation:
Here in our project we have considered a dataset having 9835 transactions.
Loaded dataset:
Next we apply one-hot encoding: an item purchased in a particular transaction has its entry set to 1, and an item not purchased has its entry set to 0. The column names are the product names and the rows are the transactions.
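As a rough illustration of this encoding, the sketch below builds the 0/1 matrix in plain Python. The baskets and product names are made up for the example and are not taken from the project's dataset. The project may well have used a library encoder instead.

```python
# Hypothetical baskets standing in for the real dataset.
baskets = [
    ["milk", "bread"],
    ["eggs"],
    ["milk", "eggs", "soda"],
]

# Column names: every distinct product, in a stable (sorted) order.
columns = sorted({item for basket in baskets for item in basket})

# One row per transaction: 1 if the product was bought, 0 otherwise.
encoded = [
    [1 if col in basket else 0 for col in columns]
    for basket in baskets
]

print(columns)
for row in encoded:
    print(row)
```

With this layout, the support of a single product is the sum of its column divided by the number of rows.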
Next we print the top 15 most frequent itemsets, in ascending order of support.
The above table is also represented in graphical format.
We then find the top 40 first-buy items in the dataset and present them in graphical format.
Through association rules we can then get the support, confidence, and lift of the frequent itemsets.
The table given below shows the top 10 itemsets that have the highest confidence and a support value greater than or equal to 0.005.
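The filtering and ranking step can be sketched in plain Python on the six-transaction example from earlier. This is an assumption-laden sketch: it only forms rules with a single item on each side, it uses a toy support threshold of 0.5 because the example is tiny (the project uses 0.005 on its 9835 transactions), and the dictionary keys are illustrative.

```python
from itertools import permutations

transactions = [set(t) for t in ["FBAED", "BCE", "ABDE", "ABCE", "ABCDE", "BCD"]]
n = len(transactions)
min_support = 0.5  # toy threshold for this tiny example

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / n

items = sorted(set().union(*transactions))
rules = []
for a, b in permutations(items, 2):
    s = support({a, b})
    if s >= min_support:
        conf = s / support({a})
        rules.append({
            "rule": f"{a} -> {b}",
            "support": s,
            "confidence": conf,
            "lift": conf / support({b}),
        })

# Highest confidence first; ties broken by support, then by rule text.
rules.sort(key=lambda r: (-r["confidence"], -r["support"], r["rule"]))
for r in rules[:10]:
    print(r["rule"], round(r["support"], 2), round(r["confidence"], 2), round(r["lift"], 2))
```

Ranking by confidence alone can favour rules whose consequent is bought in almost every transaction, which is why the lift column is worth reading alongside it.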
· Conclusion:
The FP-Growth algorithm gives results that help a lot in understanding customers' buying patterns. Online store leaders like Flipkart and Amazon use this technique to suggest items for the customer's basket/shopping cart. General store owners can also use it to manage product placement, promotional offers, etc., and so increase their sales, which leads to an increase in their profit. Using the graph of first-buy items given above, the store owner can attract more people by placing those products at the front or at the entrance. This not only helps increase sales but also saves customers' time.
Thus we can conclude that Market Basket Analysis plays an important role in retail business.