Market Basket Analysis using FP-Growth Algorithm

               Market Basket Analysis




·      Introduction:

 

The number of stores, including supermarkets, online stores and neighbourhood grocery stores, is increasing day by day, and the competition between them is increasing just as rapidly. To attract customers to their store, retailers need to understand customers’ purchasing patterns before launching any scheme or offer. The process of analyzing the shopping trends of customers is called Market Basket Analysis. It helps increase sales in several ways: it supports decisions about sales strategy and helps in designing well-targeted promotions based on what consumers tend to buy.

                             

Market Basket Analysis helps in finding associations between products. This makes product placement easier to manage: two products A and B that are frequently bought together can be placed near each other, which encourages a customer who purchases A to buy B as well. It is also used to manage the pricing of items and to offer discounts on bundles of items that are frequently bought together, for example “Buy A and B together and get 10% off on each”.

 

Market Basket Analysis is a data mining process that focuses on discovering purchasing patterns by extracting association rules from the transactional database of a store. Association Rule Mining is one of the data mining techniques that help in finding interesting associations in a dataset. Knowing which products are bought together helps the retailer design the store layout (product placement). Good product placement not only reduces the customer’s shopping time but also suggests other relevant items that he/she might be interested in buying. The three common measures of association are Support, Confidence and Lift. Frequent itemsets are generated using algorithms such as Apriori and FP-Growth.

 

     ·      Association Rules:

 

Association rule mining is a technique for identifying relationships between different items. It is used to find associations between combinations of items in an itemset. The three measures that are important for judging the association between items are:

 

Support:

Support refers to how frequently a combination of items is bought together. For an itemset (A,B), it is the ratio of the number of transactions that contain both A and B to the total number of transactions.

Mathematical Representation:

Support(A,B) = (Number of transactions containing both A and B) / (Total number of transactions)

Confidence:

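Confidence refers to the likelihood that item B is purchased when item A is bought. It is the ratio of the number of transactions in which both A and B are bought to the number of transactions in which A is bought.

Mathematical Representation:

Confidence(A -> B) = (Number of transactions containing both A and B) / (Number of transactions containing A)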

Lift:

Lift tells how strong a rule is: it measures how much more likely B is to be bought when A is bought than B is to be bought in general. For the itemset (A,B), it is the ratio of the confidence of (A,B) to the support of B.

Mathematical Representation:

Lift(A -> B) = Confidence(A -> B) / Support(B)

If the lift for (A,B) is 2, then a customer who buys A is twice as likely to buy B as a customer in general.

 

Lift = 1 means there is no association between products A and B.

Lift > 1 means the products are more likely to be bought together than would be expected by chance.

Lift < 1 means the products are less likely to be bought together than would be expected by chance.
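
The three measures can be checked by hand on a small dataset. The sketch below is only an illustration in Python, assuming that each transaction is stored as a set of item names. The function names and the six toy transactions (the same ones used in the FP-Growth example later in this post) are our own choices, not part of any library.

```python
# Toy data: the six example transactions used later in this post.
transactions = [
    {"F", "B", "A", "E", "D"},
    {"B", "C", "E"},
    {"A", "B", "D", "E"},
    {"A", "B", "C", "E"},
    {"A", "B", "C", "D", "E"},
    {"B", "C", "D"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of both sides together divided by support of the antecedent."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    conf = confidence(antecedent, consequent, transactions)
    return conf / support(consequent, transactions)


print("support(A,D)     =", support({"A", "D"}, transactions))
print("confidence(A->D) =", confidence({"A"}, {"D"}, transactions))
print("lift(A->D)       =", lift({"A"}, {"D"}, transactions))
```

For the rule A -> D the lift comes out slightly above 1, so buying A makes D a little more likely than usual. Note that `confidence` divides by the support of the antecedent, so it assumes the antecedent occurs in at least one transaction.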

These association rules can help retailers develop better marketing strategies. Cross-selling is one such strategy: it means selling items that are related to, and can be combined with, the item being sold.

Note that association and recommendation are different things. Association can be described as “Frequently bought together”, while recommendation can be thought of as “Customers who bought/viewed Item A also bought Item B”. Amazon’s website/app is a good example of both: whenever we search for a product, it shows recommendations based on the searched product.







      ·      Algorithm =>  FP-Growth:

FP-Growth is an improvement on the Apriori algorithm and is widely used for frequent pattern mining. It uses less memory than Apriori (on larger databases, this difference is easy to notice).

This algorithm scans the database only twice. It stores all the information in a tree structure (the FP-tree). Within each transaction, items are ordered by decreasing support, with ties broken alphabetically. The algorithm then uses a recursive divide-and-conquer approach to mine the frequent itemsets.

How to build an FP-Tree?

§  The root represents null.

§  Each node represents an item, while a path of connected nodes represents an itemset, with the item order maintained while forming the tree.

 

Example:

Let us consider the dataset given below, which contains several transactions.

Transaction ID | Items Purchased
1 | FBAED
2 | BCE
3 | ABDE
4 | ABCE
5 | ABCDE
6 | BCD

There are 6 transactions and 6 distinct items (A, B, C, D, E, F) in total. Let us take the minimum support as 3.

 

To build the FP-Tree, the support of each item is first calculated and the items are sorted in decreasing order of support, resulting in the following list:

 

Item | Support
B | 6
E | 5
A | 4
C | 4
D | 4
F | 1

 

Item F is not considered when building the FP-Tree because it does not satisfy the minimum support value.
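
This first pass over the data is easy to reproduce. The following is a minimal sketch, assuming each transaction is written as a string of single-letter items as in the table above; the names `MIN_SUPPORT`, `frequent` and `order` are ours, chosen for illustration.

```python
from collections import Counter

# The six example transactions, one string of single-letter items each.
transactions = ["FBAED", "BCE", "ABDE", "ABCE", "ABCDE", "BCD"]
MIN_SUPPORT = 3  # minimum support as an absolute transaction count

# First database scan: count in how many transactions each item occurs.
counts = Counter(item for t in transactions for item in set(t))

# Keep only the items that reach the minimum support.
frequent = {item: n for item, n in counts.items() if n >= MIN_SUPPORT}

# Sort by decreasing support; ties are broken alphabetically.
order = sorted(frequent, key=lambda item: (-frequent[item], item))

for item in order:
    print(item, frequent[item])
```

The resulting `order` is the item order used for rearranging every transaction before it is inserted into the tree.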

 

New Transaction table:

 

In this table, item F has been dropped and the items in each transaction are sorted by decreasing support. This new table is needed for constructing the FP-Tree.

 

 

Transaction ID | Items Purchased
1 | BEAD
2 | BEC
3 | BEAD
4 | BEAC
5 | BEACD
6 | BCD
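
The rearranged table can be produced mechanically once the item order is known. Below is a small sketch, assuming the order B, E, A, C, D obtained from the support counts; `ORDER`, `RANK` and `reorder` are illustrative names of our own.

```python
# Frequent items in decreasing order of support (F is excluded).
ORDER = ["B", "E", "A", "C", "D"]
RANK = {item: position for position, item in enumerate(ORDER)}


def reorder(transaction):
    """Drop infrequent items and sort the rest by the global item order."""
    kept = [item for item in set(transaction) if item in RANK]
    return "".join(sorted(kept, key=RANK.__getitem__))


transactions = ["FBAED", "BCE", "ABDE", "ABCE", "ABCDE", "BCD"]
reordered = [reorder(t) for t in transactions]

for number, items in enumerate(reordered, start=1):
    print(number, items)
```

Sorting every transaction with the same global order is what lets transactions that share frequent items also share a prefix in the tree.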

 

 

 

 

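Constructing FP Tree:

 

1.    After the 1st transaction, BEAD, the tree is a single path ({} represents NULL): {} -> B:1 -> E:1 -> A:1 -> D:1

2.    After the 2nd transaction, BEC, the counts of B and E increase to 2 and a new node C:1 is added under E.

Now we do the same thing for all the remaining transactions.

 

At last, we get the final FP Tree shown below (each level of indentation is one level deeper in the tree):

{}
  B:6
    E:5
      A:4
        D:2
        C:2
          D:1
      C:1
    C:1
      D:1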
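
The insertion procedure can be sketched in a few lines of Python. This is a simplified illustration, not a full FP-Growth implementation: it assumes the transactions are already reordered, and it leaves out the header table and node links that a complete FP-tree normally keeps. The class `Node` and the helpers `build_tree` and `render` are hypothetical names.

```python
class Node:
    """One FP-tree node: an item, its count and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> Node


def build_tree(ordered_transactions):
    """Insert every reordered transaction as a path starting at the root."""
    root = Node(None)  # the null root, written {} in this post
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # No shared prefix from here on: start a new branch.
                child = Node(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def render(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(render(child, depth + 1))
    return lines


tree = build_tree(["BEAD", "BEC", "BEAD", "BEAC", "BEACD", "BCD"])
print("{}")
print("\n".join(render(tree, depth=1)))
```

Because all six transactions start with B, they share a single B node whose count reaches 6, which is where the compression of the FP-tree comes from.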

                               

 

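Now, Conditional FP-Tree:

Each conditional FP-tree below lists the items that remain frequent in the conditional pattern base, together with their counts.

Items | Conditional Pattern Base | Conditional FP-Tree
D | {(BEA:2),(BEAC:1),(BC:1)} | {(B:4, E:3, A:3)}
C | {(BEA:2),(BE:1),(B:1)} | {(B:4, E:3)}
A | {(BE:4)} | {(B:4, E:4)}
E | {(B:5)} | {(B:5)}
B | - | -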
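
The conditional pattern bases can be verified with a short script. As a simplifying assumption, the sketch below reads the prefixes directly from the reordered transactions instead of walking the tree; since every transaction corresponds to exactly one path in the tree, the result is the same. The function names are ours.

```python
from collections import Counter

ordered = ["BEAD", "BEC", "BEAD", "BEAC", "BEACD", "BCD"]


def conditional_pattern_base(item, ordered_transactions):
    """Count the prefix that precedes `item` in every path containing it."""
    base = Counter()
    for transaction in ordered_transactions:
        if item in transaction:
            prefix = transaction[: transaction.index(item)]
            if prefix:  # an empty prefix carries no information
                base[prefix] += 1
    return dict(base)


def frequent_in_base(base, min_support):
    """Items whose total count inside the base reaches the minimum support."""
    counts = Counter()
    for prefix, n in base.items():
        for item in prefix:
            counts[item] += n
    return {item: n for item, n in counts.items() if n >= min_support}


for item in "DCAEB":
    base = conditional_pattern_base(item, ordered)
    print(item, base, frequent_in_base(base, min_support=3))
```

Items that fall below the minimum support inside a base (for example C in the base of D) are dropped from that item's conditional FP-tree.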

 

Frequent Pattern Generated:

D : DB(4) , DE(3) , DA(3) , DEB(3) , DAB(3) , DAE(3) , DAEB(3)

C : CB(4) , CE(3) , CEB(3)

A : AB(4) , AE(4) , AEB(4)

E : EB(5)
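
The recursive, divide-and-conquer step can also be sketched in Python. To keep the example short, this version recurses directly on conditional pattern bases and skips the compressed tree, so it illustrates the pattern-growth idea rather than an efficient FP-Growth implementation. The function `pattern_growth` is a hypothetical name.

```python
from collections import Counter


def pattern_growth(base, min_support, suffix=""):
    """Return {pattern: count} for every frequent itemset ending in `suffix`.

    `base` is a list of (items, count) pairs, where `items` is a string of
    single-letter items that all follow the same global item order.
    """
    counts = Counter()
    for items, n in base:
        for item in items:
            counts[item] += n

    found = {}
    for item, n in counts.items():
        if n < min_support:
            continue
        pattern = item + suffix
        found[pattern] = n
        # Conditional pattern base of `item`: whatever precedes it in each entry.
        conditional = [
            (items[: items.index(item)], c) for items, c in base if item in items
        ]
        found.update(pattern_growth(conditional, min_support, pattern))
    return found


ordered = ["BEAD", "BEC", "BEAD", "BEAC", "BEACD", "BCD"]
patterns = pattern_growth([(t, 1) for t in ordered], min_support=3)

for pattern, count in sorted(patterns.items(), key=lambda p: (len(p[0]), p[0])):
    print(pattern, count)
```

Each pattern is printed with its items in the global order B, E, A, C, D, so the itemset written DAEB in the list above appears here as BEAD. The single items B, E, A, C and D are reported as well.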


·      Implementation:

 

In our project, we used a dataset containing 9835 transactions.

(Please open images to see clearly)

          Loaded dataset:



Then we apply one-hot encoding: an item that is purchased in a particular transaction gets the entry 1 in that transaction’s row, and an item that is not purchased gets the entry 0. The column names are the product names and the rows are the transactions.
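
To make the encoding concrete, here is a minimal pure-Python sketch. It is not the code used in our project; the three baskets and the helper `one_hot` are made up for illustration.

```python
def one_hot(transactions):
    """Return (columns, rows): one column per product, one row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = [
        [1 if item in set(t) else 0 for item in columns]
        for t in transactions
    ]
    return columns, rows


# Hypothetical baskets, only to show the shape of the result.
baskets = [
    ["milk", "bread"],
    ["bread", "butter"],
    ["milk", "bread", "butter"],
]

columns, rows = one_hot(baskets)
print(columns)
for row in rows:
    print(row)
```

Each row has exactly as many 1 entries as the basket has distinct products, and the support of a product is the mean of its column.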




Now we apply the FP-Growth algorithm with a minimum support of 0.005.


Next, we print the top 15 most frequent itemsets, sorted by support in ascending order.



We also represent the above table in graphical format.


Next, we get the top 40 first-buy items from the dataset and present them in graphical format.



Using association rules, we can now get the support, confidence and lift of the frequent itemsets.

The table below shows the top 10 itemsets with the highest confidence among those whose support is greater than or equal to 0.005.
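
The way such a table is derived can be sketched on the small six-transaction example from earlier in this post. This is an illustration under our own assumptions, not the project code: the threshold `MIN_SUPPORT = 0.5`, the three chosen itemsets and the helper `rules_for` are picked only for the toy data.

```python
from itertools import combinations

transactions = [set(t) for t in ["FBAED", "BCE", "ABDE", "ABCE", "ABCDE", "BCD"]]
MIN_SUPPORT = 0.5  # relative support threshold chosen for the toy data


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rules_for(itemset):
    """All rules antecedent -> consequent that split `itemset` in two parts."""
    rules = []
    whole = support(itemset)
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            conf = whole / support(antecedent)
            rules.append({
                "antecedent": antecedent,
                "consequent": consequent,
                "support": whole,
                "confidence": conf,
                "lift": conf / support(consequent),
            })
    return rules


# A few frequent itemsets from the worked example.
candidates = [frozenset("AE"), frozenset("ABE"), frozenset("CE")]
frequent = [s for s in candidates if support(s) >= MIN_SUPPORT]

rules = [rule for itemset in frequent for rule in rules_for(itemset)]
top = sorted(rules, key=lambda r: r["confidence"], reverse=True)[:10]

for r in top:
    print(sorted(r["antecedent"]), "->", sorted(r["consequent"]),
          round(r["support"], 3), round(r["confidence"], 3), round(r["lift"], 3))
```

Sorting by confidence alone can put rules with a lift close to 1 at the top, for example rules whose consequent is an item that appears in almost every transaction, so it is worth reading the lift column alongside the confidence.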



·      Conclusion:

 

The FP-Growth algorithm gives results that help a lot in understanding the buying patterns of customers.

Leading online stores such as Flipkart and Amazon use this technique to suggest items for the customer’s basket/shopping cart. General store owners can also use it to manage product placement, promotional offers, etc., and so increase their sales, which in turn increases their profit. Using the first-buy graph above, a store owner can attract more people by placing those products at the front of the store or at the entrance. This not only helps increase sales but also saves customers’ time.

Thus we can conclude that Market Basket Analysis plays an important role in the retail business.





