- Written by Administrator
Let's say we want to mine some data for the supermarket introduced in the earlier example.
A customer at the supermarket buys several products. These are categorical data. Let's say Customer 1 buys Honey, Milk, Sugar and Bread. We want to determine whether other customers buy these items (or some of them), and whether buying one or more of these products determines buying the others. In other words, we want to know whether, for example, a customer who buys Milk and Sugar will also buy Honey and Bread.
First of all, let's see what the other customers bought:
Customer 2: Milk, Paper, Honey
Customer 3: Sugar, Honey, Milk, Beer
Customer 4: Beer, Chips, Bread
Customer 5: Milk, Honey, Bread
We define an association rule as two related, non-overlapping parts of an itemset, such that the first part determines the second part.
Example: Milk -> Sugar, Honey.
An itemset is a set of items such that each item is present only once.
For a given itemset, one can create multiple association rules.
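To illustrate how one itemset yields several rules, here is a minimal sketch that splits an itemset into every possible first part and second part. The function name `rules_from_itemset` is hypothetical and chosen only for this illustration. It assumes both parts of a rule must be non-empty.

```python
from itertools import combinations

def rules_from_itemset(itemset):
    """Return every (first part, second part) split of the itemset.

    Hypothetical helper for illustration: both parts must be non-empty,
    and together they contain each item of the itemset exactly once.
    """
    items = sorted(itemset)
    rules = []
    # The first part can have from 1 up to len(items) - 1 items.
    for size in range(1, len(items)):
        for first_part in combinations(items, size):
            second_part = tuple(i for i in items if i not in first_part)
            rules.append((first_part, second_part))
    return rules

rules = rules_from_itemset({"Milk", "Sugar", "Honey"})
for first_part, second_part in rules:
    print(", ".join(first_part), "->", ", ".join(second_part))
```

For an itemset of three items this produces six rules, one of which is Milk -> Sugar, Honey (printed with the second part in alphabetical order).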
An association rule has a support and a confidence. The support is defined as the number of customers (transactions) that bought all the items in the rule, that is, both parts together, divided by the total number of transactions. The confidence is defined as the number of transactions that include both the first part and the second part of the association rule, divided by the number of transactions that include the first part.
The support for our rule Milk -> Sugar, Honey is 2/5, i.e. 40%. This is because Customers 1 and 3 have all 3 items from our rule in their itemsets, and the total number of transactions is 5.
The confidence for our rule is 2/4, i.e. 50%, because there are 4 customers who bought Milk (Customers 1, 2, 3 and 5) and only 2 of them also bought both Sugar and Honey.
For our rule we can say that buying Milk determines the buying of Sugar and Honey with a support of 40% and a confidence of 50%.
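The same arithmetic can be checked with a short sketch. The five baskets are taken from the example (Customer 1 is the Honey, Milk, Sugar, Bread basket). The function names `support` and `confidence` are hypothetical helpers written for this illustration, not part of any library.

```python
# The five transactions from the example above.
transactions = [
    {"Honey", "Milk", "Sugar", "Bread"},  # Customer 1
    {"Milk", "Paper", "Honey"},           # Customer 2
    {"Sugar", "Honey", "Milk", "Beer"},   # Customer 3
    {"Beer", "Chips", "Bread"},           # Customer 4
    {"Milk", "Honey", "Bread"},           # Customer 5
]

def support(first_part, second_part, transactions):
    """Fraction of all transactions that contain every item of the rule."""
    items = set(first_part) | set(second_part)
    matching = sum(1 for t in transactions if items <= t)
    return matching / len(transactions)

def confidence(first_part, second_part, transactions):
    """Among transactions containing the first part, the fraction that
    also contain the second part."""
    first_part = set(first_part)
    items = first_part | set(second_part)
    with_first = sum(1 for t in transactions if first_part <= t)
    with_both = sum(1 for t in transactions if items <= t)
    # Assumption for this sketch: report 0.0 if the first part never occurs.
    return with_both / with_first if with_first else 0.0

rule_support = support({"Milk"}, {"Sugar", "Honey"}, transactions)
rule_confidence = confidence({"Milk"}, {"Sugar", "Honey"}, transactions)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
# prints: support = 40%, confidence = 50%
```

Sets are used for the baskets because an itemset contains each item only once, and the subset test `items <= t` expresses "this transaction includes all items of the rule" directly.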