- Written by Administrator

Let's say we want to mine some data for the supermarket from the previous example.

A supermarket customer buys several products. These are categorical data. Let's say this customer (Customer 1) buys *Honey, Milk, Sugar, Bread*. We want to determine whether other customers buy these items (or some of them), and whether one or more of these products determines the others. In other words, we want to know whether a customer who buys Milk and Sugar, for example, will also buy Honey and Bread.

First of all, let's see what the other customers bought:

*Customer 2: Milk, Paper, Honey*

*Customer 3: Sugar, Honey, Milk, Beer*

*Customer 4: Beer, Chips, Bread*

*Customer 5: Milk, Honey, Bread*

We define an **association rule** as two related parts of an itemset, such that the first part determines the second part.

Example: Milk -> Sugar, Honey.

An **itemset** is a set of items in which each item appears only once.

For a given itemset, one can create multiple association rules.
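To make this concrete, here is a minimal sketch that lists every way of splitting an itemset into a first part and a second part. The function name `rules_from_itemset` is made up for this illustration, and the sketch assumes both parts of a rule must be non-empty.

```python
from itertools import combinations


def rules_from_itemset(itemset):
    """Return every (first part, second part) split of an itemset.

    Assumption for this sketch: both parts must contain at least one item.
    """
    items = sorted(itemset)  # sorted only to get a stable, readable order
    rules = []
    # The first part can hold 1 .. n-1 items; the rest forms the second part.
    for size in range(1, len(items)):
        for first_part in combinations(items, size):
            second_part = tuple(i for i in items if i not in first_part)
            rules.append((first_part, second_part))
    return rules


rules = rules_from_itemset({"Milk", "Sugar", "Honey"})
for first_part, second_part in rules:
    print(", ".join(first_part), "->", ", ".join(second_part))
```

For the three items Milk, Sugar, Honey this produces six candidate rules, one of which is Milk -> Sugar, Honey (printed with the items in alphabetical order).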

An association rule has a *support* and a *confidence*. The support is defined as the number of customers (transactions) that bought all the items in the itemset, divided by the total number of transactions. The confidence is defined as the number of transactions that include both the first part and the second part of the association rule, divided by the number of transactions that include the first part.

The support for our rule Milk -> Sugar, Honey is 2/5, i.e. 40%. This is because Customers 1 and 3 have all 3 items from our rule in their itemsets, and the total number of transactions is 5.

The confidence for our rule is 2/4, i.e. 50%, because 4 customers bought Milk and only 2 of them bought Milk **and** Sugar, Honey.

For our rule we can say that buying Milk **determines** the buying of Sugar and Honey with a *support* of 40% and a *confidence* of 50%.
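The support and confidence of the rule Milk -> Sugar, Honey can be checked with a short script. This is only a sketch: the function names `support` and `confidence` and the representation of each customer's basket as a Python set are choices made for this illustration.

```python
# The five baskets from the example, one set per customer (Customer 1 first).
transactions = [
    {"Honey", "Milk", "Sugar", "Bread"},
    {"Milk", "Paper", "Honey"},
    {"Sugar", "Honey", "Milk", "Beer"},
    {"Beer", "Chips", "Bread"},
    {"Milk", "Honey", "Bread"},
]


def support(first_part, second_part, transactions):
    """Share of all transactions that contain every item of the rule."""
    items = set(first_part) | set(second_part)
    matching = sum(1 for t in transactions if items <= t)
    return matching / len(transactions)


def confidence(first_part, second_part, transactions):
    """Share of transactions with the first part that also contain the second."""
    first = set(first_part)
    both = first | set(second_part)
    with_first = sum(1 for t in transactions if first <= t)
    with_both = sum(1 for t in transactions if both <= t)
    # Guard against a first part that no customer bought.
    return with_both / with_first if with_first else 0.0


rule_support = support({"Milk"}, {"Sugar", "Honey"}, transactions)
rule_confidence = confidence({"Milk"}, {"Sugar", "Honey"}, transactions)
print(rule_support)     # 0.4
print(rule_confidence)  # 0.5
```

The `<=` operator on sets tests whether one set is a subset of another, which matches the "transaction includes all the items" condition in the definitions.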