Data Mining Concepts, Part 1

What is Data Mining?

First of all, what is data?

In a rudimentary sense, data could be described as organized information. Underneath, it's just bits. Lots of bits.

Do information and data mean the same thing? Perhaps not.

Is the purpose of searching the web to find data or information? When you search the web with a regular search engine, for example Google, you only look for keywords, either in the metadata or in the text of documents. A plain keyword search does not interpret the meaning of your words, and it does not draw relations or associations between them. A data mining question looks more like this: "If somebody in a supermarket buys beer and diapers, what are the odds that they will also buy chips?"
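To make the beer-and-diapers question concrete, here is a minimal sketch in Python of how such odds could be estimated from a list of shopping baskets. The baskets are invented purely for illustration, and the function names `support` and `confidence` are my own choice; the underlying measures are the usual ones from association rule mining, not a method taken from any particular tool.

```python
# Hypothetical toy baskets, made up for this example only.
transactions = [
    frozenset({"beer", "diapers", "chips"}),
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "chips"}),
    frozenset({"diapers", "milk"}),
    frozenset({"beer", "diapers", "chips", "milk"}),
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among baskets containing `antecedent`, the fraction that also
    contain `consequent` (an estimate of the conditional probability)."""
    matching = [b for b in baskets if antecedent <= b]
    if not matching:
        return 0.0  # the antecedent never occurs, so there is nothing to estimate
    return sum(1 for b in matching if consequent <= b) / len(matching)


beer_and_diapers = frozenset({"beer", "diapers"})
chips = frozenset({"chips"})
print(f"support:    {support(beer_and_diapers, transactions):.2f}")
print(f"confidence: {confidence(beer_and_diapers, chips, transactions):.2f}")
```

In these made-up baskets, three contain both beer and diapers, and two of those three also contain chips, so the estimated odds are 2 out of 3. A keyword search could tell you which baskets mention "chips"; it would not tell you this.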

Data mining is the process of deriving new information from existing information by combining mathematical, statistical, and database techniques to reveal knowledge that was not explicitly present in the original data.

"If somebody buys beer and wine, how high is that person's risk of a heart attack? Is it profitable to sell such a person life insurance?" The examples could go on.

Data mining is a complex process that can give us knowledge from what we already have. What do we have to do before we can apply data mining? Databases and data sets are usually noisy, incomplete, redundant, or simply too big. What can we do to make our data easy to "mine" and to get the best results we can? Data cleansing techniques can be applied at this stage: sampling, noise reduction, and even machine learning algorithms that fill in missing attribute values in our tables. For example, one can use a decision tree algorithm like ID3
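As a rough sketch of that last idea, the Python below builds a small ID3-style tree (categorical attributes, split chosen by information gain) from the rows where an attribute is known, then uses it to fill in the rows where it is missing. The table, the column names, and the helper names `id3`, `predict`, and `impute` are all hypothetical, chosen only for this illustration; a real cleansing step would also need pruning and validation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy reduction of `target` obtained by splitting `rows` on `attr`."""
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attrs, target):
    """Build a tree: a leaf is a label, an inner node is a dict."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attrs if a != best]
    branches = {
        value: id3([r for r in rows if r[best] == value], remaining, target)
        for value in {r[best] for r in rows}
    }
    return {"attr": best, "branches": branches, "default": majority}


def predict(tree, row):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


def impute(rows, target):
    """Return a copy of `rows` with missing (None) `target` values predicted."""
    known = [r for r in rows if r[target] is not None]
    attrs = [a for a in rows[0] if a != target]
    tree = id3(known, attrs, target)
    return [dict(r, **{target: predict(tree, r)}) if r[target] is None else dict(r)
            for r in rows]


# Hypothetical customer table; None marks a missing "chips" value.
customers = [
    {"beer": "yes", "diapers": "yes", "chips": "yes"},
    {"beer": "yes", "diapers": "no", "chips": "yes"},
    {"beer": "no", "diapers": "yes", "chips": "no"},
    {"beer": "no", "diapers": "no", "chips": "no"},
    {"beer": "yes", "diapers": "yes", "chips": None},
    {"beer": "no", "diapers": "no", "chips": None},
]
filled = impute(customers, "chips")
for row in filled:
    print(row)
```

In this invented table the "beer" column separates the known rows perfectly, so the tree splits on it and the two missing values are filled in accordingly. On real data the split is rarely this clean, which is why the imputed values should be treated as estimates and not as observed facts.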