The Art of Reading Between the Lines - R and Data Mining

Post on 20-May-2015

description

The slides provide the framework for a workshop on R and data mining (beginner level). Sample materials with comments in Polish: https://gist.github.com/kmrowca/public

Transcript of The Art of Reading Between the Lines - R and Data Mining

The Art of Reading Between the Lines

or the R language and data mining in action

Katarzyna Mrowca

<me>

</me>

The deal

Agenda

• Quick glance at theory - Data mining
• Exercises on… paper
• Quick glance at the tool – R console
• Exercises – become friends with R
• …

Exercise

Theory

Agenda

• Quick glance at theory - Data preparation
• Exercises
• Decision trees
• Cluster analysis
• Text mining
• …

Exercise

Theory

Agile is everywhere!

• Retro after second break

Quick glance at theory!

What is data mining?

What does "Google" say?

Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics.

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

Source: Wikipedia

Data mining – what is "inside"?

• Predictive
  • Regression
  • Classification
  • Collaborative filtering

• Descriptive
  • Clustering / similarity matching
  • Association rules and variants
  • Deviation detection
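
To make the two families concrete, here is a minimal sketch in R. It is an illustration under stated assumptions, not material from the workshop: it uses the built-in `cars` and `iris` datasets, and the choice of 3 clusters and of a query speed of 21 is arbitrary. Linear regression stands in for a predictive task, k-means for a descriptive one.

```r
# Predictive: regression.
# Model stopping distance as a linear function of speed, using the built-in
# `cars` dataset (an assumption; the slides do not name a dataset).
fit <- lm(dist ~ speed, data = cars)
predicted <- predict(fit, newdata = data.frame(speed = 21))

# Descriptive: clustering.
# Group iris flowers by their four measurements only; the Species label is
# not used for fitting. Three clusters is an illustrative choice.
set.seed(42)  # k-means starts from random centers
clusters <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

print(predicted)
# Compare the discovered groups with the known species afterwards.
print(table(clusters$cluster, iris$Species))
```

The regression answers a question about an unseen case (what distance to expect at a given speed), while the clustering only describes structure already present in the data.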

What is data mining not?

Why is data mining so popular?

What is the difference between statistics and data mining?

Exercise

Data preparation

Variables

Qualitative & Quantitative
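
As a rough sketch of how this distinction might look in R, assuming a made-up toy data frame (the column names and values below are hypothetical, not workshop data): quantitative variables are typically stored as numeric vectors, qualitative ones as factors.

```r
# Hypothetical toy data, invented for illustration only.
people <- data.frame(
  height_cm = c(172, 181, 165, 190),                        # quantitative
  eye_color = factor(c("blue", "brown", "brown", "green"))  # qualitative
)

str(people)                           # shows which column is numeric, which is a factor
avg_height <- mean(people$height_cm)  # arithmetic is meaningful for quantitative data
color_counts <- table(people$eye_color)  # counting is meaningful for qualitative data

print(avg_height)
print(color_counts)
```

The variable type decides which summaries make sense: a mean for `height_cm`, frequency counts for `eye_color`.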

Tame the R console!

Take a break

Regression

Time series

Decision trees

Regression trees

Classification trees

K-means

Text mining

Thank you!