R & Data mining in action

Posted on 18-Dec-2014

Presentation from the workshop "R & Data mining in action" given at JDD 2013. Code samples with descriptions (in Polish): https://gist.github.com/kmrowca/public

Transcript of R & Data mining in action

R & data mining in action

Katarzyna Mrowca

The art of reading between the lines

or the R language and data mining in action

<me>

</me>

The deal

Agenda

• Quick glance at theory - data mining
• Exercises on… paper
• Quick glance at the tool – R console
• Exercises – become friends with R
• …

Agenda

• Quick glance at theory - data preparation
• Exercises
• Regression
• Time series
• Decision trees
• Cluster analysis
• Text mining
• …

A quick glance at theory!

What is data mining?

What does „Google” say?

Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics.

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

Source: Wikipedia

Data mining – what is „inside”

• Predictive:
  • Regression
  • Classification
  • Collaborative filtering

• Descriptive:
  • Clustering / similarity matching
  • Association rules and variants
  • Deviation detection
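To make the two families above a little more concrete, here is a minimal sketch in base R. It is not taken from the workshop materials: the built-in `cars` and `iris` data sets, the variable names, the seed, and the choice of three clusters are all assumptions made for this illustration. `lm()` stands in for a predictive task (regression) and `kmeans()` for a descriptive one (clustering).

```r
# Predictive: learn a relationship from known outcomes, then predict a new case.
# `cars` ships with base R (speed in mph, stopping distance in ft).
fit <- lm(dist ~ speed, data = cars)
predicted_dist <- predict(fit, newdata = data.frame(speed = 20))

# Descriptive: look for structure without any target variable.
# The Species column is deliberately left out; k = 3 is an assumption.
set.seed(42)  # k-means starts from randomly chosen centers
clusters <- kmeans(iris[, 1:4], centers = 3, nstart = 10)

print(predicted_dist)           # predicted stopping distance at 20 mph
print(table(clusters$cluster))  # number of observations per cluster
```

The predictive half needs a known outcome (`dist`) to learn from, while the descriptive half only groups similar rows and leaves the interpretation of the groups to the analyst.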

What is data mining not?

Why is data mining so popular?

What is the difference between statistics and data mining?

Data preparation

Variables

Qualitative & Quantitative
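One possible way to see the qualitative / quantitative split in R is sketched below. The `customers` data frame and all of its values are hypothetical, invented only for this example; the sketch assumes the usual convention of storing qualitative variables as factors and quantitative ones as numeric vectors.

```r
# Hypothetical toy data, invented for illustration only.
customers <- data.frame(
  # qualitative, nominal: categories without any order
  segment = factor(c("retail", "business", "retail", "public")),
  # qualitative, ordinal: categories with a meaningful order
  rating  = factor(c("low", "high", "medium", "low"),
                   levels = c("low", "medium", "high"), ordered = TRUE),
  # quantitative: arithmetic on the values makes sense
  age     = c(34, 51, 27, 45)
)

str(customers)  # shows the type R assigned to each column

# Flag which columns are quantitative.
is_quantitative <- sapply(customers, is.numeric)
print(is_quantitative)

# Typical summaries differ by variable type.
print(table(customers$segment))  # counts per category
print(mean(customers$age))       # an average is meaningful only for numbers
```

Checking column types like this is a common first step of data preparation, because most modelling functions in R treat factors and numeric vectors differently.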

Tame the R console!

NetBeans + R

Source: https://blogs.oracle.com/geertjan/entry/r_plugin_for_netbeans_ide

RHIPE <- R + Hadoop
Find out more: http://www.datadr.org/

Revolution Analytics <- R + Hadoop + Enterprise
Find out more: http://www.revolutionanalytics.com

Take a break

Regression

Time series

Decision trees

Regression trees

Classification trees

K-means

Text mining

Thank you!