R & Data mining in action

R & data mining in action Katarzyna Mrowca

description

Presentation from workshop "R & Data mining in action" given at JDD 2013. Code samples with description (in Polish): https://gist.github.com/kmrowca/public

Transcript of R & Data mining in action

Page 1: R & Data mining in action

R & data mining in action

Katarzyna Mrowca

Page 2: R & Data mining in action

The art of reading between the lines

or: the R language and data mining in action

Page 3: R & Data mining in action

Katarzyna Mrowca

<me>

</me>

Page 4: R & Data mining in action
Page 5: R & Data mining in action

The deal

Page 6: R & Data mining in action

Agenda

• A quick glance at the theory: data mining
• Exercises on… paper
• A quick glance at the tool: the R console
• Exercises: become friends with R
• …

Page 8: R & Data mining in action

Agenda

• A quick glance at the theory: data preparation
• Exercises
  • Regression
  • Time series
  • Decision trees
  • Cluster analysis
  • Text mining
• …

Legend: Exercise / Theory

Page 9: R & Data mining in action

A quick glance at the theory!

Page 10: R & Data mining in action

What is data mining?

Page 11: R & Data mining in action

What does „Google” say?

Page 13: R & Data mining in action

What does „Google” say?

Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, and statistics.

Page 18: R & Data mining in action

What does „Google” say?

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.

Page 21: R & Data mining in action

What does „Google” say?

Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.

Source: Wikipedia

Page 22: R & Data mining in action

Data mining – what is „inside”

• Predictive:
  • Regression
  • Classification
  • Collaborative filtering

• Descriptive:
  • Clustering / similarity matching
  • Association rules and variants
  • Deviation detection

Page 25: R & Data mining in action

What isn't data mining?

Page 26: R & Data mining in action

Why is data mining so popular?

Page 27: R & Data mining in action

What is the difference between statistics and data mining?

Page 28: R & Data mining in action

Data preparation

Page 29: R & Data mining in action

Variables

Page 30: R & Data mining in action

Qualitative & Quantitative

Page 31: R & Data mining in action

Tame the R console!

Page 32: R & Data mining in action

NetBeans + R

Source: https://blogs.oracle.com/geertjan/entry/r_plugin_for_netbeans_ide

Page 33: R & Data mining in action

RHIPE <- R + Hadoop

Find out more: http://www.datadr.org/

Page 34: R & Data mining in action

Revolution Analytics <- R + Hadoop + Enterprise

Find out more: http://www.revolutionanalytics.com

Page 35: R & Data mining in action

Take a break

Page 36: R & Data mining in action

Regression
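
The transcript keeps only this slide's title, and the workshop's own R code lives in the gist linked in the description. As a rough illustration of the idea, here is a minimal sketch of simple linear regression (ordinary least squares with one predictor). It is written in Python with the standard library only and is not the workshop's code; the function name `fit_line` and the data are made up for the example. In R, `lm(y ~ x)` fits the same kind of model.

```python
# Ordinary least squares for a single predictor: y = intercept + slope * x.
# Illustrative sketch only; names and data are assumptions, not workshop material.

def fit_line(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

The fitted line can then be used for prediction, for example `intercept + slope * 5.0` for a new observation at x = 5.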

Page 37: R & Data mining in action

Time series
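
Only the slide title survives here, so the following is just one common first step in time series work, not necessarily what the workshop covered: smoothing a series with a simple moving average. The sketch is Python with the standard library only; `moving_average` and the sample values are hypothetical.

```python
# Simple moving average: each output value is the mean of `window`
# consecutive observations. Illustrative sketch with made-up data.

def moving_average(values, window):
    """Return the list of means over every full window of the series."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


series = [10, 12, 14, 13, 15, 17]
smoothed = moving_average(series, 3)
print(smoothed)
```

A wider window smooths more aggressively but returns fewer points, because only full windows are averaged.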

Page 38: R & Data mining in action

Decision trees

Page 39: R & Data mining in action

Regression trees

Page 40: R & Data mining in action

Classification trees

Page 41: R & Data mining in action

K-means
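
The slide itself carries no detail, so here is a minimal sketch of the standard k-means procedure (Lloyd's algorithm): assign each point to its nearest centroid, recompute each centroid as the mean of its points, and repeat until nothing changes. It is Python with the standard library only and not the workshop's R code. Starting from the first `k` points is an assumption made to keep the example deterministic; real implementations, such as R's built-in `kmeans()`, usually start from random centers.

```python
# Lloyd's algorithm for k-means on points given as tuples of numbers.
# Illustrative sketch; initialisation and data are assumptions.

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100):
    """Return (centroids, labels) after running k-means to convergence."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: deterministic start from the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of the members of each cluster.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Made-up data: two visually separate groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 9), (8.5, 9.5)]
centroids, labels = kmeans(points, 2)
print(labels)
print(centroids)
```

The result depends on the starting centroids, which is why practical tools typically run several random starts and keep the best one.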

Page 42: R & Data mining in action

Text mining

Page 43: R & Data mining in action

Thank you!