R & Data mining in action
Katarzyna Mrowca
The art of reading between the lines,
or the R language and data mining in action
<me>
</me>
The deal
Agenda
• Quick glance at theory – data mining
• Exercises on… paper
• Quick glance at the tool – R console
• Exercises – become friends with R
• …
Agenda
• Quick glance at theory – data preparation
• Exercises
  • Regression
  • Time series
  • Decision trees
  • Cluster analysis
  • Text mining
  • …
Exercise / Theory
Quick glance at theory!
What is data mining?
What does „Google” say?
Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD), an interdisciplinary subfield of computer science, is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, and statistics.
The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
Aside from the raw analysis step, it involves database and data management aspects, data pre-processing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization, and online updating.
Source: Wikipedia
Data mining – what is „inside”
• Predictive:
  • Regression
  • Classification
  • Collaborative filtering
• Descriptive:
  • Clustering / similarity matching
  • Association rules and variants
  • Deviation detection
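To make the predictive / descriptive split concrete, here is a minimal sketch in base R. It is not taken from the talk: the choice of the built-in `cars` and `iris` data sets, the seed, and the number of clusters are assumptions made only for illustration. A regression predicts a value for an unseen input, while k-means clustering only describes the structure already present in the data.

```r
# Predictive: fit a linear regression on the built-in `cars` data set
# (stopping distance as a function of speed), then predict for a new speed.
fit <- lm(dist ~ speed, data = cars)
predicted <- predict(fit, newdata = data.frame(speed = 21))

# Descriptive: group the built-in `iris` measurements into 3 clusters
# without using the species label (3 is an assumption for this sketch).
set.seed(42)  # k-means starts from random centres
clusters <- kmeans(iris[, 1:4], centers = 3)

# Cross-tabulate discovered clusters against the known species,
# purely to inspect how well the grouping matches.
comparison <- table(cluster = clusters$cluster, species = iris$Species)

print(predicted)
print(comparison)
```

The regression produces a number for a speed it has never seen. The clustering produces no prediction at all, only a grouping that still has to be interpreted.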
What is data mining not?
Why is data mining so popular?
What is the difference between statistics and data mining?
Data preparation
Variables
Qualitative & Quantitative
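As a rough illustration of the two kinds of variables, the sketch below builds a small data frame in base R. The data set, its column names, and its values are hypothetical and not from the talk. Qualitative variables are stored as factors (ordered where the categories have a natural order), quantitative ones as numeric vectors.

```r
# Hypothetical toy data set; column names and values are made up.
customers <- data.frame(
  # qualitative, nominal: categories without an order
  segment = factor(c("retail", "business", "retail", "public")),
  # qualitative, ordinal: categories with a natural order
  rating  = factor(c("low", "high", "medium", "high"),
                   levels = c("low", "medium", "high"), ordered = TRUE),
  # quantitative: numbers that can be averaged
  age     = c(34, 51, 27, 45),
  spend   = c(120.5, 980.0, 75.2, 310.9)
)

# TRUE for quantitative columns, FALSE for qualitative ones.
is_quantitative <- sapply(customers, is.numeric)

# Counts make sense for qualitative variables, means for quantitative ones.
segment_counts <- table(customers$segment)
mean_age <- mean(customers$age)

str(customers)
print(is_quantitative)
```

The distinction matters during data preparation because it decides which summaries and models are meaningful for a column.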
Tame the R console!
NetBeans + R
Source: https://blogs.oracle.com/geertjan/entry/r_plugin_for_netbeans_ide
RHIPE <- R + Hadoop
Find out more: http://www.datadr.org/
Revolution Analytics <- R + Hadoop + Enterprise
Find out more: http://www.revolutionanalytics.com
Take a break
Regression
Time series
Decision trees
Regression trees
Classification trees
K-means
Text mining
Thank you!