21st December, 2019
4th International Conference on Computational Systems and Information Technology for Sustainable Solution
Ramshankar Yadhunath, Srivenkata Srikanth, Arvind Sudheer, Suja Palaniswamy
Presented by Ramshankar Yadhunath
Identification of Criminal Activity Hotspots using Machine Learning to aid in Effective Utilization of Police Patrolling in Cities with High Crime Rates
CSITSS - 2019
AGENDA
RELEVANCE AND NEED FOR THE RESEARCH: What and why is this research important?
PREDICTIVE MODELLING - HOW DOES IT HELP OUR PROBLEM STATEMENT? A machine learning based approach
THE RESEARCH METHODOLOGY: How did we go about our idea?
RESEARCH OUTCOME AND RESULTS: How did our models perform? What was the novelty in our work?
CONCLUSION: Wrapping it up!
RELEVANCE AND NEED FOR THE RESEARCH
What and why is this research important?
AN OVERLOOKED PROBLEM
"There can be no sustainable development without peace and no peace without sustainable development." (United Nations, 2030 Agenda for Sustainable Development)
Crime is a major obstacle to a peaceful world. Several organizations are taking steps to reduce crime and its effects.
BUT, WHAT IF THERE IS A FACTOR HIDDEN FROM THESE ORGANIZATIONS?
"THE PROBLEM OF POOR POLICE-POPULATION RATIOS"
WORLD (POLICE:POPULATION) RATIOS
[Chart: number of police personnel per 100,000 people]
United Nations recommendation: 222 police personnel per 100,000 people
THE PROBLEM STATEMENT
To facilitate effective distribution of police forces in a city among multiple districts, based on the extent to which each district is prone to crime at a given hour, on a given day, in a given month.
Alarm levels: Low, Medium, High
PREDICTIVE MODELLING
A Machine Learning Based Approach
PREDICTIVE MODELLING IN A NUTSHELL
[Diagram: Historical Data → ML Model; New Data → ML Model → Predictions!]
USING PREDICTIVE MODELLING FOR OUR WORK
[Diagram: Historical Data (crime records' data of past years) → ML Model; New Data → ML Model → Predictions ("probable" crime hotspots, "probable" alarm rates)]
THE RESEARCH METHODOLOGY
How did we go about our idea?
AN OVERVIEW OF THE METHODOLOGY
OSEMN (Obtain, Scrub, Explore, Model, iNterpret): a commonly used data science methodology
OSEM: our variant, a linear process: Obtain → Scrub → Explore → Model
OBTAIN DATA
Source: City of Chicago Data Portal
Training sample: data from 2015 to May 2019
Test sample: data from 2012 to 2014
Over 11 lakh (1.1 million) entries
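The training and test samples above are split by calendar year rather than at random. A minimal sketch of such a split, assuming each record is a dict with an integer "Year" field (a hypothetical layout; the slide does not describe the export format):

```python
def split_by_year(rows, train_years=range(2015, 2020), test_years=range(2012, 2015)):
    """Split crime records into training and test samples by calendar year.

    Assumes each record carries an integer "Year" field (hypothetical layout).
    Records that fall outside both year ranges are dropped.
    """
    train = [r for r in rows if r["Year"] in train_years]
    test = [r for r in rows if r["Year"] in test_years]
    return train, test


# Toy records for illustration only (not real data)
records = [{"Year": y} for y in (2011, 2012, 2014, 2015, 2019)]
train, test = split_by_year(records)
print([r["Year"] for r in train], [r["Year"] for r in test])  # [2015, 2019] [2012, 2014]
```

The upper bound of the training range is exclusive, so `range(2015, 2020)` covers 2015 through 2019; the "May" cut-off comes from when the data was extracted, not from this filter.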
SCRUB (PRE-PROCESS) DATA
Removing missing values: 98.55% of entries retained
Feature engineering: decomposing the "Date" feature
Data aggregation: grouping data to count crimes per time point
Data size: > 11,00,000 rows reduced to ~ 45,000 rows
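The three scrubbing steps can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the field names "Date" and "District", the timestamp format, and the grouping key (district, month, weekday, hour) are all hypothetical choices made to mirror the problem statement's "given hour, given day, given month".

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout, e.g. "05/20/2019 10:15:00 PM" (hypothetical for this sketch)
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def scrub(rows):
    """Drop incomplete records, decompose "Date", and count crimes per time point.

    The grouping key (district, month, weekday, hour) is an assumption.
    """
    counts = Counter()
    for row in rows:
        # Step 1: remove records with a missing timestamp or district
        if not row.get("Date") or not row.get("District"):
            continue
        # Step 2: decompose the "Date" feature into separate time components
        ts = datetime.strptime(row["Date"], DATE_FORMAT)
        # Step 3: aggregate into one crime count per time point
        counts[(row["District"], ts.month, ts.weekday(), ts.hour)] += 1
    return counts


rows = [
    {"Date": "05/20/2019 10:15:00 PM", "District": "011"},
    {"Date": "05/20/2019 10:45:00 PM", "District": "011"},
    {"Date": "", "District": "011"},                       # dropped: no date
    {"Date": "05/21/2019 09:00:00 AM", "District": None},  # dropped: no district
]
print(scrub(rows))  # Counter({('011', 5, 0, 22): 2})
```

Here `weekday()` returns 0 for Monday (20 May 2019 was a Monday), and the two 10 PM records collapse into a single time point with a count of 2, which is how aggregation shrinks millions of rows into tens of thousands.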
EXPLORE DATA
Create a "Target" feature, with labels based on the normal distribution and the IQR
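The slide does not spell out the labelling rule. One plausible reading, sketched below as an assumption rather than the authors' exact method, uses the interquartile range of the aggregated crime counts: counts below the first quartile are labelled Low, counts above the third quartile High, and everything in between Medium. The normal-distribution part of the rule is not reproduced here.

```python
from statistics import quantiles


def label_alarm(counts):
    """Label each aggregated crime count as Low, Medium or High alarm.

    Assumed rule: below Q1 -> Low, above Q3 -> High, otherwise Medium.
    """
    q1, _, q3 = quantiles(counts, n=4)  # quartile cut points (Q1, median, Q3)
    labels = []
    for c in counts:
        if c < q1:
            labels.append("Low")
        elif c > q3:
            labels.append("High")
        else:
            labels.append("Medium")
    return labels


print(label_alarm([1, 2, 3, 4, 5, 6, 7, 8]))
# ['Low', 'Low', 'Medium', 'Medium', 'Medium', 'Medium', 'High', 'High']
```

Because the cut-offs are quartiles, roughly a quarter of the points land in each outer class and about half in the middle class by construction, so some class imbalance is expected in the target.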
MODEL THE DATA
Supervised classification problem
Imbalanced dataset (28:51:21 class ratio)
3 samples used
7 ML algorithms tested
Gradient Boosting Tree worked best
RESEARCH OUTCOME AND INNOVATION
How did our model perform? What was the novelty in our work?
MODEL EVALUATION METRICS
We evaluated our models on the following metrics, which are common to most ML problems:
Accuracy, Precision, Recall, F1 Score, Unweighted Average Recall
But we also need a "PROBLEM-SPECIFIC" metric to make the evaluation of our work more robust.
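Of the standard metrics, Unweighted Average Recall (UAR) matters most for an imbalanced dataset: it averages the recall of each class, so a rare class counts as much as a common one. A minimal pure-Python sketch comparing it with plain accuracy on toy labels (illustrative values, not results from this work):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; every class counts equally regardless of size."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        hits = sum(y_pred[i] == cls for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


y_true = ["Low", "Medium", "Medium", "Medium", "High"]
y_pred = ["Low", "Medium", "Medium", "Medium", "Low"]
print(accuracy(y_true, y_pred))                   # 0.8
print(unweighted_average_recall(y_true, y_pred))  # 0.6666666666666666
```

Missing the single "High" point costs only 0.2 in accuracy but a full third of UAR. scikit-learn's `balanced_accuracy_score` computes the same per-class average of recall.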
A NOVEL METRIC FOR OUR PURPOSE
Region X (actual label vs. predicted label):
Actual High Alarm, predicted High Alarm: Best
Actual High Alarm, predicted Medium Alarm: Manageable
Actual High Alarm, predicted Low Alarm: Bad!
We need a model that minimizes the last scenario.
Problem-specific metric: percentage of misclassifications of "high alarm" regions as "low alarm" regions. We call this metric "HL-mis" in the following slides.
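HL-mis can be computed directly from the true and predicted labels. The slide does not state the denominator, so the sketch below assumes it is the number of regions whose actual label is High Alarm; the label strings "High" and "Low" are also assumptions.

```python
def hl_mis(y_true, y_pred, high="High", low="Low"):
    """HL-mis: percentage of actual "high alarm" points predicted as "low alarm".

    Assumption: the denominator is the number of actual high-alarm points.
    """
    high_idx = [i for i, t in enumerate(y_true) if t == high]
    if not high_idx:
        return 0.0  # no high-alarm points, so none can be misclassified
    missed = sum(y_pred[i] == low for i in high_idx)
    return 100.0 * missed / len(high_idx)


y_true = ["High", "High", "High", "High", "Low"]
y_pred = ["High", "Medium", "Low", "High", "Low"]
print(hl_mis(y_true, y_pred))  # 25.0
```

A High to Medium error does not count towards HL-mis, mirroring the "Manageable" case above; only the High to Low error is penalised.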
KEY CONSIDERATIONS WHILE CHOOSING A MODEL
Key considerations while choosing a model: high accuracy, high F1 score, low HL-mis
Testing samples:
Sample 1: 25% of crimeDat (with class imbalance)
Sample 2: 25% of crimeDat (without class imbalance, achieved by oversampling)
Sample 3: all crime records from 2012-2014
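Sample 2 removes the class imbalance by oversampling. The slide does not name the technique; the sketch below assumes plain random oversampling with replacement, duplicating randomly chosen rows of each smaller class until every class matches the largest one.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate random minority-class rows until every class matches the largest.

    Plain random oversampling with replacement (assumed technique).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        pool = [x for x, label in zip(X, y) if label == cls]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(cls)
    return X_out, y_out


X = list(range(10))
y = ["Low"] * 3 + ["Medium"] * 5 + ["High"] * 2
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # Counter({'Low': 5, 'Medium': 5, 'High': 5})
```

Random oversampling only repeats existing rows; synthetic methods such as SMOTE are an alternative, but nothing on the slide indicates which was used.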
MODEL COMPARISONS - TRADITIONAL METRICS
MODEL COMPARISONS - OUR METRIC
CONCLUSION
Wrapping it Up!
OUR CONTRIBUTIONS
Our work looks at predictive policing from the angle of "Optimizing low police force" to control crime, even in cities with very high crime rates.
This work can also be applied at the state or county level and can serve as the foundation for more complex police force allocation mechanisms.
Our paper is based on the notion of using "Data science as a means of promoting social good".
The new problem-specific metric is an effective way to evaluate the robustness of a model that predicts the alarm rate of a region.
THANK YOU