21st December, 2019

4th International Conference on Computational Systems and Information Technology for Sustainable Solution

Ramshankar Yadhunath, Srivenkata Srikanth, Arvind Sudheer, Suja Palaniswamy

Presented by Ramshankar Yadhunath

Identification of Criminal Activity Hotspots using Machine Learning to aid in Effective Utilization of Police Patrolling in Cities with High Crime Rates

CSITSS - 2019

AGENDA

RELEVANCE AND NEED FOR THE RESEARCH
What is this research, and why is it important?

PREDICTIVE MODELLING - HOW DOES IT HELP OUR PROBLEM STATEMENT?
A Machine Learning based approach

THE RESEARCH METHODOLOGY
How did we go about our idea?

RESEARCH OUTCOME AND RESULTS
How did our models perform? What was the novelty in our work?

CONCLUSION
Wrapping it Up!

RELEVANCE AND NEED FOR THE RESEARCH

What is this research, and why is it important?

AN OVERLOOKED PROBLEM

"There can be no sustainable development without peace and no peace without sustainable development" - United Nations 2030 Agenda for Sustainable Development

Crime is a major deterrent to a peaceful world
Several organizations are taking steps to reduce crime and its effects

BUT, WHAT IF THERE IS A FACTOR HIDDEN TO THESE ORGANIZATIONS?

"THE PROBLEM OF POOR POLICE-POPULATION RATIOS"

WORLD (POLICE:POPULATION) RATIOS

Number of police personnel per 100,000 people

United Nations recommendation = 222 police personnel per 100,000 people

THE PROBLEM STATEMENT

To facilitate effective distribution of a city's police forces among its districts, based on how prone each district is to crime at a given hour, on a given day, in a given month.

Levels: Low, Medium, High

PREDICTIVE MODELLING

A Machine Learning Based Approach

PREDICTIVE MODELLING IN A NUTSHELL

Historical Data -> ML Model
New Data -> ML Model -> Predictions!

USING PREDICTIVE MODELLING FOR OUR WORK

Historical Data: crime records' data of past years
Predictions: "probable" crime hotspots and "probable" alarm rates

THE RESEARCH METHODOLOGY

How did we go about our idea?

AN OVERVIEW OF THE METHODOLOGY

Obtain -> Scrub -> Explore -> Model

OSEMN (Obtain, Scrub, Explore, Model, iNterpret) - commonly used data science methodology
OSEM - our variant
Linear process

OBTAIN DATA

Source: City of Chicago Data Portal
Training sample: data from 2015 to May 2019
Test sample: data from 2012 to 2014
Over 11 lakh (1.1 million) entries
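
The year-based split on this slide could be sketched roughly as follows. This is only an illustration: the record layout and the `year` field name are assumptions, since the slides do not list the portal's column names.

```python
# Hypothetical record layout: each crime record is a dict with a "year" field.
def split_by_year(records):
    """Split records into training (2015-2019) and test (2012-2014) samples."""
    # The slides say the training data runs to May 2019; here we assume the
    # export simply ends there, so filtering on the year alone is enough.
    train = [r for r in records if 2015 <= r["year"] <= 2019]
    test = [r for r in records if 2012 <= r["year"] <= 2014]
    return train, test


records = [
    {"id": 1, "year": 2012},
    {"id": 2, "year": 2014},
    {"id": 3, "year": 2015},
    {"id": 4, "year": 2019},
    {"id": 5, "year": 2010},  # outside both windows, so it is dropped
]
train, test = split_by_year(records)
```

Splitting by time period rather than at random keeps the test sample fully separate from the years used for training.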

SCRUB (PRE-PROCESS) DATA

Removing missing values: 98.55% of entries retained
Feature engineering: decomposing the "Date" feature
Data aggregation: grouping data to count crimes per time point

> 11,00,000 rows -> ~ 45,000 rows
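
A minimal sketch of the feature engineering and aggregation steps, using only the Python standard library. The timestamp layout, the (district, date) record shape, and the choice of day of week as the "day" component are all assumptions made for illustration; the slides do not specify them.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp layout; adjust to match the actual export.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def decompose(date_string):
    """Break a timestamp string into (month, weekday, hour)."""
    ts = datetime.strptime(date_string, DATE_FORMAT)
    # weekday() returns 0 for Monday through 6 for Sunday.
    return ts.month, ts.weekday(), ts.hour


def count_crimes(records):
    """Count crimes per (district, month, weekday, hour) time point."""
    counts = Counter()
    for district, date_string in records:
        counts[(district, *decompose(date_string))] += 1
    return counts


# Hypothetical (district, date) pairs standing in for cleaned crime records.
records = [
    (11, "01/05/2016 11:15:00 PM"),
    (11, "01/05/2016 11:40:00 PM"),
    (7, "03/18/2017 02:05:00 AM"),
]
crime_counts = count_crimes(records)
```

Grouping this way collapses many individual records into one row per time point, which is the kind of reduction the row counts above describe.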

EXPLORE DATA

Create a "Target" feature
Label based on "Normal Distribution" and IQR
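
One way an IQR-based target label could be built is sketched below. The exact rule is not given on the slide, so the thresholds here are an assumption: counts below the first quartile are labelled low, counts above the third quartile high, and everything inside the interquartile range medium.

```python
from statistics import quantiles


def label_alarm_levels(counts):
    """Label each crime count as low, medium or high using the quartiles."""
    # Assumed rule (not confirmed by the slides): Q1 and Q3 are the cut points.
    q1, _median, q3 = quantiles(counts, n=4)
    labels = []
    for c in counts:
        if c < q1:
            labels.append("low")
        elif c > q3:
            labels.append("high")
        else:
            labels.append("medium")
    return labels


# Hypothetical crime counts per time point.
crime_counts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
labels = label_alarm_levels(crime_counts)
```

A rule of this kind places roughly half of the time points in the medium class and about a quarter in each of the other two.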

MODEL THE DATA

Supervised classification problem
Imbalanced dataset [28:51:21 ratio]
3 samples used
7 ML algorithms tested
Gradient Boosting Tree worked best

RESEARCH OUTCOME AND INNOVATION

How did our model perform? What was the novelty in our work?

MODEL EVALUATION METRICS

We evaluated our models on the following metrics, which are common to most ML problems:

Accuracy
Precision
Recall
F1 Score
Unweighted Average Recall

But we also need a "PROBLEM-SPECIFIC" metric to improve the robustness of our work.
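
Of the metrics listed, Unweighted Average Recall is the least familiar, so a small sketch may help. It is the plain mean of the per-class recalls, so every class counts equally regardless of size. The labels below are made-up examples, not results from the paper.

```python
def unweighted_average_recall(actual, predicted):
    """Mean of per-class recall, giving each class equal weight."""
    recalls = []
    for cls in sorted(set(actual)):
        total = sum(1 for a in actual if a == cls)
        hits = sum(1 for a, p in zip(actual, predicted) if a == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical labels: the majority class is predicted perfectly,
# while one of the two "high" samples is missed.
actual = ["high"] * 2 + ["medium"] * 6 + ["low"] * 2
predicted = ["high", "low"] + ["medium"] * 6 + ["low"] * 2

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
uar = unweighted_average_recall(actual, predicted)
```

In this example accuracy is 0.9 while the unweighted average recall is about 0.83, because the single error falls in a small class. That gap is why the metric is useful on an imbalanced dataset.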

A NOVEL METRIC FOR OUR PURPOSE

Region X

Actual Label    Predicted Label    Outcome
High Alarm      High Alarm         Best
High Alarm      Medium Alarm       Manageable
High Alarm      Low Alarm          Bad!

We must have a model that "minimizes" the "Bad!" scenario.
Problem-specific metric: percentage of misclassifications of "high alarm" regions as "low alarm" regions.
Let's call this metric "HL-mis" in the further slides.
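
A possible way to compute HL-mis is sketched below. The slide does not state the denominator, so this sketch assumes it is the number of samples whose actual label is high; the label strings are also illustrative.

```python
def hl_mis(actual, predicted):
    """Percentage of actual "high" samples that were predicted as "low"."""
    # Assumption: the denominator is the count of actual "high" samples.
    high_predictions = [p for a, p in zip(actual, predicted) if a == "high"]
    if not high_predictions:
        return 0.0
    missed = sum(1 for p in high_predictions if p == "low")
    return 100.0 * missed / len(high_predictions)


# Hypothetical labels: one of four high-alarm samples is predicted as low.
actual = ["high", "high", "high", "high", "medium", "low"]
predicted = ["high", "medium", "low", "high", "medium", "low"]
score = hl_mis(actual, predicted)
```

Under this definition a high-to-medium error does not count against the model, which matches the "Manageable" row in the table.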

KEY CONSIDERATIONS WHILE CHOOSING A MODEL

Key considerations while choosing a model:
High accuracy
High F1 score
Low HL-mis

Testing samples:
Sample 1: 25% of crimeDat (with class imbalance)
Sample 2: 25% of crimeDat (without class imbalance, achieved by oversampling)
Sample 3: All crime records from 2012-2014
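
Sample 2 relies on oversampling, but the slides do not say which technique was used. The sketch below assumes plain random oversampling, where minority-class samples are duplicated at random until every class matches the largest one. The class sizes are illustrative and simply reuse the 28:51:21 ratio; which class holds which share is not stated in the slides.

```python
import random
from collections import Counter


def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra copies with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical assignment of the 28:51:21 ratio to the three classes.
labels = ["low"] * 28 + ["medium"] * 51 + ["high"] * 21
samples = list(range(len(labels)))
balanced_samples, balanced_labels = oversample(samples, labels)
class_counts = Counter(balanced_labels)
```

After balancing, each class has as many samples as the original majority class, so accuracy on this sample is no longer dominated by one label.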

MODEL COMPARISONS - TRADITIONAL METRICS

MODEL COMPARISONS - OUR METRIC

CONCLUSION

Wrapping it Up!

OUR CONTRIBUTIONS

Our work looks at predictive policing from the angle of "optimizing a low police force" to control crime even in cities with very high crime rates.
This work can also be applied at a state or county level and can be the foundation for more complex police force allocation mechanisms.
Our paper is based on the notion of using "Data science as a means of promoting social good".
The new problem-specific metric is an effective way to evaluate the robustness of a model that predicts the alarm rate of a region.

THANK YOU
