
21st December, 2019

4th International Conference on Computational Systems and Information Technology for Sustainable Solution

Ramshankar Yadhunath, Srivenkata Srikanth, Arvind Sudheer, Suja Palaniswamy

Presented by Ramshankar Yadhunath

Identification of Criminal Activity Hotspots using Machine Learning to aid in Effective Utilization of Police Patrolling in Cities with High Crime Rates

CSITSS - 2019

AGENDA

RELEVANCE AND NEED FOR THE RESEARCH
What is this research, and why is it important?

PREDICTIVE MODELLING - HOW DOES IT HELP OUR PROBLEM STATEMENT?
A Machine Learning based approach

THE RESEARCH METHODOLOGY
How did we go about our idea?

RESEARCH OUTCOME AND RESULTS
How did our models perform? What was the novelty in our work?

CONCLUSION
Wrapping it Up!


RELEVANCE AND NEED FOR THE RESEARCH

What is this research, and why is it important?



AN OVERLOOKED PROBLEM

"There can be no sustainable development without peace and no peace without sustainable development" - United Nations 2030 Agenda for Sustainable Development

Crime is a major deterrent to a peaceful world.
Several organizations are taking steps to reduce crime and its effects.

BUT, WHAT IF THERE IS A FACTOR HIDDEN TO THESE ORGANIZATIONS?

"THE PROBLEM OF POOR POLICE-POPULATION RATIOS"


WORLD (POLICE:POPULATION) RATIOS

Number of police personnel per 100,000 people

United Nations recommendation = 222 police personnel per 100,000 people


THE PROBLEM STATEMENT

To facilitate effective distribution of police forces in a city among multiple districts, based on the extent to which each district is prone to crime at a given hour, on a given day, in a given month.

Low / Medium / High


PREDICTIVE MODELLING

A Machine Learning Based Approach


PREDICTIVE MODELLING IN A NUTSHELL

[Diagram: Historical Data trains the ML Model; New Data is fed to the model to obtain Predictions]

USING PREDICTIVE MODELLING FOR OUR WORK

[Diagram: Historical Data = crime records' data of past years; Predictions = "probable" crime hotspots and "probable" alarm rates]

THE RESEARCHMETHODOLOGY

How did we go about our idea ?


AN OVERVIEW OF THE METHODOLOGY

OSEMN - commonly used Data Science methodology
OSEM - our variant: Obtain, Scrub, Explore, Model
Linear process

OBTAIN DATA

City of Chicago Data Portal
Training sample: data from 2015 - 2019 (May)
Test sample: data from 2012 - 2014
Over 11 lakh (1.1 million) entries

SCRUB (PRE-PROCESS) DATA

Removing missing values: 98.55% of entries retained
Feature engineering: decomposing the "Date" feature
Data aggregation: grouping data to count crimes per time point
> 11,00,000 rows reduced to ~ 45,000 rows
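The three scrubbing steps can be illustrated with a minimal, standard-library sketch. The record layout, the timestamp format, and the grouping key (district, month, weekday, hour) are assumptions made for illustration; the slides do not show the actual schema or code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw records: (timestamp string, district).
# Field layout and timestamp format are assumed, not taken from the slides.
raw_records = [
    ("05/14/2018 11:30:00 PM", "011"),
    ("05/14/2018 11:45:00 PM", "011"),
    ("05/14/2018 09:10:00 AM", "007"),
    (None, "007"),                      # missing timestamp
    ("06/02/2018 11:05:00 PM", None),   # missing district
]

def scrub(records):
    """Drop incomplete rows, decompose the date, count crimes per time point."""
    # Step 1: remove rows with missing values.
    complete = [(ts, d) for ts, d in records if ts is not None and d is not None]

    # Steps 2 and 3: split the timestamp into parts and aggregate on them.
    counts = Counter()
    for ts, district in complete:
        when = datetime.strptime(ts, "%m/%d/%Y %I:%M:%S %p")
        key = (district, when.month, when.weekday(), when.hour)
        counts[key] += 1

    retained = len(complete) / len(records)  # fraction of rows kept
    return counts, retained

counts, retained = scrub(raw_records)
```

Each key of `counts` is one "time point" for one district, which is why aggregation shrinks the data so sharply: many raw crime rows collapse into a single count.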

EXPLORE DATA

Create a "Target" feature
Label based on "Normal Distribution" and IQR
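One plausible way to derive such a target label is to cut the crime counts at the first and third quartiles. The sketch below assumes that rule (below Q1 is "Low", above Q3 is "High", everything inside the IQR is "Medium"); the slides do not state the exact thresholds, so treat the cut-offs and the function name `label_alarm` as illustrative.

```python
from statistics import quantiles

def label_alarm(crime_counts):
    """Label each count Low / Medium / High using the quartiles (assumed rule)."""
    q1, _, q3 = quantiles(crime_counts, n=4)  # first, second, third quartile
    labels = []
    for count in crime_counts:
        if count < q1:
            labels.append("Low")
        elif count > q3:
            labels.append("High")
        else:
            labels.append("Medium")  # inside the interquartile range
    return labels

# Hypothetical crime counts per time point.
example_counts = [1, 2, 3, 4, 5, 6, 7, 8]
example_labels = label_alarm(example_counts)
```

A quartile-based rule puts roughly half of the time points in the middle class, so the resulting classes are imbalanced by construction.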

MODEL THE DATA

Supervised classification problem
Imbalanced dataset [28:51:21 ratio]
3 samples used
7 ML algorithms tested
Gradient Boosting Tree worked best

RESEARCH OUTCOME AND INNOVATION

How did our model perform? What was the novelty in our work?



MODEL EVALUATION METRICS

We have evaluated our models on the following metrics, which are common to most ML problems:

Accuracy
Precision
Recall
F1 Score
Unweighted Average Recall

But we also need a "PROBLEM-SPECIFIC" metric to improve the robustness of our work.
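Of the metrics listed, Unweighted Average Recall (UAR) is the least familiar: it is the plain mean of the per-class recalls, so every class counts equally regardless of its size. A minimal sketch, with hypothetical labels and predictions:

```python
def unweighted_average_recall(actual, predicted):
    """Mean of per-class recall; each class has equal weight."""
    recalls = []
    for cls in sorted(set(actual)):
        # Positions where this class is the true label.
        positions = [i for i, label in enumerate(actual) if label == cls]
        hits = sum(1 for i in positions if predicted[i] == cls)
        recalls.append(hits / len(positions))
    return sum(recalls) / len(recalls)

# Hypothetical labels, for illustration only.
actual    = ["Low", "Low", "Medium", "Medium", "Medium", "Medium", "High", "High"]
predicted = ["Low", "Medium", "Medium", "Medium", "Medium", "Low", "High", "Low"]

uar = unweighted_average_recall(actual, predicted)
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```

Here accuracy is 5/8 while UAR is lower, because the majority "Medium" class is predicted better than the two smaller classes. That gap is what makes UAR useful on an imbalanced dataset.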

A NOVEL METRIC FOR OUR PURPOSE

Region X

Actual Label    Predicted Label    Outcome
High Alarm      High Alarm         Best
High Alarm      Medium Alarm       Manageable
High Alarm      Low Alarm          Bad!

We must have a model that "minimizes" the "Bad!" scenario.
Problem-specific metric: percentage of misclassifications of "high alarm" regions as "low alarm" regions.
Let's call this metric "HL-mis" in the following slides.
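A minimal sketch of how HL-mis could be computed. It assumes the percentage is taken over all truly "high alarm" entries; the slides do not spell out the denominator, so that choice, like the function name `hl_mis`, is an assumption.

```python
def hl_mis(actual, predicted, high="High", low="Low"):
    """Percentage of truly high-alarm entries that were predicted as low-alarm.

    Assumption: the denominator is the number of actual high-alarm entries.
    """
    high_positions = [i for i, label in enumerate(actual) if label == high]
    if not high_positions:
        return 0.0  # no high-alarm entries, so nothing can be missed
    missed = sum(1 for i in high_positions if predicted[i] == low)
    return 100.0 * missed / len(high_positions)

# Hypothetical labels: four high-alarm entries, one of them predicted "Low".
actual    = ["High", "High", "High", "High", "Medium", "Low"]
predicted = ["High", "Medium", "Low", "High", "Medium", "Low"]

score = hl_mis(actual, predicted)
```

The "High" predicted as "Medium" does not count against the model here, mirroring the "Manageable" row of the table; only the "Bad!" case is penalized.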

KEY CONSIDERATIONS WHILE CHOOSING A MODEL

Key considerations while choosing a model:
High accuracy
High F1 score
Low HL-mis

Testing samples:
Sample 1: 25% of crimeDat (with class imbalance)
Sample 2: 25% of crimeDat (without class imbalance, achieved by oversampling)
Sample 3: all crime records from 2012-2014
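The slides do not say which oversampling technique produced Sample 2. The sketch below assumes the simplest one, random oversampling, which duplicates randomly chosen minority-class rows until every class matches the largest. The mapping of the 28:51:21 ratio to Low, Medium, and High is also an assumption.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw (with replacement) as many extra rows as this class is short.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical 100-row sample; which class has which share is assumed.
labels = ["Low"] * 28 + ["Medium"] * 51 + ["High"] * 21
rows = list(range(len(labels)))

balanced_rows, balanced_labels = random_oversample(rows, labels)
class_sizes = Counter(balanced_labels)
```

Oversampling is normally applied only to data the model trains on; evaluating on a balanced sample next to an imbalanced one, as with Samples 1 and 2, shows how sensitive each metric is to the class mix.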

MODEL COMPARISONS - TRADITIONAL METRICS


MODEL COMPARISONS - OUR METRIC


CONCLUSION

Wrapping it Up!


OUR CONTRIBUTIONS

Our work looks at predictive policing from the angle of "optimizing a limited police force" to control crime, even in cities with very high crime rates.
This work can also be applied at a state or county level and can serve as the foundation for more complex police force allocation mechanisms.
Our paper is based on the notion of using "Data science as a means of promoting social good".
The new problem-specific metric is an effective way to evaluate the robustness of a model that predicts the alarm rate of a region.


THANK YOU
