Deep Learning for Public Safety in Chicago and San Francisco

Presentation on Deep Learning for Public Safety using open data sets from the cities of San Francisco and Chicago.

Published in: Software

Deep Learning for Public Safety in Chicago and San Francisco from Sri Ambati

Transcript

1. DEEP LEARNING FOR PUBLIC SAFETY: FIGHTING CRIME WITH OPEN CITY DATA Alex Tellez, Michal Malohlava, and the H2O.ai team
2. OPEN CITIES Many major cities around the world provide easily accessible public data sets with years of historical data. Currently this data is underused.
3. CHICAGO
4. OPEN CRIME DATA Crime Dataset: Crimes from 2001 - Present Day, ~4.6 million crimes
5. THE WINDY CITY Harvest Chicago Weather data since 2001
6. SOCIOECONOMIC FACTORS Crimes are segmented by Community Area ID. Factors per area: percent of households below poverty, percent unemployed, etc.
7. SPARK + H2O Inputs: Crimes, Census, Weather. Pipeline: data munging, Spark SQL join, Deep Learning, evaluate models. GOAL: For a given crime, predict whether an arrest is more or less likely to be made!
8. LOAD DATA INTO H2O Weather Data: 5k rows. Census Data: 78 rows. Crime Data: ~4.5 Mn rows.
9. JOIN DATASETS Crime data + weather data + census data. Using Spark, we join the 3 datasets together to make one mega dataset!
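The join on this slide can be illustrated without Spark. The following is a minimal plain-Python sketch of the same idea, not the team's Spark SQL code. It assumes crimes join to weather on the date and to census on the Community Area ID. All column names and values (`date`, `community_area`, `mean_temp_f`, `pct_below_poverty`) are made up for illustration.

```python
# Hypothetical miniature versions of the three tables. Column names and
# values are assumptions for illustration, not the real schemas.
crimes = [
    {"date": "2015-01-05", "community_area": 32, "type": "THEFT", "arrest": False},
    {"date": "2015-01-05", "community_area": 8, "type": "GAMBLING", "arrest": True},
    {"date": "2015-01-06", "community_area": 32, "type": "THEFT", "arrest": True},
]
weather = [
    {"date": "2015-01-05", "mean_temp_f": 12.0},
    {"date": "2015-01-06", "mean_temp_f": 3.0},
]
census = [
    {"community_area": 8, "pct_below_poverty": 12.9},
    {"community_area": 32, "pct_below_poverty": 14.7},
]


def join_datasets(crimes, weather, census):
    """Inner-join crimes to weather on date and to census on community area."""
    weather_by_date = {row["date"]: row for row in weather}
    census_by_area = {row["community_area"]: row for row in census}
    joined = []
    for crime in crimes:
        w = weather_by_date.get(crime["date"])
        c = census_by_area.get(crime["community_area"])
        if w is None or c is None:
            continue  # inner join: drop crimes with no matching weather/census row
        joined.append({**crime, **w, **c})
    return joined


mega = join_datasets(crimes, weather, census)
```

Each row of `mega` then carries the crime fields plus the weather and census fields for that date and area, which is the shape a model needs as input.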
10. CHICAGO VISUALIZATIONS Arrest rate; season of crime; temperature during crime; community the crime is committed in
11. ARREST RATE BY TYPES OF CRIME
12. ARREST RATE VS % OF TOTAL CRIMES Axes: arrest rate, % of all crimes recorded. A large proportion of crimes are thefts. Unfortunately, there is a much lower arrest rate for thefts than for less prevalent crimes like gambling.
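The two quantities compared on this slide are simple to compute per crime type. A small sketch follows, assuming each record has a `type` field and a boolean `arrest` field (both names hypothetical). The sample counts are invented and are not the Chicago figures.

```python
from collections import Counter


def arrest_stats(records):
    """Return arrest rate and share of all crimes for each crime type."""
    totals = Counter(r["type"] for r in records)
    arrests = Counter(r["type"] for r in records if r["arrest"])
    n = len(records)
    return {
        crime_type: {
            "arrest_rate": arrests[crime_type] / count,
            "share_of_crimes": count / n,
        }
        for crime_type, count in totals.items()
    }


# Invented sample: thefts are common but rarely lead to arrest.
sample = (
    [{"type": "THEFT", "arrest": False}] * 3
    + [{"type": "THEFT", "arrest": True}]
    + [{"type": "GAMBLING", "arrest": True}]
)
stats = arrest_stats(sample)
```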
13. SPLIT DATA INTO TEST/TRAIN SETS Charts: training set arrest rate, test set arrest rate. Train the model on one segment (80% of the data); validate the model on the other segment (remaining 20%). ~40% of crimes lead to arrest.
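An 80/20 split like the one on this slide can be sketched as a seeded shuffle followed by a cut. The slides do not say how the split was drawn, so the random shuffle and the `seed` value here are assumptions.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle rows reproducibly and cut them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = train_test_split(range(100))
```

Comparing the arrest rate in `train` and `test` afterwards, as the slide does, is a quick check that the split did not skew the target.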
14. DEEP LEARNING Problem: For a given crime, is an arrest more / less likely? Deep Learning: A multi-layer feed-forward neural network that starts with an input layer (crime + weather data) followed by multiple layers of non-linear transformations.
15. DEEP LEARNING MODEL Deep Neural Network with 2 layers of non-linear transformations. Binomial prediction: Is an arrest made? Yes/No. AUC on Training Data ~ 0.91! ~ 3.5 Million Crimes
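To make "input layer followed by multiple layers of non-linear transformations" concrete, here is a minimal forward pass in plain Python. It is not H2O's implementation. The `tanh` hidden activation, the sigmoid output unit, and every weight below are assumptions chosen for illustration, and training is omitted entirely.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b), one output per weight row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_arrest_probability(features, layers):
    """Apply tanh hidden layers in order, then a single sigmoid output unit."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = dense(activations, weights, biases, math.tanh)
    out_weights, out_biases = layers[-1]
    return dense(activations, out_weights, out_biases, sigmoid)[0]


# Made-up weights: 3 inputs -> 2 hidden units -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.2, -0.7]], [0.05]),
]
p = predict_arrest_probability([0.2, 1.0, 0.0], layers)
```

The output `p` is a number between 0 and 1 that can be read as the predicted probability of an arrest for that feature vector.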
16. HOW’D WE DO? Train AUC ~ 0.91 Test AUC ~ 0.91
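For readers unfamiliar with the metric: AUC is the probability that a randomly chosen arrest case receives a higher score than a randomly chosen non-arrest case. The sketch below computes it directly from that definition. It is quadratic in the number of rows, so it is meant for illustration only and would not scale to millions of crimes.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Train and test values that match, as on this slide, suggest the model is not overfitting the training set.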
17. GEO-MAPPED PREDICTIONS Because each of the crimes reported comes with latitude-longitude coordinates, we scored our hold-out data using the trained model and plotted the predictions on a map of Chicago, specifically the Downtown district.
18. SAN FRANCISCO
19. OPEN CITY, OPEN DATA Crime Dataset: SFPD Incidents from 1/1/2003 - Present, ~1 Million Crimes
20. WEATHER ANYONE? Harvest weather data from 1/1/2003
21. DATA INGESTION Weather Data: Temp, Visibility, Precipitation, Cloud Cover. Crime Data: Category, Description, Weekend, Arrest, etc.
22. SF VISUALIZATIONS Most common crimes? When is crime happening most? …midnight, noon, 6 PM
23. DEEP LEARNING MODEL Deep Neural Network with 3 layers of non-linear transformations. Total run time: 6 mins. 42 sec. AUC ~ 0.95 on Training Data
24. VALIDATION TEST Model ‘trained’ on 80% of data, validated against remaining 20%. AUC = 0.95 on validation data
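Time-based features such as the Weekend flag and the hour of the incident can be derived from a timestamp. A small sketch follows. The timestamp format and the sample incidents are assumptions, not the SFPD schema.

```python
from collections import Counter
from datetime import datetime


def time_features(timestamp):
    """Extract hour of day and a weekend flag from a 'YYYY-MM-DD HH:MM' string."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    return {"hour": dt.hour, "weekend": dt.weekday() >= 5}  # 5 = Saturday, 6 = Sunday


# Invented incident times for illustration.
incidents = ["2015-04-25 00:05", "2015-04-25 12:00", "2015-04-27 18:30", "2015-04-27 18:45"]
features = [time_features(t) for t in incidents]
crimes_per_hour = Counter(f["hour"] for f in features)
busiest_hour, busiest_count = crimes_per_hour.most_common(1)[0]
```

Counting incidents per hour in this way is what reveals peaks like the midnight, noon, and 6 PM spikes mentioned on the slide.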
25. WHAT’S NEXT? Can deploy each model in real time to increase public safety and help police departments. Map of Model Accuracy: for each point on the map (place of crime) we can use different colors based on the model prediction (0.999 = green, arrest likely, vs. 0.67 = orange). Run predictions for specific subsets of the data (e.g. the most dangerous area). We plan on doing all of the above! Ensemble: average models by running prediction models for Chicago + San Francisco, which may increase accuracy further?
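The color-coded map idea could start from a function that maps a predicted arrest probability to a marker color. The slide only gives two examples (0.999 = green, 0.67 = orange). The cut-offs below and the "red" bucket for low probabilities are assumptions.

```python
def prediction_color(p_arrest, green_at=0.9, orange_at=0.5):
    """Map a predicted arrest probability to a map marker color."""
    if not 0.0 <= p_arrest <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_arrest >= green_at:
        return "green"  # arrest likely
    if p_arrest >= orange_at:
        return "orange"  # arrest somewhat likely
    return "red"  # assumed bucket: arrest unlikely


colors = [prediction_color(p) for p in (0.999, 0.67, 0.2)]
```

Each scored crime's latitude-longitude pair would then be plotted with its color using whatever mapping tool is at hand.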

Top of the world

Monday, April 27, 2015
