Deep Learning for Public Safety in Chicago and San Francisco
Published in: Software
Transcript
- 1. DEEP LEARNING FOR PUBLIC SAFETY: FIGHTING CRIME WITH OPEN CITY DATA Alex Tellez, Michal Malohlava, and H2O.ai team
- 2. OPEN CITIES Many major cities around the world provide easily accessible public data sets with years of historical data. Currently this data is underused.
- 3. CHICAGO
- 4. OPEN CRIME DATA Crime Dataset: Crimes from 2001 - Present Day, ~4.6 million crimes
- 5. THE WINDY CITY Harvest Chicago Weather data since 2001
- 6. SOCIOECONOMIC FACTORS Crimes segmented into Community Area IDs. Percent of households below poverty, unemployed, etc.
- 7. SPARK + H2O Pipeline: Crimes, Census, and Weather data; data munging; Spark SQL join; Deep Learning; evaluate models. GOAL: For a given crime, predict whether an arrest is more or less likely to be made!
- 8. LOAD DATA INTO H2O Weather Data: 5k rows. Census Data: 78 rows. Crime Data: ~4.5 Mn rows.
- 9. JOIN DATASETS Crime data + weather data + census data. Using Spark, we join the 3 datasets together to make one mega dataset!
- 10. CHICAGO VISUALIZATIONS Arrest rate; season of crime; temperature during crime; community the crime is committed in.
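The three-way join on this slide can be sketched in plain Python. This is only an illustration, not the Spark SQL code used in the talk: the join keys (date for weather, community area ID for census), every field name, and all toy values below are assumptions.

```python
# Toy rows; every field name and value here is hypothetical.
crimes = [
    {"id": 1, "date": "2015-01-01", "community_area": 32, "type": "THEFT", "arrest": False},
    {"id": 2, "date": "2015-01-01", "community_area": 8, "type": "GAMBLING", "arrest": True},
    {"id": 3, "date": "2015-01-02", "community_area": 32, "type": "THEFT", "arrest": False},
]
weather = [
    {"date": "2015-01-01", "mean_temp": 25.0},
    {"date": "2015-01-02", "mean_temp": 31.0},
]
census = [
    {"community_area": 8, "pct_below_poverty": 12.9},
    {"community_area": 32, "pct_below_poverty": 14.7},
]


def join_datasets(crimes, weather, census):
    """Inner-join crimes to weather (assumed key: date) and census (assumed key: community area)."""
    weather_by_date = {row["date"]: row for row in weather}
    census_by_area = {row["community_area"]: row for row in census}
    joined = []
    for crime in crimes:
        w = weather_by_date.get(crime["date"])
        c = census_by_area.get(crime["community_area"])
        if w is None or c is None:
            continue  # inner join: drop crimes with no matching weather or census row
        joined.append({**crime, **w, **c})
    return joined


mega = join_datasets(crimes, weather, census)
```

Each row of `mega` carries the crime fields plus the matching weather and census columns, which is the shape a model would train on.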
- 11. ARREST RATE BY TYPES OF CRIME
- 12. ARREST RATE VS % OF TOTAL CRIMES (Plot: arrest rate vs. % of all crimes recorded.) A large proportion of crimes are thefts. Unfortunately, there is a much lower arrest rate for thefts than for less prevalent crimes like gambling.
- 13. SPLIT DATA INTO TEST/TRAIN SETS Training set: train the model on this segment, 80% of the data. Test set: validate the model on this segment (the remaining 20%). (Charts: training set arrest rate, test set arrest rate.) ~40% of crimes lead to arrest.
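The two quantities compared on this slide, each crime type's share of all crimes and its arrest rate, could be computed along these lines. The field names and the toy records are made up for illustration and do not reflect the real Chicago figures.

```python
from collections import Counter


def summarize_by_type(crimes):
    """Return share of all crimes and arrest rate for each crime type."""
    totals = Counter(c["type"] for c in crimes)
    arrests = Counter(c["type"] for c in crimes if c["arrest"])
    n = len(crimes)
    return {
        crime_type: {
            "share_of_crimes": totals[crime_type] / n,
            "arrest_rate": arrests[crime_type] / totals[crime_type],
        }
        for crime_type in totals
    }


# Hypothetical toy data: many thefts with few arrests, few gambling cases with many arrests.
toy_crimes = (
    [{"type": "THEFT", "arrest": i == 0} for i in range(8)]
    + [{"type": "GAMBLING", "arrest": True} for _ in range(2)]
)
summary = summarize_by_type(toy_crimes)
```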
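A minimal sketch of the 80/20 split, assuming a simple seeded random shuffle (the slides do not say how rows were assigned to each segment). The toy rows are synthetic and built so that 40% are arrests, mirroring the rate quoted on the slide.

```python
import random


def split_train_test(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows with a fixed seed, then cut at train_fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def arrest_rate(rows):
    """Fraction of rows where an arrest was made."""
    return sum(1 for r in rows if r["arrest"]) / len(rows)


# Synthetic rows (hypothetical): 2 of every 5 are arrests, i.e. 40%.
rows = [{"id": i, "arrest": i % 5 < 2} for i in range(1000)]
train, test = split_train_test(rows)
```

Comparing `arrest_rate(train)` with `arrest_rate(test)` is a quick check that both segments have a similar class balance, which is what the two charts on the slide show.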
- 14. DEEP LEARNING Problem: For a given crime, is an arrest more / less likely? Deep Learning: A multi-layer feed-forward neural network that starts w/ an input layer (crime + weather data) followed by multiple layers of non-linear transformations
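To make the "input layer followed by non-linear transformations" description concrete, here is a toy forward pass in plain Python. It is not the H2O model from the talk: the tanh activation, the sigmoid output, the layer sizes, and all weights are assumptions chosen for illustration.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias for each output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(features, layers):
    """Apply tanh hidden layers, then a single sigmoid output unit.

    `layers` is a list of (weights, biases) pairs; the last pair is the output layer.
    Returns a number in (0, 1), read here as the probability of an arrest.
    """
    activations = features
    for weights, biases in layers[:-1]:
        activations = [math.tanh(z) for z in dense(activations, weights, biases)]
    out_weights, out_biases = layers[-1]
    (logit,) = dense(activations, out_weights, out_biases)
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical network: 3 inputs -> 2 hidden units -> 2 hidden units -> 1 output.
example_layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[0.7, -0.3]], [0.05]),
]
probability = forward([0.2, 1.0, -0.4], example_layers)
```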
- 15. DEEP LEARNING MODEL Deep Neural Network w/ 2 layers of non-linear transformations. Binomial prediction: Is an arrest made? Yes/No. AUC on Training Data ~ 0.91! ~3.5 Million Crimes.
- 16. HOW’D WE DO? Train AUC ~ 0.91 Test AUC ~ 0.91
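For readers unfamiliar with the metric, AUC can be read as the probability that a randomly chosen arrest case gets a higher score than a randomly chosen non-arrest case. The sketch below computes it by direct pairwise comparison. It is meant for small examples only, is not how H2O computes the figure, and uses made-up labels and scores.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y]
    negatives = [s for y, s in zip(labels, scores) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = arrest) and model scores.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.1]
toy_auc = auc(toy_labels, toy_scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```

Similar train and test AUC values, as reported on the slide, suggest the model is not badly overfitting the training segment.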
- 17. GEO-MAPPED PREDICTIONS Because each of the crimes reported comes with latitude-longitude coordinates, we scored our hold out data using the trained model and plotted the predictions on a map of Chicago - specifically, the Downtown district.
- 18. SAN FRANCISCO
- 19. OPEN CITY, OPEN DATA Crime Dataset: SFPD Incidents from 1/1/2003 - Present ~1 Million Crimes
- 20. WEATHER ANYONE? Harvest weather data from 1/1/2003
- 21. DATA INGESTION Weather Data: Temp, Visibility, Precipitation, Cloud Cover. Crime Data: Category, Description, Weekend, Arrest, etc.
- 22. SF VISUALIZATIONS Most common crimes? When is crime happening most? …midnight, noon, 6 PM
- 23. DEEP LEARNING MODEL Deep Neural Network w/ 3 layers of non-linear transformations. Total Run Time: 6 mins. 42 sec. AUC ~ 0.95 on Training Data.
- 24. VALIDATION TEST Model ‘trained’ on 80% of data, validated against remaining 20%. AUC = 0.95 on validation data.
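A "Weekend" flag and a crimes-per-hour count like the ones behind these charts might be derived from incident timestamps as follows. The timestamp format and the sample incidents are assumptions; the slides do not show the raw SFPD schema.

```python
from collections import Counter
from datetime import datetime


def time_features(timestamp):
    """Parse a 'YYYY-MM-DD HH:MM' string (assumed format) into hour and weekend flag."""
    dt = datetime.strptime(timestamp, "%Y-%m-%d %H:%M")
    return {"hour": dt.hour, "weekend": dt.weekday() >= 5}  # Saturday=5, Sunday=6


def incidents_per_hour(timestamps):
    """Count incidents by hour of day."""
    return Counter(time_features(t)["hour"] for t in timestamps)


# Hypothetical incident timestamps.
toy_timestamps = [
    "2015-01-03 00:15",  # a Saturday, just after midnight
    "2015-01-03 00:40",
    "2015-01-05 12:00",  # a Monday at noon
    "2015-01-05 18:05",
]
hourly = incidents_per_hour(toy_timestamps)
```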
- 25. WHAT’S NEXT? Can deploy each model in real time to increase public safety and help police departments. Map of Model Accuracy: for each point on the map (place of crime) we can have different colors based on model prediction (0.999 = green, arrest likely vs. 0.67 = orange). Run prediction for specific subsets of the data (e.g. the most dangerous area). We plan on doing all of the above! Ensemble: model averaging by running prediction models for Chicago + San Francisco, which may increase accuracy further?
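Two of these ideas, coloring map points by prediction and averaging two models, can be sketched in a few lines. The cutoffs (0.9 and 0.5) and the "red" bucket are assumptions picked so that 0.999 maps to green and 0.67 to orange as on the slide. The averaging function also assumes both models can score the same records, which the slides do not establish.

```python
def color_for_score(score, green_cutoff=0.9, orange_cutoff=0.5):
    """Map a predicted arrest probability to a map color (cutoffs are hypothetical)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability between 0 and 1")
    if score >= green_cutoff:
        return "green"   # arrest likely
    if score >= orange_cutoff:
        return "orange"  # less certain
    return "red"         # assumed third bucket, not mentioned in the slides


def ensemble_average(scores_a, scores_b):
    """Average two models' predictions record by record."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same records")
    return [(a + b) / 2.0 for a, b in zip(scores_a, scores_b)]


# Hypothetical scores from two models on the same three records.
averaged = ensemble_average([0.5, 1.0, 0.25], [0.25, 0.5, 0.25])
colors = [color_for_score(s) for s in averaged]
```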