Chapter 2 Data sources

We acquired our data through the NYC Open Data Website, which hosts free public data published by New York City agencies and other partners. Specifically, we obtained data sets published by the Police Department (NYPD). The site provides the data both through an API and as .csv downloads.

The primary data sources of this project are NYPD Complaint Data Historical and NYPD Arrests Data (Historic).

We also examine missing values in several other data sets, including Motor Vehicle Collisions and Crashes, Arrest Data (YTD), Complaint Data (YTD), Criminal Court Summons (historical), Criminal Court Summons (YTD), Hate Crimes, Shooting Incident Data (historical), and Shooting Incident Data (YTD).
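As a rough illustration of what a missing-value check on one of these .csv files could look like, the sketch below counts missing entries per column using only the Python standard library. The function name `count_missing` and the set of tokens treated as missing are assumptions made for this example, not the procedure used in this project; the sample rows are made up.

```python
import csv
import io
from collections import Counter

# Assumption: blank cells and these placeholder strings count as missing.
MISSING_TOKENS = {"", "NA", "(null)"}


def count_missing(csv_file, missing_tokens=MISSING_TOKENS):
    """Return {column name: number of missing values} for an open CSV file."""
    reader = csv.DictReader(csv_file)
    counts = Counter({name: 0 for name in reader.fieldnames or []})
    for row in reader:
        for name, value in row.items():
            if name is None:
                # Extra cells beyond the header are ignored.
                continue
            # DictReader fills short rows with None.
            if value is None or value.strip() in missing_tokens:
                counts[name] += 1
    return dict(counts)


# Made-up sample with the two complaint columns described in Section 2.1.
sample = io.StringIO(
    "RPT_DT,OFNS_DESC\n"
    "01/01/19,ROBBERY\n"
    ",\n"
    "01/02/19,(null)\n"
)
missing_per_column = count_missing(sample)
```

With a real download, the same function could be called on `open(path, newline="")`, where `path` points to the .csv file.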

These data sets are available from the original provider, the NYC Open Data Website, or from our Google Drive collection.

2.1 NYPD Complaints Dataset

The NYPD Complaint Data Historical data set includes all valid felony, misdemeanor, and violation crimes reported to the New York City Police Department (NYPD) from 2006 through the end of 2019, the last full year at the time of writing. For additional details, see the data dictionary attached in the ‘About’ section. (Source)

Table 2.1: Major columns used in NYPD Complaints
Variable     Type / format     Description
RPT_DT       Date, %m/%d/%y    Date the event was reported to police
OFNS_DESC    Text              Description of the internal classification corresponding to the KY code (more general category than the PD description)
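To make the date format in Table 2.1 concrete, here is a minimal sketch of parsing an RPT_DT string with Python's standard library. The helper name `parse_rpt_dt` is hypothetical, and the format string is taken from the table as written; if the raw file stores four-digit years, `%Y` would be needed in place of `%y`.

```python
from datetime import date, datetime
from typing import Optional

# Format string as listed in Table 2.1 (two-digit year).
RPT_DT_FORMAT = "%m/%d/%y"


def parse_rpt_dt(value: str) -> Optional[date]:
    """Parse an RPT_DT string; return None for blank or malformed values."""
    value = value.strip()
    if not value:
        return None
    try:
        return datetime.strptime(value, RPT_DT_FORMAT).date()
    except ValueError:
        # The value does not match the expected format.
        return None
```

Returning `None` rather than raising keeps malformed dates visible as missing values, which fits the missing-value checks described in this chapter.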

2.2 NYPD Arrest Dataset

NYPD Arrests Data (Historic) lists every arrest effected in NYC by the NYPD from 2006 through the end of the previous calendar year. The data is manually extracted every quarter and reviewed by the Office of Management Analysis and Planning before being posted on the NYPD website. Each record represents one arrest and includes the type of crime and the location and time of enforcement, along with information on suspect demographics. (Source)

Table 2.2: Major columns used in NYPD Arrest
Variable       Type / format     Description
LAW_CAT_CD     Factor            Level of offense: felony, misdemeanor, violation
ARREST_DATE    Date, %m/%d/%y    Exact date of arrest for the reported event
ARREST_BORO    Factor            Borough of arrest: B (Bronx), S (Staten Island), K (Brooklyn), M (Manhattan), Q (Queens)
Latitude       Text              Latitude coordinate for Global Coordinate System, WGS 1984, decimal degrees (EPSG 4326)
Longitude      Text              Longitude coordinate for Global Coordinate System, WGS 1984, decimal degrees (EPSG 4326)
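The ARREST_BORO codes and the text-typed coordinates in Table 2.2 usually need decoding before analysis. The sketch below shows one possible way to do this in Python; the names `BOROUGH_NAMES`, `decode_borough`, and `parse_coordinates` are hypothetical helpers for illustration, and the range check is an assumption about what counts as a valid WGS 1984 coordinate rather than a rule from the data dictionary.

```python
from typing import Optional, Tuple

# Borough codes as listed in Table 2.2.
BOROUGH_NAMES = {
    "B": "Bronx",
    "S": "Staten Island",
    "K": "Brooklyn",
    "M": "Manhattan",
    "Q": "Queens",
}


def decode_borough(code: str) -> Optional[str]:
    """Map an ARREST_BORO code to its borough name; None if unknown."""
    return BOROUGH_NAMES.get(code.strip().upper())


def parse_coordinates(lat_text: str, lon_text: str) -> Optional[Tuple[float, float]]:
    """Convert Latitude/Longitude text to floats in decimal degrees."""
    try:
        lat = float(lat_text)
        lon = float(lon_text)
    except (TypeError, ValueError):
        return None
    # Assumption: reject values outside the valid degree ranges (also drops NaN).
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return lat, lon
```

For example, `decode_borough("K")` gives `"Brooklyn"`, and a blank latitude or longitude yields `None` so the record can be counted as missing location data.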

2.3 Other Data Sets

For more details on the data sets mentioned above, other than Complaints and Arrests, see Chapter 04-Cleaning.