Crime Doesn't Climb

Animated GIF of crime at decreasing elevation in San Francisco

Among San Francisco's diverse neighborhoods and varied micro-climates, we've heard the phrase "Crime Doesn't Climb," meaning that the city's loftier areas are often associated with less crime. San Francisco, sometimes referred to as the "homeless capital of the United States," ranks in the bottom 10% of U.S. cities for safety (New York City is nearly three times safer). Although certain neighborhoods (e.g. the Tenderloin) have particularly high crime rates, we wondered whether more granular data could answer the question: does crime climb?

We used the DataSF repository, part of the Mayor's 2.0 City initiative to tap into local innovation. The site includes a CSV file of all SFPD incidents reported since 2003. Of the 83,991 incidents in 2013 (we pulled the data on 9/7 at 2:00pm), we selected 15,000 from January 1 to February 25. These crimes ranged from 7,691 "Grand Theft from Locked Auto" incidents to 610 cases of drunkenness. Each incident is tagged with latitude/longitude coordinates, which we converted to elevation in meters with the Google Elevation API. Here's the script that reads the incident CSV file:

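// Reads the SFPD incident CSV, looks up each incident's elevation in batches,
// and writes the rows back out with an elevation column appended.
// Written against the older chaining API of the `csv` package
// (csv().from.path(...).to.array(...)), which newer versions replaced.
var fs = require('fs');
var csv = require('csv');
var _ = require('underscore');
var request = require('request');
var async = require('async');

var batchSize = 50; // locations per Elevation API request

csv()
  .from.path(__dirname + '/sfpd_incident_2013-partial.csv', { delimiter: ',', escape: '"' })
  .to.array(function(data) {
    // data[0] is the header row; give the new column a name.
    data[0].push('Elevation');

    var elevationFunctions = _.chain(data.slice(1))
      .groupBy(function(row, index) {
        return Math.floor(index / batchSize);
      })
      .map(function(dataGroup, groupIndex) {
        // [latitude, longitude]: column 10 is Y (latitude), column 9 is X (longitude).
        var locations = _.map(dataGroup, function(row) {
          return [row[10], row[9]];
        });

        return function(callback) {
          callElevationApiWithLocations(locations, function(elevations) {
            // groupBy keys are strings ("0", "1", ...); +1 skips the header row.
            var startIndex = Number(groupIndex) * batchSize + 1;
            for (var i = 0; i < elevations.length; i++) {
              data[startIndex + i].push(elevations[i]);
            }
            setTimeout(callback, 200); // pause between requests to respect rate limits
          });
        };
      })
      .value();

    // Run the batches one at a time, then write the augmented rows.
    async.series(elevationFunctions, function(err) {
      csv()
        .from.array(data)
        .to.stream(fs.createWriteStream(__dirname + '/incidentsWithElevation.csv'));
    });
  });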
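The `groupBy` call above batches rows by `Math.floor(index / batchSize)`. It works, but it yields an object keyed by strings, so the row offset for each batch is easy to get wrong. As a minimal sketch, a plain chunking helper makes the batch boundaries explicit; `chunk` is a hypothetical name, not part of the original scripts or of any specific library.

```javascript
// Hypothetical helper (not part of the original scripts): split an array into
// consecutive batches of `size` items, preserving order.
function chunk(items, size) {
  if (!(size > 0)) {
    throw new RangeError('size must be a positive number');
  }
  var batches = [];
  for (var start = 0; start < items.length; start += size) {
    batches.push(items.slice(start, start + size));
  }
  return batches;
}

// 120 incident rows (header already removed) with batchSize = 50.
var rows = [];
for (var n = 0; n < 120; n++) {
  rows.push(n);
}
var batches = chunk(rows, 50);
console.log(batches.map(function (batch) { return batch.length; })); // [ 50, 50, 20 ]

// Batch k begins at incident k * 50, which is CSV row k * 50 + 1
// once the header row is counted.
console.log(batches[2][0]); // 100
```

The last batch is simply shorter, so no padding or special casing is needed when the row count is not a multiple of the batch size.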

Here's the code snippet that queries the Google Elevation API (careful: Google rate-limits aggressively):

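// Looks up elevations (in meters) for an array of [lat, lng] pairs in one
// Google Elevation API request and calls back with them in the same order.
// The current API requires HTTPS and an API key; reading the key from a
// GOOGLE_API_KEY environment variable is an assumption of this version.
var callElevationApiWithLocations = function(locations, callback) {
  var baseUrl = 'https://maps.googleapis.com/maps/api/elevation/json';

  // "lat,lng|lat,lng|..." as expected by the `locations` parameter.
  var locationsParam = _.map(locations, function(loc) {
    return loc[0] + ',' + loc[1];
  }).join('|');

  request.get({
    url: baseUrl,
    qs: { locations: locationsParam, key: process.env.GOOGLE_API_KEY },
    json: true
  }, function(error, response, body) {
    if (!error && response.statusCode === 200 && body && body.status === 'OK') {
      var elevations = _.map(body.results, function(result) {
        return result.elevation;
      });
      callback(elevations);
    } else {
      // Always call back (e.g. on OVER_QUERY_LIMIT) so async.series never
      // waits forever; rows in a failed batch simply get no elevation value.
      console.error('Elevation request failed:',
        error || (body && body.status) || response.statusCode);
      callback([]);
    }
  });
};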
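The `locations` value is a pipe-separated list of `lat,lng` pairs. Here is a sketch of that formatting step using `map` and `join`, which avoids building a leading `|` and trimming it off with `substring(1)`; `buildLocationsParam` is a hypothetical helper name.

```javascript
// Hypothetical helper: format [latitude, longitude] pairs as the
// "lat,lng|lat,lng|..." string that the Elevation API's `locations`
// parameter expects.
function buildLocationsParam(locations) {
  return locations
    .map(function (loc) { return loc[0] + ',' + loc[1]; })
    .join('|');
}

var param = buildLocationsParam([[37.7749, -122.4194], [37.8024, -122.4058]]);
console.log(param); // 37.7749,-122.4194|37.8024,-122.4058

// Percent-encoding the value is a conservative choice when building the URL by hand.
console.log(encodeURIComponent(param)); // 37.7749%2C-122.4194%7C37.8024%2C-122.4058
```

Coordinates read from the CSV arrive as strings, which concatenate the same way, so the helper works on raw CSV rows too.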

CartoDB lets you quickly slice and dice geospatial data and display it on a map. We grouped the incidents into 25-meter elevation intervals (see the charts folder for more details) and found that crime levels drop off sharply at higher elevations:

San Francisco crime per elevation level
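To reproduce per-interval counts without CartoDB, here is a minimal sketch, assuming the elevations are taken from the last column of `incidentsWithElevation.csv` (possibly as strings) and that buckets are 25 m wide. `countByElevationBucket` is a hypothetical name, not CartoDB's implementation.

```javascript
// Sketch: count incidents per elevation bucket. Bucket keys are the lower
// bound in meters, so with width 25 the key 50 covers 50 m up to (but not
// including) 75 m.
function countByElevationBucket(elevations, width) {
  var counts = {};
  elevations.forEach(function (elevation) {
    // Values read back from a CSV are strings; skip blanks and non-numbers.
    if (elevation === null || elevation === undefined || elevation === '') return;
    var meters = Number(elevation);
    if (!isFinite(meters)) return;
    var bucket = Math.floor(meters / width) * width;
    counts[bucket] = (counts[bucket] || 0) + 1;
  });
  return counts;
}

var counts = countByElevationBucket([3.2, 12.9, '24.99', 25, 61.5, '', 130], 25);
console.log(counts);
```

Using `Math.floor` means points slightly below sea level land in a `-25` bucket rather than being merged into the `0` bucket.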

One possible flaw in this analysis is that the drop could simply be a byproduct of there being less land in the city at higher altitudes. In other words, if 90% of the city lies below 25m, then, assuming incidents are evenly distributed, 90% of the crime would occur at those lower altitudes. To correct for this, we took an evenly spaced sample of 10,000 locations in San Francisco and divided the number of incidents in each elevation range by the number of sample locations in that bucket. We manually selected latitude and longitude bounds to define San Francisco as a grid and created a CSV of latitude, longitude, elevation triplets:

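// Builds a 100 x 100 grid of sample points across San Francisco, looks up each
// point's elevation, and writes latitude, longitude, elevation rows to a CSV.
// Reuses the requires, batchSize, and callElevationApiWithLocations from above.
var initialLat = 37.735121;
var initialLong = -122.469749;
var finalLat = 37.804596;
var finalLong = -122.405891;

var steps = 100;
var latStep = (finalLat - initialLat) / steps;
var longStep = (finalLong - initialLong) / steps;
// Generating points from integer indices guarantees exactly `steps` values per
// axis; _.range with a fractional step can yield an extra one from rounding.
var latArray = _.map(_.range(steps), function(i) { return initialLat + i * latStep; });
var longArray = _.map(_.range(steps), function(j) { return initialLong + j * longStep; });
var data = [];
_.each(latArray, function(latValue) {
  _.each(longArray, function(longValue) {
    data.push([latValue, longValue]);
  });
});

var elevationFunctions = _.chain(data)
  .groupBy(function(row, index) {
    return Math.floor(index / batchSize);
  })
  .map(function(locations, groupIndex) {
    // Each row is already a [lat, lng] pair, so a batch can be sent as-is.
    return function(callback) {
      callElevationApiWithLocations(locations, function(elevations) {
        // There is no header row here, so batch k starts at data[k * batchSize].
        var startIndex = Number(groupIndex) * batchSize;
        for (var i = 0; i < elevations.length; i++) {
          data[startIndex + i].push(elevations[i]);
        }
        setTimeout(callback, 200); // pause between requests
      });
    };
  })
  .value();

async.series(elevationFunctions, function(err) {
  csv()
    .from.array(data)
    .to.stream(fs.createWriteStream(__dirname + '/elevationSample.csv'));
});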
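The division step itself isn't shown in the scripts. A minimal sketch, assuming the incident counts and the grid sample counts were bucketed with the same 25 m width and keyed by each bucket's lower bound; `normalizeByLandArea` and the numbers in the example are hypothetical, for illustration only.

```javascript
// Sketch: divide incidents in each elevation bucket by the number of grid
// sample points that fell in the same bucket (a proxy for land area).
// Both inputs map a bucket's lower bound to a count.
function normalizeByLandArea(incidentCounts, sampleCounts) {
  var rates = {};
  Object.keys(incidentCounts).forEach(function (bucket) {
    var samples = sampleCounts[bucket] || 0;
    // With no sample points the bucket has no measurable area in the grid,
    // so return null instead of dividing by zero.
    rates[bucket] = samples > 0 ? incidentCounts[bucket] / samples : null;
  });
  return rates;
}

// Made-up counts, for illustration only.
var rates = normalizeByLandArea(
  { 0: 900, 25: 300, 50: 40, 300: 2 },
  { 0: 4500, 25: 3000, 50: 2000 }
);
console.log(rates);
```

Returning `null` for buckets with no sample points keeps them visible instead of silently producing `Infinity`.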

After normalizing for land area at each elevation, we found that the trend of lower crime at higher elevations was just as drastic:

San Francisco crime per elevation level per land area

Data Update: Census Comparison

Now included in the data directory is crime data summarized by 2010 census block. This dataset includes additional attributes: 2010 total population from the US Census, block area, mean elevation (derived directly from the USGS National Elevation Dataset at 1/9 arc-second resolution), and several derived values, including population density and crime density. Also included is a "Crime Index," calculated at the census-block level as the area-normalized crime count divided by the area-normalized population.
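Taking the Crime Index definition literally, here is a sketch for a single census block. The field names `crimes`, `population`, and `area` are hypothetical stand-ins for whatever the dataset's columns are called. Because both densities share the same block area, the area cancels, so the index works out to crimes per resident; blocks with zero population (parks, for instance) have no defined index.

```javascript
// Sketch of the per-block Crime Index described above. The field names
// (crimes, population, area) are hypothetical, not the dataset's columns.
function crimeIndex(block) {
  // Undefined for blocks with no residents or no area.
  if (!(block.area > 0) || !(block.population > 0)) return null;
  var crimeDensity = block.crimes / block.area;          // crimes per unit area
  var populationDensity = block.population / block.area; // residents per unit area
  return crimeDensity / populationDensity;
}

// The area appears in both densities and cancels, so the index
// equals crimes per resident: (12 / 0.5) / (400 / 0.5) = 12 / 400.
var index = crimeIndex({ crimes: 12, population: 400, area: 0.5 });
console.log(index); // 0.03
```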

San Francisco crime index vs. elevation

An interactive comparison of the crime index with elevation can be found here.


