finding-hotspots

Separating dependent and independent events

Additional comment 2010-04-16: separating non-random from random events is the core of analytical statistics; this is simply an application of that to events spread out in time and space.

Problem: In a set of events for which we have coordinate and time data, some events are concentrated in time, in space, or in both, in a way that makes it very unlikely that they are independent. Instead, they are likely to be influenced by one or more common latent factors. Depending on which dimension the concentration is centered on (time, space, or both), different types of latent factors are likely to be involved.

The events in this case are school fires classified as arson. The data come from a register of emergency call-outs (utryckningar) covering a period of almost 10 years (3593 days). Schools occupy discrete places in space, but the patterning effect that arises from this condition is not of interest here. Thus only coordinates that are the location of a school are included, and all coordinates that host the same school are treated as one (1) place.

The problem to be solved here is how to single out the arsons that can be meaningfully related to other arsons in a series or regular pattern. A basic question of interest for preventive efforts is how large a proportion of the arsons is related to other arsons and how large a proportion is randomly placed.

Type of concentration | Object to calculate the base rate for | Unit of the base rate
time and space | a place | events per time
space | an area | events per place in that area
time | a year, month, week, day, or hour | events per time unit

Time and space concentration

For a school, let us call it the test school, with 3 arsons over the period of study (3593 days), the base rate is 3 events / 3593 days. To decide whether two or more of the three arsons at the test school can meaningfully be related to each other, I sampled a large number of events-over-period outcomes (40,000 for each number of events), calculated the distances between the events, and used 95% as the cut-off limit.
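The sampling step can be sketched roughly as follows. This is only an illustration, not the code used for the analysis: the text does not say exactly how the distances were computed, so the sketch assumes that events are placed uniformly at random over the period and that "distance" means the gap between consecutive events at the same place. The names `PERIOD_DAYS` and `gap_threshold`, and the fixed seed, are hypothetical.

```python
import random

PERIOD_DAYS = 3593  # length of the study period, taken from the text


def gap_threshold(n_events, n_samples=40_000, quantile=0.05, seed=0):
    """Estimate the gap (in days) that marks off the shortest `quantile`
    share of gaps between consecutive, randomly placed events.

    Assumption: events are independent and uniform over the period, and
    gaps from all simulated outcomes are pooled before taking the quantile.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_samples):
        # One simulated outcome: n_events random days, in time order.
        days = sorted(rng.uniform(0, PERIOD_DAYS) for _ in range(n_events))
        # Distances between each event and the next one.
        gaps.extend(later - earlier for earlier, later in zip(days, days[1:]))
    gaps.sort()
    # First gap that is longer than the shortest `quantile` share of gaps.
    return gaps[int(quantile * len(gaps))]


if __name__ == "__main__":
    for n in range(2, 11):
        print(n, round(gap_threshold(n)))
```

Under these assumptions the estimates should come out within a few days of the values tabulated for 2 to 10 events, but a different definition of distance (for example, the single shortest distance per outcome) would give noticeably smaller numbers.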

The following table shows, for each number of events, the shortest distance, in days, between two events that is longer than the shortest 5% of the simulated distances.

Number of events, randomly placed in 3593 days | Shortest distance, in days, between two events that is longer than the shortest 5% of distances
2 | 93
3 | 60
4 | 46
5 | 37
6 | 31
7 | 27
8 | 24
9 | 21
10 | 19

This table can now be used to classify events as random or non-random with regard to concentration in time and space. For every place, look up its number of events in the table, and for each event at that place (each fire), decide whether its distance, in days, to the surrounding fires is less than the number in column 2 of the table. Of the events classified as non-random by this method, 5% can be expected to be false positives, and the 5% longest distances in the non-random pool should thus be reclassified as random events.
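As a minimal sketch of the lookup described above, assuming each place is represented by a list of event days counted from the start of the period: the names `THRESHOLDS` and `classify_place` are hypothetical, "distance to the surrounding fires" is read as the distance to the nearest other fire at the same place, and the final step of reclassifying the 5% longest distances is left out.

```python
# Cut-off distances in days, copied from the table (number of events -> days).
THRESHOLDS = {2: 93, 3: 60, 4: 46, 5: 37, 6: 31, 7: 27, 8: 24, 9: 21, 10: 19}


def classify_place(event_days):
    """Return one flag per event: True if the event is classified as
    non-random, i.e. its nearest other event at the same place is closer
    than the cut-off for that place's number of events.

    Assumption: a place with a single event has nothing to be related to
    and is treated as random. Counts above 10 are not in the table and
    raise a KeyError.
    """
    n = len(event_days)
    if n < 2:
        return [False] * n
    limit = THRESHOLDS[n]
    flags = []
    for i, day in enumerate(event_days):
        # Distance in days to the closest other event at this place.
        nearest = min(abs(day - other)
                      for j, other in enumerate(event_days) if j != i)
        flags.append(nearest < limit)
    return flags


if __name__ == "__main__":
    # Made-up example: three fires at one school, on days 100, 130 and 2000.
    print(classify_place([100, 130, 2000]))
```

In the made-up example the first two fires lie 30 days apart, which is below the cut-off of 60 days for a place with three events, while the third fire is far from both.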

Last modified: October 17, 2019