Anomaly Detection in Driving Behaviour by Road Profiling - IEEE Xplore

Report 3 Downloads 67 Views
2013 IEEE Intelligent Vehicles Symposium (IV) June 23-26, 2013, Gold Coast, Australia

Anomaly Detection in Driving Behaviour by Road Profiling Gabriel Agamennoni, James R. Ward, Stewart Worrall and Eduardo M. Nebot∗

Abstract— This paper presents a statistical method for detecting anomalous driving behavior by analyzing cross-sectional profiles of the road. A profile captures the way vehicles normally traverse the road; a statistical hypothesis test determines whether the observed behavior is anomalous. Experimental results on genuine data collected by a fleet of vehicles demonstrates the potential of this new method.

A. Mathematical abstraction of the road map Let us represent a digital road map as an embedded graph. A graph G = (V, E) with vertices V and edges E ⊆ V × V , respectively, is said to be embedded in a set S if it is equipped with an operator s:V ×V →S

I. INTRODUCTION Over the last years, the convergence of digital road maps and global navigation satellite systems has facilitated a large number of electronic vehicle navigation and guidance technologies. However, there exists a much wider range of potential applications beyond those aimed at enhancing driver convenience. Massive volumes of position data offers the possibility of collecting meaningful statistics about the preferences and behavior of road users as well as detect unusual events.

that assigns each e = (i, j) ∈ E a unique s (i, j) ∈ S. For our purposes we take S to be a set of line segments in d-dimensional Euclidean space. Namely, s (i, j) = {(1 − t) vi + t vj : t ∈ [0, 1]} , where (vi ) ⊂ Rd is the sequence of segment endpoints. This representation provides a simple yet powerful abstraction. Road lanes are polygonal lines in the graph and lane splits and merges are vertices with in-degree or out-degree strictly greater than one.1

A. Rundown of the problem In this paper we are concerned with how to detect anomalous driving behavior by comparing a sequence of position data against historical records. Our assumptions are as follows: a. Only low-frequency position data are collected; b. The data are not labeled (i.e. both normal and anomalous sequences are mixed together); and c. An enhanced digital road map is available. With this in mind, we propose a statistically-motivated method that is principled, inexpensive and applicable on-thefly.

B. Matching data to graph edges Suppose we are given a sequence x = (x1 , . . . , xn ) ⊂ Rd of data. We wish to match these data to their corresponding sequence e = (e1 , . . . , en ) ⊂ E of edges in the graph. In order to do so, we define a pair of suitable cost functions and then solve the associated minimization problem via a dynamic programming algorithm. We define the edge cost function as follows:  0 if ik−1 = jk and (ik , jk ) ∈ E, f (ik−1 , jk−1 , ik , jk ) = −1 otherwise, 

B. Outline of the paper Section II describes how position data are transformed to lateral distances by matching with respect to the road map. Section III then explains how these data are processed to build profiles of road segments. Next, given a new sequence of data, section IV proposes a statistical test for deciding when a driver’s behavior is out of the ordinary. Section V presents experimental results on real data gathered by ground traffic in a large unstructured intersection. Finally, section VI concludes the paper and outlines directions for future research.

where  is a small but positive constant that accounts for edge transitions that do not respect the graph (e.g. lane violations, u-turns). The vertex cost function is defined as: g (xk , ik , jk ) = min (1 − t) vik + t vjk − xk  , t∈[0,1]

where t is obtained by orthogonally projecting xk onto the − v→ and restricting its value to the unit interval. line − vi− k jk Matching data to graph edges is equivalent to solving the following minimization problem:  n  n   min f (ik−1 , jk−1 , ik , jk ) + g (xk , ik , jk )

II. MAP MATCHING The first step, given a sequence of vehicle position data, is to match this sequence to the digital road map. A matching specifies which segment on the map corresponds to each of the latitude-longitude pair in the sequence.

(ek ) : ek =(ik ,jk )

k=1

The solution is found in O (n) time and memory by running a Viterbi-like algorithm [1].

∗ The authors are with the Australian Centre for Field Robotics at the University of Sydney, New South Wales, Australia.

978-1-4673-2754-1/13/$31.00 ©2013 IEEE

k=2

1 Note that we regard roads as having a specific orientation, i.e. a bidirectional road is considered as a pair of one-directional lanes.

25

C. Road map matching examples

D. Signed lateral distances After matching data to edges, we compute the signed lateral distances between vehicle positions and the road map. These distances are a by-product of our matching algorithm. Namely, the signed lateral distance yk for position xk is    yk = sign (xk − vik ) R π/2 (vjk − vik ) g (xk , ik , jk ) ,

Fig. 1 shows two road matching examples. The embedded graph, constructed by the method of principal road paths [2], is depicted as a collection of black arrows, each arrow representing a directed edge. The data are denoted by blue dashdotted line. Cyan-colored edges represent matches between vehicle positions and their corresponding road segments.



where Rθ =

cos θ −sin θ

sin θ cos θ



is a planar clockwise rotation matrix. In other words, the signed lateral distance is the shortest distance between a vehicle position and its associated road segment (i.e. the length of a cyan edge in fig. 1), weighted by ±1 according to whether the vehicle lies at the right (+1) or the left (−1) of the segment.

920 900 880

III. ROAD PROFILING The next step is to build a profile of the digital road map along its cross section. A cross-sectional profile summarizes the signed lateral distance data along each segment of the road map (i.e. each edge in the graph embedding).

860 840 820

880

A. The skew-normal distribution We build profiles by fitting parametric probability distributions to our data. The fitted parameters constitute the summary. Through exploratory analysis, we found that the skew-normal family of distributions [3] is a viable candidate in terms of its trade-off between goodness-of-fit and parametric complexity. Definition 1: A real-valued random variable y follows a skew-normal distribution, denoted by

860

y ∼ SN (μ, σ, α) ,

8700

8750

8800

(a) Vehicle performing an unusually wide turn.

920 900

if its probability density function is equal to  2  1 y−μ y−μ 1 exp − Φ α , p (y) =

2 σ σ π/2 σ (1) where Φ is the cumulative probability distribution function of a standard normal variable, i.e.

x 1 2 1 exp − t dt. Φ (x) = √ 2 2π −∞ Constants μ, σ and α, in that order, are the location, scale and shape parameters. The mean, standard deviation and root skewness of a skewnormal variable are, in that order,

840 820 8700

8750

8800

(b) Vehicle executing a forbidden u-turn.

Fig. 1: Anomalous cases arising when projecting position data onto the road graph. Observed positions are plotted as a blue dash-dotted line; cyan edges denote the projection of positions to their matching edge in the graph.

The upper panel shows a vehicle performing an unusually wide turn. A na¨ıve (i.e. point-by-point) matching would determine that the vehicle crosses over to the opposite lane. Our Viterbi algorithm, however, correctly ascertains that the vehicle turns late rather than changes lanes.

Mean [y] = μ + σδ,

Deviation [y] = σ 1 − δ 2 and  π  13 δ √ Skewness [y] = 2 − 2 1 − δ2 with δ defined as  α 2 √ δ= . π 1 + α2 Note from (1) that y ∼ N (μ, σ) for α = 0.

The lower panel shows another vehicle performing a forbidden u-turn. This maneuver is forbidden because it does not respect the structure imposed by the graph edges. Still, our edge cost function accounts for this (since non-zero ) and thus our Viterbi algorithm yields the correct matching. 26

B. Examples of cross-sectional profiles

A. The null hypothesis and test statistic

Fig. 2 shows three snapshots of the profile at three different segments. We fitted the parameters via maximum likelihood with the sn package [4] in the R programming language [5]. Each column in the figure corresponds to a particular segment of the digital road map and each row to a different view of the data and/or the fit. In the first row, the map is plotted in black —the current segment highlighted in yellow—and the position data in gray. In the middle row, the histogram of lateral distances appears in gray and two fits, normal and skew-normal, are depicted in blue and red. In the third row, the pair-wise probability plots2 for each fit are shown in their corresponding color. Segments 58, 28 and 59 were chosen so that they illustrate three typical cases. The first case is the occurs when vehicles drive precisely on the center of the lane so that lateral distances are normally-distributed. The normal and skewnormal densities for segment 58 are almost identical, and the probability plots indicate that both fit the data well. The second case arises when the data are strongly skewed, usually because of vehicles driving closer to the center than the shoulder of the road. For segment 28 the skew-normal accommodates the skewness in the data and thus produces a significantly better fit than the normal. The third case is the less frequent and arises when the data are neither normal nor skew-normal, which may be due to several different reasons. The density and probability plots for segment 59 suggest that the data are multi-modal, meaning that both distributions will be sub-optimal fits.

Assume we are given a sequence (z1 , . . . , zn ) of residuals and let β be a small positive constant. We want to test the null hypothesis, which we formulate as follows: H0 : zi |zi−1 ∼ SN (βzi−1 , 1, αi ) for i = 1, . . . , n. To do so, we define the test statistic (4)

and, assuming H0 holds true, we compute the probability of observing a statistic at least as extreme as s0 . If this probability, or p-value, is low enough (e.g. 1%) then we reject the notion that the residuals are distributed according to (3) and declare an anomaly. From corollary 1 in appendix I we know that ns0 |H0 ∼ χ2 (n) independently of (αi ). This allows us to conveniently compute the p-value from the upper tail of the χ2 cumulative distribution function. Specifically, n n  , s0 p=1−Γ (5) 2 2 where Γ is the regularized gamma function [6]. B. Accounting for temporal correlations In (3) we included an additional parameter β to account for the fact that residuals are not independent. They are correlated in time because of the sequential nature of our data. Intuitively, if a vehicle is driving near the edge of the road, it is likely to remain close to the edge for some period of time. This does not imply the behavior is more or less anomalous than in the first place. Omitting β (i.e. β = 0) would yield pessimistic results as the residuals would be judged on an individual basis rather than as a sequence. In addition, it would heighten the sensitivity to measurement noise and outliers. In our experiments we found that a value of β in the range 5–15% is adequate and produces qualitatively similar results.

After building our profile, we perform a linear transformation in order to standardize the data. Let yk be the signed lateral distance and μk , σk and αk the location, scale and shape parameters of the corresponding segment of the crosssectional profile. We defined the residual as yk − μ k . σk

1 2 (zi − βzi−1 ) n i=1 n

s0 

C. From signed lateral distances to residuals

zk 

(3)

(2)

If yk ∼ SN (μk , σk , αk ), then zk ∼ SN (0, 1, αk ) similarly to the normal distribution. From here on we assume that the vehicle position data has been matched to the digital road map and that the signed lateral distances have been normalized according to (2). Our data now comprises sequences of residuals. These are the data we feed to the anomaly detection algorithm.

C. Sliding window for adaptive detection To allow for adaptive detection, we apply a sliding window to our data. In other words, we compute (4) and (5) for subsequences of length m < n of the sequence of residuals. In our experiments we found that results are consistent for window lengths in the range 10–30.

IV. ANOMALY DETECTION

V. EXPERIMENTAL RESULTS

The next and final step is to decide how to detect anomalous behavior by processing sequences of residuals. We formulate this as a statistical test, where we may accept or reject a hypothesis based on its p-value.

A. Case study: traffic intersection The data we based our experiments on were gathered by a fleet of ground vehicles equipped with Global Positioning System (GPS) receivers. Our case study focuses on an intersection and its surrounding area. The intersection is not signalized and the road has no markings; berms on either side of the shoulders are the only physical delimiters.

2 A pair-wise probability plot, or P–P plot, contrasts two probability distribution functions against each other. Given two such functions, f and g, it plots the curve {(f (x) , g (x)) : x ∈ R} in the Cartesian plane.

27

1100

950

950

59

58 1050

900

1000

850 8650

8700

8750

28

850 8600

8800

900

8650

8700

8700

8750

8800

8850

(a) Digital road map, represented as a graph embedding, superimposed on the vehicle position data. Segment 58

Segment 28

0.2

Segment 59

0.35

0.2

Normal Skew−normal

Relative frequency

0.3 0.15

0.15

0.25 0.2

0.1

0.1 0.15 0.1

0.05

0.05

0.05 0 −10

−5

0

5

10

15

0 −10

−5 0 5 10 Lateral distance (meters)

15

0 −10

−5

0

5

10

15

(b) Cross-sectional profile along different segments of the digital road map (i.e. edges in the graph embedding).

Sample probabilities

Segment 58

Segment 28

Segment 59

1

1

1

0.8

0.8

0.8

0.6

0.6

0.6

0.4

0.4

0.4

0.2

0.2

0.2

0 0

0.2

0.4

0.6

0.8

1

0 0

0.2 0.4 0.6 0.8 Model probabilities

1

0 0

Normal Skew−normal 0.2

0.4

0.6

0.8

1

(c) Pair-wise probability plots comparing empirical probabilities against fitted probabilities.

Fig. 2: Visualizations of the cross-sectional profile along three different segments. The left, middle and right columns correspond to segments 58, 28 and 59, respectively. The upper panels depict the digital road map (black), highlighting the relevant segment (yellow), and the vehicle position data (gray). The middle panels show a normalized histogram of the signed lateral distances matched to the corresponding segment (gray) together with normal (blue) and skew-normal (red) fits. The lower panels contain pair-wise probability plots as a means for visually comparing the two fits.

28

Figs. 3 and 4 show two typical examples of anomalous driving behavior detected by our method. Each figure is divided into two panels:

Trace 1 / Agent 42 920 Longitude (meters)

a. The upper panel depicts the digital road map (black) and the current sequence (blue) superimposed on the full set of data (gray); b. the lower panel plots the sequence of signed lateral distances and their corresponding probability values (blue) as they evolve through time, highlighting events in which the distance exceeds the 95% confidence intervals or the p-values fall below the 1% significance level (red).

900 880 860 840 820 800 8650

For these examples we set β = 1/10 and m = 20.

8700 8750 Latitude (meters)

8800

(a) Vehicle making a forbidden u-turn. Trace 1 (Agent 42) Probability values Lateral distance (%) (meters)

Trace 59 / Agent 51

Longitude (meters)

1000

950

5 0 −5

100

900

850 8650

8700 8750 Latitude (meters)

50 0 0

8800

10

20

30 40 50 Elapsed time (seconds)

60

70

(b) Signed lateral distances and probability values.

(a) Vehicle taking an overly tight turn.

Fig. 4: A vehicle making a forbidden u-turn (upper panel) is detected as an anomalous event. Unusual patterns in the signed lateral distances cause the probability values to drop below 1% (lower panel).

Trace 59 / Agent 51 Probability values Lateral distance (%) (meters)

10

15 10 5 0

B. Discussion

−5 100

In general, the statistical test reacts quickly to the anomaly. While it takes time to build up confidence (e.g. the sequence of p-values rises slowly), the null hypothesis is rejected almost instantly as soon as an atypical event occurs. Furthermore, it is consistent (i.e. the p-values remain significant throughout the event).

50 0 0

10

20 30 40 Elapsed time (seconds)

50

60

(b) Signed lateral distances and probability values.

VI. SUMMARY AND CONCLUSION

Fig. 3: A vehicle taking an overly tight turn (upper panel) is flagged as an anomaly. A sharp increase in the signed lateral distance causes the probability values to fall below beyond significance levels (lower panel).

This paper proposed a statistical method capable of detecting anomalous driving behavior from vehicle position data. The method consists of building a cross-sectional profile of a digital road map and performing a hypothesis test on the sequence of residuals. Our experiments showed that the test is sensitive to anomalous behavior and is capable of detecting tight/wide turns and u-turns. In addition, it reacts quickly and consistently. Looking into ways of extending our approach to allow for more general distributional assumptions is an interesting direction for future research. For instance, elliptical distributions [7] are attractive due to their generality and their compact parametrization. Another aspect that deserves attention is incorporating additional sources of information.

In fig. 3 a vehicle makes a tight right turn at the intersection, momentarily driving on the center of the road. The signed lateral distance rises sharply and the p-values quickly drop below the significance level, signaling an anomalous event. In fig. 4 a vehicle makes a forbidden u-turn but, although it stays close to the road at all times. What gives it away is the violent fluctuation of the distance; the incoherent pattern causes the p-value to shoot below 1%, indicating detection. 29

For example, many GPS devices provide heading and speed estimates that could be combined with position and incorporated into a more comprehensive test.

R EFERENCES [1] A. Viterbi, “Error bounds for convolutional codes and an asymptotically optimum decoding algorithm,” IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 260–269, April 1967. [2] G. Agamennoni, J. Nieto, and E. Nebot, “Robust inference of principal road paths for intelligent transportation systems,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 1, pp. 298–308, March 2011. [3] A. Azzalini and A. Capitanio, “Statistical applications of the multivariate skew-normal distribution,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 61, no. 3, pp. 579–602, 1999. [4] A. Azzalini, R package sn:: The skew-normal and skew-t distributions (version 0.4-17), Universit`a di Padova, Italia, 2011. [Online]. Available: http://azzalini.stat.unipd.it/SN [5] R Development Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008. [Online]. Available: http://www.R-project.org [6] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, with Formulas, Graphs and Mathematical Tables. Dover Publications, Incorporated, 1974. [7] K.-T. Fang, S. Kotz, and K. Ng, Symmetric Multi-variate and Related Distributions. Chapman & Hall, 1987.

A PPENDIX I PROPOSITIONS AND PROOFS Definition 2: An n-dimensional, real-valued random variable x follows a multi-variate skew-elliptical distribution if its probability density function is of the form 1  2 x x s (x) , p (x) = g cn 2 where g is such that cn =

n/2

(2π) Γ ( n/2 )

∞ 0

t n/2 −1 g (t) dt < ∞,

Γ being the Gamma function, and s is such that 0 ≤ s (x) ≤ 1 and s (−x) = 1 − s (x) for all x ∈ Rn . Functions g and s are the density generator and the skewing function, respectively. Proposition 1: Let x be an n-dimensional skew-elliptical random variable. If h : Rn → R is an even function, i.e. h (−x) = h (x) for all x ∈ Rn , then the distribution of h (x) does not depend on s. Proof Consider the characteristic function of h (x), ϕh(x) (t) = E [exp (ih (x) t)] , and let w ∈ Rn be an arbitrary non-zero vector. Since g is symmetric about the origin,

1  2 x x s (x) dx exp (ih (x) t) g ϕh(x) (t) = cn 2

1  2 x x s (x) dx = exp (ih (x) t) g cn xw≥0 2

1  2 + x x s (x) dx exp (ih (x) t) g cn xw