IEEE COMMUNICATIONS LETTERS, VOL. 13, NO. 6, JUNE 2009
Detecting Anomalies in Network Traffic Using the Method of Remaining Elements
P. Velarde-Alvarado, C. Vargas-Rosales, Senior Member, IEEE, D. Torres-Roman, and A. Martinez-Herrera
Abstract—Attacks such as port scans, DDoS, and worms threaten the functionality and reliability of IP networks. Early and accurate detection is crucial to mitigate their impact. We use the Method of Remaining Elements (MRE) to detect anomalies based on the characterization of traffic features through a proportional uncertainty measure. MRE has the functionality and performance to detect abnormal behavior and to serve as the foundation for next-generation network intrusion detection systems.
Index Terms—Anomaly detection, traffic anomalies, entropy based intrusion detection.

Manuscript received March 22, 2009. The associate editor coordinating the review of this letter and approving it for publication was G. Lazarou. This work was partially sponsored by SEP-CONACyT project 61183 and CONACyT project 67360Y. C. Vargas-Rosales and A. Martinez-Herrera are with the Center for Electronics and Telecommunications, ITESM-Campus Monterrey, Monterrey, N.L., 64849, Mexico (e-mail: [email protected]). P. Velarde-Alvarado is with Autonomous University of Nayarit (e-mail: [email protected]). D. Torres-Roman is with CINVESTAV Guadalajara, Mexico (e-mail: [email protected]). Digital Object Identifier 10.1109/LCOMM.2009.090689

I. INTRODUCTION
Traditional security measures, like firewalls or antivirus solutions, are not sufficient for the variety and sophistication of attacks. Early detection of potential attacks is crucial to mitigate their impact. Intrusion Detection Systems (IDS) based on entropy [1] can be effective because malicious activities change the nature of network traffic. Anomalies are characterized by unusual and significant changes in patterns of network activity that disrupt the behavior of traffic features. The entropy of intrinsic features extracted from packet headers, e.g., source IP, destination IP, source port and destination port numbers [2], does not provide sufficient sensitivity to detect some attacks that are short-term or distributed over time. We propose the use of proportional uncertainty ($PU$) to determine the remaining values of sequences of those intrinsic features, since it provides better sensitivity for defining the cutoff between remnants and significant elements than the relative uncertainty ($RU$) in [3]. Our results indicate that, by adjusting the time-slot duration and the cutoff threshold $\beta$ in the remaining calculations, anomalies are exposed with relatively high levels of remnant elements with respect to typical behavior.

II. MEASURES OF ENTROPY
The application of Shannon's entropy [4] in anomaly detection has disadvantages: short-term attacks, or attacks distributed over time, are not clearly detected because the uncertainty is either negligible or distributed as well. We therefore propose the use of a modified version of the low-bias balanced estimator in [5]. For a discrete data set $X$ whose elements can take a finite number $M$ of possible values, $X = \{x_1, \ldots, x_M\}$, the balanced estimator for a data set of size $N$ is defined as

$$\hat{H}_S^{\mathrm{bal}}(X) = \frac{1}{N+2} \sum_{k=1}^{M} \left[ (n_k + 1) \sum_{j=n_k+2}^{N+2} \frac{1}{j} \right], \tag{1}$$

where $n_k$ is the number of times the value $x_k$ appears in the set. The second summation in (1) is a partial harmonic series. Using the Euler-Mascheroni constant [6], $\gamma = 0.5772156649\cdots$, the $n$-th harmonic number has the asymptotic expansion $H_n \sim \log(n) + \gamma + \frac{1}{2}n^{-1} - \frac{1}{12}n^{-2} + \frac{1}{120}n^{-4} - \frac{1}{252}n^{-6} + \cdots$, which gives the following
$$\sum_{j=n_k+2}^{N+2} \frac{1}{j} = \sum_{j=1}^{N+2} \frac{1}{j} - \sum_{j=1}^{n_k+1} \frac{1}{j} = H_{N+2} - H_{n_k+1} = \log\frac{N+2}{n_k+1} + \rho_{N+2} - \rho_{n_k+1}, \tag{2}$$
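where $\rho_n = H_n - \log(n) - \gamma$ collects the remaining terms of the expansion, and $(\rho_{N+2} - \rho_{n_k+1})$ approaches zero when $N$ and $n_k$ increase indefinitely. This simplification gives us a more efficient computational formula expressed as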
$$\hat{H}^{\mathrm{bal-II}}(X) = \frac{1}{N+2} \sum_{k=1}^{M} (n_k + 1) \log\frac{N+2}{n_k+1}. \tag{3}$$
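As a rough illustration of the step from (1) to (3), the following Python sketch evaluates both forms on the same counts. The function names and the sample port list are hypothetical and chosen only for this example; the code is a minimal sketch, not the authors' implementation.

```python
import math
from collections import Counter


def balanced_entropy_exact(counts):
    """Eq. (1): balanced estimator with explicit partial harmonic sums."""
    n_total = sum(counts)
    acc = 0.0
    for n_k in counts:
        # Inner sum of eq. (1): j runs from n_k + 2 to N + 2 inclusive.
        partial = sum(1.0 / j for j in range(n_k + 2, n_total + 3))
        acc += (n_k + 1) * partial
    return acc / (n_total + 2)


def balanced_entropy_log(counts):
    """Eq. (3): each partial harmonic sum replaced by a logarithm."""
    n_total = sum(counts)
    total = sum((n_k + 1) * math.log((n_total + 2) / (n_k + 1)) for n_k in counts)
    return total / (n_total + 2)


# Hypothetical destination-port sequence for one slot (illustration only).
ports = [80] * 60 + [443] * 30 + [22] * 8 + [8080, 53]
counts = list(Counter(ports).values())

h_exact = balanced_entropy_exact(counts)
h_log = balanced_entropy_log(counts)
print(f"eq. (1): {h_exact:.4f}   eq. (3): {h_log:.4f}")
```

Each partial sum in (1) is bounded above by the corresponding logarithm, so (3) never falls below (1). The gap is largest for elements with small counts, where the discarded difference $\rho_{N+2} - \rho_{n_k+1}$ is not yet negligible.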
Maximum uncertainty in (3) occurs when $n_k = 1$ for $k = 1, 2, \ldots, M$, which results in $N = M$; hence, from (3), we get

$$\hat{H}^{\mathrm{bal-II}}_{\mathrm{MAX}}(X) = \frac{2M}{M+2} \log\frac{M+2}{2}, \tag{4}$$

which exceeds the upper bound $\log(M)$ in Shannon's entropy. Equation (3) is our measure of entropy [7]. It has a significant effect on data sets for which $M$ is close to $N$, a condition related to anomalous activities, i.e., high diversity in IP addresses or port numbers (e.g., port and IP scanning or DoS attacks). Using (3) to calculate the $PU$ in MRE, we achieve a superior and more controlled anomaly exposure than with Shannon's entropy estimator as in [3] and [8].

III. THE METHOD OF REMAINING ELEMENTS
A. Proportional Uncertainty
$PU$ provides an index of uncertainty with respect to the maximum Shannon's entropy, i.e., the ratio of (3) to $\log(M)$. Considering (4), we can see that this ratio is bounded above; thus, for data set $X$ and taking the limit as the alphabet size increases, we define $PU$ as

$$PU(X) = \frac{\hat{H}^{\mathrm{bal-II}}(X)}{\log(M)} \le \lim_{M \to \infty} \frac{\frac{2M}{M+2} \log\frac{M+2}{2}}{\log(M)} = 2. \tag{5}$$
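The bound in (5) can be checked numerically. The sketch below evaluates the ratio of (4) to $\log(M)$ for a few alphabet sizes; the function name `pu_max_rel` is an assumption made for this example and does not come from the letter.

```python
import math


def pu_max_rel(m):
    """Ratio of eq. (4) to log(M); only defined for alphabet sizes M > 1."""
    if m <= 1:
        raise ValueError("alphabet size M must be greater than 1")
    h_max = (2 * m / (m + 2)) * math.log((m + 2) / 2)
    return h_max / math.log(m)


# Print the ratio for a few alphabet sizes on a logarithmic grid.
for m in (2, 20, 100, 10_000):
    print(m, round(pu_max_rel(m), 3))
```

For $M = 20$ this gives 1.455, the value marked in Fig. 1. The limit of 2 is approached slowly: at $M = 10^4$ the ratio is still only about 1.85.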
1089-7798/09$25.00 © 2009 IEEE
Authorized licensed use limited to: CINVESTAV IPN. Downloaded on July 15, 2009 at 11:16 from IEEE Xplore. Restrictions apply.
Equation (5) requires $M > 1$, because $\log(M) = 0$ for $M = 1$. Let $PU_{\mathrm{Max-rel}}$ be the maximum relative $PU$ for a given data set $X$ with alphabet size $M$, i.e., $PU_{\mathrm{Max-rel}}(X) = \hat{H}^{\mathrm{bal-II}}_{\mathrm{MAX}}(X)/\log(M)$. Figure 1 shows, for different alphabet sizes $M$, the values of $PU_{\mathrm{Max-rel}}$ that are used to define the upper bound for a cutoff parameter $\beta$, which controls the sensitivity to compute the remnants in a data set. Values of interest are $1 \le \beta \le PU_{\mathrm{Max-rel}}$.

[Figure 1: $PU_{\mathrm{Max-rel}}$ versus alphabet size $M$ (logarithmic axis, $10^0$ to $10^4$); marked point: $M = 20$, $PU_{\mathrm{Max-rel}} = 1.455$.]
Fig. 1. Maximum values for $PU_{\mathrm{Max-rel}}$ for different alphabet sizes $M$.

B. Remnants in a set
Fix a maximum slot size of $t_d$ seconds. Let $\tau$ be a traffic trace divided into non-overlapping slots, each with duration $t \le t_d$. An $i$-slot is composed of $W_i$ packets and of four sequences $S_i^r$, $r = 1, 2, 3, 4$, extracted from the packet headers, each of length $W_i$, representing the four intrinsic features of source IP, destination IP, source port and destination port numbers, respectively. The remaining in $S_i^r$ for an intrinsic feature $r$ in slot $i$ is denoted by $1 \le R_i^r \le M$, where $M$ is the alphabet size of $S_i^r$. The remaining $R_i^r$ is the cardinality of the residual alphabet of $S_i^r$ that attains $\beta \le PU(S_i^r) \le PU_{\mathrm{Max-rel}}$ as a result of an iterative process of selective extraction of significant elements, i.e., elements with higher occurrences in the original sequence $S_i^r$. Algorithm 1, shown in Figure 2, presents pseudo-code to calculate these remaining elements with a threshold $\beta$. As $\beta$ approaches $PU_{\mathrm{Max-rel}}$ from below, the remaining values for sequences $S_i^r$ with $|S_i^r| < M$ drop to the minimum value, and only the values for $|S_i^r| \ge M$ prevail. Thus, letting $\beta$ approach $PU_{\mathrm{Max-rel}}(M)$ acts as a control that filters out normal behavior in the traffic trace and emphasizes the anomalies. Figure 3 shows the remaining elements of the IP source address feature of a traffic trace with a port scan at time slots 95 to 162, for $\beta = 1$ and $\beta = 1.4$, respectively. The filtering effect of this parameter on anomaly detection is clearly visible. To define a baseline that characterizes normal behavior for the remaining in a sequence, one must use traffic traces considered anomaly-free. The next section explains the set of traces used in this letter.

Algorithm 1
    Parameters: S_i^r, β
    define alphabet from S_i^r and Items = |S_i^r|
    if Items == 1
        Significant = 0, R_i^r = Items
    else
        compute PU(S_i^r, β)
        if PU >= β
            Significant = 0, R_i^r = Items
        else
            build table T
            sort T in decreasing order
            PU = 0, j = 1
            while PU < β do
                S_i^r = S_i^r \ T(b(j))
                compute PU(S_i^r, β)
                j++
            end while
            Significant = j − 1
            R_i^r = Items − Significant
        end if
    end if
Fig. 2. Calculation of the remaining $R_i^r$ in a sequence $S_i^r$ for a given $\beta$. Table $T$ consists of $(a, b)$ value pairs, where $a$ is the frequency and $b$ is a specific $r$-instance (i.e., IP address or port number).

[Figure 3: two panels, $\beta = 1$, $r = 1$ and $\beta = 1.4$, $r = 1$; vertical axis: Remaining, $R_i^1$; horizontal axis: Time slot $i$.]
Fig. 3. Remaining values $R_i^1$ for $\beta = 1, 1.4$, $t_d = 0.5$, for trace SC1-D6-01.
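The procedure of Algorithm 1 (Fig. 2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, $PU$ is computed from (3) and (5), and the loop stops once a single instance is left, because $PU$ is undefined for an alphabet of size 1. The letter does not spell out that last case.

```python
import math
from collections import Counter


def bal2_entropy(counts):
    """Entropy estimate of eq. (3) from a list of occurrence counts."""
    n = sum(counts)
    return sum((c + 1) * math.log((n + 2) / (c + 1)) for c in counts) / (n + 2)


def proportional_uncertainty(counts):
    """PU of eq. (5): eq. (3) divided by log(M), M being the alphabet size."""
    return bal2_entropy(counts) / math.log(len(counts))


def remaining_elements(sequence, beta):
    """Return (significant, remaining) for one feature sequence of one slot."""
    # Table T of Fig. 2: instances ordered by decreasing frequency.
    table = Counter(sequence).most_common()
    items = len(table)
    if items == 1:
        return 0, items
    counts = [freq for _, freq in table]
    if proportional_uncertainty(counts) >= beta:
        return 0, items
    removed = 0
    while True:
        # Extract the next most frequent (significant) instance.
        removed += 1
        residual = counts[removed:]
        # Assumption: stop at the minimum remaining value of 1.
        if len(residual) <= 1:
            break
        if proportional_uncertainty(residual) >= beta:
            break
    return removed, items - removed


# Hypothetical slot: one busy server plus 25 addresses seen once each.
slot = ["srv"] * 200 + [f"h{i}" for i in range(25)]
print(remaining_elements(slot, beta=1.4))
```

In this example the single dominant address is extracted as significant and the 25 addresses seen once each are left as remnants, which is the kind of pattern a scan would produce. A slot dominated by a few heavy hitters collapses to the minimum remaining value instead.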
IV. EXPERIMENTAL RESULTS
A. Experimental Platform and Data sets
The evaluation of MRE was conducted in two different scenarios. The first scenario (SC1) is an academic LAN, previously described in [8], where thirty TCP traffic traces were collected during typical working hours. These traces were arranged into five data sets (D1 to D5) of six traces each (01 to 06), to be used for training purposes. In addition, data set D6 contains a trace with two worm attacks (Blaster and Sasser) and a port scan attack on its proxy server. The second scenario (SC2), based on a subset of the 1998 MIT-DARPA data [9], a public benchmark for testing IDS, adds five more attacks to our experiments.

B. Performance evaluation
Considering $t_d \le 0.5$ and a cut-off threshold of $\beta = 1$, we can see that the behavior of the remaining measurements $R_i^r$ for traces of typical traffic (data sets D1 to D5) is similar in terms of its mean $\mu$, variance $\sigma^2$, and intensity factor ($\sigma^2/\mu$). Table I presents a sample of the typical behavior of the intensity factor and the maximum of $R_i^r$ in the six traces that form data set D5 of scenario SC1. In Table I, the numbers 1 to 4 refer to the values of $r$, i.e., the intrinsic features. These statistical values for each $r$-feature in Table I can be considered within a typical range. Malicious activities such as port scanning and DDoS attacks increase the randomness of the sequences $S_i^r$, causing the cardinality of their residual alphabets, i.e., $R_i^r$, to increase, even over the maximum shown in Table I. Exposure of anomalies is achieved by increasing the cut-off threshold from $\beta = 1$ to an exposure threshold $\beta_r$. These values are bounded by $\beta_r < PU_{\mathrm{Max-rel}}(T_r)$, where the values $T_r$ are the remaining thresholds for exposure defined in the training phase for each $r$-feature. The thresholds $T_r$, $r = 1, 2, 3, 4$, are obtained experimentally using the data sets for the training phase with $\beta = 1$ and a given slot size $t_d$. These thresholds vary according to defined policies, e.g., a policy may allow some level of benign scanning. For instance, for SC1, the empirical results with $\beta = 1$ and $t_d \le 0.5$ led us to define the thresholds $T_1 = T_2 = 20$ and $T_3 = T_4 = 40$. The corresponding cut-off thresholds $\beta_r$ for SC1 are determined by means of Figure 1; for example, for $T_1 = T_2 = 20$, we can choose empirically the values $\beta_1 = \beta_2 = 1.4$, and similarly $\beta_3 = \beta_4 = 1.50$.

TABLE I
BEHAVIOR OF $R_i^r$ ON DATA SET 5.

                 Intensity Factor          Maximum
Trace, r =       1     2     3     4       1    2    3    4
SC1-D5-01        2.6   2.3   4.0   5.2     26   36   87   94
SC1-D5-02        3.2   2.6   3.8   5.9     40   46   70   73
SC1-D5-03        3.0   2.8   4.2   5.9     39   53   72   50
SC1-D5-04        4.8   3.6   4.8   6.1     38   39   46   62
SC1-D5-05        2.7   2.5   4.1   5.6     39   49   69   50
SC1-D5-06        2.2   2.2   3.7   5.2     30   33   64   77

Upon the application of MRE to the traces, we find that the anomalies, at the moment of occurrence, generate values of remnants in a range over the defined threshold for normal traffic. In Table II, we can see, for instance, that trace SC1-D6-01 contains three attacks: a port scan and two worm attacks. For the port scan attack, the application of MRE shows a cluster of slots with peaks of remaining values for the intrinsic feature $r = 1$; the compromised slots are 95 to 162, and the percentage of the difference between these values and the threshold $T_1$, denoted as PDT, ranges from 5.3% to 142%. This PDT is a clear indication of the presence of the attack. Using a small $t_d$ allows timely anomaly detection; however, some attacks are distributed in time to avoid detection, as is the case of the ipsweep attack. In a similar way, Table II also summarizes the anomalies detected using MRE in the traces of scenario SC2 using a slot size $t_d \le 0.5$ seconds, except for the ipsweep attack, where we used a slot size $t_d \le 60$ seconds. Hence, it is necessary to consider at least two slot sizes simultaneously, one for early detection and another for time-distributed attacks.

TABLE II
REPORT OF ANOMALIES OF $R_i^r$.

SC1-D6-01
Attack       Compromised Slot    r-feature     PDT %
portscan     95 − 162            1, 4          5.3 − 142
blaster      3790 − 8116         2, 3, 4       5 − 1595
sasser       10733 − 12860       1, 2, 3, 4    5 − 1380

SC2-D1-01
Attack       Compromised Slot    r-feature     PDT %
POD          10 − 11             1, 2          200
smurf        2 − 71              1, 2          328 − 506
neptune      5550                3             140
portsweep    7743                3, 4          133
ipsweep      607 − 698           3, 4          300

V. CONCLUSIONS
In this letter, we propose the use of MRE for network traffic anomaly detection. The experimental results show that MRE characterizes the behavior of network traffic through the remaining measurements. The profiles built in the training phase help to identify the presence of anomalies arising from various kinds of attacks. By varying the exposure threshold $\beta_r$, it is possible to highlight the slots in which the anomaly occurs. MRE also requires the simultaneous use of at least two slot sizes. As future work, we plan to further investigate the influence of the exposure threshold $\beta_r$ on the exposure of anomalies and on reducing false positives caused by benign scans. In addition, a hardware implementation of MRE that treats the process as a discrete-time filter could be promising.

REFERENCES
[1] A. Nucci and S. Bannerman, “Controlled chaos,” IEEE Spectrum, vol. 44, no. 12, pp. 42-48, Dec. 2007.
[2] A. Wagner and B. Plattner, “Entropy based worm anomaly detection in fast IP networks,” in Proc. 14th IEEE International Workshop on Enabling Technologies, pp. 172-177, 2005.
[3] K. Xu, Z. L. Zhang, and S. Bhattacharyya, “Internet traffic behavior profiling for network security monitoring,” IEEE/ACM Trans. Networking, vol. 16, no. 6, pp. 1241-1252, Dec. 2008.
[4] J. Aczél and Z. Daróczy, “On measures of information and their characterizations,” Mathematics in Science and Engineering. Academic Press, vol. 115, pp. 26-29, 1975.
[5] J. Bonachela, H. Hinrichsen, and A. Munoz, “Entropy estimates of small data sets,” J. Phys. A: Math. Theor., vol. 41, Apr. 2008.
[6] J. H. Conway and R. K. Guy, The Book of Numbers. New York: Springer-Verlag, pp. 143 and 258-262, 1996.
[7] J. N. Kapur, “Four families of measures of entropy,” Indian J. Pure Appl. Math., vol. 17, pp. 429-449, Apr. 1986.
[8] P. Velarde-Alvarado, C. Vargas-Rosales, D. Torres-Roman, and A. Martinez-Herrera, “Entropy-based profiles for intrusion detection in LAN traffic,” Res. in Computing Science, vol. 40, pp. 119-130, 2008.
[9] Lincoln Laboratory, MIT. DARPA Intrusion Detection Data Sets, http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data