Collusion Detection for Grid Computing
Eugen Staab and Thomas Engel
University of Luxembourg
[email protected]
9th IEEE/ACM CCGrid Symposium, Shanghai, May 18–21, 2009
Introduction
Background & Assumptions
The Approach
Analysis and Simulation
Conclusion
Outline
- Introduction
- Background & Assumptions
- The Approach
  - Basic Idea
  - The Algorithm
- Analysis and Simulation
  - Theoretical Analysis
  - Simulation
- Conclusion
Motivation

Result Verification
- Setting: Computations are outsourced to autonomous “workers” (e.g. SETI@home [1]).
- How to make sure that results are correct?
- Common approach (“redundancy”/“replication”):
  - A computation is outsourced to multiple workers.
  - Correctness is assumed if the results are the same.

Problem for Redundancy: Collusion
- Several workers work together (“collude”) and return the same incorrect result.
- Aim: save resources, harm the master, ...
Approach & Contribution

Research Question
How to prevent collusion?

Approach
We propose an algorithm for collusion detection. It:
- determines how similarly workers behave,
- uses graph clustering to cluster workers accordingly,
- makes it possible to detect collusion.
Background

Model
- Master-Worker model of computation:
  1. Master repeatedly outsources “work units” to workers.
  2. Workers are expected to return correct results.
- Result checking through majority voting:
  1. Each work unit is outsourced to 2m − 1 different workers.
  2. Voting on results: the majority of matching results is accepted.

Collusion Attack
- Assume: Colluders can communicate without delay.
- An incorrect result is accepted if it occurs more often than the correct result.
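The voting step above can be sketched in a few lines of Python. This is only an illustration: the function name `majority_vote`, the dictionary layout, and the handling of a vote in which no result reaches m matching answers (returning `None`) are assumptions, not part of the slides.

```python
from collections import Counter

def majority_vote(results, m):
    """Vote on one work unit that was outsourced to 2m - 1 workers.

    results maps worker ID -> returned result (hypothetical layout).
    Returns (accepted_result, majority_workers, minority_workers).
    """
    if len(results) != 2 * m - 1:
        raise ValueError("expected results from exactly 2m - 1 workers")
    counts = Counter(results.values())
    accepted, votes = counts.most_common(1)[0]
    if votes < m:
        # Assumption: without m matching results nothing is accepted.
        return None, set(), set(results)
    majority = {w for w, r in results.items() if r == accepted}
    minority = set(results) - majority
    return accepted, majority, minority

# Two colluders (w1, w2) outvote one honest worker (w3) for m = 2.
accepted, majority, minority = majority_vote({"w1": 7, "w2": 7, "w3": 42}, m=2)
```

In the usage line the incorrect result 7 wins the vote, which is exactly the collusion attack described above.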
Attacker Models

Unconditional Collusion (UC)
In every vote in which an unconditional colluder is involved, he returns the same incorrect result as the other colluders in this vote.

Conditional Collusion (CC)
- Conditional colluders collude only if they know that at least m workers in the vote are colluders (⇒ they will win the vote).
- Otherwise they return the correct result.
- CC requires that colluders know m ⇒ this argues for varying m.

Assumption
We assume that less than 50% of the workers in the population are colluders.
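The difference between the two attacker models can be made concrete with a small decision function. It is a sketch under assumptions: the function name, its parameters, and the idea that each colluder knows the full set of colluders are illustrative choices, not taken from the slides.

```python
def colluder_response(model, vote_members, colluders, m, correct, incorrect):
    """Result that a colluding worker returns in one vote.

    model: "UC" (unconditional) or "CC" (conditional).
    vote_members: IDs of the 2m - 1 workers taking part in this vote.
    colluders: IDs of all colluding workers (assumed known to colluders).
    """
    if model == "UC":
        # Unconditional colluders always return the agreed incorrect result.
        return incorrect
    if model == "CC":
        n_colluders = len(set(vote_members) & set(colluders))
        # Conditional colluders cheat only if they are sure to win the vote.
        return incorrect if n_colluders >= m else correct
    raise ValueError(f"unknown attacker model: {model}")
```

With m = 2 and a vote among three workers, a conditional colluder who is the only colluder in the vote returns the correct result and therefore looks honest in that vote.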
Basic Idea
- Observe a number of voting outcomes.
- Count for each pair of workers how often:
  - they are together in the majority/minority of a vote (α)
  - they are in opposite groups (β)
- Estimate for each pair of workers the correlation:
  sample correlation := (α + 1) / (α + β + 2)
  - probability of being in the same group (maj./min.) of a vote
  - measure of similarity
- Cluster workers by sample correlation.
- Honest workers correlate strongly and form a big cluster.
- Workers in smaller clusters are suspects.
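The counting of α and β and the sample correlation can be sketched as follows. The helper names and the dictionary keyed by sorted worker pairs are assumptions made for this illustration; only the formula (α + 1) / (α + β + 2) comes from the slide.

```python
from collections import defaultdict
from itertools import combinations

def update_counts(counts, majority, minority):
    """Update the (alpha, beta) counters with the outcome of one vote."""
    workers = sorted(majority | minority)
    for a, b in combinations(workers, 2):
        same_group = (a in majority) == (b in majority)
        alpha, beta = counts[(a, b)]
        counts[(a, b)] = (alpha + 1, beta) if same_group else (alpha, beta + 1)

def sample_correlation(alpha, beta):
    """Sample correlation as defined on the slide."""
    return (alpha + 1) / (alpha + beta + 2)

counts = defaultdict(lambda: (0, 0))
update_counts(counts, majority={"w1", "w2"}, minority={"w3"})
update_counts(counts, majority={"w1", "w2"}, minority={"w4"})

same = sample_correlation(*counts[("w1", "w2")])      # alpha = 2, beta = 0
opposite = sample_correlation(*counts[("w1", "w3")])  # alpha = 0, beta = 1
```

A pair that was never observed together gets the value 0.5, so the estimate starts neutral and moves towards 0 or 1 as observations accumulate.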
A First Impression
A correlation matrix contains the estimated correlations of each pair of workers. Shown for UC after a certain number of votes:
[Figure: correlation matrix heat map; axes: worker i (0 to 100) and worker j (0 to 100); color scale: correlation (0 to 1)]
Collusion Detection Algorithm

Parameters
- a set N of worker IDs,
- a data structure C that contains the αs and βs (how often a pair of workers was together in maj./min., and how often not),
- a parameter P for the clustering algorithm.

procedure DetectCollusion(N, C, P)
    M ← ComputeCorrelationMatrix(C)
    G ← ConstructGraph(M)
    {C1, ..., Ck} ← cluster(G, P)
    Cmax ← max(C1, ..., Ck)        ▷ Select largest cluster
    S ← N \ Cmax                   ▷ Take all but largest cluster
    return S                       ▷ Return IDs of suspects
end procedure
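A runnable sketch of the procedure is given below. Note the substitution: instead of MinCTC or MCL, the clustering step here is a plain connected-components search on a thresholded graph, chosen only to keep the example short. The `threshold` parameter plays the role of P, and all names are illustrative assumptions.

```python
def detect_collusion(workers, counts, threshold=0.5):
    """Return the set of suspect worker IDs.

    workers: iterable of worker IDs (N).
    counts: dict mapping a pair (a, b) to (alpha, beta) (C).
    threshold: stand-in for the clustering parameter P (assumption).
    """
    workers = list(workers)
    if not workers:
        return set()

    def corr(a, b):
        alpha, beta = counts.get((a, b)) or counts.get((b, a)) or (0, 0)
        return (alpha + 1) / (alpha + beta + 2)

    # Graph: connect two workers if their correlation exceeds the threshold.
    neighbours = {w: set() for w in workers}
    for i, a in enumerate(workers):
        for b in workers[i + 1:]:
            if corr(a, b) > threshold:
                neighbours[a].add(b)
                neighbours[b].add(a)

    # Stand-in clustering: connected components via depth-first search.
    clusters, seen = [], set()
    for w in workers:
        if w in seen:
            continue
        component, stack = set(), [w]
        while stack:
            v = stack.pop()
            if v not in component:
                component.add(v)
                stack.extend(neighbours[v] - component)
        seen |= component
        clusters.append(component)

    largest = max(clusters, key=len)   # assumed to hold the honest workers
    return set(workers) - largest      # everyone else is a suspect

example_counts = {
    ("w1", "w2"): (5, 0), ("w1", "w3"): (4, 0), ("w2", "w3"): (3, 0),
    ("w4", "w5"): (4, 0), ("w1", "w4"): (0, 3), ("w2", "w5"): (0, 2),
}
suspects = detect_collusion(["w1", "w2", "w3", "w4", "w5"], example_counts)
```

A simple threshold is far cruder than the graph clustering algorithms used in the talk; it only illustrates the data flow from counts to suspects.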
Collusion Detection Illustrated
[Figure: graph of the workers w1, ..., w6, grouped into suspect workers and honest workers]
1. Outsource work units to workers w1, ..., w6; count α and β.
2. Compute the correlation: find high and low correlation.
3. Cluster the graph.
4. Workers in the largest cluster are assumed to be honest.
5. Return all other workers as suspects.
Theoretical Analysis
Does correlation really separate honest workers from colluders?
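Theoretical Correlation (details in the paper)

With △ and ▽ denoting the two groups of a vote (majority/minority) and a, b two workers:

\[
p_c := p(\{a,b\} \subseteq \triangle \,\lor\, \{a,b\} \subseteq \triangledown)
     = p(\{a,b\} \subseteq \triangle) + p(\{a,b\} \subseteq \triangledown)
\]

\[
p(\{a,b\} \subseteq \triangle) = p_f^2 + 2\,p_f(1-p_f)\,P(1) + (1-p_f)^2\,P(2) + 2\,p_f(1-p_f)\,p_i \cdots
\]

(the remaining factor of the last term involves 2m − 3 and m − 1; see the paper)

\[
P(k) := \sum_{i=k}^{2m-3} \binom{2m-3}{i}\, p_{mal}^{\,i} \cdot \sum_{j=0}^{\min(2m-3-i,\; i-k)} \binom{2m-3-i}{j}\, p_h^{\,j}\, p_i^{\,2m-3-i-j}
\]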
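The double sum P(k) above can be evaluated directly. The sketch below assumes the reading P(k) = Σ_{i=k}^{2m−3} C(2m−3, i) p_mal^i · Σ_{j=0}^{min(2m−3−i, i−k)} C(2m−3−i, j) p_h^j p_i^{2m−3−i−j}. The slide does not define p_mal, p_h and p_i, so they are treated here simply as three given probabilities; starting the outer sum at max(k, 0) is a further assumption that keeps the binomial coefficients defined.

```python
from math import comb

def P(k, m, p_mal, p_h, p_i):
    """Evaluate the double sum P(k) for redundancy parameter m.

    n = 2m - 3 is the number of workers in the vote besides a and b.
    """
    n = 2 * m - 3
    total = 0.0
    for i in range(max(k, 0), n + 1):
        inner = 0.0
        for j in range(0, min(n - i, i - k) + 1):
            inner += comb(n - i, j) * p_h ** j * p_i ** (n - i - j)
        total += comb(n, i) * p_mal ** i * inner
    return total

value = P(1, m=3, p_mal=0.1, p_h=0.8, p_i=0.1)
```

A quick plausibility check: for k = −(2m − 3) the constraint on j disappears, the expression becomes a full multinomial expansion, and the sum equals 1 whenever the three probabilities add up to 1.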
Theoretical Analysis (UC)
Correlation in the presence of UC:
[Figure: correlation (0 to 1) versus fraction of colluders (0 to 0.5); curves: both colluders; both honest; 1 colluder, 1 honest]
Theoretical Analysis (CC)
Correlation in the presence of CC:
[Figure: correlation (0 to 1) versus fraction of colluders (0 to 0.5); curves: both colluders; both honest; 1 colluder, 1 honest]
Implementation

Main part
Implementation is straightforward.

Graph clustering
Used 2 different graph clustering algorithms:
- MinCTC [3]
  - based on minimum cut trees for graphs.
  - one parameter, automatically determined (see paper).
  - preprocessed correlation values with varying exponent θ.
- MCL [4]
  - based on simulation of stochastic flows in graphs.
  - one parameter (I).
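The preprocessing with exponent θ can be read as element-wise exponentiation of the correlation matrix (a later slide mentions element-wise exponentiation with θ = 20). The sketch below shows its effect on a toy matrix; the function name `sharpen` and the example values are made up for illustration.

```python
def sharpen(matrix, theta):
    """Raise every correlation value to the power theta (element-wise)."""
    return [[value ** theta for value in row] for row in matrix]

# Toy correlation matrix: workers 0 and 1 correlate strongly,
# worker 2 is at the neutral value 0.5 with both of them.
M = [
    [1.0, 0.9, 0.5],
    [0.9, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
S = sharpen(M, theta=20)
strong, weak = S[0][1], S[0][2]
```

Because all values lie in [0, 1], a large exponent pushes weak correlations towards 0 much faster than strong ones, which widens the gap the clustering algorithm has to find.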
Simulation
Simulations are run for:
- Random selection of workers.
- Average accuracy is determined over 1000 runs.
- Fraction of 0.1 colluders (UC or CC).
- Vary:
  - cluster parameters
  - average number of observed votes for a pair of workers
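A much reduced version of such a simulation, for UC only, might look like the sketch below. It is not the authors' simulator: the population size, the number of votes, the seed, and the simplification that honest workers never return a wrong result are all assumptions made for the example.

```python
import random
from itertools import combinations

def simulate_uc(n_workers=20, colluder_fraction=0.1, m=2, n_votes=500, seed=0):
    """Simulate votes with unconditional colluders; return (colluders, corr)."""
    rng = random.Random(seed)
    workers = list(range(n_workers))
    colluders = set(workers[: int(n_workers * colluder_fraction)])
    counts = {}
    for _ in range(n_votes):
        voters = rng.sample(workers, 2 * m - 1)   # random selection of workers
        # True = correct result; UC colluders always return the incorrect one.
        results = {w: w not in colluders for w in voters}
        for a, b in combinations(sorted(voters), 2):
            alpha, beta = counts.get((a, b), (0, 0))
            if results[a] == results[b]:          # same group of the vote
                counts[(a, b)] = (alpha + 1, beta)
            else:                                 # opposite groups
                counts[(a, b)] = (alpha, beta + 1)
    corr = {pair: (al + 1) / (al + be + 2) for pair, (al, be) in counts.items()}
    return colluders, corr

colluders, corr = simulate_uc()
mixed = [c for (a, b), c in corr.items() if (a in colluders) != (b in colluders)]
same_kind = [c for (a, b), c in corr.items() if (a in colluders) == (b in colluders)]
```

Under these simplifications every observed mixed pair ends up below 0.5 and every observed pair of the same kind above it; with conditional colluders or imperfect honest workers the separation is less clean.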
Accuracy Measure

Output of our algorithm:

                        output is “suspect”     output is “not suspect”
  worker is malicious   true positive (tp)      false negative (fn)
  worker is honest      false positive (fp)     true negative (tn)

F-Measure
Combines precision and recall; it is computed from tp, fp and fn (tn does not enter the formula):

\[
\text{F-Measure} := \frac{2 \cdot \#tp}{2 \cdot \#tp + \#fp + \#fn}
\]

Examples:
- F-Measure = 1 iff no fp and no fn
- F-Measure = 0 iff no tp but #fp + #fn > 0
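The F-Measure of one detection run can be computed from the set of reported suspects and the set of actual colluders. The function below is a small sketch; returning `None` when there are neither suspects nor colluders (the 0/0 case) is an assumption, since the slide does not cover it.

```python
def f_measure(suspects, colluders):
    """F-Measure = 2*tp / (2*tp + fp + fn) for one detection run."""
    suspects, colluders = set(suspects), set(colluders)
    tp = len(suspects & colluders)   # colluders reported as suspects
    fp = len(suspects - colluders)   # honest workers reported as suspects
    fn = len(colluders - suspects)   # colluders that were missed
    if 2 * tp + fp + fn == 0:
        return None                  # undefined: nothing to find, nothing reported
    return 2 * tp / (2 * tp + fp + fn)

perfect = f_measure({"w4", "w5"}, {"w4", "w5"})
partial = f_measure({"w4", "w1"}, {"w4", "w5"})
```

In the second call one colluder is found, one is missed and one honest worker is wrongly accused, giving 2 / (2 + 1 + 1).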
Evaluation for MinCTC and UC
[Figure: F-measure (0 to 1) as a function of θ (0 to 20) and #obs./edge (0 to 14)]
Evaluation for MCL and UC
[Figure: F-measure (0 to 1) as a function of I (0 to 30) and #obs./edge (0 to 14)]
Evaluation for MCL and CC
[Figure: F-measure (0 to 1) as a function of I (0 to 30) and #obs./edge (0 to 60)]
(element-wise exponentiation with θ = 20)
Runtime Comparison
[Figure: runtime in seconds (0 to 90) versus number of hosts (200 to 1000), for MinCTC and MCL]
Both algorithms are (theoretically) in O(n^3).
Related Work
Silaghi et al. [2] proposed a collusion detection mechanism:
- Counts how many votes go against a node.
- Needs fewer observations.
- Again uses redundancy to detect CC attackers.
Summary & Conclusions

Problem: Collusion in Redundancy with Majority Voting.

Approach:
1. Measure the similarity of workers in past votings.
2. Cluster the workers.
3. Workers not in the main cluster are suspects.

Accuracy:
- Possible to detect UC and CC attackers.
- Many observations are needed.
- Helpful: Vary redundancy parameter m to avoid CC attackers.

Performance:
- Most costly: Graph clustering (in O(n^3)).
Future Work
- Reduction of needed observations.
  [Figure: correlation matrix heat map; axes: worker i (0 to 100) and worker j (0 to 100); color scale: correlation (0 to 1)]
- Use pattern recognition techniques for detection?
- Complement the mechanism with a spot-checking component.
- Idea: Use m = 2 and sometimes m = 3. Makes CC harder. Overhead?
References
[1] David P. Anderson, Jeff Cobb, Eric Korpela, Matt Lebofsky, and Dan Werthimer. SETI@home: an experiment in public-resource computing. Commun. ACM, 45(11):56–61, 2002.
[2] Gheorghe Cosmin Silaghi, Filipe Araujo, Luis Moura Silva, Patricio Domingues, and Alvaro Arenas. Defeating colluding nodes in desktop grid computing platforms. In Proc. of the 2nd Workshop on Desktop Grids and Volunteer Computing (PCGrid ’08). IEEE Computer Society, 2008.
[3] Gary William Flake, Robert Endre Tarjan, and Kostas Tsioutsiouliklis. Graph clustering and minimum cut trees. Internet Mathematics, 1(4), 2004.
[4] Stijn van Dongen. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, 2000.
Thanks for your attention! Questions?
This presentation can be found on my website: http://wiki.uni.lu/secan-lab/Eugen+Staab.html (→ simply google “Eugen Staab”).