UNIQUE IDENTIFICATION OF RADIO FREQUENCY IDENTIFICATION TAGS

Corey Miller
Advisor: Dr. Mark Hinders
Department of Applied Science, Nondestructive Evaluation Laboratory
The College of William and Mary

Abstract

Radio Frequency Identification (RFID) tags are used in credit cards and passports for automatic identity recognition and expense transfers as well as throughout the supply chain to track inventory. The unauthorized electronic reproduction of these RFID signals is easily performed despite the use of developed encryption methods and can lead to critical security breaches and lost inventory. A method for determining spoofed RFID tags is presented based on fingerprinting unintentional modulations in the electromagnetic signal of RF emitters. Improvements to existing supervised pattern classification techniques are presented, utilizing the Dynamic Wavelet Fingerprint (DWFP) technique for feature extraction.

Introduction

Radio frequency identification (RFID) tags are widespread throughout the modern world, from tracking supplies to allowing remote building access through secure ID badges. RFID tags contain an antenna used to receive and transmit the RF signals, and an integrated circuit used to process and modulate/demodulate signals. Passive RFID tags harness the energy required for signal manipulation from the RF waves themselves, while active RFID tags use an onboard power source. Since they don't contain a power source, passive tags can be made extremely thin, to the point where conductive inks can be used to literally print the antenna design as needed. This continuous decrease in the price of technology has led the way for RFID tags to become a cost-saving replacement for barcodes. Instead of relaying a sequence of numbers known as a Universal Product Code (UPC) that identifies only the type of object a barcode is attached to, RFID tags use an Electronic Product Code (EPC) which contains not only information about the type of object, but also a unique serial number used to distinguish the object individually. RFID tags also eliminate the need for line-of-sight scanning that barcodes have, avoiding scanning orientation requirements.

While RFID technology is the primary focus of our research, another similar form of short-range wireless communication technology is known as near field communication (NFC). Compatible with already existing RFID infrastructures, NFC involves an initiator that generates an RF field and a passive target, although interactions between two powered devices are possible. The smartphone industry is one of the leading areas for NFC research, as many manufacturers have already begun adding NFC technology to their products. With applications enabling users to pay for such items as groceries and subway tickets by waving their phone in front of a machine, NFC payment systems are an attractive alternative to the multitude of credit cards available today. Air France is testing a new Pass and Fly boarding program to evaluate NFC-based boarding passes on specific domestic flight routes [1] [2]. Using NFC-enabled mobile phones, passengers have the option to swipe their phones at a Pass and Fly reader where the machine identifies the traveler and uploads a digital boarding pass onto their phone. At the subsequent security inspection point, the traveler simply swipes their phone across another NFC reader which displays the boarding pass to security officers. From there, passengers need only to verify identification with the airline staff as a third NFC reader checks the boarding pass and prints the passenger's seat information. This simplified process involving instant passenger recognition and paperless boarding passes promises a time-saving and efficient airport experience for travelers.

With both RFID tracking and NFC applications, security is an important component of wireless communications. Since RFID technology presents unique identification of tags, problems clearly arise when two tags contain the same ID information. Simple eavesdropping on an RFID tag communication can provide the pair of challenge/response values required to crack the security built into RFID technology, opening the door for a cloned signal to begin imitating the original. A team of researchers from Johns Hopkins University and RSA Laboratories successfully cloned and simulated an ignition key for their own car, and did the same for their gasoline needs by cloning their own SpeedPass™ tokens [3]. Boeing uses RFID labels applied during the manufacturing stage to track life-limited parts on their 787 Dreamliners to better manage part maintenance and repair history [3]. It is necessary for these tracking tags to be verified as legitimate; counterfeit parts with cloned RFID tags could result in any number of serious failures. Most proposed solutions to the security issues in RFID technology involve stronger encryption or restricting physical access to the tags. These solutions are costly, however, and still leave the cheaper devices unprotected.

Our research is aimed at resolving the difficulty in distinguishing between cloned RFID signals by applying pattern classification algorithms to uniquely identify individual RFID tags by their unintentional variations in signal, usually arising from the manufacturing process and/or the tag-reading process. RFID readers sold today perform all of the signal amplification, modulation/demodulation, mixing, etc. in special-purpose hardware. While this is beneficial for standard RFID use where only the demodulated EPC is of interest, it is inadequate for our research because we seek to extract the raw EPC signal. For this reason, a vector signal analyzer (VSA) was used in the past to record the incoming RF signal from the antenna [4]. Rather than continuing to use an expensive vector signal analyzer to collect data, we are instead interested in a cheaper, more controlled software-defined radio (SDR) system. SDR systems are beneficial over standard RFID units as they contain their own A/D converters and the majority of their signal processing is software controlled, allowing them to transmit and receive a wide variety of radio protocols based solely on the software used. The SDR system we used is from the Universal Software Radio Peripheral (USRP) family of products developed by Ettus Research LLC, specifically the USRP2. With board schematics and open source drivers widely available, the flexibility of the USRP system proved to be a perfect, simple solution as our RF interface.

Data Collection

The data used in this analysis was collected from Avery-Dennison AD-612 RFID Inlays configured to fit protocol standards for EPC Class 1 Gen 2. There were 25 individual tags, all with the same code written onto them with a Thing Magic Mercury 5e RFID Reader. The tags were read with an omnidirectional antenna (Laird Technologies) through both an Ettus Research USRP2 software radio system with a Flex 900 daughterboard as well as a vector signal analyzer operating at a 3.2 MHz sampling frequency. Only one tag was read at a time, with the rest placed in a shielded box to reduce the risk of transmission collisions. The USRP2 data consists of 3-6 seconds of data, with the VSA recording a 327 ms section from within that time window. Each tag was recorded at all three frequencies (902 MHz, 915 MHz, and 928 MHz) in each of the parallel, oblique, and upside-down orientations, simulating real-world factors that arise when recording RFID signals.

Figure 1: The experimental setup used to read the RFID tags is shown, with the tag reader, antenna, USRP2, and tag displayed.

Data Analysis

In its active state, the RFID reader sends repeated queries out, searching for the presence of an RFID tag. When in range, a tag will respond to these queries with 16 random numbers. Once acknowledged by the reader, the rest of the EPC code is then sent to the reader. This process is repeated as long as the tag remains within the read/write range while the reader is active. Because this process is repeated continuously, a recording of this tag-to-reader communication will contain many repetitions of the EPC code along with multiple reader queries and communication timeouts.

Figure 2: Features of the RFID signal show the tag-to-reader communication.

An algorithm is therefore required to extract the EPC from the whole recorded signal. Since the data collects are ideally very short to save both processing time and storage space, being able to extract every single EPC from the signal is important to obtain as much data as possible. This process is done through a combination of mean/variance cross-correlation with a manually extracted query region, and amplitude/signal windowing routines. The resulting complex-valued signals are then broken down into their modulus, instantaneous frequency, and phase values using the following formulae [5]:

\[
\begin{aligned}
s(t) &= r(t) + i\,c(t) \\
\alpha(t) &= \sqrt{r^2(t) + c^2(t)} \\
\theta(t) &= \tan^{-1}\!\left(\frac{c(t)}{r(t)}\right) \\
f_i(t) &= \frac{1}{2\pi}\,\frac{d}{dt}\,\theta_h(t)
\end{aligned}
\tag{1}
\]

where θh(t) is θ(t) unwrapped whenever the phase passes through multiples of 2π. This type of reduction is referred to as EPC compression. Figure 3 compares the different EPC compression results on a complex signal. For the results presented in this report, the modulus was the only compression method used.

Figure 3: Different EPC compression techniques. From the top down: the real and imaginary parts of the raw signal, the amplitude, phase, and instantaneous frequency of the raw EPC.

We then generate features from these EPCs using three different methods: Dynamic Wavelet Fingerprinting (DWFP), Wavelet Packet Decomposition (WPD), and statistical methods. The DWFP method performs a stationary wavelet transform on the data followed by slicing and projecting the resulting wavelet coefficients onto the time-scale plane, resulting in a binary fingerprint image. This process is summarized in Figure 4. From these binary images, properties are collected using image processing routines and used as our feature set. Because more features are generated than can be of use, a Euclidean distance metric is applied to the DWFP feature set to indicate the most highly-separable interclass distances and the times at which this distance is greatest.

Wavelet Packet Decomposition requires the application of a Wavelet Packet Transform (WPT), which results in a tree of coefficients. The normalized energies for the two classes being compared are inserted into a matrix, and singular value decomposition returns the singular values with the highest energy. The WPT elements corresponding to those singular values are used as features.

The statistical calculations used as features consist of: the mean of the EPC, the maximum cross-correlation of the EPC signal with another EPC from the same tag, the variance of the EPC, the Shannon entropy, the second central moment, the skewness, and the kurtosis. Further details on these calculations can be found in the MATLAB help guide. In the features extracted to identify a tag, the actual classes will be ωj = 1 whenever the EPC belongs to the correct tag and ωj = −1 whenever it does not.
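The EPC compression of Eq. 1 and a few of the statistical features listed above can be sketched in Python with NumPy. This is only an illustrative sketch, not the routine used in this work: the function names `compress_epc` and `statistical_features` are hypothetical, the four-quadrant `arctan2` stands in for the inverse tangent, and the histogram-based Shannon entropy (with an arbitrary bin count) is an assumption, since the MATLAB routine referenced in the text may define entropy differently.

```python
import numpy as np


def compress_epc(s, fs):
    """Compress a complex-valued EPC per Eq. 1 (hypothetical helper).

    s  : 1-D complex array, s(t) = r(t) + i c(t)
    fs : sampling frequency in Hz
    Returns the modulus, the wrapped phase, and the instantaneous frequency.
    """
    s = np.asarray(s, dtype=complex)
    r, c = s.real, s.imag
    alpha = np.sqrt(r**2 + c**2)          # modulus
    theta = np.arctan2(c, r)              # wrapped phase (four-quadrant arctangent)
    theta_h = np.unwrap(theta)            # remove jumps at multiples of 2*pi
    f_i = np.gradient(theta_h) * fs / (2.0 * np.pi)  # (1 / 2pi) d(theta_h)/dt in Hz
    return alpha, theta, f_i


def statistical_features(x, other=None, bins=32):
    """Summary statistics of a real-valued (compressed) EPC.

    The entropy is estimated from a histogram; that choice is an assumption.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    m2 = np.mean((x - mean) ** 2)         # second central moment
    m3 = np.mean((x - mean) ** 3)
    m4 = np.mean((x - mean) ** 4)
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    feats = {
        "mean": mean,
        "variance": x.var(ddof=1),        # sample variance (N - 1 normalization)
        "second_central_moment": m2,
        "skewness": m3 / m2**1.5,
        "kurtosis": m4 / m2**2,
        "shannon_entropy": float(-np.sum(p * np.log2(p))),
    }
    if other is not None:
        # maximum cross-correlation with another EPC from the same tag
        xc = np.correlate(x, np.asarray(other, dtype=float), mode="full")
        feats["max_xcorr"] = float(xc.max())
    return feats
```

For example, `alpha, theta, f_i = compress_epc(epc, 3.2e6)` followed by `statistical_features(alpha)` would produce modulus-based features for one EPC, matching the choice in this report of using only the modulus.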

Figure 4: The DWFP technique [6] begins with a) the ultrasonic signal, where it generates b) wavelet coefficients indexed by time and scale, where scale is related to frequency. Then c) the coefficients are sliced and projected onto the time-scale plane (d). The final binary image is used to select features for the pattern classification algorithm.

Because there are as many as 24 tags that are not the same as the classifier tag, issues in class imbalance come into play. The level of imbalance affects the results depending on the complexity of the system the features were drawn from, but even a small imbalance can have a large effect on the results [7] [8]. An undersampling method, where ωj = −1 is sampled to a smaller size relative to ωj = 1, is employed to help correct the sampling imbalance. This aspect of classifier design is incorporated in the RFID classification algorithm and is represented by two variables, η and ρ, defined by Eqns 2 and 3:

\[
\eta = \frac{|(\tau_j = \tau_S)\,\&\,(j \in T)|}{|\tau_j = \tau_L|}
\tag{2}
\]

\[
\rho = \frac{|\omega_j = -1|}{|\omega_j = 1|}, \quad j \in R
\tag{3}
\]

Here, L and S are subsets of {1, ..., N} that indicate indices of xj,k corresponding with the classifier tag (τj = τL) and the testing tag (τj = τS). Similarly, R and T represent subsets of {1, ..., N} corresponding to the training set (R) and testing set (T). Therefore, η represents the fraction of EPCs from the testing tag that were withheld for T, with the rest inserted into R, so that 0 < η ≤ 1; and ρ represents the ratio of negative to positive EPCs in R, so that 0 < ρ < (|ωj = −1|/|ωj = 1|) and ρ ∈ Z+. Effectively, η is the variable that controls the sampling method, while ρ controls the amount of undersampling.

Once the feature set has been developed and organized for the EPCs, it then needs to be run through the classifier. The first step in that process is to split the data roughly in half using the hold-out method into training and testing data sets. Then a classifier is used to map the feature matrices to their predicted labels. Several classifiers are used in this study, including quadratic discriminant classifier (QDC), linear discriminant classifier (LDC), k-nearest-neighbor (kNN), and support vector machines (SVM).

Classifier Evaluation

A confusion matrix, L(i, j), can be generated once the data has been classified that represents the number of EPCs from the testing tag that get classified as the classifier tag, where τi is the classifier tag and τj is the testing tag. For the holdout method, the value of the confusion matrix is the proportion of y = 1 labels for the EPCs of the testing tag in the testing set, or

\[
L(i, j) = \frac{|(x_j \in T)\,\&\,(y_j = 1)|}{|x_j \in T|}
\tag{4}
\]

An example can be seen in Figure 5, where the value of L has been matched to a greyscale color intensity, so 0 → black and 1 → white. L approaches 1 whenever τL = τS, and the lowest values of L occur for τL ≠ τS, meaning the testing tags are more likely to be classified as the classifier tag whenever the testing tag is the same as the classifier tag. The confusion matrix provides the percent of EPCs from the testing tag that were classified as originating from the classifier tag. In order to evaluate the performance of the classifier, a threshold h is applied so that the false positive (f+), false negative (f−), true positive (t+), and true negative (t−) rates are given by Eqn 5:

\[
\begin{aligned}
f_+ &= |L(i, j) > h|, \quad i \neq j \\
t_+ &= |L(i, j) > h|, \quad i = j \\
f_- &= |L(i, j) \leq h|, \quad i = j \\
t_- &= |L(i, j) \leq h|, \quad i \neq j
\end{aligned}
\tag{5}
\]
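As a rough illustration of Eqns 4 and 5, the sketch below builds L(i, j) from the predicted labels on the testing set and counts the four outcomes at a given threshold h. The function names and the array layout (one row of ±1 predictions per classifier tag, one column per testing EPC) are assumptions made for this example only; it also assumes every tag has at least one EPC in the testing set.

```python
import numpy as np


def confusion_matrix(y_pred, test_tag, n_tags):
    """L[i, j]: fraction of testing-set EPCs from tag j labelled +1
    by the binary classifier built for tag i (Eq. 4).

    y_pred   : array of shape (n_tags, n_test_epcs) with entries +1 or -1
    test_tag : array of shape (n_test_epcs,) giving the tag index of each EPC
    """
    y_pred = np.asarray(y_pred)
    test_tag = np.asarray(test_tag)
    L = np.zeros((n_tags, n_tags))
    for j in range(n_tags):
        from_tag_j = test_tag == j        # testing EPCs that belong to tag j
        L[:, j] = (y_pred[:, from_tag_j] == 1).mean(axis=1)
    return L


def threshold_counts(L, h):
    """Count false/true positives and negatives at threshold h (Eq. 5)."""
    same_tag = np.eye(L.shape[0], dtype=bool)   # entries with i == j
    above = L > h
    f_pos = int(np.count_nonzero(above & ~same_tag))
    t_pos = int(np.count_nonzero(above & same_tag))
    f_neg = int(np.count_nonzero(~above & same_tag))
    t_neg = int(np.count_nonzero(~above & ~same_tag))
    return f_pos, t_pos, f_neg, t_neg
```

Calling `threshold_counts` over a range of h values gives the counts as a function of threshold, so the trade-off between false positives and false negatives can be examined for any one classifier configuration.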

As these values are a function of threshold, a useful summary for discrimination changes is the receiver operating characteristic (ROC) curve [9]. A comprehensive measurement of this curve is the area under the ROC curve (AUC), and usually classifiers with an ROC curve leading to a larger AUC value perform better than those with a lower AUC. Therefore, AUC is used to narrow the results of all the available classifier combinations to choose a few of the best ones. The summary statistics f+, f−, t+, t− still remain as useful measures of the classifier performance for a given threshold h, so another metric used is min(f+ + f−) over the decision threshold, and the minimum f+ and f− statistics will be presented as a percentage of the total number of combinations of the binary classifier. The percentage of false positives (f+ [%]) and false negatives (f− [%]) at that decision threshold are divided by the possible number of false positives and false negatives respectively. The decision threshold at which the minimum rate occurred will also be given.

Figure 5: An example of a confusion matrix is shown as a grayscale image for the AD tags. The color intensity relates to the percent of EPCs from the testing tag τS that were identified as coming from the classifier tag, τL.

Results

With so many variables in the configuration of a single classifier, the dimensionality of the results space is large. With the main application for this classifier being security, the best results will have the fewest false positives as well as the best accuracy. As there are often many classifiers that meet the minimum number of false positives, which is zero, the results are narrowed by additional criteria.

The comparison of interest is the classifier performance between the VSA data set and the USRP2 data set. If the pattern classifiers can accurately classify the tags with the USRP2 data, it is a reasonable conclusion that no critical information is lost by using the USRP2 to collect data as compared to the VSA. Data taken in the parallel orientation (RFID tag lined up parallel to the RFID antenna) at a frequency of 902 MHz with both the VSA and the USRP2 were run through the pattern classifiers, the results of which can be seen in Table 1 and Table 2. These results are presented from the most rigorous test of the classifier, where η = 1, meaning that the classifier is not trained on the EPCs from the tag being tested.

While the VSA results contain errors in classification, this is due in part to the size of the data set collected with the VSA. A collection time of 327 ms resulted in as few as 3 EPCs for a given tag, so when the data sets were split into training and testing sets, only one or two EPCs were included in each set. In an effort to decrease these sampling size effects as much as possible, every EPC collected was included twice in the classification. This allowed for better splitting of the data, but an ideal sample size should be much larger. The USRP2 data set did not have any sampling size problems, as the read time was much longer and more EPCs were collected for each tag, and it shows very promising classification results. These results show that the USRP2 is adequate for data collection.

It is helpful to examine the extreme values of η and ρ for the USRP2 data to examine the robustness of the classifier. Not only is the restriction of η = 1 applied, but a second restriction of ρ = all is applied to our classifier. This second restriction on ρ means that all the EPCs from all the tags are included in the training set except those withheld for testing. These results can be seen in Table 3. Again, very few errors were made, although with a slightly lower |AUC| measurement than in Table 2.

Conclusions and Future Work

A classification routine that can identify whether or not RFID tag A is the same as RFID tag B, despite the fact that tag A and tag B present the same EPC information, has been presented and tested on two unique data sets collected at the same time but with different methods. This classification routine has some limitations, namely, it cannot determine the identification of tag A, only whether or not it is the same as tag B, which is better in some applications (such as ID badges) than others. It has been verified that both an expensive, sophisticated vector signal analyzer and an affordable, user-defined universal software radio can pick up the unintentional qualities of the EPCs being recorded that are required for this classifier.

Future work on this project includes further testing to assess the robustness of the pattern classifiers. Since data was recorded at multiple frequencies and various orientations, an analysis including a frequency and orientation comparison/substitution can be run. For example, the pattern classifier can be trained on data collected at 902 MHz, but tested on data collected at 915 MHz. This type of substitution is important in real-world situations because the EPC Class 1 Gen 2 standards allow a range of 902-928 MHz to be used, so the classifier needs to be able to handle input from all RFID readers within this range. The same thing can be done with the three orientations recorded for each tag. In addition to orientation variations and frequency fluctuations, more real-world factors can be introduced to the system, such as temperature and humidity fluctuations and accelerated aging of the tags themselves. These ideas have potential to improve the robustness of the classifiers, and therefore improve the solution to RFID cloning.

Acknowledgments

We thank Dr. Crystal Bertoncini and Bryan Nousain for their help in learning these pattern classification routines and methods and for general RFID information. This work was performed in part using computational facilities at the College of William and Mary which were provided with the assistance of the National Science Foundation, the Virginia Port Authority, Sun Microsystems, and Virginia's Commonwealth Technology Research Fund.

References

[1] Sarah Clark. Air France tests NFC boarding passes at Nice airport. Press Release, 2009.

[2] Jo Best. Airline gets first NFC boarding. Press Release, 2009.

[3] S Bono and M Green. Security analysis of a cryptographically-enabled RFID device. In P McDaniel, editor, 14th USENIX Security Symposium, pages 1-16, 2005.

[4] Crystal Bertoncini. Applications of Pattern Classification to Time-Domain Signals. PhD thesis, The College of William and Mary, 2010.

[5] K J Ellis and N Serinken. Characteristics of radio transmitter fingerprints. Radio Science, 36(4):585-597, 2001.

[6] J Hou and M K Hinders. Dynamic wavelet fingerprint identification of ultrasound. Materials Evaluation, 60(9):1089-1093, 2002.

[7] Nathalie Japkowicz and Shaju Stephen. The class imbalance problem: a systematic study. Intelligent Data Analysis, 6(5):429-449, 2002.

[8] Gary M Weiss and Foster Provost. Effect of class distribution on classifier learning: An empirical study. Technical Report ML-TR-44, Rutgers University Department of Computer Science, August 2001.

[9] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27:861-874, 2006.

Table 1: The holdout classification results for the VSA 902 MHz data set, where all the EPCs from the testing tag were withheld from the training set (η = 1).

Classifier Configuration                            Results [%]
EPC          #DWFP
Compression  Features  Classifier  η   ρ    min(f+ + f−)  f+     f−     h     |AUC|
α            75        SVM         1   9    0.138         0.072  1.724  80.1  0.9954
α            100       SVM         1   7    0.138         0.072  1.724  80.1  0.9931
α            75        SVM         1   6    0.138         0.072  1.724  85.8  0.9930
α            100       SVM         1   11   0.276         0.072  5.172  85.8  0.9923
α            100       SVM         1   4    0.207         0.144  1.724  83.4  0.9915
α            100       SVM         1   18   0.276         0.000  6.896  85.8  0.9914

Table 2: The holdout classification results for the USRP2 902 MHz data set, where all the EPCs from the testing tag were withheld from the training set (η = 1).

Classifier Configuration                               Results [%]
EPC          #DWFP
Compression  Features  Classifier     η   ρ    min(f+ + f−)  f+     f−     h     |AUC|
α            100       QDC (PRTools)  1   8    0.000         0.000  0.000  92.5  1.0000
α            100       QDC (PRTools)  1   9    0.000         0.000  0.000  92.5  1.0000
α            100       QDC (PRTools)  1   12   0.000         0.000  0.000  91.0  1.0000
α            100       QDC (PRTools)  1   17   0.000         0.000  0.000  92.5  1.0000
α            100       QDC (PRTools)  1   6    0.000         0.000  0.000  92.5  0.9998

Table 3: The holdout classification results for the USRP2 902 MHz data set, where all the EPCs from the testing tag were withheld from the training set (η = 1) and all of the EPCs are included in the training data set (ρ = all).

Classifier Configuration                               Results [%]
EPC          #DWFP
Compression  Features  Classifier     η   ρ    min(f+ + f−)  f+     f−     h     |AUC|
α            100       QDC (PRTools)  1   all  0.000         0.000  0.000  91.0  0.9997
α            1         QDC (PRTools)  1   all  0.059         0.062  0.000  90.3  0.9986
α            1         QDC (MATLAB)   1   all  0.119         0.124  0.000  76.2  0.9982
α            5         QDC (MATLAB)   1   all  0.059         0.000  1.470  88.9  0.9981
α            5         QDC (PRTools)  1   all  0.059         0.062  0.000  90.3  0.9975
α            75        QDC (PRTools)  1   all  0.119         0.000  2.941  74.7  0.9922