UNIQUE IDENTIFICATION OF RADIO FREQUENCY IDENTIFICATION TAGS
Corey Miller
Advisor: Dr. Mark Hinders
Department of Applied Science, Nondestructive Evaluation Laboratory
The College of William and Mary
Abstract
Radio Frequency Identification (RFID) tags are used in credit cards and passports for automatic identity recognition and expense transfers, as well as throughout the supply chain to track inventory. The unauthorized electronic reproduction of these RFID signals is easily performed despite existing encryption methods and can lead to critical security breaches and lost inventory. A method for detecting spoofed RFID tags is presented, based on fingerprinting unintentional modulations in the electromagnetic signal of RF emitters. Improvements to existing supervised pattern classification techniques are presented, utilizing the Dynamic Wavelet Fingerprint (DWFP) technique for feature extraction.

Introduction

Radio frequency identification (RFID) tags are widespread throughout the modern world, from tracking supplies to allowing remote building access through secure ID badges. RFID tags contain an antenna used to receive and transmit the RF signals, and an integrated circuit used to process and modulate/demodulate signals. Passive RFID tags harness the energy required for signal manipulation from the RF waves themselves, while active RFID tags use an onboard power source. Since they don't contain a power source, passive tags can be made extremely thin, to the point where conductive inks can be used to literally print the antenna design as needed. The resulting continuous decrease in the price of the technology has led the way for RFID tags to become a cost-saving replacement for barcodes. Instead of relaying a sequence of numbers known as a Universal Product Code (UPC) that identifies only the type of object a barcode is attached to, RFID tags use an Electronic Product Code (EPC), which contains not only information about the type of object but also a unique serial number used to distinguish the object individually. RFID tags also eliminate the line-of-sight scanning that barcodes require, avoiding scanning orientation requirements.

While RFID technology is the primary focus of our research, another similar form of short-range wireless communication technology is known as near field communication (NFC). Compatible with already existing RFID infrastructures, NFC involves an initiator that generates an RF field and a passive target, although interactions between two powered devices are possible. The smartphone industry is one of the leading areas for NFC research, as many manufacturers have already begun putting NFC technology into their products. With applications enabling users to pay for such items as groceries and subway tickets by waving their phone in front of a machine, NFC payment systems are an attractive alternative to the multitude of credit cards available today. Air France is testing a new Pass and Fly boarding program to evaluate NFC-based boarding passes on specific domestic flight routes [1] [2]. Using NFC-enabled mobile phones, passengers have the option to swipe their phones at a Pass and Fly reader, where the machine identifies the traveler and uploads a digital boarding pass onto their phone. At the subsequent security inspection point, the traveler simply swipes their phone across another NFC reader, which displays the boarding pass to security officers. From there, passengers need only verify their identification with the airline staff as a third NFC reader checks the boarding pass and prints the passenger's seat information. This simplified process, involving instant passenger recognition and paperless boarding passes, promises a time-saving and efficient airport experience for travelers.

With both RFID tracking and NFC applications, security is an important component of wireless communications. Since RFID technology presents unique
Miller
1
identification of tags, problems clearly arise when two tags contain the same ID information. Simple eavesdropping on an RFID tag communication can provide the pair of challenge/response values required to crack the security built into RFID technology, opening the door for a cloned signal to begin imitating the original. A team of researchers from Johns Hopkins University and RSA Laboratories successfully cloned and simulated an ignition key for their own car, and did the same for their gasoline purchases by cloning their own SpeedPass™ tokens [3]. Boeing uses RFID labels applied during the manufacturing stage to track life-limited parts on its 787 Dreamliners to better manage part maintenance and repair history [3]. It is necessary for these tracking tags to be verified as legitimate; counterfeit parts with cloned RFID tags could result in any number of serious failures. Most proposed solutions to the security issues in RFID technology involve stronger encryption or restricting physical access to the tags. These solutions are costly, however, and still leave the cheaper devices unprotected.

Our research is aimed at resolving the difficulty of distinguishing between cloned RFID signals by applying pattern classification algorithms to uniquely identify individual RFID tags by their unintentional signal variations, which usually arise from the manufacturing process and/or the tag-reading process. RFID readers sold today perform all of the signal amplification, modulation/demodulation, mixing, etc. in special-purpose hardware. While this is beneficial for standard RFID use, where only the demodulated EPC is of interest, it is inadequate for our research because we seek to extract the raw EPC signal. For this reason, a vector signal analyzer (VSA) was used in the past to record the incoming RF signal from the antenna [4]. Rather than continuing to use an expensive vector signal analyzer to collect data, we are instead interested in a cheaper, more controllable software-defined radio (SDR) system. SDR systems have an advantage over standard RFID units because they contain their own A/D converters and the majority of their signal processing is software controlled, allowing them to transmit and receive a wide variety of radio protocols based solely on the software used. The SDR system we used is from the Universal Software Radio Peripheral (USRP) family of products developed by Ettus Research LLC, specifically the USRP2. With board schematics and open source drivers widely available, the flexibility of the USRP system made it an ideal, simple solution for our RF interface.

Data Collection

The data used in this analysis was collected from Avery-Dennison AD-612 RFID inlays configured to fit the protocol standards for EPC Class 1 Gen 2. There were 25 individual tags, all with the same code written onto them with a ThingMagic Mercury 5e RFID reader. The tags were read with an omnidirectional antenna (Laird Technologies) through both an Ettus Research USRP2 software radio system with a Flex 900 daughterboard and a vector signal analyzer operating at a 3.2 MHz sampling frequency. Only one tag was read at a time, with the rest placed in a shielded box to reduce the risk of transmission collisions. The USRP2 data consists of 3-6 seconds of data, with the VSA recording a 327 ms section from within that time window. Each tag was recorded at all three frequencies (902 MHz, 915 MHz, and 928 MHz) in each of the parallel, oblique, and upside-down orientations, simulating real-world factors that arise when recording RFID signals.

Figure 1: The experimental setup used to read the RFID tags, showing the tag reader, antenna, USRP2, and tag.

Data Analysis

In its active state, the RFID reader repeatedly sends out queries, searching for the presence of an RFID tag. When in range, a tag responds to these queries with a 16-bit random number (RN16). Once this reply is acknowledged by the reader, the rest of the EPC code is sent to the reader. This process is repeated as long as the tag remains within the read/write range while the reader is active. Because this process is repeated continuously, a recording of this tag-to-reader communication will contain many repetitions of the EPC code along with multiple reader queries and communication timeouts.
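Because a single recording interleaves many EPC replies with reader queries and silent timeouts, a simple first pass at isolating individual bursts is to threshold a smoothed magnitude envelope. The Python sketch below illustrates that idea on complex baseband samples; the function names, the trailing moving-average window `win`, and the fixed threshold `thresh` are hypothetical choices for illustration, not the extraction routine used in this study.

```python
# Illustrative sketch with assumed names and parameters; not the routine used in this study.
from typing import List, Optional, Sequence, Tuple


def moving_average(values: Sequence[float], win: int) -> List[float]:
    """Trailing moving average over `win` samples (shorter window at the start)."""
    out: List[float] = []
    acc = 0.0
    for k, v in enumerate(values):
        acc += v
        if k >= win:
            acc -= values[k - win]  # drop the sample that left the window
        out.append(acc / min(k + 1, win))
    return out


def find_bursts(samples: Sequence[complex], win: int, thresh: float,
                min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, stop) pairs where the smoothed magnitude exceeds `thresh`.

    `stop` is exclusive, so samples[start:stop] is one candidate burst
    (an EPC reply or a reader query); quiet gaps (timeouts) are skipped.
    """
    envelope = moving_average([abs(s) for s in samples], win)
    bursts: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for k, e in enumerate(envelope):
        if e > thresh and start is None:
            start = k  # envelope rises above the threshold: a burst begins
        elif e <= thresh and start is not None:
            if k - start >= min_len:
                bursts.append((start, k))
            start = None  # envelope fell back below the threshold: burst ended
    if start is not None and len(envelope) - start >= min_len:
        bursts.append((start, len(envelope)))  # burst runs to the end of the record
    return bursts


# Synthetic record: two bursts separated by silence.
record = [0j] * 5 + [1 + 0j] * 4 + [0j] * 5 + [1j] * 3 + [0j] * 3
print(find_bursts(record, win=1, thresh=0.5))  # [(5, 9), (14, 17)]
```

In practice `win` and `thresh` would need tuning to the sampling rate and noise floor, and separating EPC replies from reader queries still needs a further step, such as the cross-correlation against a manually extracted query region used in this study.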
Figure 2: Features of the RFID signal showing the tag-to-reader communication.

Figure 3: Different EPC compression techniques. From the top down: the real and imaginary parts of the raw signal, and the amplitude, phase, and instantaneous frequency of the raw EPC.

Because each recording contains many repetitions of the EPC, an algorithm is required to extract the individual EPCs from the whole recorded signal. Since data collections are ideally kept very short to save both processing time and storage space, extracting every single EPC from the signal is important to obtain as much data as possible. This is done through a combination of mean/variance cross-correlation with a manually extracted query region and amplitude/signal windowing routines. The resulting complex-valued signals are then broken down into their modulus, phase, and instantaneous frequency using the following formulae [5]:

$$
\begin{aligned}
s(t) &= r(t) + i\,c(t) \\
\alpha(t) &= \sqrt{r^2(t) + c^2(t)} \\
\theta(t) &= \tan^{-1}\frac{c(t)}{r(t)} \\
f_i(t) &= \frac{1}{2\pi}\,\frac{d\theta_h(t)}{dt}
\end{aligned}
\tag{1}
$$

where $\theta_h(t)$ is $\theta(t)$ unwrapped, i.e. corrected by multiples of $2\pi$ wherever the phase wraps, so that it varies continuously. This type of reduction is referred to as EPC compression. Figure 3 compares the different EPC compression results for a complex signal. For the results presented in this report, the modulus was the only compression method used.

We then generate features from these EPCs using three different methods: Dynamic Wavelet Fingerprinting (DWFP), Wavelet Packet Decomposition (WPD), and statistical methods. The DWFP method performs a stationary wavelet transform on the data, then slices and projects the resulting wavelet coefficients onto the time-scale plane, producing a binary fingerprint image. This process is summarized in Figure 4. From these binary images, properties are collected using image processing routines and used as our feature set. Because more features are generated than can be of use, a Euclidean distance metric is applied to the DWFP feature set to identify the features with the largest interclass distances and the times at which this distance is greatest.

Wavelet Packet Decomposition requires applying a Wavelet Packet Transform (WPT), which results in a tree of coefficients. The normalized energies for the two classes being compared are inserted into a matrix, and singular value decomposition identifies the singular values with the highest energy. The WPT elements corresponding to those singular values are used as features.

The statistical calculations used as features consist of the mean of the EPC, the maximum cross-correlation of the EPC signal with another EPC from the same tag, the variance of the EPC, the Shannon entropy, the second central moment, the skewness, and the kurtosis.
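The EPC compression of Eq. (1) and the statistical features listed above can be sketched with the Python standard library alone. This is a minimal illustration, assuming a complex-valued EPC segment `s` sampled at `fs` Hz: it takes the phase with the four-quadrant arctangent (`cmath.phase`), removes jumps larger than $\pi$ by adding multiples of $2\pi$ to form $\theta_h(t)$, and approximates the derivative in Eq. (1) with a first difference. The function names, the histogram bin count used for the Shannon entropy, and the moment-based skewness and kurtosis are textbook choices assumed here; they are not necessarily identical to the MATLAB routines used in this work.

```python
# Illustrative sketch: function names and the entropy bin count are assumptions,
# not the MATLAB routines used in this work.
import cmath
import math
from typing import Dict, List, Sequence


def modulus(s: Sequence[complex]) -> List[float]:
    """alpha(t) = sqrt(r(t)^2 + c(t)^2) for s(t) = r(t) + i c(t)."""
    return [abs(z) for z in s]


def unwrapped_phase(s: Sequence[complex]) -> List[float]:
    """theta_h(t): four-quadrant phase with the 2*pi jumps removed."""
    out: List[float] = []
    offset = 0.0
    prev = None
    for z in s:
        p = cmath.phase(z)  # wrapped phase in [-pi, pi]
        if prev is not None:
            if p - prev > math.pi:
                offset -= 2 * math.pi
            elif p - prev < -math.pi:
                offset += 2 * math.pi
        out.append(p + offset)
        prev = p
    return out


def instantaneous_frequency(s: Sequence[complex], fs: float) -> List[float]:
    """f_i = (1 / 2 pi) d(theta_h)/dt, with dt = 1 / fs (first difference)."""
    th = unwrapped_phase(s)
    return [(b - a) * fs / (2 * math.pi) for a, b in zip(th, th[1:])]


def max_cross_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Peak normalized cross-correlation of two EPCs over all lags."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    a0 = [v - ma for v in a]
    b0 = [v - mb for v in b]
    norm = math.sqrt(sum(v * v for v in a0) * sum(v * v for v in b0))
    if norm == 0.0:
        return 0.0  # a constant signal carries no correlation information
    best = -math.inf
    for lag in range(-(len(b0) - 1), len(a0)):
        lo, hi = max(0, lag), min(len(a0), len(b0) + lag)
        acc = sum(a0[i] * b0[i - lag] for i in range(lo, hi))
        best = max(best, acc / norm)
    return best


def statistical_features(x: Sequence[float], bins: int = 16) -> Dict[str, float]:
    """Mean, variance, central moments, and histogram Shannon entropy of an EPC."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width for constant input
    counts = [0] * bins
    for v in x:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return {
        "mean": mean,
        "variance": m2 * n / (n - 1) if n > 1 else 0.0,  # unbiased sample variance
        "second_moment": m2,
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else math.nan,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else math.nan,
        "entropy": entropy,
    }


# A phasor advancing 0.5 rad per sample at fs = 1000 Hz.
epc = [cmath.exp(1j * 0.5 * k) for k in range(20)]
print(round(instantaneous_frequency(epc, fs=1000.0)[0], 3))  # 79.577
```

Note that the modulus, not the phase or instantaneous frequency, was the compression actually used for the reported results; the other two functions are included only because Eq. (1) defines them.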
Further details on these statistical calculations can be found in the MATLAB help guide.

Figure 4: The DWFP technique [6] begins with (a) the ultrasonic signal, from which it generates (b) wavelet coefficients indexed by time and scale, where scale is related to frequency. Then (c) the coefficients are sliced and projected onto the time-scale plane (d). The final binary image is used to select features for the pattern classification algorithm.

In the features extracted to identify a tag, the actual classes will be $\omega_j = 1$ whenever the EPC belongs to the correct tag and $\omega_j = -1$ whenever it does not. Because there are as many as 24 tags that are not the same as the classifier tag, issues of class imbalance come into play. The level of imbalance affects the results depending on the complexity of the system the features were drawn from, but even a small imbalance can have a large effect on the results [7] [8]. An undersampling method, in which the $\omega_j = -1$ class is sampled to a smaller size relative to the $\omega_j = 1$ class, is employed to help correct the sampling imbalance. This aspect of classifier design is incorporated in the RFID classification algorithm and is represented by two variables, $\eta$ and $\rho$, defined by Eqns 2 and 3:

$$\eta = \frac{|(\tau_j = \tau_S) \,\&\, (j \in T)|}{|\tau_j = \tau_S|} \tag{2}$$

$$\rho = \frac{|\omega_j = -1|}{|\omega_j = 1|}, \quad j \in R \tag{3}$$

Here, $L$ and $S$ are subsets of $\{1, \ldots, N\}$ that indicate the indices of $x_{j,k}$ corresponding to the classifier tag ($\tau_j = \tau_L$) and the testing tag ($\tau_j = \tau_S$). Similarly, $R$ and $T$ represent subsets of $\{1, \ldots, N\}$ corresponding to the training set ($R$) and testing set ($T$). Therefore, $\eta$ represents the fraction of EPCs from the testing tag that were withheld for $T$, with the rest inserted into $R$, so that $0 < \eta \le 1$; and $\rho$ represents the ratio of negative to positive EPCs in $R$, so that $0 < \rho < |\omega_j = -1| / |\omega_j = 1|$ and $\rho \in \mathbb{Z}^{+}$. Effectively, $\eta$ controls the sampling method, while $\rho$ controls the amount of undersampling.

Once the feature set has been developed and organized for the EPCs, it is run through the classifier. The first step in that process is to split the data roughly in half into training and testing data sets using the hold-out method. A classifier is then used to map the feature matrices to their predicted labels. Several classifiers are used in this study, including the quadratic discriminant classifier (QDC), linear discriminant classifier (LDC), k-nearest-neighbor (kNN), and support vector machines (SVM).

Classifier Evaluation

Once the data has been classified, a confusion matrix $L(i, j)$ can be generated that represents the fraction of EPCs from the testing tag that are classified as the classifier tag, where $\tau_i$ is the classifier tag and $\tau_j$ is the testing tag. For the holdout method, the value of the confusion matrix is the proportion of $y = 1$ labels for the EPCs of the testing tag in the testing set:

$$L(i, j) = \frac{|(x_j \in T) \,\&\, (y_j = 1)|}{|x_j \in T|} \tag{4}$$

An example can be seen in Figure 5, where the value of $L$ has been mapped to a grayscale intensity, so that $0 \to$ black and $1 \to$ white. Whenever $\tau_L = \tau_S$, $L$ approaches 1, and the lowest values of $L$ occur for $\tau_L \neq \tau_S$, meaning that testing tags are more likely to be classified as the classifier tag when the testing tag is the same as the classifier tag. The confusion matrix thus provides the percent of EPCs from the testing tag that were classified as originating from the classifier tag. In order to evaluate the performance of the classifier, a threshold $h$ is applied to $L$ so that the false positive ($f_+$), true positive ($t_+$), false negative ($f_-$), and true negative ($t_-$) rates are given by Eqn 5:

$$
\begin{aligned}
f_+ &= |L(i,j) > h|, \quad i \neq j \\
t_+ &= |L(i,j) > h|, \quad i = j \\
f_- &= |L(i,j) \le h|, \quad i = j \\
t_- &= |L(i,j) \le h|, \quad i \neq j
\end{aligned}
\tag{5}
$$
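Eqns 4 and 5, together with the ROC summary built on them, can be sketched directly in Python. The helper names below are hypothetical, the confusion matrix is assumed to be a nested list with entries in [0, 1], and the AUC is computed with the trapezoidal rule over every distinct threshold, one standard approach described in [9] rather than necessarily the routine used for the tables in this report. Reading $\min(f_+ + f_-)$ as a percentage of all $n^2$ tag pairs is likewise an assumption.

```python
# Illustrative sketch: helper names and the [0, 1] value range are assumptions.
import math
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def confusion_entry(pred_labels: Sequence[int]) -> float:
    """Eq. (4): fraction of the testing tag's EPCs in T that were labeled y = 1."""
    return sum(1 for y in pred_labels if y == 1) / len(pred_labels)


def rates(L: Matrix, h: float) -> Tuple[int, int, int, int]:
    """Eq. (5): counts (f_pos, t_pos, f_neg, t_neg) at threshold h.

    Diagonal entries (i == j) compare a tag with itself; off-diagonal
    entries compare two different tags carrying the same EPC.
    """
    fp = tp = fn = tn = 0
    for i, row in enumerate(L):
        for j, v in enumerate(row):
            if i == j:
                if v > h:
                    tp += 1
                else:
                    fn += 1
            elif v > h:
                fp += 1
            else:
                tn += 1
    return fp, tp, fn, tn


def roc_auc(L: Matrix) -> float:
    """Sweep h over the matrix values and integrate TPR against FPR (trapezoid rule)."""
    n = len(L)
    pos, neg = n, n * n - n  # number of diagonal / off-diagonal entries
    # Descending thresholds; -1.0 lies below every entry (values assumed in [0, 1]).
    thresholds = sorted({v for row in L for v in row} | {-1.0}, reverse=True)
    points = [(0.0, 0.0)]
    for h in thresholds:
        fp, tp, _, _ = rates(L, h)
        points.append((fp / neg, tp / pos))
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def min_total_error(L: Matrix) -> Tuple[float, float]:
    """Smallest f_pos + f_neg over all thresholds, as % of the n*n entries, and its h."""
    n = len(L)
    best = (math.inf, 0.0)
    for h in sorted({v for row in L for v in row}):
        fp, _, fn, _ = rates(L, h)
        pct = 100.0 * (fp + fn) / (n * n)
        if pct < best[0]:
            best = (pct, h)
    return best


L = [[0.9, 0.1],
     [0.2, 0.8]]
print(rates(L, 0.5))       # (0, 2, 0, 2)
print(roc_auc(L))          # 1.0
print(min_total_error(L))  # (0.0, 0.2)
```

Because every threshold is a value already present in the matrix, the sweep visits each distinct operating point exactly once, so the trapezoidal sum is exact for the empirical ROC curve.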
As the values in Eqn 5 are a function of the threshold $h$, a useful summary of how discrimination changes with the threshold is the receiver operating characteristic (ROC) curve [9]. A comprehensive measurement of this curve is the area under the ROC curve (AUC); classifiers whose ROC curve yields a larger AUC value usually perform better than those with a lower AUC. Therefore, AUC is used to narrow the results of all the available classifier combinations down to a few of the best ones. The summary statistics $f_+$, $f_-$, $t_+$, $t_-$ remain useful measures of classifier performance for a given threshold $h$, so another metric used is $\min(f_+ + f_-)$ over the decision threshold, presented as a percentage of the total number of combinations of the binary classifier. The percentages of false positives ($f_+$ [%]) and false negatives ($f_-$ [%]) at that decision threshold are obtained by dividing by the possible number of false positives and false negatives, respectively. The decision threshold at which the minimum rate occurred is also given.

Results

With so many variables in the configuration of a single classifier, the dimensionality of the results space is large. Since the main application for this classifier is security, the best results will have the fewest false positives as well as the best accuracy. As there are often many classifiers that meet the minimum number of false positives, which is zero, the results are narrowed by additional criteria.

The comparison of interest is the classifier performance between the VSA data set and the USRP2 data set. If the pattern classifiers can accurately classify the tags with the USRP2 data, it is reasonable to conclude that no critical information is lost by using the USRP2 to collect data instead of the VSA. Data taken in the parallel orientation (RFID tag lined up parallel to the RFID antenna) at a frequency of 902 MHz with both the VSA and the USRP2 were run through the pattern classifiers, and the results can be seen in Table 1 and Table 2. These results are presented for the most rigorous test of the classifier, where $\eta = 1$, meaning that the classifier is not trained on the EPCs from the tag being tested.

While the VSA results contain classification errors, this is due in part to the size of the data set collected with the VSA. A collection time of 327 ms resulted in as few as 3 EPCs for a given tag, so when the data sets were split into training and testing sets, only one or two EPCs were included in each set. In an effort to reduce these sample size effects as much as possible, every EPC collected was included twice in the classification. This allowed for better splitting of the data, but an ideal sample size would be much larger. The USRP2 data set did not have any sample size problems, as the read time was much longer and more EPCs were collected for each tag, and it shows very promising classification results. These results show that the USRP2 is adequate for data collection.

Figure 5: An example of a confusion matrix shown as a grayscale image for the AD tags. The color intensity relates to the percent of EPCs from the testing tag $\tau_S$ that were identified as coming from the classifier tag $\tau_L$.

It is helpful to examine the extreme values of $\eta$ and $\rho$ for the USRP2 data to assess the robustness of the classifier. Not only is the restriction $\eta = 1$ applied, but a second restriction, $\rho = $ all, is also applied to our classifier. This second restriction on $\rho$ means that all the EPCs from all the tags are included in the training set except those withheld for testing. These results can be seen in Table 3. Again, very few errors were made, although the $|AUC|$ measurements are slightly lower than in Table 2.

Conclusions and Future Work

A classification routine that can identify whether or not RFID tag A is the same as RFID tag B, despite the fact that tag A and tag B present the same EPC information, has been presented and tested on two unique data sets collected at the same time but with different methods. This classification routine has some limitations; namely, it cannot determine the identity of tag A, only whether or not it is the same as tag B, which makes it better suited to some applications (such as ID badges) than others. It has been verified that both an expensive, sophisticated vector signal analyzer and an affordable, user-defined universal software radio can pick up the unintentional qualities of the recorded EPCs that this classifier requires.

Future work on this project includes further testing to assess the robustness of the pattern classifiers. Since data was recorded at multiple frequencies and various orientations, an analysis including a frequency and orientation comparison/substitution can be run. For example, the pattern classifier can be trained on data collected at 902 MHz but tested on data collected at 915 MHz. This type of substitution is important in real-world situations because EPC Class 1 Gen 2 readers in the United States may operate anywhere in the 902-928 MHz band, so the classifier needs to be able to handle input from all RFID readers within this range. The same can be done with the three orientations recorded for each tag. In addition to orientation variations and frequency fluctuations, more real-world factors can be introduced to the system, such as temperature and humidity fluctuations and accelerated aging of the tags themselves. These ideas have the potential to improve the robustness of the classifiers, and therefore improve the solution to RFID cloning.

Acknowledgments

We thank Dr. Crystal Bertoncini and Bryan Nousain for their help in learning these pattern classification routines and methods and for general RFID information. This work was performed in part using computational facilities at the College of William and Mary which were provided with the assistance of the National Science Foundation, the Virginia Port Authority, Sun Microsystems, and Virginia's Commonwealth Technology Research Fund.

References

[1] Sarah Clark. Air France tests NFC boarding passes at Nice airport. Press Release, 2009.
[2] Jo Best. Airline gets first NFC boarding. Press Release, 2009.
[3] S Bono and M Green. Security analysis of a cryptographically-enabled RFID device. In P McDaniel, editor, 14th USENIX Security Symposium, pages 1-16, 2005.
[4] Crystal Bertoncini. Applications of Pattern Classification to Time-Domain Signals. PhD thesis, The College of William and Mary, 2010.
[5] K J Ellis and N Serinken. Characteristics of radio transmitter fingerprints. Radio Science, 36(4):585-597, 2001.
[6] J Hou and M K Hinders. Dynamic wavelet fingerprint identification of ultrasound. Materials Evaluation, 60(9):1089-1093, 2002.
[7] Nathalie Japkowicz and Shaju Stephen. The class imbalance problem: a systematic study. Intelligent Data Analysis, 6(5):429-449, 2002.
[8] Gary M Weiss and Foster Provost. Effect of class distribution on classifier learning: An empirical study. Technical Report ML-TR-44, Rutgers University Department of Computer Science, August 2001.
[9] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27:861-874, 2006.
Table 1: The holdout classification results for the VSA 902 MHz data set, where all the EPCs from the testing tag were withheld from the training set ($\eta = 1$).

Classifier configuration                                Results [%]
EPC compression  #DWFP features  Classifier     η  ρ    min(f+ + f−)  f+     f−     h     |AUC|
α                75              SVM            1  9    0.138         0.072  1.724  80.1  0.9954
α                100             SVM            1  7    0.138         0.072  1.724  80.1  0.9931
α                75              SVM            1  6    0.138         0.072  1.724  85.8  0.9930
α                100             SVM            1  11   0.276         0.072  5.172  85.8  0.9923
α                100             SVM            1  4    0.207         0.144  1.724  83.4  0.9915
α                100             SVM            1  18   0.276         0.000  6.896  85.8  0.9914
Table 2: The holdout classification results for the USRP2 902 MHz data set, where all the EPCs from the testing tag were withheld from the training set ($\eta = 1$).

Classifier configuration                                Results [%]
EPC compression  #DWFP features  Classifier     η  ρ    min(f+ + f−)  f+     f−     h     |AUC|
α                100             QDC (PRTools)  1  8    0.000         0.000  0.000  92.5  1.0000
α                100             QDC (PRTools)  1  9    0.000         0.000  0.000  92.5  1.0000
α                100             QDC (PRTools)  1  12   0.000         0.000  0.000  91.0  1.0000
α                100             QDC (PRTools)  1  17   0.000         0.000  0.000  92.5  1.0000
α                100             QDC (PRTools)  1  6    0.000         0.000  0.000  92.5  0.9998
Table 3: The holdout classification results for the USRP2 902 MHz data set, where all the EPCs from the testing tag were withheld from the training set ($\eta = 1$) and all of the other EPCs are included in the training data set.

Classifier configuration                                Results [%]
EPC compression  #DWFP features  Classifier     η  ρ    min(f+ + f−)  f+     f−     h     |AUC|
α                100             QDC (PRTools)  1  all  0.000         0.000  0.000  91.0  0.9997
α                1               QDC (PRTools)  1  all  0.059         0.062  0.000  90.3  0.9986
α                1               QDC (MATLAB)   1  all  0.119         0.124  0.000  76.2  0.9982
α                5               QDC (MATLAB)   1  all  0.059         0.000  1.470  88.9  0.9981
α                5               QDC (PRTools)  1  all  0.059         0.062  0.000  90.3  0.9975
α                75              QDC (PRTools)  1  all  0.119         0.000  2.941  74.7  0.9922