Undergraduate Category: Engineering & Technology Degree Level: Bachelor of Science Abstract ID#: 670

Improving FIR Filtering and AES Encryption with OpenCL 2.0

Carter McCardwell, Tuan Dao, Saoni Mukherjee, David Kaeli

Abstract

The growth in demand for heterogeneous accelerators has stimulated the development of cutting-edge features in newer accelerators. Heterogeneous programming frameworks such as OpenCL have matured over the years and introduced new features for developers. We explore one of these programming frameworks, OpenCL 2.0. To drive our study, we consider a number of new features in OpenCL 2.0 using two popular applications from two different computing domains: cyber security and signal processing. These applications are 1) the AES-128 encryption standard and 2) Finite Impulse Response (FIR) filtering. In this work, we introduce the latest runtime features enabled in OpenCL 2.0 and discuss how well our applications can benefit from some of these features.

Introduction

•  Evaluation of the new features available in OpenCL 2.0, primarily Shared Virtual Memory (SVM) and dynamic parallelism
•  SVM removes the need for explicit data copies between host and device, which helps speed up program execution
•  Evaluation of two applications exploiting OpenCL 2.0 features: 1) FIR filtering and 2) AES encryption
•  Both written and optimized for both OpenCL 1.2 and 2.0
•  Discussion of improvements in terms of code readability and simplicity

GPGPU and OpenCL

o  Earlier GPUs were designed to handle 3-D graphics; they later evolved into devices that speed up scientific computing
o  GPUs have many cores that can run thousands of threads simultaneously
o  Provide great computational power at low cost
•  OpenCL is the leading general-purpose programming framework for heterogeneous systems, used to program CPUs, GPUs, and FPGAs
•  Based on C99, designed by Apple, maintained by the Khronos Group
•  Most widely used version: 1.2; latest revision: OpenCL 2.0
•  Code is more portable than in other HPC programming languages such as CUDA

Advanced Encryption Standard

•  Four different operations are used to encrypt the data
•  Data is read in from a file linearly as 16-byte 4x4 arrays called "states"
•  States are processed individually through the algorithm, which exposes potential parallelism
•  A series of 4 different transformations is applied to the states:
   o  SubBytes replaces each byte in the state with a value from a pre-generated lookup table.
   o  ShiftRows shifts the bytes in each row by a certain offset.
   o  AddRoundKey XORs the state, one 4-byte word at a time, with words generated from the private key.
   o  MixColumns performs a polynomial operation on each column.
•  To decrypt, perform the inverse of the operations.

Implementing with OpenCL 2.0:

✓  The AES private key is expanded on the CPU
✓  A shared space for the input and output data is allocated in SVM using clSVMAlloc
✓  The host parcels the data into states and copies them into SVM
✓  The kernel is started, and each work-item, based on its local ID, reads its assigned state into its registers
✓  Each work-item processes the state through the AES encryption algorithm
✓  The work-item copies the processed data back to SVM
✓  The host writes the encrypted data to a file
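As a rough illustration of the per-state processing described above, the following Python sketch parcels a byte string into 16-byte states and applies two of the four transformations. It is not the authors' kernel: the function names (`to_states`, `shift_rows`, `add_round_key`), the zero padding of the last state, and the column-major state layout (the layout used in FIPS 197) are all assumptions made for this example, and SubBytes, MixColumns, and key expansion are omitted.

```python
def to_states(data: bytes) -> list[list[int]]:
    """Split data into 16-byte states. Zero padding of the last state is an assumption."""
    padded = data + bytes(-len(data) % 16)
    return [list(padded[i:i + 16]) for i in range(0, len(padded), 16)]


def shift_rows(state: list[int]) -> list[int]:
    """ShiftRows on a column-major 4x4 state: row r is rotated left by r positions."""
    out = [0] * 16
    for r in range(4):
        for c in range(4):
            # Element (r, c) is stored at index r + 4*c in column-major order.
            out[r + 4 * c] = state[r + 4 * ((c + r) % 4)]
    return out


def add_round_key(state: list[int], round_key: list[int]) -> list[int]:
    """AddRoundKey: XOR every state byte with the matching round-key byte."""
    return [s ^ k for s, k in zip(state, round_key)]


# Each state is independent of the others, which is what allows one
# work-item per state on the device.
states = to_states(b"example plaintext to encrypt")
demo_key = list(range(16))  # placeholder round key, not a real expanded key
processed = [add_round_key(shift_rows(s), demo_key) for s in states]
```

Because no state depends on another, the list comprehension at the end could be replaced by any parallel map without changing the result.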

Results

Dimension   OpenCL 1.2 Time (milliseconds)   OpenCL 2.0 Time (milliseconds)
5000        1973                             1740
10000       3493                             2896
15000       4371                             3981
20000       6393                             5123
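To make the difference between the two columns easier to read, the relative reduction in run time can be computed directly from the reported numbers. The snippet below is only a worked calculation on the table values; the variable and function names are chosen for this example.

```python
# Times in milliseconds, copied from the table above:
# dimension -> (OpenCL 1.2 time, OpenCL 2.0 time)
times_ms = {
    5000: (1973, 1740),
    10000: (3493, 2896),
    15000: (4371, 3981),
    20000: (6393, 5123),
}


def relative_reduction(old: float, new: float) -> float:
    """Fraction of the old run time that was saved."""
    return (old - new) / old


reductions = {
    dim: relative_reduction(t12, t20) for dim, (t12, t20) in times_ms.items()
}

for dim, r in reductions.items():
    print(f"dimension {dim}: {r:.1%} less time with OpenCL 2.0")
```

On these four data points the reduction falls roughly between 9% and 20%.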

Conclusion and Future Work

•  Using SVM reduces coding effort and increases code readability
•  Decreases execution time for both applications, since no explicit copies of data are needed
•  In terms of future work, the next target is to implement fine-grained SVM on Kaveri APUs
•  Develop and optimize more applications supporting OpenCL 2.0 (ongoing work with the HSA Foundation)

[Figure: time (seconds) versus input dimension size (5000x)]

New Features in OpenCL 2.0

•  Dynamic Parallelism: Allows kernels to start other kernels without interaction with the host.
•  Shared Virtual Memory: Allows a shared memory space and shared pointers between host and device. Reduces data copying between the two and simplifies memory management.
•  Android Installable Client Driver Extension: Allows OpenCL implementations to be discovered and loaded as shared objects on Android systems.
•  Image Support: Improved image support, including sRGB and 3D image writes, and the ability for multiple kernels to read and write to the same image.

Finite Impulse Response

•  Used to calculate the weighted sum of the most recent input values
•  Compared to IIR, offers more finely tuned responses, although it consumes more computation time
•  Each compute unit calculates the output for its coefficient
•  Used in digital audio applications for filtering audio signals

Implementing with OpenCL 2.0:

✓  The input and coefficients are allocated using clSVMAlloc
✓  The memory space is mapped (clEnqueueSVMMap) for the host to write the input data
✓  The memory space is unmapped (clEnqueueSVMUnmap) and passed to the device using clSetKernelArgSVMPointer
✓  The device runs the kernel and calculates the results
✓  The memory space is mapped for the host to read out the results
✓  The memory space is freed using clSVMFree
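For reference, the FIR computation itself (a weighted sum of the most recent input values) can be sketched in a few lines of sequential Python. This is a plain reference version for illustration, not the OpenCL kernel from the poster; the name `fir_filter` and the choice to treat samples before the start of the signal as zero are assumptions.

```python
def fir_filter(x: list[float], h: list[float]) -> list[float]:
    """Return y where y[n] = sum over k of h[k] * x[n - k].

    Samples before the start of the signal are treated as zero (assumption).
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, coeff in enumerate(h):
            if n - k >= 0:
                acc += coeff * x[n - k]
        y.append(acc)
    return y


# A two-tap moving average: each output is the mean of the current
# and the previous input sample.
smoothed = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
```

Every output sample depends only on the inputs and the coefficients, never on earlier outputs, which is why the work divides cleanly across parallel compute units.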

[Figure: results plotted against input size (MB): 1M, 10M, 100M, 1000M]

Size     OpenCL 1.2 Time (seconds)   CPU Time (seconds)   OpenCL 2.0 Time (seconds)
1M       0.806                       1.271                1.24
10M      12.27                       19.947               12.539
100M     116.783                     211.023              117.054
1000M    1160.867                    2117.082             1146.273
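The speedup of each OpenCL version over the CPU run can be read off this table with a short calculation. The snippet below only restates the reported numbers as ratios; the names are chosen for this example.

```python
# Times in seconds, copied from the table above:
# size -> (OpenCL 1.2 time, CPU time, OpenCL 2.0 time)
times_s = {
    "1M": (0.806, 1.271, 1.24),
    "10M": (12.27, 19.947, 12.539),
    "100M": (116.783, 211.023, 117.054),
    "1000M": (1160.867, 2117.082, 1146.273),
}


def speedup(baseline: float, accelerated: float) -> float:
    """How many times faster the accelerated run is than the baseline."""
    return baseline / accelerated


# size -> (speedup of OpenCL 1.2 over CPU, speedup of OpenCL 2.0 over CPU)
speedups = {
    size: (speedup(cpu, cl12), speedup(cpu, cl20))
    for size, (cl12, cpu, cl20) in times_s.items()
}

for size, (s12, s20) in speedups.items():
    print(f"{size}: OpenCL 1.2 {s12:.2f}x, OpenCL 2.0 {s20:.2f}x over CPU")
```

At the largest input size both OpenCL versions run a little over 1.8 times faster than the CPU, and OpenCL 2.0 is slightly ahead of OpenCL 1.2 only at that size.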
