Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity
American Journal of Software Engineering and Applications
Volume 5, Issue 3-1, May 2016, Pages: 10-14
Received: Feb. 2, 2016;
Accepted: Feb. 6, 2016;
Published: Jun. 24, 2016
Views 4484 Downloads 153
Neda Mohammadi, Department of Computer Engineering, Shiraz University of Technology, Shiraz, Iran
Nowadays a number of applications with high volume of calculations are constantly increasing. Central Processing Unit (CPU) consists of finite number of cores. Degree of parallelism and implementation speed are issues that data high volume on CPU is low. Using the thread concept in programming, algorithms which have the parallelism capabilities, can be executed in parallel. There are many issues which in order to solving them, finding similar items in a metric space and grouping them in these issues is necessary. Computational complexity finding nearest neighbors is a challenge for run time. To evaluate the performance of GPUs speed in searching nearest neighbors, GPGPU and CUDA are used and compared with CPU usage. In this paper parallel implementation of the algorithm on GPU with access to its shared memory, is compared with parallel implementation of the algorithm on CPU through threads. It is understood that threads use graphics card's shared memory for communications, storing temporary data and retrieving data. Therefore, the parallelism on GPU is more useful than parallelism on CPU in High-Dimensional spaces. Finally, it is discussed that GPU reduces complexity to a considerable amount and is scalable.
Evaluation of GPU Performance Compared to CPU for Implementing Algorithms with High Time Complexity, American Journal of Software Engineering and Applications. Special Issue: Advances in Computer Science and Information Technology in Developing Countries.
Vol. 5, No. 3-1,
2016, pp. 10-14.
M., Guevara, G., Chris, K., Hazelwood, and K., Skadron. "Enabling task parallelism in the cuda scheduler." In Workshop on Programming Models for Emerging Architectures, 2009, pp. 69-76.
V., Garcia,, E., Debreuve, F., Nielsen, and M., Barlaud,. “K-nearest neighbor search: Fast GPU-based implementations and application to high-dimensional feature matching”. In Image Processing (ICIP), 2010, September. 17th IEEE International Conference on (pp. 3757-3760).
CUDA C Programming Guide: CUDA Toolkit Documentation, http://docs.nvidia.com/cuda/,cuda-c-programming-guide/index.html.
S., Liang, Y., Liu, C. Wang, and L., Jian,. “Design and evaluation of a parallel k-nearest neighbor algorithm on CUDA-enabled GPU”. In Web Society (SWS), 2010 IEEE 2nd Symposium on (pp. 53-60). IEEE, 2010, August.
Hawick, Kenneth A., Arno Leist, and Daniel P. Playne. "Parallel graph component labelling with GPUs and CUDA". Parallel Computing (2010) 36, no. 12 655-678.
S., Liang, Y., Liu, C., Wang, and L., Jian, 2009, October. “A CUDA-based parallel implementation of K-nearest neighbor algorithm”. In Cyber-Enabled Distributed Computing and Knowledge Discovery, 2009. CyberC'09. International Conference on (pp. 291-296). IEEE
D., Qiu, S., May, & A., Nüchter. “GPU-accelerated nearest neighbor search for 3D registration”. In Computer Vision Systems, 2009. (pp. 194-203). Springer Berlin Heidelberg.
V., Garcia, E., Debreuve, & M. Barlaud, "Fast k nearest neighbor search using GPU". In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW'08. 2008, June. IEEE Computer Society Conference on (pp. 1-6). IEEE.
N. Mohammadi. "Presentation of a Parallel Algorithm for Nearest Neighbor Search on GPU Using CUDA". Current Trends in Technology and Sciences (CTTS) 2015.
J., Nickolls, I., Buck, M., Garland, & K.Skadron, "Scalable parallel programming with CUDA". Queue, 6(2), 2008. 40-53.
Nourzad. "The introduction of context-sensitive hashing algorithm for finding nearest neighbors in high-dimensional spaces". Amirkabir University of Technology, (2010).
X., Wu, V., Kumar, J. R., Quinlan, J., Ghosh, Q., Yang, H., Motoda, & D. Steinberg. "Top 10 algorithms in data mining. Knowledge and Information Systems", 2008.14(1), 1-37.
S., Liang, C., Wang, Y., Liu, and L., Jian,. “CUKNN: A parallel implementation of K-nearest neighbor on CUDA-enabled GPU”. In Information, Computing and Telecommunication, 2009. YC-ICT'09. 2009, September. IEEE Youth Conference on (pp. 415-418).
A. Neumann, " Parallel reduction of multidimensional arrays for supporting online analytical processing (olap) on a graphics processing unit (gpu)". The University of Western Australia, 2008.
S. Ryoo, C. I Rodrigues, S. S Baghsorkhi, S. S Stone, D. B. Kirk,, & W. M. W. Hwu, "Optimization principles and application performance evaluation of a multithreaded GPU using CUDA".. (2008, February) In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming. (pp. 73-82). ACM.