The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining
International Journal on Data Science and Technology
Volume 4, Issue 1, March 2018, Pages: 6-14
Received: Apr. 26, 2018; Published: Apr. 27, 2018
Authors
Junlong Zhang, School of Data Sciences, Zhejiang University of Finance and Economics, Hangzhou, China
Dan Zhao, School of Data Sciences, Zhejiang University of Finance and Economics, Hangzhou, China
Huijie Wang, School of Data Sciences, Zhejiang University of Finance and Economics, Hangzhou, China
Abstract
The introduction of talents creates value for colleges and universities, and it is an important indicator in evaluating their talent-introduction work. Big data systems used in the talent-introduction process therefore need anomaly detection and prediction. In this article, after reducing the dimension of the data by principal component analysis, we establish an outlier detection model using a distance-based method (Mahalanobis distance), a density-based method (local outlier factor), and clustering-based methods (two-step, k-means). We find 15 significant outliers, and we find that publishing SSCI papers and having experience in C9 institutions have a significant effect on obtaining National Foundation of China funding. Finally, after eliminating the abnormal values, we use support vector machine, decision tree (C4.5, C5.0), Bayes, and random forest methods to establish the talent prediction model. Comparing the four methods, we find that the support vector machine and decision tree methods achieve the higher prediction accuracies. After optimization, their accuracies can reach 75.00% and 72.09%, respectively.
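As an illustration of the distance-based step only, the following is a minimal sketch that assumes the distance in question is the Mahalanobis distance and that each record has already been reduced to two principal-component scores. The function names, the sample data, and the threshold of 2.0 are hypothetical and are not taken from the article.

```python
import math

def mahalanobis_scores(points):
    """Return the Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix (divisor n - 1).
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    scores = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = [dx dy] * inverse(S) * [dx dy]^T, with the 2x2 inverse written out.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        scores.append(math.sqrt(d2))
    return scores

def flag_outliers(points, threshold):
    """Return the indices of points whose distance exceeds the threshold."""
    scores = mahalanobis_scores(points)
    return [i for i, d in enumerate(scores) if d > threshold]

# Hypothetical component scores: seven similar records and one distant record.
sample = [
    (1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (1.1, 2.2),
    (1.0, 1.8), (0.8, 2.0), (1.3, 2.1), (6.0, -3.0),
]
outliers = flag_outliers(sample, threshold=2.0)  # hypothetical cutoff
print(outliers)
```

In practice the cutoff would be chosen from the data, for example from a chi-square quantile, and the density-based and clustering-based methods would be run alongside it to cross-check the flagged records.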
Keywords
Data Mining, Outlier Excavation, Machine Learning, Talent Identification
To cite this article
Junlong Zhang, Dan Zhao, Huijie Wang, The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining, International Journal on Data Science and Technology. Vol. 4, No. 1, 2018, pp. 6-14. doi: 10.11648/j.ijdst.20180401.12