The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining

Junlong Zhang; Dan Zhao; Huijie Wang

doi:10.11648/j.ijdst.20180401.12

Peer-Reviewed

Published in International Journal on Data Science and Technology (Volume 4, Issue 1)

Received: 26 April 2018; Published: 27 April 2018

Abstract

The benefit that introduced talent creates for colleges and universities is an important indicator for evaluating the value of talent introduction. Large-scale data systems used in the talent introduction process need anomaly detection and prediction. In this article, after reducing the dimensionality of the data with principal component analysis, we build an outlier detection model from a distance-based method (Mahalanobis distance), a density-based method (local outlier factor), and clustering-based methods (two-step, k-means). We find 15 significant outliers, and we find that publishing SSCI papers and having experience at C9 institutions have a significant effect on obtaining funding from the National Foundation of China. Finally, after eliminating the abnormal values, we build a talent prediction model using support vector machine, decision tree (C4.5, C5.0), Bayes, and random forest classifiers. Comparing the four methods, we find that the support vector machine and decision tree methods have the higher prediction accuracies. After optimization, their accuracies reach 75.00% and 72.09%, respectively.
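The distance-based step of the abstract can be illustrated with a minimal sketch. It assumes the distance in question is the Mahalanobis distance, uses NumPy only, and runs on synthetic data rather than the authors' talent data set. The function names (`pca_scores`, `mahalanobis_sq`, `flag_outliers`), the choice of two components, and the top-k cut-off are all hypothetical and are not taken from the paper. The density-based and clustering-based detectors are not shown.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project standardized data onto its leading principal components."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T

def mahalanobis_sq(S):
    """Squared Mahalanobis distance of each row from the sample mean."""
    diff = S - S.mean(axis=0)
    cov = np.atleast_2d(np.cov(S, rowvar=False))
    cov_inv = np.linalg.inv(cov)
    # Row-wise quadratic form diff_i^T cov_inv diff_i.
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

def flag_outliers(X, n_components=2, top_k=3):
    """Return indices of the top_k most distant rows and all distances."""
    d2 = mahalanobis_sq(pca_scores(X, n_components))
    return np.argsort(d2)[::-1][:top_k], d2

# Synthetic stand-in for the (unavailable) talent records: assumption only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:3] += 8.0  # plant three obvious anomalies in rows 0, 1 and 2

idx, d2 = flag_outliers(X, n_components=2, top_k=3)
print(sorted(idx.tolist()))
```

Ranking by distance and keeping a fixed number of rows is only one possible cut-off rule; a threshold on the squared distance would serve equally well, and the abstract does not say which rule the authors used.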

DOI	10.11648/j.ijdst.20180401.12
Page(s)	6-14
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Data Mining, Outlier Excavation, Machine Learning, Talent Identification

Cite This Article

APA Style

Junlong Zhang, Dan Zhao, Huijie Wang. (2018). The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining. International Journal on Data Science and Technology, 4(1), 6-14. https://doi.org/10.11648/j.ijdst.20180401.12

ACS Style

Junlong Zhang; Dan Zhao; Huijie Wang. The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining. Int. J. Data Sci. Technol. 2018, 4(1), 6-14. doi: 10.11648/j.ijdst.20180401.12

AMA Style

Junlong Zhang, Dan Zhao, Huijie Wang. The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining. Int J Data Sci Technol. 2018;4(1):6-14. doi: 10.11648/j.ijdst.20180401.12

@article{10.11648/j.ijdst.20180401.12,
  author = {Junlong Zhang and Dan Zhao and Huijie Wang},
  title = {The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining},
  journal = {International Journal on Data Science and Technology},
  volume = {4},
  number = {1},
  pages = {6-14},
  doi = {10.11648/j.ijdst.20180401.12},
  url = {https://doi.org/10.11648/j.ijdst.20180401.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdst.20180401.12},
  abstract = {To create profits for colleges and universities, introduction of talents is an important indicator of the value evaluation of talent introduction in colleges and universities. It can meet the needs of the large data system demand for abnormal detection and prediction in the process of talent introduction. In this article, after reducing the dimension of data by principal component analysis, using the method based on distance (markov distance), the method based on density (local outlier factor) and the method based on clustering (two-step, k-means), we establish the outlier detection model. We find 15 significant outliers and find that the publication of SSCI papers and the experience in C9 institutions have a significant effect on obtaining National Foundation of China. Finally, we use support vector machine, decision tree (C4.5, C5.0), bayes, and random forest to establish the talent prediction model after eliminating abnormal values. By comparing four methods, we find that support vector machine method and decision tree method’s prediction accuracies are higher. After optimization, their accuracies can reach 75.00% and 72.09% respectively.},
  year = {2018}
}

TY  - JOUR
T1  - The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining
AU  - Junlong Zhang
AU  - Dan Zhao
AU  - Huijie Wang
Y1  - 2018/04/27
PY  - 2018
N1  - https://doi.org/10.11648/j.ijdst.20180401.12
DO  - 10.11648/j.ijdst.20180401.12
T2  - International Journal on Data Science and Technology
JF  - International Journal on Data Science and Technology
JO  - International Journal on Data Science and Technology
SP  - 6
EP  - 14
PB  - Science Publishing Group
SN  - 2472-2235
UR  - https://doi.org/10.11648/j.ijdst.20180401.12
AB  - To create profits for colleges and universities, introduction of talents is an important indicator of the value evaluation of talent introduction in colleges and universities. It can meet the needs of the large data system demand for abnormal detection and prediction in the process of talent introduction. In this article, after reducing the dimension of data by principal component analysis, using the method based on distance (markov distance), the method based on density (local outlier factor) and the method based on clustering (two-step, k-means), we establish the outlier detection model. We find 15 significant outliers and find that the publication of SSCI papers and the experience in C9 institutions have a significant effect on obtaining National Foundation of China. Finally, we use support vector machine, decision tree (C4.5, C5.0), bayes, and random forest to establish the talent prediction model after eliminating abnormal values. By comparing four methods, we find that support vector machine method and decision tree method’s prediction accuracies are higher. After optimization, their accuracies can reach 75.00% and 72.09% respectively.
VL  - 4
IS  - 1
ER  -

Author Information

Junlong Zhang

School of Data Sciences, Zhejiang University of Finance and Economics, Hangzhou, China

Dan Zhao

School of Data Sciences, Zhejiang University of Finance and Economics, Hangzhou, China

Huijie Wang

School of Data Sciences, Zhejiang University of Finance and Economics, Hangzhou, China
