Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction

Yixuan Li; Zixuan Chen

doi:10.11648/j.acm.20180704.15

Peer-Reviewed

Published in Applied and Computational Mathematics (Volume 7, Issue 4)

Received: 17 October 2018; Published: 18 October 2018

Abstract

Breast cancer is the most common invasive cancer in women and the second leading cause of cancer death in women; breast tumours can be classified as benign or malignant. Research on and prevention of breast cancer have attracted increasing attention from researchers in recent years. Meanwhile, the development of data mining methods provides an effective way to extract useful information from complex databases and to support prediction, classification and clustering based on that information. In this study, to explore the relationship between breast cancer and a set of attributes, with the aim of reducing the probability of death from breast cancer, five classification models, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN) and Logistic Regression (LR), are applied to two breast cancer datasets: the Breast Cancer Coimbra Dataset (BCCD) and the Wisconsin Breast Cancer Database (WBCD). Three indicators, namely prediction accuracy, F-measure and AUC, are used to compare the performance of the five models. The comparative experiments show that the random forest model achieves better performance and adaptability than the other four methods. The model of this study is therefore shown to have clinical and reference value in practical applications.
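The abstract names three comparison indicators but does not define them. The Python sketch below is an illustration only, not the authors' implementation: it computes accuracy, the F-measure (F1 by default) and the ROC AUC for a binary benign/malignant task. It assumes malignant is coded as 1 and treated as the positive class, and that each model supplies both a predicted label and a real-valued score (for example a predicted probability); the function names and the toy data are hypothetical.

```python
from typing import Sequence

# Assumption for this sketch: labels are 0/1 with malignant = 1 as the
# positive class; scores are real-valued model outputs where a higher
# value means "more likely malignant".


def _check(a: Sequence, b: Sequence) -> None:
    """Reject empty or mismatched inputs instead of silently truncating."""
    if len(a) == 0 or len(a) != len(b):
        raise ValueError("inputs must be non-empty and of equal length")


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted class equals the true class."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true: Sequence[int], y_pred: Sequence[int],
              positive: int = 1, beta: float = 1.0) -> float:
    """F-beta score of the positive class; beta = 1 gives the usual F1,
    the harmonic mean of precision and recall."""
    _check(y_true, y_pred)
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # convention: no true positives means F = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float],
        positive: int = 1) -> float:
    """ROC AUC via its Mann-Whitney interpretation: the probability that a
    randomly chosen positive case scores higher than a randomly chosen
    negative case (ties count as 1/2). Uses O(P*N) pairwise comparisons,
    which is adequate for datasets of a few hundred rows."""
    _check(y_true, scores)
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and scores for illustration only (not data from the study).
# y_pred is scores thresholded at 0.5.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.8, 0.2, 0.3, 0.6, 0.1, 0.7]

print(accuracy(y_true, y_pred))   # 0.75
print(f_measure(y_true, y_pred))  # 0.75
print(auc(y_true, scores))        # 0.9375
```

Because AUC is computed from the scores rather than from thresholded labels, it can rank models differently from accuracy and F-measure, which is one reason to report all three.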

DOI	10.11648/j.acm.20180704.15
Page(s)	212-216
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2018. Published by Science Publishing Group

Keywords

Data Mining, Breast Cancer, Classification Models, Prediction

Cite This Article

APA Style

Yixuan Li, Zixuan Chen. (2018). Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction. Applied and Computational Mathematics, 7(4), 212-216. https://doi.org/10.11648/j.acm.20180704.15

ACS Style

Yixuan Li; Zixuan Chen. Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction. Appl. Comput. Math. 2018, 7(4), 212-216. doi: 10.11648/j.acm.20180704.15

AMA Style

Yixuan Li, Zixuan Chen. Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction. Appl Comput Math. 2018;7(4):212-216. doi: 10.11648/j.acm.20180704.15

BibTeX

@article{10.11648/j.acm.20180704.15,
  author = {Yixuan Li and Zixuan Chen},
  title = {Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction},
  journal = {Applied and Computational Mathematics},
  volume = {7},
  number = {4},
  pages = {212-216},
  doi = {10.11648/j.acm.20180704.15},
  url = {https://doi.org/10.11648/j.acm.20180704.15},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.acm.20180704.15},
  abstract = {Breast cancer is the most common invasive cancer in women and the second leading cause of cancer death in women; breast tumours can be classified as benign or malignant. Research on and prevention of breast cancer have attracted increasing attention from researchers in recent years. Meanwhile, the development of data mining methods provides an effective way to extract useful information from complex databases and to support prediction, classification and clustering based on that information. In this study, to explore the relationship between breast cancer and a set of attributes, with the aim of reducing the probability of death from breast cancer, five classification models, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN) and Logistic Regression (LR), are applied to two breast cancer datasets: the Breast Cancer Coimbra Dataset (BCCD) and the Wisconsin Breast Cancer Database (WBCD). Three indicators, namely prediction accuracy, F-measure and AUC, are used to compare the performance of the five models. The comparative experiments show that the random forest model achieves better performance and adaptability than the other four methods. The model of this study is therefore shown to have clinical and reference value in practical applications.},
  year = {2018}
}

RIS

TY  - JOUR
T1  - Performance Evaluation of Machine Learning Methods for Breast Cancer Prediction
AU  - Yixuan Li
AU  - Zixuan Chen
Y1  - 2018/10/18
PY  - 2018
N1  - https://doi.org/10.11648/j.acm.20180704.15
DO  - 10.11648/j.acm.20180704.15
T2  - Applied and Computational Mathematics
JF  - Applied and Computational Mathematics
JO  - Applied and Computational Mathematics
SP  - 212
EP  - 216
PB  - Science Publishing Group
SN  - 2328-5613
UR  - https://doi.org/10.11648/j.acm.20180704.15
AB  - Breast cancer is the most common invasive cancer in women and the second leading cause of cancer death in women; breast tumours can be classified as benign or malignant. Research on and prevention of breast cancer have attracted increasing attention from researchers in recent years. Meanwhile, the development of data mining methods provides an effective way to extract useful information from complex databases and to support prediction, classification and clustering based on that information. In this study, to explore the relationship between breast cancer and a set of attributes, with the aim of reducing the probability of death from breast cancer, five classification models, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN) and Logistic Regression (LR), are applied to two breast cancer datasets: the Breast Cancer Coimbra Dataset (BCCD) and the Wisconsin Breast Cancer Database (WBCD). Three indicators, namely prediction accuracy, F-measure and AUC, are used to compare the performance of the five models. The comparative experiments show that the random forest model achieves better performance and adaptability than the other four methods. The model of this study is therefore shown to have clinical and reference value in practical applications.
VL  - 7
IS  - 4
ER  -

Author Information

Yixuan Li

School of Mathematics and Statistics, University of Sheffield, Sheffield, UK

Zixuan Chen

School of Information, Zhejiang University of Finance and Economics, Hangzhou, China