Application of Cross-Validation Techniques to Handle Overfitting in a Case Study of Decision Tree Implementation for Lung Cancer Prediction

Faurika Faurika; Ahsanun Naseh Khudori; M. Syauqi Haris

doi:10.25181/rt.v2i2.3631

Authors

Faurika Faurika, Institut Teknologi, Sains, dan Kesehatan RS. DR. Soepraoen Kesdam V/BRW
Ahsanun Naseh Khudori, Institut Teknologi, Sains, dan Kesehatan RS. DR. Soepraoen Kesdam V/BRW
M. Syauqi Haris, Institut Teknologi, Sains, dan Kesehatan RS. DR. Soepraoen Kesdam V/BRW


Keywords:

Machine learning, Decision tree, Rules, Cross-validation, Lung cancer

Abstract

Lung cancer is a condition caused by cancer cells growing in the lungs. It leads to a weakened immune system, tumors, and other abnormalities that prevent the body from functioning properly. Lung cancer examination relies on various technologies, such as CT scans and X-rays, but these examinations are relatively expensive and time-consuming. Machine learning can support lung cancer diagnosis: given the large amount of medical data available today, it can recognize patterns in the data and so help make the diagnostic process more effective. This study aims to correct the overfitting found in previous research, which used the decision tree method to predict lung cancer, by applying cross-validation techniques. We use a public dataset from Data World consisting of 25 attributes and 1000 records. The result is a set of rules obtained from the decision tree which, when evaluated, yields 96.7% accuracy, 96.7% precision, 96.7% recall, and a 96.7% F1-score. These results show that the decision tree method performs well in the early prediction of lung cancer and that cross-validation can overcome overfitting in decision trees, producing more general and stable results.
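The role of cross-validation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the number of folds, the library, or the tree settings, so k = 10, the synthetic two-feature data, and the helper names `k_fold_indices`, `fit_stump`, and `cross_validate` are all assumptions made for illustration. A depth-1 decision tree (a stump) stands in for the full decision tree, and only accuracy is computed. The point is that every record is used for testing exactly once and the score is averaged over folds, which gives a more stable estimate than a single train/test split.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_stump(X, y):
    """Fit a depth-1 decision tree: choose the (feature, threshold, labels)
    split that makes the fewest errors on the training data."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                errors = sum(
                    (left if row[f] <= t else right) != label
                    for row, label in zip(X, y)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, left, right)
    _, f, t, left, right = best
    return lambda row: left if row[f] <= t else right

def cross_validate(X, y, k=10, seed=0):
    """Return one held-out accuracy per fold."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except fold i, then test on fold i.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit_stump([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(model(X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return scores

# Synthetic stand-in data (NOT the lung cancer dataset): two ordinal features
# in 1..9; the label depends on the first feature, with about 5% label noise.
rng = random.Random(42)
X = [[rng.randint(1, 9), rng.randint(1, 9)] for _ in range(200)]
y = [int(row[0] >= 5) ^ int(rng.random() < 0.05) for row in X]

scores = cross_validate(X, y, k=10)
mean_accuracy = sum(scores) / len(scores)
print("per-fold accuracy:", [round(s, 2) for s in scores])
print("mean accuracy:", round(mean_accuracy, 3))
```

A large gap between training accuracy and the mean held-out accuracy, or a wide spread across folds, would be the signal of overfitting that cross-validation is meant to expose.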