Implementation of Statistical Feature Selection and Feature Extraction on Cancer Classification
How to cite (IJASEIT) :
Cancer classification research now relies on advanced technologies such as microarrays. A microarray
allows the expression of thousands of genes to be measured simultaneously, and the technology has been applied successfully
to many problems, for example in medical science. Microarrays have also shown the ability to diagnose patients who have a specific
disease. The technology is therefore used to detect diseases such as cancer, which is usually treated as a binary-class problem. The major
drawback for classification is that the gene expression data produced by a microarray are high-dimensional. To counter this problem, the
important genes should be identified and the dimensionality of the microarray data reduced. In this research, six feature selection methods
(Receiver Operating Characteristic curve, Wilcoxon rank sum test, t-statistic, Kruskal-Wallis test statistic, Fisher score, and Gini index) are
each combined with Principal Component Analysis (feature extraction) to solve the high-dimension problem and produce a new, reduced
subset of the original datasets. The new dataset is then classified according to class. Three classifiers (K-Nearest Neighbour, Linear
Discriminant Analysis, and Support Vector Machine) are used in this research, and the performance of each classifier is calculated and
compared. The experimental results show that, among the feature selection methods, two combinations achieve the highest correct rate of
96% and outperform the others: the Wilcoxon rank sum test with Principal Component Analysis for the Linear Discriminant Analysis classifier,
and the Receiver Operating Characteristic curve with Principal Component Analysis for the Support Vector Machine classifier.
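
The feature selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the function names (`fisher_score`, `auc_score`, `top_k_genes`), and the choice of k are assumptions made for illustration. The Fisher score is written in its common two-class form, and the ROC-based score uses the area under the curve, which equals the Wilcoxon rank sum (Mann-Whitney U) statistic divided by the product of the two class sizes. Principal Component Analysis and the classifiers would then be applied to the selected genes; those steps are not shown here.

```python
import random
from statistics import mean, variance


def split_by_class(values, labels):
    """Split one gene's expression values into class-0 and class-1 lists."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return a, b


def fisher_score(values, labels):
    """Two-class Fisher score (assumed form): squared mean difference
    divided by the sum of the within-class sample variances."""
    a, b = split_by_class(values, labels)
    return (mean(a) - mean(b)) ** 2 / (variance(a) + variance(b))


def auc_score(values, labels):
    """ROC-based score: distance of the AUC from 0.5 (chance level).
    The AUC is computed pairwise, i.e. Mann-Whitney U / (n0 * n1)."""
    a, b = split_by_class(values, labels)
    wins = sum((x > z) + 0.5 * (x == z) for x in b for z in a)
    auc = wins / (len(a) * len(b))
    return abs(auc - 0.5)


def top_k_genes(X, labels, score, k):
    """Rank genes (columns of X) by a score and return the k best indices."""
    n_genes = len(X[0])
    scores = [score([row[j] for row in X], labels) for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: 40 samples, 50 genes, binary class labels.
# Only genes 0 and 1 differ between the classes (mean shift of 5).
random.seed(0)
labels = [0] * 20 + [1] * 20
X = []
for y in labels:
    row = [random.gauss(0.0, 1.0) for _ in range(50)]
    if y == 1:
        row[0] += 5.0
        row[1] += 5.0
    X.append(row)

selected_fisher = top_k_genes(X, labels, fisher_score, k=2)
selected_auc = top_k_genes(X, labels, auc_score, k=2)
print("Fisher score selects genes:", selected_fisher)
print("ROC/AUC score selects genes:", selected_auc)
```

On real microarray data the ranking would be computed on the training samples only, so that the reported correct rate is not inflated by information from the test samples.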