Implementation of Statistical Feature Selection and Feature Extraction on Cancer Classification

Muhammad Azharuddin Arif (1), Zuraini Ali Shah (2)
How to cite (IJASEIT) :

Cancer classification research increasingly relies on microarray technology, which allows the expression of thousands of genes to be measured simultaneously. The technology has been applied successfully to many problems, for example in medical science, and has shown its ability to diagnose patients with specific diseases. It is therefore used to detect diseases such as cancer, where classification is usually a binary problem. The major drawback for classification is that the gene expression data produced by microarrays are high-dimensional. To counter this problem, the important genes should be identified and the dimensionality of the microarray data reduced. In this research, six feature selection methods (Receiver Operating Characteristic curve, Wilcoxon rank sum test, t-statistic, Kruskal-Wallis test statistic, Fisher score, and Gini index) are combined with Principal Component Analysis (feature extraction) to address the high-dimensionality problem and produce a new subset of the original datasets. The new dataset is then classified according to class. Three classifiers (K-Nearest Neighbour, Linear Discriminant Analysis, and Support Vector Machine) are used, and the performance of each is calculated and compared. The experimental results show that, among the feature selection methods, the Wilcoxon rank sum test with Principal Component Analysis for the Linear Discriminant Analysis classifier and the Receiver Operating Characteristic curve with Principal Component Analysis for the Support Vector Machine classifier both achieve the highest correct rate, 96%, outperforming the other feature selection methods.
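
The filter-style feature selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only one of the six criteria (the t-statistic, here in Welch's unequal-variance form, which is an assumption), it runs on synthetic data rather than a real microarray dataset, and the names `welch_t`, `rank_genes`, and `make_toy_data` are hypothetical. The subsequent Principal Component Analysis and classification stages are omitted.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Two-sample t-statistic with unequal variances (Welch's form)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    if se == 0.0:
        return 0.0  # constant gene in both classes: no evidence of a difference
    return (statistics.mean(a) - statistics.mean(b)) / se


def rank_genes(X, y, top_k):
    """Return indices of the top_k genes ranked by |t| between class 0 and class 1."""
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        class0 = [row[g] for row, label in zip(X, y) if label == 0]
        class1 = [row[g] for row, label in zip(X, y) if label == 1]
        scores.append(abs(welch_t(class0, class1)))
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order[:top_k]


def make_toy_data(n_per_class=20, n_genes=200, informative=(3, 17, 42),
                  shift=4.0, seed=0):
    """Synthetic stand-in for expression data: rows are samples, columns are genes.

    Only the genes listed in `informative` differ between the two classes.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in informative:
                    row[g] += shift
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()
selected = rank_genes(X, y, top_k=10)

# Keep only the selected columns; this reduced matrix is what a feature
# extraction step such as PCA would receive next.
X_reduced = [[row[g] for g in selected] for row in X]
```

In this sketch the 200 simulated genes are cut to 10 before any extraction or classification. If the selected genes are later scored by classification accuracy, the ranking should be computed on training samples only, so that the test samples do not influence which genes are kept.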