Multiclass classification and gene selection with a stochastic algorithm

Title: Multiclass classification and gene selection with a stochastic algorithm
Publication Type: Journal Article
Year of Publication: 2009
Authors: Le Cao, KA, Bonnet, A, Gadat, S
Journal: Computational Statistics and Data Analysis

Microarray technology allows for the monitoring of thousands of gene expressions in various biological conditions, but most of these genes are irrelevant for classifying these conditions. Feature selection is consequently needed to help reduce the dimension of the variable space. Starting from the application of the stochastic meta-algorithm “Optimal Feature Weighting” (OFW) for selecting features in various classification problems, focus is made on the multiclass problem that wrapper methods rarely handle. From a computational point of view, one of the main difficulties comes from the unbalanced classes situation that is commonly encountered in microarray data. From a theoretical point of view, very few methods have been developed so far to minimize the classification error made on the minority classes. The OFW approach is developed to handle multiclass problems using CART and one-vs-one SVM classifiers. Comparisons are made with other multiclass selection algorithms such as Random Forests and the filter method F-test on five public microarray data sets with various complexities. Statistical relevancy of the gene selections is assessed by computing the performances and the stability of these different approaches and the results obtained show that the two proposed approaches are competitive and relevant to selecting genes classifying the minority classes. Application to a pig folliculogenesis study follows and a detailed interpretation of the genes that were selected shows that the OFW approach answers the biological question.
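
As a rough illustration of the F-test filter baseline mentioned in the abstract, and of why per-class error matters with unbalanced classes, here is a minimal pure-Python sketch. It is not the OFW algorithm and not the authors' code: the function names (`f_statistic`, `select_genes`, `per_class_error`) and the toy expression matrix are hypothetical, introduced only for this example. It assumes the filter ranks each gene independently by a one-way ANOVA F statistic.

```python
from collections import defaultdict


def f_statistic(values, labels):
    """One-way ANOVA F statistic for one gene across the class labels."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_genes(expression, labels, n_genes):
    """Rank genes (columns of `expression`) by F statistic, keep the top ones."""
    n_cols = len(expression[0])
    scores = [f_statistic([row[j] for row in expression], labels)
              for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: scores[j], reverse=True)
    return ranked[:n_genes]


def per_class_error(y_true, y_pred):
    """Error rate computed separately for each class."""
    totals, errors = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        errors[t] += (t != p)
    return {c: errors[c] / totals[c] for c in totals}


# Toy data (made up for illustration): 7 samples, 3 genes,
# unbalanced classes A (4 samples), B (2 samples), C (1 sample).
expression = [
    [1.0, 5.0, 0.2],
    [1.1, 4.0, 0.9],
    [0.9, 6.0, 0.5],
    [1.0, 5.5, 0.1],
    [3.0, 5.0, 0.4],
    [3.1, 4.5, 0.8],
    [5.0, 5.2, 0.3],
]
labels = ["A", "A", "A", "A", "B", "B", "C"]

top_gene = select_genes(expression, labels, 1)
errors_by_class = per_class_error(["A", "A", "B", "C"], ["A", "A", "B", "A"])
```

In this toy example only the first gene separates the three classes, so it is ranked first. The per-class error shows how a classifier can have a low overall error (one mistake in four predictions) while misclassifying every sample of the minority class C, which is the situation the abstract highlights.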