Fisher linear discriminant analysis as a method for protein classification

Pavel Schlesinger

Charles University in Prague, Faculty of Mathematics and Physics,
Institute of Formal and Applied Linguistics,
Malostranské náměstí 25, CZ-118 00 Praha 1
schlesinger@ufal.ms.mff.cuni.cz

Keywords: Fisher linear discriminant analysis, protein classification, microarray prediction

Statistical decision rules can be used in genetics, more precisely in microarray analysis, to classify proteins into several classes according to information taken from other measured variables.

This contribution concerns the use of Fisher linear discriminant analysis on real data consisting of 268 proteins, described by 400 variables and classified into 42 classes. A classification task of this kind is typical for microarray data, where the number of variables is higher than the number of observed objects (also known as the $p \gg n$ problem). The data have been investigated several times before, e.g. in [1], [2] and [3]; the project web page for these studies, which also lists other results, is http://www.dkfz.de/biostatistics/protein/DEF.html. Many linear and nonlinear methods were applied, and the lowest error rate was reached with a nonlinear method, Support Vector Machines.
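For reference, the difficulty can be made concrete with the standard textbook form of Fisher's criterion, which the abstract does not spell out. For $K$ classes $C_k$ with $n_k$ objects each ($n = \sum_k n_k$), class means $\bar{x}_k$ and overall mean $\bar{x}$, the between-class and within-class scatter matrices and the criterion are

$$S_B = \sum_{k=1}^{K} n_k (\bar{x}_k - \bar{x})(\bar{x}_k - \bar{x})^\top, \qquad S_W = \sum_{k=1}^{K} \sum_{i \in C_k} (x_i - \bar{x}_k)(x_i - \bar{x}_k)^\top, \qquad J(w) = \frac{w^\top S_B w}{w^\top S_W w}.$$

The maximizers of $J$ solve the generalized eigenproblem $S_B w = \lambda S_W w$, which has at most $K - 1$ nonzero eigenvalues. Since the centred observations of class $k$ span at most $n_k - 1$ dimensions, $\operatorname{rank}(S_W) \le n - K$. With the dimensions stated above,

$$\operatorname{rank}(S_W) \le n - K = 268 - 42 = 226 < 400 = p,$$

so $S_W$ is singular and cannot be inverted directly, and at most $K - 1 = 41$ discriminant directions exist.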

It will be shown that, with a good implementation of Fisher discriminants, this linear method can outperform the nonlinear ones. A short comparison with the classical implementation, the lda() function in R (http://www.r-project.org), will be shown as well.
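The abstract does not say what makes an implementation of Fisher discriminants "good". As one possible illustration, and not the author's method, the Python sketch below (NumPy only) copes with a singular within-class scatter matrix $S_W$ by shrinkage: it adds a multiple of the identity to $S_W$ before solving $S_B w = \lambda S_W w$, where $S_B$ is the between-class scatter matrix. The function names `fisher_lda_fit` and `fisher_lda_predict`, the shrinkage parameter `gamma` and the nearest-centroid rule in discriminant space are hypothetical choices made for this sketch.

```python
import numpy as np


def fisher_lda_fit(X, y, gamma=0.1):
    """Fit multiclass Fisher discriminants with a shrunken within-class scatter.

    X: (n, p) data matrix, y: (n,) class labels. gamma is a hypothetical
    shrinkage parameter; gamma > 0 keeps S_W invertible when p > n - K.
    """
    classes = np.unique(y)
    n, p = X.shape
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((p, p))
    S_B = np.zeros((p, p))
    class_means = []
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        class_means.append(mc)
        D = Xc - mc                       # centred observations of class c
        S_W += D.T @ D
        d = (mc - overall_mean)[:, None]  # column vector of mean offset
        S_B += len(Xc) * (d @ d.T)
    # Shrinkage: add gamma times the average eigenvalue of S_W to the diagonal.
    scale = np.trace(S_W) / p if np.trace(S_W) > 0 else 1.0
    S_W_reg = S_W + gamma * scale * np.eye(p)
    # Reduce S_B w = lambda S_W_reg w to a symmetric problem via Cholesky:
    # S_W_reg = L L^T, v = L^T w, (L^{-1} S_B L^{-T}) v = lambda v.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_W_reg))
    M = L_inv @ S_B @ L_inv.T
    M = (M + M.T) / 2                     # remove round-off asymmetry
    evals, V = np.linalg.eigh(M)          # eigenvalues in ascending order
    k = min(len(classes) - 1, p)          # at most K - 1 useful directions
    top = np.argsort(evals)[::-1][:k]
    W = L_inv.T @ V[:, top]               # back-transform: w = L^{-T} v
    centroids = np.array(class_means) @ W
    return classes, W, centroids


def fisher_lda_predict(model, X):
    """Assign each row of X to the class with the nearest projected centroid."""
    classes, W, centroids = model
    Z = X @ W
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d2, axis=1)]
```

In this sketch `gamma` sets the trade-off: values near zero stay close to classical Fisher discriminants (and fail when $S_W$ is singular), while large values make the rule behave approximately like a nearest-centroid classifier in the original variables. In practice it would be tuned, for example by cross-validation.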

References:

[1] L. Edler and J. Grassmann (1999): Protein fold prediction is a new field for statistical classification and regression. In: F. Seillier-Moiseiwitsch (Ed.): Statistics in Molecular Biology and Genetics. IMS Lecture Notes Monograph Series, vol. 33, 288-313

[2] L. Edler, J. Grassmann and S. Suhai (2001): Role and results of statistical methods in protein fold class prediction. Mathematical and Computer Modelling, vol. 33, 1401-1417

[3] F. Markowetz, L. Edler and M. Vingron (2003): Support Vector Machines for Protein Fold Class Prediction. Biometrical Journal, vol. 45, issue 3, 377-389

2005-05-23