Analysis of the real EADGENE data set: multivariate approaches and post analysis (open access publication)

Title: Analysis of the real EADGENE data set: multivariate approaches and post analysis (open access publication)
Publication Type: Journal Article
Year of Publication: 2007
Authors: Sorensen, P, Bonnet, A, Buitenhuis, B, Closset, R, Déjean, S, Delmas, C, Duval, M, Glass, L, Hedegaard, J, Hornshoj, H, Hulsegge, I, Jaffrezic, F, Jensen, K, Jiang, L, de Koning, DJ, Le Cao, KA, Nie, H, Petzl, W, Pool, MH, Robert-Granié, C, San Cristobal, M, Lund, MS, van Schothorst, EM, Schuberth, HJ, Seyfert, HM, Tosser-Klopp, G, Waddington, D, Watson, M, Yang, W, Zerbe, H
Journal: Genet Sel Evol
Date Published: Nov-Dec
Keywords: *Databases, Animals, Bovine/genetics, Cattle/genetics, Data Interpretation, Domestic/genetics, Escherichia coli Infections/genetics/veterinary, Europe, Female, Gene Expression Profiling/*statistics & numerical data, Genetic, Host-Pathogen Interactions/genetics, Mastitis, Multivariate Analysis, Oligonucleotide Array Sequence Analysis/*statistics & numerical data, Staphylococcal Infections/genetics/veterinary, Statistical

The aim of this paper was to describe, and when possible compare, the multivariate methods used by the participants in the EADGENE WP1.4 workshop. The first approach was for class discovery and class prediction using evidence from the data at hand. Several teams used hierarchical clustering (HC) or principal component analysis (PCA) to identify groups of differentially expressed genes with a similar expression pattern across time points and infective agents (E. coli or S. aureus). The main result from these analyses was that HC and PCA were able to separate tissue samples taken at 24 h following E. coli infection from the other samples. The second approach identified groups of differentially co-expressed genes by finding clusters of genes that were highly correlated when animals were infected with E. coli but not correlated more than expected by chance when the infective pathogen was S. aureus. The third approach looked at differential expression of predefined gene sets. Gene sets were defined based on information retrieved from biological databases such as Gene Ontology. Based on these annotation sources, the teams used either the GlobalTest or the Fisher exact test to identify differentially expressed gene sets. The main result from these analyses was that gene sets involved in immune defence responses were differentially expressed.
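
As an illustration of the third approach, the following is a minimal sketch of a one-sided Fisher exact test for over-representation of a gene set among differentially expressed genes. It is not the workshop teams' implementation: the function name `enrichment_p_value`, its parameters, and the toy counts are assumptions made for this example, and the test is written as a hypergeometric upper tail using only the Python standard library.

```python
from math import comb


def enrichment_p_value(n_total, n_set, n_de, n_overlap):
    """One-sided Fisher exact test for gene-set over-representation.

    Returns P(X >= n_overlap), where X is the number of gene-set members
    among n_de differentially expressed genes, given n_total genes on the
    array of which n_set belong to the gene set (hypergeometric model).
    All names here are hypothetical and chosen for illustration only.
    """
    max_overlap = min(n_set, n_de)
    # Number of ways to choose the differentially expressed genes overall.
    denominator = comb(n_total, n_de)
    # Sum the probabilities of observing n_overlap or more set members.
    tail = sum(
        comb(n_set, k) * comb(n_total - n_set, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / denominator


# Toy counts (made up): 20 genes on the array, 5 in the gene set,
# 4 differentially expressed, 3 of which fall in the gene set.
p = enrichment_p_value(n_total=20, n_set=5, n_de=4, n_overlap=3)
print(f"p = {p:.4f}")
```

A small p-value suggests that the gene set contains more differentially expressed genes than expected by chance. In practice the test would be repeated for every gene set, so some correction for multiple testing would normally be applied to the resulting p-values.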