"Every picture tells a story: Visual data analysis using the cluster heat map"
Speaker: Jim Bezdek
This talk is about the cluster heat map (also known as the reordered dissimilarity image), a very popular visual data analysis method in the bioinformatics community. I will start with definitions and examples of the three canonical problems of cluster analysis: tendency assessment, clustering, and cluster validity. Then I will give a short history of visual approaches to these three problems, which began in 1873. Next come definitions and examples of the VAT (visual assessment of tendency) and recursive, improved VAT (iVAT) approaches for building a reordered dissimilarity image from relational data. This is followed by scalable VAT (sVAT) for arbitrarily large square data. Finally, two versions of coVAT for rectangular dissimilarity data are discussed. coVAT is arguably the most interesting of the methods discussed, as it provides evidence for the existence of clusters for all four of the clustering problems associated with rectangular relational data. I will conclude with an example illustrating the use of coVAT for precluster analysis of data from a microarray experiment: expression values of 517 genes in the presence of 18 treatments.
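The VAT reordering mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly described Prim-like ordering: start from an object involved in the largest dissimilarity, then repeatedly append the unordered object that is closest to any object already ordered. The names vat_order and reorder and the toy data are hypothetical, chosen only for illustration; this is not the speaker's implementation, and iVAT, sVAT and coVAT are not covered.

```python
def vat_order(D):
    """Return a VAT-style ordering of the objects of a square dissimilarity matrix D.

    D is a list of lists, assumed symmetric with a zero diagonal.
    """
    n = len(D)
    # Start from an object that takes part in the largest dissimilarity.
    first = max(range(n), key=lambda i: max(D[i]))
    order = [first]
    remaining = set(range(n)) - {first}
    # best[j]: smallest dissimilarity between j and any already-ordered object.
    best = {j: D[first][j] for j in remaining}
    while remaining:
        # Append the unordered object closest to the ordered set (ties: lowest index).
        j = min(remaining, key=lambda q: (best[q], q))
        order.append(j)
        remaining.remove(j)
        for q in remaining:
            if D[j][q] < best[q]:
                best[q] = D[j][q]
    return order


def reorder(D, order):
    """Permute rows and columns of D to obtain the reordered dissimilarity image."""
    return [[D[p][q] for q in order] for p in order]


# Toy data: five points on a line forming two groups, near 0 and near 10.
points = [0.0, 10.0, 1.0, 11.0, 0.5]
D = [[abs(a - b) for b in points] for a in points]
order = vat_order(D)
D_star = reorder(D, order)
print(order)
```

Displayed as an image, D_star would show the two groups as dark blocks along the diagonal, because objects that are close to each other end up adjacent in the ordering.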
Jim received the PhD in Applied Mathematics from Cornell University in 1973. Jim is past president of NAFIPS (North American Fuzzy Information Processing Society), IFSA (International Fuzzy Systems Association) and the IEEE CIS (Computational Intelligence Society); founding editor of the Int'l. Jo. Approximate Reasoning and the IEEE Transactions on Fuzzy Systems; Life fellow of the IEEE and IFSA; and a recipient of the IEEE 3rd Millennium, IEEE CIS Fuzzy Systems Pioneer, and IEEE technical field award Rosenblatt medals. Jim's interests: woodworking, optimization, motorcycles, pattern recognition, cigars, clustering in very large data, fishing, co-clustering, blues music, wireless sensor networks, poker and visual clustering. Jim retired in 2007, and will be coming to a university near you soon.
"Flexible modeling and artificial neural networks for the analysis of failure time data"
Speaker: Elia Biganzoli, Section of Medical Statistics, Biometry and Bioinformatics "Giulio A. Maccacaro", Department of Clinical and Community Health Sciences, University of Milan
Campus Cascina Rosa, Fondazione IRCCS Istituto Nazionale Tumori
Via A.Vanzetti 5, 20133 Milano (Italy)
The aim of the talk is to introduce and discuss the problem of modeling the hazard function of censored failure time data while accounting for the information provided by putative prognostic variables, possibly measured on a quantitative scale. A number of methodological aspects will be treated, with particular reference to the role of spline functions and feed-forward artificial neural networks.
Linear and non-linear flexible regression analysis techniques, such as those based on splines and feed-forward artificial neural networks (FFANNs), have been proposed for the analysis of survival time data. Among the functions that describe survival, the hazard is of particular biological interest for the study of disease dynamics. Starting from generalized linear models (GLM) with Poisson or binomial errors and piecewise parametric or grouped time survival models, we proposed their extension as FFANNs, allowing for non-linear and non-proportional effects of covariates. This led to Partial Logistic Artificial Neural Network (PLANN) discrete time models and their extension to the competing risks framework (PLANNCR). They can provide relevant indications on the underlying risk patterns, thus substantially contributing to individual risk profiling. According to standard practice, penalized estimation was adopted to modulate model complexity. Statistical approaches for choosing the size of the weight decay term, based on the expected test error, were proposed, namely the Network Information Criterion (NIC), the ICOMP criterion and Non-Linear Cross-Validation (NLCV). In further developments, model selection was performed according to a Bayesian extension or using Genetic Algorithms. Clinical examples in cancer are provided on the evaluation of the shape of the cause-specific hazard function for disease recurrence and on the prognostic effect of tumor markers. The talk is addressed to all researchers who are interested in evaluating failure dynamics in survival data analysis according to the role of covariates. The necessity of integrating the methodologies of biological, clinical and statistical research in the assessment of prognosis will therefore be stressed, with a view to interdisciplinary applications.
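As a rough illustration of the grouped (discrete) time setting described above, the following Python sketch shows one common way to prepare censored data for a discrete-time hazard model: each subject contributes one row per time interval at risk, with a binary indicator marking the interval in which the event occurs. The function names, the record layout and the toy values are assumptions made for illustration only; the sketch does not reproduce the PLANN network itself or the speaker's code.

```python
def person_period(records):
    """Expand (last_interval, event, covariates) records into one row per interval at risk.

    Intervals are numbered from 1. The response is 1 only in the last observed
    interval of a subject who experienced the event; censored subjects get all 0s.
    """
    rows = []
    for last_interval, event, covariates in records:
        for t in range(1, last_interval + 1):
            y = 1 if (event and t == last_interval) else 0
            rows.append((t, *covariates, y))  # interval, covariates..., response
    return rows


def survival_from_hazards(hazards):
    """Turn discrete hazards h_1, h_2, ... into survival S_t = prod_{l<=t} (1 - h_l)."""
    surv, current = [], 1.0
    for h in hazards:
        current *= 1.0 - h
        surv.append(current)
    return surv


# Hypothetical toy data: one event in interval 3, one subject censored after interval 2.
records = [(3, True, (0.5,)), (2, False, (1.2,))]
rows = person_period(records)
for row in rows:
    print(row)

print(survival_from_hazards([0.5, 0.5]))
```

A binary-response model (logistic regression or, in the spirit of the abstract, a feed-forward network with a logistic output) fitted to such rows estimates the conditional probability of failure in each interval, which can then be converted to a survival curve as in survival_from_hazards.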
Elia Mario Biganzoli. Born 18 April 1966, Tradate (VA). Faculty of Sciences, University of Milano, degree in Biology 1988 Summa Cum Laude. Post-graduate degree, PhD in Medical Statistics in 1994-1996 at University, Milan, Summa Cum Laude. 1989-94 Research fellow, Institute of Pharmacology, Faculty of Sciences - Giovanni Lorenzini Foundation, Milano. 1989-94 Consultant Researcher for drug discovery, assay development, high throughput screening and data analysis, at the Lepetit Research Center (Marion Merrell Dow, USA), Gerenzano (VA). 1995- Senior Biostatistician, Unit of Medical Statistics and Biometry, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan. 2000-2007 Contract Professor of Biostatistics, Faculty of Statistics, University of Milano-Bicocca. 2000-2004 Executive Committee of the International Society for Clinical Biostatistics. 2004-2008 Task Force Leader Evaluation & Benchmarking, BIOPATTERN Network of Excellence FP6 project "Computational Intelligence for Biopattern Analysis in Support of eHealthcare". 2005-2009 Board of the International Biometric Society, Italian Region. 2007- Associate Professor in Medical Statistics at the Faculty of Medicine of the University of Milan. 2009 Cofounder of the IEEE Neural Network Society Special Interest Group Biopattern.
He has led and participated in national and international projects with Associazione Italiana per la Ricerca sul Cancro, National Research Council-Polish Academy of Sciences, Italian Ministry of Health, Italian Ministry of the University and Research, and the European Commission. His main research fields concern statistical methods for survival analysis and biological assay development. He participated in the planning of diagnostic and prognostic studies in cancer, cardiovascular diseases, rheumatology and hematology, and in the analysis of their results, with special interest in molecular biomarkers and bioprofiles. He developed statistical approaches for the extension of generalized linear models with artificial neural networks and splines for the flexible analysis of censored survival data. His main research focus with his group has been to join biostatistics with biomedical informatics through multivariate analysis and pattern recognition approaches in oncology. He has also served as a biostatistical consultant for the application of bioinformatics in drug discovery and development in industrial biotechnology and pharmaceutical research (Nerviano-MS, Italy; Themis-ICTA, France).
He is the author of more than 130 full scientific publications in international journals, books, technical reports and international conference proceedings. With his research group on Biostatistics for Bioinformatics and Clinical Translational Research (BBCTR) of the Fondazione IRCCS Istituto Nazionale dei Tumori (INT) and University of Milano, he participated in national and European consortia such as the (ISS-ACC) RNBIO project National Network for Bioinformatics, the FP6 Network of Excellence BIOPATTERN (FP6-2002-IST-1-N° 508803), involving more than 30 EU centers focused on biomedical informatics, and the FP6 BIOPTRAIN Marie Curie Early Stage Training Fellowship. In particular, he was the leader of the Biostatistics for Bioinformatics Group of the RNBIO ISS ACC Oncology Bioinformatics Network. He is co-organiser of the EU Lifelong Learning Programme Erasmus Intensive Programme (IP) 2010-2013: "Interdisciplinary Approaches to Microarray Data Analysis" with the universities of Salerno, Napoli, Warwick and Helsinki. He was panel discussant with FDA delegates in the 3rd International Symposium on Integrated Biomarkers in Cardiovascular Diseases (Seattle, WA, USA / July 9-11, 2008). He was the main organizer of the 8th International Meeting on Computational Intelligence for Bioinformatics and Biostatistics CIBB 2011 (Gargnano del Garda, BS Italy / June 30-July 2, 2011). He is on the Steering Committee of the CIBB conferences.
"Predictive modeling and pattern analysis of high-dimension data with JMP Genomics/Clinical"
Speaker: Doug Robinson
Predictive modeling with high-dimensional data sets presents a challenge, as there are often too many predictors to serve as candidates. One strategy is to reduce the number of candidates with a statistical test such as ANOVA on the entire data set. However, this a priori filtering ignores the fact that some of the selected candidates may result from properties of the data set rather than from the variable in question. With JMP Genomics or JMP Clinical from SAS, we will show how predictor reduction and candidate selection can be performed within each cross-validation iteration in a simple-to-use, point-and-click interface. Additionally, multiple models can be run concurrently, and with visual results you can easily determine the best model and candidate predictors for the data. We will also demonstrate correlation of microRNA and mRNA across all samples in the data set, with all possible combinations of the two marker types. This has particular value, as most of the strong correlations are unrelated to the miRNA target and are likely downstream effects.
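The idea of repeating predictor reduction inside each cross-validation iteration can be sketched in plain Python. This is only an illustration of the principle, not of JMP Genomics or JMP Clinical: the filter here ranks features by the absolute difference of class means (a simple stand-in for an ANOVA test), the classifier is a nearest-centroid rule, and all function names and the toy data are hypothetical.

```python
def top_k_features(X, y, k):
    """Rank features by |mean(class 0) - mean(class 1)| and return the k best indices."""
    def mean(values):
        return sum(values) / len(values)
    scores = []
    for f in range(len(X[0])):
        a = [row[f] for row, label in zip(X, y) if label == 0]
        b = [row[f] for row, label in zip(X, y) if label == 1]
        scores.append(abs(mean(a) - mean(b)))
    return sorted(range(len(scores)), key=lambda f: -scores[f])[:k]


def fit_centroids(X, y, feats):
    """Compute one centroid per class, using only the selected features."""
    centroids = {}
    for label in (0, 1):
        rows = [[row[f] for f in feats] for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    return centroids


def predict(centroids, row, feats):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    v = [row[f] for f in feats]
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])))


def cv_accuracy(X, y, k, n_folds=5):
    """Cross-validated accuracy with feature selection redone in every fold."""
    correct = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Selection sees the training rows only, so held-out rows cannot leak in.
        feats = top_k_features(X_train, y_train, k)
        centroids = fit_centroids(X_train, y_train, feats)
        correct += sum(predict(centroids, X[i], feats) == y[i] for i in test_idx)
    return correct / len(X)


# Toy data: feature 0 separates the classes, feature 1 is uninformative.
y = [(i // 5) % 2 for i in range(20)]
X = [[10 * y[i], (i * 7) % 3] for i in range(20)]
accuracy = cv_accuracy(X, y, k=1)
print(accuracy)
```

Filtering once on the full data set before cross-validation would let the held-out samples influence which predictors are kept, which tends to make the estimated accuracy too optimistic; selecting inside each fold avoids that.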
Douglas Robinson is an Applications Specialist for JMP Life Sciences software. Prior to joining the JMP Genomics team at SAS late in 2005, he worked in field applications support at Dharmacon and at Affymetrix, where he managed the Field Application Support team. Dr. Robinson has also held several post-doctoral positions in molecular biology and physiology. He received his Ph.D. in Physiology from Dartmouth College and his undergraduate B.S. is from Arizona State University.