Université Lille 1, FRANCE
Christophe Biernacki is Professor of Statistics at the University of Lille (France), scientific leader of the Inria Modal research team and scientific head of the Inria research centre in Lille. His research focuses on model-based classification and clustering. He specializes in complex data, including continuous, categorical, ordinal, rank, interval and missing data. He co-developed the MIXMOD software for mixture models, now a leading software package in the domain.
Unifying Data Units and Models in (Co-)Clustering
Statisticians are well aware that the outcome of any modelling process (exploration, prediction) is wholly dependent on the data units, to the extent that it should be impossible to provide a statistical outcome without specifying the couple (unit, model). In this talk, this general principle is formalized with a particular focus on model-based clustering and co-clustering of possibly mixed data types (continuous and/or categorical and/or counting features), which is also an opportunity to revisit what the related data units are. Such a formalization raises three important points: (i) the couple (unit, model) is not identifiable, so that different unit/model interpretations of the same overall modelling process are always possible; (ii) combining different “classical” units with different “classical” models offers a cheap, wide and meaningful enlargement of the family of modelling processes designed by the couple (unit, model); (iii) if necessary, this couple, up to the non-identifiability property, can be selected by any traditional model selection criterion. Experiments on real data sets illustrate in detail the practical benefits of these three points. This is joint work with Alexandre Lourme (University of Bordeaux).
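A minimal sketch of the unit/model non-identifiability described above (my own illustration, not the speaker's formalism): fitting a log-normal model to raw positive-valued data is, as a description of the generating process, equivalent to fitting a normal model to the log-transformed units, so the two couples (raw unit, log-normal model) and (log unit, normal model) cannot be told apart.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=1.0, sigma=0.5, size=1000)  # positive-valued data

# Couple 1: log-normal model on the raw unit x
# (with floc=0: s is sigma, scale is exp(mu))
s, loc, scale = stats.lognorm.fit(x, floc=0)

# Couple 2: normal model on the transformed unit log(x)
mu, sigma = stats.norm.fit(np.log(x))

# Both couples recover the same fitted parameters
print(np.allclose([s, np.log(scale)], [sigma, mu], rtol=1e-4))  # True
```

The maximum-likelihood fits coincide exactly up to the reparametrization, which is the sense in which a model-selection criterion can compare couples only up to this non-identifiability.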
University of Glasgow, SCOTLAND
Adrian is Professor of Statistics at the University of Glasgow. One of his principal research themes is flexible methods of regression, and he is joint author of a book on smoothing techniques. He is involved in a wide variety of environmental applications in this area, most recently focused on spatiotemporal data; current external collaborators include the Scottish Environment Protection Agency and Shell Global Solutions. A second major research theme is statistical models for three-dimensional shapes, with particular focus on the human face, from biological and surgical perspectives. He also has a strong interest in the use of graphics to communicate the uncertainty associated with data and statistical models. Adrian has played a variety of roles within the Royal Statistical Society: he has chaired the Committee of Professors of Statistics in the UK and has held various editorial positions, including joint editorship of the RSS journal Applied Statistics.
Statistics with a human face
Three-dimensional surface imaging, through laser-scanning or stereo-photogrammetry, provides high-resolution data defining the surface shape of objects. Human faces are of particular interest and there are many biological and anatomical applications, including assessing the success of facial surgery and investigating the possible developmental origins of some adult conditions. An initial challenge is to structure the raw images by identifying features of the face. Ridge and valley curves provide a very good intermediate level at which to approach this, as these provide a good compromise between informative representations of shape and simplicity of structure. Some of the issues involved in analysing data of this type will be discussed and illustrated. Modelling issues include simple comparison of groups, the measurement of asymmetry and longitudinal patterns of shape change. This last topic is relevant at short scale in facial animation, medium scale in individual growth patterns, and very long scale in phylogenetic studies.
Johannes Kepler Universität Linz, AUSTRIA
Bettina Grün is Associate Professor of Applied Statistics; she holds a Master's degree and a PhD in Applied Statistics from the Vienna University of Technology. Since her PhD she has worked on different aspects of finite mixture modelling. These include theoretical issues such as the identifiability of mixtures of multinomial logit models, the flexible implementation of maximum likelihood estimation within the EM framework in R, available in the package flexmix, and prior specification in Bayesian analysis to induce sparse solutions and to enable identified estimation of mixtures of mixtures. Bettina Grün is strongly involved with the R project; she is a former Editor of the R Journal and an ordinary member of the R Foundation. She also currently serves as Editor-in-Chief of the Journal of Statistical Software, an open-access outlet for publishing open-source statistical software.
Bayesian Model-Based Clustering with Flexible and Sparse Priors
Finite mixtures are a standard tool for clustering observations. However, selecting a suitable number of clusters, identifying cluster-relevant variables and accounting for non-normal cluster shapes are still challenging issues in applications. Within a Bayesian framework we indicate how suitable prior choices can help to solve these issues. We achieve this by mainly considering prior distributions that are conditionally conjugate or can be reformulated as hierarchical priors, thus allowing for simple estimation using MCMC methods with data augmentation.
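As a rough illustration of how a sparse prior on the mixture weights can sidestep an explicit choice of the number of clusters, here is a sketch using scikit-learn's BayesianGaussianMixture; note that it relies on variational inference rather than the MCMC data-augmentation approach of the talk, and all parameter values below are illustrative.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Two well-separated Gaussian clusters in 2D
X = np.vstack([rng.normal(0, 1, (150, 2)), rng.normal(6, 1, (150, 2))])

# Deliberately over-specify the number of components; a small
# weight concentration prior shrinks the weights of unused components
bgm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.01,
    max_iter=500,
    random_state=0,
).fit(X)

# Effective number of clusters: components with non-negligible weight
active = int(np.sum(bgm.weights_ > 0.01))
print(active)
```

The sparse prior empties the superfluous components, so the effective number of clusters is read off the fitted weights rather than chosen in advance.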
University of Cagliari, ITALY
Francesco Mola is Full Professor of Statistics and Vice-Chancellor at the University of Cagliari, where he was Dean of the Department of Economics (2003-2007) and Dean of the Department of Business and Economics (2012-2015). He holds a PhD in Computational Statistics and Data Analysis from the University of Naples Federico II. His research interests cover various areas: Multivariate Data Analysis, Statistical Learning, Data Science and Computational Statistics. He has published more than seventy papers in international journals, conference proceedings and revised monographs dealing with, among other topics, advanced nonparametric regression and classification, causal inference, big data and image segmentation. He served on the council of the International Association for Statistical Computing (IASC, 2011-15) and is a member of the International Statistical Institute (ISI), the International Society for Business and Industrial Statistics (ISBIS), the International Federation of Classification Societies (IFCS) and the Italian Statistical Society (SIS). He is President Elect of CLADAG (the Classification and Data Analysis Group).
Grinding massive information into feasible statistics: current challenges and opportunities for data scientists
Massive amounts of data that can be used to make quicker, better and more intelligent decisions, and thereby create business value, are nowadays available to companies and organizations. Terms like big data, data science, analytics, artificial intelligence and machine learning are very common in both academia and industry. All these areas of research are oriented towards answering the increasing demand for understanding trends and/or discovering patterns in data. Usually, collected data are massive and uncertain due to noise, incompleteness and inconsistency. The main goal of a statistician/data scientist is therefore to turn massive data into feasible information, the latter understood as information able to describe an observed phenomenon efficiently, to give indications about its future evolution, and to provide useful insights for the ongoing decision process. All these considerations suggest that the role of the statistician/data scientist has evolved considerably in recent years.
In my presentation, after a brief description of the scenario summarized above, I will discuss three examples/case studies concerning image validation, hotel reputation and social media popularity, aiming to contribute to the debate on turning the enormous amount of available data into feasible statistics. In all cases, ad-hoc but standard classification methods are used to obtain information that is highly feasible and adds value to a decision process.
University of Cambridge, UK
Sylvia Richardson is Director of the MRC Biostatistics Unit and has held a Research Professorship at the University of Cambridge since 2012. Prior to this, Sylvia held the Chair of Biostatistics in the Department of Epidemiology and Biostatistics at Imperial College London from 2000, and was formerly Directeur de Recherches at the French National Institute for Medical Research, INSERM, where she held research positions for 20 years. In 2009, Sylvia was awarded the Guy Medal in Silver of the Royal Statistical Society and a Royal Society Wolfson Research Merit Award. She is a Fellow of the Institute of Mathematical Statistics and of the International Society for Bayesian Analysis, and was elected a Fellow of the Academy of Medical Sciences in 2016. She was recently nominated as President Elect of the Royal Statistical Society. Sylvia has worked extensively in many areas of biostatistics research and has made important contributions to the statistical modelling of complex biomedical data, in particular from a Bayesian perspective. Her work has contributed to progress in epidemiological understanding and has covered spatial modelling and disease mapping, measurement error problems, mixture and clustering models, as well as integrative analysis of observational data from different sources. Her recent research has focused on the modelling and analysis of large data problems such as those arising in genomics. She is particularly interested in developing new analytical strategies for integrative and translational genomics, including statistical methodology for risk stratification, discovering disease subtypes, and large-scale hierarchical analysis of high-dimensional biomedical and multi-omics data.
Statistical challenges in the analysis of complex responses in biomedicine
To better exploit the structure of the rich sets of characteristics, such as clinical biomarkers, molecular profiles or detailed ontology records, that are currently being collected on large samples of healthy or diseased individuals, statistical models of the variation within, and the interplay between, different layers of data can be constructed. Generic Bayesian model-building strategies and algorithms have been tailored for this purpose. In this talk, I will discuss three areas: implementing joint hierarchical modelling of a large number of responses and a large number of features to discover features associated with many responses; analysing tree-structured ontology data, with application to finding the underlying genetic origin of rare diseases; and characterising network structures using fast Bayesian inference in large Gaussian graphical models. Common statistical issues of accounting for model uncertainty, borrowing information to retain power, and the scalability of Bayesian computations will be highlighted. Modelling strategies and computations will be illustrated on case studies.
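As a loose frequentist analogue of the sparse network estimation mentioned above (not the Bayesian machinery of the talk), the graphical lasso estimates a sparse precision matrix whose off-diagonal zeros encode conditional independencies in a Gaussian graphical model; the data and penalty below are purely illustrative.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(2)
p = 5
# Ground-truth sparse precision matrix: a chain graph 0-1-2-3-4
prec = np.eye(p) + np.diag([0.4] * (p - 1), 1) + np.diag([0.4] * (p - 1), -1)
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(p), cov, size=2000)

# L1-penalized maximum likelihood for the precision matrix
model = GraphicalLasso(alpha=0.05).fit(X)

# Recovered edges: non-negligible off-diagonal entries of the
# estimated precision matrix
edges = np.abs(model.precision_) > 1e-3
print(edges)
```

With enough data, the non-zero pattern of `model.precision_` recovers the chain structure; the Bayesian approaches of the talk instead place priors over such structures and quantify the uncertainty in the recovered graph.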