Statistical Modelling Seminar
organised by the Modelling and Control team
Allan Tucker
Supervised and Unsupervised Methods for Modelling Trajectories through the Disease Process
30 January 2018 - 14:00 - Salle de conférences IRMA
Abstract: In this talk I will explore issues with different methods for collecting and modelling clinical data. I will briefly discuss the advantages and disadvantages of cross-sectional and longitudinal studies, and the modelling of these types of data, with the chief aim of forecasting disease progression whilst discovering subclasses of disease based on temporal aspects. This will include novel algorithms for identifying disease subclasses based upon different disease trajectories, and disease subclasses based upon different disease dynamics where the underlying process is inherently non-stationary. Finally, I will explore methods for integrating both cross-sectional and longitudinal data into probabilistic models that leverage the advantages of both.
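As a toy illustration of the trajectory-based idea (this is not the speaker's algorithm), one can summarise each patient's longitudinal biomarker series by its fitted dynamics and cluster those summaries into candidate subclasses. All data, the two-subclass structure, and the choice of a linear trend are invented for the sketch:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic longitudinal data: 60 patients, 8 visits each, with two
# hypothetical subclasses of disease progression (slow vs fast).
t = np.arange(8)
slopes = np.concatenate([rng.normal(0.2, 0.05, 30),   # slow progressors
                         rng.normal(1.0, 0.05, 30)])  # fast progressors
series = slopes[:, None] * t + rng.normal(0, 0.3, (60, 8))

# Summarise each patient's trajectory by its least-squares slope/intercept.
X = np.vstack([np.polyfit(t, y, deg=1) for y in series])

# Cluster the per-patient dynamics into candidate disease subclasses.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # roughly 30 patients per recovered subclass
```

In practice one would replace the linear trend with a richer temporal model of the disease process, but the pattern (fit per-patient dynamics, then cluster) stays the same.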
Jairo Cugliari
A prediction interval for a function-valued forecast model
26 March 2018 - 13:45 - Salle de conférences IRMA
Starting from the information contained in the shape of the load curves, we have proposed a flexible nonparametric function-valued forecast model, called KWF (Kernel + Wavelet + Functional), well suited to handling nonstationary series. The predictor can be seen as a weighted average of the futures of past situations, where the weights increase with the similarity between each past situation and the current one. In addition, this strategy provides simultaneous multiple-horizon predictions. The weights induce a probability distribution that can be used to produce bootstrap pseudo-predictions, and prediction intervals are constructed from the corresponding bootstrap pseudo-prediction residuals. We develop two propositions that follow the KWF strategy directly and compare them to two alternatives coming from proposals by econometricians, who construct simultaneous prediction intervals using multiple-comparison corrections through control of the family-wise error (FWE) or the false discovery rate. Alternatively, such prediction intervals can be constructed by bootstrapping joint probability regions. In this work we propose prediction intervals for the KWF model that are simultaneously valid for the H prediction horizons of the corresponding path forecast, making a connection between functional time series and the econometricians' framework.
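The weighted-average-of-futures idea can be sketched in a few lines of numpy. This is a deliberate simplification of KWF: a Gaussian kernel on Euclidean distance stands in for the wavelet-based similarity, the data are synthetic, and the interval is read directly from quantiles of the bootstrap pseudo-predictions rather than from pseudo-prediction residuals:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily load curves: 201 days of 24 hourly values.
H = 24                                   # prediction horizon (one day)
days = np.sin(np.linspace(0, 40 * np.pi, 201 * H)).reshape(201, H)
days += rng.normal(0, 0.05, days.shape)
past, today = days[:-1], days[-1]

# Kernel weights: similarity between each past day and today
# (Euclidean distance stands in for the wavelet-based similarity).
d = np.linalg.norm(past[:-1] - today, axis=1)
w = np.exp(-(d / d.std()) ** 2)
w /= w.sum()

# Point forecast: weighted average of the "futures" of past days.
futures = past[1:]                       # the day following each past day
forecast = w @ futures

# Bootstrap pseudo-predictions: resample futures according to the
# probability distribution induced by the weights.
idx = rng.choice(len(futures), size=500, p=w)
pseudo = futures[idx]
lo, hi = np.quantile(pseudo, [0.05, 0.95], axis=0)
print(forecast.shape, lo.shape, hi.shape)   # all (24,)
```

Note that `lo` and `hi` here are pointwise per-horizon bounds; making the band simultaneously valid over all H horizons is exactly the multiple-comparison issue the abstract addresses.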
Ghislain Durif
High Dimensional Classification with combined Adaptive Sparse PLS and Logistic Regression
4 June 2018 - 15:00 - Salle de conférences IRMA
Abstract: The high dimensionality of genomic data calls for the development of specific classification methodologies, especially to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection, which can be combined into a powerful framework for classification, as well as for data visualization and interpretation. However, currently proposed combinations lead to unstable and non-convergent methods due to inappropriate computational frameworks. We hereby propose a computationally stable and convergent approach for classification in high dimension based on sparse Partial Least Squares (sparse PLS).
We start by proposing a new solution to the sparse PLS problem, based on proximal operators for the case of univariate responses. We then develop an adaptive version of sparse PLS for classification, called logit-SPLS, which combines iterative optimization of logistic regression and sparse PLS to ensure computational convergence and stability. Our results are confirmed on synthetic and experimental data. In particular, we show how crucial convergence and stability can be when cross-validation is involved for calibration purposes. Using gene expression data, we explore the prediction of breast cancer relapse. We also propose a multicategorical version of our method, used to predict cell types from single-cell expression data.
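The proximal-operator view can be sketched for the first sparse PLS component with a univariate response: the unpenalised direction is the predictor-response covariance, and the l1 penalty enters through soft-thresholding. This is illustrative code, not the authors' logit-SPLS implementation; the data and threshold value are invented:

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of the l1 norm: shrinks entries and zeroes small ones."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

rng = np.random.default_rng(2)

# High-dimensional setting: n = 40 samples, p = 500 predictors,
# only the first 5 predictors carry signal.
n, p = 40, 500
X = rng.normal(size=(n, p))
y = X[:, :5] @ np.ones(5) + rng.normal(0, 0.5, n)

# Unpenalised first PLS direction: covariance between predictors and y.
c = X.T @ (y - y.mean())

# Sparse loading vector via the proximal (soft-thresholding) step.
lam = 0.5 * np.abs(c).max()          # arbitrary illustrative threshold
w = soft_threshold(c, lam)
w /= np.linalg.norm(w)               # normalise the loading vector

print(int((w != 0).sum()), "non-zero loadings out of", p)
```

The thresholding performs the variable selection, while the retained loadings still define a compression direction, which is the combination the abstract describes.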
Torsten Hothorn
Transformation Forests
21 September 2018 - 14:00 - Salle de séminaires IRMA
Regression models for supervised learning problems with a continuous
response are commonly understood as models for the conditional mean of the
response given predictors. This notion is simple and therefore appealing
for interpretation and visualisation. Information about the whole
underlying conditional distribution is, however, not available from these
models. A more general understanding of regression models as models for
conditional distributions allows much broader inference from such models,
for example the computation of prediction intervals. Several random
forest-type algorithms aim at estimating conditional distributions, most
prominently quantile regression forests (Meinshausen, 2006, JMLR). We
propose a novel approach based on a parametric family of distributions
characterised by their transformation function. A dedicated novel
"transformation tree" algorithm able to detect distributional changes is
developed. Based on these transformation trees, we introduce
"transformation forests" as an adaptive local likelihood estimator of
conditional distribution functions. The resulting predictive distributions
are fully parametric yet very general and allow inference procedures, such
as likelihood-based variable importances, to be applied in a straightforward
way. The procedure allows general transformation models to be estimated
without the necessity of a priori specifying the dependency structure of
parameters. Applications include the computation of probabilistic
forecasts, modelling differential treatment effects, or the derivation of
counterfactual distributions for all types of response variables.
Technical report available from https://arxiv.org/abs/1701.02110
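The abstract's main point of comparison, quantile regression forests (Meinshausen, 2006), can be sketched compactly: a forest's leaf co-memberships define weights over the training responses, from which a whole conditional distribution (not just a mean) is read off. The heteroscedastic data and the simplified weighting scheme below are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Heteroscedastic data: the spread of y grows with x, so the
# conditional distribution depends on x beyond its mean.
n = 2000
x = rng.uniform(0, 1, (n, 1))
y = rng.normal(0, 0.1 + x[:, 0])

forest = RandomForestRegressor(n_estimators=100, min_samples_leaf=20,
                               random_state=0).fit(x, y)

def conditional_quantiles(x_new, qs):
    """Quantile-regression-forest style estimate: weight each training
    response by how often it shares a leaf with x_new across trees,
    then take weighted quantiles of the responses."""
    leaves_train = forest.apply(x)                 # (n, n_trees)
    leaves_new = forest.apply(x_new)               # (1, n_trees)
    match = (leaves_train == leaves_new).astype(float)
    match /= match.sum(axis=0)                     # per-tree leaf weights
    w = match.mean(axis=1)                         # average over trees
    order = np.argsort(y)
    cdf = np.cumsum(w[order])
    return np.interp(qs, cdf, y[order])

q_lo, q_hi = conditional_quantiles(np.array([[0.9]]), [0.05, 0.95])
print(round(q_lo, 2), round(q_hi, 2))   # wide interval near x = 0.9
```

Transformation forests replace this nonparametric weighting of raw responses with an adaptive local likelihood fit of a parametric transformation model, which is what makes likelihood-based inference available afterwards.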
Emilie Kaufmann
New tools for Adaptive Testing and Applications to Bandit Problems
3 December 2018 - 14:00 - Salle de conférences IRMA
Abstract: I will introduce a general framework for sequential, adaptive testing of multiple composite hypotheses that are possibly overlapping. This framework is motivated by several identification problems in multi-armed bandit models, whose applications range from (adaptive) A/B/C testing to (adaptive) game tree exploration, a.k.a. Monte-Carlo Tree Search. I will first introduce a generic stopping rule for these tests, based on Generalized Likelihood Ratio statistics, and prove its correctness in some cases using new self-normalized concentration inequalities. I will then discuss the sample complexity of this stopping rule when coupled with a good sampling rule, that is, the minimal number of samples needed before stopping the test. In particular, we will propose an optimal strategy for (epsilon)-best arm identification in a bandit model. If time allows, we will then discuss the price of relying on a best-arm identification phase when the goal is to maximize rewards in a bandit model.
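A stylised sketch of a GLR-based stopping rule for best-arm identification: Gaussian arms with known unit variance, a naive round-robin sampling rule, and a crude threshold. The arm means, confidence level, and threshold formula are invented for illustration; they are not the optimal sampling rule or tight threshold from the talk:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical Gaussian bandit: unit variance, arm 2 has the best mean.
means = np.array([0.0, 0.3, 0.8])
K, delta = len(means), 0.05

counts = np.zeros(K)
sums = np.zeros(K)

def pull(a):
    counts[a] += 1
    sums[a] += rng.normal(means[a], 1.0)

for a in range(K):                 # initialise: one sample per arm
    pull(a)

while True:
    mu = sums / counts
    best = int(np.argmax(mu))
    # Generalized likelihood ratio between the empirical best arm and
    # each challenger (closed form for Gaussian arms, known variance).
    others = [a for a in range(K) if a != best]
    glr = min(counts[best] * counts[a] / (counts[best] + counts[a])
              * (mu[best] - mu[a]) ** 2 / 2 for a in others)
    # Crude stylised threshold, growing like log(t/delta).
    beta = np.log((1 + np.log(counts.sum())) * K / delta)
    if glr > beta:                 # confident enough: stop the test
        break
    pull(int(np.argmin(counts)))   # naive round-robin sampling rule

print("recommended arm:", best)
```

The stopping statistic grows roughly linearly in the sample counts once the empirical gaps stabilise, while the threshold grows only logarithmically, so the test terminates; a good sampling rule replaces the round-robin step to minimise the number of pulls needed.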