Abstract
There are many ways of inducing sparsity in high-dimensional regression; in this presentation we introduce three of the most common: coordinate sparsity, group sparsity, and sparse-group sparsity. We also showcase use cases for each and provide a rule of thumb for when each should be used.
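As a rough illustration of how the three patterns differ, the sketch below applies the proximal (shrinkage) operators of the standard lasso, group lasso and sparse-group lasso penalties to a toy coefficient vector. These penalties are one common way of inducing coordinate, group and sparse-group sparsity, not necessarily the formulations used in the talk; the group weights `sqrt(p_g)`, the mixing parameter `alpha`, and all function names are assumptions for illustration.

```python
import math

# Penalties illustrated (lam > 0, groups g of size p_g, 0 <= alpha <= 1):
#   coordinate sparsity (lasso):       lam * sum_j |beta_j|
#   group sparsity (group lasso):      lam * sum_g sqrt(p_g) * ||beta_g||_2
#   sparse-group (sparse-group lasso): lam * (alpha * l1 + (1 - alpha) * group term)

def soft_threshold(beta, lam):
    """Lasso proximal operator: shrinks each coefficient independently."""
    return [b - lam if b > lam else b + lam if b < -lam else 0.0 for b in beta]

def group_soft_threshold(beta, groups, lam):
    """Group lasso proximal operator: shrinks or zeroes whole groups together."""
    out = list(beta)
    for g in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in g))
        scale = max(1.0 - lam * math.sqrt(len(g)) / norm, 0.0) if norm > 0 else 0.0
        for j in g:
            out[j] = scale * beta[j] if scale > 0 else 0.0
    return out

def sparse_group_threshold(beta, groups, lam, alpha):
    """Sparse-group lasso proximal operator: elementwise, then groupwise shrinkage."""
    return group_soft_threshold(soft_threshold(beta, alpha * lam), groups, (1 - alpha) * lam)

beta = [3.0, 0.5, -0.2, 0.1, 2.0, -1.5]
groups = [[0, 1], [2, 3], [4, 5]]
print(soft_threshold(beta, 1.0))                       # isolated zeros, no group structure
print(group_soft_threshold(beta, groups, 1.0))         # group [2, 3] zeroed, other groups dense
print(sparse_group_threshold(beta, groups, 1.0, 0.5))  # a zeroed group plus a zero inside group [0, 1]
```

The lasso zeroes coefficients one at a time, the group lasso keeps or discards entire groups, and the sparse-group lasso does both, which is why it suits settings where only a few groups matter and only a few variables within those groups matter.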
Abstract
High-dimensional spaces are strange places: our intuition can often be misleading, and our notions of distance become uninformative. How can we circumvent these issues and develop useful statistical tools for high-dimensional datasets?
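One way to see why distances become uninformative is the concentration of distances: as the dimension grows, the nearest and farthest points from a query end up almost equally far away. The following sketch, assuming points drawn uniformly from the unit hypercube (an illustrative choice, not taken from the talk), measures the relative contrast (max distance - min distance) / min distance; the function name and parameters are hypothetical.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of Euclidean distances from a random query point
    to n_points points drawn uniformly from the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# The contrast shrinks as the dimension grows: "near" and "far" lose meaning.
for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3))
```

In low dimensions the nearest neighbour is many times closer than the farthest point; in a thousand dimensions all points sit at nearly the same distance, which is what undermines distance-based methods such as nearest-neighbour rules.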
Further reading
Introduction to High-Dimensional Statistics (Giraud, 2021)
Presentation of "Variational Bayes for high-dimensional proportional hazards models with applications to gene expression variable selection"
Abstract
High-throughput sequencing has led to a wave of innovation in biomedical sciences, offering extraordinary opportunities for prognostic modelling and understanding disease drivers. However, the high dimensionality and heterogeneity of large-scale profiling data introduce considerable challenges. We propose an interpretable Bayesian proportional hazards model for prediction and variable selection, referred to as SVB. Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC whilst retaining its useful features, providing excellent point estimates and offering a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, where we identify genes with pre-existing biological interpretations.
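SVB itself is not reproduced here. As background only, the sketch below computes the Cox partial log-likelihood that a proportional hazards model with censored survival outcomes is built on, together with a simple rule for selecting variables from posterior inclusion probabilities. The Breslow handling of ties, the 0.5 threshold (a common convention, sometimes called the median probability model), the function names, and the example probabilities are all assumptions for illustration, not details of SVB.

```python
import math

def cox_partial_loglik(beta, X, times, events):
    """Cox partial log-likelihood:
        sum over events i of [ x_i^T beta - log sum_{j : t_j >= t_i} exp(x_j^T beta) ].
    events[i] is 1 for an observed event and 0 for a right-censored time.
    Risk sets use t_j >= t_i (Breslow handling of ties)."""
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:
            continue  # censored subjects contribute only through risk sets
        risk = [eta[j] for j, t_j in enumerate(times) if t_j >= t_i]
        m = max(risk)  # log-sum-exp trick for numerical stability
        loglik += eta[i] - (m + math.log(sum(math.exp(e - m) for e in risk)))
    return loglik

def select_by_inclusion(probs, threshold=0.5):
    """Indices of variables whose posterior inclusion probability exceeds threshold."""
    return [j for j, p in enumerate(probs) if p > threshold]

times = [1.0, 2.0, 3.0, 4.0]
events = [1, 0, 1, 1]            # the subject at time 2.0 is censored
X = [[0.5], [-1.0], [2.0], [0.0]]
# With beta = 0 each event contributes -log(risk set size): -log 4 - log 2 - log 1 = -log 8
print(cox_partial_loglik([0.0], X, times, events))
print(select_by_inclusion([0.97, 0.12, 0.64, 0.03]))  # hypothetical probabilities
```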
Referenced Packages
Presented at: CMStatistics 2021.
The practical uses of spike-and-slab priors.
An introduction to spike-and-slab priors, with applications to high-dimensional regression.
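To make the prior concrete, the sketch below assumes the common "Dirac spike" form: each coefficient is exactly zero with probability 1 - w and otherwise drawn from a Gaussian slab, i.e. gamma_j ~ Bernoulli(w), beta_j = 0 if gamma_j = 0, and beta_j ~ N(0, slab_sd^2) if gamma_j = 1. It also computes the posterior inclusion probability in the simplest possible model, a single observation y ~ N(beta, 1); the parameter names and this toy model are assumptions for illustration, not taken from the presentation.

```python
import math
import random

def sample_spike_and_slab(p, w, slab_sd, seed=None):
    """Draw p coefficients from a Dirac spike-and-slab prior."""
    rng = random.Random(seed)
    gammas = [1 if rng.random() < w else 0 for _ in range(p)]        # inclusion indicators
    betas = [rng.gauss(0.0, slab_sd) if g == 1 else 0.0 for g in gammas]
    return gammas, betas

def normal_pdf(x, var):
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)

def inclusion_probability(y, w, slab_sd):
    """P(gamma = 1 | y) for y ~ N(beta, 1) under the spike-and-slab prior above."""
    slab = w * normal_pdf(y, 1.0 + slab_sd ** 2)  # marginal density of y under the slab
    spike = (1.0 - w) * normal_pdf(y, 1.0)        # marginal density of y when beta = 0
    return slab / (slab + spike)

gammas, betas = sample_spike_and_slab(p=10, w=0.2, slab_sd=3.0, seed=42)
print(gammas)
print([round(b, 2) for b in betas])
for y in (0.0, 1.0, 3.0, 5.0):
    print(y, round(inclusion_probability(y, w=0.5, slab_sd=3.0), 3))
```

Observations near zero push the inclusion probability below the prior weight, while large observations push it towards one; this is the mechanism that makes spike-and-slab priors useful for variable selection.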
A presentation of the paper "Scalable Nonparametric Sampling from Multimodal Posteriors with the Posterior Bootstrap" by Edwin Fong et al., presented to the Imperial CSML reading group.
A presentation of the paper "Gaussian Processes for Survival Analysis" by Tamara Fernandez et al., presented to the Imperial CSML reading group.
We integrate radiomic and transcriptomic data from patients with ovarian cancer using sparse canonical correlation analysis (sCCA). We demonstrate that integration yields prognostic models with greater predictive accuracy than those using radiomic features alone. However, integration does not provide greater predictive accuracy than transcriptomic data alone. Further, we examine network structures that provide plausible relational pathways between genes and radiomic features.
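For readers unfamiliar with sCCA, the sketch below shows one common variant: rank-one sparse CCA via a penalized matrix decomposition, in which the within-set covariance matrices are treated as the identity and the canonical weight vectors are found by alternating soft-thresholded power iterations on the cross-correlation matrix. Fixed thresholds `lam_u` and `lam_v` are used instead of tuning them to an L1 bound, and the simulated "radiomic" X and "transcriptomic" Y are hypothetical data; whether this matches the variant used in the study is an assumption.

```python
import math
import random

def standardise(M):
    """Centre and scale each column of a row-major matrix to unit variance."""
    n, cols = len(M), []
    for j in range(len(M[0])):
        col = [row[j] for row in M]
        mu = sum(col) / n
        sd = math.sqrt(sum((x - mu) ** 2 for x in col) / n)
        cols.append([(x - mu) / sd for x in col])
    return [list(r) for r in zip(*cols)]

def soft(v, lam):
    return [x - lam if x > lam else x + lam if x < -lam else 0.0 for x in v]

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v  # all-zero if lam is too large

def sparse_cca(X, Y, lam_u, lam_v, n_iter=50):
    """Rank-one sparse canonical weights (u, v) for data sets X (n x p) and Y (n x q)."""
    X, Y = standardise(X), standardise(Y)
    n, p, q = len(X), len(X[0]), len(Y[0])
    # Cross-correlation matrix C = X^T Y / n, of shape p x q
    C = [[sum(X[i][a] * Y[i][b] for i in range(n)) / n for b in range(q)] for a in range(p)]
    v, u = unit([1.0] * q), [0.0] * p
    for _ in range(n_iter):
        u = unit(soft([sum(C[a][b] * v[b] for b in range(q)) for a in range(p)], lam_u))
        v = unit(soft([sum(C[a][b] * u[a] for a in range(p)) for b in range(q)], lam_v))
    return u, v

# Hypothetical data: a shared latent signal z drives X columns 0-1 and Y column 0.
rng = random.Random(1)
X, Y = [], []
for _ in range(500):
    z = rng.gauss(0, 1)
    X.append([z + 0.3 * rng.gauss(0, 1), z + 0.3 * rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)])
    Y.append([z + 0.3 * rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(3)])
u, v = sparse_cca(X, Y, lam_u=0.3, lam_v=0.3)
print([round(a, 3) for a in u])  # nonzero weights only on the signal-carrying X columns
print([round(b, 3) for b in v])
```

The soft-thresholding step is what makes the canonical weights sparse: features whose cross-correlation with the other data set falls below the threshold receive exactly zero weight, which keeps the linked gene and radiomic features interpretable.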