Go to JKU Homepage
Kurt Rothschild School of Economics and Statistics
What's that?

Institutes, schools, other departments, and programs create their own web content and menus.

To help you better navigate the site, see here where you are at the moment.

Research Seminar at the Institute of Applied Statistics

October, 20th 15:30 - Dr. Alejandra Avalos Pacheco:

“Multi-study Factor Regression Models for Large Complex Data with Applications to Nutritional Epidemiology and Cancer Genomics”

[Translate to Englisch:]
[Translate to Englisch:]

zoom link, opens an external URL in a new window

meeting ID: 937 6054 7545

password: 946296


Data-integration of multiple studies can be key to understand and gain knowledge in statistical research. However, such data present both biological and artifactual sources of variation, also known as covariate effects. Covariate effects can be complex, leading to systematic biases. In this talk I will present novel sparse latent factor regression (FR) and multi-study factor regression (MSFR) models to integrate such heterogeneous data. The FR model provides a tool for data exploration via dimensionality reduction and sparse low-rank covariance estimation while correcting for a range of covariate effects. MSFR are extensions of FR that enable us to jointly obtain a covariance structure that models the group-specific covariances in addition to the common component, learning covariate effects from the observed variables, such as the demographic information. I will discuss the use of several sparse priors (local and non-local) to learn the dimension of the latent factors. Our approach provides a flexible methodology for sparse factor regression which is not limited to data with covariate effects. I will present several examples, with a focus on bioinformatics applications. We show the usefulness of our methods in two main tasks: (1) to give a visual representation of the latent factors of the data, i.e. an unsupervised dimension reduction task and (2) to provide a (i) supervised survival analysis, using the factors obtained in our method as predictions for the cancer genomic data; and (ii) dietary pattern analysis, associating each factor with a measure of overall diet quality related to cardiometabolic disease risk for a hispanic community health nutritional-data study.
Our results show an increase in the accuracy of the dimensionality reduction, with non-local priors substantially improving the reconstruction of factor cardinality. The results of our analyses illustrate how failing to properly account for covariate effects can result in unreliable inference.


Time & date

October 20, 2022

15:30 - 17:00 PM

Add to my calendar


S2 Z74, Science Park 2