Research Seminar
We cordially invite everyone interested to attend our research seminar.
Univ.-Prof. Mag. Dr. Andreas Futschik, Univ.-Prof. Mag. Dr. Werner G. Müller
Zoom link to the research seminar
Meeting ID: 280 519 2121
Password: 584190
Institut für Angewandte Statistik
Research Seminar
Time
Thursdays, 15:30 - 17:00
Location
Science Park 2, mezzanine, Z74
Winter semester 2024/25
-
October 3 - 1st Gerhart Bruckmann Lecture in Statistics and Data Science
Univ.-Prof. Dr. Tatyana Krivobokova, Institut für Statistik und Operations Research, Universität Wien, Austria: An extended latent factor framework for ill-posed generalised linear regression
Abstract:
The classical latent factor model for (generalised) ill-posed linear regression is extended by assuming that, up to an unknown orthogonal transformation, the features consist of subsets that are relevant and irrelevant to the response. Furthermore, a joint low-dimensionality is imposed only on the relevant features and the response variable. This framework not only allows for a comprehensive study of the partial-least-squares (PLS) algorithm under random design, but also sheds light on the performance of other regularisation methods that exploit sparsity or unsupervised projection. Moreover, we propose a novel iteratively-reweighted-partial-least-squares (IRPLS) algorithm for ill-posed generalised linear models and obtain its convergence rates within the suggested framework. This is joint work with Gianluca Finocchio.
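For orientation, here is a minimal base-R sketch of the classical NIPALS iteration for PLS1 with a univariate response and toy data; the IRPLS algorithm proposed in the talk extends this idea to generalised linear models and is not reproduced here.
pls1 <- function(X, y, ncomp = 2) {
  X <- scale(X, scale = FALSE); y <- y - mean(y)
  W <- P <- matrix(0, ncol(X), ncomp); q <- numeric(ncomp)
  for (k in seq_len(ncomp)) {
    w <- drop(crossprod(X, y)); w <- w / sqrt(sum(w^2))  # direction of maximal covariance with y
    s <- drop(X %*% w)                                   # scores
    P[, k] <- drop(crossprod(X, s)) / sum(s^2)           # X-loadings
    q[k] <- sum(s * y) / sum(s^2)                        # y-loading
    X <- X - tcrossprod(s, P[, k]); y <- y - s * q[k]    # deflation
    W[, k] <- w
  }
  drop(W %*% solve(crossprod(P, W), q))                  # coefficients for the centred data
}
set.seed(1)
X <- matrix(rnorm(1000), 100, 10)
y <- X[, 1] - X[, 2] + rnorm(100, sd = 0.1)
round(pls1(X, y), 2)   # recovers roughly (1, -1, 0, ..., 0)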
-
October 10 - Prof. Dariusz Uciński, Ph.D., D.Sc., Institute of Control and Computation Engineering, University of Zielona Góra, Poland: Convex relaxation for optimum experimental design with correlated observations
Abstract:
Spatiotemporal data occur in many fields, such as air pollution or groundwater flow monitoring and meteorology. Their collection is inevitably related to discrete spatial and temporal sampling of an inherently continuous system. This raises the question of how to locate a limited number of measurement sites so that the amount of information about the observed system is as high as possible. This is of special importance in parameter estimation of systems modelled by partial differential equations. A distinguishing feature of environmental data collection is the presence of correlations between the measurements from different sites and/or time instants. This is because deviations in the observed responses at different sites may be brought about by the same sources, e.g., weather fluctuations on the scale of the whole spatial region. But then the Fisher information matrix is no longer the sum of elemental information matrices stemming from single sites. As a result, the powerful convex theory of optimum experimental design cannot be directly applied, and various heuristic approaches dominate the construction of approximations to optimum designs. The aim of this talk is to demonstrate that, in spite of these difficulties, convex relaxed formulations can be devised for which extremely efficient computational procedures can be set up, fully exploiting the power of modern convex optimization algorithms.
In the first part of the talk, the trace of the covariance matrix of the weighted least-squares estimator is employed as the measure of the estimation accuracy. The pivotal role in the new convex relaxation proposed here is played by the decomposition of the noise into uncorrelated and correlated components. Necessary and sufficient optimality conditions are then formulated and optimal solutions are found via simplicial decomposition which alternates between updating the design weights using the well-known multiplicative algorithm and computing a closed-form solution to a linear programming problem.
In the second part of the talk, the setting is considered in which the correlation structure may not be known exactly, so that the ordinary least squares method is to be used for estimation and the determinant of the covariance matrix of the resulting estimator is the measure of estimation accuracy. This time, the relaxed formulation turns out to be non-convex, but this is overcome by application of the majorization-minimization algorithm. At each of its iterations, a convex tangent surrogate function that majorizes the original nonconvex design criterion is minimized using simplicial decomposition.
As the resulting relaxed solutions in both cases are measures on the set of candidate sites rather than specific subsets of selected sensors, various sequential conversions to a nearly optimal sensor subset are discussed. Simulation experiments are also reported to demonstrate that the proposed approaches are highly competitive with traditional ones.
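As background for the weight-update step mentioned above, here is a minimal base-R sketch of the classical multiplicative algorithm for a D-optimal approximate design in the standard uncorrelated setting; the candidate points and model are toy choices, and the talk's relaxation couples such updates with a linear-programming step.
d_opt_weights <- function(F, w = rep(1 / nrow(F), nrow(F)),
                          tol = 1e-8, maxit = 5000) {
  m <- ncol(F)
  for (i in seq_len(maxit)) {
    M <- crossprod(sqrt(w) * F)            # information matrix sum_i w_i f_i f_i'
    d <- rowSums((F %*% solve(M)) * F)     # variance function d(x_i, w)
    w_new <- w * d / m                     # multiplicative update
    if (max(abs(w_new - w)) < tol) break
    w <- w_new
  }
  w
}
x <- seq(-1, 1, length.out = 21)           # candidate design points
F <- cbind(1, x, x^2)                      # quadratic regression model
round(d_opt_weights(F), 3)                 # weights concentrate on -1, 0, 1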
-
November 7 - Gregor Zens, PhD, Population and Just Societies (POPJUS) Program, International Institute for Applied Systems Analysis (IIASA): Bayesian Factor Models for Age-Specific Demographic Counts
Abstract:
Analyzing age-specific mortality, fertility, and migration patterns is a crucial task in statistical demography, with significant policy relevance. In practice, such analysis is challenging when studying a large number of subpopulations, due to small observation counts within groups and increasing heterogeneity between groups. To address these challenges, we develop a Bayesian factor model for the joint analysis of age-specific counts in many, potentially small, subpopulations. The proposed model uses smooth latent components to capture common age-specific patterns across subpopulations and encourages additional information sharing through a hierarchical prior. The model provides smoothed estimates of the latent demographic pattern in each subpopulation, allows testing for heterogeneity, and can be used to assess the impact of observed covariates on the demographic process. An in-depth case study of age-specific immigration flows to Austria, disaggregated by sex and 155 countries of origin, is discussed. Comparative analysis shows that the model outperforms commonly used benchmark frameworks in both in-sample imputation and out-of-sample predictive exercises. Extensions to dynamic settings are discussed as well.
-
November 21 - Priv.-Doz. Dr. Bilal Barakat, Founding Partner, benedex education and development research and consulting: Challenges and opportunities of model-based estimates for monitoring Sustainable Development Goal 4 on Education
Abstract:
International education statistics have long been based on official administrative data that were treated as observed fact. The demands of the indicator framework for Sustainable Development Goal 4 on Education (SDG4) have required the increasing acceptance of sample-based data sources, such as large-scale household surveys or learning assessments. The customary publication of "latest available" data points is no longer tenable given the variability and often contradictory signals from such sources. Facing similar challenges, the health sector successively developed and endorsed global estimates of infant and maternal mortality based on sophisticated statistical modelling. Building on these experiences, both in terms of methodology and acceptance by the global health statistics community, similar approaches have recently been adapted for education monitoring. In particular, in a first for the education sector, Bayesian estimates of school completion rates combining information from different survey and census sources have been endorsed as official SDG4 monitoring data, despite the complexity of the model and its lack of transparency for government stakeholders. I will discuss the interaction of statistical, institutional, and domain-specific challenges from an inside perspective, such as questions of information sharing between countries' estimates within a Bayesian framework via hyper-parameters, and the lack of gold-standard "ground truth" data for calibrating estimates of source-specific bias.
-
December 12 - Francesca Basini, PhD, University of Warwick, United Kingdom: Trajectory inference with neural potentials and neural diffusion bridges in single-cell differentiation
Abstract:
Cell differentiation, the process by which unspecialised stem cells become specialised, is a fundamental topic in developmental biology. Modern next-generation sequencing allows simultaneous measurement of a large number of gene expression levels at the single-cell level, resulting in large, high-dimensional datasets. As a consequence, quantitative methods that provide insights into cell differentiation mechanisms in this high-dimensional gene space are in high demand and, despite the large efforts and tools available, modelling differentiating cells and their time evolution remains a topic of extensive investigation.
In this talk, I will present our novel method to infer cell trajectories by means of non-linear stochastic differential equations, associated with a quasi-potential landscape, in a reduced yet high-dimensional gene space.
We adopt an agnostic perspective by defining the potential and associated system dynamics using neural networks, within the framework of neural differential equations, which can handle settings where the state space is high-dimensional via efficient and accurate solvers. Moreover, a key benefit of our approach is that the neural network architecture provides flexibility while maintaining an analytical form of the potential function; this can be accessed at no extra computational cost to derive other quantities of interest such as the density law of the system.
The optimisation criterion for finding the law of paths of the neural SDE is formulated as an integrated loss on path space, based on a suitable discrepancy measure between the observed data and the data generated by the simulator model. In particular, we investigate the use of the regularised Wasserstein distance and the expected energy score. Two different modelling frameworks are considered: a time-independent neural potential and a time-dependent extension based on the notion of Doob's H-transform and the addition of neural diffusion bridges.
Finally, applications of our approach are provided for a number of artificial and real-world data examples, particularly on scRNA-seq data on early-stage mouse embryos and on the so-called reprogramming dataset for induced Pluripotent Stem Cells (iPSC).
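As a toy illustration of the kind of dynamics involved (not the authors' neural model), here is a base-R Euler-Maruyama simulation of a one-dimensional potential-driven SDE dX = -U'(X) dt + sigma dW, with a double-well potential standing in for the learned one.
gradU <- function(x) 4 * x^3 - 4 * x        # U(x) = x^4 - 2 x^2, a double well
em_path <- function(x0, Tend = 10, dt = 1e-3, sigma = 0.5) {
  n <- round(Tend / dt)
  x <- numeric(n + 1); x[1] <- x0
  for (i in 1:n)                            # Euler-Maruyama step
    x[i + 1] <- x[i] - gradU(x[i]) * dt + sigma * sqrt(dt) * rnorm(1)
  x
}
plot(em_path(0), type = "l")                # path settling into one of the wells
-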
January 30 - Mag. Johannes Pritz, Head of Strategic Controlling, Wirtschaftskammer OÖ: Understanding the economy: economic statistics as a basic instrument for evidence-based economic policy
Past talks
Summer semester 2024
-
June 13 - Prof. em. Emmanuel Lesaffre, I-Biostat, KU Leuven, Belgium: The use of historical data in clinical trials: Does it pay off?
Abstract:
This non-technical review discusses the use of historical data in the design and analysis of randomized controlled trials using a Bayesian approach. The focus is on comparing the philosophy behind different approaches and practical considerations for their use. The two main approaches, i.e. the power prior and the meta-analytic-predictive prior, are illustrated using fictitious and real data sets. Such methods, which are known as dynamic borrowing methods, are becoming increasingly popular in pharmaceutical research because they may imply an important reduction in costs. In some cases, e.g. in pediatric studies, they may be indispensable to address the clinical research question. In addition to the two original approaches, this review also covers various extensions and variations of the methods. The usefulness and acceptance of the approaches by regulatory agencies are also critically evaluated.
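As a minimal illustration of dynamic borrowing (a conjugate toy case, not an example from the talk), the power prior for a binomial success probability reduces to downweighting the historical data by a factor delta.
power_prior_post <- function(x, n, x0, n0, delta, a = 1, b = 1) {
  # Beta posterior under a power prior: historical successes x0 out of n0
  # enter the update discounted by delta in [0, 1].
  c(shape1 = a + x + delta * x0,
    shape2 = b + n - x + delta * (n0 - x0))
}
# delta = 0 ignores the historical trial; delta = 1 pools it fully.
power_prior_post(x = 12, n = 40, x0 = 30, n0 = 100, delta = 0.5)
-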
April 25 - Prof. Asger Hobolth, Department of Mathematics, Aarhus University, Denmark: Phase-type distributions in mathematical population genetics: An emerging framework. (The talk is based on joint work with Iker Rivas-González (Leipzig), Mogens Bladt (Copenhagen) and Andreas Futschik (Linz).)
Abstract:
A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the ‘phases’ in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution, and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this talk is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference.
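A small base-R sketch of the matrix calculus this enables: the expected time to the most recent common ancestor in Kingman's coalescent, computed from the phase-type moment formula E[T] = alpha (-S)^{-1} 1 (a standard textbook example, not code from the talk).
n <- 5                                      # sample size
k <- n:2                                    # states: number of lineages
rates <- choose(k, 2)                       # coalescence rate in each state
S <- diag(-rates, n - 1)                    # sub-intensity matrix
S[cbind(1:(n - 2), 2:(n - 1))] <- rates[1:(n - 2)]  # jumps k -> k-1
alpha <- c(1, rep(0, n - 2))                # start with all n lineages
U <- solve(-S)                              # expected time spent in each state
sum(alpha %*% U)                            # E[T_MRCA] = 2 * (1 - 1/n) = 1.6
-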
April 18 - Univ.-Prof. Dr. Johann Bacher, Institut für Soziologie / Abteilung für Empirische Sozialforschung, Johannes Kepler University Linz, Austria: Data linkage as a new challenge for survey research
Abstract:
Several societal and scientific developments confront survey research with new challenges. Mediatisation and digitalisation have led to new data being generated continuously and on a large scale. In addition, administrative register data are increasingly available to researchers. Finally, scientific surveys are regularly conducted at the national and international level. For these, a trend towards web-based implementation can be observed, which often implies that not all topics of interest can be collected in a single survey any more. The availability of a multitude of data on the one hand and the need for shorter interview times on the other raise the question of whether and how data can be linked in order to answer research questions with already existing data. The talk presents research on two methods of data linkage. First, an empirical study is reported that examined, for Austria, respondents' willingness to consent to the linkage of their data. Second, the results of a field experiment on the data fusion of two surveys are presented, which relied on multiple imputation techniques.
The results show that both methods currently still have their limits. Given the developments outlined above, however, it is necessary to continue engaging with them, perhaps more intensively than before.
-
April 11 - Dr. Magdalena Muszynska-Spielauer, Institut für Angewandte Statistik, Johannes Kepler University Linz, Austria: Different dimensions of lifespan inequality
Abstract:
Philosophical concepts of health inequities, including normative judgments about the fairness of surviving to different ages, are rarely considered in demographic studies of longevity. The idea of this presentation is to explore some well-studied aspects of the measurement of health inequities and adapt them to the problem of measuring lifespan inequalities. The first paper adopts the capability approach to define deaths that we consider ethically problematic, i.e. premature mortality, and proposes to apply standard poverty measures to quantify lifespan inequity simultaneously as the prevalence, depth and inequality of lifespan deprivation resulting from premature mortality. The proposed methods are applied to study premature mortality and lifespan inequity in the United States relative to other high-income countries in 1933-2019. Our results show that the high levels of premature mortality and lifespan inequity in the United States are not recent developments, but have been persistent challenges over several decades. In the second paper we introduce the formal differences between inequality statistics of attainment and of deprivation in the study of lifespan inequality. For this paper, we apply the Keyfitz notion that everybody dies prematurely. Based on a simple demographic model of resuscitation, we propose an indicator of inequality in lifespan deprivation. An empirical application focuses on the relationship between the statistics of attainment and deprivation according to the distributions of life table ages at death in high-income countries of the Human Mortality Database in 1900-2021. We focus on the question of whether quantifying inequality using relative inequality of attainment statistics and relative inequality of deprivation statistics leads to different conclusions about the trends in inequality. We also highlight the effect of three major mortality shocks in the study years, i.e. the Spanish flu, the Second World War and COVID-19, on the two groups of statistics.
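As a rough sketch of the first paper's idea (with purely illustrative numbers and threshold), Foster-Greer-Thorbecke-style poverty measures can be applied to ages at death below a premature-mortality threshold z.
fgt <- function(ages, z, alpha = 0) {
  gap <- pmax((z - ages) / z, 0)            # normalised lifespan deprivation gap
  mean((ages < z) * gap^alpha)              # FGT index of order alpha
}
ages <- c(2, 35, 58, 66, 71, 79, 84, 88, 93)   # toy ages at death
c(prevalence = fgt(ages, z = 70, alpha = 0),   # share dying prematurely
  depth      = fgt(ages, z = 70, alpha = 1),   # average deprivation gap
  severity   = fgt(ages, z = 70, alpha = 2))   # inequality-sensitive version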
Winter semester 2023/24
-
January 18 - Ao.Univ.-Prof. Dipl.-Ing. Dr.techn. Herwig Friedl, Institute of Statistics, Graz University of Technology: Hands-on Applications of Mixture Models (joint work with Dankmar Böhning, Bettina Grün, Sanela Omerovic, Elisabeth Reitbauer & Peter Scheibelhofer)
Abstract:
We will introduce this class of models by discussing three very different practical applications.
The first application is about estimating the hard-to-count size of a closed population. We consider uni-list approaches in which the count of identifications per unit is the basis of analysis. Unseen units have a zero count and do not occur in the sample, leading to a zero-truncated setting. Due to various mechanisms, one-inflation is a frequently occurring phenomenon which can lead to seriously biased estimates of population size. The zero-truncated one-inflated and the one-inflated zero-truncated models are compared in terms of Horvitz-Thompson estimation of population size. The illustrative data are about the number of sightings of dice snakes in Graz.
In another study, we deal with black and white C-SAM images of wafer structures. The statistical analysis is based on the corresponding multimodal frequency histograms of the gray levels. The goal is to draw conclusions about both the quality of the wafers and the contrast of the images. A heterogeneous mixture of gamma densities together with a uniform distribution component is successfully used to enable such a dual defect analysis.
The last application deals with a mixture of generalized non-linear models for estimating the daily maximum gas consumption as a function of the outdoor temperature. For this purpose, we have implemented "flexmixNL" as an extension of the well-known R package "flexmix". This now allows the analysis by means of a homogeneous mixture of linear exponential families, where the means are modeled nonlinearly, here by a family of sigmoid functions. If we mix various components differently, this can be used to account for unobserved heterogeneity.
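Relating to the first application above, here is a minimal base-R sketch of Horvitz-Thompson population-size estimation under a plain zero-truncated Poisson model (toy frequencies; the talk's models additionally handle one-inflation).
f <- c(32, 11, 5, 2)                         # toy frequencies of counts 1, 2, 3, 4
counts <- rep(1:4, f)
n <- length(counts)
nll <- function(l)                           # zero-truncated Poisson neg. log-likelihood
  -sum(dpois(counts, l, log = TRUE)) + n * log(1 - exp(-l))
lam <- optimize(nll, c(1e-4, 10))$minimum    # ML estimate of the Poisson rate
N_hat <- n / (1 - exp(-lam))                 # Horvitz-Thompson estimate of population size
round(c(lambda = lam, N = N_hat), 2)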
-
January 11 - Hofrat Mag. Dr. Gernot Filipp, Head of Regional Statistics and Administrative Controlling, Office of the Salzburg Provincial Government: "Border areas" - overlaps of theory and practice in official statistics
Abstract:
Official statistics sometimes has the reputation of being a dry and rather unexciting field of work. Yet the statistician Tukey's remark "The best thing about being a statistician is that you get to play in everyone's backyard." applies to the public sector to a particular degree. Regional official statistics is the central information provider for the regional administration, regional politics and the citizens in all socio-politically relevant subject areas. The fascination of official statistics arises from the variety of questions one comes into contact with. Interesting facts and information are generated from numbers and data that at first appear dry, creating added value for the users of the data. A range of statistical methods is drawn upon to address the various questions. The first part of the talk gives a brief insight into the variety of activities in official statistics; the second part presents some concrete examples of the application of statistical methods in official statistics.
-
November 23 - Paula Camelia Trandafir, Department of Statistics, Computer Science and Mathematics, Public University of Navarre: Age-specific spatio-temporal patterns of ovarian cancer mortality in Spain
Abstract:
Ovarian cancer stands as a prominent contributor to gynecological malignancy-related fatalities, with an estimated lifetime risk of occurrence in about 1 in every 50 to 70 women. Its highest incidence emerges among women aged 60 to 64 years, predominantly afflicting those over 50. Globally, the yearly tally includes approximately 204,449 newly diagnosed cases of ovarian cancer, constituting around 4% of all female cancers, with 124,860 deaths attributed to the disease (GLOBOCAN data).
Within the epidemiological literature, a prevalent observation is the limited focus on spatial, temporal, or spatio-temporal analyses of ovarian cancer mortality, often without delving into age-specific breakdowns. This can potentially lead to conclusions that lack precision. Our objective in this study is to examine the temporal evolution of geographic patterns in ovarian cancer mortality rates for distinct age groups across Spanish provinces, spanning the period from 1989 to 2015. To achieve this, we explore various autoregressive models. Model fitting and inference are carried out using integrated nested Laplace approximations, implemented in R.
-
November 16 - Prof. Dr. Yarema Okhrin, Chair of Statistics, Faculty of Business and Economics, University of Augsburg: Computer-vision-based Bitcoin price forecasting with relevance-based clustering
Abstract:
This paper examines a novel AI-based computer vision approach for pattern recognition in financial time series classification. We transform past Bitcoin price sequences to Gramian Angular Field images and use them as input data for a convolutional neural network (CNN). We apply spectral relevance analysis to identify clusters of the decision behavior of the CNN. Clustering the images according to the associated relevance maps allows the comparison of cluster-based performances. We detect clusters with substantially higher predictive performance compared to the complete data set. The associated relevance matrices for each cluster represent favourable patterns for price prediction and are identified via the associated clusters.
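For readers unfamiliar with the transformation, here is a base-R sketch of the Gramian angular (summation) field that turns a price window into an image of the kind fed to the CNN (generic formula, not the authors' pipeline).
gaf <- function(x) {
  x <- (2 * x - max(x) - min(x)) / (max(x) - min(x))  # rescale to [-1, 1]
  phi <- acos(pmin(pmax(x, -1), 1))                   # angular encoding
  cos(outer(phi, phi, `+`))                           # Gramian angular summation field
}
img <- gaf(cumsum(rnorm(64)))   # toy "price" window -> 64 x 64 image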
Summer semester 2023
-
June 15 - Prof. Eiichi Isogai, Niigata University, Japan: "Sequential Estimation Problems", partially joint work with Andreas Futschik (Institut für Angewandte Statistik, JKU)
Abstract:
Whenever a given level of estimation accuracy is desired without prior knowledge of the dispersion, sequential estimation is the method of choice to achieve this goal. In this talk, I will introduce the audience to sequential estimation.
Then I will discuss challenges related to sequential point estimation. Finally, personal research on sequential fixed-width confidence intervals will be covered.
This is partially joint work with Andreas Futschik.
-
April 27 - Assoz. Univ.-Prof. Mag. Dr. Susanne Saminger-Platz, Deputy Head of the Institute for Mathematical Methods in Medicine and Data-Based Modelling (m3dm), JKU Linz: "On perturbing bivariate copulas and their dependence properties"
Abstract:
The relevance of copulas in dependence modeling originates from the fact that, due to Sklar's theorem, for (continuous) multivariate distributions the modeling of the univariate marginals and of the dependence structure can be separated, where the latter can be represented by a copula. A copula may be seen as a multivariate distribution function with all univariate margins being uniformly distributed on [0,1]. Hence, if C is a copula, then it is the distribution function of a vector of dependent U(0,1) random variables; in the case of independence, the corresponding copula is the product copula.
In the literature one finds quite a few different classes and families of copulas, either deduced, e.g., from multivariate distributions or following different construction approaches such as Archimedean copulas or patchwork and gluing techniques; as such, they exemplify the many views one can have on this class of functions, its analytical, probabilistic, measure-theoretic, or also algebraic properties, as well as its different fields of application.
In this talk we will focus on bivariate copulas. The Fréchet-Hoeffding bounds W(x,y) = max{x+y-1, 0} and M(x,y) = min{x,y} allow, as extremal cases, modeling the counter- or comonotone behaviour of a pair of continuous random variables, i.e., types of complete dependence. We are interested in perturbing some basic dependence behaviour by parametrized transformations and study under which conditions we again obtain copulas. An example of this type is the well-known family of Eyraud-Farlie-Gumbel-Morgenstern (EFGM) copulas as a perturbation of the product copula. We will discuss families obtained from the Fréchet-Hoeffding bounds and their properties. In a second part, we will turn to another class of perturbations of the product copula, namely polynomial bivariate copulas. We will present a characterization result for polynomial bivariate copulas of degree five and discuss some of the related dependence properties.
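For concreteness, a base-R sketch of the EFGM family named above, C_theta(u,v) = uv + theta*u*v*(1-u)*(1-v) for theta in [-1, 1], together with sampling by conditional inversion; the known value theta/3 of Spearman's rho serves as a sanity check.
efgm <- function(u, v, theta) u * v + theta * u * v * (1 - u) * (1 - v)
refgm <- function(n, theta) {
  # conditional distribution of V given U = u is quadratic in v,
  # so it can be inverted in closed form
  u <- runif(n); w <- runif(n)
  a <- theta * (1 - 2 * u)
  v <- ifelse(abs(a) < 1e-12, w,
              (1 + a - sqrt((1 + a)^2 - 4 * a * w)) / (2 * a))
  cbind(u, v)
}
uv <- refgm(1e4, theta = 0.8)
cor(uv[, 1], uv[, 2], method = "spearman")   # approximately theta / 3
-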
April 20 - Dr. Georg Zimmermann, Team Biostatistics and Big Medical Data, IDA Lab Salzburg, Paracelsus Medical University, Salzburg, Austria: "Handling ordinal outcomes in medical research"
Abstract:
In medical research, rating scales for quality of life, clinical outcome after surgery, assessment of pain, and other patient-relevant aspects are frequently used. Mathematically, these are ordinal measurements, particularly if the number of categories is small (e.g., a 7-point qualitative scale from "no symptoms" (0) to "death" (6)). Hence, classical parametric methods cannot be applied, and nonparametric approaches should be considered as an alternative instead. However, providing recommendations regarding which method to use is sometimes difficult, due to the lack of systematic empirical comparisons of different approaches. Moreover, especially in applied research, classical "cookbook recipes" are widely used, which might be a barrier with respect to implementing more nuanced methodological considerations in practice. On top of that, there are still some settings of high practical relevance (e.g., covariate adjustment) which pose methodological challenges to statisticians. Therefore, the talk addresses those aspects by presenting some recent methodological developments in the field of nonparametric rank-based statistics, motivated by applications in medical research, especially in research on rare diseases.
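As one concrete rank-based quantity of the kind discussed, a base-R sketch of the nonparametric relative effect p = P(X_A < X_B) + 0.5 P(X_A = X_B), estimated from mid-ranks on toy ordinal scores.
rel_effect <- function(a, b) {
  r <- rank(c(a, b))                               # mid-ranks handle ties
  nb <- length(b)
  (mean(r[seq_along(b) + length(a)]) - (nb + 1) / 2) / length(a)
}
a <- c(0, 1, 1, 2, 3, 3, 4)                        # toy scores under treatment A
b <- c(1, 2, 3, 3, 4, 5, 6)                        # toy scores under treatment B
rel_effect(a, b)                                   # > 0.5: B tends to score higher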
-
March 23 - 15:30 - 17:00 - S2 Z74 - Elham Yousefi, MSc, Zentrum für Medical Data Science, Medizinische Universität Wien: "Optimal Design Methods for Model Discrimination" (dissertation defence)
Link to the abstract
Winter semester 2022/23
-
January 26 - Ritabrata 'Rito' Dutta, University of Warwick, UK (joint work with Lorenzo Pacchiardi and Sherman Khoo): Sampling Likelihood-Free 'Generalized' Posteriors with Stochastic Gradient MCMC
Abstract:
We propose a framework for Bayesian Likelihood-Free Inference (LFI) based on Generalized Bayesian Inference. To define the generalized posterior, we use Scoring Rules (SRs), which evaluate probabilistic models given an observation. In LFI, we can sample from the model but not evaluate the likelihood; for this reason, we employ SRs with easy empirical estimators. Our framework includes novel approaches and popular LFI techniques (such as Bayesian Synthetic Likelihood) and enjoys posterior consistency in a well-specified setting when a strictly proper SR is used (i.e., one whose expectation is uniquely minimized when the model corresponds to the data generating process). In general, our framework does not approximate the standard posterior; as such, it is possible to achieve outlier robustness, which we prove is the case for the Kernel and Energy Scores. Further, we show that our setup can utilise gradient-based Markov chain Monte Carlo (MCMC) methods to sample from the proposed generalized posterior, hence making high-dimensional parameter inference possible for models with intractable likelihood functions.
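A minimal base-R sketch of the empirical energy score used as such an SR estimator from m model simulations against one observation (generic estimator; the construction of the generalized posterior around it is beyond this sketch).
energy_score <- function(X, y) {
  # X: m x d matrix of model simulations (rows), y: observed d-vector.
  # ES = E||X - y|| - 0.5 E||X - X'||; lower is better.
  t1 <- mean(sqrt(colSums((t(X) - y)^2)))          # mean distance to y
  D <- as.matrix(dist(X))                          # pairwise distances
  t1 - sum(D) / (2 * nrow(X) * (nrow(X) - 1))
}
y <- c(0, 0)
energy_score(matrix(rnorm(200), 100, 2), y)            # model centred at y
energy_score(matrix(rnorm(200, mean = 3), 100, 2), y)  # mis-centred model scores worse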
-
January 19 - Sebastian Fuchs, Universität Salzburg: Using dimension reduction for quantifying and estimating predictability and explainability in regression analysis
-
January 12 - Luca Gerardo-Giorda, JKU Linz: Differential equations meet data: quantifying uncertainty for strategy planning in Ecology and Disease.
Luca Gerardo-Giorda studied Mathematics at the University of Turin and in 2002 received his Doctorate in Applied Mathematics from the University of Trento. He was awarded a Marie Curie Industry Fellowship at the Institut Francais du Petrole in 2003. After working on applied interdisciplinary research at institutions in Europe (University of Trento, Ecole Polytechnique Paris) and the USA (Emory), in 2014 he set up the group on Mathematical Modeling in Biosciences at BCAM (the Basque Center for Applied Mathematics in Bilbao), which he led until February 2020, when he joined Johannes Kepler University Linz. He is currently the head of the Institute for Mathematical Methods in Medicine and Data-Based Modeling at JKU, and group leader at the Johann Radon Institute for Computational and Applied Mathematics (RICAM) of the Austrian Academy of Sciences. An expert in biomedical modeling and simulation, he seeks quantitative answers to clinical problems, with the aim of providing medical doctors with innovative simulation tools to be efficiently used for in silico pathology assessment and in support of clinical decision making.
Abstract:
In recent decades, the possibility of simulating complex problems has popularised the use of computational models in support of the activity of medical doctors and life scientists. As an example, one of the aims of spatial ecology is to help public health authorities and environmental conservation agencies take more informed decisions when identifying, monitoring and countering invasive dynamics, be it an infectious disease in wildlife or the spread of an exogenous species. An accurate computational model can be an efficient predictive tool on which to build a proper intervention strategy for the challenge at hand.
In this direction, it is well recognized that quantifying uncertainty is essential for computational predictions to have any real value (as highlighted by the 2014 FDA guidance on the use of computational simulation). As an example, the incorrect assumption of perfect knowledge of the model parameters hinders the prediction of relevant Quantities of Interest (QoI) and may result in choosing erroneous interventional strategies. Primary sources of uncertainty may result from input variability (aleatory/irreducible uncertainty), such as the initial conditions, or from a lack of knowledge (epistemic/reducible uncertainty), such as the modeling assumptions or the influence of yet unknown physical or biological phenomena.
Moreover, problems from biomedicine and life science are extremely complex and challenging from the modeling viewpoint. Typically, they are characterised by remarkable heterogeneities and multi-scale dynamics, both in space and time: a reliable predictive mathematical model should be able to soundly cope with these aspects. Unfortunately, more often than not, the available data for model calibration are very limited for a variety of reasons, especially in the case of spatial ecology (scarcity of data, limited economic resources to collect them) or in the presence of a new, poorly known disease.
In this talk I will present some studies we carried out in recent years on the spread of invasive species and infectious diseases, where we quantify the uncertainty in the presence of scarce data by combining differential equations (be they ordinary or partial) with Generalised Dynamic Linear Models (in a Bayesian framework) or Polynomial Chaos.
-
December 1 - Andrea Berghold, Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria: Randomization in Clinical Trials
Abstract:
Randomization is a crucial component of an experimental design in general and clinical trials in particular. Using adequate randomization methods is therefore an important prerequisite in conducting a clinical trial. Many procedures have been proposed for the random assignment of participants to treatment groups in clinical trials. Various restricted randomization techniques such as permuted block design, biased coin design, urn design or big stick design as well as covariate-adaptive and response-adaptive randomization can be found in the literature. I will discuss the performance of different restricted randomization techniques regarding their treatment balance behavior and allocation randomness.
However, it is not only important to have different techniques available but also to have suitable software that allows these techniques to be used in practice. I will present a web-based randomization tool for multi-centre clinical studies ("Randomizer", www.randomizer.at) which was developed by the Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria. This tool facilitates efficient management of the randomization process, including allocation concealment, stratification, audit trails etc., and can also be used for simulation of different randomization designs.
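A base-R sketch of one of the restricted designs mentioned above, permuted-block randomization with two arms (the block size is an arbitrary illustrative choice).
permuted_blocks <- function(n, block = 4) {
  stopifnot(block %% 2 == 0)                       # equal allocation within blocks
  assignments <- replicate(ceiling(n / block),
                           sample(rep(c("A", "B"), block / 2)))
  head(as.vector(assignments), n)                  # concatenate blocks, trim to n
}
set.seed(42)
permuted_blocks(10)   # balance is guaranteed after every complete block
-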
October 20 - 15:30 - 17:00 - S2 Z74 - Dr. Alejandra Avalos Pacheco:
“Multi-study Factor Regression Models for Large Complex Data with Applications to Nutritional Epidemiology and Cancer Genomics”
Abstract:
Data integration across multiple studies can be key to understanding and gaining knowledge in statistical research. However, such data present both biological and artifactual sources of variation, also known as covariate effects. Covariate effects can be complex, leading to systematic biases. In this talk I will present novel sparse latent factor regression (FR) and multi-study factor regression (MSFR) models to integrate such heterogeneous data. The FR model provides a tool for data exploration via dimensionality reduction and sparse low-rank covariance estimation while correcting for a range of covariate effects. MSFR models are extensions of FR that enable us to jointly obtain a covariance structure that models the group-specific covariances in addition to the common component, learning covariate effects from observed variables such as demographic information. I will discuss the use of several sparse priors (local and non-local) to learn the dimension of the latent factors. Our approach provides a flexible methodology for sparse factor regression which is not limited to data with covariate effects. I will present several examples, with a focus on bioinformatics applications. We show the usefulness of our methods in two main tasks: (1) to give a visual representation of the latent factors of the data, i.e. an unsupervised dimension reduction task, and (2) to provide (i) a supervised survival analysis, using the factors obtained by our method as predictors for the cancer genomic data, and (ii) a dietary pattern analysis, associating each factor with a measure of overall diet quality related to cardiometabolic disease risk for a Hispanic community health nutritional-data study.
Our results show an increase in the accuracy of the dimensionality reduction, with non-local priors substantially improving the reconstruction of factor cardinality. The results of our analyses illustrate how failing to properly account for covariate effects can result in unreliable inference.
Summer semester 2022
-
June 23 - 14:00 - 15:15 - S2 Z74 - Liana Jacobi:
“Posterior Manifolds over Hyperparameter Regions (and Joint Prior Parameter Dependence): Moving Beyond Localized Assessments of Prior Parameter Specifications in MCMC Inference”, joint with Andres Ramirez-Hassan, Jackson Kwok and Nhung Nghie
Abstract:
Prior parameter sensitivity has moved into the focus of prior robustness analysis in response to the increased use of Bayesian inference in applied work, in particular with the popularity of Markov chain Monte Carlo (MCMC) inference under conjugate priors. It is commonly investigated in terms of local or pointwise assessments, in the form of derivatives or multiple evaluations. As such, it provides limited, localized information about the impact of prior parameter specifications, with the scope further restricted due to analytical and computational complexities in most MCMC applications.
This paper introduces an approach based on the geometry of posterior statistics over hyperparameter regions (posterior manifolds) that encompasses and expands upon two common localized strategies to obtain more information about prior parameter dependence. The proposed estimation strategy is based on multiple point evaluations with Gaussian processes, with efficient selection of evaluation points achieved via Active Learning, and is further complemented with derivative information via a recent Automatic Differentiation approach for MCMC output. The approach gives rise to formal measures that can quantify additional aspects of prior parameter dependence and uncover more complex dependencies across prior parameters that are particularly relevant in practical applications, which often involve the setting of many location and precision parameters. The real data example investigates the impact of joint changes in prior demand parameter specifications on elasticity inference under a common multivariate demand framework for 5 main good groups, using data from a recent virtual supermarket experiment. We identify and estimate sensitivity manifolds for the three most sensitive (cross-)price and expenditure elasticities and show how conclusions regarding substitutionary versus complementary relationships as well as price sensitivity characteristics (normal versus inferior goods, elastic versus inelastic) can change across the prior parameter space.
-
May 19 - Paul Hofmarcher, Department of Economics, Paris Lodron University Salzburg: Gaining Insights on US Senate Speeches Using a Time-Varying Text-Based Ideal Point Model
Abstract:
Estimating political positions of lawmakers has a long tradition in political science, and usually lawmakers' votes are used to quantify their political positions. But lawmakers also give speeches or press statements. In this work we present a time-varying text-based ideal point model (TV-TBIP) which allows studying the political positions of lawmakers in a completely unsupervised way. In doing so, our model combines the class of topic models with ideal point models in a time-dynamic setting.
Our model is inspired by the idea of political framing, so that specific words or terms used when discussing a topic can convey political messages.
The insights of our model are twofold: first, it allows detecting how the political discussion of certain topics has changed over time, and second, it estimates ideological positions of lawmakers at the party level. Using only the texts of Senate speeches, our model identifies US senators along an interpretable progressive-to-moderate spectrum.
We apply our model to nearly 40 years of US Senate speeches between 1981 and 2017.
-
May 12 - 16:30 - 18:00 - S3 047 - Prof. Dr. Werner Brannath, University of Bremen: A liberal type I error rate for studies in precision medicine
(joint work with Charlie Hillner and Kornelius Rohmeyer)
Abstract:
We introduce a new multiple type I error criterion for clinical trials with multiple populations. Such trials are of interest in precision medicine, where the goal is to develop treatments that are targeted to specific sub-populations defined by genetic and/or clinical biomarkers. The new criterion is based on the observation that not all type I errors are relevant to all patients in the overall population. If disjoint sub-populations are considered, no multiplicity adjustment appears necessary, since a claim in one sub-population does not affect patients in the other ones. For intersecting sub-populations we suggest controlling the average multiple type I error rate, i.e. the probability that a randomly selected patient will be exposed to an inefficient treatment. We call this the population-wise error rate, exemplify it by a number of examples and illustrate how to control it with an adjustment of critical boundaries or adjusted p-values. We furthermore define corresponding simultaneous confidence intervals. We finally illustrate the power gain achieved by passing from family-wise to population-wise error rate control with two simple examples and a recently suggested multiple testing approach for umbrella trials.
-
March 17, 15:30; S2 Z74, Science Park 2 - Stefan Rass, Lehrstuhl Secure Systems, LIT Secure and Correct Systems Lab, JKU: On Privacy in Machine Learning by Plausible Deniability
Link to the talk slides
Abstract: When a machine learning model is trained from data, the data may be subject to security requirements and even be classified as sensitive. If the trained model is intended for use by untrusted parties, this raises the question of how much information about the training data is extractable from the machine learning model once it is given away. The talk presents two results in this regard, based on the security notion of plausible deniability. First, we show that a model of finite size will retain a nonzero residual entropy if the training data has a size beyond a (model-dependent) threshold. Second, we show that for a certain class of models and any artificially chosen training data, we can craft a topological norm that gives an error metric under which the training recovers exactly the given model. The order of quantifiers is what enables plausible deniability here, since we can, for any given model, claim it to have arisen from an arbitrary training set that can have any distribution and can be completely unrelated to the original sensitive training data. We illustrate the method on examples from normal and logistic regression and some examples of neural networks, and discuss the practical implications of these results.
Winter semester 2021/22
-
November 4, 15:30 - Florian Meinfelder, Otto-Friedrich-Universität Bamberg: Propensity Score Matching and Statistical Matching
Abstract:
The potential outcome framework generates, for a binary treatment variable, a missing data pattern that bears resemblance to a data fusion situation where two different data sources are stacked. The reason for the similarity regarding the missing data pattern is that either the outcome under treatment or the outcome under control is observed (but never both, for obvious reasons). The classical approach under the Rubin Causal Model is to use a nearest neighbor technique called Propensity Score Matching (PSM) to estimate the average treatment effect on the treated (ATET). Data fusion is also referred to as 'Statistical Matching', and nearest neighbor matching techniques have indeed been a popular choice for data fusion problems as well, since statistical twins are identified on an individual basis. Recently, publications have emerged in which the causal inference method PSM was applied to data fusion problems. In this talk we will investigate under which circumstances PSM can be a viable method for a data fusion scenario.
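A minimal sketch of nearest-neighbour PSM with the R package MatchIt on simulated data; the variables and effect sizes are invented for illustration only.
library(MatchIt)
set.seed(7)
df <- data.frame(x1 = rnorm(500), x2 = rnorm(500))
df$treat <- rbinom(500, 1, plogis(0.5 * df$x1 - 0.3 * df$x2))  # treatment assignment
df$y <- df$x1 + 2 * df$treat + rnorm(500)                      # outcome, true effect 2
m <- matchit(treat ~ x1 + x2, data = df, method = "nearest")   # 1:1 matching on the propensity score
md <- match.data(m)
with(md, mean(y[treat == 1]) - mean(y[treat == 0]))            # crude ATET estimate
-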
October 21, 15:30; MT 128, Science Park 1 - Petr Mazouch, Prague University of Economics and Business | VŠE: Data Quality in Economic and Demographic Statistics
Abstract: Statisticians use data from different data sources for building statistical models, computing analyses and constructing forecasts. Based on their results, economic agents (companies, government and households) make decisions. Better models lead to better decisions. One of the critical assumptions of excellent and valuable statistical models is the high quality of the inputs, i.e. the statistical data. Regardless of the data source type, the requirements for the quality of statistical data are the same.
The first part of the presentation introduces data quality requirements. It discusses the degree to which these requirements are fulfilled, using several examples from different social and economic statistics (labour statistics, SILC, household budget survey, national accounts). The second part focuses on demographic issues with a particular accent on covid-19 statistics. How does the pressure on data timeliness influence other aspects of covid-19 data quality? The final part uses a Bayesian approach to assess the relevance of the covid-19 data. Joint presentation with Jakub Fischer and Tomáš Karel.
-
November 11, 15:30 - Ulrike Schneider, TU Wien: The Geometry of Model Selection and Uniqueness of Lasso-Type Methods
Abstract: We consider estimation methods in the context of high-dimensional regression models, such as the Lasso and SLOPE, defined as solutions to a penalized optimization problem. The geometric object relevant for our investigation is the polytope that is dual to the unit ball of the penalizing norm. We show that which models are accessible by such a procedure depends on which faces of the polytope are intersected by the row span of the regressor matrix. Moreover, these geometric considerations allow us to derive a criterion for the uniqueness of the estimator that is both necessary and sufficient. We illustrate this approach for Lasso and SLOPE with the unit cube and the sign permutahedron as the relevant polytopes. Joint work with Patrick Tardivel (Université Bourgogne).
Summer semester 2021
-
Online talk - May 20, 17:15 - Ulrike Held, Department of Biostatistics, University of Zurich: Matching on treatment in observational research - what is the role of the matching algorithm?
Link to the abstract
Zoom link to the online talks
-
Online talk - April 22, 17:15 - Dr. Klaus Nordhausen, University of Jyväskylä, Finland: Blind source separation for multivariate spatial data
Zoom link to the online talks
Abstract:
Blind source separation has a long tradition for iid data and multivariate time series. Blind source separation methods for multivariate spatial observations, however, have not yet received much attention in the literature. We therefore suggest a blind source separation model for spatial data and show how the latent components can be estimated using two or more scatter matrices. The statistical properties and merits of these estimators are derived and verified in simulation studies. A real data example illustrates the method.
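To convey the two-scatter idea in its simplest iid form, here is a base-R FOBI-style sketch using the covariance and a kurtosis-based scatter; the spatial method of the talk replaces the second scatter with spatially lagged local-covariance matrices.
bss2 <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))
  C1 <- cov(Xc)
  W1 <- solve(chol(C1))                 # whitening via the inverse Cholesky factor
  Z <- Xc %*% W1
  C2 <- crossprod(Z * sqrt(rowSums(Z^2))) / nrow(Z)  # fourth-moment (cov4) scatter
  V <- eigen(C2, symmetric = TRUE)$vectors
  Z %*% V                               # latent components, up to order and sign
}
set.seed(3)
S_true <- cbind(runif(1000), rexp(1000), rnorm(1000))  # sources with distinct kurtoses
X <- S_true %*% matrix(rnorm(9), 3, 3)                 # unknown mixing
Z_hat <- bss2(X)                                       # recovered components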
-
Online talk - March 25 - Dr. Matt Sutton, QUT, Australia: Reversible Jump PDMP Samplers for Variable Selection
Zoom link to the online talks
Abstract:
A new class of Markov chain Monte Carlo (MCMC) algorithms, based on simulating piecewise deterministic Markov processes (PDMPs), has recently shown great promise: these samplers are non-reversible, can mix better than standard MCMC algorithms, and can use subsampling ideas to speed up computation in big data scenarios. However, current PDMP samplers can only sample from posterior densities that are differentiable almost everywhere, which precludes their use for model choice. Motivated by variable selection problems, we show how to develop reversible jump PDMP samplers that can jointly explore the discrete space of models and the continuous space of parameters. Our framework is general: it takes any existing PDMP sampler and adds two types of trans-dimensional moves that allow for the addition or removal of a variable from the model. We show how the rates of these trans-dimensional moves can be calculated so that the sampler has the correct invariant distribution. Simulations show that the new samplers can mix better than standard MCMC algorithms. Our empirical results show they are also more efficient than gradient-based samplers that avoid model choice through the use of continuous spike-and-slab priors, which replace a point mass at zero for each parameter with a density concentrated around zero.
Winter semester 2020/21
-
Online talk - January 28 - Ulrike Schneider, TU Wien:
Zoom link to the online talks
-
Online talk - January 21 - Lisa Ehrlinger & Florian Sobieczky, Software Competence Center Hagenberg: A Rendez-Vous in Data Science: Machine Learning meets Statistics
Zoom link to the online talks
Abstract:
The talk covers several typical challenges from "Data Science" arising in research projects at the Software Competence Center Hagenberg (SCCH*). Classical statistics as well as modern complex machine learning methods, such as neural networks, are applied to real-world use cases from industry.
In the first part, a short presentation of SCCH as an institution for applied research is given, which is particularly interesting for students with an interest in a master or PhD thesis on practical problems.
The second part is a summary of various projects involving real-world data with a focus on recurring statistical problems from manufacturing scenarios. In particular, methods related to anomaly detection, diagnosis and prediction using machine learning methods are discussed, with some care given to the black-box stigma of typical modern machine learning methods. The presentation is intended to identify classical methods and open research questions from statistics relevant for approaches taken by SCCH's strategy on predictive maintenance.
* SCCH - Software Competence Center Hagenberg
** FAW - Institute for Application-oriented Knowledge Processing, JKU
-
Online talk - November 19 - Zsolt Lavicza & Martin Andre, Johannes Kepler University Linz & Universität Innsbruck: Technology changing statistics education: Defining possibilities, opportunities and obligations.
Slides of the talk
Abstract:
In our talk, we will outline some educational research activities within the Linz School of Education related to technology developments and statistics education. Afterwards, we will discuss our work on introducing statistics concepts in schools and how statistics teaching can be connected to sustainable development with real data for students in schools. In particular, we will discuss that statistics is becoming crucial in our current data-driven society to explore numerous phenomena that are too complex to comprehend without exploring and visualising data. Citizens need to understand statistics about issues concerning essential parts of their lives, such as the spread of a pandemic or climate change, in order to responsibly participate in a prosperous development of our civilization. With our research projects we try to find out more about young students' intuitive approaches to statistics when visually analysing data. We found that certain kinds of data visualisations are especially capable of provoking reasoning about statistical concepts such as ideas of centre, spread and covariation. Based on these intuitive visual approaches to statistics, another aspect of our design-based research projects is concerned with statistical modelling processes. We developed a learning trajectory where middle school students were engaged in analysing real-world data to explore the sustainable development of various countries and to build a model for this phenomenon. Results show that students' statistical investigative learning processes should feature active participation in constructing knowledge of formal statistical concepts, and that students should adopt and fit their intuitive knowledge to formal concepts using methods of visual data analysis. We will outline some diverse opportunities to foster students' intuitive understanding of statistics and sustainable development issues simultaneously.
Zoom link to the online talks
-
Online talk - November 12 - Irene Tubikanec, Johannes Kepler University Linz: Approximate Bayesian computation for stochastic differential equations with an invariant distribution
Slides of the talk
Abstract:
Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed into an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise. First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to build up the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian-type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned when using standard numerical methods (Euler-Maruyama discretization) fails. The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterized by an invariant distribution and for which a measure-preserving numerical method can be derived.
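For readers new to ABC, a bare-bones rejection sampler in base R (the prior and acceptance threshold are toy choices; the talk's contribution lies in the invariant-density summaries and the measure-preserving numerical scheme, which are not reproduced here).
abc_reject <- function(y_obs, simulate, summarize, n = 1e4, eps = 0.5) {
  s_obs <- summarize(y_obs)
  acc <- numeric(0)
  for (i in seq_len(n)) {
    theta <- runif(1, 0, 10)                       # draw from a toy prior
    d <- sqrt(sum((summarize(simulate(theta)) - s_obs)^2))
    if (d < eps) acc <- c(acc, theta)              # keep parameters whose summaries are close
  }
  acc                                              # approximate posterior sample
}
Zoom link to the online talk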
-
Online talk - November 5 - Alex Kowarik, Statistics Austria: COVID-19 prevalence study - Was the sample large enough? 3,000 Martians, results and more
Abstract:
In November, a sample survey to determine the COVID-19 prevalence will be conducted for the third time. The talk will shed light on the methodological aspects of these surveys: sampling, weighting and error estimation.
Slides of the talk
For access details to this online talk, please contact Milan Stehlik
Summer semester 2020
-
Online talk - June 18 - Torsten Hothorn, Universität Zürich, Switzerland: Understanding and Applying Transformation Models
For access details to this online talk, please contact Markus Hainy
Winter semester 2019/20
-
January 23, 2020
Peter Filzmoser, Technische Universität Wien
Robust and sparse k-means clustering in high dimension
-
December 5, 2019
Hao Wang, Jilin University, Changchun
Dependence structure between Chinese Shanghai and Shenzhen stock market based on copulas and cluster analysis
-
November 28, 2019
Haipeng Li, CAS-MPG, Shanghai
Supervised learning for analyzing large-scale genome-wide DNA polymorphism data
-
November 7, 2019
Günter Pilz, Johannes Kepler Universität Linz
Statistics is a blessing for humanity
-
October 31, 2019
Martin Wolfsegger, Takeda Pharmaceutical Company Ltd.
Some likely useful thoughts on prescription drug-use-related software supporting personalized dosing regimen
Alexander Bauer, Takeda Pharmaceutical Company Ltd.
Evaluation of drug combinations
-
October 10, 2019
Leonardo Grilli, University of Florence
Multiple imputation and selection of predictors in multilevel models for analysing the relationship between student ratings and teacher beliefs and practices
Summer semester 2019
-
May 23, 2019
Siegfried Hörmann, TU Graz: ANOVA for functional time series data: when there is dependence between groups
-
May 9, 2019
Markus Hainy, Johannes Kepler Universität Linz: Optimal Bayesian design for models with intractable likelihoods via supervised learning methods
-
April 11, 2019
Dominik Schrempf, Eötvös Loránd University in Budapest, Hungary: Phylogenetic incongruences - opportunities to improve the reconstruction of a dated tree of life
-
April 4, 2019
Antony Overstall, University of Southampton, UK: Bayesian design for physical models using computer experiments
-
March 14, 2019
Florian Frommlet, Medizinische Universität Wien: Deep Bayesian Regression
-
March 14, 2019. Note: starting at 13:45
Thomas Petzoldt, TU Dresden, Germany: Identification of distribution components from antibiotic resistance data - Opportunities and challenges
Winter semester 2018/19
-
January 17, 2019
Harry Haupt, Universität Passau, Germany: Modeling spatial components for complexly associated urban data
-
November 22, 2018 (note: Wednesday 15:30, S3 048)
Hirohisa Kishino, University of Tokyo, Japan: Bridging molecular evolution and phenotypic evolution
-
November 15, 2018
Helmut Küchenhoff, Ludwig-Maximilians-Universität München: The analysis of voter transitions in the Bavarian state election 2018 using data from different sources: a teaching research project conducted by three Bavarian universities
-
November 8, 2018
Efstathia Bura, TU Wien: Least Squares and ML Estimation Approaches of the Sufficient Reduction for Matrix Valued Predictors
-
October 25, 2018
Erindi Allaj: Volatility measurement in the presence of high-frequency data
-
October 11, 2018
David Gabauer, JKU Linz, Austria: 'To Be or Not to Be' a Member of an Optimum Currency Area?
Summer semester 2018
-
May 24, 2018
Carsten Wiuf, University of Copenhagen, Denmark: A simple method to aggregate p-values without a priori grouping.
-
May 24, 2018
Pavlina Jordanova, University of Shumen, Bulgaria: On "multivariate" modifications of the Cramér-Lundberg risk model.
-
April 26, 2018
Juan M. Rodríguez-Díaz, Universidad de Salamanca, Spain: Design optimality in multiresponse models with double covariance structure.
-
April 19, 2018
Robert Breitenecker, Johannes Kepler Universität Linz: Spatial Heterogeneity in Entrepreneurship Research: An application of Geographically Weighted Regression.
-
March 15, 2018
Andreas Mayr, Friedrich-Alexander-University Erlangen-Nürnberg, Germany: An introduction to boosting distributional regression.
-
June 28, 2018
Gangaram S. Ladde, University of South Florida, USA: Energy/Lyapunov Function Method and Stochastic Mathematical Finance
Winter semester 2017/18
-
January 25, 2018
Thomas Kneib, Georg-August-Universität Göttingen: A Lego System for Building Structured Additive Distributional Regression Models with Tensor Product Interactions
-
December 7, 2017
Franz König, Medizinische Universität Wien: Optimal rejection regions for multi-arm clinical trials
-
November 9, 2017
Henrique Teotonio, Institut de Biologie de l'École Normale Supérieure, Paris: Inferring natural selection and genetic drift in evolution experiments
-
October 19, 2017
Lenka Filová, Comenius University in Bratislava: Optimal Design of Experiments in R
-
October 12, 2017
Elisa Perrone, Massachusetts Institute of Technology, Cambridge, MA (USA): Discrete copulas for weather forecasting: theoretical and practical aspects