Institute of Applied Statistics

Research Seminar


We kindly welcome all interested people to participate in our research seminar.
Univ.-Prof. Mag. Dr. Andreas Futschik & Univ.-Prof. Mag. Dr. Werner G. Müller


Thursdays from 15:30 until 17:00


Science Park 2, Intermediate Storey, Z74.

Summer Term 2024

  1. April 25th - Prof. Asger Hobolth, Department of Mathematics, Aarhus University, Denmark: Phase-type distributions in mathematical population genetics: An emerging framework. (The talk is based on joint work with Iker Rivas-Gonzalez (Leipzig), Mogens Bladt (Copenhagen) and Andreas Futschik (Linz).)

    A phase-type distribution is the time to absorption in a continuous- or discrete-time Markov chain. Phase-type distributions can be used as a general framework to calculate key properties of the standard coalescent model and many of its extensions. Here, the ‘phases’ in the phase-type distribution correspond to states in the ancestral process. For example, the time to the most recent common ancestor and the total branch length are phase-type distributed. Furthermore, the site frequency spectrum follows a multivariate discrete phase-type distribution and the joint distribution of total branch lengths in the two-locus coalescent-with-recombination model is multivariate phase-type distributed. In general, phase-type distributions provide a powerful mathematical framework for coalescent theory because they are analytically tractable using matrix manipulations. The purpose of this talk is to explain the phase-type theory and demonstrate how the theory can be applied to derive basic properties of coalescent models. These properties can then be used to obtain insight into the ancestral process, or they can be applied for statistical inference.
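
    The matrix manipulations mentioned above can be illustrated with a short sketch. It is an illustration, not code from the talk, and the function names are hypothetical. It encodes the standard Kingman coalescent for n sampled lineages as a phase-type distribution whose transient states are the numbers of remaining lineages. It then uses the phase-type mean alpha (-S)^{-1} r, where r is a vector of ones for the time to the most recent common ancestor (MRCA) and r = (n, n-1, ..., 2) for the total branch length.

```python
import numpy as np


def kingman_block_counting(n):
    """Initial vector and sub-intensity matrix of the lineage-counting process
    of the standard Kingman coalescent with n >= 2 sampled lineages.

    Transient states are k = n, n-1, ..., 2 lineages; the final coalescence
    (reaching the MRCA) is the absorbing event. Time is in coalescent units.
    """
    if n < 2:
        raise ValueError("need at least two lineages")
    ks = np.arange(n, 1, -1)              # lineage counts n, n-1, ..., 2
    m = len(ks)
    S = np.zeros((m, m))
    for i, k in enumerate(ks):
        rate = k * (k - 1) / 2            # any of the k-choose-2 pairs may coalesce
        S[i, i] = -rate
        if i + 1 < m:
            S[i, i + 1] = rate            # k lineages -> k - 1 lineages
        # for k = 2 the exit rate leads to the absorbing state (the MRCA)
    alpha = np.zeros(m)
    alpha[0] = 1.0                        # the process starts with n lineages
    return alpha, S, ks


def phase_type_mean(alpha, S, reward=None):
    """Mean alpha (-S)^{-1} r of a (reward-transformed) phase-type variable."""
    r = np.ones(S.shape[0]) if reward is None else np.asarray(reward, dtype=float)
    green = np.linalg.inv(-S)             # expected time spent in each transient state
    return float(alpha @ green @ r)


alpha, S, ks = kingman_block_counting(5)
t_mrca = phase_type_mean(alpha, S)                     # time to the MRCA
total_length = phase_type_mean(alpha, S, reward=ks)    # k branches accrue length in state k
print(t_mrca, total_length)
```

    For n = 5 the results should match the classical coalescent values 2(1 - 1/5) = 1.6 and 2(1 + 1/2 + 1/3 + 1/4) ≈ 4.17 in coalescent time units. Higher moments and the multivariate quantities mentioned in the abstract need further matrix identities that this sketch does not cover.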

  2. June 13th - Prof. em. Emmanuel Lesaffre, I-Biostat, KU Leuven, Belgium: The use of historical data in clinical trials: Does it pay off?

    This non-technical review discusses the use of historical data in the design and analysis of randomized controlled trials using a Bayesian approach. The focus is on comparing the philosophy behind different approaches and practical considerations for their use. The two main approaches, i.e., the power prior and the meta-analytic-predictive prior, are illustrated using fictitious and real data sets. Such methods, known as dynamic borrowing methods, are becoming increasingly popular in pharmaceutical research because they may substantially reduce costs. In some cases, e.g., in pediatric studies, they may be indispensable for addressing the clinical research question. In addition to the two original approaches, this review also covers various extensions and variations of the methods. The usefulness of the approaches and their acceptance by regulatory agencies are also critically evaluated.
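
    To make the power prior concrete, here is a minimal sketch for a binary endpoint. It is our own illustration and is not taken from the review. The historical likelihood is raised to a fixed weight a0 in [0, 1] and combined with a Beta(a, b) initial prior, which keeps the posterior in closed form. The trial counts in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)


def power_prior_beta(x, n, x0, n0, a0, a=1.0, b=1.0):
    """Posterior of a response rate theta under a power prior with fixed a0.

    Current trial: x responders out of n. Historical trial: x0 out of n0.
    The historical binomial likelihood is raised to the power a0 in [0, 1]
    and combined with a Beta(a, b) initial prior (conjugate update).
    """
    if not 0.0 <= a0 <= 1.0:
        raise ValueError("a0 must lie in [0, 1]")
    return BetaPosterior(
        a=a + a0 * x0 + x,
        b=b + a0 * (n0 - x0) + (n - x),
    )


for a0 in (0.0, 0.5, 1.0):
    post = power_prior_beta(x=12, n=40, x0=30, n0=100, a0=a0)
    print(a0, post.a, post.b, round(post.mean, 4))
```

    With a0 = 0 the historical trial is ignored, with a0 = 1 it is pooled completely, and a0 * n0 can be read as the number of historical patients effectively borrowed. Fixing a0 in advance, as done here for simplicity, does not adapt to conflicts between the data sources. Dynamic borrowing variants let the data inform how much is borrowed, for instance by placing a prior on a0 or by using a robust mixture prior.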

Previous Talks

Summer Term 2024

  1. April 18th - Univ.-Prof. Dr. Johann Bacher, Institut für Soziologie / Abteilung für Empirische Sozialforschung, Johannes Kepler University Linz, Austria: Data Linkage as a New Challenge for Survey Research (Datenverknüpfung als neue Herausforderung der Umfrageforschung)
    Several social and scientific developments pose new challenges for survey research. Mediatization and digitalization mean that new data are constantly being generated on a large scale. In addition, administrative register data are increasingly available for research. Finally, scientific surveys are conducted regularly at national and international level. For these surveys, a trend toward web-based implementation can be observed, which often means that not all topics of interest can be covered in a single survey. The availability of a multitude of data on the one hand and the need for shorter interview times on the other raise the question of whether and how data can be linked in order to answer research questions with data that already exist.

    The talk presents research on two methods of data linkage. First, it reports on an empirical study that examined respondents' willingness in Austria to consent to the linkage of their data. Second, it presents the results of a field experiment on the data fusion of two surveys, which relied on multiple imputation procedures.

    The results show that both methods still have their limitations. Given the developments described above, however, it is necessary to keep working with them, perhaps more intensively than before.
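
    The data fusion experiment relied on multiple imputation. Whatever imputation model is used, the m completed data sets are usually analysed separately and the results combined with Rubin's rules. The sketch below shows only this pooling step, with hypothetical estimates, and is not taken from the study.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data analyses with Rubin's rules.

    estimates: point estimates from the m imputed data sets
    variances: their squared standard errors
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)        # pooled point estimate
    within = statistics.fmean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


est, var = pool_rubin([2.0, 2.2, 1.8, 2.1, 1.9], [0.04] * 5)
print(est, var)
```

    The factor (1 + 1/m) inflates the between-imputation variance to account for using a finite number of imputations. With the numbers above, the pooled variance is 0.04 + 1.2 * 0.025 = 0.07.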

  2. April 11th - Dr. Magdalena Muszynska-Spielauer, Institut für Angewandte Statistik, Johannes Kepler University Linz, Austria: Different dimensions of lifespan inequality

    Philosophical concepts of health inequities, including normative judgments about the fairness of surviving to different ages, are rarely considered in demographic studies of longevity. The idea of this presentation is to explore some well-studied aspects of the measurement of health inequities and adapt them to the problem of measuring lifespan inequalities. The first paper adopts the capability approach to define deaths that we consider ethically problematic, i.e., premature mortality, and proposes applying standard poverty measures to quantify lifespan inequity simultaneously as the prevalence, depth and inequality of lifespan deprivation resulting from premature mortality. The proposed methods are applied to study premature mortality and lifespan inequity in the United States relative to other high-income countries in 1933-2019. Our results show that the high levels of premature mortality and lifespan inequity in the United States are not recent developments, but have been persistent challenges over several decades.

    In the second paper, we introduce the formal differences between the inequality statistics of attainment and deprivation in the study of lifespan inequality. For this paper, we apply the Keyfitz notion that everybody dies prematurely. Based on the simple demographic model of resuscitation, we propose an indicator of inequality in lifespan deprivation. An empirical application focuses on the relationship between the statistics of attainment and deprivation according to the distributions of life table ages at death in high-income countries of the Human Mortality Database in 1900-2021. We focus on the question of whether quantifying inequality using the relative inequality of attainment statistics and the relative inequality of deprivation statistics leads to different conclusions about trends in inequality. We also highlight the effect of three major mortality shocks in the study period, i.e., the Spanish flu, the Second World War and COVID-19, on the two groups of statistics.
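
    The abstract does not specify which poverty measures are used. The Foster-Greer-Thorbecke (FGT) family is one standard choice, whose parameter alpha yields prevalence (alpha = 0), depth (alpha = 1) and inequality or severity (alpha = 2). The sketch below applies it to a distribution of ages at death with a hypothetical threshold age for premature death and toy death counts. It illustrates the idea and is not the authors' method.

```python
def fgt_lifespan(ages_at_death, deaths, threshold, alpha):
    """Foster-Greer-Thorbecke index applied to ages at death.

    Deaths before `threshold` count as premature; each contributes its
    relative shortfall ((threshold - age) / threshold) ** alpha.
    alpha = 0: prevalence, alpha = 1: depth, alpha = 2: inequality (severity).
    """
    total = sum(deaths)
    acc = 0.0
    for age, d in zip(ages_at_death, deaths):
        if age < threshold:
            acc += d * ((threshold - age) / threshold) ** alpha
    return acc / total


# Toy life-table input: ages at death and death counts (hypothetical numbers)
ages = [30, 60, 80]
deaths = [1, 1, 2]
for a in (0, 1, 2):
    print(a, round(fgt_lifespan(ages, deaths, threshold=70, alpha=a), 4))
```

    For the toy input, half of all deaths are premature (index 0.5 at alpha = 0), while the depth and severity indices are about 0.18 and 0.09.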

Winter Term 2023/24

  1. January 18th - Ao.Univ.-Prof. Dipl.-Ing. Dr.techn. Herwig Friedl, Institute of Statistics, Graz University of Technology: Hands-on Applications of Mixture models (Joint work with Dankmar Böhning, Bettina Grün, Sanela Omerovic, Elisabeth Reitbauer & Peter Scheibelhofer)

    We will introduce this class of models by discussing three very different practical applications.

    The first application is about estimating the size of a closed population that is hard to count. We consider uni-list approaches in which the number of identifications per unit is the basis of the analysis. Unseen units have a zero count and do not occur in the sample, which leads to a zero-truncated setting. Owing to various mechanisms, one-inflation occurs frequently and can lead to seriously biased estimates of population size. The zero-truncated one-inflated and the one-inflated zero-truncated models are compared in terms of Horvitz-Thompson estimation of population size. The illustrative data concern the number of sightings of dice snakes in Graz.
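
    As a baseline for the comparison described above, the following sketch estimates population size under a plain zero-truncated Poisson model without any one-inflation component. The rate lambda is estimated from the observed (positive) counts. The Horvitz-Thompson estimator then divides the number of observed units by the estimated probability 1 - exp(-lambda) of being observed at least once. The counts are hypothetical, not the dice snake data.

```python
import math


def ztp_lambda_mle(counts, tol=1e-12, max_iter=200):
    """MLE of the Poisson rate from zero-truncated counts (all counts >= 1).

    The score equation reduces to lambda / (1 - exp(-lambda)) = mean(counts).
    The left-hand side increases from 1 (as lambda -> 0) to infinity, so
    bisection on (0, mean] finds the unique root whenever mean > 1.
    """
    mean = sum(counts) / len(counts)
    if mean <= 1:
        raise ValueError("MLE does not exist when every unit was seen exactly once")

    def g(lam):
        return lam / -math.expm1(-lam) - mean   # -expm1(-x) == 1 - exp(-x), stable for small x

    lo, hi = 1e-12, mean
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def horvitz_thompson_size(counts):
    """Estimate population size N as n / P(unit observed at least once)."""
    lam = ztp_lambda_mle(counts)
    p_seen = -math.expm1(-lam)
    return len(counts) / p_seen


counts = [1, 1, 1, 2, 2, 3]   # hypothetical sightings per observed individual
print(horvitz_thompson_size(counts))
```

    One-inflation produces an excess of units seen exactly once, which distorts the estimate of lambda and hence of N. The one-inflated variants compared in the talk add a component for these extra ones, which this sketch omits.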

    In another study, we deal with black and white C-SAM images of wafer structures. The statistical analysis is based on the corresponding multimodal frequency histograms of the gray levels. The goal is to draw conclusions about both the quality of the wafers and the contrast of the images. A heterogeneous mixture of gamma densities together with a uniform distribution component is successfully used to enable such a dual defect analysis.

    The last application deals with a mixture of generalized non-linear models for estimating the daily maximum gas consumption as a function of the outdoor temperature. For this purpose, we have implemented "flexmixNL" as an extension of the well-known R package "flexmix". It enables analysis by means of a homogeneous mixture of linear exponential families in which the means are modeled nonlinearly, here by a family of sigmoid functions. Mixing several such components makes it possible to account for unobserved heterogeneity.
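
    The following is a deliberately simplified sketch of fitting a finite mixture of regressions by the EM algorithm, the general approach implemented in flexmix. It is written in Python rather than R. It assumes Gaussian components with means that are linear in x instead of sigmoid, and it uses hypothetical simulated data. It does not reproduce the flexmix or flexmixNL interface.

```python
import numpy as np


def em_mixture_of_regressions(x, y, k=2, n_iter=200):
    """EM for a k-component mixture of Gaussian linear regressions y ~ b0 + b1 * x.

    Simplification: component means are linear in x (not sigmoid), and the
    starting responsibilities come from splitting the data into quantile groups of y.
    """
    X = np.column_stack([np.ones_like(x), x])
    n = len(y)
    resp = np.zeros((n, k))
    for j, idx in enumerate(np.array_split(np.argsort(y), k)):
        resp[idx, j] = 1.0
    for _ in range(n_iter):
        # M-step: mixing weights, weighted least squares fits, residual scales
        weights = resp.mean(axis=0)
        betas = np.empty((k, 2))
        sigmas = np.empty(k)
        for j in range(k):
            w = resp[:, j]
            Xw = X * w[:, None]
            betas[j] = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            resid = y - X @ betas[j]
            sigmas[j] = np.sqrt((w * resid**2).sum() / w.sum())
        # E-step: component membership probabilities, computed on the log scale
        mu = X @ betas.T                                   # shape (n, k)
        log_dens = np.log(weights) - np.log(sigmas) - 0.5 * ((y[:, None] - mu) / sigmas) ** 2
        log_dens -= log_dens.max(axis=1, keepdims=True)
        resp = np.exp(log_dens)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, betas, sigmas


# Hypothetical simulated data: two regimes with different intercepts and slopes
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, 400)
in_first = np.arange(400) < 200
y = np.where(in_first, 0.0 + 1.0 * x, 10.0 + 1.5 * x) + rng.normal(0.0, 0.5, 400)
weights, betas, sigmas = em_mixture_of_regressions(x, y)
print(weights.round(2), betas.round(2))
```

    Replacing the linear predictor X @ betas[j] with a sigmoid function of temperature would turn the weighted least-squares step into a weighted non-linear least-squares problem, which is the kind of extension the talk describes.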

  2. January 11th - Hofrat Mag. Dr. Gernot Filipp, Head of State Statistics and Administrative Controlling, Amt der Salzburger Landesregierung:
    “Borderlands”: Intersections of Theory and Practice in Official Statistics

    Official statistics sometimes has a reputation as a dry and unexciting field. However, the statistician Tukey's remark “The best thing about being a statistician is that you get to play in everyone's backyard.” applies particularly to the public sector. State statistics is the central information provider for the state administration, state politics and the state's citizens in all socially and politically relevant areas. The fascination of official statistics comes from the variety of questions one encounters. Interesting facts and information are generated from numbers and data that at first seem dry, in order to create added value for data users. Different statistical methods are used to address these various questions. The first part of the talk gives a brief insight into the variety of activities in official statistics; the second part presents some concrete examples of statistical methods applied in official statistics.

  3. November 16th - Prof. Dr. Yarema Okhrin, Chair of Statistics, Faculty of Business and Economics, University of Augsburg: Computer-vision-based Bitcoin price forecasting with relevance-based clustering
    This paper examines a novel AI-based computer vision approach for pattern recognition in financial time series classification. We transform past Bitcoin price sequences to Gramian Angular Field images and use them as input data for a convolutional neural network (CNN). We apply spectral relevance analysis to identify clusters of the decision behavior of the CNN. Clustering the images according to the associated relevance maps allows the comparison of cluster-based performances. We detect clusters with substantially higher predictive performance compared to the complete data set. The associated relevance matrices for each cluster represent favourable patterns for price prediction and are identified via the associated clusters.
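
    The image encoding step can be sketched as follows. The abstract does not say whether the summation or the difference variant of the Gramian Angular Field is used, so this sketch assumes the summation variant (GASF), applied to a short window of hypothetical prices. It is not the authors' pipeline, which additionally trains a CNN and clusters relevance maps.

```python
import numpy as np


def gramian_angular_summation_field(series):
    """Encode a 1-D series as a Gramian Angular Summation Field (GASF) image.

    1. Min-max rescale the series to [-1, 1].
    2. Map each value to an angle phi = arccos(x).
    3. Pixel (i, j) is cos(phi_i + phi_j).
    """
    s = np.asarray(series, dtype=float)
    lo, hi = s.min(), s.max()
    if hi == lo:
        raise ValueError("constant series cannot be rescaled")
    x = 2 * (s - lo) / (hi - lo) - 1
    x = np.clip(x, -1.0, 1.0)             # guard against round-off outside [-1, 1]
    phi = np.arccos(x)
    return np.cos(phi[:, None] + phi[None, :])


prices = [100.0, 101.5, 99.8, 102.3, 103.1, 102.7]   # hypothetical closing prices
image = gramian_angular_summation_field(prices)
print(image.shape)
```

    Each window of past prices becomes a square image whose side equals the window length, so a sequence of windows can be fed to a standard image classifier.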

  4. November 23rd - Paula Camelia Trandafir, Department of Statistics, Computer Science and Mathematics, Public University of Navarre:
    Age-specific spatio-temporal patterns of ovarian cancer mortality in Spain

    Ovarian cancer is a major cause of death from gynecological malignancies, with an estimated lifetime risk of about 1 in 50 to 70 women. Its incidence peaks among women aged 60 to 64 years, and it predominantly affects women over 50. Globally, there are approximately 204,449 newly diagnosed cases of ovarian cancer each year, around 4% of all cancers in women, and 124,860 deaths attributed to the disease (GLOBOCAN data).
    Spatial, temporal, or spatio-temporal analyses of ovarian cancer mortality are rare in the epidemiological literature, and they are often not broken down by age, which can lead to imprecise conclusions. Our objective in this study is to examine the temporal evolution of geographic patterns in ovarian cancer mortality rates for distinct age groups across Spanish provinces from 1989 to 2015. To achieve this, we will explore various autoregressive models. Model fitting and inference will be carried out in R using integrated nested Laplace approximations (INLA).

Summer Term 2023

  1. June 15th - Prof. Eiichi Isogai, Niigata University, Japan: "Sequential Estimation Problems", partially joint work with Andreas Futschik (Institut für angewandte Statistik, JKU)


    Whenever a given level of estimation accuracy is desired without prior knowledge of the dispersion, sequential estimation is the method of choice to achieve this goal. In this talk, I will introduce the audience to sequential estimation.

    Then I will discuss challenges related to sequential point estimation. Finally, I will cover my own research on sequential fixed-width confidence intervals.

    This is partially joint work with Andreas Futschik.

  2. April 27th - Assoz.Univ-Prof. Mag. Dr. Susanne Saminger-Platz, Deputy Head of the Institute for Mathematical Methods in Medicine and Data-Based Modeling (m3dm), JKU Linz: “On perturbing bivariate copulas and their dependence properties”


    The relevance of copulas in dependence modeling stems from Sklar's theorem: for (continuous) multivariate distributions, the modeling of the univariate marginals and of the dependence structure can be separated, and the latter can be represented by a copula. A copula may be seen as a multivariate distribution function whose univariate margins are all uniformly distributed on [0,1]. Hence, if C is a copula, then it is the distribution function of a vector of dependent U(0,1) random variables; in the case of independence, the corresponding copula is the product copula.

    The literature offers many different classes and families of copulas, derived, e.g., from multivariate distributions or obtained by construction approaches such as Archimedean copulas or patchwork and gluing techniques. These also exemplify the many views one can take of this class of functions (its analytical, probabilistic, measure-theoretic, or algebraic properties) as well as its different fields of application.

    In this talk we will focus on bivariate copulas. The Fréchet-Hoeffding bounds W(x,y) = max{x+y-1, 0} and M(x,y) = min{x,y} are the extremal cases that model the countermonotone and comonotone behaviour of a pair of continuous random variables, i.e., types of complete dependence. We are interested in perturbing some basic dependence behaviour by parametrized transformations and in studying under which conditions the result is again a copula. A well-known example of this type is the Eyraud-Farlie-Gumbel-Morgenstern family of copulas, a perturbation of the product copula. We will discuss families obtained from the Fréchet-Hoeffding bounds and their properties. In a second part, we will turn to another class of perturbations of the product copula, namely polynomial bivariate copulas. We will present a characterization result for polynomial bivariate copulas of degree five and discuss some of the related dependence properties.
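
    The Eyraud-Farlie-Gumbel-Morgenstern family mentioned above, C(u,v) = uv[1 + theta(1-u)(1-v)], is a copula exactly when -1 <= theta <= 1. The following sketch is an illustration, not material from the talk. It checks the 2-increasing property numerically on a grid and computes Spearman's rho, which for this family equals theta/3.

```python
import numpy as np


def fgm_copula(u, v, theta):
    """Eyraud-Farlie-Gumbel-Morgenstern copula: a perturbation of the product copula uv."""
    return u * v * (1 + theta * (1 - u) * (1 - v))


def is_two_increasing(copula, grid_size=50, tol=1e-12):
    """Check numerically that every grid rectangle receives nonnegative C-volume."""
    g = np.linspace(0.0, 1.0, grid_size + 1)
    C = copula(g[:, None], g[None, :])
    vol = C[1:, 1:] - C[:-1, 1:] - C[1:, :-1] + C[:-1, :-1]
    return bool(vol.min() >= -tol)


def spearman_rho(copula, grid_size=400):
    """Spearman's rho = 12 * (integral of C over [0,1]^2) - 3, via the midpoint rule."""
    mid = (np.arange(grid_size) + 0.5) / grid_size
    C = copula(mid[:, None], mid[None, :])
    return 12 * C.mean() - 3


for theta in (-1.0, 0.5, 1.5):
    cop = lambda u, v, t=theta: fgm_copula(u, v, t)
    print(theta, is_two_increasing(cop), round(spearman_rho(cop), 4))
```

    The grid check is only a numerical sanity test, not a proof. For theta = 1.5 it detects rectangles with negative C-volume, consistent with the known parameter range.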


  3. April 20th - Dr. Georg Zimmermann, Team Biostatistics and Big Medical Data, IDA Lab Salzburg, Paracelsus Medical University, Salzburg, Austria: "Handling ordinal outcomes in medical research"


    In medical research rating scales for quality of life, clinical outcome after surgery, assessment of pain, and other patient-relevant aspects are frequently used. Mathematically, these are ordinal measurements, particularly if the number of categories is small (e.g., a 7-point qualitative scale from “no symptoms” (0) to “death” (6)). Hence, classical parametric methods cannot be applied, and nonparametric approaches should be considered as an alternative instead. However, providing recommendations regarding which method to use is sometimes difficult, due to the lack of systematic empirical comparisons of different approaches. Moreover, especially in applied research, classical “cookbook recipes” are widely used, which might be a barrier with respect to implementing more nuanced methodological considerations in practice. On top of that, there are still some settings of high practical relevance (e.g., covariate adjustment) which pose methodological challenges to statisticians. Therefore, the talk addresses those aspects by presenting some recent methodological developments in the field of nonparametric rank-based statistics, motivated by applications in medical research, especially in research on rare diseases.
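
    One quantity used in nonparametric rank-based statistics for ordinal outcomes is the relative effect p = P(X < Y) + 0.5 P(X = Y). It relies only on comparisons and therefore suits ordered categories such as the 0 to 6 scale above. The talk does not specify which methods it covers. This sketch merely shows two equivalent ways of estimating p, by pairwise comparison and from pooled mid-ranks, on hypothetical scores.

```python
def relative_effect(x, y):
    """Nonparametric relative effect p = P(X < Y) + 0.5 * P(X = Y).

    Uses only comparisons between values, so it is meaningful for ordinal data.
    p = 0.5 means neither group tends to larger values.
    """
    wins = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                wins += 1.0
            elif xi == yj:
                wins += 0.5
    return wins / (len(x) * len(y))


def relative_effect_from_ranks(x, y):
    """Same quantity via pooled mid-ranks: (mean rank of y - (n_y + 1) / 2) / n_x."""
    pooled = sorted(list(x) + list(y))
    midrank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        midrank[pooled[i]] = (i + 1 + j) / 2   # average of ranks i+1, ..., j for tied values
        i = j
    mean_rank_y = sum(midrank[v] for v in y) / len(y)
    return (mean_rank_y - (len(y) + 1) / 2) / len(x)


control = [0, 1, 2]     # hypothetical ordinal scores, control group
treated = [1, 2, 3]     # hypothetical ordinal scores, treated group
print(relative_effect(control, treated), relative_effect_from_ranks(control, treated))
```

    The mid-rank form connects the relative effect to the ranks used by rank-based tests such as the Wilcoxon-Mann-Whitney and Brunner-Munzel tests.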

  4. March 23rd - 15:30 - 17:00 - S2 Z74 - Elham Yousefi, MSc, Zentrum für Medical Data Science, Medizinische Universität Wien: "Optimal Design Methods for Model Discrimination" (doctoral thesis defense)

Winter Term 2022/23

  1. January 26th - Ritabrata ‘Rito’ Dutta, University of Warwick, UK (joint work with Lorenzo Pacchiardi and Sherman Khoo): Sampling Likelihood-Free ‘generalized’ posteriors with Stochastic Gradient MCMC


    We propose a framework for Bayesian Likelihood-Free Inference (LFI) based on Generalized Bayesian Inference. To define the generalized posterior, we use Scoring Rules (SRs), which evaluate probabilistic models given an observation. In LFI, we can sample from the model but not evaluate the likelihood; for this reason, we employ SRs with easy empirical estimators. Our framework includes novel approaches and popular LFI techniques (such as Bayesian Synthetic Likelihood) and enjoys posterior consistency in a well-specified setting when a strictly proper SR is used (i.e., one whose expectation is uniquely minimized when the model corresponds to the data generating process). In general, our framework does not approximate the standard posterior; as such, it is possible to achieve outlier robustness, which we prove is the case for the Kernel and Energy Scores. Further, we show that our setup can utilise gradient-based Markov chain Monte Carlo (MCMC) methods to sample from this proposed generalized posterior, hence making high-dimensional parameter inference possible for models with intractable likelihood functions.
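
    The energy score is one scoring rule with an easy empirical estimator. The sketch below estimates it from m model simulations and plugs it into an unnormalized generalized log-posterior log pi(theta) - w * S(P_theta, y). The weight w and the simulator are hypothetical placeholders, and the stochastic-gradient MCMC sampler used in the work is not shown.

```python
import numpy as np


def energy_score(samples, y):
    """Unbiased estimate of the energy score S(P, y) = E||X - y|| - 0.5 * E||X - X'||
    from m >= 2 simulations X_1, ..., X_m drawn from the model P.

    samples: array of shape (m, d); y: observation of shape (d,). Lower is better.
    """
    X = np.asarray(samples, dtype=float)
    y = np.asarray(y, dtype=float)
    m = X.shape[0]
    if m < 2:
        raise ValueError("need at least two simulations")
    term1 = np.linalg.norm(X - y, axis=1).mean()
    pair = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # diagonal entries are 0
    term2 = pair.sum() / (m * (m - 1))                             # average over pairs i != j
    return term1 - 0.5 * term2


def log_generalized_posterior(theta, y, log_prior, simulate, w=1.0, m=100):
    """Unnormalized log of prior(theta) * exp(-w * S(P_theta, y)).

    `simulate(theta, m)` is a hypothetical user-supplied simulator returning an (m, d) array.
    """
    return log_prior(theta) - w * energy_score(simulate(theta, m), y)


print(energy_score([[0.0], [2.0]], [1.0]))
```

    Because only simulations from the model are required, the likelihood is never evaluated.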

  2. January 19th - Sebastian Fuchs, Universität Salzburg: Using dimension reduction for quantifying and estimating predictability and explainability in regression analysis

  3. January 12th - Luca Gerardo-Giorda, JKU Linz: Differential equations meet data: quantifying uncertainty for strategy planning in Ecology and Disease.

    Luca Gerardo-Giorda studied Mathematics at the University of Turin and in 2002 received his Doctorate in Applied Mathematics from the University of Trento. He was awarded a Marie Curie Industry Fellowship at the Institut Français du Pétrole in 2003. After working on applied interdisciplinary research at institutions in Europe (University of Trento, Ecole Polytechnique Paris) and the USA (Emory), in 2014 he set up the group on Mathematical Modeling in Biosciences at BCAM (the Basque Center for Applied Mathematics in Bilbao), which he led until February 2020, when he joined Johannes Kepler University Linz. He is currently the head of the Institute for Mathematical Methods in Medicine and Data-Based Modeling at JKU and group leader at the Johann Radon Institute for Computational and Applied Mathematics (RICAM) of the Austrian Academy of Sciences. An expert in biomedical modeling and simulation, he seeks quantitative answers to clinical problems, with the aim of providing medical doctors with innovative simulation tools that can be used efficiently for in silico pathology assessment and to support clinical decision making.


    In recent decades, the ability to simulate complex problems has popularised the use of computational models to support the work of medical doctors and life scientists. For example, one of the aims of spatial ecology is to help public health authorities and environmental conservation agencies make better-informed decisions when identifying, monitoring and countering invasive dynamics, be it an infectious disease in wildlife or the spread of an exogenous species. An accurate computational model can be an efficient predictive tool on which to build a proper intervention strategy for the challenge at hand.

    In this direction, it is well recognized that quantifying uncertainty is essential for computational predictions to have any real value (as highlighted by the 2014 FDA guidance on the use of computational simulation). For example, incorrectly assuming perfect knowledge of the model parameters hinders the prediction of relevant Quantities of Interest (QoI) and may result in choosing erroneous intervention strategies. Uncertainty arises primarily from input variability (aleatory/irreducible uncertainty), such as the initial conditions, or from a lack of knowledge (epistemic/reducible uncertainty), such as the modeling assumptions or the influence of as yet unknown physical or biological phenomena.

    Moreover, problems from biomedicine and life science are extremely complex and challenging from a modeling viewpoint. Typically, they are characterised by marked heterogeneities and multi-scale dynamics in both space and time, and a reliable predictive mathematical model must cope soundly with these aspects. Unfortunately, more often than not, the data available for model calibration are very limited for a variety of reasons, especially in spatial ecology (scarce data, limited economic resources to collect them) or in the presence of a new, poorly known disease.

    In this talk I will present some studies we carried out in recent years on the spread of invasive species and infectious diseases, in which we quantify uncertainty in the presence of scarce data by combining differential equations (be they ordinary or partial) with Generalised Dynamic Linear Models (in a Bayesian framework) or Polynomial Chaos.
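
    Uncertainty quantification can be illustrated in its simplest form: plain Monte Carlo propagation of an uncertain parameter through a differential equation model. This is neither a Generalised Dynamic Linear Model nor Polynomial Chaos. The logistic growth model, the log-normal distribution of the growth rate r and all numbers are hypothetical choices for illustration.

```python
import math
import random
import statistics


def logistic(t, r, K, n0):
    """Closed-form solution of dN/dt = r N (1 - N / K) with N(0) = n0."""
    return K / (1 + (K - n0) / n0 * math.exp(-r * t))


def propagate_uncertainty(n_samples=10_000, t=5.0, K=1000.0, n0=10.0, seed=1):
    """Monte Carlo propagation of an uncertain growth rate r to the QoI N(t)."""
    rng = random.Random(seed)
    qoi = []
    for _ in range(n_samples):
        r = rng.lognormvariate(math.log(0.8), 0.2)   # hypothetical uncertainty about r
        qoi.append(logistic(t, r, K, n0))
    cuts = statistics.quantiles(qoi, n=40)           # 39 cut points: 2.5%, ..., 97.5%
    return statistics.fmean(qoi), cuts[0], cuts[-1]


mean, lower, upper = propagate_uncertainty()
print(f"N(5): mean {mean:.1f}, 95% band [{lower:.1f}, {upper:.1f}]")
```

    Reporting the 95% band rather than a single plug-in prediction makes the effect of parameter uncertainty on the Quantity of Interest, here N(5), visible. Polynomial Chaos aims to obtain similar information with far fewer model evaluations, which matters when each solve of a partial differential equation is expensive.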


  4. December 1st - Andrea Berghold, Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria: Randomization in Clinical Trials


    Randomization is a crucial component of an experimental design in general and clinical trials in particular. Using adequate randomization methods is therefore an important prerequisite in conducting a clinical trial. Many procedures have been proposed for the random assignment of participants to treatment groups in clinical trials. Various restricted randomization techniques such as permuted block design, biased coin design, urn design or big stick design as well as covariate-adaptive and response-adaptive randomization can be found in the literature. I will discuss the performance of different restricted randomization techniques regarding their treatment balance behavior and allocation randomness.
    However, it is not only important to have different techniques available but also to have suitable software to allow use of these techniques in practice. I will present a web-based randomization tool for multi-centre clinical studies (“Randomizer” – www.randomizer.at) which was developed by the Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria. This tool facilitates efficient management of the randomization process including allocation concealment, stratification, audit trails etc. and can also be used for simulation of different randomization designs.
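
    Two of the restricted randomization techniques listed above can be sketched in a few lines: the permuted block design and Efron's biased coin design. These are textbook versions for two arms, written as an illustration. They do not reflect how Randomizer implements them.

```python
import random


def permuted_block_sequence(n_blocks, block_size=4, arms=("A", "B"), seed=None):
    """Permuted block design: each block contains every arm equally often, shuffled."""
    if block_size % len(arms):
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)
    seq = []
    for _ in range(n_blocks):
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        seq.extend(block)
    return seq


def efron_biased_coin(n, p=2 / 3, seed=None):
    """Efron's biased coin: favour the under-represented arm with probability p."""
    rng = random.Random(seed)
    seq, diff = [], 0                      # diff = (#A - #B) so far
    for _ in range(n):
        if diff == 0:
            prob_a = 0.5
        elif diff < 0:
            prob_a = p                     # A is behind, so favour A
        else:
            prob_a = 1 - p                 # A is ahead, so favour B
        arm = "A" if rng.random() < prob_a else "B"
        diff += 1 if arm == "A" else -1
        seq.append(arm)
    return seq


print("".join(permuted_block_sequence(5, seed=7)))
print("".join(efron_biased_coin(20, seed=7)))
```

    The sketch also shows the trade-off between balance and allocation randomness. Permuted blocks guarantee balance at the end of each block, but once the block size is known, the last assignment in a block is fully predictable. The biased coin keeps every assignment random at the cost of only approximate balance.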

  5. October 20th - 15:30 - 17:00 - S2 Z74 - Dr. Alejandra Avalos Pacheco:

    “Multi-study Factor Regression Models for Large Complex Data with Applications to Nutritional Epidemiology and Cancer Genomics”


    Integrating data from multiple studies can be key to gaining knowledge in statistical research. However, such data present both biological and artifactual sources of variation, also known as covariate effects. Covariate effects can be complex, leading to systematic biases. In this talk I will present novel sparse latent factor regression (FR) and multi-study factor regression (MSFR) models to integrate such heterogeneous data. The FR model provides a tool for data exploration via dimensionality reduction and sparse low-rank covariance estimation while correcting for a range of covariate effects. MSFR models extend FR and enable us to jointly obtain a covariance structure that models the group-specific covariances in addition to the common component, learning covariate effects from the observed variables, such as demographic information. I will discuss the use of several sparse priors (local and non-local) to learn the dimension of the latent factors. Our approach provides a flexible methodology for sparse factor regression which is not limited to data with covariate effects. I will present several examples, with a focus on bioinformatics applications. We show the usefulness of our methods in two main tasks: (1) giving a visual representation of the latent factors of the data, i.e., an unsupervised dimension reduction task, and (2) providing (i) a supervised survival analysis of the cancer genomic data, using the factors obtained with our method as predictors, and (ii) a dietary pattern analysis, associating each factor with a measure of overall diet quality related to cardiometabolic disease risk in a Hispanic community health nutritional-data study.
    Our results show an increase in the accuracy of the dimensionality reduction, with non-local priors substantially improving the reconstruction of factor cardinality. The results of our analyses illustrate how failing to properly account for covariate effects can result in unreliable inference.

Summer Term 2022

  1. March 17th, 15:30; S2 Z74, Science Park 2 - Stefan Rass, Chair of Secure Systems, LIT Secure and Correct Systems Lab, JKU: On Privacy in Machine Learning by Plausible Deniability

    Abstract: When a machine learning model is trained from data, the data may be subject to security requirements and even be classified as sensitive. If the trained model is intended for use by untrusted parties, this raises the question of how much information about the training data can be extracted from the machine learning model once it is given away. The talk presents two results in this regard, based on the security notion of plausible deniability. First, we show that a model of finite size will retain a nonzero residual entropy if the training data has a size beyond a (model-dependent) threshold. Second, we show that for a certain class of models, and any artificially chosen training data, we can craft a topological norm that gives an error metric under which the training recovers exactly the given model. The order of quantifiers is what enables plausible deniability here, since we can, for any given model, claim that it arose from an arbitrary training set that can have any distribution and can be completely unrelated to the original sensitive training data. We illustrate the method on examples from normal and logistic regression and some examples of neural networks and discuss the practical implications of these results.

  2. June 23rd - 14:00 - 15:15 - S2 Z74 - Liana Jacobi:

    “Posterior Manifolds over Hyperparameter Regions (and Joint Prior Parameter Dependence): Moving Beyond Localized Assessments of Prior Parameter Specifications in MCMC Inference”, joint with Andres Ramirez-Hassan, Jackson Kwok and Nhung Nghiem


    Prior parameter sensitivity has moved into the focus of prior robustness analysis in response to the increased use of Bayesian inference in applied work, in particular with the popularity of Markov chain Monte Carlo (MCMC) inference under conjugate priors. It is commonly investigated in terms of local or pointwise assessments, in the form of derivatives or multiple evaluations. As such it provides limited localized information about the impact of prior parameter specifications, with the scope further restricted due to analytical and computational complexities in most MCMC applications.

    This paper introduces an approach based on the geometry of posterior statistics over hyperparameter regions (posterior manifolds) that encompasses and expands upon two common localized strategies to obtain more information about prior parameter dependence. The proposed estimation strategy is based on multiple point evaluations with Gaussian processes, with efficient selection of evaluation points achieved via Active Learning, and is further complemented with derivative information from a recent Automatic Differentiation approach for MCMC output. The approach gives rise to formal measures that can quantify additional aspects of prior parameter dependence and uncover more complex dependencies across prior parameters that are particularly relevant in practical applications, which often involve setting many location and precision parameters. The real data example investigates the impact of joint changes in prior demand parameter specifications on elasticity inference under a common multivariate demand framework for five main groups of goods, using data from a recent virtual supermarket experiment. We identify and estimate sensitivity manifolds for the three most sensitive (cross-)price and expenditure elasticities and show how conclusions regarding substitutionary versus complementary relationships as well as price sensitivity characteristics (normal versus inferior goods, elastic versus inelastic) can change across the prior parameter space.

  3. May 19th - Paul Hofmarcher, Department of Economics, Paris-Lodron-University Salzburg: Gaining Insights on US Senate Speeches Using a Time Varying Text Based Ideal Point Model


    Estimating the political positions of lawmakers has a long tradition in political science, and lawmakers’ votes are usually used to quantify their political positions. But lawmakers also give speeches or press statements. In this work we present a time-varying text-based ideal point model (TV-TBIP), which allows us to study the political positions of lawmakers in a completely unsupervised way. In doing so, our model combines the class of topic models with ideal point models in a time-dynamic setting.

    Our model is inspired by the idea of political framing, so that specific words or terms used when discussing a topic can convey political messages.

    Our model provides two kinds of insight: first, it detects how the political discussion of certain topics has changed over time, and second, it estimates the ideological positions of lawmakers at the party level. Using only the texts of Senate speeches, our model places US senators along an interpretable progressive-to-moderate spectrum.

    We apply our model to nearly 40 years of US Senate debates, from 1981 to 2017.

  4. May 12th - 16:30 - 18:00 - S3 047 - Prof. Dr. Werner Brannath, University of Bremen: A liberal type I error rate for studies in precision medicine
    (joint work with Charlie Hillner and Kornelius Rohmeyer)


    We introduce a new multiple type I error criterion for clinical trials with multiple populations. Such trials are of interest in precision medicine, where the goal is to develop treatments that are targeted to specific sub-populations defined by genetic and/or clinical biomarkers. The new criterion is based on the observation that not all type I errors are relevant to all patients in the overall population. If disjoint sub-populations are considered, no multiplicity adjustment appears necessary, since a claim in one sub-population does not affect patients in the other ones. For intersecting sub-populations, we suggest controlling the average multiple type I error rate, i.e., the probability that a randomly selected patient will be exposed to an inefficient treatment. We call this the population-wise error rate, illustrate it with a number of examples and show how to control it by adjusting critical boundaries or p-values. We furthermore define corresponding simultaneous confidence intervals. Finally, we illustrate the power gain achieved by passing from family-wise to population-wise error rate control with two simple examples and a recently suggested multiple testing approach for umbrella trials.
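
    For two intersecting populations P1 and P2, the overall population splits into three disjoint strata: only P1, only P2, and both. The population-wise error rate weights, by each stratum's share, the probability of rejecting at least one true null hypothesis relevant to that stratum. The sketch below computes a common one-sided critical value that controls this rate. It makes the simplifying assumption of independent test statistics, which is conservative for the positively correlated statistics that overlapping populations produce. The population shares are hypothetical, and this is not the authors' software.

```python
from statistics import NormalDist


def pwer(p, pi_only1, pi_only2, pi_both):
    """Population-wise error rate under the global null for two one-sided tests,
    each rejecting with probability p, assuming independent test statistics.

    pi_only1, pi_only2, pi_both: shares of the overall population that belong
    only to P1, only to P2, and to both populations.
    """
    at_least_one = 1 - (1 - p) ** 2        # either hypothesis wrongly rejected
    return pi_only1 * p + pi_only2 * p + pi_both * at_least_one


def pwer_critical_value(alpha, pi_only1, pi_only2, pi_both, tol=1e-12):
    """Common z critical value c with PWER = alpha, via bisection on p (PWER increases in p)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pwer(mid, pi_only1, pi_only2, pi_both) > alpha:
            hi = mid
        else:
            lo = mid
    return NormalDist().inv_cdf(1 - lo)


c = pwer_critical_value(0.025, pi_only1=0.3, pi_only2=0.3, pi_both=0.4)
print(round(c, 3))
```

    With the shares used here (30% only in P1, 30% only in P2, 40% in both), the resulting critical value lies between the unadjusted one-sided value of about 1.96 and the Bonferroni value of about 2.24 for two tests. This illustrates the power gain over family-wise control.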

Winter Term 2021/22

  1. November 4th, 15:30 - Florian Meinfelder, Otto-Friedrich-Universität Bamberg: Propensity Score Matching and Statistical Matching


    For a binary treatment variable, the potential outcome framework generates a missing data pattern that resembles a data fusion situation, in which two different data sources are stacked. The similarity arises because either the outcome under treatment or the outcome under control is observed, but for obvious reasons never both. The classical approach under the Rubin Causal Model is to use a nearest neighbor technique called Propensity Score Matching (PSM) to estimate the average treatment effect on the treated (ATET). Data fusion is also referred to as ‘Statistical Matching’, and nearest neighbor matching techniques have been a popular choice for data fusion problems as well, since statistical twins are identified on an individual basis. Recently, publications have appeared in which the causal inference method PSM was applied to data fusion problems. In this talk we investigate under which circumstances PSM can be a viable method for a data fusion scenario.


  2. October 21st, 15:30; MT 128, Science Park 1 - Petr Mazouch, Prague University of Economics and Business | VŠE: Data Quality in Economic and Demographic Statistics

    Abstract: Statisticians use data from different sources to build statistical models, carry out analyses and construct forecasts. Economic agents (companies, government and households) base their decisions on these results, and better models lead to better decisions. A critical prerequisite for good and useful statistical models is high-quality input, i.e. high-quality statistical data. Regardless of the type of data source, the quality requirements for statistical data are the same.

    The first part of the presentation introduces data quality requirements and discusses how well they are met, using several examples from social and economic statistics (labour statistics, SILC, household budget survey, national accounts). The second part turns to demographic issues, with particular emphasis on COVID-19 statistics: how does the pressure for timely data affect other aspects of COVID-19 data quality? The final part uses a Bayesian approach to assess the relevance of COVID-19 data. Joint presentation with Jakub Fischer and Tomáš Karel.

  3. November 11th, 15:30 - Ulrike Schneider, TU Wien: The Geometry of Model Selection and Uniqueness of Lasso-Type Methods

    Abstract: We consider estimation methods for high-dimensional regression models, such as the Lasso and SLOPE, that are defined as solutions to a penalized optimization problem. The geometric object relevant for our investigation is the polytope that is dual to the unit ball of the penalizing norm. We show that which models are accessible by such a procedure depends on which faces of the polytope are intersected by the row span of the regressor matrix. Moreover, these geometric considerations allow us to derive a criterion for the uniqueness of the estimator that is both necessary and sufficient. We illustrate this approach for Lasso and SLOPE with the unit cube and the sign permutahedron as the relevant polytopes. Joint work with Patrick Tardivel (Université Bourgogne).
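
The matching step mentioned in the Meinfelder abstract can be illustrated with a minimal sketch of 1:1 nearest neighbor propensity score matching with replacement for the ATET. It assumes the propensity scores have already been estimated (commonly with a logistic regression of treatment on covariates) and deliberately ignores calipers, covariate balance checks and variance estimation. The helper `att_nearest_neighbor` and the toy data are hypothetical.

```python
from statistics import mean


def att_nearest_neighbor(ps, treated, y):
    """Estimate the ATET by 1:1 nearest neighbor propensity score matching
    with replacement (hypothetical helper; no caliper, ties broken by order).

    ps: propensity scores, assumed to be estimated beforehand
    treated: 0/1 treatment indicators
    y: observed outcomes
    """
    controls = [i for i, t in enumerate(treated) if t == 0]
    treated_units = [i for i, t in enumerate(treated) if t == 1]
    if not controls or not treated_units:
        raise ValueError("need at least one treated and one control unit")
    effects = []
    for i in treated_units:
        # The closest control on the propensity score acts as the "statistical twin"
        # whose observed outcome imputes the treated unit's missing control outcome.
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        effects.append(y[i] - y[j])
    return mean(effects)


if __name__ == "__main__":
    ps = [0.2, 0.8, 0.25, 0.75]
    treated = [1, 1, 0, 0]
    y = [5.0, 9.0, 3.0, 6.0]
    print(att_nearest_neighbor(ps, treated, y))  # 2.5
```

Here the donor's observed outcome fills the gap in the missing data pattern described in the abstract; in a data fusion setting the same nearest neighbor step would instead copy variables from a donor file to a recipient file.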

Summer Term 2021

  1. Online talk - May 20th, 17:15 - Ulrike Held, Department of Biostatistics, University of Zurich: Matching on treatment in observational research - what is the role of the matching algorithm?

    Link to the Abstract

    Zoom link to the online talks

    ORCID account

    Website of Ulrike Held


  2. Online talk - April 22nd, 17:15 - Dr. Klaus Nordhausen, University of Jyväskylä, Finland: Blind source separation for multivariate spatial data

    Zoom link to the online talks


    Blind source separation has a long tradition for iid data and multivariate time series. Blind source separation methods for multivariate spatial observations, however, have so far received little attention in the literature. We therefore suggest a blind source separation model for spatial data and show how the latent components can be estimated using two or more scatter matrices. The statistical properties and merits of these estimators are derived and verified in simulation studies. A real data example illustrates the method.


  3. Online talk - March 25th - Dr. Matt Sutton, QUT, Australia: Reversible Jump PDMP Samplers for Variable Selection

    Zoom link to the online talks

    Abstract: A new class of Markov chain Monte Carlo (MCMC) algorithms, based on simulating piecewise deterministic Markov processes (PDMPs), has recently shown great promise: they are non-reversible, can mix better than standard MCMC algorithms, and can use subsampling ideas to speed up computation in big data scenarios. However, current PDMP samplers can only sample from posterior densities that are differentiable almost everywhere, which precludes their use for model choice. Motivated by variable selection problems, we show how to develop reversible jump PDMP samplers that can jointly explore the discrete space of models and the continuous space of parameters. Our framework is general: it takes any existing PDMP sampler and adds two types of trans-dimensional moves that allow for the addition or removal of a variable from the model. We show how the rates of these trans-dimensional moves can be calculated so that the sampler has the correct invariant distribution. Simulations show that the new samplers can mix better than standard MCMC algorithms. Our empirical results show that they are also more efficient than gradient-based samplers that avoid model choice through the use of continuous spike-and-slab priors, which replace a point mass at zero for each parameter with a density concentrated around zero.

Winter Term 2020/21

  1. Online talk - January 28th - Ulrike Schneider, TU Wien

    Zoom link to the online talks

  2. Online talk - January 21st - Lisa Ehrlinger & Florian Sobieczky, Software Competence Center Hagenberg: A rendezvous in data science: machine learning meets statistics

    Zoom link to the online talks


    The talk covers several typical challenges from “Data Science” arising in research projects at the Software Competence Center Hagenberg (SCCH). Both classical statistics and modern, complex machine learning methods such as neural networks are applied to real-world use cases from industry.

    The first part briefly presents SCCH as an institution for applied research, which should be of particular interest to students looking for a master's or PhD thesis on practical problems.

    The second part summarizes various projects involving real-world data, with a focus on recurring statistical problems in manufacturing scenarios. In particular, machine learning methods for anomaly detection, diagnosis and prediction are discussed, with some attention paid to the black-box stigma of typical modern machine learning methods. The presentation aims to identify classical methods and open research questions from statistics that are relevant to SCCH’s approach to predictive maintenance.

    *SCCH – Software Competence Center Hagenberg
    **FAW - Institute for Application-oriented Knowledge Processing of JKU

  3. Online talk - November 19th - Zsolt Lavicza & Martin Andre, Johannes Kepler University in Linz & Universität Innsbruck: Technology changing statistics education: Defining possibilities, opportunities and obligations.

    Slides of the talk


    In our talk, we will outline some educational research activities within the Linz School of Education related to technology developments and statistics education. Afterwards, we will discuss our work on introducing statistics concepts in schools and on how statistics teaching can be connected to sustainable development using real data with school students. In particular, we will discuss why statistics is becoming crucial in our current data-driven society for exploring numerous phenomena that are too complex to comprehend without exploring and visualising data. Citizens need to understand statistics about issues concerning essential parts of their lives, such as the spread of a pandemic or climate change, in order to participate responsibly in the prosperous development of our civilization. In our research projects we try to learn more about young students’ intuitive approaches to statistics when they analyse data visually. We found that certain kinds of data visualisations are especially capable of provoking reasoning about statistical concepts such as centre, spread and covariation. Building on these intuitive visual approaches to statistics, another aspect of our design-based research projects is concerned with statistical modelling processes. We developed a learning trajectory in which middle school students analysed real-world data to explore the sustainable development of various countries and to build a model of this phenomenon. The results show that students’ statistical investigative learning processes should feature active participation in constructing knowledge of formal statistical concepts, and that students should adapt and fit their intuitive knowledge to formal concepts using methods of visual data analysis. We will outline diverse opportunities to foster students’ intuitive understanding of statistics and of sustainable development issues simultaneously.

    Zoom link to the online talks


  4. Online talk - November 12th - Irene Tubikanec, Johannes Kepler University in Linz: Approximate Bayesian computation for stochastic differential equations with an invariant distribution

    Slides of the talk

    Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. At the same time, stochastic differential equations (SDEs) have become an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise. First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to base the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure, and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are preserved in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned using standard numerical methods (Euler–Maruyama discretization) fails. The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterized by an invariant distribution and for which a measure-preserving numerical method can be derived.

    Zoom link to the online talk

  5. Online talk - November 5th - Alex Kowarik, Statistik Austria: COVID-19 Prevalence Study - Was the Sample Large Enough? 3,000 Martians, Results and More

    In November, a sample survey to determine the COVID-19 prevalence will be carried out for the third time. The lecture is intended to shed light on the methodological aspects of sampling, weighting and error calculation of these surveys.

    Slides of the talk

    For login details for this online event, please contact Milan Stehlik
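
As a point of reference for the Tubikanec abstract, the sketch below shows plain rejection ABC for a hypothetical Ornstein–Uhlenbeck model dX = -θX dt + σ dW, with θ treated as known and σ to be inferred. Two ideas from the abstract appear only in a much simplified form: the summary statistic targets the invariant distribution (the sample variance of a path estimates the invariant variance σ²/(2θ)), and synthetic paths are generated with the exact OU transition instead of an Euler–Maruyama discretization. The model, the uniform prior, the tolerance `eps` and the function names are assumptions for illustration; the speaker's density-based summaries, spectral densities and splitting schemes are not reproduced.

```python
import math
import random


def path_variance(xs):
    """Population variance of a path (fast plain-float version)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def simulate_ou(theta, sigma, n, dt, rng):
    """Exact simulation of dX = -theta * X dt + sigma dW at n grid points spaced dt apart."""
    a = math.exp(-theta * dt)
    step_sd = sigma * math.sqrt((1.0 - a * a) / (2.0 * theta))
    # Start in the invariant distribution N(0, sigma^2 / (2 * theta)).
    x = rng.gauss(0.0, sigma / math.sqrt(2.0 * theta))
    path = [x]
    for _ in range(n - 1):
        x = a * x + step_sd * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def abc_rejection(observed, theta, n_draws, eps, rng, dt=0.1, prior=(0.1, 3.0)):
    """Rejection ABC for sigma, with theta treated as known.

    Summary statistic: the variance of a path, which estimates the invariant
    variance sigma^2 / (2 * theta). The uniform prior bounds are assumptions.
    """
    s_obs = path_variance(observed)
    accepted = []
    for _ in range(n_draws):
        sigma = rng.uniform(*prior)
        synthetic = simulate_ou(theta, sigma, len(observed), dt, rng)
        # Keep the draw if its summary lies within eps of the observed summary.
        if abs(path_variance(synthetic) - s_obs) < eps:
            accepted.append(sigma)
    return accepted


if __name__ == "__main__":
    rng = random.Random(0)
    observed = simulate_ou(theta=1.0, sigma=1.0, n=1000, dt=0.1, rng=rng)
    posterior = abc_rejection(observed, theta=1.0, n_draws=1000, eps=0.05, rng=rng)
    if posterior:
        print(len(posterior), sum(posterior) / len(posterior))
```

Because the sample variance averages over the whole trajectory, it varies far less between simulations under the same σ than the raw paths do, which is the motivation the abstract gives for summaries built on the invariant distribution.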

Summer Term 2020

  1. Online talk - June 18th - Torsten Hothorn, University of Zurich, Switzerland: Understanding and Applying Transformation Models

    [Abstract]

    For login details for this online event, please contact Markus Hainy

Winter Term 2019/20

  1. 23. January 2020

    Peter Filzmoser, Technische Universität Wien

    Robust and sparse k-means clustering in high dimension

    [abstract]

  2. 05. December 2019

    Hao Wang, Jilin University, Changchun

    Dependence structure between Chinese Shanghai and Shenzhen stock market based on copulas and cluster analysis

    [abstract]

  3. 28. November 2019

    Haipeng Li, CAS-MPG, Shanghai

    Supervised learning for analyzing large-scale genome-wide DNA polymorphism data

    [abstract]


  4. 07. November 2019

    Günter Pilz, Johannes Kepler Universität Linz

    Statistik ist ein Segen für die Menschheit (Statistics Is a Blessing for Humanity)

    [abstract]

    [talk]


  5. 31. October 2019

    Martin Wolfsegger, Takeda Pharmaceutical Company Ltd.

    Some likely useful thoughts on prescription drug-use-related software supporting personalized dosing regimen

    Alexander Bauer, Takeda Pharmaceutical Company Ltd.

    Evaluation of drug combinations

    [abstract]



  6. 10. October 2019

    Leonardo Grilli, University of Florence

    Multiple imputation and selection of predictors in multilevel models for analysing the relationship between student ratings and teacher beliefs and practices

    [abstract]


Summer Term 2019

  1. 23. May 2019
    Siegfried Hörmann, TU Graz, Austria: ANOVA for functional time series data: when there is dependence between groups

    [abstract]


  2. 9. May 2019
    Markus Hainy, Johannes Kepler Universität Linz: Optimal Bayesian design for models with intractable likelihoods via supervised learning

    [abstract]


  3. 11. April 2019
    Dominik Schrempf, Eötvös Loránd University in Budapest, Hungary: Phylogenetic incongruences - opportunities to improve the reconstruction of a dated tree of life

    [abstract]

  4. 4. April 2019
    Antony Overstall, University of Southampton, UK: Bayesian design for physical models using computer experiments

    [abstract]

  5. 14. March 2019
    Florian Frommlet, Medical University Vienna, Austria: Deep Bayesian Regression

    [abstract]


  6. 14. March 2019 (note: starts at 13:45)
    Thomas Petzoldt, TU Dresden, Germany: Identification of distribution components from antibiotic resistance data - Opportunities and challenges

    [abstract]

Winter Term 2018/19

  1. 17. January 2019
    Harry Haupt, Universität Passau, Germany: Modeling spatial components for complexly associated urban data

    [abstract]


  2. 21. November 2018 (Attention, Wednesday 15:30, S3 048)
    Hirohisa Kishino, University of Tokyo, Japan: Bridging molecular evolution and phenotypic evolution

    [abstract]

  3. 15. November 2018
    Helmut Küchenhoff, Ludwig-Maximilians-Universität München: The analysis of voter transitions in the Bavarian state election 2018 using data from different sources: a teaching research project conducted by three Bavarian universities

    [abstract]

    [Slides]

  4. 8. November 2018
    Efstathia Bura, TU Wien: Least Squares and ML Estimation Approaches of the Sufficient Reduction for Matrix Valued Predictors

    [abstract]

  5. 25. October 2018
    Erindi Allaj: Volatility measurement in presence of high-frequency data

    [abstract]

  6. 11. October 2018
    David Gabauer, JKU Linz: ‘To Be or Not to Be’ a Member of an Optimum Currency Area?

    [abstract]

Summer Term 2018

  1. 28. June 2018
    Gangaram S. Ladde, University of South Florida, USA: Energy/Lyapunov Function Method and Stochastic Mathematical Finance

    [abstract]

  2. 24. May 2018
    Pavlina Jordanova, University of Shumen, Bulgaria: On “multivariate” modifications of the Cramér–Lundberg risk model

  3. 26. April 2018
    Juan M. Rodríguez-Díaz, Universidad de Salamanca, Spain: Design optimality in multiresponse models with double covariance structure

  4. 24. May 2018
    Carsten Wiuf, University of Copenhagen, Denmark: A simple method to aggregate p-values without a priori grouping

  5. 15. March 2018
    Andreas Mayr, Friedrich-Alexander-University Erlangen-Nürnberg, Germany: An introduction to boosting distributional regression

  6. 19. April 2018
    Robert Breitenecker, Johannes Kepler University Linz: Spatial Heterogeneity in Entrepreneurship Research: An application of Geographically Weighted Regression

Winter Term 2017/18

  1. 25. January 2018
    Thomas Kneib, Georg-August-Universität Göttingen: A Lego System for Building Structured Additive Distributional Regression Models with Tensor Product Interactions

  2. 7. December 2017
    Franz König, Medizinische Universität Wien: Optimal rejection regions for multi-arm clinical trials

    [abstract]

  3. 9. November 2017
    Henrique Teotonio, Institut de Biologie de l'École Normale Supérieure, Paris: Inferring natural selection and genetic drift in evolution experiments

  4. 19. October 2017
    Lenka Filová, Comenius University in Bratislava: Optimal Design of Experiments in R

  5. 12. October 2017
    Elisa Perrone, Massachusetts Institute of Technology, Cambridge, MA (USA): Discrete copulas for weather forecasting: theoretical and practical aspects

    [abstract]