Institute of Applied Statistics
Research Seminar

We kindly welcome all interested people to participate in our research seminar.
Univ.-Prof. Mag. Dr. Andreas Futschik & Univ.-Prof. Mag. Dr. Werner G. Müller

Time

Thursdays from 15:30 until 17:00

Location

Science Park 2, Intermediate Storey, Z74.

Winter Term 2022/23

  1. December 1st - Andrea Berghold, Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria: Randomization in Clinical Trials

    Abstract:

    Randomization is a crucial component of an experimental design in general and clinical trials in particular. Using adequate randomization methods is therefore an important prerequisite in conducting a clinical trial. Many procedures have been proposed for the random assignment of participants to treatment groups in clinical trials. Various restricted randomization techniques such as permuted block design, biased coin design, urn design or big stick design as well as covariate-adaptive and response-adaptive randomization can be found in the literature. I will discuss the performance of different restricted randomization techniques regarding their treatment balance behavior and allocation randomness.
    However, it is not only important to have different techniques available but also to have suitable software to allow use of these techniques in practice. I will present a web-based randomization tool for multi-centre clinical studies (“Randomizer” – www.randomizer.at) which was developed by the Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria. This tool facilitates efficient management of the randomization process including allocation concealment, stratification, audit trails etc. and can also be used for simulation of different randomization designs.

  2. January 12th - Luca Gerardo-Giorda, JKU Linz

  3. January 19th - Sebastian Fuchs, Universität Salzburg: Using dimension reduction for quantifying and estimating predictability and explainability in regression analysis

  4. January 26th - Ritabrata Dutta, University of Warwick, UK
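
As a reading aid for the permuted block design named in the abstract of the talk "Randomization in Clinical Trials" above, the following is a minimal sketch of how such an allocation list could be generated. It is not the algorithm used by the "Randomizer" tool; the function name `permuted_block_allocation`, the arm labels, and the block size are assumptions chosen for illustration.

```python
import random


def permuted_block_allocation(n_participants, block_size=4, arms=("A", "B"), seed=None):
    """Assign participants to arms using permuted blocks of a fixed size.

    Hypothetical helper for illustration: each block contains every arm
    equally often, and the order inside each block is shuffled.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    allocation = []
    while len(allocation) < n_participants:
        # Build one balanced block, then randomize its order.
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)
        allocation.extend(block)
    # The last block may be cut short if n_participants is not a multiple of block_size.
    return allocation[:n_participants]


schedule = permuted_block_allocation(12, block_size=4, seed=1)
print(schedule)
```

Within every complete block the arms are exactly balanced, which keeps the treatment imbalance bounded throughout recruitment at the cost of some predictability near the end of each block.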

Previous Talks

Winter Term 2022/23

  1. October 20th - 15:30 - 17:00 - S2 Z74 - Dr. Alejandra Avalos Pacheco:

    “Multi-study Factor Regression Models for Large Complex Data with Applications to Nutritional Epidemiology and Cancer Genomics”

    Abstract:

    Data integration of multiple studies can be key to understanding and gaining knowledge in statistical research. However, such data present both biological and artifactual sources of variation, also known as covariate effects. Covariate effects can be complex, leading to systematic biases. In this talk I will present novel sparse latent factor regression (FR) and multi-study factor regression (MSFR) models to integrate such heterogeneous data. The FR model provides a tool for data exploration via dimensionality reduction and sparse low-rank covariance estimation while correcting for a range of covariate effects. MSFR models are extensions of FR that enable us to jointly obtain a covariance structure that models the group-specific covariances in addition to the common component, learning covariate effects from the observed variables, such as the demographic information. I will discuss the use of several sparse priors (local and non-local) to learn the dimension of the latent factors. Our approach provides a flexible methodology for sparse factor regression which is not limited to data with covariate effects. I will present several examples, with a focus on bioinformatics applications. We show the usefulness of our methods in two main tasks: (1) to give a visual representation of the latent factors of the data, i.e. an unsupervised dimension reduction task and (2) to provide (i) a supervised survival analysis, using the factors obtained in our method as predictions for the cancer genomic data; and (ii) a dietary pattern analysis, associating each factor with a measure of overall diet quality related to cardiometabolic disease risk for a Hispanic community health nutritional-data study.
    Our results show an increase in the accuracy of the dimensionality reduction, with non-local priors substantially improving the reconstruction of factor cardinality. The results of our analyses illustrate how failing to properly account for covariate effects can result in unreliable inference.

Summer Term 2022

  1. March 17th, 15:30; S2 Z74, Science Park 2 - Stefan Rass, Lehrstuhl Secure Systems LIT Secure and Correct Systems Lab, JKU: On Privacy in Machine Learning by Plausible Deniability

    link to the slides of the talk ..., opens a file

    Abstract: When a machine learning model is trained from data, the data may be subject to security requirements and even be classified as sensitive. If the trained model is intended for use by untrusted parties, this raises the question of how much information about the training data is extractable from the machine learning model, once it is given away. The talk presents two results in this regard, based on the security notion of plausible deniability. We show that a model of finite size will retain a nonzero residual entropy if the training data has a size beyond a (model-dependent) threshold. Second, we show that for a certain class of models, and any artificially chosen training data, we can craft a topological norm that gives an error metric under which the training recovers exactly the given model. The order of quantifiers is what enables plausible deniability here, since we can, for any given model, claim this to have arisen from an arbitrary training set that can have any distribution and can be completely unrelated to the original sensitive training data. We illustrate the method on examples from normal and logistic regression and some examples of neural networks and discuss the practical implications of these results.

  2. June 23rd - 14:00 - 15:15 - S2 Z74 - Liana Jacobi:

    “Posterior Manifolds over Hyperparameter Regions (and Joint Prior Parameter Dependence): Moving Beyond Localized Assessments of Prior Parameter Specifications in MCMC Inference”, joint with Andres Ramirez-Hassan, Jackson Kwok and Nhung Nghiem

    Abstract:

    Prior parameter sensitivity has moved into the focus of prior robustness analysis in response to the increased use of Bayesian inference in applied work, in particular with the popularity of Markov chain Monte Carlo (MCMC) inference under conjugate priors. It is commonly investigated in terms of local or pointwise assessments, in the form of derivatives or multiple evaluations. As such it provides limited localized information about the impact of prior parameter specifications, with the scope further restricted due to analytical and computational complexities in most MCMC applications.

    This paper introduces an approach based on the geometry of posterior statistics over hyperparameter regions (posterior manifolds) that encompasses and expands upon two common localized strategies to obtain more information about prior parameter dependence. The proposed estimation strategy is based on multiple point evaluations with Gaussian processes, with efficient selection of evaluation points achieved via Active Learning, which is further complemented with derivative information via a recent Automatic Differentiation approach for MCMC output. The approach gives rise to formal measures that can quantify additional aspects of prior parameter dependence and uncover more complex dependencies across prior parameters that are particularly relevant in practical applications, which often involve the setting of many location and precision parameters. The real data example investigates the impact of joint changes in prior demand parameter specifications on elasticity inference under a common multivariate demand framework for 5 main good groups using data from a recent virtual supermarket experiment. We identify and estimate sensitivity manifolds for the three most sensitive (cross-)price and expenditure elasticities and show how conclusions regarding substitutionary versus complementary relationships as well as price sensitivity characteristics (normal versus inferior goods, elastic vs inelastic) can change across the prior parameter space.

  3. May 19th - Paul Hofmarcher, Department of Economics, Paris-Lodron-University Salzburg: Gaining Insights on US Senate Speeches Using a Time Varying Text Based Ideal Point Model

    Abstract:

    Estimating political positions of lawmakers has a long tradition in political science and usually lawmakers’ votes are used to quantify their political positions. But lawmakers also give speeches or press statements. In this work we present a time varying text based ideal point model (TV-TBIP) which allows us to study political positions of lawmakers in a completely unsupervised way. In doing so, our model combines the class of topic models with ideal point models in a time-dynamic setting.

    Our model is inspired by the idea of political framing, so that specific words or terms used when discussing a topic can convey political messages.

    The insights of our model are twofold: Firstly, it allows us to detect how political discussion of certain topics has changed over time, and secondly it estimates ideological positions of lawmakers on a party level. Using only the texts of Senate speeches, our model identifies US senators along an interpretable progressive-to-moderate spectrum.

    We apply our model to nearly 40 years of US Senate house discussions between 1981 and 2017.

  4. May 12th - 16:30 - 18:00 - S3 047 - Prof. Dr. Werner Brannath, University Bremen: A liberal type I error rate for studies in precision medicine
    (joint work with Charlie Hillner and Kornelius Rohmeyer)

    Abstract:

    We introduce a new multiple type I error criterion for clinical trials with multiple populations. Such trials are of interest in precision medicine where the goal is to develop treatments that are targeted to specific sub-populations defined by genetic and/or clinical biomarkers. The new criterion is based on the observation that not all type I errors are relevant to all patients in the overall population. If disjoint sub-populations are considered, no multiplicity adjustment appears necessary, since a claim in one sub-population does not affect patients in the other ones. For intersecting sub-populations we suggest controlling the average multiple type I error rate, i.e. the probability that a randomly selected patient will be exposed to an inefficient treatment. We call this the population-wise error rate, exemplify it by a number of examples and illustrate how to control it with an adjustment of critical boundaries or adjusted p-values. We furthermore define corresponding simultaneous confidence intervals. We finally illustrate the power gain achieved by passing from family-wise to population-wise error rate control with two simple examples and a recently suggested multiple testing approach for umbrella trials.

Winter Term 2021/22

  1. November 4th, 15:30 - Florian Meinfelder, Otto-Friedrich-Universität Bamberg: Propensity Score Matching and Statistical Matching

    Abstract:

    The potential outcome framework generates for a binary treatment variable a missing data pattern that bears resemblance to a data fusion situation, where two different data sources are stacked. The reason for the similarity regarding the missing data pattern is that either outcome under treatment or outcome under control is observed (but never both for obvious reasons). The classical approach under the Rubin Causal Model is to use a nearest neighbor technique called Propensity Score Matching (PSM) to estimate the average treatment effect on the treated (ATET). Data fusion is also referred to as ‘Statistical Matching’, and nearest neighbor matching techniques have indeed been a popular choice for data fusion problems as well, since statistical twins are identified on an individual basis. Recently, publications emerged where the causal inference method PSM was applied to data fusion problems. Within this talk we will investigate under which circumstances PSM can be a viable method for a data fusion scenario.

     

  2. October 21st, 15:30; MT 128, Science Park 1 - Petr Mazouch, Prague University of Economics and Business | VŠE: Data Quality in Economic and Demographic Statistics

    Abstract: Statisticians use data from different data sources for building statistical models, computing analyses and constructing forecasts. Based on their results, economic subjects (companies, government and households) make decisions. Better models lead to better decisions. One of the critical assumptions of excellent and valuable statistical models is the high quality of inputs – statistical data. Regardless of the data source type, the requirements for the quality of statistical data are the same.

    The first part of the presentation introduces data quality requirements. It discusses how far these requirements are fulfilled, using several examples from different social and economic statistics (labour statistics, SILC, household budget survey, national accounts). The second part focuses on demographic issues with a particular emphasis on covid-19 statistics. How does the pressure on data timeliness influence other aspects of covid-19 data quality? The final part uses the Bayesian approach for the assessment of the covid-19 data relevance. Joint presentation with Jakub Fischer and Tomáš Karel.

  3. November 11th, 15:30 - Ulrike Schneider, TU Wien:  The Geometry of Model Selection and Uniqueness of Lasso-Type Methods

    Abstract: We consider estimation methods in the context of high-dimensional regression models such as the Lasso and SLOPE, defined as solutions to a penalized optimization problem. The geometric object relevant for our investigation is the polytope that is dual to the unit ball of the penalizing norm. We show that which models are accessible by such a procedure depends on which faces of the polytope are intersected by the row span of the regressor matrix. Moreover, these geometric considerations allow us to derive a criterion for the uniqueness of the estimator that is both necessary and sufficient. We illustrate this approach for Lasso and SLOPE with the unit cube and the sign permutahedron as relevant polytopes. Joint work with Patrick Tardivel (Université Bourgogne).
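
For readers unfamiliar with the two estimators named in the abstract of the talk by Ulrike Schneider, the penalized optimization problems are commonly written as below. The notation (response $y$, regressor matrix $X$, tuning parameters $\lambda$ and $\lambda_1, \dots, \lambda_p$) is an assumption made here for illustration and may differ from the notation used in the talk.

```latex
% Lasso: squared-error loss plus an l1 penalty with tuning parameter \lambda > 0
\hat\beta_{\mathrm{Lasso}} \in \arg\min_{\beta \in \mathbb{R}^p}
  \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1

% SLOPE: sorted-l1 penalty, where |\beta|_{(1)} \ge \dots \ge |\beta|_{(p)}
% are the absolute coefficients in decreasing order and
% \lambda_1 \ge \dots \ge \lambda_p \ge 0 with \lambda_1 > 0
\hat\beta_{\mathrm{SLOPE}} \in \arg\min_{\beta \in \mathbb{R}^p}
  \tfrac{1}{2}\,\lVert y - X\beta \rVert_2^2 + \sum_{j=1}^{p} \lambda_j \, |\beta|_{(j)}
```

Setting $\lambda_1 = \dots = \lambda_p = \lambda$ in the SLOPE penalty recovers the Lasso penalty, so the Lasso is a special case of SLOPE.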

Summer Term 2021

  1. Online-talk - May 20th, 17:15 - Ulrike Held, Department of Biostatistics, University of Zurich: Matching on treatment in observational research - what is the role of the matching algorithm?

    Link to the Abstract, opens a file


  2. Online-talk - April 22nd, 17:15 - Dr. Klaus Nordhausen, University of Jyväskylä, Finland: Blind source separation for multivariate spatial data

    Abstract:

    Blind source separation has a long tradition for iid data and multivariate time series. Blind source separation methods for multivariate spatial observations have, however, not yet received much attention in the literature. We therefore suggest a blind source separation model for spatial data and show how the latent components can be estimated using two or more scatter matrices. The statistical properties and merits of these estimators are derived and verified in simulation studies. A real data example illustrates the method.

     

  3. Online-talk - March 25th - Dr. Matt Sutton, QUT, Australia: Reversible Jump PDMP Samplers for Variable Selection

    Abstract: A new class of Markov chain Monte Carlo (MCMC) algorithms, based on simulating piecewise deterministic Markov processes (PDMPs), has recently shown great promise: they are non-reversible, can mix better than standard MCMC algorithms, and can use subsampling ideas to speed up computation in big data scenarios. However, current PDMP samplers can only sample from posterior densities that are differentiable almost everywhere, which precludes their use for model choice. Motivated by variable selection problems, we show how to develop reversible jump PDMP samplers that can jointly explore the discrete space of models and the continuous space of parameters. Our framework is general: it takes any existing PDMP sampler and adds two types of trans-dimensional moves that allow for the addition or removal of a variable from the model. We show how the rates of these trans-dimensional moves can be calculated so that the sampler has the correct invariant distribution. Simulations show that the new samplers can mix better than standard MCMC algorithms. Our empirical results show they are also more efficient than gradient-based samplers that avoid model choice through use of continuous spike-and-slab priors which replace a point mass at zero for each parameter with a density concentrated around zero.

Winter Term 2020/21

  1. Online-talk - January 28th - Ulrike Schneider, TU Wien

  2. Online-talk - January 21st - Lisa Ehrlinger & Florian Sobieczky, Software Competence Center Hagenberg: A rendezvous in data science: machine learning meets statistics

    Abstract:

    The talk covers several typical challenges from “Data Science” arising in research
    projects at the Software Competence Center Hagenberg (SCCH). Classical statistics
    as well as modern complex machine learning methods, such as neural networks, are
    applied to real-world use cases from industry.
    In the first part, a short presentation of SCCH as an institution for applied research
    is given, which is particularly interesting for students with an interest in a master or
    PhD thesis on practical problems.
    The second part is a summary of various projects involving real-world data with a
    focus on recurring statistical problems from manufacturing scenarios. In particular,
    methods related to anomaly detection, diagnosis and prediction using machine
    learning methods are discussed with some care given to the black-box stigma of typical
    modern machine learning methods. The presentation is intended to identify classical
    methods and open research questions from statistics relevant for approaches taken by
    SCCH’s strategy on predictive maintenance.
    *SCCH – Software Competence Center Hagenberg
    **FAW - Institute for Application-oriented Knowledge Processing at JKU

  3. Online-talk - November 19th - Zsolt Lavicza & Martin Andre, Johannes Kepler University in Linz & Universität Innsbruck: Technology changing statistics education: Defining possibilities, opportunities and obligations.

    Slides of the talk, opens a file

    Abstract:

    In our talk, we will outline some educational research activities within the Linz School of Education related to technology developments and statistics education. Afterwards, we will discuss our work on introducing statistics concepts in schools and how statistics teaching can be connected to sustainable development with real data for students in schools. In particular, we will discuss that statistics is becoming crucial in our current data-driven society to explore numerous phenomena that are too complex to comprehend without exploring and visualising data. Citizens need to understand statistics about issues concerning essential parts of their lives such as the spread of a pandemic or climate change in order to responsibly participate in a prosperous development of our civilization. With our research projects we try to find out more about young students’ intuitive approaches to statistics when visually analysing data. We found that certain kinds of data visualisations are especially capable of provoking reasoning about statistical concepts such as ideas of centre, spread and covariation. Based on these intuitive visual approaches to statistics, another aspect of our design-based research projects is concerned with statistical modelling processes. We developed a learning trajectory where middle school students were engaged in analysing real-world data to explore sustainable development of various countries and to build a model for this phenomenon. Results show that students’ statistical investigative learning processes should feature active participation in constructing knowledge of formal statistical concepts; and students should adopt and fit their intuitive knowledge to formal concepts using methods of visual data analyses. We will outline some diverse opportunities to foster students’ intuitive understanding of statistics and sustainable development issues simultaneously.


  4. Online-talk - November 12th - Irene Tubikanec, Johannes Kepler University in Linz: Approximate Bayesian computation for stochastic differential equations with an invariant distribution

    Slides of the talk, opens a file in a new window

    Abstract:
    Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed to be an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise. First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to build up the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned using standard numerical methods (Euler–Maruyama discretization) fails. 
    The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterized by an invariant distribution and for which a measure-preserving numerical method can be derived.

  5. Online-talk - November 5th - Alex Kowarik, Statistik Austria: COVID-19 Prevalence Study - Was the Sample Large Enough? 3,000 Martians, Results and More

    Abstract:
    In November, a sample survey to determine the COVID-19 prevalence will be carried out for the third time. The lecture is intended to shed light on the methodological aspects of sampling, weighting and error calculation of these surveys.

    Slides of the talk, opens a file

    For login-details of this online event please contact Milan Stehlik
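
As a reading aid for the talk on approximate Bayesian computation (ABC) by Irene Tubikanec listed above, the following is a minimal sketch of the basic ABC rejection idea: draw a parameter from the prior, simulate synthetic data, and keep the draw if a summary of the synthetic data is close to the summary of the observed data. It is a toy Gaussian example, not the property-based, measure-preserving method for SDEs described in the abstract; the function name `abc_rejection`, the prior, the summary statistic, and the tolerance are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary, tolerance, n_draws, seed=0):
    """Basic ABC rejection sampler (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)         # candidate parameter from the prior
        synthetic = simulator(theta, rng)  # synthetic data under that parameter
        # Keep the candidate only if the summaries are within the tolerance.
        if abs(summary(synthetic) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy setting (assumed): data ~ Normal(theta, 1), prior theta ~ Uniform(-5, 5),
# summary statistic = sample mean.
data_rng = random.Random(42)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]

posterior_draws = abc_rejection(
    observed,
    prior_sampler=lambda rng: rng.uniform(-5.0, 5.0),
    simulator=lambda theta, rng: [rng.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    tolerance=0.2,
    n_draws=5000,
)
print(len(posterior_draws), statistics.fmean(posterior_draws))
```

In this toy model the sample mean is an informative summary, so the accepted draws concentrate near the observed mean. The abstract addresses the harder case where the simulator is a stochastic process and both the summaries and the simulation scheme have to be chosen with care.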

Summer Term 2020

  1. Online-talk - June 18th - Torsten Hothorn, University of Zurich, Switzerland: Understanding and Applying Transformation Models

    [Abstract], opens a file

    For login-details of this online event please contact Markus Hainy

Winter Term 2019/20

  1. 23. January 2020

    Peter Filzmoser, Technische Universität Wien

    Robust and sparse k-means clustering in high dimension

    [abstract], opens a file

  2. 05. December 2019

    Hao Wang, Jilin University, Changchun

    Dependence structure between Chinese Shanghai and Shenzhen stock market based on copulas and cluster analysis

    [abstract], opens a file

  3. 28. November 2019

    Haipeng Li, CAS-MPG, Shanghai

    Supervised learning for analyzing large-scale genome-wide DNA polymorphism data

    [abstract], opens a file

     

  4. 07. November 2019

    Günter Pilz, Johannes Kepler Universität Linz

    Statistik ist ein Segen für die Menschheit (Statistics Is a Blessing for Humanity)

    [abstract], opens a file

    [talk], opens a file

     

  5. 31. October 2019

    Martin Wolfsegger, Takeda Pharmaceutical Company Ltd.

    Some likely useful thoughts on prescription drug-use-related software supporting personalized dosing regimen

    Alexander Bauer, Takeda Pharmaceutical Company Ltd.

    Evaluation of drug combinations

    [abstract], opens a file

     

     

  6. 10. October 2019

    Leonardo Grilli, University of Florence

    Multiple imputation and selection of predictors in multilevel models for analysing the relationship between student ratings and teacher beliefs and practices

    [abstract], opens a file

     

Summer Term 2019

  1. 23. May 2019
    Siegfried Hörmann, TU Graz, Austria: ANOVA for functional time series data: when there is dependence between groups

    [abstract], opens a file

     

  2. 9. May 2019
    Markus Hainy, Johannes Kepler Universität Linz: Optimal Bayesian design for models with intractable likelihoods via supervised learning methods

    [abstract], opens a file

     

  3. 11. April 2019
    Dominik Schrempf, Eötvös Loránd University in Budapest, Hungary: Phylogenetic incongruences - opportunities to improve the reconstruction of a dated tree of life

    [abstract], opens a file

  4. 4. April 2019
    Antony Overstall, University of Southampton, UK: Bayesian design for physical models using computer experiments

    [abstract], opens a file

  5. 14. March 2019
    Florian Frommlet, Medical University Vienna, Austria: Deep Bayesian Regression

    [abstract], opens a file

     

  6. 14. March 2019. Attention, Start: 13:45
    Thomas Petzoldt, TU Dresden, Germany: Identification of distribution components from antibiotic resistance data - Opportunities and challenges

    [abstract], opens a file

Winter Term 2018/19

  1. 17. January 2019
    Harry Haupt, Universität Passau, Germany: Modeling spatial components for complexly associated urban data

    [abstract], opens a file in a new window

     

  2. 21. November 2018 (Attention, Wednesday 15:30, S3 048)
    Hirohisa Kishino, University of Tokyo, Japan: Bridging molecular evolution and phenotypic evolution

    [abstract], opens a file in a new window

  3. 15. November 2018
    Helmut Küchenhoff, Ludwig-Maximilians-Universität München: The analysis of voter transitions in the Bavarian state election 2018 using data from different sources: a teaching research project conducted by three Bavarian universities

    [abstract], opens a file

    [Slides], opens a file

  4. 8. November 2018
    Efstathia Bura, TU Wien: Least Squares and ML Estimation Approaches of the Sufficient Reduction for Matrix Valued Predictors

    [abstract], opens a file in a new window

  5. 25. October 2018
    Erindi Allaj: Volatility measurement in presence of high-frequency data

    [abstract], opens a file in a new window

  6. 11. October 2018
    David Gabauer, JKU Linz: ‘To Be or Not to Be’ a Member of an Optimum Currency Area?

    [abstract], opens a file in a new window

Summer Term 2018

  1. 28. June 2018
    Gangaram S. Ladde, University of South Florida, USA: Energy/Lyapunov Function Method and Stochastic Mathematical Finance

    [abstract], opens a file in a new window

  2. 24. May 2018
    Pavlina Jordanova, University of Shumen, Bulgaria: On “multivariate” modifications of Cramér-Lundberg risk model.

  3. 26. April 2018
    Juan M. Rodríguez-Díaz, Universidad de Salamanca, Spain: Design optimality in multiresponse models with double covariance structure.

  4. 24. May 2018
    Carsten Wiuf, University of Copenhagen, Denmark: A simple method to aggregate p-values without a priori grouping.

  5. 15. March 2018
    Andreas Mayr, Friedrich-Alexander-University Erlangen-Nürnberg, Germany: An introduction to boosting distributional regression

  6. 19. April 2018
    Robert Breitenecker, Johannes Kepler University Linz: Spatial Heterogeneity in Entrepreneurship Research: An application of Geographically Weighted Regression

Winter Term 2017/18

  1. 25. January 2018
    Thomas Kneib, Georg-August-Universität Göttingen: A Lego System for Building Structured Additive Distributional Regression Models with Tensor Product Interactions

  2. 7. December 2017
    Franz König, Medizinische Universität Wien: Optimal rejection regions for multi-arm clinical trials

    [abstract], opens a file

  3. 9. November 2017
    Henrique Teotonio, Institut de Biologie de l'École Normale Supérieure, Paris: Inferring natural selection and genetic drift in evolution experiments

  4. 19. October 2017
    Lenka Filová, Comenius University in Bratislava: Optimal Design of Experiments in R

  5. 12. October 2017
    Elisa Perrone, Massachusetts Institute of Technology, Cambridge, MA (USA): Discrete copulas for weather forecasting: theoretical and practical aspects

    [abstract], opens a file