meeting ID: 937 6054 7545
Institute of Applied Statistics
Thursdays from 15:30 until 17:00
Science Park 2, Intermediate Storey, Z74.
December, 1st - Andrea Berghold, Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria: Randomization in Clinical Trials
Randomization is a crucial component of an experimental design in general and clinical trials in particular. Using adequate randomization methods is therefore an important prerequisite in conducting a clinical trial. Many procedures have been proposed for the random assignment of participants to treatment groups in clinical trials. Various restricted randomization techniques such as permuted block design, biased coin design, urn design or big stick design as well as covariate-adaptive and response-adaptive randomization can be found in the literature. I will discuss the performance of different restricted randomization techniques regarding their treatment balance behavior and allocation randomness.
However, it is not only important to have different techniques available but also to have suitable software to allow use of these techniques in practice. I will present a web-based randomization tool for multi-centre clinical studies (“Randomizer” – www.randomizer.at) which was developed by the Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria. This tool facilitates efficient management of the randomization process including allocation concealment, stratification, audit trails etc. and can also be used for simulation of different randomization designs.
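As a flavour of the restricted randomization techniques discussed above, a permuted block design can be sketched in a few lines. This is a minimal illustration only, not the Randomizer implementation; the function and parameter names are ours:

```python
import random

def permuted_block_randomization(n_subjects, block_size=4, arms=("A", "B"), seed=None):
    """Assign subjects to treatment arms using randomly permuted blocks.

    Within each block every arm appears equally often, so the allocation
    is exactly balanced after each complete block.
    """
    assert block_size % len(arms) == 0, "block size must be a multiple of the number of arms"
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    assignments = []
    while len(assignments) < n_subjects:
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)               # random permutation within the block
        assignments.extend(block)
    return assignments[:n_subjects]
```

Balance after complete blocks is what distinguishes this design from simple randomization; the trade-off, discussed in the talk, is reduced allocation randomness near block boundaries.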
January 12th - Luca Gerardo-Giorda, JKU Linz
January 19th - Sebastian Fuchs, Universität Salzburg: Using dimension reduction for quantifying and estimating predictability and explainability in regression analysis
January 26th - Ritabrata Dutta, University of Warwick, UK
October 20th - 15:30 - 17:00 - S2 Z74 - Dr. Alejandra Avalos Pacheco:
“Multi-study Factor Regression Models for Large Complex Data with Applications to Nutritional Epidemiology and Cancer Genomics”
Data integration of multiple studies can be key to understanding and gaining knowledge in statistical research. However, such data present both biological and artifactual sources of variation, also known as covariate effects. Covariate effects can be complex, leading to systematic biases. In this talk I will present novel sparse latent factor regression (FR) and multi-study factor regression (MSFR) models to integrate such heterogeneous data. The FR model provides a tool for data exploration via dimensionality reduction and sparse low-rank covariance estimation while correcting for a range of covariate effects. MSFR models extend FR to jointly obtain a covariance structure that models the group-specific covariances in addition to the common component, learning covariate effects from observed variables such as demographic information. I will discuss the use of several sparse priors (local and non-local) to learn the dimension of the latent factors. Our approach provides a flexible methodology for sparse factor regression which is not limited to data with covariate effects. I will present several examples, with a focus on bioinformatics applications. We show the usefulness of our methods in two main tasks: (1) to give a visual representation of the latent factors of the data, i.e. an unsupervised dimension reduction task; and (2) to provide (i) a supervised survival analysis, using the factors obtained by our method as predictors for the cancer genomics data, and (ii) a dietary pattern analysis, associating each factor with a measure of overall diet quality related to cardiometabolic disease risk in a Hispanic community health nutritional-data study.
Our results show an increase in the accuracy of the dimensionality reduction, with non-local priors substantially improving the reconstruction of factor cardinality. The results of our analyses illustrate how failing to properly account for covariate effects can result in unreliable inference.
March 17th, 15:30; S2 Z74, Science Park 2 - Stefan Rass, Lehrstuhl Secure Systems LIT Secure and Correct Systems Lab, JKU: On Privacy in Machine Learning by Plausible Deniability
Abstract: When a machine learning model is trained from data, the data may be subject to security requirements and even be classified as sensitive. If the trained model is intended for use by untrusted parties, this raises the question of how much information about the training data can be extracted from the machine learning model once it is given away. The talk presents two results in this regard, based on the security notion of plausible deniability. First, we show that a model of finite size will retain a nonzero residual entropy if the training data has a size beyond a (model-dependent) threshold. Second, we show that for a certain class of models, and any artificially chosen training data, we can craft a topological norm that gives an error metric under which the training recovers exactly the given model. The order of quantifiers is what enables plausible deniability here, since we can, for any given model, claim it to have arisen from an arbitrary training set that can have any distribution and can be completely unrelated to the original sensitive training data. We illustrate the method on examples from normal and logistic regression and some examples of neural networks, and discuss the practical implications of these results.
June 23rd - 14:00 - 15:15 - S2 Z74 - Liana Jacobi:
“Posterior Manifolds over Hyperparameter Regions (and Joint Prior Parameter Dependence): Moving Beyond Localized Assessments of Prior Parameter Specifications in MCMC Inference”, joint with Andres Ramirez-Hassan, Jackson Kwok and Nhung Nghiem
Prior parameter sensitivity has become a focus of prior robustness analysis in response to the increased use of Bayesian inference in applied work, in particular with the popularity of Markov chain Monte Carlo (MCMC) inference under conjugate priors. It is commonly investigated in terms of local or pointwise assessments, in the form of derivatives or multiple evaluations. As such, it provides limited localized information about the impact of prior parameter specifications, with the scope further restricted due to analytical and computational complexities in most MCMC applications.
This paper introduces an approach based on the geometry of posterior statistics over hyperparameter regions (posterior manifolds) that encompasses and expands upon two common localized strategies to obtain more information about prior parameter dependence. The proposed estimation strategy is based on multiple point evaluations with Gaussian processes, with evaluation points selected efficiently via Active Learning, and is further complemented with derivative information via a recent Automatic Differentiation approach for MCMC output. The approach gives rise to formal measures that can quantify additional aspects of prior parameter dependence and uncover more complex dependencies across prior parameters that are particularly relevant in practical applications, which often involve the setting of many location and precision parameters. The real data example investigates the impact of joint changes in prior demand parameter specifications on elasticity inference under a common multivariate demand framework for five main good groups, using data from a recent virtual supermarket experiment. We identify and estimate sensitivity manifolds for the three most sensitive (cross-)price and expenditure elasticities and show how conclusions regarding substitutionary versus complementary relationships as well as price sensitivity characteristics (normal versus inferior goods, elastic versus inelastic) can change across the prior parameter space.
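The surrogate idea behind the estimation strategy, fitting a Gaussian process to a handful of posterior evaluations over a hyperparameter region, can be sketched in one dimension. This toy illustration with a plain RBF kernel omits the paper's Active Learning and Automatic Differentiation machinery; all names are ours:

```python
import numpy as np

def gp_predict(x_train, y_train, x_test, length=0.5, noise=1e-6):
    """Gaussian-process (RBF kernel) interpolation of a posterior summary
    (e.g. a posterior mean) evaluated at a few hyperparameter values."""
    def k(a, b):
        # squared-exponential kernel between two 1-D point sets
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K = k(x_train, x_train) + noise * np.eye(len(x_train))  # jitter for stability
    alpha = np.linalg.solve(K, y_train)
    return k(x_test, x_train) @ alpha                       # posterior predictive mean
```

A handful of expensive MCMC runs at training hyperparameters then yields a cheap approximation of the posterior statistic over the whole region.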
May, 19th - Paul Hofmarcher, Department of Economics, Paris-Lodron-University Salzburg: Gaining Insights on US Senate Speeches Using a Time Varying Text Based Ideal Point Model
Estimating political positions of lawmakers has a long tradition in political science, and usually lawmakers’ votes are used to quantify their political positions. But lawmakers also give speeches or press statements. In this work we present a time varying text based ideal point model (TV-TBIP) which allows us to study political positions of lawmakers in a completely unsupervised way. In doing so, our model combines the class of topic models with ideal point models in a time-dynamic setting.
Our model is inspired by the idea of political framing, so that specific words or terms used when discussing a topic can convey political messages.
The insights of our model are twofold: first, it allows us to detect how the political discussion of certain topics has changed over time, and second, it estimates ideological positions of lawmakers on a party level. Using only the texts of Senate speeches, our model identifies US senators along an interpretable progressive-to-moderate spectrum.
We apply our model to nearly 40 years of US Senate speeches between 1981 and 2017.
May 12th - 16:30 - 18:00 - S3 047 - Prof. Dr. Werner Brannath, University of Bremen: A liberal type I error rate for studies in precision medicine
(joint work with Charlie Hillner and Kornelius Rohmeyer)
We introduce a new multiple type I error criterion for clinical trials with multiple populations. Such trials are of interest in precision medicine, where the goal is to develop treatments that are targeted to specific sub-populations defined by genetic and/or clinical biomarkers. The new criterion is based on the observation that not all type I errors are relevant to all patients in the overall population. If disjoint sub-populations are considered, no multiplicity adjustment appears necessary, since a claim in one sub-population does not affect patients in the other ones. For intersecting sub-populations we suggest controlling the average multiple type I error rate, i.e. the probability that a randomly selected patient will be exposed to an inefficient treatment. We call this the population-wise error rate, exemplify it with a number of examples, and illustrate how to control it with an adjustment of critical boundaries or adjusted p-values. We furthermore define corresponding simultaneous confidence intervals. We finally illustrate the power gain achieved by passing from family-wise to population-wise error rate control with two simple examples and a recently suggested multiple testing approach for umbrella trials.
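For the disjoint sub-population case, our reading of the population-wise error rate is a prevalence-weighted average of the per-population type I error rates: the probability that a randomly selected patient belongs to a sub-population in which a type I error occurs. A toy sketch under that assumption only (the paper's general definition for intersecting sub-populations is more involved):

```python
def population_wise_error_rate(prevalences, type_i_errors):
    """Prevalence-weighted type I error rate for disjoint sub-populations:
    the probability that a randomly selected patient is exposed to a
    falsely declared efficacious treatment."""
    assert abs(sum(prevalences) - 1.0) < 1e-9, "prevalences must sum to one"
    return sum(w * e for w, e in zip(prevalences, type_i_errors))
```

Under this reading, two disjoint sub-populations each tested at level 0.05 give a population-wise rate of 0.05, which is why no multiplicity adjustment appears necessary in the disjoint case.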
November 4th, 15:30 - Florian Meinfelder, Otto-Friedrich-Universität Bamberg: Propensity Score Matching and Statistical Matching
The potential outcome framework generates, for a binary treatment variable, a missing data pattern that bears resemblance to a data fusion situation, where two different data sources are stacked. The reason for the similarity regarding the missing data pattern is that either the outcome under treatment or the outcome under control is observed (but never both, for obvious reasons). The classical approach under the Rubin Causal Model is to use a nearest neighbor technique called Propensity Score Matching (PSM) to estimate the average treatment effect on the treated (ATET). Data fusion is also referred to as ‘Statistical Matching’, and nearest neighbor matching techniques have indeed been a popular choice for data fusion problems as well, since statistical twins are identified on an individual basis. Recently, publications have emerged where the causal inference method PSM was applied to data fusion problems. In this talk we will investigate under which circumstances PSM can be a viable method for a data fusion scenario.
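A minimal sketch of the nearest-neighbor matching step behind PSM, assuming propensity scores have already been estimated (e.g. by a logistic regression of treatment on covariates); matching is 1:1 with replacement and the names are ours:

```python
import numpy as np

def atet_psm(y, treated, pscore):
    """Estimate the average treatment effect on the treated (ATET) by
    1-nearest-neighbor propensity score matching with replacement.

    y       : observed outcomes
    treated : boolean treatment indicator
    pscore  : estimated propensity scores
    """
    y, treated, pscore = map(np.asarray, (y, treated, pscore))
    y_t, p_t = y[treated], pscore[treated]
    y_c, p_c = y[~treated], pscore[~treated]
    # for each treated unit, find the control with the closest propensity score
    idx = np.abs(p_t[:, None] - p_c[None, :]).argmin(axis=1)
    return float(np.mean(y_t - y_c[idx]))
```

The same nearest-neighbor machinery, applied to stacked data sources instead of treatment arms, is what makes the data fusion analogy in the talk plausible.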
October 21st, 15:30; MT 128, Science Park 1 - Petr Mazouch, Prague University of Economics and Business | VŠE: Data Quality in Economic and Demographic Statistics
Abstract: Statisticians use data from different data sources for building statistical models, computing analyses and constructing forecasts. Based on their results, economic subjects (companies, government and households) make decisions. Better models lead to better decisions. One of the critical assumptions of excellent and valuable statistical models is the high quality of inputs – statistical data. Regardless of the data source type, the requirements for the quality of statistical data are the same.
The first part of the presentation introduces data quality requirements. It discusses the level of fulfilment of these requirements using several examples from different social and economic statistics (labour statistics, SILC, household budget survey, national accounts). The second part focuses on demographic issues with a particular emphasis on COVID-19 statistics. How does the pressure on data timeliness influence other aspects of COVID-19 data quality? The final part uses the Bayesian approach for the assessment of the COVID-19 data relevance. Joint presentation with Jakub Fischer and Tomáš Karel.
November 11th, 15:30 - Ulrike Schneider, TU Wien: The Geometry of Model Selection and Uniqueness of Lasso-Type Methods
Abstract: We consider estimation methods in the context of high-dimensional regression models, such as the Lasso and SLOPE, defined as solutions to a penalized optimization problem. The geometric object relevant for our investigation is the polytope that is dual to the unit ball of the penalizing norm. We show that which models are accessible by such a procedure depends on which faces of the polytope are intersected by the row span of the regressor matrix. Moreover, these geometric considerations allow us to derive a criterion for the uniqueness of the estimator that is both necessary and sufficient. We illustrate this approach for Lasso and SLOPE, with the unit cube and the sign permutahedron as the relevant polytopes. Joint work with Patrick Tardivel (Université Bourgogne).
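For background, the penalized optimization problem the talk starts from can, in the Lasso case, be solved by cyclic coordinate descent with soft-thresholding. This is a standard textbook sketch, unrelated to the talk's geometric uniqueness analysis:

```python
import numpy as np

def soft_threshold(z, t):
    # shrink z towards zero by t, the proximal operator of the l1 norm
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Solve min_b 0.5*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with feature j excluded
            r = y - X @ b + X[:, j] * b[j]
            b[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
    return b
```

The soft-thresholding step is what produces exact zeros, i.e. the model selection whose geometry the talk studies via the dual polytope.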
Online-talk - May 20th, 17:15 - Ulrike Held, Department of Biostatistics, University of Zurich: Matching on treatment in observational research - what is the role of the matching algorithm?
Online-talk - April 22nd, 17:15 - Dr. Klaus Nordhausen, University of Jyväskylä, Finland: Blind source separation for multivariate spatial data
Blind source separation has a long tradition for iid data and multivariate time series. Blind source separation methods for multivariate spatial observations have, however, received little attention in the literature so far. We therefore suggest a blind source separation model for spatial data and show how the latent components can be estimated using two or more scatter matrices. The statistical properties and merits of these estimators are derived and verified in simulation studies. A real data example illustrates the method.
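The classical iid two-scatter idea the talk builds on can be illustrated with FOBI, which uses the covariance matrix as the first scatter and a fourth-moment matrix as the second. This is a sketch of the standard iid method, not the spatial extension presented in the talk:

```python
import numpy as np

def fobi(X):
    """Blind source separation via FOBI: whiten with the covariance (first
    scatter), then rotate with the eigenvectors of a fourth-moment matrix
    (second scatter). X has shape (n_samples, n_features); returns the
    estimated sources, up to sign, scale and permutation."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric whitening matrix
    Z = Xc @ W.T                                # whitened data, cov(Z) = I
    # second scatter: E[||z||^2 z z^T], whose eigenvectors undo the rotation
    K = (Z * (Z ** 2).sum(axis=1, keepdims=True)).T @ Z / len(Z)
    _, U = np.linalg.eigh(K)
    return Z @ U
```

FOBI separates the sources whenever their kurtoses are distinct; the spatial methods of the talk replace these scatters with ones that exploit spatial dependence.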
Online-talk - March 25th - Dr. Matt Sutton, QUT, Australia: Reversible Jump PDMP Samplers for Variable Selection
Abstract: A new class of Markov chain Monte Carlo (MCMC) algorithms, based on simulating piecewise deterministic Markov processes (PDMPs), has recently shown great promise: these samplers are non-reversible, can mix better than standard MCMC algorithms, and can use subsampling ideas to speed up computation in big data scenarios. However, current PDMP samplers can only sample from posterior densities that are differentiable almost everywhere, which precludes their use for model choice. Motivated by variable selection problems, we show how to develop reversible jump PDMP samplers that can jointly explore the discrete space of models and the continuous space of parameters. Our framework is general: it takes any existing PDMP sampler and adds two types of trans-dimensional moves that allow for the addition or removal of a variable from the model. We show how the rates of these trans-dimensional moves can be calculated so that the sampler has the correct invariant distribution. Simulations show that the new samplers can mix better than standard MCMC algorithms. Our empirical results show they are also more efficient than gradient-based samplers that avoid model choice through the use of continuous spike-and-slab priors, which replace a point mass at zero for each parameter with a density concentrated around zero.
Online-talk - January 28th - Ulrike Schneider, TU Wien
Online-talk - January 21st - Lisa Ehrlinger & Florian Sobieczky, Software Competence Center Hagenberg: A rendezvous in data science: machine learning meets statistics
The talk covers several typical challenges from “Data Science” arising in research projects at the Software Competence Center Hagenberg (SCCH). Classical statistics as well as modern complex machine learning methods, such as neural networks, are applied to real-world use cases from industry.
In the first part, a short presentation of SCCH as an institution for applied research is given, which is particularly interesting for students with an interest in a master's or PhD thesis on practical problems.
The second part is a summary of various projects involving real-world data with a focus on recurring statistical problems from manufacturing scenarios. In particular, methods related to anomaly detection, diagnosis and prediction using machine learning are discussed, with some care given to the black-box stigma of typical modern machine learning methods. The presentation is intended to identify classical methods and open research questions from statistics relevant to SCCH's strategy on predictive maintenance.
*SCCH – Software Competence Center Hagenberg
**FAW - Institute for Application-oriented Knowledge Processing at JKU
Online-talk - November 19th - Zsolt Lavicza & Martin Andre, Johannes Kepler University in Linz & Universität Innsbruck: Technology changing statistics education: Defining possibilities, opportunities and obligations.
In our talk, we will outline some educational research activities within the Linz School of Education related to technology developments and statistics education. Afterwards, we will discuss our work on introducing statistics concepts in schools and how statistics teaching can be connected to sustainable development with real data for students in schools. In particular, we will discuss how statistics is becoming crucial in our current data-driven society to explore numerous phenomena that are too complex to comprehend without exploring and visualising data. Citizens need to understand statistics about issues concerning essential parts of their lives, such as the spread of a pandemic or climate change, in order to responsibly participate in a prosperous development of our civilization. With our research projects we try to find out more about young students' intuitive approaches to statistics when visually analysing data. We found that certain kinds of data visualisations are especially capable of provoking reasoning about statistical concepts such as ideas of centre, spread and covariation. Based on these intuitive visual approaches to statistics, another aspect of our design-based research projects is concerned with statistical modelling processes. We developed a learning trajectory in which middle school students were engaged in analysing real-world data to explore the sustainable development of various countries and to build a model for this phenomenon. Results show that students' statistical investigative learning processes should feature active participation in constructing knowledge of formal statistical concepts, and that students should adapt and fit their intuitive knowledge to formal concepts using methods of visual data analysis. We will outline some diverse opportunities to foster students' intuitive understanding of statistics and sustainable development issues simultaneously.
Online-talk - November 12th - Irene Tubikanec, Johannes Kepler University in Linz: Approximate Bayesian computation for stochastic differential equations with an invariant distribution
Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed into an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise. First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to build up the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned using standard numerical methods (Euler–Maruyama discretization) fails.
The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterized by an invariant distribution and for which a measure-preserving numerical method can be derived.
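The basic ABC rejection scheme that the proposed summaries and numerical schemes plug into can be sketched as follows. This is a generic illustration with user-supplied simulator and summary function, not the measure-preserving splitting scheme of the talk; all names are ours:

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, summary, n_draws=10000, quantile=0.01):
    """Basic ABC rejection sampler: draw parameters from the prior, simulate
    synthetic data, and keep the draws whose simulated summaries are closest
    to the observed summary."""
    s_obs = summary(observed)
    thetas = np.array([prior_sample() for _ in range(n_draws)])
    dists = np.array([np.linalg.norm(summary(simulate(t)) - s_obs) for t in thetas])
    eps = np.quantile(dists, quantile)   # data-driven acceptance threshold
    return thetas[dists <= eps]
```

In the talk's setting, `summary` would map a trajectory to its estimated invariant density and invariant spectral density, and `simulate` would be a measure-preserving splitting scheme for the SDE.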
Online-talk - November 5th - Alex Kowarik, Statistik Austria: COVID-19 Prevalence Study - Was the Sample Large Enough? 3,000 Martians, Results and More
In November, a sample survey to determine the COVID-19 prevalence will be carried out for the third time. The lecture is intended to shed light on the methodological aspects of sampling, weighting and error calculation of these surveys.
For login-details of this online event please contact Milan Stehlik
23. January 2020
Peter Filzmoser, Technische Universität Wien
Robust and sparse k-means clustering in high dimension
05. December 2019
Hao Wang, Jilin University, Changchun
Dependence structure between Chinese Shanghai and Shenzhen stock market based on copulas and cluster analysis
28. November 2019
Haipeng Li, CAS-MPG, Shanghai
Supervised learning for analyzing large-scale genome-wide DNA polymorphism data
07. November 2019
Günter Pilz, Johannes Kepler Universität Linz
Statistik ist ein Segen für die Menschheit (Statistics Is a Blessing for Humanity)
31. October 2019
Martin Wolfsegger, Takeda Pharmaceutical Company Ltd.
Some likely useful thoughts on prescription drug-use-related software supporting personalized dosing regimen
Alexander Bauer, Takeda Pharmaceutical Company Ltd.
Evaluation of drug combinations
10. October 2019
Leonardo Grilli, University of Florence
Multiple imputation and selection of predictors in multilevel models for analysing the relationship between student ratings and teacher beliefs and practices
23. May 2019
Siegfried Hörmann, TU Graz, Austria: ANOVA for functional time series data: when there is dependence between groups
9. May 2019
Markus Hainy, Johannes Kepler Universität Linz: Optimal Bayesian design for models with intractable likelihoods via supervised learning
11. April 2019
Dominik Schrempf, Eötvös Loránd University in Budapest, Hungary: Phylogenetic incongruences - opportunities to improve the reconstruction of a dated tree of life
4. April 2019
Antony Overstall, University of Southampton, UK: Bayesian design for physical models using computer experiments
14. March 2019
Florian Frommlet, Medical University Vienna, Austria: Deep Bayesian Regression
14. March 2019. Attention, Start: 13:45
Thomas Petzoldt, TU Dresden, Germany: Identification of distribution components from antibiotic resistance data - Opportunities and challenges
17. January 2019
Harry Haupt, Universität Passau, Germany: Modeling spatial components for complexly associated urban data
21. November 2018 (Attention, Wednesday 15:30, S3 048)
Hirohisa Kishino, University of Tokyo, Japan: Bridging molecular evolution and phenotypic evolution
15. November 2018
Helmut Küchenhoff, Ludwig-Maximilians-Universität München: The analysis of voter transitions in the 2018 Bavarian state election using data from different sources: a teaching research project conducted by three Bavarian universities
8. November 2018
Efstathia Bura, TU Wien: Least Squares and ML Estimation Approaches of the Sufficient Reduction for Matrix Valued Predictors
25. October 2018
Erindi Allaj: Volatility measurement in the presence of high-frequency data
11. October 2018
David Gabauer, JKU Linz: ‘To Be or Not to Be’ a Member of an Optimum Currency Area?
28. June 2018
Gangaram S. Ladde, University of South Florida, USA: Energy/Lyapunov Function Method and Stochastic Mathematical Finance
24. May 2018
Pavlina Jordanova, University of Shumen, Bulgaria: On “multivariate” modifications of the Cramér-Lundberg risk model.
26. April 2018
Juan M. Rodríguez-Díaz, Universidad de Salamanca, Spain: Design optimality in multiresponse models with double covariance structure.
24. May 2018
Carsten Wiuf, University of Copenhagen, Denmark: A simple method to aggregate p-values without a priori grouping.
15. March 2018
Andreas Mayr, Friedrich-Alexander-University Erlangen-Nürnberg, Germany: An introduction to boosting distributional regression
19. April 2018
Robert Breitenecker, Johannes Kepler University Linz: Spatial Heterogeneity in Entrepreneurship Research: An application of Geographically Weighted Regression
25. January 2018
Thomas Kneib, Georg-August-Universität Göttingen: A Lego System for Building Structured Additive Distributional Regression Models with Tensor Product Interactions
7. December 2017
Franz König, Medizinische Universität Wien: Optimal rejection regions for multi-arm clinical trials
9. November 2017
Henrique Teotonio, Institut de Biologie de l'École Normale Supérieure, Paris: Inferring natural selection and genetic drift in evolution experiments
19. October 2017
Lenka Filová, Comenius University in Bratislava: Optimal Design of Experiments in R
12. October 2017
Elisa Perrone, Massachusetts Institute of Technology, Cambridge, MA (USA): Discrete copulas for weather forecasting: theoretical and practical aspects