Institute of Applied Statistics
Thursdays from 15:30 until 17:00
Oct. 21st 2021: MT 128, Science Park 1
usually: Science Park 2, Intermediate Storey, Z74.
November 4th, 15:30 - Florian Meinfelder, Otto-Friedrich-Universität Bamberg: Propensity Score Matching and Statistical Matching
For a binary treatment variable, the potential outcome framework generates a missing data pattern that resembles a data fusion situation, in which two different data sources are stacked. The patterns are similar because either the outcome under treatment or the outcome under control is observed, but never both. The classical approach under the Rubin Causal Model is to use a nearest neighbor technique called Propensity Score Matching (PSM) to estimate the average treatment effect on the treated (ATET). Data fusion is also referred to as ‘Statistical Matching’, and nearest neighbor matching techniques have indeed been a popular choice for data fusion problems as well, since statistical twins are identified on an individual basis. Recently, publications have emerged in which the causal inference method PSM was applied to data fusion problems. In this talk we will investigate under which circumstances PSM can be a viable method for a data fusion scenario.
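For readers unfamiliar with the technique named in the abstract, the following is a minimal sketch of 1:1 nearest neighbor matching on a propensity score, used to estimate the ATET. It is not the speaker's method: the function name `nearest_neighbor_atet`, the toy data, and the assumption that propensity scores have already been estimated (for example by a logistic regression) are all illustrative choices.

```python
def nearest_neighbor_atet(scores, treated, outcomes):
    """Estimate the ATET by 1:1 nearest neighbor matching (with replacement)
    on a pre-computed propensity score.

    scores   -- estimated propensity scores (assumed to be given)
    treated  -- 1 for treated units, 0 for controls
    outcomes -- the observed outcome of each unit
    """
    controls = [(s, y) for s, t, y in zip(scores, treated, outcomes) if not t]
    treated_units = [(s, y) for s, t, y in zip(scores, treated, outcomes) if t]
    if not controls or not treated_units:
        raise ValueError("need at least one treated and one control unit")

    diffs = []
    for s_t, y_t in treated_units:
        # The "statistical twin": the control unit with the closest score.
        _, y_c = min(controls, key=lambda c: abs(c[0] - s_t))
        # The twin's outcome stands in for the unobserved outcome under control.
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)


# Toy data, made up purely for illustration.
scores = [0.80, 0.60, 0.75, 0.55, 0.20]
treated = [1, 1, 0, 0, 0]
outcomes = [10.0, 8.0, 7.0, 6.5, 3.0]

atet = nearest_neighbor_atet(scores, treated, outcomes)
print(atet)
```

In a data fusion setting the same matching step would, under this sketch, be used to borrow a donor record's unobserved variables rather than a counterfactual outcome.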
October 21st, 15:30; MT 128, Science Park 1 - Petr Mazouch, Prague University of Economics and Business | VŠE: Data Quality in Economic and Demographic Statistics
Abstract: Statisticians use data from different data sources for building statistical models, computing analyses and constructing forecasts. Based on their results, economic subjects (companies, government and households) make decisions. Better models lead to better decisions. One of the critical assumptions of excellent and valuable statistical models is the high quality of inputs – statistical data. Regardless of the data source type, the requirements for the quality of statistical data are the same.
The first part of the presentation introduces data quality requirements. It discusses the extent to which these requirements are fulfilled, using several examples from different social and economic statistics (labour statistics, SILC, household budget survey, national accounts). The second part focuses on demographic issues, with particular emphasis on COVID-19 statistics. How does the pressure on data timeliness influence other aspects of COVID-19 data quality? The final part uses a Bayesian approach to assess the relevance of COVID-19 data. Joint presentation with Jakub Fischer and Tomáš Karel.
November 11th, 15:30 - Ulrike Schneider, TU Wien: The Geometry of Model Selection and Uniqueness of Lasso-Type Methods
Abstract: We consider estimation methods in the context of high-dimensional regression models such as the Lasso and SLOPE, defined as solutions to a penalized optimization problem. The geometric object relevant for our investigation is the polytope that is dual to the unit ball of the penalizing norm. We show that which models are accessible by such a procedure depends on which faces of the polytope are intersected by the row span of the regressor matrix. Moreover, these geometric considerations allow us to derive a criterion for the uniqueness of the estimator that is both necessary and sufficient. We illustrate this approach for Lasso and SLOPE with the unit cube and the sign permutahedron as relevant polytopes. Joint work with Patrick Tardivel (Université Bourgogne).
Online-talk - May 20th, 17:15 - Ulrike Held, Department of Biostatistics, University of Zurich: Matching on treatment in observational research - what is the role of the matching algorithm?
Online-talk - April 22nd, 17:15 - Dr. Klaus Nordhausen, University of Jyväskylä, Finland: Blind source separation for multivariate spatial data
Blind source separation has a long tradition for iid data and multivariate time series. Blind source separation methods for multivariate spatial observations have, however, received little attention in the literature so far. We therefore suggest a blind source separation model for spatial data and show how the latent components can be estimated using two or more scatter matrices. The statistical properties and merits of these estimators are derived and verified in simulation studies. A real data example illustrates the method.
Online-talk - March 25th - Dr. Matt Sutton, QUT, Australia: Reversible Jump PDMP Samplers for Variable Selection
Abstract: A new class of Markov chain Monte Carlo (MCMC) algorithms, based on simulating piecewise deterministic Markov processes (PDMPs), has recently shown great promise: they are non-reversible, can mix better than standard MCMC algorithms, and can use subsampling ideas to speed up computation in big data scenarios. However, current PDMP samplers can only sample from posterior densities that are differentiable almost everywhere, which precludes their use for model choice. Motivated by variable selection problems, we show how to develop reversible jump PDMP samplers that can jointly explore the discrete space of models and the continuous space of parameters. Our framework is general: it takes any existing PDMP sampler and adds two types of trans-dimensional moves that allow for the addition or removal of a variable from the model. We show how the rates of these trans-dimensional moves can be calculated so that the sampler has the correct invariant distribution. Simulations show that the new samplers can mix better than standard MCMC algorithms. Our empirical results show they are also more efficient than gradient-based samplers that avoid model choice through the use of continuous spike-and-slab priors, which replace a point mass at zero for each parameter with a density concentrated around zero.
Online-talk - January 28th - Ulrike Schneider, TU Wien
Online-talk - January 21st - Lisa Ehrlinger & Florian Sobieczky, Software Competence Center Hagenberg: A rendezvous in data science: machine learning meets statistics
The talk covers several typical challenges from “Data Science” arising in research projects at the Software Competence Center Hagenberg (SCCH). Classical statistics as well as modern complex machine learning methods, such as neural networks, are applied to real-world use cases from industry.
In the first part, a short presentation of SCCH as an institution for applied research is given, which is particularly interesting for students with an interest in a master's or PhD thesis on practical problems.
The second part is a summary of various projects involving real-world data, with a focus on recurring statistical problems from manufacturing scenarios. In particular, methods related to anomaly detection, diagnosis and prediction using machine learning are discussed, with some care given to the black-box stigma of typical modern machine learning methods. The presentation is intended to identify classical methods and open research questions from statistics that are relevant to the approaches taken in SCCH's strategy on predictive maintenance.
*SCCH – Software Competence Center Hagenberg
**FAW - Institute for Application-oriented Knowledge Processing at JKU
Online-talk - November 19th - Zsolt Lavicza & Martin Andre, Johannes Kepler University in Linz & Universität Innsbruck: Technology changing statistics education: Defining possibilities, opportunities and obligations.
In our talk, we will outline some educational research activities within the Linz School of Education related to technology developments and statistics education. Afterwards, we will discuss our work on introducing statistics concepts in schools and how statistics teaching can be connected to sustainable development with real data for students in schools. In particular, we will discuss that statistics is becoming crucial in our current data-driven society to explore numerous phenomena that are too complex to comprehend without exploring and visualising data. Citizens need to understand statistics about issues concerning essential parts of their lives, such as the spread of a pandemic or climate change, in order to participate responsibly in a prosperous development of our civilization. With our research projects we try to find out more about young students’ intuitive approaches to statistics when visually analysing data. We found that certain kinds of data visualisations are especially capable of provoking reasoning about statistical concepts such as ideas of centre, spread and covariation. Based on these intuitive visual approaches to statistics, another aspect of our design-based research projects is concerned with statistical modelling processes. We developed a learning trajectory where middle school students were engaged in analysing real-world data to explore sustainable development of various countries and to build a model for this phenomenon. Results show that students’ statistical investigative learning processes should feature active participation in constructing knowledge of formal statistical concepts, and that students should adapt and fit their intuitive knowledge to formal concepts using methods of visual data analysis. We will outline some diverse opportunities to foster students’ intuitive understanding of statistics and sustainable development issues simultaneously.
Online-talk - November 12th - Irene Tubikanec, Johannes Kepler University in Linz: Approximate Bayesian computation for stochastic differential equations with an invariant distribution
Approximate Bayesian computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have become an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic models, two major difficulties arise. First, the derivation of effective summary statistics and proper distances is particularly challenging, since simulations from the stochastic process under the same parameter configuration result in different trajectories. Second, exact simulation schemes to generate trajectories from the stochastic model are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To obtain summaries that are less sensitive to the intrinsic stochasticity of the model, we propose to base the statistical method (e.g. the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that these model properties are kept in the synthetic data generation, we adopt measure-preserving numerical splitting schemes. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian type SDEs, both with simulated data and with real electroencephalography data. The derived summaries are particularly robust to the model simulation, and this fact, combined with the proposed reliable numerical scheme, yields accurate ABC inference. In contrast, the inference returned using standard numerical methods (Euler–Maruyama discretization) fails.
The proposed ingredients can be incorporated into any type of ABC algorithm and directly applied to all SDEs that are characterized by an invariant distribution and for which a measure-preserving numerical method can be derived.
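As background for the abstract above, here is a minimal sketch of the generic ABC rejection scheme into which such ingredients could be plugged. It does not implement the invariant-density summaries or the measure-preserving splitting scheme from the talk: the toy model (a normal mean with known variance), the sample-mean summary, the uniform prior, the tolerance, and the name `abc_rejection` are all assumptions made for illustration.

```python
import random
import statistics


def abc_rejection(observed, simulate, summary, prior_sample, tol, n_draws, rng):
    """Generic ABC rejection sampler.

    Keeps each parameter draw whose simulated summary lies within `tol`
    of the observed summary (absolute distance on a scalar summary).
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)          # draw a candidate from the prior
        synthetic = simulate(theta, rng)   # generate synthetic data under it
        if abs(summary(synthetic) - s_obs) <= tol:
            accepted.append(theta)
    return accepted


# Toy setup (illustrative only): infer the mean of a normal distribution
# with known unit variance from 100 observations.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(100)]

posterior = abc_rejection(
    observed,
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(100)],
    summary=statistics.fmean,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    tol=0.1,
    n_draws=5000,
    rng=rng,
)
print(len(posterior), statistics.fmean(posterior))
```

For an SDE model, `simulate` would be replaced by a numerical scheme for the trajectories and `summary` by a statistic (with a matching distance) that is robust to the intrinsic randomness of the process, which is where the abstract's proposals would enter.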
Online-talk - November 5th - Alex Kowarik, Statistik Austria: COVID-19 Prevalence Study - Was the Sample Large Enough? 3,000 Martians, Results and More
In November, a sample survey to determine the COVID-19 prevalence will be carried out for the third time. The lecture is intended to shed light on the methodological aspects of sampling, weighting and error calculation of these surveys.
For login details for this online event, please contact Milan Stehlik.
23. January 2020
Peter Filzmoser, Technische Universität Wien
Robust and sparse k-means clustering in high dimension
05. December 2019
Hao Wang, Jilin University, Changchun
Dependence structure between Chinese Shanghai and Shenzhen stock market based on copulas and cluster analysis
28. November 2019
Haipeng Li, CAS-MPG, Shanghai
Supervised learning for analyzing large-scale genome-wide DNA polymorphism data
07. November 2019
Günter Pilz, Johannes Kepler Universität Linz
Statistik ist ein Segen für die Menschheit (Statistics is a blessing for humanity)
31. October 2019
Martin Wolfsegger, Takeda Pharmaceutical Company Ltd.
Some likely useful thoughts on prescription drug-use-related software supporting personalized dosing regimen
Alexander Bauer, Takeda Pharmaceutical Company Ltd.
Evaluation of drug combinations
10. October 2019
Leonardo Grilli, University of Florence
Multiple imputation and selection of predictors in multilevel models for analysing the relationship between student ratings and teacher beliefs and practices
23. May 2019
Siegfried Hörmann, TU Graz, Austria: ANOVA for functional time series data: when there is dependence between groups
9. May 2019
Markus Hainy, Johannes Kepler Universität Linz: Optimal Bayesian design for models with intractable likelihoods via supervised learning
11. April 2019
Dominik Schrempf, Eötvös Loránd University in Budapest, Hungary: Phylogenetic incongruences - opportunities to improve the reconstruction of a dated tree of life
4. April 2019
Antony Overstall, University of Southampton, UK: Bayesian design for physical models using computer experiments
14. March 2019
Florian Frommlet, Medical University Vienna, Austria: Deep Bayesian Regression
14. March 2019. Attention, Start: 13:45
Thomas Petzoldt, TU Dresden, Germany: Identification of distribution components from antibiotic resistance data - Opportunities and challenges
17. January 2019
Harry Haupt, Universität Passau, Germany: Modeling spatial components for complexly associated urban data
21. November 2018 (Attention, Wednesday 15:30, S3 048)
Hirohisa Kishino, University of Tokyo, Japan: Bridging molecular evolution and phenotypic evolution
15. November 2018
Helmut Küchenhoff, Ludwig-Maximilians-Universität München: The analysis of voter transitions in the Bavarian state election 2018 using data from different sources: a teaching research project conducted by three Bavarian universities
8. November 2018
Efstathia Bura, TU Wien: Least Squares and ML Estimation Approaches of the Sufficient Reduction for Matrix Valued Predictors
25. October 2018
Erindi Allaj: Volatility measurement in presence of high-frequency data
11. October 2018
David Gabauer, JKU Linz: ‘To Be or Not to Be’ a Member of an Optimum Currency Area?
28. June 2018
Gangaram S. Ladde, University of South Florida, USA: Energy/Lyapunov Function Method and Stochastic Mathematical Finance
24. May 2018
Pavlina Jordanova, University of Shumen, Bulgaria: On “multivariate” modifications of Cramér-Lundberg risk model.
26. April 2018
Juan M. Rodríguez-Díaz, Universidad de Salamanca, Spain: Design optimality in multiresponse models with double covariance structure.
24. May 2018
Carsten Wiuf, University of Copenhagen, Denmark: A simple method to aggregate p-values without a priori grouping.
15. March 2018
Andreas Mayr, Friedrich-Alexander-University Erlangen-Nürnberg, Germany: An introduction to boosting distributional regression
19. April 2018
Robert Breitenecker, Johannes Kepler University Linz: Spatial Heterogeneity in Entrepreneurship Research: An application of Geographically Weighted Regression
25. January 2018
Thomas Kneib, Georg-August-Universität Göttingen: A Lego System for Building Structured Additive Distributional Regression Models with Tensor Product Interactions
7. December 2017
Franz König, Medizinische Universität Wien: Optimal rejection regions for multi-arm clinical trials
9. November 2017
Henrique Teotonio, Institut de Biologie de l'École Normale Supérieure, Paris: Inferring natural selection and genetic drift in evolution experiments
19. October 2017
Lenka Filová, Comenius University in Bratislava: Optimal Design of Experiments in R
12. October 2017
Elisa Perrone, Massachusetts Institute of Technology, Cambridge, MA (USA): Discrete copulas for weather forecasting: theoretical and practical aspects