### Scope

As we move into the era of explainable AI, a comprehensive comparison of the performance of stochastic optimization algorithms has become an increasingly important task. One of the most common ways to compare the performance of stochastic optimization algorithms is to apply statistical analyses. However, there are still caveats that must be addressed in order to reach relevant and valid conclusions. First, such statistical analyses require sound statistical knowledge on the part of the user; this knowledge is often lacking, which leads to incorrect conclusions. Second, the standard approaches can be influenced by outliers (e.g., poor runs) or by statistically insignificant differences (solutions within some ε-neighborhood) that exist in the data.
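The effect of a single poor run can be illustrated with a minimal sketch. The run values below are hypothetical and chosen only for illustration; the example is not part of Deep Statistical Comparison. It only shows how two common summary statistics can disagree about which algorithm is better.

```python
from statistics import mean, median

# Hypothetical best-fitness values (minimization) from 7 runs of two algorithms.
# Algorithm A is usually better but has one poor run (an outlier).
runs_a = [0.10, 0.11, 0.12, 0.10, 0.11, 0.12, 5.00]
runs_b = [0.30, 0.31, 0.29, 0.30, 0.32, 0.30, 0.31]


def better_by(stat, a, b):
    """Return the name of the algorithm with the lower (better) summary value."""
    return "A" if stat(a) < stat(b) else "B"


mean_winner = better_by(mean, runs_a, runs_b)
median_winner = better_by(median, runs_a, runs_b)

print(f"mean:   A={mean(runs_a):.3f}  B={mean(runs_b):.3f}  -> {mean_winner}")
print(f"median: A={median(runs_a):.3f}  B={median(runs_b):.3f}  -> {median_winner}")
```

With these illustrative numbers the mean favors B while the median favors A, so the conclusion depends on which statistic is used to summarize the runs.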

This tutorial will provide an overview of the current approaches for analyzing algorithm performance, with special emphasis on caveats that are often overlooked. We will show how these can be avoided by applying simple principles that lead to Deep Statistical Comparison. The tutorial will not be based on equations, but mainly on examples through which a deeper understanding of statistics will be achieved. The examples will be based on various comparison scenarios for single-objective optimization algorithms. The tutorial will end with a demonstration of a web-service-based framework (i.e., DSCTool) for statistical comparison of stochastic optimization algorithms.

### Brief description

To familiarize the audience with the terms used throughout the tutorial, we will begin with an introduction to statistical analysis. We will explain the difference between descriptive and inferential statistics and, within the latter, between the frequentist and Bayesian approaches. We will also provide background on frequentist hypothesis testing, which is key to making a statistical comparison and always involves two hypotheses, the null and the alternative. Next, we will describe different types of statistical tests (e.g., parametric vs. non-parametric, omnibus vs. post-hoc) and discuss the conditions that must be met in order to apply them properly. The selection of a statistical test is crucial to the outcome of a study because applying an inappropriate test can lead to a wrong conclusion. We will explicitly point out typical mistakes that are often found in publications and that result from a lack of statistical knowledge.

We will then explain the difference between practical and statistical significance and discuss how the performance measure influences a statistical comparison, with an emphasis on single-problem and multiple-problem analysis. This will be followed by a brief explanation of different statistical scenarios, including pairwise comparison, multiple comparisons, and multiple comparisons with a control algorithm. We will then give an overview of the standard approaches for making statistical comparisons and of the latest advances (i.e., Deep Statistical Comparison) for providing robust statistical results. We will provide examples comparing single-objective stochastic optimization algorithms in different statistical scenarios. Finally, we will give a live demonstration in which the audience will get to use a web-service-based framework that makes statistical comparison easier and reduces the risk of drawing incorrect conclusions.

The tutorial will conclude with a summary of the covered topics and important take-home messages.
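The null-versus-alternative setup of a frequentist test can be sketched for a pairwise, multiple-problem scenario. The example below uses an exact two-sided sign test, chosen here only because it is simple and non-parametric; it is an assumption for illustration, not the test the tutorial prescribes. The error values and the helper name `sign_test_p_value` are hypothetical.

```python
from math import comb


def sign_test_p_value(a, b):
    """Exact two-sided sign test for paired samples (lower value wins; ties dropped).

    Null hypothesis: each algorithm is equally likely to win on a problem.
    """
    wins_a = sum(x < y for x, y in zip(a, b))
    wins_b = sum(x > y for x, y in zip(a, b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # only ties: no evidence against the null hypothesis
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins under Binomial(n, 0.5), doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical errors (lower is better) of two algorithms on 10 benchmark problems.
errors_a = [0.12, 0.40, 0.33, 0.08, 0.51, 0.27, 0.19, 0.62, 0.05, 0.30]
errors_b = [0.20, 0.47, 0.41, 0.15, 0.60, 0.25, 0.28, 0.70, 0.11, 0.39]

alpha = 0.05
p_value = sign_test_p_value(errors_a, errors_b)
reject_null = p_value < alpha
print(f"p = {p_value:.4f}; reject the null hypothesis at alpha = {alpha}: {reject_null}")
```

The sign test ignores the size of each difference, so a rejection here says nothing about practical significance; that distinction needs a separate look at the effect size.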

### Outline

• Introduction to statistical analysis (Frequentist vs. Bayesian).
• Background on frequentist hypothesis testing, different statistical tests, the required conditions for their usage, and sample size.
• Typical mistakes and understanding why making a statistical comparison of data needs to be done properly.
• Understanding the difference between statistical and practical significance.
• Understanding the effect that performance measures have on making a statistical comparison.
• Defining single-problem and multiple-problem analysis.
• Insight into pairwise comparison, multiple comparisons (all vs. all), and multiple comparisons with a control algorithm (one vs. all).
• Standard approaches to making statistical comparisons and their deficiencies.
• Latest advances in making statistical comparisons, e.g., Deep Statistical Comparison, which provides more robust statistical results in cases of outliers and statistically insignificant differences between data values.
• Extended Deep Statistical Comparison for understanding exploitation and exploration powers of stochastic optimization algorithms.
• Examples of all possible statistical scenarios in single-objective optimization and caveats.
• Presentation of a web-service-based framework that automates and simplifies the whole process of making a statistical comparison.
• Take-home messages.
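One typical mistake in the multiple-comparisons scenarios listed above is to read several pairwise p-values against the same significance level without any correction. A minimal sketch of one common remedy, Holm's step-down adjustment, is given below. The p-values are hypothetical, and Holm's procedure is used here as an assumption for illustration; it is not necessarily the correction applied in the tutorial or in DSCTool.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so that adjusted values never decrease.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from comparing a control algorithm with three others.
raw = [0.010, 0.040, 0.030]
alpha = 0.05

adjusted = holm_adjust(raw)
significant_raw = [p < alpha for p in raw]
significant_holm = [p < alpha for p in adjusted]

print("raw:     ", raw, significant_raw)
print("adjusted:", [round(p, 3) for p in adjusted], significant_holm)
```

With these illustrative values, all three raw p-values fall below 0.05, but only the first comparison remains significant after the adjustment.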

### Demo

In the demo, we will show how the Deep Statistical Comparison web-service-based framework (DSCTool) can be used to avoid drawing incorrect conclusions. The examples will include a comparison of single-objective optimization algorithms in different test scenarios: a pairwise comparison, multiple comparisons among all the algorithms, and multiple comparisons with a control algorithm. The demonstration will be carried out using a standard curl client and the IOHAnalyzer website, which supports DSC analyses.

### General info

The tutorial mixes introductory and advanced material. The first part covers basic statistical practices used for statistical comparison, while the second part covers the most recent approaches for statistical comparison of stochastic optimization algorithms. The tutorial is intended for students and experienced researchers in the field of stochastic optimization algorithms. Many researchers have difficulty performing the statistical analysis of their data that they need in order to interpret their results correctly. To become familiar with making a proper statistical comparison, we recommend attending this tutorial, which focuses on state-of-the-art approaches that provide robust statistical results. We will provide specific case studies in which a statistical comparison is made using single-objective stochastic optimization algorithms.

### Instructors

#### Tome Eftimov

Computer Systems Department

Jožef Stefan Institute, Slovenia

Tome Eftimov is a researcher at the Computer Systems Department at the Jožef Stefan Institute. He is a visiting assistant professor at the Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University, Skopje. He was a postdoctoral research fellow at Stanford University, USA, where he investigated biomedical relations outcomes by using AI methods. In addition, he was a research associate at the University of California, San Francisco, investigating AI methods for rheumatology concepts extraction from electronic health records. He obtained his PhD in Information and Communication Technologies (2018). His research interests include statistical data analysis, metaheuristics, natural language processing, representation learning, and machine learning. He has been involved in courses on probability and statistics, and statistical data analysis. The work related to Deep Statistical Comparison has been presented as a tutorial (IJCCI 2018, IEEE SSCI 2019, GECCO 2020, and PPSN 2020) and as an invited lecture at several international conferences and universities. He is an organizer of several workshops related to AI at high-ranked international conferences. He is the coordinator of the national project “Mr-BEC: Modern approaches for benchmarking in evolutionary computation” and actively participates in European projects.

#### Peter Korošec

Computer Systems Department

Jožef Stefan Institute, Slovenia

Peter Korošec received his Ph.D. degree from the Jožef Stefan Postgraduate School, Ljubljana, Slovenia, in 2006. Since 2002, he has been a researcher at the Computer Systems Department, Jožef Stefan Institute, Ljubljana. His current areas of research include understanding the principles behind meta-heuristic optimization and parallel/distributed computing. He has participated in several tutorials on statistical analysis for optimization algorithms presented at different international conferences and co-organized a workshop on understanding of evolutionary optimization behavior.