Sample size calculations for the experimental comparison of multiple algorithms on multiple problem instances

Felipe Campelo^*; Elizabeth Wanner

^*Corresponding author for this work

doi:10.1007/s10732-020-09454-w

Research output: Contribution to journal (peer-reviewed article)

Abstract

This work presents a statistically principled method for estimating the required number of instances in the experimental comparison of multiple algorithms on a given problem class of interest. This approach generalises earlier results by allowing researchers to design experiments based on the desired best, worst, mean or median-case statistical power to detect differences between algorithms larger than a certain threshold. Holm’s step-down procedure is used to maintain the overall significance level controlled at desired levels, without resulting in overly conservative experiments. This paper also presents an approach for sampling each algorithm on each instance, based on optimal sample size ratios that minimise the total required number of runs subject to a desired accuracy in the estimation of paired differences. A case study investigating the effect of 21 variants of a custom-tailored Simulated Annealing for a class of scheduling problems is used to illustrate the application of the proposed methods for sample size calculations in the experimental comparison of algorithms.
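The abstract names three technical ingredients: Holm's step-down procedure for controlling the familywise significance level, choosing the number of instances from a summary (best, worst, mean or median) of the power across comparisons, and sample size ratios that minimise the total number of runs. The Python sketch below illustrates these ideas under stated assumptions. It is not the authors' implementation. In particular, it swaps the t-test power calculations for a normal approximation, expresses the minimally relevant difference as a standardised effect size `d`, and covers only simple (not percent) differences for the ratio. All function names are hypothetical.

```python
from statistics import NormalDist, mean, median

_Z = NormalDist()  # standard normal distribution


def holm_step_down(pvalues, alpha=0.05):
    """Holm's step-down procedure. Returns booleans (True = reject) in the
    same order as `pvalues`, controlling the familywise error rate at alpha."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda i: pvalues[i])  # ascending p-values
    reject = [False] * k
    for rank, idx in enumerate(order):
        # The (rank+1)-th smallest p-value is compared against alpha / (k - rank).
        if pvalues[idx] <= alpha / (k - rank):
            reject[idx] = True
        else:
            break  # first non-rejection: all larger p-values are retained
    return reject


def holm_levels(k, alpha=0.05):
    """Per-comparison significance levels alpha/k, alpha/(k-1), ..., alpha."""
    return [alpha / (k - rank) for rank in range(k)]


def approx_power(n, d, level):
    """ASSUMPTION: normal approximation to the power of a two-sided paired test
    on n instances for standardised difference d (the far tail is ignored)."""
    return _Z.cdf(d * n ** 0.5 - _Z.inv_cdf(1 - level / 2))


def instances_needed(k, d, power_target=0.8, alpha=0.05, summary="mean",
                     n_max=10_000):
    """Smallest n for which the chosen summary of the power across the k Holm
    levels ("best", "worst", "mean" or "median") reaches power_target."""
    summarise = {"best": max, "worst": min, "mean": mean, "median": median}[summary]
    levels = holm_levels(k, alpha)
    for n in range(2, n_max + 1):
        if summarise([approx_power(n, d, lv) for lv in levels]) >= power_target:
            return n
    raise ValueError("power_target not reached within n_max instances")


def optimal_ratio(s1, s2):
    """For a simple difference of means, n1/n2 = s1/s2 minimises n1 + n2
    for a fixed standard error sqrt(s1**2/n1 + s2**2/n2)."""
    return s1 / s2
```

For example, `instances_needed(k, d, summary="worst")` gives the most conservative design, because the worst case corresponds to the strictest Holm level alpha/k. The "best" summary gives the least conservative design. Here `k` stands for the number of pairwise comparisons in the experiment.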

Original language	English
Pages (from-to)	851–883
Number of pages	33
Journal	Journal of Heuristics
Volume	26
Issue number	6
Early online date	5 Aug 2020
DOIs	https://doi.org/10.1007/s10732-020-09454-w
Publication status	Published - 1 Dec 2020

Bibliographical note

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Funding: F. Campelo worked under grants from Brazilian agencies FAPEMIG (APQ-01099-16) and CNPq (404988/2016-4). E. F. Wanner has been funded by The Leverhulme Trust through Research Fellowship RF-2018-527/9.

Keywords

Experimental comparison of algorithms
Iterative sampling
Multiple hypotheses testing
Sample size estimation
Statistical methods

Access to Document

10.1007/s10732-020-09454-w (Licence: CC BY 3.0)

Campelo-Wanner2020_Article_SampleSizeCalculationsForTheEx
Final published version, 14.1 MB (Licence: CC BY 3.0)

Cite this

@article{360504ed19e443648a06c144c76e4f0e,

title = "Sample size calculations for the experimental comparison of multiple algorithms on multiple problem instances",

abstract = "This work presents a statistically principled method for estimating the required number of instances in the experimental comparison of multiple algorithms on a given problem class of interest. This approach generalises earlier results by allowing researchers to design experiments based on the desired best, worst, mean or median-case statistical power to detect differences between algorithms larger than a certain threshold. Holm{\textquoteright}s step-down procedure is used to maintain the overall significance level controlled at desired levels, without resulting in overly conservative experiments. This paper also presents an approach for sampling each algorithm on each instance, based on optimal sample size ratios that minimise the total required number of runs subject to a desired accuracy in the estimation of paired differences. A case study investigating the effect of 21 variants of a custom-tailored Simulated Annealing for a class of scheduling problems is used to illustrate the application of the proposed methods for sample size calculations in the experimental comparison of algorithms.",

keywords = "Experimental comparison of algorithms, Iterative sampling, Multiple hypotheses testing, Sample size estimation, Statistical methods",

author = "Felipe Campelo and Elizabeth Wanner",

note = "This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article{\textquoteright}s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article{\textquoteright}s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Funding: F. Campelo worked under grants from Brazilian agencies FAPEMIG (APQ-01099-16) and CNPq (404988/2016-4). E. F. Wanner has been funded by The Leverhulme Trust through Research Fellowship RF-2018-527/9.",

year = "2020",

month = dec,

day = "1",

doi = "10.1007/s10732-020-09454-w",

language = "English",

volume = "26",

pages = "851--883",

journal = "Journal of Heuristics",

issn = "1381-1231",

publisher = "Springer",

number = "6",

}

TY  - JOUR
T1  - Sample size calculations for the experimental comparison of multiple algorithms on multiple problem instances
AU  - Campelo, Felipe
AU  - Wanner, Elizabeth
N1  - This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Funding: F. Campelo worked under grants from Brazilian agencies FAPEMIG (APQ-01099-16) and CNPq (404988/2016-4). E. F. Wanner has been funded by The Leverhulme Trust through Research Fellowship RF-2018-527/9.
PY  - 2020/12/1
Y1  - 2020/12/1
N2  - This work presents a statistically principled method for estimating the required number of instances in the experimental comparison of multiple algorithms on a given problem class of interest. This approach generalises earlier results by allowing researchers to design experiments based on the desired best, worst, mean or median-case statistical power to detect differences between algorithms larger than a certain threshold. Holm’s step-down procedure is used to maintain the overall significance level controlled at desired levels, without resulting in overly conservative experiments. This paper also presents an approach for sampling each algorithm on each instance, based on optimal sample size ratios that minimise the total required number of runs subject to a desired accuracy in the estimation of paired differences. A case study investigating the effect of 21 variants of a custom-tailored Simulated Annealing for a class of scheduling problems is used to illustrate the application of the proposed methods for sample size calculations in the experimental comparison of algorithms.
AB  - This work presents a statistically principled method for estimating the required number of instances in the experimental comparison of multiple algorithms on a given problem class of interest. This approach generalises earlier results by allowing researchers to design experiments based on the desired best, worst, mean or median-case statistical power to detect differences between algorithms larger than a certain threshold. Holm’s step-down procedure is used to maintain the overall significance level controlled at desired levels, without resulting in overly conservative experiments. This paper also presents an approach for sampling each algorithm on each instance, based on optimal sample size ratios that minimise the total required number of runs subject to a desired accuracy in the estimation of paired differences. A case study investigating the effect of 21 variants of a custom-tailored Simulated Annealing for a class of scheduling problems is used to illustrate the application of the proposed methods for sample size calculations in the experimental comparison of algorithms.
KW  - Experimental comparison of algorithms
KW  - Iterative sampling
KW  - Multiple hypotheses testing
KW  - Sample size estimation
KW  - Statistical methods
UR  - https://link.springer.com/article/10.1007%2Fs10732-020-09454-w
UR  - http://www.scopus.com/inward/record.url?scp=85089031498&partnerID=8YFLogxK
U2  - 10.1007/s10732-020-09454-w
DO  - 10.1007/s10732-020-09454-w
M3  - Article
SN  - 1381-1231
VL  - 26
SP  - 851
EP  - 883
JO  - Journal of Heuristics
JF  - Journal of Heuristics
IS  - 6
ER  - 
