This paper presents a statistically based methodology for comparing evolutionary algorithms under multiple merit criteria. The analysis of each criterion is based on the progressive construction of a ranking of the algorithms under analysis, with the determination of significance levels for each ranking step. The multicriteria analysis aggregates the rankings obtained for the different criteria via a non-dominance analysis, which indicates the algorithms that constitute the efficient set. To avoid correlation effects, a principal component analysis pre-processing step is performed. Bootstrapping techniques allow the evaluation of merit criteria data with arbitrary probability distribution functions. The algorithm ranking in each criterion is built progressively, using either ANOVA or first-order stochastic dominance. The resulting ranking is checked using a permutation test that detects possible inconsistencies in the ranking; when an inconsistency is found, more algorithm runs are executed to refine the ranking confidence. As a by-product, the permutation test also delivers p-values for the ordering between each pair of algorithms with adjacent rank positions. The proposed method has been compared with other methodologies using reference probability distribution functions (PDFs). The proposed methodology always reached the correct ranking with fewer samples and, in the case of non-Gaussian PDFs, it worked well, while the other methods were unable even to detect some PDF differences. The application of the proposed method is illustrated on benchmark problems.
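To make two of the steps above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that smaller criterion values are better, that the permutation test uses the difference of sample means as its statistic, and that the non-dominance analysis operates directly on per-criterion rank positions (1 = best). The function names `permutation_p_value` and `efficient_set` are hypothetical, and the PCA and bootstrapping steps are omitted.

```python
import random
from statistics import mean


def permutation_p_value(better, worse, n_perm=2000, seed=0):
    """One-sided permutation p-value for two algorithms with adjacent ranks.

    `better` and `worse` hold criterion samples (smaller is better, by
    assumption). The null hypothesis is that both samples come from the
    same distribution; a small p-value supports the claimed ordering.
    """
    rng = random.Random(seed)
    observed = mean(worse) - mean(better)
    pooled = list(better) + list(worse)
    n = len(better)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled runs
        if mean(pooled[n:]) - mean(pooled[:n]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


def efficient_set(ranks):
    """Return the names of the non-dominated algorithms.

    `ranks` maps an algorithm name to a tuple of rank positions,
    one entry per merit criterion (1 = best).
    """
    def dominates(u, v):
        # u dominates v: no worse in every criterion, better in at least one.
        return (all(x <= y for x, y in zip(u, v))
                and any(x < y for x, y in zip(u, v)))

    return sorted(
        name for name, r in ranks.items()
        if not any(dominates(other, r)
                   for other_name, other in ranks.items()
                   if other_name != name)
    )
```

In this sketch, a large p-value for an adjacent pair would be the signal to collect more runs of both algorithms before accepting the ordering, in the spirit of the refinement loop described above.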