Over the past few years, multi-label classification has been widely explored in the machine learning community, resulting in a number of multi-label classification methods that require benchmarking to determine their strengths and weaknesses. Typically, authors compare the methods on a set of benchmark problems (datasets) with respect to different performance measures, and the results are then discussed for each performance measure separately. To reach a general conclusion that accounts for the contribution of each performance measure, we propose an approach for fusing performance measures based on multi-criteria decision analysis. The approach produces a ranking of the compared methods for each benchmark problem separately. These rankings can then be aggregated to discover sets of correlated evaluation measures, as well as sets of measures that are least correlated. The performance and robustness of the proposed methodology are investigated and illustrated on the results of a comprehensive experimental study involving 12 multi-label classification methods evaluated with 16 performance measures on a set of 11 benchmark problems.
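To make the fusion idea concrete, the sketch below shows one standard multi-criteria decision analysis technique, TOPSIS, applied to a tiny hypothetical decision matrix of methods (rows) scored by performance measures (columns). The specific MCDA method, weights, and data are illustrative assumptions, not the paper's actual procedure or results.

```python
import math

def topsis_rank(matrix, weights, benefit):
    """Score alternatives (rows) over criteria (columns) with TOPSIS.

    matrix  : list of rows, one per classification method; each column
              is a performance measure on one benchmark problem.
    weights : relative importance of each measure (illustrative here).
    benefit : benefit[j] is True if higher is better for measure j
              (e.g. accuracy), False if lower is better (e.g. Hamming loss).
    Returns closeness scores in [0, 1]; higher means a better method.
    """
    n_cols = len(matrix[0])
    # Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # Ideal best/worst per column, respecting each measure's direction.
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = math.sqrt(sum((row[j] - best[j]) ** 2 for j in range(n_cols)))
        d_worst = math.sqrt(sum((row[j] - worst[j]) ** 2 for j in range(n_cols)))
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical example: 3 methods, 2 measures (accuracy up, Hamming loss down).
performance = [[0.80, 0.10],   # method A
               [0.75, 0.08],   # method B
               [0.60, 0.20]]   # method C
scores = topsis_rank(performance, weights=[0.5, 0.5], benefit=[True, False])
```

Running TOPSIS per benchmark problem yields one ranking of the methods per dataset; those per-dataset rankings are what a subsequent aggregation step (e.g. rank correlation across measures) would operate on.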