Genome Biology recently published a peer-reviewed paper containing a comprehensive benchmark and comparison of 13 widely used pathway analysis methods across more than 1,000 analyses. We highlight the most important findings below.
Many high-throughput experiments compare two phenotypes, such as disease vs. healthy, with the goal of understanding the underlying biological phenomena characterizing the given phenotype. Because of the importance of this type of analysis, more than 70 pathway analysis methods have been proposed so far. These fall into two main categories: non-topology-based (non-TB) and topology-based (TB). Non-topology-based methods consider the pathways as simple sets of genes. Enrichment analysis is a very popular approach, used by DAVID and Ingenuity Pathway Analysis (IPA), among others. Functional class scoring is an alternative non-topology-based approach, used by Gene Set Enrichment Analysis (GSEA). In contrast, topology-based methods take into consideration the position of the genes, as well as the signals that various genes exchange during the normal functioning of the organism. Intuitively, one would expect topology-based methods to be more powerful and more accurate, since they use much more information than non-topology-based methods. However, is this actually true?
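To make the "simple sets of genes" view concrete: enrichment (over-representation) analysis is commonly implemented as a one-sided Fisher's exact test, which is equivalent to an upper-tail hypergeometric probability. The following is a minimal sketch under that assumption, not the implementation used by any particular tool. The function name, its parameters, and the numbers in the usage example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_pathway, n_de, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes measured in the experiment
    n_pathway:  number of those genes that belong to the pathway
    n_de:       number of differentially expressed (DE) genes
    n_overlap:  number of DE genes that fall on the pathway

    Note that only counts are used: the position of a gene on the
    pathway and its interactions with other genes play no role.
    """
    max_overlap = min(n_pathway, n_de)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers, for illustration only.
p = enrichment_p_value(n_universe=20000, n_pathway=100, n_de=500, n_overlap=10)
print(f"enrichment p-value: {p:.3g}")
```

Because the calculation sees only four counts, two pathways with the same counts receive the same p-value regardless of where the differentially expressed genes sit on each pathway.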
In a different posting, I discussed some of the difficulties related to the assessment of pathway analysis methods. However, that was only a theoretical discussion of how one could compare and assess various methods. Until now, nobody has provided a comprehensive, large-scale assessment of such methods. Anecdotal evidence, i.e., results obtained on a couple of data sets, has been shown here and in the literature. However, good science cannot be based on results obtained on a couple of data sets. At the end of the day, every single paper presenting a new method will include one or two data sets on which the proposed method is shown to be better than others. So, in spite of the difficulty, we embarked on the uncertain journey of benchmarking and comparing the main pathway analysis methods currently available.
In a peer-reviewed paper recently published in Genome Biology, we report the results of comparing the actual performance of 13 widely used pathway analysis methods in over 1,085 analyses. These comparisons were performed using 2,601 samples from 75 human disease data sets and 121 samples from 11 knockout mouse data sets. In this assessment, we used both the target pathway method and the knockout approach (both described here). To choose the best pathway analysis method, one should consider the following four crucial factors, in order of importance: (i) the number of biased pathways under the null; (ii) the ranking of the target pathways; (iii) AUC, accuracy, sensitivity, and specificity; and finally (iv) the p-values of the target pathways. A full discussion of these four factors, and of how each method performed according to each of them, can be found in the Genome Biology paper. Here, we summarize only the main results.
If you are interested in the bottom line, the findings can be summarized as follows:
- The results show that no method is perfect (no surprise here). No method was able to identify the correct pathway 100% of the time.
- Topology methods do perform better than non-topology methods. No matter how intuitive this was, it had to be proven. Now you have it, demonstrated on over 1,000 analyses.
- If you want to look at a single figure from the paper, look at the one below. The y-axis shows the area under the curve (AUC), which captures both the sensitivity and the specificity with which each method was able to identify the pathways known to contain the cause of the given phenotype. The box plots represent the results obtained over all data sets.
The accuracy, as measured by the area under the curve (AUC) of the receiver operating characteristic (ROC), of several common pathway analysis methods. The accuracy is shown as a box plot, with the horizontal line representing the median AUC over all data sets. Image modified from Genome Biology 20, no. 1 (2019): 1-15.
- The approach that provided the best AUC across our 11 knockout (KO) data sets was impact analysis, the approach also used in iPathwayGuide [73, 78].
- The cautionary word from this paper: “The results show that Fisher’s exact test (FE) is not very suitable in the context of pathway analysis. Figure 6 shows that FE test performs the worst among the 13 compared pathway analysis methods: 137 out of 150 pathways are biased toward 0, that being very likely to often produce false positives. This should be a strong cautionary note to the users of other platforms using this test, such as Ingenuity Pathway Analysis or DAVID. One of the main reasons for the poor performance of the FE test is that it assumes that the genes are independent, while the genes on any pathway influence each other as described by the pathway. Another reason is that the FE test ignores the roles of genes situated in key positions (e.g., a single entry point in a pathway), as well as the number, direction, and type of various signals through which genes on the pathway interact with each other.”
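For readers less familiar with the AUC reported in the findings above, here is a minimal sketch of how an AUC can be computed for a single data set. It uses the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen true-positive pathway scores higher than a randomly chosen true-negative one, with ties counted as half. The function name, the choice of score (for example, 1 minus the pathway p-value), and the toy numbers are assumptions for illustration; the exact procedure used in the paper is described there.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: one score per pathway; higher means "more significant"
            (for example, 1 - p-value; this choice is an assumption)
    labels: truthy for pathways considered true positives,
            falsy for true negatives
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example with hypothetical scores for four pathways.
toy_scores = [0.9, 0.1, 0.8, 0.3]
toy_labels = [1, 0, 0, 1]
print(auc_from_scores(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of true-positive from true-negative pathways, which is why higher boxes in the figure indicate better methods.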