A while ago, I was doing the pathway analysis of a data set coming from a fat remodeling experiment in mice. This experiment involved a drug candidate known to transform the “regular fat” or white adipose tissue into “brown fat” or brown adipose tissue. The kicker here is that while the white fat just stores energy and the body has to carry it around with all sorts of negative effects, the brown fat actually burns energy and could potentially help the body use up the excess energy from a calorie-rich diet. Imagine what transforming the white fat into brown fat would do for all the people suffering from diabetes and obesity!
But I digress. Let me get back to the point. I did a pathway analysis with the enrichment approach currently used by Ingenuity Pathway Analysis (IPA), DAVID, etc. While looking at these results, I was astonished to see which pathways were found to be significant by the analysis. The top three most significant pathways were Parkinson’s disease, Alzheimer’s disease, and Huntington’s disease, and they all had p-values in the range of 0.000002-0.00003 after the correction for multiple comparisons. The image below shows the top 20 pathways. The ones in red are pathways that have no connection with the phenomenon under study. The ones in green are pathways involved with the given phenomenon, and the ones in white may or may not be involved.
This did not make any sense whatsoever! All three top pathways describe neurological diseases involving the brain. The fourth one is an infectious disease caused by a parasite. However, the data was coming from an experiment involving a drug acting on fat cells. Overall, 6 of the top 8 significant pathways did not have any connection with the fat tissue or fat remodeling, and 6 out of the 10 pathways significant at 1% were false positives. My first reaction was to double check that I was analyzing the correct input file. I was! What the heck was going on here?
If you have analyzed enough data, you have probably encountered situations like this. This is not the case of a failed experiment or a gross error in uploading the data. This is a prime example of a situation in which the results of the analysis are so dramatically impacted by pathway crosstalk that one can start losing faith in this type of analysis. In this particular case, we happened to know what the drug was doing and we only wanted to understand how it happened. However, if we had not known what the phenotype was, we could easily have been led in the wrong direction, thinking that the phenomena we were looking at were related to neurological diseases. Instead of helping us find the needle in the haystack, the analysis would have dumped a second haystack in front of us. Not good!
All statistical approaches currently available for this purpose calculate a p-value that aims to quantify the significance of the involvement of a given pathway in the condition under study. These p-values were previously thought to be independent. In our paper in Genome Research, we show that this is not the case, and that many pathways can considerably affect each other’s p-values through a phenomenon we refer to as crosstalk. Although it is biologically intuitive that various pathways would influence each other, especially if they share any genes, the presence and extent of this phenomenon had not been rigorously studied and, most importantly, there was no technique available to quantify the amount of such crosstalk between different pathways. In this paper, we show that all currently available pathway analysis methods are affected by such phenomena, and we present a method able to correct for such effects.
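To make the idea concrete, here is a minimal Python sketch of a classical over-representation test (a one-sided hypergeometric test) run on made-up data. Everything in it is an assumption chosen for illustration: the gene names, the set sizes, the two toy pathways, and the `enrichment_p` helper. It is not the code of any of the tools mentioned above, and it is not the method from our paper. It only shows how two pathways that share genes can both come out as highly significant from the very same differentially expressed genes.

```python
from math import comb

def enrichment_p(universe, de_genes, pathway):
    """One-sided hypergeometric p-value, P(X >= observed overlap).

    universe : set of all measured genes
    de_genes : set of differentially expressed (DE) genes
    pathway  : set of genes annotated to the pathway
    """
    n_universe = len(universe)
    n_de = len(de_genes & universe)
    n_pathway = len(pathway & universe)
    n_hits = len(de_genes & pathway & universe)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_de - k)
        for k in range(n_hits, min(n_de, n_pathway) + 1)
    )
    return tail / total

# Hypothetical toy data: 1000 measured genes named g0..g999.
genes = [f"g{i}" for i in range(1000)]
universe = set(genes)

shared = set(genes[0:30])        # genes annotated to BOTH pathways
fat_only = set(genes[30:50])     # genes found only in the toy fat pathway
neuro_only = set(genes[50:70])   # genes found only in the toy neuro pathway

fat_pathway = shared | fat_only
neuro_pathway = shared | neuro_only

# DE list: 25 shared genes, 5 fat-only genes, 20 genes outside both pathways.
de_genes = set(genes[0:25]) | set(genes[30:35]) | set(genes[100:120])

p_fat = enrichment_p(universe, de_genes, fat_pathway)
p_neuro = enrichment_p(universe, de_genes, neuro_pathway)

print(f"toy fat pathway   p = {p_fat:.3g}")
print(f"toy neuro pathway p = {p_neuro:.3g}")
```

In this toy setup, none of the genes specific to the neuro pathway are differentially expressed, yet the pathway still gets a very small p-value because of the genes it shares with the fat pathway. The two p-values are clearly not independent of each other.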
As shown above, in some cases pathways with significant p-values are not biologically meaningful. Furthermore, some biologically meaningful pathways with non-significant p-values become statistically significant when the crosstalk effects of other pathways are removed. The approach we developed is able to calculate the enrichment significance of a pathway after the crosstalk effects of other pathways are removed. We also showed that this type of analysis can identify novel sub-modules that have a biological function independent of that of the pathways they are currently included in. Here are the top pathways in the same experiment, after the crosstalk effects have been identified and removed:
Essentially, this approach was able to eliminate most false positives, as well as correctly identify as significant pathways that had been biologically proven to be involved in the given condition, yet had not been found to be significant by the classical analysis. The approach also found several independent functional modules, including a mitochondrial activity module active in different stages of fat remodeling. And in case you wonder, this is exactly why those neurological disease pathways showed up as significant: they contain a lot of mitochondrial genes.
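The effect of such a shared module can be illustrated with a rough Python sketch. This is not the method from our paper or patent, which has to identify the crosstalk from the data; here the shared “mitochondrial” module is simply assumed to be known in advance, and the gene names, set sizes, and helper functions are all hypothetical. The sketch tests the module on its own and then re-tests each toy pathway using only the genes it does not share with the module.

```python
from math import comb

def enrichment_p(universe, de_genes, gene_set):
    """One-sided hypergeometric p-value, P(X >= observed overlap)."""
    n_universe = len(universe)
    n_de = len(de_genes & universe)
    n_set = len(gene_set & universe)
    n_hits = len(de_genes & gene_set & universe)
    total = comb(n_universe, n_de)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_de - k)
        for k in range(n_hits, min(n_de, n_set) + 1)
    )
    return tail / total

def without_module(pathway, module):
    """Return the pathway genes that are NOT part of the shared module."""
    return pathway - module

# Hypothetical toy data: 1000 measured genes named g0..g999.
genes = [f"g{i}" for i in range(1000)]
universe = set(genes)

mito_module = set(genes[0:30])                   # assumed-known shared module
fat_pathway = mito_module | set(genes[30:50])    # module + 20 own genes
neuro_pathway = mito_module | set(genes[50:70])  # module + 20 own genes

# DE list: 25 module genes, 5 fat-specific genes, 20 unrelated genes.
de_genes = set(genes[0:25]) | set(genes[30:35]) | set(genes[100:120])

# Classical analysis: both pathways look significant.
p_neuro_raw = enrichment_p(universe, de_genes, neuro_pathway)
p_fat_raw = enrichment_p(universe, de_genes, fat_pathway)

# Treat the module as its own gene set, and re-test what is left of each pathway.
p_module = enrichment_p(universe, de_genes, mito_module)
p_neuro_own = enrichment_p(universe, de_genes, without_module(neuro_pathway, mito_module))
p_fat_own = enrichment_p(universe, de_genes, without_module(fat_pathway, mito_module))

for name, p in [
    ("neuro pathway (raw)", p_neuro_raw),
    ("fat pathway (raw)", p_fat_raw),
    ("mito module", p_module),
    ("neuro pathway minus module", p_neuro_own),
    ("fat pathway minus module", p_fat_own),
]:
    print(f"{name:28s} p = {p:.3g}")
```

With these made-up numbers, the toy neuro pathway loses all of its significance once the module genes are set aside, the toy fat pathway keeps some signal from its own genes, and the module itself stands out as a separate significant unit. Simply subtracting a known overlap is far cruder than what the real approach does, but it captures the intuition.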
We recently heard that our patent application to protect the methods we developed for detecting and eliminating pathway crosstalk was successful and we were granted a patent (US Patent 10,248,757). Once again, Advaita’s scientists are pushing the envelope by developing cutting-edge analysis methods for the most important problems in this area of research.
As a disclaimer, this approach is not yet implemented in our flagship pathway analysis platform, iPathwayGuide. However, unlike Ingenuity’s IPA pathway analysis tool and free tools such as DAVID, WebGestalt, etc., iPathwayGuide uses an analysis approach that goes well beyond simple enrichment analysis and is therefore less prone to producing false positives due to crosstalk.