In a previous post, I wrote about the importance of storytelling in science. This is the other side of the story, the darker side of storytelling, where you decide the story in advance rather than letting your data guide it.

A few years ago, I travelled to DC for an NIH study section. As sometimes happens, I found myself with a free evening on the day I flew in, and I decided to go see a play. Since I only had one evening, I didn’t have the luxury of choosing. I just bought a ticket to whatever was on that night, which was a completely unknown play by a completely unknown (to me) playwright. It turned out to be one of the most interesting theatrical experiences of my life.

The play was a murder mystery in three acts. The first act introduces the characters, and one of them gets killed. By the end of the second act, the detective work is in full swing and it turns out that all the main characters had some reason to kill the victim. Nothing unusual so far. But then, in the break between the second and third acts, somebody acting as the host polls the audience, asking who we think did it. And then, to my astonishment, the third act unfolds and shows us that the killer is exactly the person the audience voted most likely to have done it.

[Image: a broken mirror]

It turned out that the playwright had written several versions of the third act, one for each possible murderer, and the cast only needed to know which suspect the audience had chosen in order to play the matching version of the final act.

It was a fascinating and fun experience as a play-goer. Yet, whatever you do, make sure you do not end up doing this in your research. It is easy to end up in just this situation if you are not careful. Some bioinformatics platforms report tens or hundreds of “significant” findings, for instance biological processes. In those cases, the user has to scan through endless tables of biological processes and ends up focusing on the few that they think make sense. This is not good science.

If the bioinformatics analysis is done right, it, not you, should identify the biological processes that are truly related to the underlying biology. If you end up selecting the ones that “make sense” out of a longer list, you are not discovering new phenomena; you are just confirming what you already know, or want to hear.

Don’t let your intuition or your preconceived notions drive the results you choose. Don’t look into the mirror to decide what results are interesting. Let the data guide you, and use algorithms that take you straight to the culprit.


Want to learn more about how Advaita software can help you tell the story that’s in your data? Get in touch.