Product Support Topics

Click on a Category to see the associated topics.
- iPathwayGuide Tutorials
- iPathwayGuide Release Notes
- iPathwayGuide Science
- iPathwayGuide Webinars
- iPathwayGuide Videos
- iPathwayGuide FAQs
1. Prepare a list of genes in a single column with the header “Symbol”. Please use HGNC/MGI symbols only.
2. You can use the in-built reference background gene set or a custom background gene set for enrichment analysis.
3. If you opt for enrichment with a custom background gene set, prepare a list of background genes in a single column with the header “Symbol”. Please use HGNC/MGI symbols only.
4. In the user dashboard, click on “Analyze a new experiment”.
5. When the intake form loads, select “Only my genes of interest/DE genes” under the “What data would you like to analyze” drop-down menu.
6. Select the appropriate organism and Data type.
7. Upload the input file by clicking on “Choose file” and selecting the file prepared in step 1.
8. The uploaded data is displayed in the file content preview.
9. Under the “Gene symbol” column, select the column of gene symbols with the header “Symbol”.
10. Click on the “select the reference type” drop-down menu.
11. If you want to use an in-built reference gene set, follow step 12. If you want to use a custom background, follow steps 13 to 16.
12. To use the in-built reference gene set, click on “all genes/proteins in the transcriptome/proteome” under the “select the reference type” drop-down menu and skip steps 13 to 16.
13. To use a custom background, click on “a custom set of genes” under the “select the reference type” drop-down menu.
14. Click on “Choose file” under “Upload reference file (Mandatory)” to upload the custom background gene set. Upload the file prepared in step 3.
15. The uploaded data is displayed in the file content preview.
16. Under the “Gene symbol” column, select the column of genes with the header “Symbol”.
17. Click on upload and proceed with the analysis.
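The gene list and the optional background list described in the steps above are both single-column files with the header “Symbol”. A minimal sketch of producing such a file in Python follows. The function name, the example symbols, and the choice to drop blanks and duplicates are illustrative assumptions, not requirements stated by iPathwayGuide.

```python
import csv
import io

def write_symbol_column(symbols, handle):
    """Write gene symbols as a single column under the header 'Symbol'."""
    seen = set()
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["Symbol"])
    for symbol in symbols:
        cleaned = symbol.strip()
        # Assumption: blanks and repeated symbols are dropped so each gene appears once.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            writer.writerow([cleaned])

# Hypothetical genes of interest, including a duplicate and stray whitespace.
genes_of_interest = ["TP53", "BRCA1", " EGFR ", "TP53", ""]
buffer = io.StringIO()
write_symbol_column(genes_of_interest, buffer)
gene_file_text = buffer.getvalue()
print(gene_file_text)
```

To write to disk instead of a string buffer, pass a file opened with `open(path, "w", newline="")`. The same function can be reused for the custom background list.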
The “The report is deprecated” message is there to inform you that the report is not based on the latest knowledge available. If you choose to update the report, the analysis is done again, from scratch, with the latest version of all annotations. If you do that, your new, re-analyzed results will reflect the most up-to-date knowledge. We recommend that you update your analysis results for all the data sets that you care about or work on. If you choose not to update, you can have access to the deprecated report by clicking on the little “i” symbol (for information) on the very right of the report. That will open a window showing you all previous versions of your analysis results. You can access any of them by clicking on the date.
You can find a more detailed explanation with screen captures in our FAQ: https://advaitabio.com/product-support/ under “How do I find a deprecated analysis?”
Of course, you are allowed to publish any and all figures from our software. That’s the whole point. The software is meant to create figures and give you results that can be directly published so you can save time and be more productive.
We would kindly ask that you either maintain the “(c) Advaita Corporation 2022” in the figure, or mention in the caption something like “Figure obtained with iPathwayGuide (AdvaitaBio)”. Just mentioning the software in the Methods would not be sufficient since it would not be clear which figures were produced with it.
I would like to take this opportunity to remind you of a couple of things:
- we are here to help if you need any help with the Methods section or even if you have any questions from the reviewers about anything to do with our analysis.
- many figures can be customized in terms of content, order, etc. Some figures can be customized for colors, as well. Please let us know if you want us to show you how to do this.
- once the paper is published, if you let us know, we may be able to highlight your research in a short piece that we can send to our mailing list. That will greatly increase the visibility of your research.
Regarding the figures, you need to download them in .SVG format from iPathwayGuide. Any downloadable figure in our software can be downloaded as SVG. The SVG is a scalable vector format that can scale to any size and offers perfect resolution. Since we are talking about images, please keep in mind that you can customize most figures. In some, you can even select the colors. In many, you can select which genes to include and in which order. Please contact us if you have any questions about these capabilities or if you want to have a short meeting to show you how to do any such customization.
The basics are always the same regardless of the assay. The first step is to map the results of your assay onto genes. Assays can be very different, but in most cases you can map the results onto genes. For instance, for a methylation assay, you could associate each differentially methylated region with the gene immediately downstream of it. There will be some regions that won’t be mapped, but most areas that have an effect on transcriptional control are expected to be mapped correctly. Then you calculate an effect size. That can be a log fold change of differential expression, differential protein abundance, methylation, etc. It is often possible to calculate a p-value as well. Once you have genes, effect sizes, and p-values, you can upload them to iPathwayGuide. You should specify the type of assay the data came from so you can later do a meta-analysis integrating multiple types of data, if those additional data become available.
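As an illustration of the mapping step for a methylation assay, here is a minimal sketch that assigns each region to the first gene that starts at or after the end of the region. It assumes a single chromosome and forward-strand coordinates only; strand handling, the function name, and all identifiers and coordinates are hypothetical simplifications.

```python
from bisect import bisect_left

def map_regions_to_downstream_genes(regions, genes):
    """Assign each region to the first gene starting at or after the region's end.

    regions: list of (region_id, start, end) on one chromosome.
    genes:   list of (symbol, gene_start) on the same chromosome (forward strand assumed).
    Returns a dict region_id -> symbol; regions with no downstream gene are left unmapped.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    starts = [g[1] for g in ordered]
    mapping = {}
    for region_id, _start, end in regions:
        idx = bisect_left(starts, end)  # index of first gene start >= region end
        if idx < len(ordered):
            mapping[region_id] = ordered[idx][0]
    return mapping

# Hypothetical genes and differentially methylated regions.
genes = [("GENE_A", 1000), ("GENE_B", 5000), ("GENE_C", 9000)]
regions = [("dmr1", 200, 400), ("dmr2", 4200, 4800), ("dmr3", 9500, 9700)]
region_to_gene = map_regions_to_downstream_genes(regions, genes)
print(region_to_gene)
```

In this toy input the last region lies beyond every gene, so it stays unmapped, which mirrors the remark that some regions will not be mapped.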
iPathwayGuide supports directly human, mouse and rat. For other organisms, you need to first map your genes to orthologs. Many of our users have done this very successfully.
Here is an example in which iPathwayGuide was used to make important discoveries involving mackerel and sardines:
When you complete and publish your research, please share it with us. We will be happy to make your research more visible by sharing it with our users.
The answer here is very similar to the one above. When you change the organism, the annotations will be different. This would be a totally different experiment basically. This is an additional reason for which this has to be a different analysis.
When you change the p-value threshold, the number of differentially expressed genes will change. That, in turn, will change every single result in every single type of analysis performed: GO terms, pathways, upstream regulators, drugs, mechanisms, etc. The results will be entirely different from the results before. Hence, this will be a new analysis. A good thing to do is to analyze your data with a couple of different thresholds and then do a meta-analysis of the results. This way, you will be able to tell whether your conclusions change depending on the threshold used. Obviously, you would place more trust in findings that remain the same regardless of the threshold used.
On the Pathways tab, on the right-hand side, click on “pathway details”, then “Gene table”, then the “+” sign. When you click on that, you will see a table with the genes on the pathway. There you can sort, search for specific genes, etc. You can also download the entire table with the usual download icon.
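The effect of the threshold on the set of differentially expressed genes can be sketched in a few lines. This is only an illustration of the idea with made-up values; it is not how iPathwayGuide performs its meta-analysis, and the gene names, thresholds, and helper function are assumptions.

```python
def de_genes(results, p_threshold, min_abs_logfc=0.0):
    """Return the set of genes passing the p-value and |log fold change| cut-offs."""
    return {
        gene
        for gene, (logfc, pval) in results.items()
        if pval <= p_threshold and abs(logfc) >= min_abs_logfc
    }

# Hypothetical results: gene -> (log fold change, p-value).
results = {
    "GENE_A": (2.1, 0.0004),
    "GENE_B": (-1.3, 0.008),
    "GENE_C": (0.9, 0.03),
    "GENE_D": (0.2, 0.40),
}
thresholds = [0.001, 0.01, 0.05]
de_by_threshold = {t: de_genes(results, t) for t in thresholds}

# Genes selected at every threshold are the ones least sensitive to the choice.
stable = set.intersection(*de_by_threshold.values())
print(de_by_threshold, stable)
```

Each threshold yields a different input gene set, which is why each one counts as a separate analysis. The intersection shows which selections do not depend on the cut-off.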
On your dashboard, you can click on the little shopping cart and then click on “purchase with balance.” The report will become available right away.
When one shares a report with somebody else, the recipient gets a link and they will be able to access the report and do everything that the owner can do. The owner has the opportunity to decide whether the recipient is allowed to share the report further. Once the recipient clicks on the link they will be sent to the application to accept the share; they will then go into their account and click the accept share button. Once the link has been used it cannot be reused. So if a recipient of a share forwards the email to somebody else, the second person will not be able to see the report. This is by design so data cannot be shared without the permission of the owner. Also, if one clicks on the same link more than once, the second time, the link will not work.
On January 31, 2020, Advaita released a major update to its platform, with two brand new capabilities and three major improvements, now available in iPathwayGuide. As with all iPathwayGuide releases, many of the improvements are available in older analyses. Other features, including the two new capabilities, are only available in analyses generated after the update was released. To access these new capabilities for older analyses, please update the analysis using the button found in report information, “Re-run as new analysis.”
NEW CAPABILITIES: Available in newly created or updated analyses
New Module: Upstream Chemicals, Drugs, and Toxicants
Predicted Upstream Chemical Analysis allows you to predict chemicals, drugs, and toxicants that might be present (overly abundant) or absent (insufficient) in your experiment. This analysis compares chemical-to-gene regulatory interactions with patterns of downstream gene expression to find chemicals with large numbers of consistent downstream DE genes. To support this analysis, the Advaita knowledgebase was expanded (to v1910) to include 170,997 chemicals and 774,553 chemical-to-gene regulatory interactions. This new analysis capability is available from the Printable Report, Meta-Analysis, Network Analysis, and the navigation bar, where it can be found under the heading Upstream Regulators, along with Upstream Genes and miRNAs.
New Visualization: Dendrograms
iPathwayGuide now offers dendrograms, as an additional way to visualize relationships across results. The dendrogram visualization groups together significant results (annotations) that have DE genes in common. This visualization is available for all GO terms, pathways, upstream regulatory genes, and chemicals. It is also available in Step 1 of the updated Network Analysis module, where it can be used to select the set of genes of interest based on annotations that have DE genes in common.
MAJOR IMPROVEMENTS: Available in existing analyses
New Analysis Workflow: Select genes of interest in Network Analysis
The Network Analysis page has been redesigned to allow users to select genes of interest from the results of all analysis modules (biological processes, pathways, diseases, miRNAs, upstream regulators and more). Other improvements to the Network Analysis module include:
- the ability to add Chemicals to networks to visualize interactions with downstream genes
- the ability to save & reload networks
- the ability to export network images as .png and .svg
- and two new network layouts: Gatekeepers and Regulators
- new onboarding tutorial and videos
Note: these capabilities are available in analyses generated or updated after October 2018.
New Analysis Workflow: The Intake Page now displays file parsing warnings. This allows you to preview which lines from your input file were ignored, and for what reason.
New Feature: Results from any module may be searched by associated genes, e.g. find significant pathways containing BRCA1
On June 21, 2019, Advaita released a major update to its platform, with improvements to iPathwayGuide, iVariantGuide, and iBioGuide.
IMPROVEMENTS
The Advaita Knowledgebase was updated to version 1906 and now includes:
- 3 organisms: homo sapiens, mus musculus, rattus norvegicus
- 216,544 Genes
- 2,307 Diseases
- 45,049 GO terms
- 5,694 Drugs
- 985 Pathways
- 5,396 miRNAs
- 92,745 Proteins
- 29,761,157 References
- 3,042,479 Interactions
- 477,289 Experiments
For a complete list of databases and versions, please see report information within each application.
NEW FEATURES
- iBioGuide: Users can now log in to iBioGuide using their Advaita credentials, the same account they use for iPathwayGuide and iVariantGuide.
- iPathwayGuide: Experiment and references for Network Analysis are now shown in a paged view, improving load time.
BUG FIXES
- iPathwayGuide: Meta-Analysis icon was updated to stay consistent with menu selections.
On March 16, 2019, Advaita released a major update to its platform, with several major improvements to iPathwayGuide. As with all major releases, new features are available in all analyses generated after the update was released. To see new analysis modules with older analyses, please update the analysis using the button found in report information, “Re-run as new analysis.”
NEW FEATURES
– NEW MODULE: Predicted Upstream Regulator Analysis allows you to find genes that have regulatory interactions consistent with expression patterns in DE genes. Several parts of the application were updated to include the new analysis module, including: Printable Report, Meta-Analysis, API, toolbar, and more.
MAJOR IMPROVEMENTS
– DE down-regulated genes are now shown as blue in the volcano plot, corresponding with the coloring used in gene bar plots, on pathways, and in networks.
– Genes data export now includes an option to export all genes as a TSV file, which is convenient for uploading to iPathwayGuide as a new analysis.
– Pathway data export now includes p-values for pORA and pAcc in addition to the combined p-value, pComb.
– Updated text of the printer-friendly report that is auto-generated with every analysis.
– Improvements to registration flow and application selection page
– Updated layout for the table of annotation sources inside the report info section. The new layout accommodates sources for Network Analysis and Predicted Upstream Regulator Analysis.
BUG FIXES
– Registration page layout is fixed for the newest version of Chrome browsers
– Tooltips are now more responsive and more visible in iPathwayGuide
– Fixed a bug in network analysis causing long GO terms to spill out of their boxes
– Fixed a bug affecting the navigation bar display on smaller screens
Here is a step-by-step guide to find your old analysis:
1. Click on the little “i” icon to the right of your file. That “i” stands for information:
2. In the pop-up window, select the previous version you wish to open and click on the date. That will open that particular set of analysis results:
iPathwayGuide’s powerful meta-analysis tool allows you to compare and contrast up to 5 differential experiments at the same time. With meta-analysis, you can rapidly identify several characteristics of your phenotype comparisons and drill down to pinpoint plausible biomarkers and signatures. Watch the video below to learn the nuts and bolts of iPathwayGuide’s meta-analysis. Be sure to watch some of our webinars on the topic as well.
Paper
# of Citations
Identifying significantly impacted pathways: a comprehensive review and assessment.
Genome Biology, 20:203, October, 2019
Ontological analysis of gene expression data: current tools, limitations, and open problems.
Bioinformatics 21 (18), 3587-3595
A systems biology approach for pathway level analysis.
Genome Research, 2007, Vol. 17 (10), pages 1537-1545.
A novel signaling pathway impact analysis (SPIA).
Bioinformatics (2009), Vol. 25 (1), pages 75-82.
Global functional profiling of gene expression.
Genomics 81 (2), 98-104
Reliability and reproducibility issues in DNA microarray measurements.
TRENDS in Genetics 22 (2), 101-109
Data analysis tools for DNA microarrays.
(Book) CRC Press
Profiling gene expression using onto-express.
Genomics 79 (2), 266-270
Use and misuse of the gene ontology annotations.
Nature Reviews Genetics 9 (7), 509-515
Onto-tools, the toolkit of the modern biologist: onto-express, onto-compare, onto-design and onto-translate.
Nucleic acids research 31 (13), 3775-3781
Onto-Tools: New Additions and Improvements in 2006.
Nucleic Acids Research, Vol. 35, pages W206-W211, July 2007.
Statistics and data analysis for microarrays using R and bioconductor.
(Book) CRC Press
Analysis and correction of crosstalk effects in pathway analysis.
Genome Research, 2013, Vol. 23 (9).
A system biology approach for the steady-state analysis of gene signaling networks.
In Proceedings of the Congress on pattern recognition 12th Iberoamerican conference on Progress in pattern recognition, image analysis and applications (CIARP’07).
Most existing pathway analysis methods focus on either the number of differentially expressed genes observed in a given pathway (enrichment analysis methods), or on the correlation between the pathway genes and the class of the samples (functional class scoring methods). Both approaches treat pathways as simple sets of genes, disregarding the complex gene interactions that these pathways are built to describe.
More recently, biological annotations have started to include descriptions of gene interactions in the form of gene signaling networks, such as KEGG (Ogata et al., 1999), BioCarta (www.biocarta.com) and Reactome (Joshi-Tope et al., 2005). This richer type of annotation has opened the possibility of an automatic analysis aimed at identifying the gene signaling networks that are relevant in a given condition, and perhaps even the specific signals or signal perturbations involved. Treating pathways as simple sets of genes, however, is not well suited for a systems biology approach that aims to account for system-level dependencies and interactions, as well as identify perturbations and modifications at the pathway or organism level (Stelling, 2004).
Advaita’s products are based on the Impact Analysis method, which leverages information about the type, function, position and interactions of the genes in a given pathway. Impact Analysis combines the evidence obtained from the classical enrichment analysis with a novel type of evidence, which measures the actual perturbation on a given pathway under a given condition. We illustrate the capabilities of the novel method on four real datasets. The results obtained on these data show that Impact Analysis has better specificity and more sensitivity than several widely used pathway analysis methods.
On January 12, 2018, Advaita released a major update to its platform, with improvements to iPathwayGuide, iVariantGuide, and iBioGuide.
IMPROVEMENTS
The Advaita Knowledgebase was updated to version 1711 and now includes:
- 3 organisms: homo sapiens, mus musculus, rattus norvegicus
- 213,390 Genes
- 1,933 Diseases
- 44,976 GO terms
- 4,791 Drugs
- 955 Pathways
- 5,710 miRNAs
- 3,161,730 References
- For a complete list of databases and versions, please see report information within each application.
NEW FEATURES
- iVariantGuide: API Client now accepts multi-sample analyses
- Improvements to account registration page to ensure proper organization affiliation.
BUG FIXES
- iPathwayGuide: Improvements to parsing of CuffDiff-formatted files to maintain association of phenotype labels. Parsing of fold changes and p-values remains unchanged.
3/13/2017
With Advaita’s latest update to its applications and knowledge base, Advaita updated its API for iVariantGuide and iPathwayGuide.
An API (Application Programming Interface) is a set of routines, protocols, and tools for building software applications. An API specifies how software components should interact. Advaita’s API is designed.
On February 27, 2017, Advaita released a major update to its platform. These are the release notes.
IMPROVEMENTS TO: iPathwayGuide, iVariantGuide, iBioGuide, and the Advaita Knowledge Base
- Changes to AWS services in preparation for HIPAA compliance
- Updated knowledge base to version Advaita KB v1702, which includes the following data sources and versions:
Database | Version | iPG Annotations | iVG Annotations |
---|---|---|---|
KEGG | Release 81.0+/01-20, Jan 17 | Pathways, Diseases, Drugs | Pathways |
Gene Ontology | 2016-Sep26 | GO Terms | GO Terms |
Targetscan | Targetscan v7.1 | miRNA Target Genes | miRNA Target Genes |
MIRBASE | MIRBASE v21,06/14 | miRNA Sequences | |
dbSNP (incl 1k genomes) | Build 149 | Minor Allele freq. | |
RefSeq | Release 71 July 2016 | Impacted Transcripts | |
ClinVar | Dec 1, 2016 | Clinical Significance | |
SNPEff | v4.1L | Predicted Impact | |
IMPROVEMENTS TO iPATHWAYGUIDE
- NEW FEATURE! Onboarding carousel with top user benefits
- NEW FEATURE! API (Premium feature)
- Bug fix: genes selected in Genes Table on Pathways page are now highlighted on pathway map
IMPROVEMENTS TO iVARIANTGUIDE
- Improved error messaging for sample upload & report creation
- NEW FEATURE! Versioning: each report now shows which version of the Advaita Knowledgebase was used to annotate the sample. Outdated reports may be updated when viewing Report Info: either on the Reports page or from within the report itself. As is true for other Advaita applications, only the report owner may update it.
IMPROVEMENTS TO iBIOGUIDE
- Updated to use AKB v1702
The following components were added or addressed in this release.
- Extensive databases updates including:
- KEGG pathways, drugs, and diseases
- NCBI genes
- TargetScan miRNAs
- Gene Ontologies
- PubMed references
- New EdgeR import format support
- Improvements to several of the exported images
- Improvements to meta analysis to preserve order of comparisons
- Several bug fixes
- Changes to support additional AWS features
- Enhancements to security
The following components were added:
- Support for Sciex SWATH 2.0 Proteomics Expression data
- “Trash” bin on user dashboard
- Pathway and ontology images are now locked for scrolling. They can be unlocked in the on-screen menu.
- In the pathway images, individual genes can now be selected if you hover over the node in the image.
- Coherent cascades now have arrow heads so you can see directionality of the cascade.
- The gene table in the pathway detail page is more refined. Easier to filter.
- Bar chart of DE genes for pathways is now presented below the pathway image.
- Meta-analysis has a new view called “Rank layout” that lets you see how genes, GO terms, pathways, etc., rank compared to each other (accessible from the lower right corner of the Venn).
- Several other improvements on the back-end and a few bug fixes.
Knowledge Base Updates:
- Genes – 195,222 (increase of 23,106)
- Pathways – 871 (increase of 12)
- micro RNAs – 8,837 (increase of 5,268)
- GO Terms – 39,907 (increase of 1,880)
- Drugs – 4,389 (increase of 229)
- Diseases – 1,411 (increase of 12)
- SNPs – 92,169,423 (increase of 32,120,131)
- References – 3,010,588 (increase of 51,977)
New Features:
- Support for nSolver data from NanoString Technologies
- QC and Normalization metrics for Affymetrix CEL files
- Stem-loop information for miRNAs
- Printable report summary with detailed methods and references
- “Line-up” comparative ranking chart in meta analysis
1. iPathwayGuide expects the following three items in your differential expression input file:
- Gene Symbol
- Log Fold Change
- P-value (adjusted P-value recommended)
2. iPathwayGuide accepts several file formats for RNA-Seq, microarrays, and proteomic profiling. Refer to the full list of accepted data formats on the next page.
3. Submit the entire list of genes, not just the significant genes.
This is important because we need to calculate the background to provide you with a comprehensive analysis of your data without false positives.
4. You will have the opportunity to customize thresholds for the significant genes after you upload.
5. Each dataset takes about 15 minutes to analyze. You will get an automated email as soon as your analysis is complete.
6. Don’t have your data ready? We have sample datasets available for each data format. Grab a sample file and try it… it’s easy!
7. Uploading data is easy. Here are two quick video tutorials on how to upload data.

Step-by-step guide on uploading data
How to customize thresholds and select D.E. genes
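The three items listed above (gene symbol, log fold change, and p-value) can be assembled into a tab-separated file with a short script. The sketch below is only an illustration: the column headers, number formatting, and gene values are assumptions, so check the list of accepted data formats for the exact layout expected. Note that all genes are written, not only the significant ones.

```python
import csv
import io

def write_de_table(rows, handle):
    """Write (symbol, log fold change, p-value) rows as a tab-separated table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    # Assumption: these header names are illustrative, not mandated by the tool.
    writer.writerow(["Symbol", "logFC", "adj_pval"])
    for symbol, logfc, pval in rows:
        writer.writerow([symbol, f"{logfc:.4f}", f"{pval:.6g}"])

# Hypothetical results for ALL measured genes, including non-significant ones.
all_genes = [
    ("GENE_A", 2.1, 0.0004),
    ("GENE_B", -0.05, 0.91),
    ("GENE_C", 1.2, 0.03),
]
buf = io.StringIO()
write_de_table(all_genes, buf)
de_table_text = buf.getvalue()
print(de_table_text)
```

Keeping the non-significant genes in the file is what allows the background to be calculated, as item 3 explains.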
Disease Analysis
The differential expression data can yield insights on potential diseases enriched in the sample data. Such conclusions can be drawn by observing the number of differentially expressed genes or proteins in your data. One such computational approach is described below.
iPathwayGuide
iPathwayGuide provides a comprehensive analysis of differential gene/ protein expression data that includes disease analysis.
For each disease, the number of differentially expressed (DE) genes annotated to it is compared to the number of genes expected just by chance. iPathwayGuide uses an over-representation approach to compute the statistical significance of observing more than the given number of DE genes. The p-value is computed using the hypergeometric distribution and can be corrected for multiple comparisons using the False Discovery Rate or Bonferroni method.
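The calculation described above can be sketched with the Python standard library. This is a generic over-representation computation, not iPathwayGuide's actual code. It takes the p-value to be the probability of observing at least k annotated DE genes, which is a common convention and an assumption here. The False Discovery Rate correction shown is the Benjamini-Hochberg procedure, also an assumption, and the counts in the example are made up.

```python
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k): N genes measured, K annotated to the disease, n DE genes, k DE and annotated."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: 2,000 measured genes, 200 of them DE, three diseases.
N, n = 2000, 200
diseases = {"disease_1": (100, 25), "disease_2": (50, 6), "disease_3": (300, 30)}
raw = [hypergeom_upper_tail(N, K, n, k) for K, k in diseases.values()]
for name, p, p_bonf, p_fdr in zip(diseases, raw, bonferroni(raw), benjamini_hochberg(raw)):
    print(name, p, p_bonf, p_fdr)
```

Bonferroni is the more conservative of the two corrections. The False Discovery Rate approach typically retains more terms at the same nominal level.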
Register for iPathwayGuide today and try this feature for free.
Understanding Gene Ontology Analysis

What is Gene Ontology (GO)?
The confusion about gene ontology and gene ontology analysis can start right from the term itself. There are actually two different entities that are commonly referred to as gene ontology or “GO”:
- the ontology itself, which is a set of terms with their precise definitions and defined relationships between them, and
- the associations between gene products and GO terms, which are used to capture the existing knowledge about what each gene is known to do.
But the term gene ontology, or GO, is commonly used to refer to both, which is sometimes a source of potential confusion. In order to avoid this, here we will use the term “GO ontology” to describe the set of terms and their hierarchical structure and “GO annotations” to describe the set of associations between genes and GO terms.
There are 3 types of terms, or domains if you wish, in the gene ontology:
- Biological Processes (BP)
- Molecular Functions (MF)
- Cellular Components (CC)
GO structure and data representation
In general, an ontology such as the gene ontology consists of a number of explicitly defined terms that are names for biological objects or events. These terms are depicted as nodes (also called vertices) in a graph whose edges describe the relationships between the nodes. For instance, “cytoplasm” is a node, which is linked by an edge to its parent “intracellular part”. The type of this edge is “is a” and this structure simply means that cytoplasm is an intracellular part.
But there is more to it. The graph formed with these nodes and edges is not just any kind of graph. It is a so-called “directed acyclic graph” or DAG. There are several important features of DAGs. Firstly, the edges are directed, i.e. there is a source and a destination for each edge. In the gene ontology, the source is referred to as the parent term and the destination is referred to as the child term. This tells us that the cytoplasm is an intracellular part rather than an intracellular part is a cytoplasm.
Secondly, unlike a general graph, a DAG does not have cycles, which is to say that one cannot complete a loop by following the directed edges. Among other things, this restriction means that two terms cannot be both parents and children of each other (otherwise they would form a loop between themselves), and that there must be at least one node that has no parents, i.e. a root. A DAG is similar to a tree with the difference that in a tree each node can have only one parent, while in a DAG a node can have multiple parents. In the figure below, A shows a tree (each node has only one parent), B shows a DAG (node 3 has 2 parents), and C shows a general graph (nodes 1, 2 and 3 form a loop). So let us remember that the structure of the GO ontology is a DAG like in panel B, below.
Here is an example showing the biological process of “negative regulation of programmed cell death” and its various relationships with all its ancestors.
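The distinction between a tree, a DAG, and a general graph can be checked programmatically. The sketch below stores edges from parent to child and classifies three small graphs. The edge lists are assumed reconstructions that match the description of the three panels (single parents, a node with two parents, a loop through nodes 1, 2 and 3), not the actual figure.

```python
def has_cycle(children):
    """Depth-first search for a directed cycle; children maps node -> list of child nodes."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(children) | {c for kids in children.values() for c in kids}
    color = {node: WHITE for node in nodes}

    def visit(node):
        color[node] = GREY
        for child in children.get(node, []):
            if color[child] == GREY:  # edge back into the current path: a loop
                return True
            if color[child] == WHITE and visit(child):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in nodes)

def classify(children):
    """Label a directed graph as 'general graph', 'DAG', or 'tree'."""
    if has_cycle(children):
        return "general graph"
    parent_count = {}
    for kids in children.values():
        for child in kids:
            parent_count[child] = parent_count.get(child, 0) + 1
    # No cycles: a node with several parents makes it a DAG rather than a tree.
    if any(count > 1 for count in parent_count.values()):
        return "DAG"
    return "tree"

panel_a = {0: [1, 2], 1: [3]}             # every node has at most one parent
panel_b = {0: [1, 2], 1: [3], 2: [3]}     # node 3 has two parents
panel_c = {0: [1], 1: [2], 2: [3], 3: [1]}  # nodes 1, 2 and 3 form a loop
labels = [classify(g) for g in (panel_a, panel_b, panel_c)]
print(labels)
```

For simplicity, the "tree" label here only checks that no node has more than one parent. It does not verify that there is a single root.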
What is a gene ontology analysis?
Fundamentally, the gene ontology analysis is meant to answer a very simple question:
“Given a list of genes found to be differentially expressed in my phenotype (e.g. disease) vs. control (e.g. healthy), what are the biological processes, cellular components and molecular functions that are implicated in this phenotype?”
In a nutshell, the premise here is that if many of the genes associated with a given biological process are differentially expressed in the given disease, that biological process is implicated in that disease. Essentially, the gene ontology analysis aims to identify those biological processes, cellular locations and molecular functions that are impacted in the condition studied.
But… the question now becomes: how do you decide whether a given gene ontology term is important or not? After all, any biological term can end up with some genes that are differentially expressed just by chance or just because those genes are also associated with other biological processes that could be more germane to the condition studied.
Therefore, I will briefly outline the main approaches used to address this problem. Some of these are better, some are worse. In fact, some are completely wrong. Nevertheless, I will review them here so you can understand how the thinking evolved about this problem and be in a position to choose wisely the approach you want to use.
Keep in mind that the following is just a brief outline with no mathematical details. If you really want a thorough discussion, you can learn more about these approaches in Chapter 16 of this book.
The simplest gene ontology analysis: Over-representation analysis (ORA) or enrichment analysis
If the processing of the list of differentially expressed (DE) genes were to be done manually, one would take each accession number corresponding to a DE gene, search various public databases and compile a list with, for instance, the biological processes that the gene is involved in. The same type of analysis could be carried out for other functional categories such as biochemical function, cellular role, etc. This task can be performed repeatedly, for each gene, in order to construct a master list of all biological processes in which at least one gene is involved. Further processing of this list can provide a list of those biological processes that are common between several of the DE genes. It is intuitive to expect that those biological processes that occur more frequently in this list would be more relevant to the condition studied. If 200 genes have been found to be differentially expressed and 160 of them are known to be involved in, let us say, mitosis, it is intuitive to conclude that mitosis is a biological process important in the given condition. Right?
Wrong!!
As we shall see in the following example, this intuitive reasoning is incorrect and a more careful analysis must be done in order to identify the truly relevant biological processes.
Let us consider that we are using a panel containing 2,000 genes to investigate the effect of ingesting a certain substance X. Let’s say that there are 200 differentially expressed genes. Let us focus on the biological processes for instance, and let us assume that the results for the 200 differentially regulated genes are as follows: 160 of the 200 genes are involved in mitosis, 80 in oncogenesis, 60 in the positive control of cell proliferation and 40 in glucose transport.

A common mistake in gene ontology analysis is to use raw counts. Here, mitosis seems to be the most important biological process in this experiment.
If we now look at the functional profile described above, we might conclude that substance X may be related to cancer since mitosis, oncogenesis and cell proliferation would all make sense in that context. However, a reasonable question is: what would happen if all the genes on the panel used were part of the mitotic pathway? Would mitosis continue to be significant? Clearly, the answer is no. Therefore, in order to draw correct conclusions, it is necessary to always compare the actual number of occurrences with the expected number of occurrences for each individual category.
This comparison is shown in the figure below by the line showing the ratio between what was observed vs what was expected (percentage shown on the vertical axis on the right). In this light, the same data tells a completely different story. There are indeed 160 mitotic genes but, in spite of this being the largest number, we actually expected to observe 160 such genes so this is not better than chance alone. The same is true for oncogenesis. The positive control of cell proliferation starts to be interesting because we expected 20 and observed 60. This is 3 times more than expected. However, the most interesting is the glucose transport. We expected to observe only 10 such genes and we observed 40, which is 4 times more than expected. Taking into consideration the expected numbers of genes radically changed the interpretation of the data. In light of these data, we may want to consider the correlation of X with diabetes instead of cancer.

Gene ontology analysis: the need to compare raw counts with expected values. Even though mitosis had the highest number of differentially expressed genes, this was no more than what was expected by chance. In contrast, glucose transport, even though it had the lowest absolute count, had the most significant enrichment at 4x the number expected by chance.
This example illustrates that the simple frequency of occurrence of a particular functional category among the genes found to be regulated can be misleading. In order to draw correct conclusions, one must analyze the observed frequencies in the context of the expected frequencies.
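The comparison of observed and expected counts can be sketched in a few lines of Python. The panel composition below (1,600 mitosis genes, 800 oncogenesis genes, and so on) is not stated in the text; it is an assumption chosen so that the expected counts match the ones quoted above (160, 80, 20 and 10).

```python
# Assumed panel composition, chosen to reproduce the expected counts in the text.
PANEL_SIZE = 2000   # genes on the panel
DE_GENES = 200      # differentially expressed genes

# category -> (genes on the panel annotated with it, DE genes annotated with it)
categories = {
    "mitosis": (1600, 160),
    "oncogenesis": (800, 80),
    "positive control of cell proliferation": (200, 60),
    "glucose transport": (100, 40),
}

def enrichment_ratios(categories, panel_size, de_genes):
    """Return {category: (expected count, observed/expected ratio)}."""
    result = {}
    for name, (on_panel, observed) in categories.items():
        # Expected count if DE genes were spread over the panel purely by chance.
        expected = on_panel * de_genes / panel_size
        result[name] = (expected, observed / expected)
    return result

ratios = enrichment_ratios(categories, PANEL_SIZE, DE_GENES)
for name, (expected, ratio) in ratios.items():
    print(f"{name}: expected {expected:.0f}, observed/expected = {ratio:.1f}")
```

Sorting the categories by the ratio rather than by the raw count puts glucose transport first and mitosis last, which is the reversal described above.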
The problem is that an event such as observing 40 genes when we expect 10 can still occur just by chance. This is unlikely, but it can happen. The bottom line is that one needs to assess the significance of these categories based on the probability of the observed values appearing just by chance. This can be done with various statistical models, including the hypergeometric test, Fisher’s exact test, or the chi-square test. If you are interested in details, I explained the formulae elsewhere (Chapter 23 in this book), but the key point here is that you should never try to draw conclusions from a count graph as above. Instead, use software that calculates a p-value for each term, and don’t forget to correct for multiple comparisons. Whatever you do, please do not publish graphs showing just raw counts of gene ontology terms. As shown in the example above, they are not only uninformative but they can also be completely misleading. And if I am one of the reviewers of your paper or grant, you will hear strong complaints from me.
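As a minimal sketch of what such software does, the snippet below computes a one-sided hypergeometric p-value for each term and then applies a Benjamini-Hochberg adjustment. The panel composition is the same assumption as before (it is not given in the text), and Benjamini-Hochberg is used only as one familiar example of a multiple-comparison correction, not as the method any particular tool applies.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n DE genes are drawn from N measured genes, K of them in the term."""
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return favourable / comb(N, n)

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

PANEL_SIZE, DE_GENES = 2000, 200
# term -> (genes on the panel annotated with it, DE genes annotated with it); assumed values
terms = {
    "mitosis": (1600, 160),
    "oncogenesis": (800, 80),
    "positive control of cell proliferation": (200, 60),
    "glucose transport": (100, 40),
}

names = list(terms)
raw = [hypergeom_upper_tail(terms[t][1], PANEL_SIZE, terms[t][0], DE_GENES) for t in names]
adjusted = benjamini_hochberg(raw)
for name, p, q in zip(names, raw, adjusted):
    print(f"{name}: p = {p:.3g}, adjusted p = {q:.3g}")
```

With these inputs, mitosis and oncogenesis are nowhere near significant, while the two smaller categories are, which matches the reading based on the observed/expected ratio.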
A step further in gene ontology analysis: Functional Class Scoring (FCS)
The simplest approach that would provide sound scientific results is the over-representation approach described above. However, more sophisticated methods have been developed over time. An important category includes the functional class scoring methods. The best-known method in this category is Gene Set Enrichment Analysis, or GSEA.
As a first step, GSEA ranks the genes based on the association of each gene with the phenotype. This association is established using an arbitrary test, for example a t-test. Once the ranked list of genes L is produced, an enrichment score (ES) is computed for each set in the gene set list. The list L is walked from the top to the bottom, and a statistic is increased every time a gene belonging to the set is encountered, and decreased otherwise. The value of the increment (or decrement) depends on the ranking of the gene. If you imagine a situation in which all genes at the top of the list are associated to a given biological process, the score for that process will increase with every gene. At the end of the list, the enrichment score is the maximum distance from zero encountered during the walk.
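The walk down the ranked list can be sketched as follows. This is a simplified illustration rather than the reference GSEA implementation: genes in the set add an increment proportional to the magnitude of their score (exponent `p`), genes outside the set subtract a constant, and the result is the running sum with the largest absolute value. The gene names and scores are made up.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: gene names sorted by decreasing association with the phenotype
    scores:       the association score of each gene, in the same order
    gene_set:     set of gene names belonging to the set being tested
    """
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = in_set.count(False)
    hit_weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    norm = sum(hit_weights)
    if norm == 0 or n_miss == 0:
        raise ValueError("the set must contain some, but not all, of the ranked genes")

    running = 0.0
    best = 0.0
    for weight, hit in zip(hit_weights, in_set):
        # Step up for a gene in the set, step down otherwise.
        running += weight / norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranked list: g1 is most positively associated, g10 most negatively.
genes = [f"g{i}" for i in range(1, 11)]
scores = [5, 4, 3, 2, 1, -1, -2, -3, -4, -5]

es_top = enrichment_score(genes, scores, {"g1", "g2", "g3"})
es_bottom = enrichment_score(genes, scores, {"g8", "g9", "g10"})
print(es_top, es_bottom)
```

A set concentrated at the top of the list yields a large positive score, and a set concentrated at the bottom yields a large negative one.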

An example of the statistics calculated by the gene set enrichment analysis (GSEA).
Image from Aravind Subramanian, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich, Scott L. Pomeroy, Todd R. Golub, Eric S. Lander, and Jill P. Mesirov PNAS October 25, 2005 102 (43) 15545-15550; first published September 30, 2005; https://doi.org/10.1073/pnas.0506580102
In this figure, the upper graph shows the enrichment values during the walk through the gene list. The vertical lines represent the genes belonging to the set S at the positions where they appear in the ranked list. The lower graph shows the degree to which each gene is correlated with the phenotype.
In principle, higher enrichment scores are obtained when the graph departs considerably from zero. However, the enrichment score by itself cannot be used to assess significance, much like the raw counts of genes cannot be used that way. The reason is the same: in principle, any score can appear with a given non-zero probability. We have to focus only on those that appear more often than expected by chance. This is done with a bootstrap approach. In essence, the bootstrap approach assesses the frequency with which something appears just by chance by randomly permuting the labels. There are two possible permutation criteria: permutation of the phenotype labels of the samples or permutation of the gene labels. In general, the phenotype permutation method is preferred as it preserves gene-gene correlations. This step of the algorithm produces a null distribution which allows the computation of an empirical p-value. The empirical p-value is calculated as the fraction of random bootstrap runs that resulted in an enrichment score equal to or larger than the one observed for the correct labels.
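The empirical p-value can be sketched as below. For brevity this sketch permutes gene labels, because that needs only the ranked list; permuting phenotype labels would require the full expression matrix and a re-ranking of the genes in every run. The enrichment score is a simplified, unweighted, one-sided running sum, and the gene names are hypothetical.

```python
import random

def running_sum_es(ranked_genes, gene_set):
    """Unweighted running-sum score: maximum positive deviation from zero."""
    n_hits = sum(g in gene_set for g in ranked_genes)
    n_miss = len(ranked_genes) - n_hits
    running = best = 0.0
    for g in ranked_genes:
        running += 1.0 / n_hits if g in gene_set else -1.0 / n_miss
        best = max(best, running)
    return best

def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Fraction of gene-label permutations scoring at least as high as the observed list."""
    rng = random.Random(seed)
    observed = running_sum_es(ranked_genes, gene_set)
    shuffled = list(ranked_genes)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the link between rank and set membership
        if running_sum_es(shuffled, gene_set) >= observed:
            at_least += 1
    return observed, at_least / n_perm

genes = [f"g{i}" for i in range(100)]
top_set = set(genes[:10])        # all ten genes at the top of the ranking
spread_set = set(genes[5::10])   # ten genes spread evenly through the ranking

es_top, p_top = permutation_p_value(genes, top_set)
es_spread, p_spread = permutation_p_value(genes, spread_set)
print(es_top, p_top)
print(es_spread, p_spread)
```

The set packed at the top of the list is essentially never matched by a random arrangement, whereas the evenly spread set is matched by most of them.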
As the final step, the significance levels are adjusted for multiple hypothesis testing. Each enrichment score is normalized to account for the size of the set, yielding a Normalized Enrichment Score (NES), and then the false discovery rate (FDR) of each NES is computed.
More sophisticated gene ontology methods: elim and weight
The approaches described above focus on the problem of accurately interpreting the number of differentially expressed genes associated with a gene ontology term. However, these approaches ignore the structure of the gene ontology and the relationship between various terms. In order to understand more sophisticated GO analysis methods we need to learn a few more things about the gene ontology.
The GO is organized in a hierarchical structure that uses the types of relationship described above: “is a”, “part of” and “regulates”. For instance, “induction of apoptosis by extracellular signals” is a type of “induction of apoptosis” which in turn is a “positive regulation of apoptosis” which in turn is a kind of “regulation of apoptosis”, etc. Generally speaking, traversing the DAG following “is a” relationships as above can be seen as moving across levels of abstraction. The root, BP (or “All”), would correspond to the highest level of abstraction, or the lowest level of details. In contrast, leaf nodes such as “induction of apoptosis by hormones” would correspond to the lowest level of abstraction, with the most details. Similarly, the “cell outer membrane” is part of the “cell envelope”, etc. Traversing “part of” relationships could be interpreted as changing scales. In general, terms closer to the roots (BP, CC, MF) are more general, while the ones closer to the leaves are more specific. In GO Consortium’s terminology, the children are more specialized than the parents. When annotating genes with GO terms, efforts are made to annotate the genes with the highest level of details possible. In a general way, this corresponds to the lowest level of abstraction. For example, if a gene is known to induce apoptosis in response to hormones, it will be annotated with the term “induction of apoptosis by hormones” and not merely with one of the higher level terms such as “induction of apoptosis” or “apoptosis”.

When a gene is annotated with a term, all inferences that can be drawn from the structure of the GO must also hold true. In other words, if the child term describes a gene product, then all its parent terms must also apply to that gene product. Let’s revisit the ontology from above (figure is repeated here for convenience). For instance, if a gene is annotated as having a role in “regulation of programmed cell death” then it will necessarily be involved in “cellular process” because “regulation of programmed cell death” regulates “programmed cell death,” which is a type of “cell death,” which is a type of “cellular process.” This property is known as the “True Path Rule.” But, because of this, if the GO analysis is done independently for each term, each differentially expressed gene will be counted multiple times, once for every term from the lowest term it is annotated with, all the way up to the root. This has two consequences. First, it injects a great deal of redundancy into the process and second, it tends to report as significant lots of general terms which are really not very informative with respect to the phenotype studied.
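The True Path Rule amounts to propagating every direct annotation to all ancestors of the annotated term. Here is a minimal sketch using the chain of terms from the example; the `PARENTS` mapping is a hand-written excerpt for illustration, not the real GO graph, and `geneA` is a hypothetical gene.

```python
# term -> list of parent terms (hand-written excerpt, for illustration only)
PARENTS = {
    "regulation of programmed cell death": ["programmed cell death"],
    "programmed cell death": ["cell death"],
    "cell death": ["cellular process"],
    "cellular process": ["biological_process"],
    "biological_process": [],
}

def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def propagate(direct_annotations, parents):
    """Extend each gene's direct annotations with every ancestor term."""
    full = {}
    for gene, terms in direct_annotations.items():
        all_terms = set(terms)
        for term in terms:
            all_terms |= ancestors(term, parents)
        full[gene] = all_terms
    return full

direct = {"geneA": {"regulation of programmed cell death"}}
full = propagate(direct, PARENTS)
print(sorted(full["geneA"]))
```

A single direct annotation turns into five annotations here, which is exactly why a differentially expressed gene is counted once for every term on the path to the root.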
Two methods have been proposed to deal with this by Alexa et al (2006). Elim starts at the lowest level of GO, with the most specific terms and calculates their enrichment p-value. If that is not significant, the approach moves up in the hierarchy and calculates the p-value as usual. If “induction of apoptosis by extracellular signals” is not significant, all DE genes associated with it will also be counted for its parent, “induction of apoptosis”, as if the two terms were independent. However, if the more specific term is indeed significant, elim will eliminate all genes associated with it from all its ancestors, thus eliminating the redundancy and giving a chance to the more specific terms to be reported as significant.
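The idea behind elim can be illustrated with a heavily simplified sketch. It is not the published algorithm: terms are simply visited from the deepest to the shallowest, each is tested with a one-sided hypergeometric test, and the genes of any significant term are removed from all of its ancestors before those are tested. The gene names, the two-term hierarchy and the 0.01 threshold are assumptions made for the example.

```python
from math import comb

def upper_tail(k, N, K, n):
    """Hypergeometric P(X >= k)."""
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)

def ancestors(term, parents):
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def depth(term, parents):
    """Length of the longest path from `term` up to a root."""
    ps = parents.get(term, [])
    return 0 if not ps else 1 + max(depth(p, parents) for p in ps)

def elim_sketch(term_genes, parents, de_genes, background, alpha=0.01):
    genes = {t: set(g) for t, g in term_genes.items()}  # working copy
    N, n = len(background), len(de_genes)
    pvals = {}
    # Most specific (deepest) terms first.
    for term in sorted(genes, key=lambda t: depth(t, parents), reverse=True):
        pvals[term] = upper_tail(len(genes[term] & de_genes), N, len(genes[term]), n)
        if pvals[term] < alpha:
            # Significant: its genes no longer count for any ancestor.
            for anc in ancestors(term, parents):
                if anc in genes:
                    genes[anc] -= genes[term]
    return pvals

background = {f"g{i}" for i in range(100)}
de_genes = {f"g{i}" for i in range(10)}
child = "induction of apoptosis by extracellular signals"
parent = "induction of apoptosis"
child_genes = {f"g{i}" for i in range(8)} | {"g50", "g51"}
term_genes = {child: child_genes, parent: child_genes | {f"g{i}" for i in range(60, 70)}}
parents = {child: [parent], parent: []}

pvals = elim_sketch(term_genes, parents, de_genes, background)
independent_parent_p = upper_tail(8, 100, 20, 10)
print(pvals[child], pvals[parent], independent_parent_p)
```

Tested independently, the parent term looks highly significant only because it inherits the child's genes; after elimination it has no differentially expressed genes left and its p-value is 1.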
The second method proposed by Alexa is called weight. The idea behind this approach is that if many very specific terms are significant, and there is a term slightly more general that would encompass all those terms, it may be useful to identify this more general term.
Pitfalls in gene ontology analysis
Gene ontology analysis is a powerful tool. Like any powerful tool, it is subject to misuse and misunderstanding.
The most common mistake in gene ontology analysis is choosing the incorrect background (or not choosing an explicit background). What is this about?
Well, let’s go back to the enrichment analysis. Let us assume that Mary measured 1,000 genes in a panel, and she found 200 genes to be differentially expressed. That’s a proportion of 20%. Let’s consider a biological process that is associated with 100 genes and let us assume that she has found that 25 of these 100 genes are differentially expressed. In this case, if this process is not related to the phenotype, she would expect to find about 20% of its genes being differentially expressed just by chance, or about 20. Since she found 25 instead of 20, we can calculate the probability of this happening just by chance as being about 0.076 or 7.6%. This does not meet the usually accepted significance threshold of 1% or even 5%, so there is not enough evidence to indicate that this process is involved in the phenotype, and Mary should not spend too much time studying it.
But… most people do this analysis with some software, not manually as above. And sometimes, such software only requires the user to upload the list of differentially expressed genes. This would be the list of 200 genes. Now, if Mary is using such software and she does not specify that she only used a panel of 1,000 genes, i.e., the statistical background of this analysis, the software might think that she has selected those 200 genes from a genome-wide RNA-Seq experiment. In that case, the numbers would be rather different. Now, the proportion of DE genes appears to be 200/30,000 or 2/300 (about 0.0067), and a biological process with 100 genes is only expected to get 1 gene just by chance, at most (actually 0.666…). The probability of having 25 genes just by chance, or the reported p-value, is essentially zero. With such hugely significant p-values, Mary gets super excited and there is a real danger that she would spend weeks or months trying to understand or validate the involvement of this biological process in this experiment, whereas in fact, this extremely significant p-value was only due to the incorrect choice of the background set.
The moral of this story is that an enrichment analysis must always be done by specifying explicitly the set of genes measured. The safest bet is to use a software platform that allows you to upload the entire list of genes measured and specify which of those you want to consider as differentially expressed. If that is not an option, you need to be able to either upload separately your reference set, or specify it in another way. Otherwise, your p-value may lead you to perdition…
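The effect of the background can be reproduced with a short calculation. The sketch below uses a one-sided hypergeometric test and takes 25 observed genes in a 100-gene process, once against the 1,000-gene panel and once against an assumed 30,000-gene genome. The exact p-value depends on the test and tail convention chosen, so the point is the order of magnitude, not the digits.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n DE genes are drawn from N background genes, K of them in the process."""
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return favourable / comb(N, n)

DE_GENES = 200       # differentially expressed genes
PROCESS_SIZE = 100   # genes annotated with the process
OBSERVED = 25        # DE genes annotated with the process

# Correct background: the panel that was actually measured.
p_panel = hypergeom_upper_tail(OBSERVED, 1000, PROCESS_SIZE, DE_GENES)
# Wrong background: the software silently assumes a whole genome.
p_genome = hypergeom_upper_tail(OBSERVED, 30000, PROCESS_SIZE, DE_GENES)

print(f"panel background:  expected {PROCESS_SIZE * DE_GENES / 1000:.1f}, p = {p_panel:.3g}")
print(f"genome background: expected {PROCESS_SIZE * DE_GENES / 30000:.2f}, p = {p_genome:.3g}")
```

The same 25 genes are unremarkable against the panel and look overwhelmingly significant against the genome, purely because the reference set changed.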
The second most common mistake in this area is failing to correct for multiple comparisons. The need to do this correction is explained here or in Chapter 16 of this book. Since ontologies have hierarchical relationships, it is important to apply appropriate correction factors to minimize errors. For instance, using a False Discovery Rate (FDR) or Family-wise Error Rate correction factor may not be appropriate for GO analysis.
The third most common mistake is failing to detect the cross-talk between various biological phenomena. Many genes are involved in several processes, in addition to the redundancy stemming from the gene ontology hierarchy itself. Sometimes, the same group of differentially expressed genes makes several processes appear significant. Unless this overlap and cross-talk is detected and eliminated, a lot of time may be wasted. A detailed mathematical approach to detect and eliminate cross-talk has been proposed here. A more practical and user-friendly way to identify cross-talk is also included in iPathwayGuide.
Other, more subtle mistakes related to gene ontology analysis include:
- Misinterpreting annotations such as
- ND = no biological data available. If an ND evidence code surfaces in your analysis, you do not need to waste time on additional literature searches.
- NOT = it can be confusing to interpret the negative annotation NOT. NOT is not a complete list, just a list of where the NOT was a surprise.
- Misinterpreting the direction of arrows in the output of GO – the directed acyclic graph (DAG). Clear labeling is a must here.
A more detailed discussion of these and other pitfalls can be found in this paper:
Rhee, S., Wood, V., Dolinski, K., Draghici, S. (2008). Use and misuse of the gene ontology annotations Nature Reviews Genetics 9(7), 509-515. https://dx.doi.org/10.1038/nrg2363.
And most importantly, keep in mind that the ultimate purpose of gene expression experiments is to produce biological knowledge not numbers.
Sorin Draghici (2012)
Using iPathwayGuide for gene ontology analysis
Performing GO Analysis with iPathwayGuide
1 minute learning in iPathwayGuide: Redundancy in GO analysis
1 min learning – Eliminate false positives and identify most affected biological processes
Advaita has teamed up with SCIEX, the leader in Data Independent Acquisition (DIA) Mass Spectrometry for the collection and analysis of proteomics-based data. Through this collaboration, users can now bring their SWATH data from the SCIEX protein expression workflow and analyze it in the context of pathways, gene ontologies, microRNAs, and diseases. The power of iPathwayGuide allows you to combine protein expression experiments and contrast them with other platforms including RNA-Seq, microarrays, and targeted panels. Watch the video below to see how SCIEX SWATH proteomics data was juxtaposed to mRNA data from an RNA-Seq experiment.
iPathwayGuide – Affy CEL file uploading
Affymetrix microarrays are one of the most widely used gene expression platforms in the industry. iPathwayGuide supports the most common platforms. Look at the FAQ for the latest list of supported platforms.
The resulting file from an Affymetrix microarray is commonly known as a CEL file because of the CEL extension placed on the file name. To upload your CEL files, simply drag and drop the corresponding files for the condition group and the control group. iPathwayGuide requires at least 3 unique files for each group. We recommend at least 4 per group in case one of the samples is rejected during QC and normalization.
Once your files are identified, click upload to begin the process. iPathwayGuide will upload your files, QC check them, reject and highlight any that do not pass, normalize the files, and calculate differential expression. Depending on the number of files, this process can take 2 to 5 minutes or more.
Once the QC metrics are available, iPathwayGuide will present the QC stats, QC Density Box Plot, and QC Density Plot. Any samples identified for removal will be highlighted in red.
Once the Normalization is complete, iPathwayGuide will present the Normalized Box Plots and the Normalized Density Plot. If you wish to include any of these graphs in a paper or report, you can download the graphs using the download button.
If you are satisfied, you may proceed to the Contrasts Intake page to set the number of significant differentially expressed genes along with title and description of the report.
Yes. Please download it here.
iPathwayGuide supports analysis of human, mouse, and rat data. It supports the following file formats:
CuffDiff
DESeq
EdgeR
SAS/JMP Genomics
nSolver (NanoString Technologies)
Generic tab delimited .txt file (must contain gene symbol or uniprot ID, log2FC, p-value)
SCIEX SWATH 2.0 proteomics data files
Select Affymetrix CEL files*
*Supported Affy CEL Files may take several minutes to upload
Human
Human Genome U133
Human Genome U133A 2.0
Human Genome U133 Plus 2.0
Human Genome U95
Human Genome U35K
Mouse
Mouse Expression Set 430
Mouse Expression Set 430 2.0
Mouse Genome 430A 2.0
Rat
Rat Expression Set 230
Rat Genome 230 2.0
Rat Genome U34
Please make sure you are using the “…gene_exp.diff” file that comes from Cufflinks. There are some applications that claim to emulate CuffDiff output (e.g. Galaxy). If you are using one of these applications, please make sure the output file has all columns populated. See below for specific columns that must be present. Also, use this link to view the Cuffdiff manual.
Column number | Column name | Example | Description |
---|---|---|---|
1 | Tested id | XLOC_000001 | A unique identifier describing the transcript, gene, primary transcript, or CDS being tested |
2 | gene | Lypla1 | The gene_name(s) or gene_id(s) being tested |
3 | locus | chr1:4797771-4835363 | Genomic coordinates for easy browsing to the genes or transcripts being tested. |
4 | sample 1 | Liver | Label (or number if no labels provided) of the first sample being tested |
5 | sample 2 | Brain | Label (or number if no labels provided) of the second sample being tested |
6 | Test status | NOTEST | Can be one of OK (test successful), NOTEST (not enough alignments for testing), LOWDATA (too complex or shallowly sequenced), HIDATA (too many fragments in locus), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing. |
7 | FPKMx | 8.01089 | FPKM of the gene in sample x |
8 | FPKMy | 8.551545 | FPKM of the gene in sample y |
9 | log2(FPKMy/FPKMx) | 0.06531 | The (base 2) log of the fold change y/x |
10 | test stat | 0.860902 | The value of the test statistic used to compute significance of the observed change in FPKM |
11 | p value | 0.389292 | The uncorrected p-value of the test statistic |
12 | q value | 0.985216 | The FDR-adjusted p-value of the test statistic |
13 | significant | no | Can be either “yes” or “no”, depending on whether p is greater than the FDR after Benjamini-Hochberg correction for multiple-testing |
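Before uploading a file produced by a tool that emulates CuffDiff output, you may want to check that every column is populated. The following is a minimal sketch of such a check, not part of iPathwayGuide. It treats only empty cells and rows with the wrong number of fields as problems; whether other placeholders should count as missing is left open. The three-column demo data is abbreviated and hypothetical.

```python
import csv
import io

def find_incomplete_rows(handle):
    """Return (line number, description) for every row with missing fields.

    `handle` is any open text file or file-like object with tab-delimited content
    whose first line is the header.
    """
    problems = []
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append((line_no, f"expected {len(header)} columns, found {len(row)}"))
            continue
        empty = [header[i] for i, cell in enumerate(row) if not cell.strip()]
        if empty:
            problems.append((line_no, "empty column(s): " + ", ".join(empty)))
    return problems

# Abbreviated, hypothetical demo data (a real file has all the columns listed above).
demo = io.StringIO(
    "gene\tlog2(FPKMy/FPKMx)\tp value\n"
    "Lypla1\t0.06531\t0.389292\n"
    "Tcea1\t\t0.5\n"
    "Rgs20\t1.2\n"
)
problems = find_incomplete_rows(demo)
for line_no, message in problems:
    print(f"line {line_no}: {message}")
```

For a real file, pass `open("gene_exp.diff", newline="")` (the path is yours to fill in) instead of the `io.StringIO` demo object.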
Generally, each analysis takes about 15 minutes to complete. If there are other analyses queued ahead of yours, it may take a bit longer. You will receive an email as soon as the analysis is complete.
It is a great idea to use the software and explore the capabilities before making a purchase. The way to do this is to create a free account on our web site www.advaitabio.com. Once you create an account, you will have full access to several analysis results. These datasets cover a range of experiment types, conditions, and analysis outcomes. You will be able to interact with all of the demo analyses in whichever way you choose and experience all of the features of iPathwayGuide. We also provide sample data files, which will show you some of the formats that can be used to upload data for analysis. An alternative would be to set up a time and let one of us show you around the software. You will learn more in a much shorter time, and you will also be able to get immediate answers to any questions you may have.
Yes! From the dashboard, just click share on any completed report. Then enter the email address of the person you wish to share it with. If they do not have an account, they will be prompted to create one. Once registered, they will be able to view the report.

We report the ‘Creation Time’ based on Coordinated Universal Time (UTC).
iPathwayGuide is designed to work with all the latest major browser platforms:
- Google Chrome
- Mozilla Firefox
- Apple Safari (Mac only, iOS not supported yet)
- Microsoft Internet Explorer 11 – Some image download capabilities may not function
Yes. From the login menu, click reset password. You will receive an email with the new password.
A list of databases and versions is available from within each report. See our Release Notes to see the latest data.
Citing iPathwayGuide
Using Advaita Bio’s products or content for any form of publication (e.g. print, electronically) requires researchers to cite them. Please use one of the options below for citations:
“The Data (significantly impacted pathways, biological processes, molecular interactions, miRNAs, SNPs, etc.) were analyzed using Advaita Bio’s iPathwayGuide (Draghici, 2007; Donato, 2013).”
A systems biology approach for pathway level analysis
S Draghici, P Khatri, AL Tarca, K Amin, A Done, C Voichita, C Georgescu, …
Genome research 17 (10), 1537-1545
https://genome.cshlp.org/content/17/10/1537.short
Analysis and correction of crosstalk effects in pathway analysis
M Donato, Z Xu, A Tomoiaga, JG Granneman, RG MacKenzie, R Bao, …
Genome research 23 (11), 1885-1893
https://pubmed.ncbi.nlm.nih.gov/23934932/
GEO2R does not perform normalization for Affymetrix CEL files. The Advaita iPathwayGuide CEL file uploader currently utilizes the Gene-chip Robust Multi-array Average (GCRMA) normalization method. As such, there can be discrepancies between CEL files processed with GEO2R vs. iPathwayGuide.

Click on a Category to see the associated topics.
- iVariantGuide Tutorials
- iVariantGuide Webinars
- iVariantGuide Videos
- iVariantGuide Release Notes
- iVariantGuide FAQs
Here is a step-by-step guide to find your old analysis:
1. Click on the little “i” icon to the right of your file. That “i” stands for information:
2. In the pop-up window, select the previous version you wish to open and click on the date. That will open that particular set of analysis results:
Explore all of the options for customizable graphical filtering in the brand new iVariantGuide. (2:03)
HOW TO SAVE MONEY & INCREASE CUSTOMER LOYALTY
In a 45-minute presentation, Dr. Cordelia Ziraldo recently covered how service providers and core facilities are taking advantage of the all-new iVariantGuide: from interactive reporting to automatically-generated PDFs; from graphical filters to pathway and GO analysis, and so much more. Follow the link to watch the webinar and see what iVariantGuide can do for you.
We’ve released a brand new version of iVariantGuide. Here are step by step instructions for uploading and analyzing your VCF files. (2:05)
The Pathway Analysis module in iVariantGuide allows you to explore the pathways that are impacted by high-priority variants. See how you can use this powerful module to identify biological links between variants or make new functional hypotheses and design experiments to test them.
In this video, Dr. Cordelia Ziraldo walks you through the steps needed to do a Case v Control analysis in iVariantGuide using RNAseq-based variant data in breast cancer subtypes. Dr. Ziraldo shows you how to identify which systems (pathways, biological processes, molecular functions, and cellular components) and the mechanisms that may be implicated in these breast cancer subtypes.
Gene Ontology (or GO) Analysis identifies the biological processes, molecular functions, and cellular components that are likely affected by your high-priority variants. See how iVariantGuide leverages state-of-the-art algorithms to drill down to the specific biological phenomena relevant to your data.
iVariantGuide’s dynamic, graphical filters help you take your variant analysis to the next level. Find hidden correlations when visualizations of every annotation source update with every new selection you make.
Check your email inbox and spam folder. Look for an automatic activation email sent from noreply@apps.advaitabio.com. If it did land in your spam folder, you should add noreply@apps.advaitabio.com to your address book so that future emails are routed correctly (including notifications when your analyses are complete).
You can generate API credentials on your Advaita Profile page. Once generated, your API ID will always be displayed, but your API secret will only be shown once, so write it down in a safe place! If you lose your API secret, you can reset it by revoking and generating new API credentials. Keep in mind that this will generate a new API ID in addition to the new API secret.
Yes, every new account comes with several demo analyses shared by the Advaita Team. Once you complete your free registration and log in, click ‘Accept Share’ on the demo card, and the demo report will appear in your Reports Table along with any reports you generate. You can use all functionality of iVariantGuide within these demo reports— including creating new filter Presets, exploring Pathway and GO Analysis, and generating the Printable Report. You may upload and analyze your VCF at any time, but you will not be able to access the results until you unlock them via a subscription.
Your VCF needs to adhere to the standard format for VCF v4.1 or above. Here is the Specification File, produced by SAMtools. There is no limit to the number of variants or samples in your file, but very large files (> 1M variants) could have a slower browsing experience.
EXAMPLE
This VCF file meets the minimum specifications for iVariantGuide.
TROUBLESHOOTING
If your VCF is rejected by iVariantGuide, here are a few things to check:
- The file should be tab-delimited. If your columns are separated with spaces, do a find/ replace to make sure you have one tab separating each column.
- All header lines should begin with ##, except the last line of the file header, which contains column headers and should begin with #.
- The last line of the file header must contain every column shown in the example (line 15), including at least one Sample Name column header.
- The columns CHROM, POS, ID, REF, ALT give identifying information about the variant. CHROM and POS are mandatory. The others will accept . (period) in place of missing data.
- iVariantGuide uses the values from the QUAL column as the quality score for each variant call.
- The FILTER and INFO columns are required to preserve the integrity of the VCF format. FILTER information is displayed in the variant table, and data from the INFO column is ignored, with one exception: if excluded from FORMAT, read depth (DP) will be read from the INFO column. Regardless of how it is used in iVariantGuide, every field in the INFO column needs its own definition line in the header. In the example above, each field that appears in the INFO column is defined in lines 3-11 (shown with ##INFO).
- The FORMAT column contains the key to parsing the data in the sample columns. iVariantGuide will prioritize genotype data in the order of: PL, GL, then GT.
The Sample Name column headers will be used as the sample names in iVariantGuide. In the example above, there is one sample and it will be called Sample_1 in iVariantGuide.
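Several of the checks above can be automated before upload. The sketch below is a hypothetical pre-flight check, not the validation iVariantGuide itself performs. It verifies that the column header line is tab-delimited and complete, that at least one sample column exists, that every data row has the right number of fields, and that every INFO key used in the data has a `##INFO` definition. The example records are made up.

```python
REQUIRED = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]

def check_vcf(lines):
    """Return a list of problems found; an empty list means none of these checks failed."""
    problems = []
    declared_info = set()
    columns = None
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("##"):
            if line.startswith("##INFO=<ID="):
                declared_info.add(line[len("##INFO=<ID="):].split(",")[0].rstrip(">"))
        elif line.startswith("#"):
            columns = line.split("\t")
            missing = [c for c in REQUIRED if c not in columns]
            if missing:
                problems.append(f"line {number}: missing columns {missing} (is the file tab-delimited?)")
                return problems
            if len(columns) == len(REQUIRED):
                problems.append(f"line {number}: no sample column")
        else:
            if columns is None:
                problems.append(f"line {number}: data found before the #CHROM header line")
                return problems
            fields = line.split("\t")
            if len(fields) != len(columns):
                problems.append(f"line {number}: expected {len(columns)} fields, found {len(fields)}")
                continue
            for entry in fields[columns.index("INFO")].split(";"):
                key = entry.split("=")[0]
                if key != "." and key not in declared_info:
                    problems.append(f"line {number}: INFO field {key} has no ##INFO definition")
    if columns is None:
        problems.append("no column header line starting with # was found")
    return problems

good = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "\t".join(REQUIRED + ["Sample_1"]),
    "\t".join(["chr1", "12345", ".", "A", "G", "50", "PASS", "DP=20", "GT", "0/1"]),
]
undeclared = good[:-1] + [
    "\t".join(["chr1", "12345", ".", "A", "G", "50", "PASS", "DP=20;AF=0.5", "GT", "0/1"])
]
space_delimited = [line.replace("\t", " ") for line in good]

print(check_vcf(good))
print(check_vcf(undeclared))
print(check_vcf(space_delimited))
```

To check a real file, pass an open file object, for example `check_vcf(open(path))`, since the function only iterates over lines.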
There are several free downloadable tools that can convert between those formats and .vcf. They each come with documentation that makes them simple to use.
Yes! Advaita Cloud Systems adheres to the highest industry standards for data security. All data is encrypted during transfer and only you have access to your data, unless you share it. For those customers requiring HIPAA compliance, Advaita offers a HIPAA compliant environment. Please contact sales@advaitabio.com for additional information.
iVariantGuide currently supports human hg19 and hg38 (GRCh37 and GRCh38).
We provide annotations from dbSNP, ClinVar, and 1000 Genomes in addition to all sources contained in our KnowledgeBase, iBioGuide. We also provide links to additional information in iBioGuide as well as external sites such as OMIM, MedGen, and more. At this time iVariantGuide does not support user-defined annotation sources. If there is a specific database you would like us to support, please let us know!
FILTER | SOURCE |
---|---|
Variant Class | SnpEff |
Clinical Significance | ClinVar |
Functional Class | SnpEff |
Impact | SnpEff |
Region | SnpEff |
Zygosity | iVariantGuide |
Allele Frequency | dbSNP/1000 genomes |
Depth Distribution | input vcf file |
Quality | input vcf file |
Length of Indel | iVariantGuide |
Substitution Types | iVariantGuide |
Chromosomal Location | input vcf file |
Pathways | KEGG |
GO Terms | Gene Ontology |
Yes! You may share a report with anyone you wish. They must register a free account to view it, but will have the level of access that you have. If you are sharing a purchased analysis, your sharee will also be able to see premium features such as pathways and GO analyses. You may also associate a filter preset to any purchased analysis and share that preset with the analysis. When sharing, you may also control which of your recipients may re-share the report or whether you want to maintain control over its dissemination. We also provide a stable public link direct to your report in case you wish to share it publicly (e.g. publication).
Yes! Everywhere you see a download arrow within a report, there are data and/or images that may be exported. iVariantGuide also provides a comprehensive summary report that can be printed or downloaded as a pdf. (Paid accounts only.)
Using Advaita Cloud Services’ products or content for any form of publication (e.g. print, electronically) requires researchers to cite them. Please use one of the options below for citations:
- “The Data (SNPs, insertions, deletions, etc.) were analyzed using Advaita Bio’s iVariantGuide (http://ivariantguide.advaitabio.com)”.
- LaTeX users may use the following code in a bibtex file: ~\cite{advaita2016}
@ONLINE{advaita2016,
author = {Advaita, Corporation},
title = {Variant Analysis with iVariantGuide},
month = Apr, year = {2016},
url = { http://ivariantguide.advaitabio.com }
}
On January 12, 2018, Advaita released a major update to its platform, with improvements to iPathwayGuide, iVariantGuide, and iBioGuide.
IMPROVEMENTS
The Advaita Knowledgebase was updated to version 1711 and now includes:
- 3 organisms: homo sapiens, mus musculus, rattus norvegicus
- 213,390 Genes
- 1,933 Diseases
- 44,976 GO terms
- 4,791 Drugs
- 955 Pathways
- 5,710 miRNAs
- 3,161,730 References
- For a complete list of databases and versions, please see report information within each application.
NEW FEATURES
- iVariantGuide: API Client now accepts multi-sample analyses
- Improvements to account registration page to ensure proper organization affiliation.
BUG FIXES
- iPathwayGuide: Improvements to parsing of CuffDiff-formatted files to maintain association of phenotype labels. Fold changes and p-value parsing remains untouched.
Uploading and analyzing data is easy. Here is a quick video tutorial explaining how.
1. SELECT/ UPLOAD FILE
The first step is to upload your VCF file containing all of the variants and samples you want to analyze. Here are a few tips:
- Make sure the file meets the specs for VCF v4.1 or higher. Especially:
- It contains each of the following columns: CHROM, POS, REF, ALT, FILTER, QUAL, INFO, FORMAT, and at least one Sample
- All INFO and FORMAT tags are defined with their own line in the header
- There is no limit to the number of variants or samples in your file, but very large files (> 1M variants) could have a slower browsing experience.
- Don’t have your data ready? We have sample datasets available. Grab a sample file and try it… it’s easy!
Once uploaded, select the checkbox next to the file you’d like to analyze. You will then be prompted to verify the reference assembly and select the type of analysis you wish to perform, including:
- Case vs Control (Group vs Group)
- Tumor/Normal (Paired Samples)
- Pedigrees (Trio, Quad, and larger families)
- Individual Samples
Lastly, iVariantGuide allows you to pre-filter your variants by quality, read depth, and FILTER flags. If there are certain quality control measures you know you’ll apply anyway, this step will help to focus the variants in your analysis to only those you are confident of, while ensuring a more favorable browsing experience.
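To get a feel for what such a pre-filter does, the sketch below applies the three criteria to raw VCF lines. It is an illustration only and not iVariantGuide's implementation. It assumes the standard VCF column order, assumes read depth is stored in the INFO tag `DP`, and uses made-up default thresholds. The function names are hypothetical.

```python
def passes_prefilter(line, min_qual=30.0, min_depth=10, allowed_filters=("PASS", ".")):
    """Return True if one VCF data line meets the quality, depth, and FILTER criteria.

    Assumptions: standard column order (QUAL is column 6, FILTER column 7,
    INFO column 8) and read depth stored as DP=<int> in INFO.
    """
    fields = line.rstrip("\r\n").split("\t")
    qual, filter_value, info = fields[5], fields[6], fields[7]

    # Design choice in this sketch: a missing QUAL (".") counts as failing.
    if qual == "." or float(qual) < min_qual:
        return False

    # Every FILTER flag on the line must be in the allowed set.
    if not set(filter_value.split(";")) <= set(allowed_filters):
        return False

    depth = None
    for item in info.split(";"):
        if item.startswith("DP="):
            depth = int(item[3:])
    # Design choice in this sketch: a missing DP also counts as failing.
    return depth is not None and depth >= min_depth


def prefilter(lines, **criteria):
    """Yield header lines unchanged and only those data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes_prefilter(line, **criteria):
            yield line
```

Stricter thresholds leave fewer variants in the analysis, which is the trade-off described above: less to browse, but only variants that meet the chosen quality bar.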
2. ADD SAMPLES TO GROUPS
You may assign information to each sample in the file (sex, group, parents) on the page or by uploading a file containing the necessary information. You may also rename samples (in case the VCF sample names are not easy to read). iVariantGuide accepts two formats for sample information: PED for pedigree analysis and TXT for group vs group and tumor/normal analyses. For a description and example of each file format, see below.
File Formats for Specifying Sample Info
- PED: a space or tab-delimited file with at least 6 columns, and one row per sample. Read more here and here. Download an example file.
- TXT: a tab-delimited file with one header row and one row per sample.
- To use this format, download the example file and open it in Excel or another spreadsheet program. Then replace the example values with the following sample information from your own data. The columns are as follows:
- sample: the sample names from the VCF file
- name: the sample names to display in iVariantGuide (if blank, will default to values in sample column)
- sex: male or female (case-sensitive); if blank, the sex is treated as unknown
- paternal: sample name of father (if known)
- maternal: sample name of mother (if known)
- group: name of group (for group vs group and tumor/normal analyses, this column must contain exactly two different group names)
IMPORTANT NOTE: Check the order of your samples! The first sample in the PED file is always the proband, and the first phenotype found is Affected. The second phenotype found (the first row with a phenotype different from that of the proband) is Unaffected, and the third is Unknown. For TXT files, the first group found is Tumor/Case and the second is Normal/Control.
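If you prefer to generate the TXT file with a script instead of a spreadsheet, the following is a minimal sketch. It assumes the header row uses exactly the six column names listed above, in that order (compare against the downloadable example file before relying on it). The function name `build_sample_txt` and its `case_group` parameter are hypothetical. Rows of the tumor/case group are written first, because the first group found is treated as Tumor/Case.

```python
import csv
import io

# Column names as described above (assumed to match the example file's header).
COLUMNS = ["sample", "name", "sex", "paternal", "maternal", "group"]


def build_sample_txt(samples, case_group):
    """Return tab-delimited sample info with rows of `case_group` listed first.

    `samples` is a list of dicts keyed by the column names; omitted keys
    are written as blank cells.
    """
    groups = {sample.get("group", "") for sample in samples}
    if len(groups) != 2 or case_group not in groups:
        raise ValueError(
            "expected exactly two group names, one of them " + repr(case_group)
        )

    # sorted() is stable, so the original order is kept within each group.
    ordered = sorted(samples, key=lambda sample: sample.get("group", "") != case_group)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n", restval=""
    )
    writer.writeheader()
    writer.writerows(ordered)
    return buffer.getvalue()
```

The returned string can be written to a .txt file and uploaded on the sample information page. The sample names must match those in the VCF file.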
3. CREATE REPORT
On the last page you can review the selections you made so far, and give your analysis a Title and Description. Once satisfied, click submit. Each dataset takes about 15 minutes to analyze. You will get an automated email as soon as your analysis is complete.
With Advaita’s latest update to its applications and knowledge base, Advaita updated its API for iVariantGuide and iPathwayGuide.
An API (Application Programming Interface) is a set of routines, protocols, and tools for building software applications. An API specifies how software components should interact. Advaita’s API is designed for advanced users of iPathwayGuide and iVariantGuide who would like to streamline their data processing and bypass the UI for submitting data.
Advaita’s API is designed to take advantage of the AWS EC2 environment and allow users to submit one to several datasets in rapid succession. Results from the application are still viewed in the application and are just as informative.
The API documentation and links are viewable at: https://hub.docker.com/r/advaitabio/api-client/
API access and support are available to iPathwayGuide and iVariantGuide subscribers who have opted for API access. If you would like to add API access to your existing annual subscription, please contact us at sales@AdvaitaBio.com for additional information.
On February 27, 2017, Advaita released a major update to its platform. These are the release notes.
IMPROVEMENTS TO: iPathwayGuide, iVariantGuide, iBioguide, and the Advaita Knowledge Base
- Changes to AWS services in preparation for HIPAA compliance
- Updated knowledge base to version Advaita KB v1702, which includes the following data sources and versions:
Database | Version | iPG Annotations | iVG Annotations |
---|---|---|---|
KEGG | Release 81.0+/01-20, Jan 17 | Pathways, Diseases, Drugs | Pathways |
Gene Ontology | 2016-Sep26 | GO Terms | GO Terms |
TargetScan | TargetScan v7.1 | miRNA Target Genes | miRNA Target Genes |
miRBase | miRBase v21, 06/14 | miRNA Sequences | |
dbSNP (incl 1k genomes) | Build 149 | Minor Allele freq. | |
RefSeq | Release 71 July 2016 | Impacted Transcripts | |
ClinVar | Dec 1, 2016 | Clinical Significance | |
SNPEff | v4.1L | Predicted Impact | |
IMPROVEMENTS TO iPATHWAYGUIDE
- NEW FEATURE! Onboarding carousel with top user benefits
- NEW FEATURE! API (Premium feature)
- Bug fix: genes selected in Genes Table on Pathways page are now highlighted on pathway map
IMPROVEMENTS TO iVARIANTGUIDE
- Improved error messaging for sample upload & report creation
- NEW FEATURE! Versioning: each report now shows which version of the Advaita Knowledgebase was used to annotate the sample. Outdated reports may be updated when viewing Report Info: either on the Reports page or from within the report itself. As is true for other Advaita applications, only the report owner may update it.
IMPROVEMENTS TO iBIOGUIDE
- Updated to use AKB v1702
11/1/2016
The Commercial Release of iVariantGuide is here! This commercial release includes the following enhancements since the last Beta:
- Redesigned uploading and intake navigation workflow with onboarding cues
- Improved navigation and selection on visual filters/graphs
- Improved sharing capabilities:
- View share history
- autofill prompts for often-used email addresses
- Ability to associate filter presets to shared report
- Ability to associate filter presets to public link (anyone can see what you see)
- Improved tooltips and onboarding
- Improved pathway and GO analysis with p-value ranking and advanced correction factors
- UI improvement
- User profile page
- API credentials
- Initial API (Premium Feature)
- Printable Summary (Premium Feature)
- Improved Pathway and GO Analysis (Premium Feature)
- Harmonization of labeling and naming conventions
- Numerous bug fixes and security enhancements
9/6/2016
In the pre-commercial release of iVariantGuide the following issues have been addressed:
- Redesigned uploading and intake navigation workflow with onboarding cues
- Improved navigation and selection on visual filters/graphs
- Improved sharing capabilities:
- View share history
- autofill prompts for often-used email addresses
- Ability to associate filter presets to shared report
- Ability to associate filter presets to public link (anyone can see what you see)
- Improved tooltips
- Improved pathway and GO analysis with p-value ranking and advanced correction factors
- UI improvement
- User profile page
- Introduction of subscriptions (free during beta) to see premium features
- Redesign of processing engine to parallelize analyses
- Improved filtering speed
- Harmonization of labeling and naming conventions
- Numerous bug fixes and security enhancements
- Redesigned uploading and intake navigation work flow
- Improved navigation and selection on visual filters/graphs
- Improved tooltips
- Redesigned notification of filter presets when navigating away from variants page
- Harmonization of labeling and naming conventions
- Numerous bug fixes and security enhancements
4/11/2016
- Analyses are currently limited to single sample analyses. Multi-sample analyses are in development.
- Input file size is limited to ~100 MB for now.
- Input files must be .vcf or .vcf.gz, version 4.1 or later; reference genomes hg19 (GRCh37) and GRCh38 are supported.
- Supported filters include quality, read depth, genomic region, variant class, predicted effect, clinical significance, and impact score.
- Filter combinations may be saved as “Presets” for later use with new data sets.
- Detailed variant view includes links to: iBioGuide, dbSNP, OMIM, MedGen, PubMed, and more.
- View variants in context of impacted pathways. Pathway view highlights affected genes and provides ability to model miRNAs and drugs.
- View variants in relation to GO terms. Navigate upstream and downstream to identify specific ontology terms.
- Share reports with individuals or publicly.
Generally, each analysis takes about 15 minutes to complete. If there are other analyses queued ahead of yours, it may take a bit longer. You will receive an email as soon as the analysis is complete.
On June 21, 2019, Advaita released a major update to its platform, with improvements to iPathwayGuide, iVariantGuide, and iBioGuide.
IMPROVEMENTS
The Advaita Knowledgebase was updated to version 1906 and now includes:
- 3 organisms: Homo sapiens, Mus musculus, Rattus norvegicus
- 216,544 Genes
- 2,307 Diseases
- 45,049 GO terms
- 5,694 Drugs
- 985 Pathways
- 5,396 miRNAs
- 92,745 Proteins
- 29,761,157 References
- 3,042,479 Interactions
- 477,289 Experiments
For a complete list of databases and versions, please see report information within each application.
NEW FEATURES
- iBioGuide: Users can now log in to iBioGuide using their Advaita credentials, the same account they use for iPathwayGuide and iVariantGuide.
- iPathwayGuide: Experiment and references for Network Analysis are now shown in a paged view, improving load time.
BUG FIXES
- iPathwayGuide: Meta-Analysis icon was updated to stay consistent with menu selections.
iBioGuide is a free browser and search tool based on Advaita’s extensive knowledge base of over 100 million relationships. Search for any term and find all the related genes, microRNAs, pathways, biological processes, molecular functions, cellular components, drugs, diseases, and references.
Example 1: You are interested in identifying the pathways associated with the CDK4 gene. Enter the gene symbol and find a list of related pathways. Exploring any one of the pathways allows you to see the genes that interact with CDK4 and the miRNAs and drugs that target them along with relevant references.
Example 2: You are interested in learning about the regulation of cell cycle process. Enter this as your search term and discover the various entities related to this process. By exploring one of the GO terms, you can quickly identify the genes annotated to the process and the miRNAs and drugs that target these genes and possibly the process itself.
The following components were added or addressed in this release.
- Extensive database updates including:
- KEGG pathways, drugs, and diseases
- NCBI genes
- TargetScan miRNAs
- Gene Ontologies
- PubMed references
- Improved search results
- Several bug fixes
- Changes to UI backend
The databases contained in iBioGuide were updated with the following releases:
- Pathways, Drugs, Diseases – KEGG – Release 73.0, March 16, 2015
- Gene Ontology Terms – Gene Ontology Consortium – September 19, 2014
- MicroRNA
- TargetScan – Release 6.2, March 2015
- miRBase – Version 21, June 2014
- Genes – NCBI – March 2015
iBioGuide connects several databases and their annotated contents. As such, iBioGuide will perform best if you use life-science terms such as genes, diseases, biological process, etc.
Yes. Results are delivered in a global sense, but can quickly be narrowed to a specific domain by clicking one of the filters at the top of the page or selecting a specific organism.
We’re working on it. In an upcoming release we will add gene set enrichment analysis capabilities.
Yes. Just record or save the url. You can share the url as needed. Search history, unfortunately, is not savable at this time.
Not specifically. Gene ontologies are not organism specific and are applicable to all organisms. If you have a specific organism you would like to see us support, please let us know.
We use a variety of public and semi-public databases, such as NCBI and KEGG, among others. These databases are updated at the same time as the databases in iPathwayGuide.
iBioGuide is meant to allow users to browse relationships between various biological entities and concepts. iPathwayGuide is designed to identify which entities and systems are impacted in the context of experimental data.
iBioGuide does not have this capability. If you need to save an image, we encourage you to take a screenshot.
iBioGuide is 100% free. We don’t even ask you for your email address.
A “Parent” term is a more general term within that ontology domain. A “Child” term is a more specific form of the currently selected term.