Your VCF must conform to the standard VCF format, version 4.1 or above. Here is the Specification File, produced by SAMtools. There is no limit on the number of variants or samples in your file, but very large files (more than 1 million variants) may be slower to browse.
This VCF file meets the minimum specifications for iVariantGuide.
If your VCF is rejected by iVariantGuide, here are a few things to check:
- The file should be tab-delimited. If your columns are separated with spaces, use find/replace so that exactly one tab separates each column.
- All header lines should begin with ##, except the last line of the file header, which contains column headers and should begin with #.
- The last line of the file header must contain every column shown in the example (line 15), including at least one Sample Name column header.
- The columns CHROM, POS, ID, REF, and ALT give identifying information about the variant. CHROM and POS are mandatory. The others accept . (period) in place of missing data.
- iVariantGuide uses the values from the QUAL column as the quality score for each variant call.
- The FILTER and INFO columns are required to preserve the integrity of the VCF format. FILTER information is displayed in the variant table. Data from the INFO column is ignored, with one exception: if read depth (DP) is not listed in FORMAT, it is read from the INFO column. Regardless of how it is used in iVariantGuide, every field in the INFO column needs its own definition line in the header. In the example above, each field that appears in the INFO column is defined in lines 3-11 (shown with ##INFO).
- The FORMAT column contains the key to parsing the data in the sample columns. iVariantGuide will prioritize genotype data in the order of: PL, GL, then GT.
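
The structural checks in this list can be run on a file before uploading it. The following is a minimal Python sketch, not iVariantGuide's own validation. The function name `check_vcf` and the `REQUIRED_COLUMNS` list are assumptions for illustration; the list holds the standard fixed VCF columns, which are assumed to match the column header line in the example.

```python
import re

# Standard fixed VCF columns; assumed to match the header line in the example.
REQUIRED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT",
                    "QUAL", "FILTER", "INFO", "FORMAT"]
INFO_DEFINITION = re.compile(r"^##INFO=<ID=([^,>]+)")
FILE_FORMAT = re.compile(r"^##fileformat=VCFv(\d+)\.(\d+)")


def check_vcf(lines):
    """Return a list of problems found in an iterable of VCF lines."""
    problems = []
    defined_info = set()  # IDs declared in ##INFO header lines
    reported = set()      # undefined INFO keys that were already reported
    columns = None        # set once the column header line has been read
    version = None

    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if columns is None:
            # Still inside the file header.
            if line.startswith("##"):
                match = FILE_FORMAT.match(line)
                if match:
                    version = (int(match.group(1)), int(match.group(2)))
                match = INFO_DEFINITION.match(line)
                if match:
                    defined_info.add(match.group(1))
                continue
            if not line.startswith("#"):
                problems.append(f"line {number}: data found before the column header line")
                return problems
            columns = line.split("\t")
            if columns[:len(REQUIRED_COLUMNS)] != REQUIRED_COLUMNS:
                problems.append(f"line {number}: column header is incomplete or not tab-delimited")
                return problems
            if len(columns) == len(REQUIRED_COLUMNS):
                problems.append(f"line {number}: no sample column")
            continue

        # Data line: one tab-separated field per column header.
        fields = line.split("\t")
        if len(fields) != len(columns):
            problems.append(f"line {number}: expected {len(columns)} tab-separated fields, found {len(fields)}")
            continue
        if fields[0] in ("", ".") or fields[1] in ("", "."):
            problems.append(f"line {number}: CHROM and POS are mandatory")
        for entry in fields[7].split(";"):
            key = entry.split("=", 1)[0]
            if key not in ("", ".") and key not in defined_info and key not in reported:
                reported.add(key)
                problems.append(f"line {number}: INFO field {key} has no ##INFO definition")

    if columns is None:
        problems.append("no column header line found")
    if version is None or version < (4, 1):
        problems.append("missing ##fileformat line, or version below VCFv4.1")
    return problems
```

To check a file on disk, pass the open file object, for example `with open("my.vcf") as handle: print(check_vcf(handle))`, where `my.vcf` is a placeholder path. An empty list means none of these checks failed; it does not guarantee that iVariantGuide will accept the file.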
The Sample Name column headers will be used as the sample names in iVariantGuide. In the example above, there is one sample and it will be called Sample_1 in iVariantGuide.
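
The way the sample columns are read, as described above, can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not iVariantGuide's actual code: the helper names `sample_names`, `preferred_genotype_field`, and `read_depth` are hypothetical, and the sketch assumes the nine standard fixed VCF columns come before the first sample column.

```python
# Order in which genotype data is preferred, as stated in the checklist.
GENOTYPE_PRIORITY = ("PL", "GL", "GT")


def sample_names(header_line):
    """Return the sample column headers from the #CHROM header line."""
    columns = header_line.rstrip("\r\n").split("\t")
    return columns[9:]  # assumes nine fixed columns precede the samples


def parse_sample(format_value, sample_value):
    """Pair FORMAT keys with one sample's colon-separated values."""
    return dict(zip(format_value.split(":"), sample_value.split(":")))


def preferred_genotype_field(format_value, sample_value):
    """Return (key, value) for the first of PL, GL, GT that is present."""
    record = parse_sample(format_value, sample_value)
    for key in GENOTYPE_PRIORITY:
        if key in record:
            return key, record[key]
    return None


def read_depth(format_value, sample_value, info_value):
    """Return DP from the sample column, falling back to the INFO column."""
    record = parse_sample(format_value, sample_value)
    if "DP" in record:
        return record["DP"]
    for entry in info_value.split(";"):
        key, _, value = entry.partition("=")
        if key == "DP":
            return value
    return None
```

For a header line ending in `FORMAT<tab>Sample_1`, `sample_names` returns `["Sample_1"]`. Values are returned as strings; converting them to numbers is left to the caller.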