<!--
 * Jalview - A Sequence Alignment Editor and Viewer ($$Version-Rel$$)
 * Copyright (C) $$Year-Rel$$ The Jalview Authors
 *
 * This file is part of Jalview.
 *
 * Jalview is free software: you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation, either version 3
 * of the License, or (at your option) any later version.
 *
 * Jalview is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty
 * of MERCHANTABILITY or FITNESS FOR A PARTICULAR
 * PURPOSE. See the GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Jalview. If not, see <http://www.gnu.org/licenses/>.
 * The Jalview Authors are detailed in the 'AUTHORS' file.
 -->
<title>Importing Variants from VCF</title>
<strong>Importing Genomic Variants from VCF</strong>
<p>Jalview can annotate nucleotide sequences associated with
genomic loci with features representing variants imported from VCF
files. This feature, new in Jalview 2.11, is currently tuned to work
best with tab-indexed VCF files produced by the GATK Variant
Annotation Pipeline (with or without annotation provided by the
Ensembl Variant Effect Predictor), but other sources of VCF files
should also work.</p>
<p>If your sequences have genomic loci, then a <strong>Taxon
name</strong> and <strong>chromosome location</strong> should be shown in
the Sequence Details report and the Sequence ID tooltip (provided
you have enabled it via the submenu in the <em><strong>View</strong></em>
menu). Jalview matches the assembly information provided in the VCF
file to the taxon name using an internal lookup table. If a match
is found, Jalview uses the Ensembl API's lift-over services to
locate your sequences' loci in the reference frame of the VCF
file's assembly. If all goes well, after loading a VCF, Jalview reports
the number of variants added as sequence features in the alignment
window's status bar. These are added by default when loci are
retrieved from Ensembl.</p>
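<p>As a rough illustration of what a lift-over request looks like, the
sketch below builds a URL for the public Ensembl REST <em>map</em>
endpoint. It is not taken from Jalview's source: whether Jalview issues
exactly this request is an assumption, and the function name
<em>liftover_url</em> and the coordinates are hypothetical. No network
call is made.</p>

```python
from urllib.parse import quote

# Base URL of the public Ensembl REST service.
ENSEMBL_REST = "https://rest.ensembl.org"


def liftover_url(species, from_assembly, chromosome, start, end, to_assembly):
    """Build a request URL for the Ensembl REST 'map' (lift-over) endpoint.

    Hypothetical helper for illustration only; it does not contact Ensembl.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    # Region syntax is chromosome:start..end:strand; strand 1 is forward.
    region = f"{chromosome}:{start}..{end}:1"
    parts = ("map", species, from_assembly, region, to_assembly)
    path = "/".join(quote(str(part), safe=":") for part in parts)
    return f"{ENSEMBL_REST}/{path}?content-type=application/json"


# Illustrative coordinates only.
print(liftover_url("human", "GRCh37", "X", 1000000, 1000100, "GRCh38"))
```

<p>The response to such a request lists the mapped region in the target
assembly, which is the information needed to place a locus in the VCF
file's reference frame.</p>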
<strong><a name="attribs">Standard Variant Attributes</a></strong>
<p>Jalview decorates variant features imported from VCF files with
attributes that can be used to filter or shade variant annotation,
including the following:</p>
<ul>
<li><em>POS</em> - the chromosomal position as recorded in the VCF file.</li>
<li><em>ID</em> - in gnomAD releases, the rs identifier of
a known dbSNP variant.</li>
<li><em>QUAL</em> - the 'phred-scaled quality score' for the ALT
assertion (or the quality of the SNP call if there are no alternate
alleles). Higher values indicate greater confidence.</li>
<li><em>FILTER</em> - 'PASS' if all filters have been passed,
otherwise a list of the filters the variant failed (e.g. poor quality,
or insufficient sample size).</li>
</ul>
<p><em>Standard attributes were introduced in Jalview 2.11.1.0.</em> VCF field semantics are highly dependent on the source of your VCF
file. See <a
href="https://www.internationalgenome.org/wiki/Analysis/vcf4.0">https://www.internationalgenome.org/wiki/Analysis/vcf4.0</a>
for more information.</p>
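<p>To make the standard attributes concrete, here is a minimal sketch
that splits one VCF data line into its fixed columns and applies a
simple QUAL/FILTER test. It is independent of Jalview's own loader; the
function names <em>parse_vcf_record</em> and <em>passes</em> are
hypothetical, and the two records are written in the style of the
examples in the VCF specification.</p>

```python
def parse_vcf_record(line):
    """Split one tab-separated VCF data line into its first seven fixed fields."""
    chrom, pos, vid, ref, alt, qual, filt = line.rstrip("\n").split("\t")[:7]
    return {
        "CHROM": chrom,
        "POS": int(pos),                                  # chromosomal position
        "ID": None if vid == "." else vid,                # '.' means no identifier
        "REF": ref,
        "ALT": [] if alt == "." else alt.split(","),      # comma-separated alleles
        "QUAL": None if qual == "." else float(qual),     # phred-scaled quality
        # 'PASS' or '.' carry no failed filters; otherwise a ';'-separated list.
        "FAILED_FILTERS": [] if filt in ("PASS", ".") else filt.split(";"),
        "PASSED": filt == "PASS",
    }


def passes(record, min_qual):
    """True if the record passed all filters and meets the quality threshold."""
    return (record["PASSED"]
            and record["QUAL"] is not None
            and record["QUAL"] >= min_qual)


good = parse_vcf_record("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5")
poor = parse_vcf_record("20\t17330\t.\tT\tA\t3\tq10\tNS=3;DP=11;AF=0.017")
print(good["ID"], passes(good, 20))
print(poor["FAILED_FILTERS"], passes(poor, 20))
```

<p>Because the meaning of QUAL and FILTER varies between VCF producers,
any threshold such as the one used here should be chosen with the
source of the file in mind.</p>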
<strong>Working with variants without CSQ fields</strong>
<p><a name="computepepvariants">Jalview 2.11.1's new virtual
features</a> mean that peptide sequences are no longer annotated
directly with protein missense variants. This makes it harder to
filter variants when they do not already include the CSQ field. You
can restore the pre-2.11.1 functionality as follows:</p>
<ol>
<li>Download the script at
<a href="https://www.jalview.org/examples/groovy/ComputePeptideVariants.groovy">https://www.jalview.org/examples/groovy/ComputePeptideVariants.groovy</a>.</li>
<li>Run the script via the <a href="groovy.html">Groovy
Console</a> on a linked CDS/Protein view to create missense and
synonymous peptide variant features.</li>
</ol>
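<p>The distinction between missense and synonymous variants comes down
to whether a codon change alters the encoded amino acid. The sketch
below illustrates that idea with the standard genetic code. It is not
the ComputePeptideVariants.groovy script, whose logic is not reproduced
here; the names <em>CODON_TABLE</em> and
<em>classify_codon_change</em> are hypothetical.</p>

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return (consequence, reference residue, variant residue) for a codon change."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        consequence = "synonymous"
    elif alt_aa == "*":
        consequence = "stop_gained"
    elif ref_aa == "*":
        consequence = "stop_lost"
    else:
        consequence = "missense"
    return consequence, ref_aa, alt_aa


print(classify_codon_change("GAA", "GAG"))  # same residue, so synonymous
print(classify_codon_change("GAA", "GTA"))  # residue changes, so missense
```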
<strong>Working with variants from organisms other than human</strong>
<ol>
<li>Look in your VCF file to identify keywords in the
##reference header that indicate which species and assembly the
VCF was generated against.</li>
<li>Look at ensembl.org to identify the species' short name
and the assembly's unique id.</li>
<li>Add mappings to the <strong>VCF_SPECIES</strong> and <strong>VCF_ASSEMBLY</strong>
properties in your .jalview_properties file. For example:<pre>
VCF_SPECIES=1000genomes=homo_sapiens,c_elegans=celegans
VCF_ASSEMBLY=assembly19=GRCh37,hs37=GRCh37</pre><br /> <br />These allow
annotations to be mapped from both Human 1000genomes VCF files and
C. elegans VCF files.</li>
</ol>
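<p>The property values above are comma-separated
<em>keyword=value</em> pairs. The sketch below shows one way such
mappings could be applied to a ##reference header. Treating each
keyword as a case-insensitive substring of the header is an assumption
about the matching rule, the function names are hypothetical, and the
header line is only an illustrative example.</p>

```python
def parse_mappings(value):
    """Parse 'keyword=target,keyword=target' into a dict with lower-cased keywords."""
    mappings = {}
    for entry in value.split(","):
        keyword, sep, target = entry.partition("=")
        keyword, target = keyword.strip(), target.strip()
        if sep and keyword and target:  # skip malformed or empty entries
            mappings[keyword.lower()] = target
    return mappings


def lookup(reference_header, mappings):
    """Return the target of the first keyword found in the header, else None.

    Assumption: a keyword matches if it occurs anywhere in the header,
    ignoring case.
    """
    reference = reference_header.lower()
    for keyword, target in mappings.items():
        if keyword in reference:
            return target
    return None


species = parse_mappings("1000genomes=homo_sapiens,c_elegans=celegans")
assemblies = parse_mappings("assembly19=GRCh37,hs37=GRCh37")

# Illustrative ##reference header line.
header = ("##reference=ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/"
          "reference/phase2_reference_assembly_sequence/hs37d5.fa.gz")
print(lookup(header, species), lookup(header, assemblies))
```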
<strong>Work in Progress!</strong>
<p>VCF support in Jalview is under active development. Please get
in touch via our discussion forum if you have any questions or problems,
or if you find it useful!</p>