SoyBase Gene and Genome Nomenclature

The second Williams 82 genome assembly was released by JGI in February 2013. The existence of multiple assemblies and annotations prompted JGI and key user groups to develop the following nomenclature conventions.

Gene Model Version Glyma 1.1 to Glyma 2.0 Correspondence Lookup:

Use this tool to look up the gene model name correspondences between the Wm82.a1.v1.1 and Wm82.a2.v1 annotations.

Genome assembly nomenclature:

• Glyma.Wm82.a1 <-- the original JGI assembly, previously known as Glyma1.01, aka assembly 1
• Glyma.Wm82.a2 <-- assembly 2
As other cultivars are sequenced, the genotype prefix (Wm82) will be changed as needed, e.g.:
• Glyma.Forr.a1
Similarly, a G. soja genome assembly would be:
• Glyso.PI25345.a1

Gene annotation nomenclature:

An annotation version number is appended to the assembly name:
• Glyma.Wm82.a1.v1 <-- initial JGI annotation
• Glyma.Wm82.a1.v1.1 <-- the JGI 1.1 annotation
• Glyma.Wm82.a2.v1 <-- the current annotation
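The assembly and annotation names above follow a dotted pattern of species, genotype, assembly, and an optional annotation version. As a minimal sketch, assuming the names always take exactly the forms shown in the examples, they could be split into their parts as follows. The function `parse_name` and the `AnnotationName` type are hypothetical helpers written for illustration, not part of any SoyBase or JGI tool.

```python
import re
from typing import NamedTuple, Optional


class AnnotationName(NamedTuple):
    species: str               # e.g. "Glyma" or "Glyso"
    genotype: str              # e.g. "Wm82", "Forr", "PI25345"
    assembly: str              # e.g. "a1", "a2"
    annotation: Optional[str]  # e.g. "v1", "v1.1"; None for an assembly-only name


# Assumed pattern, inferred only from the examples in this document.
_NAME_RE = re.compile(
    r"^(?P<species>[A-Za-z]+)\."
    r"(?P<genotype>[A-Za-z0-9]+)\."
    r"(?P<assembly>a\d+)"
    r"(?:\.(?P<annotation>v\d+(?:\.\d+)?))?$"
)


def parse_name(name: str) -> AnnotationName:
    """Split an assembly or annotation name into its dotted components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized assembly/annotation name: {name!r}")
    return AnnotationName(**match.groupdict())


print(parse_name("Glyma.Wm82.a1.v1.1"))
print(parse_name("Glyso.PI25345.a1"))
```

An assembly-only name such as Glyma.Forr.a1 parses with `annotation` set to `None`, which keeps one function usable for both kinds of name.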

Gene model nomenclature:

Because many papers have been published using assembly 1 with annotation 1 or 1.1, those gene model names will remain unchanged:
• Glyma10g34560 <-- no change from previous usage
Genes in the second and future JGI assemblies and annotations are distinguished by a period after "Glyma" and six rather than five digits after the "g":
• Glyma.01g123450.1.Wm82.a2.v1 (Note: the .1 after the locus name is used to distinguish a transcript or splice variant)
• Glyso.01g112340.1.PI125345.a1.v1
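The two distinguishing features described above (the period after the species prefix and the number of digits after the "g") are enough to tell the naming schemes apart mechanically. The following is a rough sketch under the assumption that locus names always carry a two-digit chromosome number, as in the examples here; names that do not fit either pattern are reported as unrecognized rather than guessed at. The function `naming_scheme` is a hypothetical helper.

```python
import re

# Assembly 1 style: no period, five digits after the "g" (e.g. Glyma10g34560).
OLD_STYLE = re.compile(r"^Glyma\d{2}g\d{5}$")

# Assembly 2 and later: period after the species prefix, six digits after the "g"
# (e.g. Glyma.01g123450 or Glyso.01g112340).
NEW_STYLE = re.compile(r"^[A-Za-z]+\.\d{2}g\d{6}$")


def naming_scheme(locus_name: str) -> str:
    """Classify a locus name by the convention it follows."""
    if OLD_STYLE.match(locus_name):
        return "assembly 1"
    if NEW_STYLE.match(locus_name):
        return "assembly 2 or later"
    return "unrecognized"


for name in ("Glyma10g34560", "Glyma.01g123450", "Glyso.01g112340"):
    print(name, "->", naming_scheme(name))
```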

An example:

A locus in Williams 82 assembly version 2, annotation version 1, with 2 transcripts:
• locus (i.e. gene model) identifier = Glyma.01g123499.Wm82.a2.v1
• locus name = Glyma.01g123499
• transcript identifiers = Glyma.01g123499.1.Wm82.a2.v1 and Glyma.01g123499.2.Wm82.a2.v1
• transcript names = Glyma.01g123499.1 and Glyma.01g123499.2
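The relationship between the four forms in this example can be sketched in code. The snippet below assumes identifiers follow exactly the layout shown (locus name, optional transcript number, then genotype, assembly, and annotation) and derives the short names from a full identifier. The function `describe` and its dictionary keys are illustrative choices, not an official SoyBase interface.

```python
import re

# Assumed layout, inferred from the examples in this document:
#   <locus name>[.<transcript number>].<genotype>.<assembly>.<annotation>
_ID_RE = re.compile(
    r"^(?P<locus>[A-Za-z]+\.\d{2}g\d{6})"
    r"(?:\.(?P<transcript>\d+))?"
    r"\.(?P<genotype>[A-Za-z0-9]+)"
    r"\.(?P<assembly>a\d+)"
    r"\.(?P<annotation>v\d+(?:\.\d+)?)$"
)


def describe(identifier: str) -> dict:
    """Derive locus and transcript names from a full locus or transcript identifier."""
    match = _ID_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a recognized identifier: {identifier!r}")
    locus = match["locus"]
    version = f'{match["genotype"]}.{match["assembly"]}.{match["annotation"]}'
    result = {
        "locus_name": locus,
        "locus_identifier": f"{locus}.{version}",
    }
    # Transcript fields are only present when a transcript number was given.
    if match["transcript"] is not None:
        transcript = f'{locus}.{match["transcript"]}'
        result["transcript_name"] = transcript
        result["transcript_identifier"] = f"{transcript}.{version}"
    return result


for key, value in describe("Glyma.01g123499.2.Wm82.a2.v1").items():
    print(f"{key} = {value}")
```

Passing a locus identifier such as Glyma.01g123499.Wm82.a2.v1 returns only the two locus fields, since no transcript number is present.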

In publications, the full locus identifier can be used as part of each gene name, or the genotype, assembly, and annotation version can be stated once for a set of genes if this can be done unambiguously in the context of the paper. For example: "We studied Glyma.01g123450 in genotype, assembly, and annotation version Glyma.Wm82.a2.v1." Subsequent uses in the paper could then use only the shorter locus names (e.g. Glyma.01g123450).

At SoyBase, we will use the brief locus name in the genome browser, with the genotype/assembly/annotation metadata provided in the track name. Text report pages will use the long and short names as appropriate to the particular data being displayed.