GrapeTree: visualization of core genomic relationships among 100,000 bacterial pathogens

Current methods struggle to reconstruct and visualize the genomic relationships of large numbers of bacterial genomes. GrapeTree facilitates the analyses of large numbers of allelic profiles by a static “GrapeTree Layout” algorithm that supports interactive visualizations of large trees within a web browser window. GrapeTree also implements a novel minimum spanning tree algorithm (MSTree V2) to reconstruct genetic relationships despite high levels of missing data. GrapeTree is a stand-alone package for investigating phylogenetic trees plus associated metadata and is also integrated into EnteroBase to facilitate cutting edge navigation of genomic relationships among bacterial pathogens.

GrapeTree is a fully interactive tree visualization program within EnteroBase, which supports facile manipulations of both tree layout and metadata. It generates GrapeTree figures using the neighbor-joining (NJ) algorithm, the classical minimal spanning tree algorithm (MSTree) similar to PhyloViz, or an improved minimal spanning tree algorithm which we call MSTree V2.
• GrapeTree is also available as a stand-alone version.
• Installation instructions, Manuals and Tutorials are available on this site
• The source code for GrapeTree is available
• GrapeTree is also available as a live online demo
Here are materials to help you use GrapeTree:

Installing stand-alone GrapeTree
The stand-alone version emulates the EnteroBase version through a lightweight webserver running on your local computer. You will interact with the program as you would in EnteroBase, through a web browser. We recommend Google Chrome for best results. There are a number of different ways to run GrapeTree; the easiest are to install it via pip or to download the ready-built software here: https://github.com/achtman-lab/GrapeTree/releases

EnteroBase Documentation, Release
If at any time you want to restart the page, you can visit http://localhost:8000 in your web browser, as shown below.
Let's apply these concepts with data from a previously published study of Salmonella enterica serovar Agona. Click here for the first tutorial

Tutorial 1A: Basic Usage of GrapeTree (EnteroBase version)
The procedure for working with GrapeTree in EnteroBase and in the stand-alone version is identical once you have loaded your data. This tutorial assumes you are using EnteroBase. There are other tutorials specifically for the stand-alone version.
You will need to be a registered user of EnteroBase; see Getting Started - Registering and logging in.

About this dataset
To learn the basic usage of GrapeTree we will be using data presented in Zhou et al. "Neutral genomic microevolution of a recently emerged pathogen, Salmonella enterica serovar Agona". PLoS Genetics 9.4 (2013): e1003471. We will try to replicate the tree presented in Figure 1, which shows the phylogeny of serovar Agona including a number of outbreaks (green diamonds) across the world.

Searching EnteroBase
This tutorial assumes you are already familiar with finding records in EnteroBase. If you are not, please read Searching EnteroBase. In this example, the genomes are listed under Bio Project ID PRJEB1944.
1. Perform the search in EnteroBase. This will load 71 records.
Before we get too far ahead, there is some custom metadata that describes the outbreak clusters from the paper. To load it, open the EnteroBase panel:
1. Click import fields
2. Under "add custom column" select 'agona_cluster' and click add
3. agona_cluster should show up in the column list on the right. Click OK

Basic Orientation
Let's stop and orientate ourselves with the GrapeTree interface:
1. These are links to important webpages (Top left). The left icon opens a new browser page to EnteroBase, while the right icon opens the GrapeTree GitHub page.
2. The text here (Top left) shows the filename of the file we just loaded in.
3. This set of panels (Left) contains all the options for customizing our tree. Currently the Input/output panel is open and gives options to load trees/profiles, load metadata, and save our work.
4. The GrapeTree Tree itself (Centre). The figure is interactive. Each circle is a Node and each line is a branch. Node size depends on the number of strains within that node. Branch length varies with the distance between nodes.
5. The Key/Legend (Right) for the colour coding. You can change some settings by right-clicking on it.

Basic Navigation
GrapeTree has a rich suite of tools to help you navigate and manipulate your tree.
5. Move a node and its children by clicking and holding the left mouse button on a Node and then dragging.
6. Move the key/legend by clicking and holding the legend and then dragging it around.
7. Rotate the entire tree by clicking and holding the root node and then dragging.
8. Select some nodes by holding the SHIFT key and dragging over some nodes in the tree. You can also select individual nodes by holding the SHIFT key and clicking on nodes one by one.
9. Add more nodes to your selection by holding the SHIFT key and dragging over other nodes in the tree.
10. Deselect some nodes by holding the SHIFT key and dragging over some already selected nodes. Try removing nodes from your current selection.
11. Deselect all selected Nodes by double-clicking any whitespace around the tree, or by right-clicking and choosing Unselect all from the contextual menu.
This is enough to get started; let's tidy up our tree.

Modifying the Tree Layout
The Tree Layout panel allows global changes to tree layout, nodes and branches and has some important navigation features. Try playing around with each of the settings to see what they do. Generally:
• You can drag the sliders to change the value.
• You can also directly modify the value by clicking on the value box, typing in a new value, and pressing enter or clicking out of the box.
• Click the refresh icon (the rewind icon) to reset the value to default.
Specifically under Tree Layout > Branch Style:
• Scaling: will uniformly increase the scaling for all branches. For instance, setting it to 200% will double the length of all branches relative to the default setting (100%), whereas 50% would halve it.
• Collapse Branches: will collapse all branches under a certain length. The length value shown is the real branch length for the tree. To see the lengths for all branches, check the Branch Labels under Branch Style. The slider is scaled relatively, so moving it all the way to the right will collapse all the nodes, giving you a pie graph.
• Log Scale: if this is checked (has a tick in the box), all branches will be log-scaled. This is useful for trees with a wide variety of branch lengths.
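The three Branch Style settings can be pictured as simple transformations of branch lengths. The sketch below is illustrative only: the function names, the tuple layout of `edges`, and the log formula (`log1p`) are assumptions made for this example, not GrapeTree's actual implementation.

```python
import math

def scale_lengths(lengths, percent):
    """Scale every branch length by a percentage (100 leaves them unchanged)."""
    return [length * percent / 100.0 for length in lengths]

def log_scale_lengths(lengths):
    """Compress long branches with log(1 + x); an assumed formula for illustration."""
    return [math.log1p(length) for length in lengths]

def collapse_branches(edges, cutoff):
    """Merge nodes joined by a branch shorter than `cutoff`.

    `edges` is a list of (parent, child, length) tuples. The result maps each
    node to the representative node of its merged group.
    """
    group = {}

    def find(node):
        # Follow links to the group representative, shortening the path as we go.
        group.setdefault(node, node)
        while group[node] != node:
            group[node] = group[group[node]]
            node = group[node]
        return node

    for a, b, length in edges:
        root_a, root_b = find(a), find(b)
        if length < cutoff and root_a != root_b:
            group[root_b] = root_a
    return {node: find(node) for node in list(group)}

edges = [("A", "B", 2), ("B", "C", 50), ("C", "D", 5)]
groups = collapse_branches(edges, cutoff=10)
```

With a cutoff of 10, nodes A and B end up in one group and C and D in another, while the long B to C branch is kept. Raising the cutoff above the longest branch merges everything into a single node, which corresponds to the "pie graph" case described above.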
Specifically under Tree Layout > Node Style:

GrapeTree Documentation
• Node Size: will uniformly increase the scaling for all Nodes. For instance, setting it to 200% will double the size of all Nodes relative to the default setting (100%), whereas 50% would halve it.
• Node Scaling: This will exaggerate differences in node size. In the case of the Agona dataset, all nodes include only one strain so there will be no effect.
For this tutorial, set the Branch length to 150% (as shown below) and we shall continue.

Styling the Branches
Under Branch Style we can also modify the look of the Branches in our Tree. We can show branch labels by checking the Branch Labels option and change the font size with the Font Size slider or by entering a new value in the box. If the Mouseover info box is checked, we can see the branch length when we have the mouse cursor over a particular branch.
The tree of the Agona dataset has very long branches. Enter 100 in the box next to For branches longer than and set it to 'Shorten'. This will shorten the branch length and change the line to be dashed, which indicates the branch length is not to scale.

Node settings
Under Tree Layout > Node Style we can modify the look of the Nodes in our Tree. We can show node labels by checking the Node Labels option and change the font size with the Font Size slider or by entering a new value in the box. If the Mouseover info box is checked, we can see details for a node when we mouse over it.
We can also set the colour coding of the Nodes. For the Agona dataset, set Colour by to "Agona_Clusters" to show the outbreaks as defined in the original paper. See if you can get your figure to look like mine.

Final modifications
The tree is looking pretty good, but we can make it a bit clearer. Try playing around with all of the different options to come up with the best looking tree. Here's what I came up with:
These are my settings:
• Tree Layout > Branch Length: 150%
• Tree Layout > Collapse Branches: 10
• Branch Style > Shorten branches longer than: 100
• Node Style > Colour by: Clusters
• Node Style > Node Labels: Unchecked/Off

Exporting our work
Your tree can be saved in GrapeTree's JSON format, as a Newick tree that can be loaded into other phylogeny programs, or as a Scalable Vector Graphics (SVG) file, which is an image format that you can edit in publishing software like Inkscape or Adobe Illustrator. If you would like a raster image (JPEG, PNG, BMP) of the Tree, just use the screenshot feature of your computer.

Tutorial 1B: Basic Usage of GrapeTree (Stand-alone)
The procedure for working with GrapeTree in EnteroBase and in the stand-alone version is identical once you have loaded your data. This tutorial assumes you are using the stand-alone version. There are other tutorials specifically for the EnteroBase version.
You will need to install GrapeTree; see Installing stand-alone GrapeTree.

About this dataset
To learn the basic usage of GrapeTree we will be using data presented in Zhou et al. "Neutral genomic microevolution of a recently emerged pathogen, Salmonella enterica serovar Agona". PLoS Genetics 9.4 (2013): e1003471. We will try to replicate the tree presented in Figure 1, which shows the phylogeny of serovar Agona including a number of outbreaks (green diamonds) across the world.

Loading data into GrapeTree with the stand-alone version
Download the sample data as shown below.
https://raw.githubusercontent.com/martinSergeant/EnteroMSTree/master/examples/Grapetree_Agona.profile
https://raw.githubusercontent.com/martinSergeant/EnteroMSTree/master/examples/Grapetree.Agona.meta.tsv
The profile file includes Salmonella cgMLST data from EnteroBase for the strains described in the paper. The first row contains the headers for each column: "Name", "ST" (Sequence Type), and then, in each subsequent column, the name of a locus within the cgMLST scheme.
If you wish to use your own profile:
• Profiles must be tab or comma delimited.
• You MUST include a "#" symbol at the start of the header row. Note in the example below that "Name" is in fact "#Name".
• You may use a SNP matrix, which is the same format as the Agona example, with single nucleotides (A, T, G or C) substituting for the numbers.
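As a rough illustration of these rules, the following sketch writes a small tab-delimited profile and checks its header row. The strain names, locus names and allele numbers are invented for the example, and `write_profile` and `check_profile_header` are hypothetical helpers, not part of GrapeTree.

```python
import csv
import io

def write_profile(rows, loci):
    """Return tab-delimited profile text whose header row starts with '#'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["#Name"] + list(loci))
    for name, alleles in rows:
        writer.writerow([name] + [str(allele) for allele in alleles])
    return buffer.getvalue()

def check_profile_header(text):
    """Check that the first row starts with '#' and is tab or comma delimited."""
    header = text.splitlines()[0]
    return header.startswith("#") and ("\t" in header or "," in header)

# Invented example data: two strains typed at three loci.
profile_text = write_profile(
    [("strain_1", [1, 4, 2]), ("strain_2", [1, 5, 2])],
    ["locus_A", "locus_B", "locus_C"],
)
```

For a SNP matrix, the same layout would apply with nucleotide letters in place of the allele numbers.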

The metadata file is a slightly modified version of what is available in EnteroBase.
If you wish to use your own metadata:
• Metadata must be tab or comma delimited.
• One column must be labelled ID and these values should correspond to the names in the profile file.
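Before loading your own files, it can help to confirm that the metadata IDs really match the profile names. The sketch below assumes tab-delimited text and a profile whose first column holds the strain name; `read_names` and `unmatched_ids` are hypothetical helper names introduced for this example.

```python
import csv
import io

def read_names(profile_text):
    """Return the strain names from the first column of a tab-delimited profile."""
    rows = csv.reader(io.StringIO(profile_text), delimiter="\t")
    next(rows)  # skip the '#'-prefixed header row
    return [row[0] for row in rows if row]

def unmatched_ids(profile_text, metadata_text):
    """Return metadata ID values that have no matching name in the profile."""
    names = set(read_names(profile_text))
    records = csv.DictReader(io.StringIO(metadata_text), delimiter="\t")
    if "ID" not in (records.fieldnames or []):
        raise ValueError("metadata needs a column labelled ID")
    return [record["ID"] for record in records if record["ID"] not in names]

# Invented example data: 's3' appears in the metadata but not in the profile.
profile_text = "#Name\tlocus_A\ns1\t1\ns2\t2\n"
metadata_text = "ID\tCountry\ns1\tUK\ns3\tFR\n"
missing = unmatched_ids(profile_text, metadata_text)
```

An empty result means every metadata row can be attached to a strain in the profile.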

Install and Start GrapeTree
Start GrapeTree as described in Installing stand-alone GrapeTree.

Load in the Profile file and Metadata
You should now see the GrapeTree interface with the splash screen. You can either drag-and-drop the profile file into the window or click Load Files and navigate to the file through the file browser.
You will be prompted to select the Parameters For Tree Creation. Set the method to MSTreeV2 in the dropdown and then click OK.
Repeat the process with the metadata file. Either drag-and-drop the file into the window or click Load Files and navigate to the file. You should now see the tree colored with a metadata field as shown below.
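To give a feel for what tree creation does with a profile, here is a minimal sketch of a classical minimum spanning tree built with Prim's algorithm on pairwise allelic differences. It is not MSTree V2, whose handling of missing data is not described in this section. The toy profiles and the convention of treating 0 as a missing allele are assumptions made for the example.

```python
def allelic_distance(a, b):
    """Count loci where two profiles differ, skipping loci missing (0) in either."""
    return sum(1 for x, y in zip(a, b) if x and y and x != y)

def classical_mst(profiles):
    """Return minimum spanning tree edges (from, to, distance) via Prim's algorithm."""
    names = list(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Pick the shortest link from the current tree to any strain outside it.
        dist, u, v = min(
            (allelic_distance(profiles[u], profiles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges

# Invented toy profiles: four strains typed at four loci.
profiles = {
    "A": [1, 1, 1, 1],
    "B": [1, 1, 1, 2],
    "C": [1, 1, 2, 2],
    "D": [3, 3, 3, 3],
}
tree_edges = classical_mst(profiles)
```

Here A, B and C are chained by single-allele differences, and D attaches through a long branch of four differences, which is the kind of structure the tutorial tree displays as short and long branches.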

Node size depends on the number of strains within that node. Branch length varies with the distance between nodes.
5. The Key/Legend (Right) for the colour coding. You can change some settings by right-clicking on it.

Basic Navigation
GrapeTree has a rich suite of tools to help you navigate and manipulate your tree. Try these out!
5. Move a node and its children by clicking and holding the left mouse button on a Node and then dragging.
6. Move the key/legend by clicking and holding the legend and then dragging it around.
7. Rotate the entire tree by clicking and holding the root node and then dragging.
8. Select some nodes by holding the SHIFT key and dragging over some nodes in the tree. You can also select individual nodes by holding the SHIFT key and clicking on nodes one by one.
9. Add more nodes to your selection by holding the SHIFT key and dragging over other nodes in the tree.
10. Deselect some nodes by holding the SHIFT key and dragging over some already selected nodes. Try removing nodes from your current selection.
11. Deselect all selected Nodes by double-clicking any whitespace around the tree, or by right-clicking and choosing Unselect all from the contextual menu.
This is enough to get started; let's tidy up our tree.

Modifying the Tree Layout
The Tree Layout panel allows global changes to tree layout, nodes and branches and has some important navigation features. Try playing around with each of the settings to see what they do. Generally:
• You can drag the sliders to change the value.
• You can also directly modify the value by clicking on the value box, typing in a new value, and pressing enter or clicking out of the box.
• Click the refresh icon (the rewind icon) to reset the value to default.

Exporting our work
Your tree can be saved in GrapeTree's JSON format, as a Newick tree that can be loaded into other phylogeny programs, or as a Scalable Vector Graphics (SVG) file, which is an image format that you can edit in publishing software like Inkscape or Adobe Illustrator. If you would like a raster image (JPEG, PNG, BMP) of the Tree, just use the screenshot feature of your computer.

Tutorial: Making your own GrapeTree Links
GrapeTree lets you publish your interactive GrapeTree analysis online. This can be done by using two URL parameters.
Parameters:
• tree = <online file for newick tree or json saved GrapeTree>
• metadata = <tab-delimited or comma-delimited table>
Due to CORS restrictions on JavaScript code, only three sources have been tested as working:
• Files from the same domain as the GrapeTree server.
• Files in a GitHub public repository.
• DropBox files that have been publicly shared via links.
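A link of this kind can be assembled with Python's standard library. In the sketch below, only the parameter names `tree` and `metadata` come from the description above; the base address and the two file URLs are placeholders, and `grapetree_link` is a hypothetical helper.

```python
from urllib.parse import urlencode

def grapetree_link(base_url, tree_url, metadata_url=None):
    """Build a GrapeTree link from the `tree` and optional `metadata` parameters."""
    params = {"tree": tree_url}
    if metadata_url:
        params["metadata"] = metadata_url
    # urlencode percent-encodes the file URLs so they survive as parameter values.
    return base_url + "?" + urlencode(params)

link = grapetree_link(
    "http://localhost:8000",  # assumed address of a GrapeTree page; adjust to your server
    "https://example.org/data/my_tree.nwk",  # placeholder tree file
    "https://example.org/data/my_metadata.tsv",  # placeholder metadata file
)
```

Encoding the file URLs matters because they contain characters such as ":" and "/" that would otherwise be ambiguous inside a query string.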
There are different ways of publishing data.

Owners of websites.
Please run the stand-alone version of GrapeTree and serve the URL under the same domain as your main website. GrapeTree can read links from your website, as long as they are under the same domain.
You can either host the tree files at a different local link, or fetch them from an external link at the backend and redistribute them via a proxy link.

GrapeTree Reference Manual
This page explains each of the features in GrapeTree.

Interaction With EnteroBase
• Load Selected: Any strains selected in the tree will be loaded into the main search page of EnteroBase. If the main search page is not open, then a new page will open in the browser.
• Highlight Checked: Any strains checked (selected) in the EnteroBase main search page will become highlighted (large yellow halo) in the tree.
• Import Fields: Shows a dialog box which allows the selection of experimental fields and custom fields (columns) to be imported into the tree.
• Save: Saves the tree layout and any metadata in the tree. Changed metadata is only associated with the tree and will not be updated in EnteroBase. Data in custom columns which you have permission to edit, however, will be updated in EnteroBase, and you will be notified if this is the case.
• Update: Will update the tree with any metadata that has changed in EnteroBase since the tree was created or last updated. Any data from custom columns that has changed in EnteroBase will also be updated in the tree.
• Info: Shows information about the tree such as the parameters used for construction, number of strains, last modified etc.

Loading data
To get started, drag and drop files into the browser window.

Trees or Profile Data
• Phylogenetic trees: These can be in Nexus or Newick (nwk) format.
• Profile Files: These are tab-delimited text files with alleles as columns and strains as rows; see Tutorial 1B: Basic Usage of GrapeTree (Stand-alone). A header line is required, in which the strain columns and metadata columns need to start with a '#'. This requires the local server to be running.
• Custom Format (.json): Files generated by this program which contain the tree data exactly as displayed and any metadata.
• Scaling: Increase/Decrease length of all branches. Click the rewind icon to revert to the default value. Use the slider to change the value, or enter a specific value into the box.

• Collapse Branches: All branches shorter than the specified length will be collapsed and their nodes will be merged together. The branch length value is scaled to the branch lengths defined in the original tree data. Use the slider to change the value, or enter a specific value into the box.
• Log Scale: The lengths of all branches will be scaled logarithmically.
Branches that are over the specified length can be rendered in a particular way based on settings in this panel. The branch length value is scaled to the branch lengths defined in the original tree data. Enter a specific value into the box or use the arrows.
• Display: Long branches will be shown as normal.
• Hide: Long branches will be transparent. They are interactive, but will not be shown on the tree.
• Shorten: Long branches will be cropped back to the specified branch length cutoff. Lines will be dashed to indicate affected branches.
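The three options for long branches amount to a small decision rule per branch. The sketch below restates that rule in code; the function name, the lowercase mode strings and the returned tuple are assumptions chosen for illustration, not GrapeTree's internal representation.

```python
def render_long_branch(length, cutoff, mode):
    """Return (drawn_length, visible, dashed) for one branch.

    `mode` is one of "display", "hide" or "shorten" and only affects
    branches longer than `cutoff`.
    """
    if length <= cutoff or mode == "display":
        return length, True, False   # drawn as normal
    if mode == "hide":
        return length, False, False  # kept in the tree but not shown
    if mode == "shorten":
        return cutoff, True, True    # cropped to the cutoff and dashed
    raise ValueError(f"unknown mode: {mode}")

# A branch of length 250 with a cutoff of 100 under each option.
shown = render_long_branch(250, 100, "display")
hidden = render_long_branch(250, 100, "hide")
cropped = render_long_branch(250, 100, "shorten")
```

In the Agona tutorial, entering 100 and choosing Shorten corresponds to the last case: the branch is drawn at the cutoff length and dashed to signal that it is not to scale.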

Layout Rendering options
Layout Rendering gives options on how nodes are positioned on the tree.
• Dynamic: Nodes are positioned dynamically, similar to a force-directed layout. Nodes will try to fan out and distance themselves from neighbours. This may improve the aesthetics of the tree but will modify branch length scaling. Branch lengths are NOT to scale when this is used. The dynamic positioning can be applied only to selected nodes if the "Selected Only" option is checked.
• Static: Tree layout is calculated when the tree is initially created and remains static. Relative branch length scaling (as specified in the original tree data) will be maintained if the "Real Branch Length" option is checked.

Context menu
Provides quick links to contextual menus, which are usually accessed by right click; this is for devices that do not have an easy right-click option such as tablets and mobile devices.
• GrapeTree: Presents the same menu as when right-clicking on the tree itself.
• Metadata: Presents the same menu as when right-clicking on the metadata table.
• Figure Legend: Presents the same menu as when right-clicking on the Legend.

Metadata window
Provides a table showing the loaded metadata.
• Download: Export the metadata as a tab-delimited file.
• Add Metadata: Click this to add a new column and specify the field name for the column.
• Filter: Shows filtering text boxes below each column header, when checked.
• Hypo Nodes?: Shows hypothetical nodes in the metadata table, when checked.

About GrapeTree
GrapeTree is named after the clusters of related bacterial strains that tend to be presented in minimal spanning trees. Our GrapeTree GUI is available within EnteroBase once you have created a workspace or connected to somebody else's workspace. It is also available here as a stand-alone version. The integrated EnteroBase version interacts directly with EnteroBase data. The stand-alone version calculates trees from character data, visualizes pre-calculated trees and annotates them with information from supplied metadata.
(a) Show sub strains box should be checked
(b) Search terms should be Bio Project ID (Field) contains (Operator) PRJEB1944 (Value)
1. From the Experimental Data dropdown on the right select cgMLST
2. Click the Create MSTree button, once the cgMLST data has loaded.
3. Leave the Algorithm option as MSTreeV2, give the tree a title and click Submit. Make sure your browser allows pop-ups for EnteroBase! You should now have a new window open showing a tree similar to the one below.

1. If you get lost click Centre Tree under Tree Layout. Click Tree Layout to open or close the Layout panel.
2. If you've messed up the tree, click Static Redraw under Tree Layout to reset the layout.
3. Zoom in/out using the mousewheel or the Zoom buttons under Tree Layout. Click Tree Layout to open up the Layout panel and then click the magnifying glass (+) to zoom in or the magnifying glass (-) to zoom out.
4. Move the Tree by clicking and holding on any of the whitespace around the tree, and then dragging.

• Font Size: Choose the font size of node labels. Use the slider to change the value, or enter a specific value into the box.
• Node Size: Increase/Decrease the size of all nodes. Click the rewind icon to revert to the default value. Use the slider to change the value, or enter a specific value into the box.
• Kurtosis: Increase/Decrease the relative size of all nodes. Nodes with a large number of members will look more distinct. Click the rewind icon to revert to the default value. Use the slider to change the value, or enter a specific value into the box.
• Show Pie Chart: Shows a breakdown of the members contained within a node, categorized by the "Colour by" setting.
Branch Style
• Show Labels: Check to show node labels.
• Font Size: Choose the font size of node labels. Use the slider to change the value, or enter a specific value into the box.