Gene Neighbourhood Analysis Tool

About

GNAT (Gene Neighbourhood Analysis Tool) is a generalizable tool for visualizing the genomic neighbourhoods of one or more proteins of interest and of any homologous proteins found within a specified database. The input can be either a protein FASTA file containing the protein(s) of interest or a PSI-BLAST .out file previously generated by GNAT (for more information, see the help page). The tool is particularly useful for exploring the genomic neighbourhoods of genes that tend to cluster, such as those in bacterial operons. For this reason, the database on the server contains Staphylococcus aureus sequences from NCBI's FTP site.

Upon submitting a sequence to GNAT, the user is redirected to a page showing their job ID, which can be used to access the results once they are ready. GNAT uses the submitted FASTA sequence(s) as PSI-BLAST queries against the Staphylococcus aureus database. Each match is then used to generate a genomic neighbourhood consisting of a set number of genes (default=20) upstream and downstream of the matched protein. These genomic neighbourhoods are then filtered by the match's percent identity (default=30%) and size (default=20%) relative to the query protein. The filtered genomic neighbourhoods are clustered into groups of approximately 20 and visualized with a modified version of clinker, which produces interactive plots with the matches labelled. Phylogenetic trees and pie charts are also generated at every step of the pipeline to show how the matches are distributed across the database.
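
The neighbourhood and filtering steps described above can be illustrated with a minimal Python sketch. This is not GNAT's actual code: the Match class, the function names, and the representation of a genome as an ordered list of gene identifiers are all hypothetical. In particular, the size filter is assumed here to mean that the match length must lie within 20% of the query length; GNAT's exact rule may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Match:
    """One PSI-BLAST hit (hypothetical structure, not GNAT's own)."""
    protein_id: str
    gene_index: int          # position of the matched gene in its genome's gene list
    percent_identity: float  # percent identity to the query, 0-100
    length: int              # length of the matched protein, in residues


def neighbourhood(genes: Sequence[str], index: int, flank: int = 20) -> List[str]:
    """Return genes[index] plus up to `flank` genes on either side of it.

    The window is truncated at the ends of the gene list rather than wrapped.
    """
    start = max(0, index - flank)
    end = min(len(genes), index + flank + 1)
    return list(genes[start:end])


def passes_filters(match: Match, query_length: int,
                   min_identity: float = 30.0,
                   size_tolerance: float = 0.20) -> bool:
    """Keep a match if it is similar enough to the query in identity and size.

    ASSUMPTION: "size" is read as |match length - query length| being at most
    size_tolerance * query length. This is an illustrative interpretation only.
    """
    identity_ok = match.percent_identity >= min_identity
    size_ok = abs(match.length - query_length) <= size_tolerance * query_length
    return identity_ok and size_ok


# Toy data: a genome with 100 genes and three hits to a 300-residue query.
genes = [f"g{i}" for i in range(100)]
matches = [
    Match("p1", 50, 45.0, 310),  # passes both filters
    Match("p2", 5, 25.0, 300),   # fails the identity filter
    Match("p3", 80, 90.0, 400),  # fails the assumed size filter
]
kept = [m for m in matches if passes_filters(m, query_length=300)]
windows = {m.protein_id: neighbourhood(genes, m.gene_index) for m in kept}
```

In this toy example only p1 survives filtering, and its neighbourhood holds 41 genes: the match itself plus 20 on each side. A match near the start or end of the gene list would simply get a shorter window.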