Novel pathogens have the potential to become critical issues of
national security, public health and economic welfare. As demonstrated
by the response to Severe Acute Respiratory Syndrome (SARS) and
influenza, genomic sequencing has become an important method for
diagnosing agents of infectious disease. Despite the value of genomic
sequences in characterizing novel pathogens, raw data on their own do
not provide the information needed by public health officials and
researchers. One must integrate knowledge of the genomes of pathogens
with host biology and geography to understand the etiology of
epidemics. To these ends, we have created an application called
Supramap (http://supramap.osu.edu) to put information on the spread of
pathogens and key mutations across time, space and various hosts into
a geographic information system (GIS). To build this application, we
created a web service for integrated sequence alignment and
phylogenetic analysis as well as methods to describe the tree,
mutations, and host shifts in Keyhole Markup Language (KML). We apply
the application to 239 sequences of the polymerase basic 2 (PB2) gene
of recent isolates of avian influenza (H5N1). We map a mutation in
H5N1 influenza, glutamic acid to lysine at position 627 of the PB2
protein (E627K), that allows for increased replication of the virus in
mammals. Using a statistical test, we find support for the hypothesis
that E627K mutations are correlated with avian-mammalian host shifts,
but we reject the hypothesis that lineages with E627K are moving
westward.
Data, instructions for use, and visualizations are included as
supplemental materials at: http://supramap.osu.edu/sm/supramap/publications.
http://www3.interscience.wiley.com/journal/123345005/abstract
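
As a rough illustration of how a tree branch annotated with a mutation
and a host shift might be described in KML, the following is a minimal
sketch in Python. It is not the Supramap implementation: the function
names (branch_placemark, tree_to_kml), the branch name, the coordinates,
and the altitude are all hypothetical and chosen only for illustration.
The sketch assumes each branch is drawn as a KML LineString between two
(longitude, latitude, altitude) points, with the annotation stored in
the Placemark description.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def branch_placemark(name, start, end, description=""):
    """Build a KML Placemark whose LineString joins two tree nodes.

    start and end are (longitude, latitude, altitude_in_metres) tuples;
    KML expects coordinates in that order.
    """
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = name
    if description:
        ET.SubElement(placemark, "description").text = description
    line = ET.SubElement(placemark, "LineString")
    # "absolute" lets internal (ancestral) nodes float above the globe.
    ET.SubElement(line, "altitudeMode").text = "absolute"
    ET.SubElement(line, "coordinates").text = " ".join(
        f"{lon},{lat},{alt}" for lon, lat, alt in (start, end)
    )
    return placemark


def tree_to_kml(branches):
    """Wrap a list of branch dictionaries in a KML Document string."""
    kml = ET.Element("kml", xmlns=KML_NS)
    document = ET.SubElement(kml, "Document")
    for branch in branches:
        document.append(branch_placemark(**branch))
    return ET.tostring(kml, encoding="unicode")


# Hypothetical example data, invented for illustration only.
example_branches = [
    {
        "name": "branch_1",
        "start": (105.0, 21.0, 50000),  # hypothetical ancestral node
        "end": (31.2, 30.0, 0),         # hypothetical isolate location
        "description": "PB2 E627K; host shift: avian to mammalian",
    }
]
kml_text = tree_to_kml(example_branches)
```

The resulting string can be saved with a .kml extension and opened in a
KML-aware viewer. Keeping the mutation and host-shift annotation in the
description element is one simple design choice; a real application may
well use styles or separate Placemarks instead.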