MICROBE FORECASTING
It was a large city. And it was hit hard. The first cases emerged in late August, and
the victims suffered terribly. The earliest symptoms were profuse diarrhea and
vomiting. They experienced severe dehydration, increased heart rate, muscle
cramps, restlessness, severe thirst, and the loss of skin elasticity. Some of the
cases progressed to kidney failure, while others led to coma or shock. Many of
those who came down with the disease died.
Then on the night of August 31, the outbreak truly broke. Over the next three
days, 127 people in a single neighborhood died. And by September 10 the number
of fatalities would reach 500. The epidemic seemed to spare no one. Children and
adults alike were killed. Few families did not have at least one member who came
down with the disease.
The epidemic led to intense panic. Within a week, three-quarters of the
neighborhood’s residents fled. Stores closed. Homes were locked. And you could
walk down a formerly bustling urban street without seeing a single person.
Early in the outbreak, a forty-year-old epidemiologist began an investigation to
determine its source. He consulted community leaders and methodically inter-
viewed families of the victims and made careful maps of every single case. Fol-
lowing his hunch about a waterborne disease, he studied the sources of the
community’s water and determined that it came from only one of two urban water
utilities. He conducted microscopic and chemical analyses of specimens from the
water system, which proved inconclusive.
In his report to the responsible officials, he presented his analysis and con-
cluded that contaminated water was to blame. Despite the lack of definitive results
from the analyses, the mapping of cases strongly supported his conclusion that
one particular water outlet was the source of the outbreak. He recommended shut-
ting down the water supply, and the officials agreed. And while the outbreak may
have already been in decline because of the mass exodus, that investigation and
water closure proved pivotal.
What was unusual about this outbreak was not the procedural investigation that
followed. Modern epidemiologists in countries throughout the world conduct ex-
actly this kind of investigation regularly. They enlist the help of local leaders, study
the distribution of cases, conduct analyses on potential sources, and then often
argue with officials as to the best course of action. What was unusual was that the
outbreak was in 1854—before the field of epidemiology existed.
As you may have guessed, the investigator responsible for cracking the outbreak
was none other than John Snow, the now famous London physician considered one
of the founders of contemporary epidemiology. The culprit was, of course, the
bacterium Vibrio cholerae, the agent of cholera. By finding that water was the source
rather than “foul air,” Snow contributed to the modern germ theory of infectious
diseases—that communicable diseases are caused by microbes. To this day, you
can see a replica of the famous Broad Street pump that Snow identified as the
source of the 1854 outbreak, in Soho, London.
It seems intuitive to us today, but the way that Snow used interviews, case iden-
tification, and mapping to chart the origin of the Broad Street cholera outbreak of
1854 was revolutionary in its time. While maps had certainly been used extensively
prior to 1854, the map he made of Soho is considered the first of its kind, not only
in epidemiology but also in cartography. He was the first to use maps to analyze
geographically related events and draw a conclusion about causality: namely, that
the Broad Street pump was the source of the outbreak. By doing so he has been
credited with creating the first geographic information system, or GIS, a now
commonly used system for capturing and analyzing geographic information.
In contemporary GIS, layers of information are added to maps like Snow’s to
provide depth of geographic information and to suggest patterns of causality.
While Snow’s map included streets, homes, locations of illness, and water sources,
a contemporary version could include many more layers: genetic information from
cholera specimens collected at different locations, a dimension of time to track how
cases shift spatially, a weather layer, or the social connections between the
individuals in the various homes.
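To make the layering idea concrete, here is a minimal sketch in Python of how such a map might be represented as stacked layers. The layer names, coordinates, and attributes are invented for illustration and are not drawn from Snow’s actual data or from any particular GIS package.

```python
# A minimal, hypothetical sketch of layered GIS data in plain Python.
# Layer names, coordinates, and attributes are illustrative, not Snow's actual data.
from dataclasses import dataclass, field

@dataclass
class Feature:
    lat: float                 # latitude of the mapped feature
    lon: float                 # longitude of the mapped feature
    attributes: dict = field(default_factory=dict)

@dataclass
class Layer:
    name: str
    features: list

# Snow's original map reduced to two layers: cases and water sources.
cases = Layer("cholera_cases", [Feature(51.5133, -0.1369, {"deaths": 3})])
pumps = Layer("water_pumps", [Feature(51.5132, -0.1366, {"name": "Broad Street"})])

# A contemporary version simply stacks more layers over the same coordinates.
modern_map = [
    cases,
    pumps,
    Layer("pathogen_genetics", []),   # sequences tied to sampling locations
    Layer("weather", []),             # rainfall and temperature over time
    Layer("social_ties", []),         # connections between households
]

for layer in modern_map:
    print(layer.name, len(layer.features), "feature(s)")
```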
Modern GIS is among a range of contemporary tools that are radically changing
the way that we investigate outbreaks and understand the transmission of diseases.
When used in a coordinated and comprehensive way, these tools have the poten-
tial to fundamentally change the way that we monitor for outbreaks and stop them
in their tracks.
We now have multiple scientific and technical advantages that Snow lacked in
the mid-nineteenth century. Among the most profound is that we have significantly
improved our capacity to catch the bugs we’re chasing and to document their
diversity. The revolution in molecular biology, in particular the techniques for
capturing and sequencing genetic information, has profoundly changed our ability
to identify the microbes that surround us.
[Figure: The map of London used by John Snow to find the source of the cholera outbreak.]
Miraculous but now standard techniques like the polymerase chain reaction
(PCR), which earned its inventor, Kary Mullis, a Nobel Prize, allow us to
snip out tiny pieces of genetic information from microbes and create billions of
identical copies, whose sequences can then be read and sorted out according to
the family of microbes to which they belong. Yet standard PCR requires that you
know what you’re looking for. If, for example, we want to find an unknown malaria
parasite, we can use PCR designed to identify malaria-specific sequences, since all
malaria parasites have genetic regions that look similar enough to each other. But
what if we don’t know what we’re looking for?
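A toy example can make the limitation concrete. The sketch below simply checks whether known primer sequences occur in a target genome; the sequences and names are invented, and real primer design and amplification are far more involved.

```python
# A toy illustration (not real primer design) of why standard PCR needs prior
# knowledge: amplification only happens if the primers match sequence we already know.
def primers_match(genome: str, forward_site: str, reverse_site: str) -> bool:
    """Return True if both primer-binding sites are present in the genome."""
    return forward_site in genome and reverse_site in genome

# Hypothetical sequences, for illustration only.
known_parasite  = "ATGCGTACGTTAGCCGGATTACGA"
unknown_microbe = "TTGACCGTAAGGCTTACCGGTTAA"
forward_site    = "ATGCGTACG"   # designed from a known, conserved region
reverse_site    = "GGATTACGA"   # written here as it appears in the genome

print(primers_match(known_parasite, forward_site, reverse_site))   # True: the assay finds it
print(primers_match(unknown_microbe, forward_site, reverse_site))  # False: invisible to this assay
```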
In the early 2000s, intent on finding unknown microbes, a bright young molec-
ular biologist, Joe DeRisi, and his colleagues adapted an interesting technique
developed by DeRisi’s doctoral adviser, Pat Brown, a Stanford biochemist. The
DNA microarray chip consisted of thousands of tiny bits of distinct artificial genetic
sequence distributed in an orderly fashion across a small glass slide. Since a strand
of genetic information binds to its mirror-image, or complementary, sequence, if you
wash a solution prepared from a specimen across a slide like this, the bits that
match the designed sequences on the slide will stick. You can then determine what
was in the specimen by seeing which of the sequences on the slide trapped their
natural counterparts. The technique had already provided thousands of scientists
with a new way of characterizing the bits of genetic information that flow through
living systems by the time DeRisi got his hands on it.
Prior to DeRisi’s innovation, the microarray chips had been used primarily to
help determine the internal workings of the genes of humans and animals, but De-
Risi and his colleagues realized that the technique could be modified to create a
powerful viral detection system. Instead of designing the chips with bits of artificial
human genetic information, he and his colleagues designed chips with bits of viral
genetic information. By carefully reviewing the scientific databases for genetic
information on all of the viruses known to science, they crafted chips that had bits
of genetic information from a whole range of viral families lined up in neat rows. If
they introduced genetic information from a sick patient, and it contained a virus
with a sequence similar to one on the chip, the sequence would be trapped and—
bingo!—we’d know the bug we were dealing with.
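The logic of the chip can be sketched in a few lines of Python. The probe sequences and family labels below are invented, and real hybridization chemistry is of course far richer, but the matching step is essentially a search for complementary sequence.

```python
# A schematic sketch of microarray-style detection: a specimen fragment is "trapped"
# when it contains the complement of a probe printed on the chip. Probe sequences
# and family names are invented for illustration.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

# The chip: probes laid out by viral family (illustrative sequences only).
chip_probes = {
    "coronavirus-like": "ATGGCTTAGCGT",
    "influenza-like":   "CCGTTAAGGCAT",
}

def scan_specimen(fragments):
    """Report which probe families trap material from the specimen fragments."""
    hits = []
    for family, probe in chip_probes.items():
        target = reverse_complement(probe)
        if any(target in fragment for fragment in fragments):
            hits.append(family)
    return hits

specimen = ["GGGACGCTAAGCCATTTT", "AAATTTCCGGA"]  # hypothetical patient-derived reads
print(scan_specimen(specimen))                    # -> ['coronavirus-like']
```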
Viral microarrays, as these specialized chips became known, have proliferated
and spread to labs throughout the world. They’ve helped quickly identify the micro-
bial villain responsible for new pandemics, like the coronavirus that causes SARS.
Yet they are not perfect. These chips can only be made to capture viruses from
families of viruses already known to science. If there are groups of viruses out
there whose sequences we are completely unaware of, and there certainly are, then
we have nothing with which to engineer the chips. Truly unknown viruses would
slide right by.
Within the past few years, viral microarrays have been supplemented with a series
of bold new genetic sequencing approaches. New machines churn out mammoth
amounts of sequence data from specimens—amounts of sequence that previously
would have been prohibitively expensive or time consuming. These machines are
permitting an entirely new form of viral discovery.
Rather than look for particular bits of information, the approach is to take a
specimen—say a drop of blood—and sequence every bit of genetic information it
contains. Technically, it’s more complicated than that, but the result is similar to
what you’d expect. We are approaching a moment when we will be able to read
every single sequence within a given biological specimen. Every bit of DNA or RNA
from the host specimen and, critically, every bit from the microbes that are riding
along with it.
One of the central problems becomes the bioinformatics—how to sort through
all of the billions of bits of information that are produced by these incredible tech-
nologies. Fortunately, in an enlightened move, scientists at the NIH picked up and
nurtured an electronic repository of sequencing information developed at the
famed Los Alamos National Laboratory and now called GenBank. Since scientists
are required by funding sources and journals to submit sequences to GenBank
prior to submitting academic papers, we collectively contribute billions of bits of
genetic information each year. GenBank right now holds over a hundred billion bits
of sequence information. And it’s growing rapidly. When a new sequence is iden-
tified from a sequencing run, it can be rapidly compared electronically to what’s in
GenBank to see if there’s a match.
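In spirit, that comparison step looks something like the following sketch, which scores a new read against a tiny stand-in reference set by counting shared subsequences. Real pipelines align reads against the full GenBank database with tools like BLAST; the sequences and labels here are invented.

```python
# A crude stand-in for the database-comparison step: score a new read against a tiny
# reference set by counting shared k-mers. Real pipelines align against the full
# GenBank database with alignment tools; sequences and labels here are invented.
def kmers(seq: str, k: int = 6) -> set:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

reference_db = {                      # stand-in for a few GenBank entries
    "arenavirus-like": "ATGCGGATTACCGTAAGCTTGACC",
    "flavivirus-like": "TTGGCCAATCGGATTCCAAGGTTA",
}

def best_match(read: str, k: int = 6):
    read_kmers = kmers(read, k)
    scores = {name: len(read_kmers & kmers(ref, k)) for name, ref in reference_db.items()}
    name, score = max(scores.items(), key=lambda item: item[1])
    return (name, score) if score > 0 else (None, 0)

mystery_read = "GGATTACCGTAAGCTT"     # hypothetical read from a sequencing run
print(best_match(mystery_read))       # -> ('arenavirus-like', 11)
```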
In late 2006 and early 2007 these techniques were used to good effect. In early
December 2006 the organs of a patient who had died of a brain hemorrhage in
Dandenong hospital in Australia were harvested for transplantation. A sixty-three-
year-old grandmother received one of the kidneys, another unnamed recipient re-
ceived the other kidney, and a sixty-four-year-old lecturer at a local university
received the man’s liver. By early January all three had died.
The local hospital and collaborating labs looked for all of the usual suspects.
They used PCR and tried to grow the microbe on culture media. They even
tried one of the viral microarrays, to no avail. The virus was found only when the
specimen was subjected to massive sequencing. The team that found it, led by Ian
Lipkin, a world-class laboratory virologist at Columbia University, had to sort
through over a hundred thousand sequences to find the fourteen sequences be-
longing to the mystery virus. Truly a needle in a haystack! The mystery virus ended
up being in a group of viruses called arenaviruses that often live in rodents. With-
out massive sequencing, the virus would not likely have been found.
But while identifying what’s actually in a small new outbreak is vital, it’s only the
beginning. As we get better and better at understanding what’s out there, we will
have to start asking a tougher question: where is it going? Will it become a pan-
demic?
There are three primary objectives to the emerging science of pandemic preven-
tion:
1. We need to identify epidemics early.
2. We need to assess the probability that they will grow into pandemics.
3. We need to stop the deadly ones before they grow into pandemics.
The viral microarray and sequencing techniques give us a snapshot of what is
causing an epidemic, but more is needed to assess the possibility that a new agent
in a limited outbreak has the right stuff to go pandemic. This is exactly the objec-
tive of a new program being developed by DARPA, the U.S. Department of De-
fense’s Advanced Research Projects Agency. DARPA has had a stunning impact on
the contemporary world of technology, including sponsoring early research that
has contributed in substantive ways to the development of modern computing, vir-
tual reality, and the Internet itself.
DARPA is developing a program called Prophecy, whose objective is to “successfully
predict the natural evolution of any virus.” Prophecy seeks nothing less than to use
technology to predict where an outbreak will go, combining that technology with the
support of teams of local on-the-ground experts in hotspots around the world.
Predicting the future trajectory of a virus seems like science fiction, but DARPA
does not shy away from high-risk/high-payoff ideas, and Prophecy falls clearly in
that mold. Fortunately, what we know about pandemics and the technologies avail-
able today bring the objectives it seeks within the realm of possibility.
Cutting-edge experimental virologists like Raul Andino, at the University of
California, San Francisco, are working to make rational predictions about the
evolution of viruses. Viruses reproduce rapidly, so any viral infection, even if it’s the result of
infection with a single viral particle, will rapidly develop into a swarm,¹ a group of
viruses, some identical, but mostly mutants differing in one way or another from
the parental strain that created them. By documenting and studying the way that
the overall viral swarms respond to different environments, Andino and his col-
leagues have worked to develop rational strategies for the production of vaccines
that use live viruses, a subject we will return to in chapter 11. He also hopes to use
the same information to determine the boundaries within which a swarm can
evolve. Swarms can’t go in every direction, and knowing what a swarm is composed
of will help us anticipate what it can evolve into.
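A small simulation illustrates how quickly a swarm forms. This is a cartoon of the process, not the Andino lab’s models, and the genome length, error rate, and population size are arbitrary.

```python
# A cartoon of swarm formation, not the Andino lab's actual models: copy a parental
# genome with a per-base error rate and see how quickly mutants dominate.
# Genome length, error rate, and population size are arbitrary.
import random

random.seed(1)
BASES = "ACGT"

def replicate(genome: str, error_rate: float = 0.01) -> str:
    """Copy a genome, substituting a random base at each position with some probability."""
    return "".join(
        random.choice([b for b in BASES if b != base]) if random.random() < error_rate else base
        for base in genome
    )

parent = "".join(random.choice(BASES) for _ in range(300))   # hypothetical 300-base genome
swarm = [replicate(parent) for _ in range(1000)]             # one round of copying

identical = sum(1 for genome in swarm if genome == parent)
print(f"{identical} of {len(swarm)} copies identical to the parent; "
      f"{len(set(swarm))} distinct sequences in the swarm")
```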
Another scientist working to change the ways we can forecast microbial evolu-
tion is not a microbiologist at all but rather a physics-trained bioengineer. Steve
Quake, an awardee of the same NIH Pioneer Program that has funded my own re-
search, develops technology that permits us to study and manipulate life in sur-
prising and incredibly useful ways. In the past ten years this jeans-wearing ski bum
has spun off multiple companies, developed handfuls of patents, and published
scores of papers in some of the highest-ranking journals—all while maintaining a
successful teaching program at Stanford University. Among the useful innovations
coming from Quake’s group are microfluidic platforms. Essentially, he’s produced
entire laboratories on small chips.
In one particularly notable application, he’s taken the tedious and complex work
of cell culture, where cells from mammals and other organisms are grown under
laboratory conditions, from the bench to the chip. The chips he and his team have
created, just a few centimeters long, house ninety-six separate compartments
where cells grow for weeks at a time and can be carefully measured and manip-
ulated. While there are many applications for having cell culture on an automated
and compact chip, one of them is the rapid and efficient evaluation of new viruses
from large numbers of specimens. It’s not difficult to imagine a chip-based
system that quickly tells us in what kind of cells a new agent can survive and there-
fore how it’s most likely to spread (e.g., by sex, blood, sneezes, and so on).
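As a purely illustrative sketch of that inference, and not a description of Quake’s actual platform, one could imagine summarizing chip results like this; the cell types, readouts, and route mappings below are simplifications invented for the example.

```python
# An illustrative sketch, not Quake's actual platform: summarize which cell types on a
# chip support growth and what route of spread that tropism would hint at. The cell
# types, readouts, and route mappings are simplifications invented for this example.
growth_on_chip = {                 # hypothetical readout from a multi-compartment chip
    "respiratory epithelium": True,
    "gut epithelium": False,
    "white blood cells": False,
}

ROUTE_HINTS = {
    "respiratory epithelium": "respiratory droplets (sneezes, coughs)",
    "gut epithelium": "fecal-oral (contaminated food or water)",
    "white blood cells": "blood or sexual contact",
}

likely_routes = [ROUTE_HINTS[cell] for cell, grows in growth_on_chip.items() if grows]
print("Transmission routes to investigate first:", likely_routes or ["unknown"])
```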
When we see an outbreak, there are a number of questions we’d like to have an-
swered. First, what’s the microbe behind it? Techniques like viral microarrays and
high throughput sequencing are increasing the speed at which we can identify new
agents and also helping us to find things that we’d have missed through older
techniques. But once we’ve identified a microbe, we want to know where it’s going.
We’ll return in chapter 12 to a vision of what the ultimate pandemic prevention sys-
tem will look like, but it would certainly involve approaches like those developed by
the Andino lab to assess the potential evolutionary directions that a virus can take.
And the tools that Quake’s group has developed might one day form a set of
high-speed chips that quickly evaluate how it’s likely to spread.
Modern information and communication technology provides us with another set
of tools that does something distinct and complementary to the biotech advances
discussed above. In fact, some of this technology is sitting in your pocket as you
read this sentence.
At one of our research sites in southwest Cameroon, a rubber plantation called
Hevecam, we conducted an experiment. This experiment represents
one of the exciting new trends in public health. And it’s all based on simple cell
phones.
In Hevecam, a plantation with nearly a hundred thousand inhabitants, when
individuals get sick they go to a local clinic. If they’re sufficiently ill, they then move
from that local clinic to the referral hospital in the center of the plantation. Yet
traditionally there has been no good way for the referral hospital to monitor what’s
happening in the local clinics. A few years ago Lucky Gunasekara, who now heads
up our program on digital epidemiology, and his partners at FrontlineSMS:Medic,
the nonprofit he co-founded, set up a simple system based on text messages
to allow the referral hospital to monitor what was occurring in the local clinics. By
simply texting a series of preset codes, the vast majority of vital clinical information
could be communicated up the medical hierarchy clearly, instantly, and efficiently.
Using predetermined codes and simple text message forms, the local clinics could
rapidly inform everyone else of how many cases of malaria, diarrhea, and other ill-
nesses they were seeing.
Simple technologies can have a dramatic impact. With a few simple techniques,
medical conditions at Hevecam could be monitored not only at the referral hospital
but also remotely, over a web dashboard, by anyone with appropriate access.
By allowing local clinicians or patients themselves the capacity to communicate,
information can be accumulated, organized, and analyzed, leading to a much more
rapid and localized sense of what’s going on during a health emergency.
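The flavor of the system can be captured in a short sketch. The message format and codes below are invented for illustration rather than taken from FrontlineSMS:Medic’s actual protocol.

```python
# A minimal sketch of the preset-code idea. The message format and codes below are
# invented for illustration; they are not FrontlineSMS:Medic's actual protocol.
from collections import defaultdict

CODES = {"MAL": "malaria", "DIA": "diarrhea", "FEV": "fever"}

def parse_report(clinic_id: str, message: str) -> dict:
    """Turn a message like 'MAL 12 DIA 4' into {'malaria': 12, 'diarrhea': 4}."""
    tokens = message.split()
    counts = {}
    for code, value in zip(tokens[::2], tokens[1::2]):
        if code in CODES:
            counts[CODES[code]] = int(value)
    return {"clinic": clinic_id, "counts": counts}

# Aggregate incoming texts into a simple dashboard view for the referral hospital.
dashboard = defaultdict(int)
for report in [parse_report("clinic-03", "MAL 12 DIA 4"),
               parse_report("clinic-07", "MAL 3 FEV 9")]:
    for illness, count in report["counts"].items():
        dashboard[illness] += count

print(dict(dashboard))   # {'malaria': 15, 'diarrhea': 4, 'fever': 9}
```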
Something just like this occurred during the earthquake in Haiti in 2010. Im-
mediately after the earthquake, organizations like Ushahidi² set up short, free
codes to which people could text “help” messages. They then turned to the local
DJs who, along with popular word of mouth, publicized the numbers. Amazingly,
when the dust cleared, the statistical distribution of the text messages mapped
accurately onto high-resolution aerial imagery of the damage. Effectively, peo-
ple’s text messages gave highly informative clues as to where the greatest damage
occurred. More importantly for those in Haiti, the messages saved lives, with the
critical information transmitted to the heroic rescue workers on the scene.
Similar systems have been used during outbreaks, such as the cholera outbreak
in Haiti in the fall of 2010. The ultimate hope is that outbreak detection can be
crowdsourced, with small bits of information provided by sufferers converging
into a real-time picture of the beginnings of outbreaks and their subsequent
spread. The short codes are only the start. As more and more countries adopt
electronic medical records, people around the world will increasingly link to them,
adding still more data that can be tapped during a health emergency. Call data
records, the logs that mobile phone operators keep of when and where calls are
placed, offer yet another source of signal. The researcher Nathan Eagle and his team
analyzed years of such records, including the critical week of February 3, 2008,
when a 5.9 magnitude earthquake occurred in the Lake Kivu region. By establishing
a baseline for the frequency of
calls, Eagle and his team were able to see telltale clues of unusual calling patterns
during the period immediately following the earthquake. They were able to detect
the time of the quake through a peak in call numbers. They were also able to
estimate the epicenter of the quake by using location data from cell towers, placing
it near the center of the area with the heaviest call volumes.
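A toy version of that baseline-and-spike logic might look like the following; the call counts are made up, and the simple threshold used here stands in for the far more careful statistical analysis the team actually performed.

```python
# A toy version of baseline-and-spike detection on hourly call counts. The numbers are
# made up, and a mean-plus-three-standard-deviations threshold is far simpler than the
# analyses actually performed on the real call data records.
from statistics import mean, stdev

baseline = [120, 118, 125, 130, 122, 119, 127, 124]   # hourly call counts on normal days
threshold = mean(baseline) + 3 * stdev(baseline)

observed = [121, 126, 123, 410, 395, 180, 128]        # hours around a hypothetical quake
spikes = [hour for hour, calls in enumerate(observed) if calls > threshold]
print("Anomalous hours:", spikes)                     # hours 3-5, the surge after the event
```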
The idea that data derived from cell phones can be used to detect an earthquake in
space and time is amazing. It also suggests a range of different applications. Indi-
viduals who are ill may have fundamentally different call patterns than those who
are not, and call patterns may also change as a new outbreak spreads. Analyses of
call data records alone might not provide perfect early detection of a new outbreak,
but combined with other sources of outbreak data from organizations like ours and
other health institutions, they might help us chart early epidemic spread.
Cell phones are growing more ubiquitous by the day and will likely be critical tools
in helping to detect and respond quickly to outbreaks before they become pan-
demics. Yet they are not the only technology-heavy solution being used in the
growing field of digital surveillance. In 2009 my colleagues at Google³ published a
fascinating paper showing that individuals’ online search patterns also provide a
sense of what people are becoming infected with.
With the vast stores of search data kept by Google and US influenza surveil-
lance data collected by the CDC, the team was able to calibrate their system to
determine the key search words that sick people or their caregivers used to indicate
the presence of illness. The team used searches on words related to influenza and
its symptoms and remedies to establish a system that accurately tracked the in-
fluenza statistics generated by the CDC. In fact, they did better. Since Google
search data is available immediately, while CDC influenza surveillance data lags
because of the time needed for reporting and posting, Google was able to provide
accurate influenza trends ahead of the traditional surveillance system.
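Stripped to its core, the calibration idea can be sketched as fitting a line between past query frequency and past case counts, then applying it to this week’s queries, which are available immediately. The numbers below are invented, and Google’s actual models used many query terms and far more sophisticated fitting.

```python
# A greatly simplified sketch of the calibration behind search-based flu tracking:
# fit a line between past query frequency and past surveillance counts, then apply
# it to this week's query data, which is available immediately. Numbers are invented.
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

query_freq = [0.8, 1.1, 1.6, 2.2, 2.9]      # past weeks: relative flu-related search volume
ili_cases  = [400, 560, 820, 1100, 1450]    # past weeks: reported influenza-like illness

slope, intercept = fit_line(query_freq, ili_cases)
this_week_queries = 3.4                      # search data for the current, not-yet-reported week
print("Estimated cases this week:", round(slope * this_week_queries + intercept))
```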
Early data on seasonal influenza, as provided by the Google Flu Trends system,
is interesting and potentially important. It gives health organizations time to order
medications and prepare for different triage needs. But early
detection of seasonal influenza is not the Holy Grail. That honor would go to a sys-
tem that could detect a newly emerging pandemic. Google is now working to ex-
tend its influenza findings to other kinds of diseases. As more and more people
use search engines like Google, and more and more data is acquired, the hope is
that better and better trend analyses will be developed for agents other than in-
fluenza. Perhaps at some point a community experiencing the beginning of a pan-
demic will signal its arrival just by Googling.
The explosion of online social media provides another set of big data in which
weak but potentially valuable signals of a coming plague may be found. Computer
scientists, like Vasileios Lampos and Nello Cristianini from the University of Bris-
tol, have taken an approach similar to that of the scientists at Google, sorting
through hundreds of millions of Twitter messages. Like their colleagues at Google, Lampos and
Cristianini used key words to watch trends in Twitter and find associations with in-
fluenza statistics, in this case provided by the UK’s Health Protection Agency.
In 2009 they monitored the frequency of tweets related to influenza during the
H1N1 pandemic and found they could track the official health data with 97
percent accuracy. As with the findings by the Google Flu Trends team, this work
provides a rapid and potentially inexpensive way to supplement traditional epi-
demiological data gathering. It also has the potential to be extended to more than
just influenza.
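A bare-bones version of the tweet-scoring idea is sketched below; the marker words and messages are invented, and the published work learned a weighted set of markers rather than using a simple tally like this.

```python
# A bare-bones version of the tweet-scoring idea: the fraction of messages mentioning
# flu-related markers. The marker list and sample tweets are invented; the published
# work used a learned, weighted set of markers rather than this simple tally.
FLU_MARKERS = {"flu", "fever", "cough", "sore throat"}

def flu_score(tweets):
    """Fraction of tweets that mention at least one flu-related marker."""
    hits = sum(1 for tweet in tweets if any(marker in tweet.lower() for marker in FLU_MARKERS))
    return hits / len(tweets)

sample = [
    "Stuck in bed with a fever and an awful cough",
    "Great match last night!",
    "Half the office is out with the flu this week",
    "New coffee place on the corner is excellent",
]
print(flu_score(sample))   # 0.5 for this toy sample
```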
While online social media can be scanned to see what people are communi-
cating about, online social networking may provide a richer and subtler range of
possible uses. In fascinating recent work, two leading social scientists, Nicholas
Christakis and James Fowler, have studied how social networks can inform surveil-
lance for infectious diseases.
In a clever experiment, these two scientists followed Harvard students who were
divided into two groups. The first group was randomly selected from the Harvard
student population. The second group was chosen from individuals whom the first
group named as friends. Because individuals near the center of a social network
are likely to be infected sooner than those on the periphery, Christakis and Fowler
hypothesized that during an outbreak the friend group would become infected
sooner than the random, and therefore on average less socially central, group. The
results were dramatic. During an influenza outbreak in 2009, the friend group be-
came infected on average fourteen days ahead of the randomly chosen group.
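The statistical effect behind the experiment, sometimes called the friendship paradox, can be demonstrated numerically: in most networks, a randomly chosen friend tends to have more connections, and to sit closer to the center, than a randomly chosen person. The sketch below builds a toy network and compares the two groups; the construction and numbers are illustrative only.

```python
# A numerical illustration of the "friendship paradox" behind the sentinel idea:
# in most networks, a randomly chosen friend has more connections, on average, than
# a randomly chosen person. The network construction and sizes are toy choices.
import random

random.seed(0)
M = 3                                  # links added per new node (preferential attachment)
neighbors = {i: set() for i in range(M + 1)}
for i in range(M + 1):                 # small fully connected seed network
    for j in range(i + 1, M + 1):
        neighbors[i].add(j)
        neighbors[j].add(i)
endpoints = [node for node, nbrs in neighbors.items() for _ in nbrs]

for new_node in range(M + 1, 500):     # newcomers tend to befriend well-connected people
    targets = set()
    while len(targets) < M:
        targets.add(random.choice(endpoints))
    neighbors[new_node] = set()
    for target in targets:
        neighbors[new_node].add(target)
        neighbors[target].add(new_node)
        endpoints += [new_node, target]

def average_contacts(group):
    return sum(len(neighbors[person]) for person in group) / len(group)

random_group = random.sample(list(neighbors), 100)
friend_group = [random.choice(list(neighbors[person])) for person in random_group]
print("average contacts, random group:", round(average_contacts(random_group), 1))
print("average contacts, friend group:", round(average_contacts(friend_group), 1))  # noticeably higher
```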
The hope is that social science can identify novel kinds of sentinels to monitor
for new outbreaks and catch them early.⁴ Determining who people’s friends are
would be time-consuming, however, something we could accomplish on a single
college campus but perhaps not nationally. Now self-identified friends in massive
online social networks may make this task much easier. Online social networks like
Facebook, while not designed to help monitor for outbreaks, have created relatively
easy-to-monitor systems that can be mined to determine the frequency of illness, identify
social sentinels, and perhaps eventually provide predictions for spread of a new
agent within a community.
When John Snow created the first Geographic Information System in 1854, he took
actions that would seem very logical and straightforward to us today. He took a
map, he plotted where sick people were, and he plotted possible sources of conta-
gion. Snow could not have predicted the directions in which his first tentative step
would lead or the data that would eventually become available for today’s GIS.
In the end it may be that no single data source reigns supreme. If Snow were
alive today and investigating an outbreak, he’d want it all. He’d want to know where
the sick people were, and he’d be glad to get the data more quickly and easily
through text messages or Internet searches. He’d like to know exactly what the cases
were infected with, down to the very specific microbial genetic strain. He’d seek to
use call data records to monitor people’s movements in order to track the move-
ment of the disease or where it was seeded. He’d like to know how people were
connected socially, and he’d certainly follow individuals who were likely to become
infected first or show signs earlier than the rest.
You can imagine the ultimate outbreak GIS, or in terms more familiar to Silicon
Valley, what Lucky Gunasekara, the head of my data team, calls the ultimate out-
break mash-up: a map with layer after layer of critical information—where people
are, what they’re concerned about, what they’re infected with, where they’re mov-
ing, and who they’re connected to. Developing and maintaining this combined dig-
ital and biological mash-up is the precise objective of Lucky’s team and something
to which we’ll return in the final chapter of this book. Ideally, over time the data can
be analyzed jointly, the predictive models can be trained on actual outbreaks, and
the various data streams can be weighted optimally to maximize predictive power.
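In schematic form, the mash-up might combine several per-region signals into a single risk score, as in the sketch below. The signals, weights, and regions are hypothetical; in practice the weights would have to be learned from data on real outbreaks.

```python
# A schematic of the "mash-up" idea: several per-region signals combined into one risk
# score. The signals, weights, and regions are hypothetical; in practice the weights
# would be learned from data on real outbreaks rather than set by hand.
signals = {                       # each value is a normalized signal between 0 and 1
    "region-A": {"search_spike": 0.7, "clinic_reports": 0.6, "call_anomaly": 0.2},
    "region-B": {"search_spike": 0.1, "clinic_reports": 0.0, "call_anomaly": 0.1},
}
weights = {"search_spike": 0.3, "clinic_reports": 0.5, "call_anomaly": 0.2}

def risk_score(region_signals: dict) -> float:
    """Weighted sum of the available signals for one region."""
    return sum(weights[name] * value for name, value in region_signals.items())

for region, region_signals in signals.items():
    print(region, round(risk_score(region_signals), 2))   # region-A stands out
```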
When people ask me whether or not I’m optimistic about the future of predicting
pandemics, the answer is always a resounding yes. Given the first two-thirds of this
book, you may wonder if my optimism is warranted. A steady wave of intercon-
nectedness among humans and animals has created a perfect storm for new pan-
demics. That is true. Yet the interconnectedness among humans that now exists
through communication and information technology gives us unprecedented
capacity to catch outbreaks early, which, when combined with amazing advances in
our ability to study the diversity of the tiny life forms that cause epidemics, cer-
tainly makes optimism warranted.
What will win out in the end? Will pandemics sweep through the human popu-
lation destroying millions of lives? Will technology and science ride in to the res-
cue?